Does Crime-Predicting Software Bias Judges? Unfortunately, There’s No Data
And I wonder how valid the predictions are: are data available on that? If you’re going to lock someone up on a (mere) prediction (or keep him or her in prison longer, which amounts to the same thing: loss of freedom because of a prediction), then I think the prediction should be damned accurate. At least, that’s how I’d feel if it were me being locked up, and I imagine others feel the same.
Rose Eveleth reports in Motherboard:
For centuries judges have had to make guesses about the people in front of them. Will this person commit a crime again? Or is this punishment enough to deter them? Do they have the support they need at home to stay safe and healthy and away from crime? Or will they be thrust back into a situation that drives them to their old ways? Ultimately, judges have to guess.
But recently, judges in states including California and Florida have been given a new piece of information to aid in that guesswork: a “risk assessment score” determined by an algorithm. These algorithms take a whole suite of variables into account and spit out a number (usually between 1 and 10) that estimates the risk that the person in question will wind up back in jail.
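To make the mechanics concrete, here is a minimal sketch of how such a scoring tool might work. Northpointe’s actual COMPAS inputs and weights are proprietary, so every feature name and coefficient below is invented; the point is only the shape of the computation: weighted inputs fed through a logistic model, with the resulting probability binned into a 1–10 score.

```python
import math

# A toy risk-score calculator. The features and weights are invented
# for illustration; COMPAS's actual inputs and coefficients are
# proprietary and unpublished.
WEIGHTS = {
    "prior_arrests": 0.30,         # count of prior arrests
    "age_at_first_arrest": -0.05,  # younger first arrest -> higher risk
    "charge_severity": 0.40,       # 1 (minor misdemeanor) to 5 (serious felony)
    "employed": -0.60,             # 1 if employed, else 0
}
INTERCEPT = -2.0

def risk_probability(features):
    """Logistic model: P(return to jail) = 1 / (1 + e^-(w.x + b))."""
    z = INTERCEPT + sum(WEIGHTS[k] * features[k] for k in WEIGHTS)
    return 1.0 / (1.0 + math.exp(-z))

def decile_score(features):
    """Bin the probability into the 1-10 score a judge would see."""
    return min(10, max(1, math.ceil(risk_probability(features) * 10)))

defendant = {
    "prior_arrests": 3,
    "age_at_first_arrest": 19,
    "charge_severity": 2,
    "employed": 0,
}
print(decile_score(defendant))  # -> 3 with these made-up weights
```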
If you’ve read this column before, you probably know where this is going. Algorithms aren’t unbiased, and a recent ProPublica investigation suggests what researchers have long been worried about: that these algorithms might contain latent racial prejudice. According to ProPublica’s evaluation of a particular scoring method called the COMPAS system, created by a company called Northpointe, people of color are more likely to get higher scores than white people for essentially the same crimes.
Bias against folks of color isn’t a new phenomenon in the judicial system. (This might be the understatement of the year.) There’s a huge body of research showing that judges, like all humans, are biased. Plenty of studies have shown that, for the same crime, judges tend to sentence a black person more harshly than a white person. It’s important to question biases of all kinds, both human and algorithmic, but it’s also important to question them in relation to one another. And nobody has done that.
I’ve been doing some research of my own into these recidivism algorithms, and when I read the ProPublica story, I came away with the same question I’ve had since I started looking into this: these algorithms are likely biased against people of color, but so are judges. So how do the two compare? How does the bias present in humans stack up against the bias programmed into algorithms?
This shouldn’t be hard to find out, I thought: ideally, you would divide the judges in a single county in half, give one half access to a scoring system, and have the other half carry on as usual. If you don’t want to A/B test within a county (and there are some questions about whether that’s an ethical thing to do), then simply compare two counties with similar crime rates, one that uses the scores and one that doesn’t. In either case, it’s essential to test whether these algorithmic recidivism scores exacerbate, reduce, or otherwise change existing bias.
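The core analysis such a study would run is simple to state. Here is a minimal sketch on sentencing records that are entirely invented for illustration: compute the black-white sentencing gap within each arm of the experiment, then take the difference between the gaps. A real study would of course need far more data and controls for offense severity, priors, and so on.

```python
from statistics import mean

# Hypothetical sentencing records from the experiment described above.
# Every number here is invented; only the comparison matters.
# Each record: (arm, defendant_race, sentence_in_months)
records = [
    ("score", "black", 18), ("score", "white", 14),
    ("score", "black", 24), ("score", "white", 20),
    ("no_score", "black", 22), ("no_score", "white", 15),
    ("no_score", "black", 26), ("no_score", "white", 18),
]

def gap(arm):
    """Mean black-white sentencing gap (in months) within one arm."""
    months = lambda race: [m for a, r, m in records if a == arm and r == race]
    return mean(months("black")) - mean(months("white"))

gap_with, gap_without = gap("score"), gap("no_score")
print(f"gap with scores:    {gap_with:+.1f} months")
print(f"gap without scores: {gap_without:+.1f} months")
# A positive difference here would suggest the scores exacerbate the
# existing racial gap; a negative one, that they reduce it.
print(f"difference-in-differences: {gap_with - gap_without:+.1f} months")
```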
It turns out I was wrong. As far as I can find, and according to everybody I’ve talked to in the field, nobody has done this work, or anything like it. These scores are being used by judges to help them sentence defendants, and nobody knows whether they exacerbate existing racial bias. “I am not aware if we have any research on the comparison of judges who do and don’t have access to the scores,” Kris Hoy, the marketing director of Northpointe, told me.
I tried to reach Sharon Lansing, a researcher who worked on the validation study of COMPAS for New York State. I was told . . .
The key point:
All the researchers I talked to who study sentencing, risk assessment and these algorithms said they didn’t know of a single study that compared the sentencing patterns of judges who do and don’t use these scores. There are studies out there on a variety of risk-assessment tools that look at questions of accuracy and reliability. There are plenty of studies that compare the algorithms’ guesses about recidivism with who really did return to jail. But there’s nothing that compares judges with and without the scores. Which means that states are using these scores in a variety of contexts without any idea how they might affect decisions that impact people’s lives.
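For contrast, the accuracy studies that do exist look something like the sketch below, again on invented data: group defendants by score and check whether higher scores track higher observed recidivism. Note what this kind of analysis cannot answer: how a judge’s sentence changes once the score is on the bench, which is exactly the comparison nobody has run.

```python
from collections import defaultdict

# Invented (decile_score, reoffended) pairs, standing in for a real
# validation dataset that links scores to actual outcomes.
outcomes = [
    (2, False), (2, False), (3, True), (4, False),
    (5, True), (6, False), (7, True), (8, True),
    (8, False), (9, True), (9, True), (10, True),
]

by_score = defaultdict(list)
for score, reoffended in outcomes:
    by_score[score].append(reoffended)

# Calibration check: do higher scores correspond to higher observed
# recidivism rates?
for score in sorted(by_score):
    group = by_score[score]
    rate = sum(group) / len(group)
    print(f"score {score:2d}: {rate:4.0%} reoffended (n={len(group)})")
```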