COMPAS Recidivism Racial Bias

@kaggle.danofer_compass

Loading...
Loading...

Racial Bias in inmate COMPAS reoffense risk scores for Florida (ProPublica)

Dataset Description

Context

COMPAS (Correctional Offender Management Profiling for Alternative Sanctions) is a popular commercial algorithm used by judges and parole officers to score a criminal defendant's likelihood of reoffending (recidivism). ProPublica's analysis, based on a 2-year follow-up study (i.e., who actually committed crimes or violent crimes within 2 years), found that the algorithm is biased in favor of white defendants and against black defendants. The pattern of mistakes, as measured by precision/sensitivity, is notable.

Quoting from ProPublica:

  • "Black defendants were often predicted to be at a higher risk of recidivism than they actually were. Our analysis found that black defendants who did not recidivate over a two-year period were nearly twice as likely to be misclassified as higher risk compared to their white counterparts (45 percent vs. 23 percent).
  • White defendants were often predicted to be less risky than they were. Our analysis found that white defendants who re-offended within the next two years were mistakenly labeled low risk almost twice as often as black re-offenders (48 percent vs. 28 percent).
  • The analysis also showed that even when controlling for prior crimes, future recidivism, age, and gender, black defendants were 45 percent more likely to be assigned higher risk scores than white defendants.
  • Black defendants were also twice as likely as white defendants to be misclassified as being a higher risk of violent recidivism. And white violent recidivists were 63 percent more likely to have been misclassified as a low risk of violent recidivism, compared with black violent recidivists.
  • The violent recidivism analysis also showed that even when controlling for prior crimes, future recidivism, age, and gender, black defendants were 77 percent more likely to be assigned higher risk scores than white defendants."

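The first two percentages quoted above are per-group error rates: the share of non-reoffenders flagged as higher risk (false positive rate) and the share of reoffenders labeled low risk (false negative rate). A minimal sketch of that calculation follows. The key names `group`, `reoffended`, and `flagged_high_risk` are hypothetical, the rows are made-up toy values rather than real COMPAS records, and the column names in the actual CSV files may differ.

```python
from collections import defaultdict


def error_rates_by_group(rows, group_key="group", actual_key="reoffended",
                         predicted_key="flagged_high_risk"):
    """Return {group: {"fpr": ..., "fnr": ...}} for binary predictions."""
    counts = defaultdict(lambda: {"tp": 0, "fp": 0, "tn": 0, "fn": 0})
    for row in rows:
        actual = bool(row[actual_key])
        predicted = bool(row[predicted_key])
        if predicted:
            cell = "tp" if actual else "fp"
        else:
            cell = "fn" if actual else "tn"
        counts[row[group_key]][cell] += 1

    rates = {}
    for group, c in counts.items():
        negatives = c["fp"] + c["tn"]  # people who did not reoffend
        positives = c["tp"] + c["fn"]  # people who did reoffend
        rates[group] = {
            # Share of non-reoffenders who were flagged as higher risk.
            "fpr": c["fp"] / negatives if negatives else None,
            # Share of reoffenders who were labeled low risk.
            "fnr": c["fn"] / positives if positives else None,
        }
    return rates


# Made-up toy rows for illustration only, NOT real COMPAS records.
toy_rows = [
    {"group": "A", "reoffended": 0, "flagged_high_risk": 1},
    {"group": "A", "reoffended": 0, "flagged_high_risk": 0},
    {"group": "A", "reoffended": 1, "flagged_high_risk": 1},
    {"group": "A", "reoffended": 1, "flagged_high_risk": 0},
    {"group": "B", "reoffended": 0, "flagged_high_risk": 1},
    {"group": "B", "reoffended": 0, "flagged_high_risk": 0},
    {"group": "B", "reoffended": 0, "flagged_high_risk": 0},
    {"group": "B", "reoffended": 0, "flagged_high_risk": 0},
    {"group": "B", "reoffended": 1, "flagged_high_risk": 1},
    {"group": "B", "reoffended": 1, "flagged_high_risk": 1},
]

rates = error_rates_by_group(toy_rows)
for group, group_rates in sorted(rates.items()):
    print(group, group_rates)
```

Comparing `fpr` and `fnr` across groups, rather than overall accuracy alone, is what exposes the kind of asymmetry the quote describes.
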
Content

The data contains the variables used by the COMPAS algorithm in scoring defendants, along with their outcomes within 2 years of the decision, for over 10,000 criminal defendants in Broward County, Florida.
Three subsets of the data are provided, including a subset covering only violent recidivism (as opposed to, e.g., being reincarcerated for non-violent offenses such as vagrancy or marijuana).

An in-depth analysis by ProPublica can be found in their data methodology article.

Acknowledgements

Data and original analysis by ProPublica.
Original data methodology article:
https://www.propublica.org/article/how-we-analyzed-the-compas-recidivism-algorithm

Original article:
https://www.propublica.org/article/machine-bias-risk-assessments-in-criminal-sentencing

Original data from ProPublica:
https://github.com/propublica/compas-analysis

Additional "simple" subset provided by FairML, based on the ProPublica data:

http://blog.fastforwardlabs.com/2017/03/09/fairml-auditing-black-box-predictive-models.html

Inspiration

Ideas:

  • Feature importance when predicting the COMPAS score itself, or recidivism/crime risks.
  • Reweighting data to compensate for bias, e.g. subsetting for the violent offenders, or adjusting better for base risk.
  • Feature selection based on "legal usage"/fairness (e.g., exclude race and see how well your model works; it worked for me).
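
As one possible way to approach the reweighting idea, the sketch below uses a common scheme often called reweighing (Kamiran and Calders): each record gets the weight P(group) × P(label) / P(group, label), so that group and label look statistically independent in the weighted data. This is an illustrative choice, not the method used by ProPublica or the dataset author, and the group names and labels are made-up toy values.

```python
from collections import Counter


def reweighing_weights(groups, labels):
    """Per-record weights that make group and label independent when applied."""
    n = len(groups)
    group_counts = Counter(groups)
    label_counts = Counter(labels)
    joint_counts = Counter(zip(groups, labels))
    # weight = expected count under independence / observed count
    return [
        (group_counts[g] * label_counts[y]) / (n * joint_counts[(g, y)])
        for g, y in zip(groups, labels)
    ]


def weighted_positive_rate(groups, labels, weights, group):
    """Weighted share of positive labels (label == 1) within one group."""
    total = sum(w for g, w in zip(groups, weights) if g == group)
    positive = sum(w for g, y, w in zip(groups, labels, weights)
                   if g == group and y == 1)
    return positive / total


# Made-up toy values for illustration only, NOT real COMPAS records.
toy_groups = ["a", "a", "a", "a", "b", "b"]
toy_labels = [1, 1, 1, 0, 1, 0]

weights = reweighing_weights(toy_groups, toy_labels)
rate_a = weighted_positive_rate(toy_groups, toy_labels, weights, "a")
rate_b = weighted_positive_rate(toy_groups, toy_labels, weights, "b")
print(weights)
print(rate_a, rate_b)
```

Before weighting, group "a" has a positive rate of 3/4 and group "b" of 1/2; after weighting, both match the overall rate. Weights like these can typically be passed to a model through a per-sample weight argument during training.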
