Baselight

COMPAS Recidivism Racial Bias

Racial Bias in inmate COMPAS reoffense risk scores for Florida (ProPublica)

@kaggle.danofer_compass

About this Dataset

Context

COMPAS (Correctional Offender Management Profiling for Alternative Sanctions) is a popular commercial algorithm used by judges and parole officers to score a criminal defendant's likelihood of reoffending (recidivism). A two-year follow-up study (i.e., of who actually committed crimes or violent crimes within two years) has shown that the algorithm is biased in favor of white defendants and against black defendants. The pattern of mistakes, as measured by precision/sensitivity, is notable.

Quoting from ProPublica:

  • "Black defendants were often predicted to be at a higher risk of recidivism than they actually were. Our analysis found that black defendants who did not recidivate over a two-year period were nearly twice as likely to be misclassified as higher risk compared to their white counterparts (45 percent vs. 23 percent)."
  • "White defendants were often predicted to be less risky than they were. Our analysis found that white defendants who re-offended within the next two years were mistakenly labeled low risk almost twice as often as black re-offenders (48 percent vs. 28 percent)."
  • "The analysis also showed that even when controlling for prior crimes, future recidivism, age, and gender, black defendants were 45 percent more likely to be assigned higher risk scores than white defendants."
  • "Black defendants were also twice as likely as white defendants to be misclassified as being a higher risk of violent recidivism. And white violent recidivists were 63 percent more likely to have been misclassified as a low risk of violent recidivism, compared with black violent recidivists."
  • "The violent recidivism analysis also showed that even when controlling for prior crimes, future recidivism, age, and gender, black defendants were 77 percent more likely to be assigned higher risk scores than white defendants."

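The misclassification figures quoted above are group-wise false positive and false negative rates. The following is a minimal sketch of how such rates could be computed, not ProPublica's actual code. It assumes each row is a dict with the `race`, `decile_score`, and `is_recid` columns listed in the tables on this page, and it assumes that a decile score of 5 or more counts as "higher risk"; both the threshold and the choice of outcome column are assumptions to verify against the methodology article. The rows are toy data with hypothetical group labels.

```python
from collections import defaultdict

def error_rates_by_group(rows, threshold=5):
    """Return false positive and false negative rates per group.

    Assumption: decile_score >= threshold means "predicted higher risk"
    and is_recid == 1 means the person actually reoffended.
    """
    counts = defaultdict(lambda: {"tp": 0, "fp": 0, "tn": 0, "fn": 0})
    for row in rows:
        predicted_high = row["decile_score"] >= threshold
        reoffended = row["is_recid"] == 1
        if predicted_high and reoffended:
            key = "tp"
        elif predicted_high:
            key = "fp"  # labeled higher risk, did not reoffend
        elif reoffended:
            key = "fn"  # labeled low risk, did reoffend
        else:
            key = "tn"
        counts[row["race"]][key] += 1

    rates = {}
    for group, c in counts.items():
        negatives = c["fp"] + c["tn"]  # people who did not reoffend
        positives = c["fn"] + c["tp"]  # people who did reoffend
        rates[group] = {
            "fpr": c["fp"] / negatives if negatives else None,
            "fnr": c["fn"] / positives if positives else None,
        }
    return rates

# Toy rows with hypothetical groups "A" and "B"; not real data.
toy_rows = [
    {"race": "A", "decile_score": 8, "is_recid": 0},
    {"race": "A", "decile_score": 2, "is_recid": 0},
    {"race": "A", "decile_score": 7, "is_recid": 1},
    {"race": "B", "decile_score": 3, "is_recid": 1},
    {"race": "B", "decile_score": 9, "is_recid": 1},
    {"race": "B", "decile_score": 1, "is_recid": 0},
]
rates = error_rates_by_group(toy_rows)
```

Comparing `fpr` across groups corresponds to the "misclassified as higher risk" comparison, and comparing `fnr` corresponds to the "mistakenly labeled low risk" comparison.
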
Content

The data contains variables used by the COMPAS algorithm in scoring defendants, along with their outcomes within 2 years of the decision, for over 10,000 criminal defendants in Broward County, Florida.
Three subsets of the data are provided, including a subset covering only violent recidivism (as opposed to, e.g., being reincarcerated for non-violent offenses such as vagrancy or marijuana-related charges).

An in-depth analysis by ProPublica can be found in their data methodology article.

Acknowledgements

Data & original analysis gathered by ProPublica.
Original Data methodology article:
https://www.propublica.org/article/how-we-analyzed-the-compas-recidivism-algorithm

Original Article:
https://www.propublica.org/article/machine-bias-risk-assessments-in-criminal-sentencing

Original data from ProPublica:
https://github.com/propublica/compas-analysis

Additional "simple" subset provided by FairML, based on the ProPublica data:

http://blog.fastforwardlabs.com/2017/03/09/fairml-auditing-black-box-predictive-models.html

Inspiration

Ideas:

  • Feature importance when predicting the COMPAS score itself, or recidivism/crime risks.
  • Reweighting data to compensate for bias, e.g. subsetting for the violent offenders, or adjusting better for base risk.
  • Feature selection based on "legal usage"/fairness (e.g. exclude race and see how well your model works. It worked for me).
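
One way to read the reweighting idea above is the "reweighing" preprocessing scheme described by Kamiran and Calders, which is a swapped-in technique and not something this dataset page prescribes. Each (group, label) combination gets the weight P(group) · P(label) / P(group, label), so that group and label become statistically independent in the weighted data. A minimal sketch on toy data with hypothetical group labels:

```python
from collections import Counter

def reweighing_weights(groups, labels):
    """Weight for each (group, label) pair: P(g) * P(y) / P(g, y).

    After weighting, every group has the same weighted positive rate
    as the dataset overall.
    """
    n = len(labels)
    group_counts = Counter(groups)
    label_counts = Counter(labels)
    joint_counts = Counter(zip(groups, labels))
    return {
        (g, y): (group_counts[g] * label_counts[y]) / (n * joint_counts[(g, y)])
        for (g, y) in joint_counts
    }

# Toy data: hypothetical groups "A" and "B", label 1 = reoffended.
groups = ["A", "A", "A", "B", "B", "B", "B", "B"]
labels = [1, 1, 0, 1, 0, 0, 0, 0]
weights = reweighing_weights(groups, labels)

# Per-row sample weights, usable with any learner that accepts them.
sample_weights = [weights[(g, y)] for g, y in zip(groups, labels)]
```

In this toy example the over-represented pair ("A", 1) is down-weighted and the under-represented pairs ("A", 0) and ("B", 1) are up-weighted, while the total weight still equals the number of rows.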

Tables

Compas Scores Raw

@kaggle.danofer_compass.compas_scores_raw
  • 1.31 MB
  • 60843 rows
  • 28 columns

CREATE TABLE compas_scores_raw (
  "person_id" BIGINT,
  "assessmentid" BIGINT,
  "case_id" BIGINT,
  "agency_text" VARCHAR,
  "lastname" VARCHAR,
  "firstname" VARCHAR,
  "middlename" VARCHAR,
  "sex_code_text" VARCHAR,
  "ethnic_code_text" VARCHAR,
  "dateofbirth" TIMESTAMP,
  "scaleset_id" BIGINT,
  "scaleset" VARCHAR,
  "assessmentreason" VARCHAR,
  "language" VARCHAR,
  "legalstatus" VARCHAR,
  "custodystatus" VARCHAR,
  "maritalstatus" VARCHAR,
  "screening_date" VARCHAR,
  "recsupervisionlevel" BIGINT,
  "recsupervisionleveltext" VARCHAR,
  "scale_id" BIGINT,
  "displaytext" VARCHAR,
  "rawscore" DOUBLE,
  "decilescore" BIGINT,
  "scoretext" VARCHAR,
  "assessmenttype" VARCHAR,
  "iscompleted" BIGINT,
  "isdeleted" BIGINT
);
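
The `scale_id` and `displaytext` columns suggest that this table is stored in long form, with one row per assessment scale rather than one row per person. That layout is an assumption inferred from the schema, not something stated on this page. Under that assumption, a minimal sketch for pivoting the decile scores into one record per person and screening date could look like this. The scale names and dates in the toy rows are illustrative placeholders.

```python
from collections import defaultdict

def pivot_scores(rows):
    """Collect decile scores per (person_id, screening_date), keyed by scale name.

    Rows flagged as deleted or not completed are skipped.
    """
    wide = defaultdict(dict)
    for row in rows:
        if row["isdeleted"] or not row["iscompleted"]:
            continue
        key = (row["person_id"], row["screening_date"])
        wide[key][row["displaytext"]] = row["decilescore"]
    return dict(wide)

# Toy rows; scale names and values are illustrative, not taken from the data.
toy_rows = [
    {"person_id": 1, "screening_date": "1/1/13", "displaytext": "Risk of Violence",
     "decilescore": 3, "iscompleted": 1, "isdeleted": 0},
    {"person_id": 1, "screening_date": "1/1/13", "displaytext": "Risk of Recidivism",
     "decilescore": 4, "iscompleted": 1, "isdeleted": 0},
    {"person_id": 2, "screening_date": "1/2/13", "displaytext": "Risk of Recidivism",
     "decilescore": 9, "iscompleted": 1, "isdeleted": 1},  # dropped: deleted
]
wide = pivot_scores(toy_rows)
```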

Cox Violent Parsed

@kaggle.danofer_compass.cox_violent_parsed
  • 1.32 MB
  • 18316 rows
  • 52 columns

CREATE TABLE cox_violent_parsed (
  "id" DOUBLE,
  "name" VARCHAR,
  "first" VARCHAR,
  "last" VARCHAR,
  "compas_screening_date" TIMESTAMP,
  "sex" VARCHAR,
  "dob" TIMESTAMP,
  "age" BIGINT,
  "age_cat" VARCHAR,
  "race" VARCHAR,
  "juv_fel_count" BIGINT,
  "decile_score" BIGINT,
  "juv_misd_count" BIGINT,
  "juv_other_count" BIGINT,
  "priors_count" BIGINT,
  "days_b_screening_arrest" DOUBLE,
  "c_jail_in" VARCHAR,
  "c_jail_out" VARCHAR,
  "c_case_number" VARCHAR,
  "c_offense_date" TIMESTAMP,
  "c_arrest_date" TIMESTAMP,
  "c_days_from_compas" DOUBLE,
  "c_charge_degree" VARCHAR,
  "c_charge_desc" VARCHAR,
  "is_recid" BIGINT,
  "r_case_number" VARCHAR,
  "r_charge_degree" VARCHAR,
  "r_days_from_arrest" DOUBLE,
  "r_offense_date" TIMESTAMP,
  "r_charge_desc" VARCHAR,
  "r_jail_in" TIMESTAMP,
  "r_jail_out" TIMESTAMP,
  "violent_recid" VARCHAR,
  "is_violent_recid" BIGINT,
  "vr_case_number" VARCHAR,
  "vr_charge_degree" VARCHAR,
  "vr_offense_date" TIMESTAMP,
  "vr_charge_desc" VARCHAR,
  "type_of_assessment" VARCHAR,
  "decile_score_1" BIGINT,
  "score_text" VARCHAR,
  "screening_date" TIMESTAMP,
  "v_type_of_assessment" VARCHAR,
  "v_decile_score" BIGINT,
  "v_score_text" VARCHAR,
  "v_screening_date" TIMESTAMP,
  "in_custody" TIMESTAMP,
  "out_custody" TIMESTAMP,
  "priors_count_1" BIGINT,
  "start" BIGINT,
  "end" BIGINT,
  "event" BIGINT
);
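
Before analyzing rows like these, it is common to drop records with questionable data quality. The sketch below applies row filters of the kind ProPublica describes in its methodology: the screening must fall within 30 days of the arrest, `is_recid` must not be -1, the charge degree must not be "O", and `score_text` must not be "N/A". The exact thresholds and sentinel values are written from memory of that analysis and should be treated as assumptions to verify against the linked repository. Whether the same filters suit this particular table is also an assumption.

```python
def keep_row(row):
    """Return True if the row passes the assumed data-quality filters."""
    days = row.get("days_b_screening_arrest")
    return (
        days is not None
        and -30 <= days <= 30                   # screening close to the arrest
        and row.get("is_recid") != -1           # assumed sentinel for "no case found"
        and row.get("c_charge_degree") != "O"   # assumed code for ordinary traffic offenses
        and row.get("score_text") != "N/A"      # score was actually assigned
    )

# Toy rows with hypothetical values.
toy_rows = [
    {"id": 1, "days_b_screening_arrest": -1.0, "is_recid": 0,
     "c_charge_degree": "F", "score_text": "Low"},
    {"id": 2, "days_b_screening_arrest": 45.0, "is_recid": 1,
     "c_charge_degree": "M", "score_text": "High"},    # too far from arrest
    {"id": 3, "days_b_screening_arrest": None, "is_recid": 0,
     "c_charge_degree": "F", "score_text": "Medium"},  # missing day count
    {"id": 4, "days_b_screening_arrest": 0.0, "is_recid": -1,
     "c_charge_degree": "F", "score_text": "Low"},     # sentinel outcome
]
kept_ids = [row["id"] for row in toy_rows if keep_row(row)]
```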

Cox Violent Parsed Filt

@kaggle.danofer_compass.cox_violent_parsed_filt
  • 951.46 KB
  • 18316 rows
  • 40 columns

CREATE TABLE cox_violent_parsed_filt (
  "id" DOUBLE,
  "name" VARCHAR,
  "first" VARCHAR,
  "last" VARCHAR,
  "sex" VARCHAR,
  "dob" TIMESTAMP,
  "age" BIGINT,
  "age_cat" VARCHAR,
  "race" VARCHAR,
  "juv_fel_count" BIGINT,
  "decile_score" BIGINT,
  "juv_misd_count" BIGINT,
  "juv_other_count" BIGINT,
  "priors_count" BIGINT,
  "days_b_screening_arrest" DOUBLE,
  "c_jail_in" VARCHAR,
  "c_jail_out" VARCHAR,
  "c_days_from_compas" DOUBLE,
  "c_charge_degree" VARCHAR,
  "c_charge_desc" VARCHAR,
  "is_recid" BIGINT,
  "r_charge_degree" VARCHAR,
  "r_days_from_arrest" DOUBLE,
  "r_offense_date" TIMESTAMP,
  "r_charge_desc" VARCHAR,
  "r_jail_in" TIMESTAMP,
  "violent_recid" VARCHAR,
  "is_violent_recid" BIGINT,
  "vr_charge_degree" VARCHAR,
  "vr_offense_date" TIMESTAMP,
  "vr_charge_desc" VARCHAR,
  "type_of_assessment" VARCHAR,
  "decile_score_1" BIGINT,
  "score_text" VARCHAR,
  "screening_date" TIMESTAMP,
  "v_type_of_assessment" VARCHAR,
  "v_decile_score" BIGINT,
  "v_score_text" VARCHAR,
  "priors_count_1" BIGINT,
  "event" BIGINT
);
