Baselight

CrowS-Pairs (Social Biases In MLMs)

CrowS-Pairs: A Challenge Dataset for Measuring Social Biases in Masked LM

@kaggle.thedevastator_a_dataset_for_measuring_social_biases_in_mlms


About this dataset

The CrowS-Pairs dataset is a collection of 1,508 sentence pairs that cover nine types of bias: race/color, gender/gender identity, sexual orientation, religion, age, nationality, disability, physical appearance, and socioeconomic status. The first sentence in each pair can demonstrate or violate a stereotype. The second sentence is a minimal edit of the first: the only words that change between them are those that identify the group. Each example has the following information:

Columns: sent_more, sent_less, stereo_antistereo, bias_type, annotations, anon_writer, anon_annotators, prompt, source

How to use the dataset

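This dataset can be used to measure social biases in MLMs by comparing how a model scores the two sentences in each pair: a model that consistently rates the more stereotyping sentence as more likely than its minimally edited counterpart is exhibiting the bias that the pair targets. It is an evaluation benchmark, so models are scored on it rather than trained on it.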
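One way to turn paired sentences into a single number is to count how often a model scores sent_more above sent_less. The following is a minimal sketch of that idea, not the original authors' scoring code: `stereotype_preference_rate` and `toy_score` are hypothetical names, the toy scorer merely stands in for a real (pseudo-)log-likelihood from a masked language model, and the example pairs are placeholders rather than rows from the dataset. The sketch also ignores the stereo_antistereo column.

```python
from typing import Callable, Iterable, Tuple


def stereotype_preference_rate(
    pairs: Iterable[Tuple[str, str]],
    score: Callable[[str], float],
) -> float:
    """Return the percentage of pairs where `score` rates sent_more above sent_less.

    `pairs` yields (sent_more, sent_less) tuples; `score` is any function that
    returns a higher number for sentences the model finds more likely.
    """
    total = 0
    preferred_more = 0
    for sent_more, sent_less in pairs:
        total += 1
        if score(sent_more) > score(sent_less):
            preferred_more += 1
    if total == 0:
        raise ValueError("no sentence pairs were provided")
    return 100.0 * preferred_more / total


# Hypothetical stand-in scorer: a real evaluation would replace this with a
# (pseudo-)log-likelihood computed by the masked language model under test.
def toy_score(sentence: str) -> float:
    return -float(len(sentence))


# Placeholder pairs, not rows from the dataset.
toy_pairs = [
    ("short text", "a longer piece of text"),
    ("a much longer piece of text", "brief"),
]
rate = stereotype_preference_rate(toy_pairs, toy_score)
print(f"{rate:.1f}% of pairs scored higher for sent_more")
```

Under this reading, a rate near 50% would mean the scorer shows no systematic preference between the two sentences, while a rate well above 50% would mean it tends to favor sent_more.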

Research Ideas

  • Measuring the ability of MLMs to identify and avoid social biases;
  • Developing new methods for reducing social biases in MLMs; and
  • Investigating the impact of social biases on downstream tasks such as reading comprehension or question answering

Acknowledgements

If you use this dataset in your research, please credit the original authors.

License

License: CC0 1.0 Universal (CC0 1.0) - Public Domain Dedication
No Copyright: you can copy, modify, distribute, and perform the work, even for commercial purposes, all without asking permission.

Columns

File: crows_pairs_anonymized.csv

Column name          Description
sent_more            The first sentence in the pair, which can demonstrate or violate a stereotype. (String)
sent_less            The second sentence in the pair, which is a minimal edit of the first sentence. The only words that change between them are those that identify the group. (String)
stereo_antistereo    Whether the first sentence demonstrates or violates a stereotype. (String)
bias_type            The type of bias represented in the sentence pair. (String)
annotations          The annotations made by the crowdworkers on the sentence pair. (String)
anon_writer          The anonymous writer of the sentence pair. (String)
anon_annotators      The anonymous annotators of the sentence pair. (String)
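As a quick orientation to these columns, the sketch below reads rows with Python's standard csv module and tallies them by bias_type and stereo_antistereo. It runs on a small in-memory sample so that it is self-contained: the rows are made-up placeholders, only a subset of the columns is included, and the value strings (`stereo`, `antistereo`, `age`, `religion`) are assumptions about how the file encodes those fields. `count_by_column` is a hypothetical helper name.

```python
import csv
import io
from collections import Counter

# Made-up placeholder rows using a subset of the columns described above.
# A real run would instead open crows_pairs_anonymized.csv from disk, e.g.
# with open("crows_pairs_anonymized.csv", newline="", encoding="utf-8").
sample_csv = io.StringIO(
    "sent_more,sent_less,stereo_antistereo,bias_type\n"
    "placeholder sentence A1,placeholder sentence A2,stereo,age\n"
    "placeholder sentence B1,placeholder sentence B2,antistereo,religion\n"
    "placeholder sentence C1,placeholder sentence C2,stereo,age\n"
)


def count_by_column(rows, column):
    """Count how many rows carry each distinct value of `column`."""
    return Counter(row[column] for row in rows)


rows = list(csv.DictReader(sample_csv))
bias_counts = count_by_column(rows, "bias_type")
direction_counts = count_by_column(rows, "stereo_antistereo")

# Print the bias categories from most to least frequent.
for bias_type, n in bias_counts.most_common():
    print(bias_type, n)
```

Counting per category first is useful because any aggregate bias score can then be broken down by bias_type instead of being reported only as one overall figure.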

File: prompts.csv

Column name    Description
sent_more      The first sentence in the pair, which can demonstrate or violate a stereotype. (String)
prompt         The prompt for the sentence pair. (String)
source         The source of the sentence pair. (String)

Tables

Crows Pairs Anonymized

@kaggle.thedevastator_a_dataset_for_measuring_social_biases_in_mlms.crows_pairs_anonymized
  • 193.87 KB
  • 1508 rows
  • 8 columns

CREATE TABLE crows_pairs_anonymized (
  "unnamed_0" BIGINT,
  "sent_more" VARCHAR,
  "sent_less" VARCHAR,
  "stereo_antistereo" VARCHAR,
  "bias_type" VARCHAR,
  "annotations" VARCHAR,
  "anon_writer" VARCHAR,
  "anon_annotators" VARCHAR
);

Prompts

@kaggle.thedevastator_a_dataset_for_measuring_social_biases_in_mlms.prompts
  • 123.57 KB
  • 1508 rows
  • 3 columns

CREATE TABLE prompts (
  "unnamed_0" BIGINT,
  "prompt" VARCHAR,
  "source" VARCHAR
);
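The two table definitions above can be tried out locally with Python's built-in sqlite3 module. The sketch below creates both tables in an in-memory database, inserts a few made-up placeholder rows, and joins them to count pairs per bias type and source. It assumes that `unnamed_0` is a shared row index linking a sentence pair to its prompt; the page does not state this, so treat the join key as an assumption to verify against the real files.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Schemas copied from the table definitions above.
conn.executescript("""
CREATE TABLE crows_pairs_anonymized (
  "unnamed_0" BIGINT,
  "sent_more" VARCHAR,
  "sent_less" VARCHAR,
  "stereo_antistereo" VARCHAR,
  "bias_type" VARCHAR,
  "annotations" VARCHAR,
  "anon_writer" VARCHAR,
  "anon_annotators" VARCHAR
);
CREATE TABLE prompts (
  "unnamed_0" BIGINT,
  "prompt" VARCHAR,
  "source" VARCHAR
);
""")

# Made-up placeholder rows, not actual dataset contents.
conn.executemany(
    "INSERT INTO crows_pairs_anonymized VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
    [
        (0, "placeholder A1", "placeholder A2", "stereo", "age", "[]", "w0", "[]"),
        (1, "placeholder B1", "placeholder B2", "antistereo", "religion", "[]", "w1", "[]"),
        (2, "placeholder C1", "placeholder C2", "stereo", "age", "[]", "w0", "[]"),
    ],
)
conn.executemany(
    "INSERT INTO prompts VALUES (?, ?, ?)",
    [
        (0, "placeholder prompt 0", "source_x"),
        (1, "placeholder prompt 1", "source_y"),
        (2, "placeholder prompt 2", "source_x"),
    ],
)

# Assumption: unnamed_0 identifies the same sentence pair in both tables.
results = conn.execute("""
    SELECT c.bias_type, p.source, COUNT(*) AS n
    FROM crows_pairs_anonymized AS c
    JOIN prompts AS p ON p.unnamed_0 = c.unnamed_0
    GROUP BY c.bias_type, p.source
    ORDER BY c.bias_type, p.source
""").fetchall()

for bias_type, source, n in results:
    print(bias_type, source, n)

conn.close()
```

If the join key assumption holds, the same query shape should work against the hosted tables once the table names are qualified appropriately.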
