Name: CrowS-Pairs (Social Biases In MLMs)
Creator: Kaggle
License: https://creativecommons.org/publicdomain/zero/1.0/

CrowS-Pairs: A Challenge Dataset for Measuring Social Biases in Masked LM


About this dataset

The CrowS-Pairs dataset is a collection of 1,508 sentence pairs that cover nine types of biases: race/color, gender/gender identity, sexual orientation, religion, age, nationality, disability, physical appearance, and socioeconomic status. In each pair, the first sentence can demonstrate or violate a stereotype. The second sentence is a minimal edit of the first: the only words that change between them are those that identify the group. Each example has the following information:

Columns: sent_more, sent_less, stereo_antistereo, bias_type, annotations, anon_writer, anon_annotators, prompt, source
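
The "minimal edit" property can be checked with a word-level diff. The following is a minimal sketch using Python's standard difflib module; the function name changed_words and the example pair are hypothetical and made up for illustration (the pair is not taken from the dataset).

```python
import difflib


def changed_words(sent_a, sent_b):
    """Return (words_in_a, words_in_b) for each span where the sentences differ."""
    a, b = sent_a.split(), sent_b.split()
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    diffs = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            diffs.append((" ".join(a[i1:i2]), " ".join(b[j1:j2])))
    return diffs


# Made-up pair for illustration only; real pairs come from the
# sent_more and sent_less columns.
pair = ("The old engineer fixed the bridge.",
        "The young engineer fixed the bridge.")
print(changed_words(*pair))  # [('old', 'young')]
```

Splitting on whitespace is a simplification: punctuation stays attached to words, so a pair that differs only in a sentence-final word would report that word together with its trailing period.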


How to use the dataset


This dataset can be used to measure social biases in MLMs by comparing how a model scores the two sentences in each pair. It is intended as an evaluation benchmark rather than as training data.
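
One way to turn the pairs into a single number is to count how often a model scores sent_more above sent_less. The following is a minimal sketch under stated assumptions: preference_rate and toy_score are hypothetical names, and toy_score is only a placeholder for a real sentence-level score (for example, a pseudo-log-likelihood from a masked LM), which is not implemented here.

```python
def preference_rate(pairs, score):
    """Fraction of pairs for which `score` rates sent_more above sent_less."""
    pairs = list(pairs)
    if not pairs:
        raise ValueError("no pairs to evaluate")
    preferred = sum(
        1 for p in pairs if score(p["sent_more"]) > score(p["sent_less"])
    )
    return preferred / len(pairs)


# Placeholder scorer (assumption): shorter strings get higher scores.
# A real evaluation would replace this with a model-based score.
def toy_score(sentence):
    return -len(sentence)


# Dummy strings standing in for sentence pairs.
pairs = [
    {"sent_more": "aa", "sent_less": "aaaa"},
    {"sent_more": "bbbbb", "sent_less": "bb"},
]
print(preference_rate(pairs, toy_score))  # 0.5
```

Under this sketch, a rate near 0.5 would mean the scorer shows no systematic preference between the two sentences of a pair.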

Research Ideas

Measuring the ability of MLMs to identify and avoid social biases;

Developing new methods for reducing social biases in MLMs; and

Investigating the impact of social biases on downstream tasks such as reading comprehension or question answering

Acknowledgements

If you use this dataset in your research, please credit the original authors.


License

License: CC0 1.0 Universal (CC0 1.0) - Public Domain Dedication
No Copyright - You can copy, modify, distribute, and perform the work, even for commercial purposes, all without asking permission.

Columns

File: crows_pairs_anonymized.csv

Column name	Description
sent_more	The first sentence in the pair, which can demonstrate or violate a stereotype. (String)
sent_less	The second sentence in the pair, which is a minimal edit of the first sentence. The only words that change between them are those that identify the group. (String)
stereo_antistereo	Whether the first sentence demonstrates or violates a stereotype. (String)
bias_type	The type of bias represented in the sentence pair. (String)
annotations	The annotations made by the crowdworkers on the sentence pair. (String)
anon_writer	The anonymous writer of the sentence pair. (String)
anon_annotators	The anonymous annotators of the sentence pair. (String)
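
As a rough illustration of reading this file, the sketch below counts pairs per bias_type with Python's standard csv module. The function name count_by_bias_type is hypothetical, and the in-memory sample uses placeholder rows and placeholder label values rather than real dataset content.

```python
import csv
import io
from collections import Counter


def count_by_bias_type(csv_file):
    """Count rows per bias_type in an open CSV file with a header row."""
    reader = csv.DictReader(csv_file)
    return Counter(row["bias_type"] for row in reader)


# Tiny in-memory stand-in for the real file (placeholder values only).
sample = io.StringIO(
    "sent_more,sent_less,stereo_antistereo,bias_type\n"
    "placeholder A1,placeholder A2,stereo,age\n"
    "placeholder B1,placeholder B2,antistereo,age\n"
    "placeholder C1,placeholder C2,stereo,religion\n"
)
print(count_by_bias_type(sample))

# With the real file, assuming it sits in the working directory:
# with open("crows_pairs_anonymized.csv", newline="", encoding="utf-8") as f:
#     print(count_by_bias_type(f))
```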

File: prompts.csv

Column name	Description
sent_more	The first sentence in the pair, which can demonstrate or violate a stereotype. (String)
prompt	The prompt for the sentence pair. (String)
source	The source of the sentence pair. (String)
