CrowS-Pairs (Social biases in MLMs)
CrowS-Pairs: A Challenge Dataset for Measuring Social Biases in Masked LM
By [source]
About this dataset
The CrowS-Pairs dataset is a collection of 1,508 sentence pairs that cover nine types of biases: race/color, gender/gender identity, sexual orientation, religion, age, nationality, disability, physical appearance, and socioeconomic status. Each sentence pair is a minimal edit of the first sentence: The only words that change between them are those that identify the group. The first sentence can demonstrate or violate a stereotype. The other sentence is a minimal edit of the first sentence: The only words that change between them are those that identify the group. Each example has the following information:
Columns:,sent_more,sent_less,stereo_antistereo,bias_type,annotations,,anon_writer,,anon_annotators,,prompt,,source
The CrowS-Pairs dataset is a collection of 1,508 sentence pairs that cover nine types of biases: race/color, gender/gender identity, sexual orientation, religion, age
More Datasets
For more datasets, click here.
Featured Notebooks
- 🚨 Your notebook can be here! 🚨!
How to use the dataset
The CrowS-Pairs dataset is a collection of 1,508 sentence pairs that cover nine types of biases: race/color, gender/gender identity, sexual orientation, religion, age, nationality, disability, physical appearance, and socioeconomic status. Each sentence pair is a minimal edit of the first sentence: The only words that change between them are those that identify the group. The first sentence can demonstrate or violate a stereotype. The other sentence is a minimal edit of the first sentence: The only words that change between them are those that identify the group. Each example has the following information:
Columns:,sent_lesssent_more,,stereo_antistereo,,bias_type,,annotations,,anon_writer,,anon_annotators,,,,prompt,,source
This dataset can be used to measure social biases in MLMs by training models on it and evaluating their performance
Research Ideas
- Measuring the ability of MLMs to identify and avoid social biases;
- Developing new methods for reducing social biases in MLMs; and
- Investigating the impact of social biases on downstream tasks such as reading comprehension or question answering
Acknowledgements
If you use this dataset in your research, please credit the original authors.
Data Source
License
License: CC0 1.0 Universal (CC0 1.0) - Public Domain Dedication
No Copyright - You can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission. See Other Information.
Columns
File: crows_pairs_anonymized.csv
Column name |
Description |
sent_more |
The first sentence in the pair, which can demonstrate or violate a stereotype. (String) |
sent_less |
The second sentence in the pair, which is a minimal edit of the first sentence. The only words that change between them are those that identify the group. (String) |
stereo_antistereo |
Whether the first sentence demonstrates or violates a stereotype. (String) |
bias_type |
The type of bias represented in the sentence pair. (String) |
annotations |
The annotations made by the crowdworkers on the sentence pair. (String) |
anon_writer |
The anonymous writer of the sentence pair. (String) |
anon_annotators |
The anonymous annotators of the sentence pair. (String) |
File: prompts.csv
Column name |
Description |
sent_more |
The first sentence in the pair, which can demonstrate or violate a stereotype. (String) |
prompt |
The prompt for the sentence pair. (String) |
source |
The source of the sentence pair. (String) |
Acknowledgements
If you use this dataset in your research, please credit the original authors.
If you use this dataset in your research, please credit .