Cosmos QA (Commonsense QA)
Pushing Commonsense Reasoning to the Next Level
Source
Huggingface Hub: link
About this dataset
Cosmos QA is a large-scale dataset of 35.6K problems that require commonsense-based reading comprehension, formulated as multiple-choice questions. It focuses on reading between the lines over a diverse collection of people's everyday narratives, asking questions about the likely causes or effects of events. Answering them requires reasoning beyond the exact text spans in the context.
Because the answers cannot simply be copied out of the context, the dataset can be used to build and evaluate models that go beyond span extraction, which could lead to better performance on real-world tasks.
How to use the dataset
To use the Cosmos QA dataset, first download the data files from the Kaggle website, then unzip them into a directory on your computer.
Once the data files are on your computer, you can begin using the dataset for commonsense-based reading comprehension tasks. The files (train.csv, validation.csv and test.csv) are in CSV format, so open them with a spreadsheet application or a CSV-reading library rather than a word processor or PDF viewer. Each row holds one problem: locate the row for the question you want to answer, which contains its context and four answer choices.
Once you have located the question, read through the context to determine what type of answer would be most appropriate. After carefully reading the context, look at each of the answer choices and select the one that best fits what you have read.
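The steps above can also be done programmatically. The following is a minimal sketch using only Python's standard library, not an official loader. It assumes that `label` holds the index (0 to 3) of the correct option, which the column descriptions do not state explicitly. The helper names `load_rows` and `correct_answer` are hypothetical, and the sample row is made up for illustration.

```python
import csv
import io

ANSWER_COLUMNS = ["answer0", "answer1", "answer2", "answer3"]


def load_rows(csv_file):
    """Read a Cosmos QA CSV from an open text file into a list of dicts."""
    return list(csv.DictReader(csv_file))


def correct_answer(row):
    """Return the answer text selected by `label`.

    Assumption: `label` is the index (0-3) of the correct option.
    """
    return row[ANSWER_COLUMNS[int(row["label"])]]


# A tiny in-memory stand-in for train.csv (made-up row, not real data).
sample = io.StringIO(
    "context,answer0,answer1,answer2,answer3,label\n"
    '"It rained all day, so we stayed in.",They went hiking.,'
    "They stayed home.,They flew out.,None of the above.,1\n"
)
rows = load_rows(sample)
print(correct_answer(rows[0]))
```

To read a downloaded file instead, open it with `open(path, newline="", encoding="utf-8")` and pass the file object to `load_rows`, where `path` is wherever you unzipped the data.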
Research Ideas
- This dataset can be used to develop and evaluate commonsense-based reading comprehension models.
- This dataset can be used to improve and customize question answering systems for educational or customer service applications.
- This dataset can be used to study how people process and understand narratives, in order to better design artificial intelligence systems that can do the same.
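For the evaluation idea in the list above, a common way to score a multiple-choice model is plain accuracy over the predicted option indices. The sketch below is one possible scoring helper, not a prescribed metric for this dataset. It again assumes that `label` is an option index, and `always_first_option` is a hypothetical baseline included only to show the call pattern on made-up labels.

```python
def accuracy(predictions, labels):
    """Fraction of predicted option indices that match the gold labels."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    if not labels:
        return 0.0
    correct = sum(int(p) == int(g) for p, g in zip(predictions, labels))
    return correct / len(labels)


def always_first_option(rows):
    """Hypothetical baseline: always predict option 0."""
    return [0 for _ in rows]


# Made-up labels as they would appear after reading a CSV (strings).
rows = [{"label": "0"}, {"label": "2"}, {"label": "0"}, {"label": "3"}]
gold = [row["label"] for row in rows]
print(accuracy(always_first_option(rows), gold))  # 0.5
```

With four options per question, a baseline like this gives a useful floor against which to compare a trained model.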
Acknowledgements
License
> License: CC0 1.0 Universal (CC0 1.0) - Public Domain Dedication
> No Copyright - You can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission. See Other Information.
Columns
File: validation.csv
| Column name | Description |
|---|---|
| context | The context of the question. (String) |
| answer0 | The first answer option. (String) |
| answer1 | The second answer option. (String) |
| answer2 | The third answer option. (String) |
| answer3 | The fourth answer option. (String) |
| label | The correct answer to the question. (String) |
File: train.csv
| Column name | Description |
|---|---|
| context | The context of the question. (String) |
| answer0 | The first answer option. (String) |
| answer1 | The second answer option. (String) |
| answer2 | The third answer option. (String) |
| answer3 | The fourth answer option. (String) |
| label | The correct answer to the question. (String) |
File: test.csv
| Column name | Description |
|---|---|
| context | The context of the question. (String) |
| answer0 | The first answer option. (String) |
| answer1 | The second answer option. (String) |
| answer2 | The third answer option. (String) |
| answer3 | The fourth answer option. (String) |
| label | The correct answer to the question. (String) |
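Since all three files are described with the same columns, a quick header check can catch a wrong or partially downloaded file before any further processing. This is a small sketch under that assumption. The `missing_columns` helper is hypothetical, and it tolerates extra columns because the actual files may contain fields not listed here.

```python
import csv
import io

EXPECTED_COLUMNS = ["context", "answer0", "answer1", "answer2", "answer3", "label"]


def missing_columns(csv_file):
    """Return the expected column names absent from the file's header row."""
    header = next(csv.reader(csv_file), [])
    return [name for name in EXPECTED_COLUMNS if name not in header]


# In-memory headers standing in for real files.
good = io.StringIO("context,answer0,answer1,answer2,answer3,label\n")
bad = io.StringIO("context,answer0,answer1,label\n")
print(missing_columns(good))  # []
print(missing_columns(bad))   # ['answer2', 'answer3']
```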