SQuAD2.0
Questions and answers, plus adversarial unanswerable questions that look similar to answerable ones
Source
Hugging Face Hub: link
About this dataset
SQuAD2.0 combines the 100,000 questions in SQuAD1.1 with over 50,000 unanswerable questions written adversarially by crowdworkers to look similar to answerable ones. To do well on SQuAD2.0, systems must not only answer questions when possible but also determine when the paragraph supports no answer and abstain from answering.
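A common way for extractive models to abstain, used in BERT-style SQuAD2.0 baselines, is to compare the score of the best answer span with the score of a "no answer" option and abstain when the gap falls below a threshold. The sketch below assumes the model emits per-token start and end logits with index 0 reserved for the no-answer position (such as BERT's [CLS] token); the function name `predict_span` and the `threshold` and `max_answer_len` parameters are illustrative assumptions, not part of the dataset.

```python
from typing import List, Optional, Tuple


def predict_span(
    start_logits: List[float],
    end_logits: List[float],
    threshold: float = 0.0,
    max_answer_len: int = 30,
) -> Optional[Tuple[int, int]]:
    """Return the best (start, end) token span, or None to abstain.

    ASSUMPTION: index 0 is the no-answer position, so its score is
    start_logits[0] + end_logits[0].
    """
    null_score = start_logits[0] + end_logits[0]

    best_score = float("-inf")
    best_span = None
    # Search every span of at most max_answer_len tokens that starts after
    # the no-answer position and does not end before it starts.
    for i in range(1, len(start_logits)):
        for j in range(i, min(i + max_answer_len, len(end_logits))):
            score = start_logits[i] + end_logits[j]
            if score > best_score:
                best_score = score
                best_span = (i, j)

    # Abstain unless the best span beats the no-answer score by the threshold.
    if best_span is None or best_score - null_score <= threshold:
        return None
    return best_span


# Span (2, 3) scores 3.0 + 2.5 = 5.5, well above the null score of 1.0.
print(predict_span([0.5, 0.1, 3.0, 0.2], [0.5, 0.3, 0.1, 2.5]))  # → (2, 3)
```

Raising `threshold` makes the model abstain more often; it is typically tuned on held-out data rather than fixed in advance.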
Research Ideas
- SQuAD2.0 can be used to train an extractive question answering model that answers each question by selecting a span of text from its context paragraph.
- SQuAD2.0 can be used to train a question generation model that writes questions from a given context paragraph.
- SQuAD2.0 can be used to improve the accuracy of existing question answering systems, in particular their ability to recognize unanswerable questions and abstain.
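Accuracy on SQuAD2.0 is usually reported as exact match (EM) and token-level F1, where an unanswerable question counts as correct only when the system predicts an empty string. The sketch below re-implements the normalization used by the official SQuAD evaluation script (lowercasing, stripping punctuation, the articles "a", "an", "the", and extra whitespace) for illustration; the function names are hypothetical, and the official script remains the reference.

```python
import re
import string
from collections import Counter
from typing import List, Tuple

PUNCTUATION = set(string.punctuation)


def normalize_answer(s: str) -> str:
    """Lowercase, strip punctuation and the articles a/an/the, collapse spaces."""
    s = s.lower()
    s = "".join(ch for ch in s if ch not in PUNCTUATION)
    s = re.sub(r"\b(a|an|the)\b", " ", s)
    return " ".join(s.split())


def exact_match(gold: str, pred: str) -> int:
    return int(normalize_answer(gold) == normalize_answer(pred))


def f1_score(gold: str, pred: str) -> float:
    gold_toks = normalize_answer(gold).split()
    pred_toks = normalize_answer(pred).split()
    # Unanswerable questions have an empty gold answer: only an empty
    # prediction (abstention) earns credit, and vice versa.
    if not gold_toks or not pred_toks:
        return float(gold_toks == pred_toks)
    num_same = sum((Counter(gold_toks) & Counter(pred_toks)).values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_toks)
    recall = num_same / len(gold_toks)
    return 2 * precision * recall / (precision + recall)


def score(gold_answers: List[str], pred: str) -> Tuple[int, float]:
    """Best EM and F1 over all gold answers; [""] if the question is unanswerable."""
    golds = gold_answers or [""]
    return (
        max(exact_match(g, pred) for g in golds),
        max(f1_score(g, pred) for g in golds),
    )


print(score(["the Normans"], "Normans"))  # → (1, 1.0)
print(score([], ""))  # → (1, 1.0): correct abstention
print(score([], "Normandy"))  # → (0, 0.0): answered an unanswerable question
```

Dataset-level EM and F1 are then the averages of these per-question scores.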
Acknowledgements
The SQuAD2.0 dataset was created by the Stanford Question Answering Dataset (SQuAD) team at Stanford University.
The dataset is built from Wikipedia articles. Each example pairs a paragraph from an article with a crowdsourced question about it and, for answerable questions, the corresponding answer.
License
> License: CC0 1.0 Universal (CC0 1.0) - Public Domain Dedication
> No Copyright - You can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission.
Columns
File: validation.csv
| Column name | Description |
|:------------|:------------|
| title | The title of the Wikipedia article. (String) |
| context | The paragraph from the Wikipedia article that the question is about. (String) |
| question | The question that the model will be asked. (String) |
| answers | The answer to the question; unanswerable questions have no answer text. (String) |
File: train.csv
| Column name | Description |
|:------------|:------------|
| title | The title of the Wikipedia article. (String) |
| context | The paragraph from the Wikipedia article that the question is about. (String) |
| question | The question that the model will be asked. (String) |
| answers | The answer to the question; unanswerable questions have no answer text. (String) |
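Because `answers` is stored as a string, it has to be parsed before use. The sketch below assumes (the card does not say this) that each cell holds a Python-literal dict in the Hugging Face `squad_v2` layout, `{'text': [...], 'answer_start': [...]}`, with empty lists for unanswerable questions; if the export contains NumPy reprs such as `array([...])`, the parsing step needs adjusting. `load_rows`, `is_answerable`, and `spans_match` are hypothetical helpers, and the two sample rows are toy data written for illustration.

```python
import ast
import csv
import io
from typing import Dict, Iterable, List


def load_rows(lines: Iterable[str]) -> List[Dict]:
    """Read CSV rows and parse the `answers` column.

    ASSUMPTION: `answers` is a Python-literal dict such as
    "{'text': ['Normandy'], 'answer_start': [31]}".
    """
    rows = []
    for row in csv.DictReader(lines):
        row["answers"] = ast.literal_eval(row["answers"])
        rows.append(row)
    return rows


def is_answerable(row: Dict) -> bool:
    # Unanswerable questions carry no gold answer text.
    return len(row["answers"]["text"]) > 0


def spans_match(row: Dict) -> bool:
    """Check that each answer text appears at its stated character offset."""
    ctx = row["context"]
    answers = row["answers"]
    return all(
        ctx[start:start + len(text)] == text
        for text, start in zip(answers["text"], answers["answer_start"])
    )


# Toy rows for illustration; for the real file use
# open("validation.csv", newline="", encoding="utf-8") instead of StringIO.
SAMPLE = (
    "title,context,question,answers\n"
    "Normans,The Normans gave their name to Normandy.,"
    "What region is named after the Normans?,"
    "\"{'text': ['Normandy'], 'answer_start': [31]}\"\n"
    "Normans,The Normans gave their name to Normandy.,"
    "Who named the Normans?,"
    "\"{'text': [], 'answer_start': []}\"\n"
)

rows = load_rows(io.StringIO(SAMPLE))
print(sum(is_answerable(r) for r in rows))  # → 1
print(all(spans_match(r) for r in rows))  # → True
```

`ast.literal_eval` is used instead of `eval` so that a malformed cell raises an error rather than executing arbitrary code.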