Name: Stanford Question Answering Dataset (SQuAD)
Creator: Kaggle
License: https://creativecommons.org/publicdomain/zero/1.0/

Questions posed by crowdworkers on a set of Wikipedia articles

The Stanford Question Answering Dataset

A Challenge for Reading Comprehension

About this dataset

SQuAD is a reading comprehension dataset consisting of questions posed by crowdworkers on a set of Wikipedia articles. The answer to each question is a span of text, or segment, from the corresponding reading passage. The data fields are the same across all splits.
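Because each answer is a span of its reading passage, the answer text can be located in the passage by character offset. The following is a minimal sketch of that idea, not part of the dataset's tooling: the helper name find_answer_span and the toy passage are made up for illustration, and it simply returns the first exact match.

```python
from typing import Optional, Tuple


def find_answer_span(context: str, answer: str) -> Optional[Tuple[int, int]]:
    """Return (start, end) character offsets of the first exact
    occurrence of `answer` in `context`, or None if it is absent."""
    if not answer:
        return None
    start = context.find(answer)
    if start == -1:
        return None
    return start, start + len(answer)


# Toy passage written for this example; it is not a row from the dataset.
context = "The Amazon rainforest covers much of the Amazon basin of South America."
span = find_answer_span(context, "South America")
if span is not None:
    start, end = span
    # Slicing the passage with the offsets recovers the answer text.
    print(start, end, context[start:end])
```

If an answer string occurs more than once in a passage, this sketch only reports the first occurrence, which may not be the one the crowdworker marked.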

How to use the dataset

Every answer is a segment of text, or span, taken from the corresponding reading passage. The data fields are the same across all splits.

Columns: context, question, answers

To use this dataset, download one of the split files (train.csv or validation.csv) and load it into your preferred data analysis tool. Each row in the file corresponds to a single question-answer pair. The context column contains the reading passage from the corresponding Wikipedia article, while the question and answers columns contain the question posed by the crowdworker and its corresponding answer(s).
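As a rough sketch of the loading step, the rows can be read with Python's standard csv module. The exact serialization of the answers column is not documented here, so the code assumes each cell holds a Python-style list literal such as "['Paris']" and falls back to the raw string otherwise. The helper names parse_answers and read_rows, the sample row, and the file path in the trailing comment are all hypothetical.

```python
import ast
import csv
import io
from typing import Dict, Iterable, List


def parse_answers(cell: str) -> List[str]:
    """Best-effort parse of an `answers` cell into a list of strings.

    Assumption: the cell is a Python-style list literal. Anything that
    does not parse that way is returned unchanged as a one-item list.
    """
    try:
        value = ast.literal_eval(cell)
    except (ValueError, SyntaxError):
        return [cell]
    if isinstance(value, (list, tuple)):
        return [str(v) for v in value]
    return [str(value)]


def read_rows(lines: Iterable[str]) -> List[Dict[str, object]]:
    """Read CSV rows and replace the raw `answers` text with a list."""
    rows: List[Dict[str, object]] = []
    for raw in csv.DictReader(lines):
        row: Dict[str, object] = dict(raw)
        row["answers"] = parse_answers(raw["answers"])
        rows.append(row)
    return rows


# Tiny in-memory stand-in for a split file, invented for this example.
sample = io.StringIO(
    "context,question,answers\n"
    '"Paris is the capital of France.",'
    '"What is the capital of France?",'
    "\"['Paris']\"\n"
)
rows = read_rows(sample)
print(rows[0]["question"], rows[0]["answers"])

# With a downloaded split (the path is an assumption):
# with open("train.csv", newline="", encoding="utf-8") as f:
#     rows = read_rows(f)
```

It is worth inspecting a few raw answers cells before relying on the parser, since a different serialization would pass through the fallback branch untouched.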

Research Ideas

Learning to answer multiple choice questions by extracting text spans from source materials

Developing Reading Comprehension models that can answer open-ended questions about passages of text

Building systems that can generate large training datasets for Reading Comprehension models by creating synthetic questions from existing passages

Acknowledgements

Thank you to the Stanford NLP Group and the creators of the SQuAD dataset for providing this data.

License

CC0 1.0 Universal (CC0 1.0), Public Domain Dedication
No Copyright: you can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission.

Columns

File: validation.csv

Column name	Description
title	The title of the Wikipedia article. (String)
context	The reading passage from the article. (String)
question	The question posed by the crowdworker. (String)
answers	The answer(s) to the question, as text spans from the context. (List of strings)

File: train.csv

Column name	Description
title	The title of the Wikipedia article. (String)
context	The reading passage from the article. (String)
question	The question posed by the crowdworker. (String)
answers	The answer(s) to the question, as text spans from the context. (List of strings)

