WikiQA (Open-Domain Q&A)
Discovering New Knowledge through Question and Sentence Pairs
About this dataset
The WikiQA dataset is a collection of question and sentence pairs, collected and annotated for research on open-domain question answering. The data fields are the same across all splits: question, document_title, answer, and label. The questions were drawn from Bing query logs, and each question is linked to a Wikipedia article; the candidate answer sentences come from the summary section of that article. The label indicates whether the candidate sentence answers the question.
How to use the dataset
- The data fields are the same among all splits.
- Columns: question, document_title, answer, label
- The file test.csv contains the question and sentence pairs used to evaluate the performance of question answering models.
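As a rough illustration of the points above, the sketch below reads one split with Python's standard `csv` module and checks that the four columns are present. The helper name `load_split`, the made-up two-row sample, and the assumption that labels are stored as 0/1 are all illustrative and are not taken from the dataset's documentation.

```python
import csv
import io

# Columns described in this dataset card.
EXPECTED_COLUMNS = ["question", "document_title", "answer", "label"]


def load_split(source):
    """Read one split from an open text file (or file-like object) into a list of dicts."""
    reader = csv.DictReader(source)
    missing = [c for c in EXPECTED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    rows = []
    for row in reader:
        row["label"] = int(row["label"])  # assumption: labels are stored as 0/1
        rows.append(row)
    return rows


# Made-up sample standing in for a real file such as test.csv.
SAMPLE_CSV = """question,document_title,answer,label
what is the capital of france?,Paris,Paris is the capital of France.,1
what is the capital of france?,Paris,Paris hosted the 1900 Summer Olympics.,0
"""

sample_rows = load_split(io.StringIO(SAMPLE_CSV))
print(len(sample_rows), sample_rows[0]["label"])
```

For a real split, the same function could be called on a file opened with `open("test.csv", newline="", encoding="utf-8")`, assuming the file sits in the working directory.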
Research Ideas
- The WikiQA dataset can be used to train a machine-learning model to answer questions automatically.
- The dataset can be used to research the feasibility of open-domain question answering.
- The dataset can be used to evaluate the performance of different question answering models.
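One possible way to carry out such an evaluation is to rank the candidate sentences for each question by a model score and compute mean reciprocal rank (MRR), a metric commonly reported for answer sentence selection. The sketch below is a minimal version under stated assumptions: the function name, the `(question, score, label)` triple format, and the choice to skip questions with no correct sentence are illustrative, not prescribed by the dataset.

```python
from collections import defaultdict


def mean_reciprocal_rank(examples):
    """examples: iterable of (question, score, label) triples, where label 1 means correct."""
    by_question = defaultdict(list)
    for question, score, label in examples:
        by_question[question].append((score, label))

    reciprocal_ranks = []
    for candidates in by_question.values():
        if not any(label == 1 for _, label in candidates):
            continue  # assumption: questions without a correct sentence are skipped
        # Highest-scoring candidate first.
        ranked = sorted(candidates, key=lambda c: c[0], reverse=True)
        for rank, (_, label) in enumerate(ranked, start=1):
            if label == 1:
                reciprocal_ranks.append(1.0 / rank)
                break
    if not reciprocal_ranks:
        return 0.0
    return sum(reciprocal_ranks) / len(reciprocal_ranks)


# Hypothetical model scores for three questions.
scored = [
    ("q1", 0.9, 0), ("q1", 0.4, 1),  # correct sentence ranked second
    ("q2", 0.8, 1), ("q2", 0.1, 0),  # correct sentence ranked first
    ("q3", 0.5, 0),                  # no correct sentence, skipped
]
mrr = mean_reciprocal_rank(scored)
```

Here `q1` contributes 1/2 and `q2` contributes 1, so `mrr` is 0.75 for this toy input.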
Acknowledgements
This dataset was proposed in "WikiQA: A Challenge Dataset for Open-Domain Question Answering" by Yang et al. The authors acknowledge the help of Aria Haghighi and Percy Liang in constructing the pairwise sentence similarity features, Wei Ying in providing additional insights about the dataset, Hannah Rashkin for helpful discussions, and Google for providing the computing infrastructure.
License
> License: CC0 1.0 Universal (CC0 1.0) - Public Domain Dedication
> No Copyright - You can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission.
Columns
File: validation.csv
| Column name | Description |
| --- | --- |
| question | The question that was asked. (String) |
| document_title | The title of the Wikipedia article that the question was asked about. (String) |
| answer | The answer to the question. (String) |
| label | Whether or not the answer is relevant to the question. (String) |
File: train.csv
| Column name | Description |
| --- | --- |
| question | The question that was asked. (String) |
| document_title | The title of the Wikipedia article that the question was asked about. (String) |
| answer | The answer to the question. (String) |
| label | Whether or not the answer is relevant to the question. (String) |
File: test.csv
| Column name | Description |
| --- | --- |
| question | The question that was asked. (String) |
| document_title | The title of the Wikipedia article that the question was asked about. (String) |
| answer | The answer to the question. (String) |
| label | Whether or not the answer is relevant to the question. (String) |
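Because each row pairs one question with one candidate answer, a question usually spans several rows. The sketch below shows one way to collect the relevant answers per question from rows shaped like the columns listed above. The function name and sample rows are hypothetical, and it assumes a relevant pair is marked with the label `1` (compared as a string, since the column is described as a String).

```python
from collections import defaultdict


def relevant_answers_by_question(rows):
    """Map each question to the list of answers whose label marks them as relevant."""
    grouped = defaultdict(list)
    for row in rows:
        # Assumption: "1" marks a relevant answer; str() also accepts integer labels.
        if str(row["label"]).strip() == "1":
            grouped[row["question"]].append(row["answer"])
    return dict(grouped)


# Made-up rows using the four columns described in this card.
example_rows = [
    {"question": "what is the capital of france?", "document_title": "Paris",
     "answer": "Paris is the capital of France.", "label": "1"},
    {"question": "what is the capital of france?", "document_title": "Paris",
     "answer": "Paris hosted the 1900 Summer Olympics.", "label": "0"},
    {"question": "who wrote hamlet?", "document_title": "Hamlet",
     "answer": "Hamlet is a tragedy.", "label": "0"},
]
relevant = relevant_answers_by_question(example_rows)
```

Questions with no relevant answer, such as the second one in this toy input, simply do not appear in the result.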