WikiQA (Open-Domain Q&A)
Discovering New Knowledge through Question and Sentence Pairs
@kaggle.thedevastator_wikiquestionanswer_a_dataset_for_open_domain_que
The WikiQA dataset is a collection of question and sentence pairs, collected and annotated for research on open-domain question answering. The data fields are the same across all splits: question_id, question, document_title, answer, label. As described in the original WikiQA paper, the questions were sampled from Bing query logs, and the candidate answer sentences come from the summary section of the Wikipedia article linked to each question. The label indicates whether the sentence answers the question.
How to use this dataset
- Columns (identical in train.csv, validation.csv, and test.csv): question_id, question, document_title, answer, label.
- The file test.csv holds the question and sentence pairs reserved for evaluating question answering models.
- The WikiQA dataset can be used to train a machine-learning model to answer questions automatically.
- The dataset can be used to research the feasibility of open-domain question answering.
- The dataset can be used to compare the performance of different question answering models.
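Comparing models on this kind of data usually means ranking the candidate sentences for each question and checking where the correct ones land. The sketch below computes mean average precision (MAP) and mean reciprocal rank (MRR), two metrics commonly reported for answer sentence selection. The dataset card does not prescribe them, so treat the choice as an assumption. The function name `rank_metrics`, the model scores, and the toy rows are all hypothetical. Questions with no correct sentence are skipped, which is one common convention rather than a requirement.

```python
from collections import defaultdict


def rank_metrics(rows):
    """Compute (MAP, MRR) from (question_id, score, label) triples.

    `score` is a hypothetical model score (higher means more relevant);
    `label` is 1 for a correct sentence and 0 otherwise.
    """
    by_question = defaultdict(list)
    for question_id, score, label in rows:
        by_question[question_id].append((score, label))

    ap_sum = 0.0
    rr_sum = 0.0
    n_questions = 0
    for candidates in by_question.values():
        # Assumption: questions without any correct sentence are ignored.
        if not any(label for _, label in candidates):
            continue
        ranked = sorted(candidates, key=lambda c: c[0], reverse=True)
        hits = 0
        precisions = []
        first_hit_rank = None
        for rank, (_, label) in enumerate(ranked, start=1):
            if label:
                hits += 1
                precisions.append(hits / rank)
                if first_hit_rank is None:
                    first_hit_rank = rank
        ap_sum += sum(precisions) / len(precisions)
        rr_sum += 1.0 / first_hit_rank
        n_questions += 1

    if n_questions == 0:
        return 0.0, 0.0
    return ap_sum / n_questions, rr_sum / n_questions


# Toy, made-up rows for illustration only (not taken from the dataset).
toy = [
    ("Q1", 0.9, 0), ("Q1", 0.8, 1), ("Q1", 0.1, 0),
    ("Q2", 0.7, 1), ("Q2", 0.2, 0),
    ("Q3", 0.5, 0),  # no correct sentence, so Q3 is skipped
]
map_score, mrr_score = rank_metrics(toy)
print(f"MAP={map_score:.2f} MRR={mrr_score:.2f}")
```

In practice the triples would come from pairing a model's scores with the `label` column of test.csv.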
This dataset was proposed in "WikiQA: A Challenge Dataset for Open-Domain Question Answering" by Yang et al. The authors acknowledge the help of Aria Haghighi and Percy Liang in constructing the pairwise sentence similarity features, Wei Ying in providing additional insights about the dataset, Hannah Rashkin for helpful discussions, and Google for providing the computing infrastructure.
License
> License: CC0 1.0 Universal (CC0 1.0) - Public Domain Dedication
> No Copyright - You can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission. See Other Information.
File: validation.csv
| Column name | Description |
|---|---|
| question_id | Identifier of the question. (String) |
| question | The question that was asked. (String) |
| document_title | The title of the Wikipedia article associated with the question. (String) |
| answer | A candidate answer sentence for the question. (String) |
| label | Whether the answer sentence is relevant to the question: 1 if it is, 0 if not. (Integer) |
File: train.csv
| Column name | Description |
|---|---|
| question_id | Identifier of the question. (String) |
| question | The question that was asked. (String) |
| document_title | The title of the Wikipedia article associated with the question. (String) |
| answer | A candidate answer sentence for the question. (String) |
| label | Whether the answer sentence is relevant to the question: 1 if it is, 0 if not. (Integer) |
File: test.csv
| Column name | Description |
|---|---|
| question_id | Identifier of the question. (String) |
| question | The question that was asked. (String) |
| document_title | The title of the Wikipedia article associated with the question. (String) |
| answer | A candidate answer sentence for the question. (String) |
| label | Whether the answer sentence is relevant to the question: 1 if it is, 0 if not. (Integer) |
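Because each row pairs one question with one candidate sentence, a common first step is to group the candidates by question. The following is a minimal sketch using only the standard library. It assumes the CSV headers match the column names listed in the tables above. The helper name `load_candidates` and the in-memory sample rows are hypothetical stand-ins, not dataset content.

```python
import csv
import io
from collections import defaultdict


def load_candidates(handle):
    """Group (answer, label) pairs by question from an open CSV file handle."""
    grouped = defaultdict(list)
    for row in csv.DictReader(handle):
        # The label is stored as text in the CSV, so cast it to an integer.
        grouped[row["question"]].append((row["answer"], int(row["label"])))
    return dict(grouped)


# Tiny made-up sample so the sketch runs without the real files.
sample = io.StringIO(
    "question,document_title,answer,label\n"
    "what is x?,X,X is a letter.,1\n"
    "what is x?,X,It follows W.,0\n"
)
groups = load_candidates(sample)
for question, candidates in groups.items():
    n_correct = sum(label for _, label in candidates)
    print(question, len(candidates), n_correct)
```

With the real data, the same function could be called inside `with open("train.csv", newline="", encoding="utf-8") as f:`, where the path is an assumption about where the file was downloaded.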
CREATE TABLE test (
  "question_id" VARCHAR,
  "question" VARCHAR,
  "document_title" VARCHAR,
  "answer" VARCHAR,
  "label" BIGINT
);

CREATE TABLE train (
  "question_id" VARCHAR,
  "question" VARCHAR,
  "document_title" VARCHAR,
  "answer" VARCHAR,
  "label" BIGINT
);

CREATE TABLE validation (
  "question_id" VARCHAR,
  "question" VARCHAR,
  "document_title" VARCHAR,
  "answer" VARCHAR,
  "label" BIGINT
);
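The schema above can be tried out locally. The sketch below recreates the `train` table in an in-memory SQLite database and counts candidates and correct sentences per question. SQLite is a stand-in chosen for convenience, since the card does not say which SQL engine the schema targets, and the inserted rows are made-up placeholders rather than dataset content.

```python
import sqlite3

# Assumption: SQLite as a local stand-in for the unspecified SQL engine.
conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE train ('
    '"question_id" VARCHAR, "question" VARCHAR, "document_title" VARCHAR, '
    '"answer" VARCHAR, "label" BIGINT)'
)

# Made-up placeholder rows for illustration only.
rows = [
    ("Q1", "toy question one", "Toy Title", "toy sentence a", 0),
    ("Q1", "toy question one", "Toy Title", "toy sentence b", 1),
    ("Q2", "toy question two", "Other Title", "toy sentence c", 0),
]
conn.executemany("INSERT INTO train VALUES (?, ?, ?, ?, ?)", rows)

# Per question: how many candidate sentences, and how many are labeled correct.
query = """
SELECT question_id, COUNT(*) AS n_candidates, SUM(label) AS n_correct
FROM train
GROUP BY question_id
ORDER BY question_id
"""
summary = conn.execute(query).fetchall()
for question_id, n_candidates, n_correct in summary:
    print(question_id, n_candidates, n_correct)
conn.close()
```

A query like this is a quick way to see how many questions have no correct sentence at all, which matters when choosing how to score a model.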