Dataset: WikiQA (Open-Domain Q&A)

About this Dataset

Discovering New Knowledge through Question and Sentence Pairs

The WikiQA dataset is a collection of question and sentence pairs, collected and annotated for research on open-domain question answering. All splits share the same data fields: question_id, question, document_title, answer, and label. As described in the original WikiQA paper, the questions were sampled from Bing query logs, and the candidate sentences come from the summary section of the Wikipedia article associated with each question. The label is 1 if the sentence answers the question and 0 otherwise.

How to use the dataset

The data fields are the same among all splits.

Columns: question_id, question, document_title, answer, label

The file test.csv holds the question and sentence pairs used to evaluate the performance of question answering models; train.csv and validation.csv have the same layout.
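As a rough illustration of reading one split, the sketch below assumes the files are ordinary comma-separated text with a header row and the five columns listed in the table definitions. The helper name load_split and the sample rows are made up for this example and are not taken from the dataset.

```python
import csv
import io
from collections import defaultdict


def load_split(fileobj):
    """Read one split and group its candidate sentences by question_id."""
    grouped = defaultdict(list)
    for row in csv.DictReader(fileobj):
        row["label"] = int(row["label"])  # 1 = sentence answers the question
        grouped[row["question_id"]].append(row)
    return dict(grouped)


# Made-up rows in the assumed layout, standing in for a real file such as test.csv.
SAMPLE = io.StringIO(
    "question_id,question,document_title,answer,label\n"
    "Q1,what is a glacier,Glacier,A glacier is a persistent body of dense ice.,1\n"
    "Q1,what is a glacier,Glacier,Glaciers form where snow accumulates.,0\n"
    "Q2,who wrote hamlet,Hamlet,Hamlet is a tragedy.,0\n"
)

groups = load_split(SAMPLE)
for qid, rows in groups.items():
    positives = sum(r["label"] for r in rows)
    print(qid, len(rows), "candidates,", positives, "positive")
```

With a downloaded copy, the same helper could be called as load_split(open("test.csv", newline="", encoding="utf-8")), assuming the file sits in the working directory. Grouping by question_id is a design choice here, since ranking metrics are computed per question.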

Research Ideas

The WikiQA dataset can be used to train a machine-learning model to answer questions automatically.

The dataset can be used to research the feasibility of open-domain question answering.

The dataset can be used to evaluate the performance of different question answering models.
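The ideas above can be tried out with a very small baseline. The sketch below ranks the candidate sentences of each question by word overlap with the question and reports the mean reciprocal rank (MRR). The scoring function, the function names, and the sample rows are assumptions made for illustration; this is not the method of the original paper. Questions without any positive candidate are skipped, which is also an assumption of this sketch.

```python
from collections import defaultdict


def overlap_score(question, sentence):
    """Fraction of question words that also appear in the sentence."""
    q_words = set(question.lower().split())
    s_words = set(sentence.lower().split())
    return len(q_words & s_words) / len(q_words) if q_words else 0.0


def mean_reciprocal_rank(rows, score_fn):
    """MRR over questions that have at least one positive candidate."""
    by_question = defaultdict(list)
    for row in rows:
        by_question[row["question_id"]].append(row)

    reciprocal_ranks = []
    for candidates in by_question.values():
        if not any(c["label"] == 1 for c in candidates):
            continue  # assumption: skip questions with no correct sentence
        ranked = sorted(
            candidates,
            key=lambda c: score_fn(c["question"], c["answer"]),
            reverse=True,
        )
        for rank, cand in enumerate(ranked, start=1):
            if cand["label"] == 1:
                reciprocal_ranks.append(1.0 / rank)
                break
    return sum(reciprocal_ranks) / len(reciprocal_ranks) if reciprocal_ranks else 0.0


# Made-up rows using the dataset's column names (not real dataset content).
rows = [
    {"question_id": "Q1", "question": "what is a glacier",
     "answer": "A glacier is a body of dense ice.", "label": 1},
    {"question_id": "Q1", "question": "what is a glacier",
     "answer": "Snow accumulates over many years.", "label": 0},
    {"question_id": "Q2", "question": "who wrote hamlet",
     "answer": "Hamlet is a tragedy.", "label": 0},
    {"question_id": "Q2", "question": "who wrote hamlet",
     "answer": "William Shakespeare is the author.", "label": 1},
]

mrr = mean_reciprocal_rank(rows, overlap_score)
print(f"MRR: {mrr:.2f}")
```

In the second sample question the correct sentence shares no words with the question, so the overlap baseline ranks it below the distractor. That kind of lexical mismatch is what a trained model would need to handle.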

Acknowledgements

This dataset was proposed in "WikiQA: A Challenge Dataset for Open-Domain Question Answering" by Yang et al. The authors acknowledge the help of Aria Haghighi and Percy Liang in constructing the pairwise sentence similarity features, Wei Ying in providing additional insights about the dataset, Hannah Rashkin for helpful discussions, and Google for providing the computing infrastructure.

License

> License: CC0 1.0 Universal (CC0 1.0) - Public Domain Dedication
> No Copyright - You can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission.

Columns

File: validation.csv

Column name	Description
question_id	Identifier of the question. (String)
question	The question that was asked. (String)
document_title	The title of the Wikipedia article that the question was asked about. (String)
answer	A candidate answer sentence for the question. (String)
label	1 if the answer sentence answers the question, 0 otherwise. (Integer)

File: train.csv

Column name	Description
question_id	Identifier of the question. (String)
question	The question that was asked. (String)
document_title	The title of the Wikipedia article that the question was asked about. (String)
answer	A candidate answer sentence for the question. (String)
label	1 if the answer sentence answers the question, 0 otherwise. (Integer)

File: test.csv

Column name	Description
question_id	Identifier of the question. (String)
question	The question that was asked. (String)
document_title	The title of the Wikipedia article that the question was asked about. (String)
answer	A candidate answer sentence for the question. (String)
label	1 if the answer sentence answers the question, 0 otherwise. (Integer)

Tables

Test

@kaggle.thedevastator_wikiquestionanswer_a_dataset_for_open_domain_que.test

554.64 KB
6165 rows
5 columns


CREATE TABLE test (
  "question_id" VARCHAR,
  "question" VARCHAR,
  "document_title" VARCHAR,
  "answer" VARCHAR,
  "label" BIGINT
);

Train

@kaggle.thedevastator_wikiquestionanswer_a_dataset_for_open_domain_que.train

1.82 MB
20360 rows
5 columns


CREATE TABLE train (
  "question_id" VARCHAR,
  "question" VARCHAR,
  "document_title" VARCHAR,
  "answer" VARCHAR,
  "label" BIGINT
);

Validation

@kaggle.thedevastator_wikiquestionanswer_a_dataset_for_open_domain_que.validation

253.57 KB
2733 rows
5 columns


CREATE TABLE validation (
  "question_id" VARCHAR,
  "question" VARCHAR,
  "document_title" VARCHAR,
  "answer" VARCHAR,
  "label" BIGINT
);
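The table definitions above can be exercised locally. The sketch below recreates the validation table in an in-memory SQLite database (an assumption made for convenience; the hosted tables may use a different SQL engine) and counts candidates and positive labels per question. The inserted rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Same column layout as the validation table definition above.
conn.execute(
    """
    CREATE TABLE validation (
      "question_id" VARCHAR,
      "question" VARCHAR,
      "document_title" VARCHAR,
      "answer" VARCHAR,
      "label" BIGINT
    )
    """
)

# Made-up rows, not taken from the dataset.
conn.executemany(
    "INSERT INTO validation VALUES (?, ?, ?, ?, ?)",
    [
        ("Q1", "what is a glacier", "Glacier", "A glacier is a body of dense ice.", 1),
        ("Q1", "what is a glacier", "Glacier", "Snow accumulates over many years.", 0),
        ("Q2", "who wrote hamlet", "Hamlet", "Hamlet is a tragedy.", 0),
    ],
)

# Candidates and positive labels per question.
summary = conn.execute(
    """
    SELECT question_id,
           COUNT(*) AS candidates,
           SUM(label) AS positives
    FROM validation
    GROUP BY question_id
    ORDER BY question_id
    """
).fetchall()

for question_id, candidates, positives in summary:
    print(question_id, candidates, positives)

conn.close()
```

Because label is an integer column, SUM(label) gives the number of correct sentences per question directly, which makes it easy to spot questions that have no positive candidate.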