Baselight

WikiQA (Open-Domain Q&A)

Discovering New Knowledge through Question and Sentence Pairs

@kaggle.thedevastator_wikiquestionanswer_a_dataset_for_open_domain_que

About this Dataset

The WikiQA dataset is a collection of question and sentence pairs, collected and annotated for research on open-domain question answering. The data fields are the same among all splits: question_id, question, document_title, answer, label. The questions were sampled from Bing query logs, and each one is paired with candidate answer sentences taken from the Wikipedia article named in document_title. The label indicates whether the candidate sentence answers the question.

How to use the dataset

  1. Each row pairs one question with one candidate answer sentence and a label.
  2. The data fields are the same among all splits (train.csv, validation.csv, test.csv).
  3. Columns: question_id, question, document_title, answer, label.
  4. The file test.csv holds the question and sentence pairs used to evaluate the performance of different question answering models.
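Because each question appears on several rows (one per candidate sentence), a common first step is to group rows by question_id. The following is a minimal sketch using only the Python standard library. The sample rows are invented for illustration and are not taken from the dataset, and the helper name `group_candidates` is hypothetical; to run it on the real data, pass an open handle to train.csv instead of the in-memory sample.

```python
import csv
import io
from collections import defaultdict

# Invented sample rows in the same column layout as the CSV files.
SAMPLE_CSV = """question_id,question,document_title,answer,label
Q1,what is the capital of france?,France,Paris is the capital of France.,1
Q1,what is the capital of france?,France,France is a country in Western Europe.,0
Q2,who painted the mona lisa?,Mona Lisa,The painting hangs in the Louvre.,0
"""


def group_candidates(csv_file):
    """Group rows by question_id.

    Returns a dict mapping question_id to
    {"question": str, "candidates": [(answer, label), ...]}.
    """
    grouped = defaultdict(lambda: {"question": None, "candidates": []})
    for row in csv.DictReader(csv_file):
        entry = grouped[row["question_id"]]
        entry["question"] = row["question"]
        # label is read as text from the CSV, so convert it to an integer.
        entry["candidates"].append((row["answer"], int(row["label"])))
    return dict(grouped)


groups = group_candidates(io.StringIO(SAMPLE_CSV))
for qid, entry in groups.items():
    n_positive = sum(label for _, label in entry["candidates"])
    print(qid, entry["question"], len(entry["candidates"]), n_positive)
```

Note that a question may have no candidate labeled 1, as Q2 does in the sample, so downstream code should not assume a positive always exists.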

Research Ideas

  • The WikiQA dataset can be used to train a machine-learning model to answer questions automatically.
  • The dataset can be used to research the feasibility of open-domain question answering.
  • The dataset can be used to evaluate the performance of different question answering models.
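As one way to make the evaluation idea concrete, answer sentence selection is often scored with ranking metrics such as mean reciprocal rank (MRR) and mean average precision (MAP). The sketch below assumes a model has already assigned a score to every candidate sentence; the scores and the function names are hypothetical, and skipping questions that have no correct candidate is a convention chosen here, not something this page specifies.

```python
def rank_by_score(candidates):
    """candidates: list of (score, label). Returns labels ordered best score first."""
    ordered = sorted(candidates, key=lambda c: c[0], reverse=True)
    return [label for _, label in ordered]


def mean_reciprocal_rank(ranked_labels_per_question):
    """Average of 1 / rank of the first correct candidate per question."""
    reciprocal_ranks = []
    for labels in ranked_labels_per_question:
        if 1 not in labels:
            continue  # assumption: questions without a correct candidate are skipped
        reciprocal_ranks.append(1.0 / (labels.index(1) + 1))
    return sum(reciprocal_ranks) / len(reciprocal_ranks) if reciprocal_ranks else 0.0


def mean_average_precision(ranked_labels_per_question):
    """Average over questions of the mean precision at each correct candidate."""
    average_precisions = []
    for labels in ranked_labels_per_question:
        hits = 0
        precisions = []
        for rank, label in enumerate(labels, start=1):
            if label == 1:
                hits += 1
                precisions.append(hits / rank)
        if precisions:  # same skipping assumption as above
            average_precisions.append(sum(precisions) / len(precisions))
    return sum(average_precisions) / len(average_precisions) if average_precisions else 0.0


# Hypothetical model scores paired with gold labels, one list per question.
scored = [
    [(0.9, 0), (0.8, 1), (0.1, 0)],  # correct sentence ranked second
    [(0.7, 1), (0.6, 0), (0.5, 1)],  # correct sentences ranked first and third
    [(0.4, 0), (0.2, 0)],            # no correct sentence, skipped by both metrics
]
ranked = [rank_by_score(candidates) for candidates in scored]
mrr = mean_reciprocal_rank(ranked)
map_score = mean_average_precision(ranked)
print(f"MRR={mrr:.4f} MAP={map_score:.4f}")
```

Whether to skip or penalize questions without a correct candidate changes the reported numbers, so results are only comparable between models scored under the same convention.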

Acknowledgements

This dataset was proposed in "WikiQA: A Challenge Dataset for Open-Domain Question Answering" by Yang et al. The authors acknowledge the help of Aria Haghighi and Percy Liang in constructing the pairwise sentence similarity features, Wei Ying in providing additional insights about the dataset, Hannah Rashkin for helpful discussions, and Google for providing the computing infrastructure.

License

> License: CC0 1.0 Universal (CC0 1.0) - Public Domain Dedication
> No Copyright - You can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission. See Other Information.

Columns

File: validation.csv

Column name      Description
question         The question that was asked. (String)
document_title   The title of the Wikipedia article that the question was asked about. (String)
answer           A candidate answer sentence for the question. (String)
label            Whether or not the answer is relevant to the question. (Integer: 1 if relevant, 0 otherwise)

File: train.csv

Column name      Description
question         The question that was asked. (String)
document_title   The title of the Wikipedia article that the question was asked about. (String)
answer           A candidate answer sentence for the question. (String)
label            Whether or not the answer is relevant to the question. (Integer: 1 if relevant, 0 otherwise)

File: test.csv

Column name      Description
question         The question that was asked. (String)
document_title   The title of the Wikipedia article that the question was asked about. (String)
answer           A candidate answer sentence for the question. (String)
label            Whether or not the answer is relevant to the question. (Integer: 1 if relevant, 0 otherwise)

Tables

Test

@kaggle.thedevastator_wikiquestionanswer_a_dataset_for_open_domain_que.test
  • 554.64 KB
  • 6165 rows
  • 5 columns

CREATE TABLE test (
  "question_id" VARCHAR,
  "question" VARCHAR,
  "document_title" VARCHAR,
  "answer" VARCHAR,
  "label" BIGINT
);

Train

@kaggle.thedevastator_wikiquestionanswer_a_dataset_for_open_domain_que.train
  • 1.82 MB
  • 20360 rows
  • 5 columns

CREATE TABLE train (
  "question_id" VARCHAR,
  "question" VARCHAR,
  "document_title" VARCHAR,
  "answer" VARCHAR,
  "label" BIGINT
);

Validation

@kaggle.thedevastator_wikiquestionanswer_a_dataset_for_open_domain_que.validation
  • 253.57 KB
  • 2733 rows
  • 5 columns

CREATE TABLE validation (
  "question_id" VARCHAR,
  "question" VARCHAR,
  "document_title" VARCHAR,
  "answer" VARCHAR,
  "label" BIGINT
);
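The three tables share one schema, so the same query works on each split. The sketch below recreates the train schema in an in-memory SQLite database and counts candidates and positive labels per question. SQLite is used here only as a stand-in so the example runs anywhere; it is an assumption, not the engine that hosts these tables, and the inserted rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Same column names and declared types as the train table above.
conn.execute(
    """
    CREATE TABLE train (
      "question_id" VARCHAR,
      "question" VARCHAR,
      "document_title" VARCHAR,
      "answer" VARCHAR,
      "label" BIGINT
    )
    """
)

# Invented rows, not taken from the dataset.
rows = [
    ("Q1", "what is the capital of france?", "France", "Paris is the capital of France.", 1),
    ("Q1", "what is the capital of france?", "France", "France is a country in Western Europe.", 0),
    ("Q2", "who painted the mona lisa?", "Mona Lisa", "The painting hangs in the Louvre.", 0),
]
conn.executemany("INSERT INTO train VALUES (?, ?, ?, ?, ?)", rows)

# One output row per question: number of candidates and number labeled 1.
summary = conn.execute(
    """
    SELECT question_id,
           COUNT(*)   AS n_candidates,
           SUM(label) AS n_positive
    FROM train
    GROUP BY question_id
    ORDER BY question_id
    """
).fetchall()

for question_id, n_candidates, n_positive in summary:
    print(question_id, n_candidates, n_positive)

conn.close()
```

Replacing `train` with `validation` or `test` in both statements gives the same summary for the other splits.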
