DuoRC: (Q&A: Wikipedia And IMDB) by Kaggle | Media and Entertainment

English dataset of questions and answers from Wikipedia and IMDb movie plots

By Huggingface Hub

About this dataset

How to use the dataset

The DuoRC dataset is an English-language dataset of questions and answers written by crowdsourced Amazon Mechanical Turk (AMT) workers about Wikipedia and IMDb movie plots. The workers were free either to pick an answer from the plot or to synthesize their own. It contains two sub-datasets, SelfRC and ParaphraseRC. SelfRC is built solely on Wikipedia movie plots. In ParaphraseRC, the questions are written from Wikipedia movie plots and the answers are based on the corresponding IMDb movie plots.
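
As a minimal sketch of how the two sub-datasets and their splits map onto the CSV files listed under Columns, the snippet below builds the file names and reads one split with Python's standard `csv` module. The helper names `csv_name` and `load_split` and the `data_dir` argument are hypothetical and chosen for illustration; only the file-name pattern comes from this page.

```python
import csv
from pathlib import Path

# Sub-datasets and splits as named on this page.
SUBSETS = ("SelfRC", "ParaphraseRC")
SPLITS = ("train", "validation", "test")


def csv_name(subset, split):
    """Return the CSV file name for a sub-dataset and split, e.g. SelfRC_train.csv."""
    if subset not in SUBSETS:
        raise ValueError(f"unknown subset: {subset!r}")
    if split not in SPLITS:
        raise ValueError(f"unknown split: {split!r}")
    return f"{subset}_{split}.csv"


def load_split(data_dir, subset, split):
    """Read one split into a list of dicts keyed by column name.

    data_dir is a hypothetical local folder holding the downloaded CSV files.
    """
    path = Path(data_dir) / csv_name(subset, split)
    with path.open(newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))


# All six file names expected from the two sub-datasets and three splits.
ALL_FILES = [csv_name(subset, split) for subset in SUBSETS for split in SPLITS]
```

Assuming the files were downloaded into a local folder, `load_split("data", "ParaphraseRC", "validation")` would return the rows of ParaphraseRC_validation.csv. Every value comes back as a string, so the `answers` and `no_answer` columns still need converting.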

Research Ideas

This dataset can be used to train a model to answer questions about movie plots.

This dataset can be used to train a model to answer questions about Wikipedia movie plot summaries.

This dataset can be used to find paraphrases of questions about movie plots.

Acknowledgements

If you use this dataset in your research, please credit the original authors.

License

License: CC0 1.0 Universal (CC0 1.0) - Public Domain Dedication
No Copyright - You can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission. See Other Information.

Columns

File: SelfRC_train.csv

Column name	Description
plot	The plot of the movie. (String)
title	The title of the movie. (String)
question	The question about the plot. (String)
answers	The answers to the question. (List of strings)
no_answer	A binary flag that indicates whether the question has no answer. (Integer)

File: SelfRC_test.csv

Column name	Description
plot	The plot of the movie. (String)
title	The title of the movie. (String)
question	The question about the plot. (String)
answers	The answers to the question. (List of strings)
no_answer	A binary flag that indicates whether the question has no answer. (Integer)

File: ParaphraseRC_train.csv

Column name	Description
plot	The plot of the movie. (String)
title	The title of the movie. (String)
question	The question about the plot. (String)
answers	The answers to the question. (List of strings)
no_answer	A binary flag that indicates whether the question has no answer. (Integer)

File: SelfRC_validation.csv

Column name	Description
plot	The plot of the movie. (String)
title	The title of the movie. (String)
question	The question about the plot. (String)
answers	The answers to the question. (List of strings)
no_answer	A binary flag that indicates whether the question has no answer. (Integer)

File: ParaphraseRC_test.csv

Column name	Description
plot	The plot of the movie. (String)
title	The title of the movie. (String)
question	The question about the plot. (String)
answers	The answers to the question. (List of strings)
no_answer	A binary flag that indicates whether the question has no answer. (Integer)

File: ParaphraseRC_validation.csv

Column name	Description
plot	The plot of the movie. (String)
title	The title of the movie. (String)
question	The question about the plot. (String)
answers	The answers to the question. (List of strings)
no_answer	A binary flag that indicates whether the question has no answer. (Integer)
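
Because CSV cells are plain text, the `answers` list and the `no_answer` flag need converting after loading. The sketch below assumes that `answers` is serialized as a Python-style list literal with comma-separated, quoted items (for example `['A ship', 'The ship']`) and that `no_answer` is stored as `0`/`1` or `True`/`False`. Neither format is confirmed on this page, so check a few real rows first. The sample rows are invented for illustration.

```python
import ast
import csv
import io


def parse_answers(raw):
    """Convert a serialized answers cell into a list of strings.

    Assumes a Python-style list literal; falls back to a one-item list
    holding the raw text when the cell cannot be parsed.
    """
    raw = (raw or "").strip()
    if not raw:
        return []
    try:
        value = ast.literal_eval(raw)
    except (ValueError, SyntaxError):
        return [raw]
    if isinstance(value, (list, tuple)):
        return [str(item) for item in value]
    return [str(value)]


def parse_no_answer(raw):
    """Interpret the no_answer cell as a bool (assumed encodings: 1/0, True/False)."""
    return str(raw).strip().lower() in {"1", "true"}


# Invented sample rows in the documented five-column layout.
SAMPLE = io.StringIO(
    "plot,title,question,answers,no_answer\n"
    '"A ship sinks.","Example Movie","What sinks?","[\'A ship\', \'The ship\']",0\n'
    '"A ship sinks.","Example Movie","Who is the pilot?","[]",1\n'
)

rows = []
for row in csv.DictReader(SAMPLE):
    row["answers"] = parse_answers(row["answers"])
    row["no_answer"] = parse_no_answer(row["no_answer"])
    rows.append(row)
```

`ast.literal_eval` is used instead of `eval` so that only literals are evaluated. If the real files use a different serialization, such as items separated by spaces rather than commas, `parse_answers` would need adjusting.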

Acknowledgements

If you use this dataset in your research, please credit the original authors and Huggingface Hub.

Tables

Paraphraserc Test

@kaggle.thedevastator_duorc_a_dataset_of_movie_plots.paraphraserc_test

18.62 MB
15857 rows
7 columns


CREATE TABLE paraphraserc_test (
  "plot_id" VARCHAR,
  "plot" VARCHAR,
  "title" VARCHAR,
  "question_id" VARCHAR,
  "question" VARCHAR,
  "answers" VARCHAR,
  "no_answer" BOOLEAN
);

Paraphraserc Train

@kaggle.thedevastator_duorc_a_dataset_of_movie_plots.paraphraserc_train

80.15 MB
69524 rows
7 columns


CREATE TABLE paraphraserc_train (
  "plot_id" VARCHAR,
  "plot" VARCHAR,
  "title" VARCHAR,
  "question_id" VARCHAR,
  "question" VARCHAR,
  "answers" VARCHAR,
  "no_answer" BOOLEAN
);

Paraphraserc Validation

@kaggle.thedevastator_duorc_a_dataset_of_movie_plots.paraphraserc_validation

16.25 MB
15591 rows
7 columns


CREATE TABLE paraphraserc_validation (
  "plot_id" VARCHAR,
  "plot" VARCHAR,
  "title" VARCHAR,
  "question_id" VARCHAR,
  "question" VARCHAR,
  "answers" VARCHAR,
  "no_answer" BOOLEAN
);

Selfrc Test

@kaggle.thedevastator_duorc_a_dataset_of_movie_plots.selfrc_test

5.23 MB
12559 rows
7 columns


CREATE TABLE selfrc_test (
  "plot_id" VARCHAR,
  "plot" VARCHAR,
  "title" VARCHAR,
  "question_id" VARCHAR,
  "question" VARCHAR,
  "answers" VARCHAR,
  "no_answer" BOOLEAN
);

Selfrc Train

@kaggle.thedevastator_duorc_a_dataset_of_movie_plots.selfrc_train

28.64 MB
60721 rows
7 columns


CREATE TABLE selfrc_train (
  "plot_id" VARCHAR,
  "plot" VARCHAR,
  "title" VARCHAR,
  "question_id" VARCHAR,
  "question" VARCHAR,
  "answers" VARCHAR,
  "no_answer" BOOLEAN
);

Selfrc Validation

@kaggle.thedevastator_duorc_a_dataset_of_movie_plots.selfrc_validation

5.42 MB
12961 rows
7 columns


CREATE TABLE selfrc_validation (
  "plot_id" VARCHAR,
  "plot" VARCHAR,
  "title" VARCHAR,
  "question_id" VARCHAR,
  "question" VARCHAR,
  "answers" VARCHAR,
  "no_answer" BOOLEAN
);
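
The table definitions above can be explored with ordinary SQL. As a hedged sketch, the snippet below recreates the `selfrc_validation` schema in an in-memory SQLite database using Python's standard `sqlite3` module, inserts a few invented rows, and counts how many questions are flagged as having no answer. The `answer_stats` helper and the sample data are illustrative assumptions. The hosted tables may run on a different SQL engine.

```python
import sqlite3

# Table names taken from the listings above; used as an allowlist because
# SQL identifiers cannot be passed as query parameters.
TABLES = {
    "paraphraserc_test", "paraphraserc_train", "paraphraserc_validation",
    "selfrc_test", "selfrc_train", "selfrc_validation",
}

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE selfrc_validation (
      "plot_id" VARCHAR,
      "plot" VARCHAR,
      "title" VARCHAR,
      "question_id" VARCHAR,
      "question" VARCHAR,
      "answers" VARCHAR,
      "no_answer" BOOLEAN
    )
    """
)

# Invented rows for illustration only; not taken from the dataset.
sample_rows = [
    ("p1", "A ship sinks.", "Example Movie", "q1", "What sinks?", "['A ship']", False),
    ("p1", "A ship sinks.", "Example Movie", "q2", "Who is the pilot?", "[]", True),
    ("p2", "A dog runs.", "Other Movie", "q3", "What runs?", "['A dog']", False),
]
conn.executemany("INSERT INTO selfrc_validation VALUES (?, ?, ?, ?, ?, ?, ?)", sample_rows)


def answer_stats(connection, table):
    """Return (total questions, questions flagged no_answer) for one table."""
    if table not in TABLES:
        raise ValueError(f"unknown table: {table!r}")
    query = (
        'SELECT COUNT(*), COALESCE(SUM(CASE WHEN "no_answer" THEN 1 ELSE 0 END), 0) '
        f'FROM "{table}"'
    )
    return connection.execute(query).fetchone()


total, unanswerable = answer_stats(conn, "selfrc_validation")

# Questions per plot, useful for checking how many questions share one plot.
per_plot = conn.execute(
    'SELECT "plot_id", COUNT(*) FROM selfrc_validation GROUP BY "plot_id" ORDER BY "plot_id"'
).fetchall()
```

The same queries should carry over to the other five tables, since all six share the seven-column schema shown above.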