Baselight

DuoRC: (Q&A: Wikipedia And IMDB)

English dataset of questions and answers from Wikipedia and IMDb movie plots

@kaggle.thedevastator_duorc_a_dataset_of_movie_plots


About this Dataset

By Huggingface Hub [source]

How to use the dataset

The DuoRC dataset is an English-language dataset of questions and answers about Wikipedia and IMDb movie plots, collected from crowdsourced Amazon Mechanical Turk (AMT) workers. Workers were free either to pick answers from the plots or to synthesize their own. The dataset contains two sub-datasets, SelfRC and ParaphraseRC. SelfRC is built solely on Wikipedia movie plots. In ParaphraseRC, the questions were written from Wikipedia movie plots, while the answers are based on the corresponding IMDb movie plots.
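A minimal way to load both sub-datasets is sketched below with Python's standard `csv` module. It assumes the six CSV files, named `<Subset>_<split>.csv` as in the Columns section, have been downloaded into one local directory; the directory and the `load_duorc` helper are hypothetical, and every value comes back as a plain string.

```python
import csv
from pathlib import Path

# Plots can be long; raise csv's default 128 KiB per-field cap.
csv.field_size_limit(2**31 - 1)

SUBSETS = ("SelfRC", "ParaphraseRC")
SPLITS = ("train", "validation", "test")


def load_duorc(data_dir):
    """Read every DuoRC CSV found in data_dir into a dict keyed by (subset, split).

    Hypothetical helper: assumes files named like SelfRC_train.csv sit together
    in data_dir. Each value is a list of row dicts with string values.
    """
    data = {}
    for subset in SUBSETS:
        for split in SPLITS:
            path = Path(data_dir) / f"{subset}_{split}.csv"
            if not path.exists():
                continue  # skip files that were not downloaded
            with path.open(newline="", encoding="utf-8") as f:
                data[(subset, split)] = list(csv.DictReader(f))
    return data
```

For example, with the files in a hypothetical `duorc_csv` folder, `load_duorc("duorc_csv")[("ParaphraseRC", "train")]` would return the ParaphraseRC training rows as dictionaries keyed by column name.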

Research Ideas

  • This dataset can be used to train a model to answer questions about movie plots.
  • This dataset can be used to train a model to answer questions about Wikipedia articles.
  • This dataset can be used to find paraphrases of questions about movie plots.
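To evaluate a model trained for the first idea, its predictions can be scored against the `answers` list. The sketch below uses SQuAD-style normalization (lowercasing, stripping punctuation and the articles a/an/the) with exact match and token-level F1, keeping the best score over all gold answers. This is an assumed evaluation scheme, not the official DuoRC script, and the function names are hypothetical.

```python
import re
import string
from collections import Counter

PUNCTUATION = set(string.punctuation)


def normalize(text):
    """Lowercase, drop punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in PUNCTUATION)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def token_f1(prediction, gold):
    """Harmonic mean of token precision and recall after normalization."""
    pred_tokens = normalize(prediction).split()
    gold_tokens = normalize(gold).split()
    overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


def score(prediction, answers):
    """Best exact match and best F1 of one prediction over all gold answers."""
    if not answers:
        # Questions flagged no_answer need their own handling.
        raise ValueError("no gold answers to score against")
    em = max(float(normalize(prediction) == normalize(a)) for a in answers)
    f1 = max(token_f1(prediction, a) for a in answers)
    return em, f1
```

Taking the maximum over gold answers credits a prediction that matches any one of several acceptable phrasings.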

Acknowledgements

If you use this dataset in your research, please credit the original authors.

Data Source

License

License: CC0 1.0 Universal (CC0 1.0) - Public Domain Dedication
No Copyright - You can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission.

Columns

Files: SelfRC_train.csv, SelfRC_validation.csv, SelfRC_test.csv, ParaphraseRC_train.csv, ParaphraseRC_validation.csv, ParaphraseRC_test.csv

All six files share the same columns:

Column name   Description
plot          The plot of the movie. (String)
title         The title of the movie. (String)
question      A question about the plot. (String)
answers       The answers to the question. (List of strings)
no_answer     A binary flag that is 1 when the question has no answer. (Integer)
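Because `answers` is documented as a list of strings but stored as text (VARCHAR in the table schemas), it needs parsing after loading. The helpers below are a sketch that assumes the list was written as a Python-style literal, either `['a', 'b']` or the comma-less `['a' 'b']` form that NumPy arrays print as; each quoted item is decoded separately so adjacent strings are not merged together. They also assume `no_answer` is written as 0/1 or True/False. Both formats are assumptions, so check a few raw rows first; the helper names are hypothetical.

```python
import ast
import re

# One single- or double-quoted string literal, allowing backslash escapes.
_QUOTED = re.compile(r"'(?:[^'\\]|\\.)*'|\"(?:[^\"\\]|\\.)*\"")


def parse_answers(raw):
    """Turn the stored answers text into a list of strings (assumed format)."""
    raw = raw.strip()
    if not raw:
        return []
    if raw.startswith("[") and raw.endswith("]"):
        # Decode each quoted item on its own, so "['a', 'b']" and "['a' 'b']"
        # both yield two answers instead of one concatenated string.
        return [ast.literal_eval(m.group(0)) for m in _QUOTED.finditer(raw)]
    return [raw]  # not list-shaped: keep the text as a single answer


def parse_no_answer(raw):
    """Map a 0/1 or True/False flag to bool; True means unanswerable."""
    return raw.strip().lower() in {"1", "true"}
```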

If you use this dataset in your research, please credit Huggingface Hub.

Tables

ParaphraseRC Test

@kaggle.thedevastator_duorc_a_dataset_of_movie_plots.paraphraserc_test
  • 18.62 MB
  • 15857 rows
  • 7 columns

CREATE TABLE paraphraserc_test (
  "plot_id" VARCHAR,
  "plot" VARCHAR,
  "title" VARCHAR,
  "question_id" VARCHAR,
  "question" VARCHAR,
  "answers" VARCHAR,
  "no_answer" BOOLEAN
);
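The six tables in this section share this schema. To experiment with it locally, the statement can be loaded into SQLite through Python's `sqlite3` module, used here as a stand-in for whatever engine actually hosts the tables; SQLite accepts the VARCHAR and BOOLEAN type names, and Python booleans are stored as 0/1. The two queries (share of unanswerable questions, questions per plot) are illustrative, and the helper names are hypothetical.

```python
import sqlite3

# Same columns and types as the paraphraserc_test CREATE TABLE statement.
SCHEMA = """
CREATE TABLE paraphraserc_test (
  "plot_id" VARCHAR,
  "plot" VARCHAR,
  "title" VARCHAR,
  "question_id" VARCHAR,
  "question" VARCHAR,
  "answers" VARCHAR,
  "no_answer" BOOLEAN
);
"""


def unanswerable_share(conn, table="paraphraserc_test"):
    """Fraction of rows with no_answer set; None when the table is empty."""
    # Table names cannot be bound as parameters, so pass only trusted names.
    (share,) = conn.execute(f'SELECT AVG("no_answer") FROM "{table}"').fetchone()
    return share


def questions_per_plot(conn, table="paraphraserc_test"):
    """Map each plot_id to the number of questions asked about it."""
    rows = conn.execute(
        f'SELECT "plot_id", COUNT(*) FROM "{table}" GROUP BY "plot_id"'
    ).fetchall()
    return dict(rows)
```

In use, `conn = sqlite3.connect(":memory:")` followed by `conn.executescript(SCHEMA)` creates an empty table that rows parsed from the CSV files can be inserted into.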

ParaphraseRC Train

@kaggle.thedevastator_duorc_a_dataset_of_movie_plots.paraphraserc_train
  • 80.15 MB
  • 69524 rows
  • 7 columns

CREATE TABLE paraphraserc_train (
  "plot_id" VARCHAR,
  "plot" VARCHAR,
  "title" VARCHAR,
  "question_id" VARCHAR,
  "question" VARCHAR,
  "answers" VARCHAR,
  "no_answer" BOOLEAN
);

ParaphraseRC Validation

@kaggle.thedevastator_duorc_a_dataset_of_movie_plots.paraphraserc_validation
  • 16.25 MB
  • 15591 rows
  • 7 columns

CREATE TABLE paraphraserc_validation (
  "plot_id" VARCHAR,
  "plot" VARCHAR,
  "title" VARCHAR,
  "question_id" VARCHAR,
  "question" VARCHAR,
  "answers" VARCHAR,
  "no_answer" BOOLEAN
);

SelfRC Test

@kaggle.thedevastator_duorc_a_dataset_of_movie_plots.selfrc_test
  • 5.23 MB
  • 12559 rows
  • 7 columns

CREATE TABLE selfrc_test (
  "plot_id" VARCHAR,
  "plot" VARCHAR,
  "title" VARCHAR,
  "question_id" VARCHAR,
  "question" VARCHAR,
  "answers" VARCHAR,
  "no_answer" BOOLEAN
);

SelfRC Train

@kaggle.thedevastator_duorc_a_dataset_of_movie_plots.selfrc_train
  • 28.64 MB
  • 60721 rows
  • 7 columns

CREATE TABLE selfrc_train (
  "plot_id" VARCHAR,
  "plot" VARCHAR,
  "title" VARCHAR,
  "question_id" VARCHAR,
  "question" VARCHAR,
  "answers" VARCHAR,
  "no_answer" BOOLEAN
);

SelfRC Validation

@kaggle.thedevastator_duorc_a_dataset_of_movie_plots.selfrc_validation
  • 5.42 MB
  • 12961 rows
  • 7 columns

CREATE TABLE selfrc_validation (
  "plot_id" VARCHAR,
  "plot" VARCHAR,
  "title" VARCHAR,
  "question_id" VARCHAR,
  "question" VARCHAR,
  "answers" VARCHAR,
  "no_answer" BOOLEAN
);
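The row counts listed for each table give the relative split sizes. A short calculation, with the counts copied from those listings, shows that SelfRC has 86,241 rows in total and ParaphraseRC 100,972, with roughly 70% and 69% of each in the training split. The `split_shares` helper is hypothetical.

```python
# Row counts copied from the table listings in this section.
ROW_COUNTS = {
    "SelfRC": {"train": 60721, "validation": 12961, "test": 12559},
    "ParaphraseRC": {"train": 69524, "validation": 15591, "test": 15857},
}


def split_shares(counts):
    """Total rows per subset plus each split's fraction of that total."""
    summary = {}
    for subset, splits in counts.items():
        total = sum(splits.values())
        summary[subset] = {"total": total}
        for name, n in splits.items():
            summary[subset][name] = n / total
    return summary


for subset, stats in split_shares(ROW_COUNTS).items():
    shares = {k: round(v, 3) for k, v in stats.items() if k != "total"}
    print(subset, stats["total"], shares)
```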
