Movie Rationales (Rationales For Movie Reviews) by Kaggle | Media and Entertainment

About this Dataset

Human annotated rationales for movie reviews

By Huggingface Hub [source]

This dataset was created to help researchers study human-written movie reviews and the reasoning behind their sentiment. Using the train, test, and validation sets, researchers can explore aspects of each review such as its sentiment label and the rationales annotators marked as evidence for it. Analyzing these patterns can inform models that take the human perspective into account when interpreting movie reviews. Data scientists and researchers interested in AI applications may find the dataset useful for better understanding user intent in movie reviews.

How to use the dataset

This dataset is intended to enable researchers and developers to uncover the rationales behind movie reviews. To use it effectively, you must understand the data format and how each column in the dataset works.

What does each column mean?

review: The text of the movie review. (String)

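label: The sentiment label of the review (Positive, Negative, or Neutral), stored as an integer code. (Integer)

evidences: The human-annotated rationales (evidence passages) that support the label. (String)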

validation.csv: The validation set, containing reviews, labels, and evidence, used to validate models of the human perspective on movie reviews.

train.csv: The training set, containing reviews, labels, and evidence, used to train models on human annotations of movie reviews.

test.csv: The test set, containing reviews, labels, and evidence, used to evaluate models on unseen data.

How do I use this dataset?

To get started with this dataset, you need a working environment such as Python or R with access to the libraries needed for natural language processing (NLP). Once the environment is set up, follow these steps:

Import the CSV files into your workspace using the functions provided by your language's libraries, e.g., the pandas read_csv() function in Python.
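The import step can be sketched as follows. This is a minimal example, assuming the three CSV files sit together in one directory; the `load_splits` helper and the `data_dir` parameter are hypothetical names introduced for illustration, and the expected column names are taken from the table schemas listed for this dataset.

```python
from pathlib import Path

import pandas as pd

# Column names taken from the table schemas listed for this dataset.
EXPECTED_COLUMNS = ["review", "label", "evidences"]


def load_splits(data_dir="."):
    """Read train/validation/test CSVs from data_dir into a dict of DataFrames."""
    splits = {}
    for name in ("train", "validation", "test"):
        frame = pd.read_csv(Path(data_dir) / f"{name}.csv")
        # Fail early if a file does not have the columns we rely on later.
        missing = [col for col in EXPECTED_COLUMNS if col not in frame.columns]
        if missing:
            raise ValueError(f"{name}.csv is missing columns: {missing}")
        splits[name] = frame
    return splits
```

After loading, something like `load_splits("data")["train"]["label"].value_counts()` is a quick way to check which integer codes the label column actually holds before relying on any particular encoding.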

Preprocess the text in the 'review' column by standardizing it, for example by removing stopwords and converting words to lowercase, and check that the values in the 'label' column are consistent. Python offers several libraries for this kind of preprocessing.
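A minimal sketch of the lowercasing and stopword removal described above, using only the standard library. The `preprocess` function and the short `STOPWORDS` set are illustrative assumptions, not part of the dataset; a fuller stopword list from an NLP library would normally replace the one shown here.

```python
import re

# A tiny illustrative stopword list (an assumption); real projects usually
# take a fuller list from an NLP library.
STOPWORDS = {"a", "an", "and", "the", "is", "it", "of", "to", "in", "this", "that"}

# Words made of letters, optionally with one internal apostrophe ("don't").
TOKEN_PATTERN = re.compile(r"[a-z]+(?:'[a-z]+)?")


def preprocess(text, stopwords=STOPWORDS):
    """Lowercase the text, split it into word tokens, and drop stopwords."""
    tokens = TOKEN_PATTERN.findall(text.lower())
    return [tok for tok in tokens if tok not in stopwords]
```

Applied to a pandas column, this could be used as `frame["review"].map(preprocess)`, which yields one token list per review.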

Train and test ML algorithms using appropriate NLP feature extraction techniques. Bag of Words, TF-IDF, and Word2Vec are some examples; many more are available.
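To make the TF-IDF option concrete, here is a small pure-Python sketch. It weights each term by its count times a smoothed inverse document frequency, ln((1 + N) / (1 + df)) + 1, and L2-normalises the result. The nearest-centroid classifier on top is a deliberately simple stand-in chosen for brevity, not a method prescribed by the dataset, and all function names are hypothetical. In practice a library implementation would usually be preferred.

```python
import math
from collections import Counter


def fit_idf(tokenized_docs):
    """Compute smoothed inverse document frequencies: ln((1 + N) / (1 + df)) + 1."""
    n_docs = len(tokenized_docs)
    doc_freq = Counter()
    for tokens in tokenized_docs:
        doc_freq.update(set(tokens))  # count each term once per document
    return {term: math.log((1 + n_docs) / (1 + df)) + 1 for term, df in doc_freq.items()}


def tfidf_vector(tokens, idf):
    """Return an L2-normalised TF-IDF vector as a {term: weight} dict."""
    counts = Counter(tok for tok in tokens if tok in idf)  # ignore unseen terms
    weights = {term: count * idf[term] for term, count in counts.items()}
    norm = math.sqrt(sum(w * w for w in weights.values()))
    return {term: w / norm for term, w in weights.items()} if norm else {}


def fit_centroids(tokenized_docs, labels, idf):
    """Sum the TF-IDF vectors of each class into one centroid per label."""
    centroids = {}
    for tokens, label in zip(tokenized_docs, labels):
        centroids.setdefault(label, Counter()).update(tfidf_vector(tokens, idf))
    return centroids


def predict(tokens, centroids, idf):
    """Pick the label whose centroid has the highest cosine similarity."""
    vec = tfidf_vector(tokens, idf)

    def score(label):
        centroid = centroids[label]
        norm = math.sqrt(sum(w * w for w in centroid.values()))
        dot = sum(w * centroid.get(term, 0.0) for term, w in vec.items())
        return dot / norm if norm else 0.0

    return max(centroids, key=score)
```

The functions work with whatever integer codes the label column holds, so no particular label encoding is assumed.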

Measure performance on the provided validation and test sets after running your experiments. Precision-recall curves, along with common metrics such as F1 score and accuracy, let you analyze hyperparameter tuning and algorithm efficiency from the outputs you get while testing your ML algorithm.
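The metrics named above can be computed directly from true and predicted labels. The sketch below is a minimal illustration; the `binary_metrics` name and the `positive` parameter are assumptions made for this example, and the default of 1 for the positive class should be checked against the codes actually present in the label column.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Return accuracy, precision, recall and F1 for the chosen positive label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)

    # Guard each ratio against an empty denominator.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = correct / len(pairs) if pairs else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}
```

Computing these on the validation set while tuning, and on the test set only once at the end, keeps the test score an honest estimate of performance on unseen reviews.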

Recommendation systems are always fun! Build a simple machine learning recommendation system by collecting user visit logs post hand-writing new features might

Research Ideas

Developing an automated movie review summarizer based on user ratings, that can accurately capture the salient points of a review and summarize it for moviegoers.

Training a model to predict the sentiment of a review, by combining machine learning models with human-annotated rationales from this dataset.

Building an AI system that can detect linguistic markers of deception in reviews (e.g., 'fake news', thin reviews, etc.) and issue warnings about possibly fraudulent purchases or online reviews.

Acknowledgements

If you use this dataset in your research, please credit the original authors.
Data Source

License

License: CC0 1.0 Universal (CC0 1.0) - Public Domain Dedication
No Copyright - You can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission. See Other Information.

Columns

File: validation.csv

Column name	Description
review	Text from the movie review. (String)
label	Indicates whether a particular review’s sentiment can be classified as Positive (1), Negative (-1) or Neutral (0). (Integer)

File: train.csv

Column name	Description
review	Text from the movie review. (String)
label	Indicates whether a particular review’s sentiment can be classified as Positive (1), Negative (-1) or Neutral (0). (Integer)

File: test.csv

Column name	Description
review	Text from the movie review. (String)
label	Indicates whether a particular review’s sentiment can be classified as Positive (1), Negative (-1) or Neutral (0). (Integer)

Acknowledgements

If you use this dataset in your research, please credit Huggingface Hub.

Tables

Test

@kaggle.thedevastator_unlocking_the_human_perspective_on_movie_reviews.test

676.26 KB
199 rows
3 columns


CREATE TABLE test (
  "review" VARCHAR,
  "label" BIGINT,
  "evidences" VARCHAR
);

Train

@kaggle.thedevastator_unlocking_the_human_perspective_on_movie_reviews.train

4.12 MB
1600 rows
3 columns


CREATE TABLE train (
  "review" VARCHAR,
  "label" BIGINT,
  "evidences" VARCHAR
);

Validation

@kaggle.thedevastator_unlocking_the_human_perspective_on_movie_reviews.validation

512.45 KB
200 rows
3 columns


CREATE TABLE validation (
  "review" VARCHAR,
  "label" BIGINT,
  "evidences" VARCHAR
);
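The table definitions above can be exercised locally. The following is a rough sketch that uses Python's built-in sqlite3 module as a stand-in engine (an assumption; the platform that hosts these tables is not stated here) to create the train table and count rows per label. The `label_counts` helper is a hypothetical name.

```python
import sqlite3

# Schema copied from the table listing; SQLite accepts these type names.
SCHEMA = 'CREATE TABLE train ("review" VARCHAR, "label" BIGINT, "evidences" VARCHAR)'


def label_counts(rows):
    """Load (review, label, evidences) rows into an in-memory table and count labels."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(SCHEMA)
        conn.executemany("INSERT INTO train VALUES (?, ?, ?)", rows)
        cursor = conn.execute(
            'SELECT "label", COUNT(*) FROM train GROUP BY "label" ORDER BY "label"'
        )
        return dict(cursor.fetchall())
    finally:
        conn.close()
```

Comparing the total of these counts with the row counts listed for each table (1600 for train, 200 for validation, 199 for test) is a simple check that a local copy loaded completely.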