IMDB Movie Reviews (Binary Sentiment)
The classic sentiment analysis dataset
@kaggle.thedevastator_imdb_large_movie_review_dataset_binary_sentiment
This is a large dataset for binary sentiment classification, containing substantially more data than previous benchmark datasets. It provides 25,000 highly polar movie reviews for training and 25,000 for testing, plus additional unlabeled data. The data fields are consistent across all splits of the dataset.
To use this dataset, first download the IMDB Large Movie Review Dataset. You can then use it in its original form or derive your own splits from it. To derive the splits, create a new file called unsupervised.csv and copy the text column from train.csv into it, then split unsupervised.csv into two files: train_unsupervised.csv and test_unsupervised.csv. Because the download already includes a file named unsupervised.csv, write the new file to a separate directory so the original is not overwritten.
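The splitting steps described above can be sketched in Python with only the standard library. This is a minimal illustration, not a prescribed procedure: the function names `write_texts` and `build_unlabeled_splits`, the 50/50 split, and the fixed shuffle seed are assumptions introduced here, and the output goes to a separate directory so the shipped unsupervised.csv is left untouched.

```python
import csv
import random
from pathlib import Path


def write_texts(path, texts):
    """Write review texts to a single-column CSV with a 'text' header."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["text"])
        writer.writerows([t] for t in texts)


def build_unlabeled_splits(train_path, out_dir, test_fraction=0.5, seed=0):
    """Copy the text column of train.csv, then split it into two files.

    test_fraction and seed are illustrative defaults, not values taken
    from the dataset description.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    # Read only the text column; the label column is dropped.
    with open(train_path, newline="", encoding="utf-8") as f:
        texts = [row["text"] for row in csv.DictReader(f)]

    # Step 1: the label-free copy of the training reviews.
    write_texts(out_dir / "unsupervised.csv", texts)

    # Step 2: shuffle reproducibly and split into two files.
    shuffled = texts[:]
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    write_texts(out_dir / "test_unsupervised.csv", shuffled[:n_test])
    write_texts(out_dir / "train_unsupervised.csv", shuffled[n_test:])
    return len(shuffled) - n_test, n_test
```

A call such as `build_unlabeled_splits("train.csv", "derived")` would write the three files under a hypothetical `derived/` directory and return the number of rows in each split.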
With either the original dataset or your derived splits in hand, you can begin binary sentiment classification. Any machine learning algorithm capable of binary classification will work, such as logistic regression or a support vector machine. After training the model on the training set, evaluate its performance on the test set by predicting the labels of the reviews in test_unsupervised.csv. Measuring accuracy requires ground-truth labels, which test.csv provides in its label column.
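To make the train-then-evaluate loop concrete, here is a small pure-Python sketch of logistic regression over binary bag-of-words features, trained with stochastic gradient descent. Everything in it is an assumption for illustration: the tokenizer, the learning rate, the epoch count, and the function names are not part of the dataset, and in practice an established machine learning library would be the usual choice.

```python
import math
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and keep runs of letters and apostrophes."""
    return re.findall(r"[a-z']+", text.lower())


def train_logistic_regression(texts, labels, epochs=20, lr=0.1):
    """Fit one weight per word with SGD on the logistic loss.

    labels must be 0 (negative) or 1 (positive), as in train.csv.
    """
    weights = defaultdict(float)
    bias = 0.0
    for _ in range(epochs):
        for text, y in zip(texts, labels):
            tokens = set(tokenize(text))  # binary presence features
            z = bias + sum(weights[t] for t in tokens)
            z = max(-30.0, min(30.0, z))  # avoid overflow in exp
            p = 1.0 / (1.0 + math.exp(-z))  # predicted P(label = 1)
            grad = p - y  # gradient of the loss w.r.t. z
            bias -= lr * grad
            for t in tokens:
                weights[t] -= lr * grad
    return weights, bias


def predict(model, text):
    """Return 1 for positive sentiment, 0 for negative."""
    weights, bias = model
    z = bias + sum(weights.get(t, 0.0) for t in set(tokenize(text)))
    return 1 if z > 0 else 0


def accuracy(model, texts, labels):
    """Fraction of reviews whose predicted label matches the true one."""
    correct = sum(predict(model, t) == y for t, y in zip(texts, labels))
    return correct / len(labels)
```

Training would use the text and label columns of train.csv, and `accuracy` would then be called with the text and label columns of test.csv.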
- This dataset can be used to train a binary sentiment classification model.
- This dataset can be used to train a model to classify movie reviews into positive and negative sentiment categories.
- This dataset can be used to build a large movie review database for research purposes.
The dataset was originally posted on Huggingface Hub.
License
> License: CC0 1.0 Universal (CC0 1.0) - Public Domain Dedication
> No Copyright - You can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission. See Other Information.
File: train.csv
| Column name | Description |
|---|---|
| text | The text of the review. (String) |
| label | The label for the review, 0 for negative and 1 for positive. (Integer) |
File: test.csv
| Column name | Description |
|---|---|
| text | The text of the review. (String) |
| label | The label for the review, 0 for negative and 1 for positive. (Integer) |
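As a quick check of the two-column layout described for train.csv and test.csv, the following sketch loads a file with Python's standard `csv` module and rejects any label other than 0 or 1. The function name `load_reviews` is hypothetical, and UTF-8 encoding is an assumption about the files.

```python
import csv


def load_reviews(path):
    """Read (text, label) pairs from a CSV with 'text' and 'label' columns."""
    reviews = []
    with open(path, newline="", encoding="utf-8") as f:
        for record_no, row in enumerate(csv.DictReader(f), start=1):
            label = int(row["label"])
            # The labeled splits document only 0 (negative) and 1 (positive).
            if label not in (0, 1):
                raise ValueError(f"record {record_no}: unexpected label {label}")
            reviews.append((row["text"], label))
    return reviews
```

Opening the file with `newline=""` lets the `csv` module handle quoted review texts that contain line breaks.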
File: unsupervised.csv
| Column name | Description |
|---|---|
| text | The text of the review. (String) |
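| label | The label field for the review. This split holds the unlabeled data, so the field does not carry a 0 (negative) or 1 (positive) sentiment label as it does in the other splits. (Integer) |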
CREATE TABLE test (
    "text" VARCHAR,
    "label" BIGINT
);

CREATE TABLE train (
    "text" VARCHAR,
    "label" BIGINT
);

CREATE TABLE unsupervised (
    "text" VARCHAR,
    "label" BIGINT
);
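The table definitions above can be tried out locally. The sketch below uses Python's built-in `sqlite3` module as a stand-in engine, which is an assumption: the source does not say which database the schema targets. It creates the train table with the same column definitions and counts reviews per label. The function name `label_counts` and the in-memory database are illustrative choices.

```python
import sqlite3


def label_counts(rows):
    """Load (text, label) rows into a train table and count rows per label."""
    conn = sqlite3.connect(":memory:")  # throwaway in-memory database
    # Same column definitions as the train table in the schema.
    conn.execute('CREATE TABLE train ("text" VARCHAR, "label" BIGINT)')
    conn.executemany("INSERT INTO train VALUES (?, ?)", rows)
    cursor = conn.execute(
        'SELECT "label", COUNT(*) FROM train GROUP BY "label"'
    )
    counts = dict(cursor)  # each result row is a (label, count) pair
    conn.close()
    return counts
```

Run against the full training split, a query like this is a simple way to check how the positive and negative classes are balanced.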