Context
This is a small subset of a dataset of book reviews from the Amazon Kindle Store category.
Content
A 5-core dataset of product reviews from the Amazon Kindle Store category, covering May 1996 to July 2014. It contains a total of 982,619 entries. Each reviewer has at least 5 reviews and each product has at least 5 reviews in this dataset.
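To make the 5-core property concrete, here is a minimal sketch of how it could be checked on a list of review records. It assumes each record is a dict keyed by the `reviewerID` and `asin` fields from the column list; the function name `is_k_core` and the toy data are hypothetical and not part of the dataset files.

```python
from collections import Counter


def is_k_core(reviews, k=5):
    """Return True if every reviewer and every product has at least k reviews.

    `reviews` is assumed to be an iterable of dicts with "reviewerID" and
    "asin" keys (an assumption about how the rows were loaded).
    """
    reviews = list(reviews)
    reviewer_counts = Counter(r["reviewerID"] for r in reviews)
    product_counts = Counter(r["asin"] for r in reviews)
    return (
        all(count >= k for count in reviewer_counts.values())
        and all(count >= k for count in product_counts.values())
    )


# Toy data: 5 made-up reviewers each review the same 5 made-up products,
# so every reviewer and every product has exactly 5 reviews.
toy_reviews = [
    {"reviewerID": f"R{i}", "asin": f"P{j}"}
    for i in range(5)
    for j in range(5)
]
toy_is_core = is_k_core(toy_reviews, k=5)
toy_minus_one_is_core = is_k_core(toy_reviews[:-1], k=5)
```

Dropping a single review from the toy data breaks the property, because one reviewer and one product fall to 4 reviews.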
Columns
- asin - ID of the product, like B000FA64PK
- helpful - helpfulness rating of the review - example: 2/3.
- overall - rating of the product.
- reviewText - full text of the review (body).
- reviewTime - time of the review (raw).
- reviewerID - ID of the reviewer, like A3SPTOKDG7WBLN
- reviewerName - name of the reviewer.
- summary - summary of the review (heading).
- unixReviewTime - unix timestamp.
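Two of the columns above usually need a small conversion before analysis. The sketch below assumes `helpful` is stored as a string in the "2/3" form shown in the example (helpful votes over total votes) and that `unixReviewTime` is in seconds; the actual files may store these differently, and the helper names `parse_helpful` and `review_datetime` are hypothetical.

```python
from datetime import datetime, timezone


def parse_helpful(value):
    """Split a "helpful/total" string into (helpful, total, ratio).

    The ratio is None when there are no votes, to avoid dividing by zero.
    Assumes the "2/3" format shown in the column description.
    """
    helpful, total = (int(part) for part in value.split("/"))
    ratio = helpful / total if total else None
    return helpful, total, ratio


def review_datetime(unix_ts):
    """Convert a unix timestamp (seconds, assumed) to a UTC datetime."""
    return datetime.fromtimestamp(int(unix_ts), tz=timezone.utc)


votes = parse_helpful("2/3")          # 2 of 3 voters found the review helpful
no_votes = parse_helpful("0/0")       # a review nobody voted on
when = review_datetime(1404172800)    # a timestamp from July 2014
```

Keeping the raw vote counts alongside the ratio is useful because a ratio of 1.0 from a single vote carries less weight than the same ratio from fifty votes.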
Which file to use?
There are two files: one is preprocessed and ready for sentiment analysis, and the other is unprocessed, so you have to process the dataset yourself before performing sentiment analysis.
Acknowledgements
This dataset is taken from the Amazon product data published by Julian McAuley on the UCSD website: http://jmcauley.ucsd.edu/data/amazon/
The license to the data files belongs to them.
Inspiration
- Sentiment analysis on reviews.
- Understanding how people rate the usefulness of a review, and what factors influence a review's helpfulness.
- Fake reviews / outliers.
- Best rated product IDs, or similarity between products based on reviews alone (admittedly not the best idea).
- Any other interesting analysis.
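As a starting point for the sentiment analysis idea, one common approach is to derive labels from the `overall` rating. The sketch below is only an illustration: the thresholds (4 and above positive, 2 and below negative, otherwise neutral) and the function name `rating_to_sentiment` are assumptions, not something defined by the dataset.

```python
def rating_to_sentiment(overall):
    """Map a star rating to a coarse sentiment label.

    Thresholds are an assumed convention: >= 4 is positive,
    <= 2 is negative, anything in between is neutral.
    """
    rating = float(overall)
    if rating >= 4:
        return "positive"
    if rating <= 2:
        return "negative"
    return "neutral"


# Example: label a handful of made-up ratings.
labels = [rating_to_sentiment(r) for r in [5, 4, 3, 2, 1]]
```

Some analyses drop the neutral class and train a binary classifier on the remaining reviews; whether that suits this data is a choice left to the analyst.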