Context
The dataset was created for a data science bootcamp final project. The goal of the project was to build a model for sentiment analysis. The author collected the data using his own Python web scraping scripts.
Content
The data was downloaded from the eBay website.
Two files were uploaded:
- ebay_reviews.csv - the dataset consists of 4 columns: product category (e.g. headsets, cell phones, etc.), review title, review content, and rating. The rating is an integer from 1 to 5, where 1 is the worst score and 5 is the best score. The data is not cleaned and needs to be preprocessed before building models.
- ebay_reviews_cleaned.csv - the dataset preprocessed for machine learning algorithms.
It consists of two columns. The first is the rating column, which can take one of three values:
-1 - reviews with a rating of 1 or 2
0 - reviews with a rating of 3
1 - reviews with a rating of 4 or 5
The second column is the concatenation of the cleaned review title and review content. For more details, see the "text data cleaning using user-defined transformers" code, which I wrote for this dataset.
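The mapping from the 1-5 rating to the three sentiment labels, and the joining of title and content, can be sketched roughly as follows. This is only an illustration of the rules described above, not the author's cleaning code: the function names `rating_to_sentiment` and `combine_text` are hypothetical, and the actual text cleaning step (handled by the user-defined transformers) is not reproduced here.

```python
from typing import Optional


def rating_to_sentiment(rating: int) -> int:
    """Map a 1-5 rating to -1 (negative), 0 (neutral), or 1 (positive)."""
    if rating in (1, 2):
        return -1
    if rating == 3:
        return 0
    if rating in (4, 5):
        return 1
    raise ValueError(f"rating must be an integer from 1 to 5, got {rating!r}")


def combine_text(title: Optional[str], content: Optional[str]) -> str:
    """Join review title and content into one string, skipping missing parts.

    Assumption: the two fields are joined with a single space; the real
    dataset may have been built with a different separator.
    """
    parts = [p.strip() for p in (title, content) if p and p.strip()]
    return " ".join(parts)


# Apply the mapping to every possible rating value.
labels = [rating_to_sentiment(r) for r in (1, 2, 3, 4, 5)]
print(labels)  # [-1, -1, 0, 1, 1]
```

Raising an error for values outside 1-5 is a design choice for this sketch, so that unexpected ratings in the uncleaned file are noticed early instead of being silently mapped to a label.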
Let me know if you need the scripts for downloading eBay reviews. I will share them.