Baselight

Massive Product Text Classification Dataset

Large dataset for training models to classify product titles to categories

@kaggle.asaniczka_product_titles_text_classification

About this Dataset

Product title classification is an important task in e-commerce, as it helps to categorize and organize millions of products available online.

This dataset provides a large-scale collection of product titles from Amazon's US, Canadian, and UK marketplaces, each paired with its category.

With more than 5 million samples spanning over 700 categories, this dataset is well suited to training models that suggest the most likely category for a given product title.
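As a rough illustration of the title-to-category task, here is a minimal multinomial Naive Bayes baseline written with only the Python standard library. It is a sketch under stated assumptions, not the dataset author's method: the tokenizer, the class name `NaiveBayesTitleClassifier`, and the toy titles and labels are all hypothetical and are not drawn from the dataset. Because the model only accumulates token counts, the same approach can in principle be fed rows incrementally, which matters at a scale of millions of titles.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(title: str) -> list[str]:
    # Hypothetical, deliberately simple tokenizer: lowercase, keep alphanumeric runs.
    return re.findall(r"[a-z0-9]+", title.lower())


class NaiveBayesTitleClassifier:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, titles, labels):
        self.class_counts = Counter(labels)        # label -> number of titles
        self.token_counts = defaultdict(Counter)   # label -> token frequencies
        self.vocab = set()
        for title, label in zip(titles, labels):
            tokens = tokenize(title)
            self.token_counts[label].update(tokens)
            self.vocab.update(tokens)
        self.total_tokens = {c: sum(cnt.values()) for c, cnt in self.token_counts.items()}
        self.n_docs = len(labels)
        return self

    def log_scores(self, title):
        tokens = tokenize(title)
        v = len(self.vocab)
        scores = {}
        for label, n in self.class_counts.items():
            score = math.log(n / self.n_docs)  # log prior P(label)
            denom = self.total_tokens[label] + v
            for tok in tokens:
                if tok in self.vocab:  # tokens never seen in training are skipped
                    score += math.log((self.token_counts[label][tok] + 1) / denom)
            scores[label] = score
        return scores

    def predict(self, title):
        scores = self.log_scores(title)
        return max(scores, key=scores.get)


# Toy, made-up examples for illustration only (not rows from the dataset).
titles = [
    "Stainless steel chef knife 8 inch",
    "Nonstick frying pan with lid",
    "Wireless bluetooth earbuds with charging case",
    "USB-C fast charging cable 6ft",
]
labels = ["Kitchen", "Kitchen", "Electronics", "Electronics"]

clf = NaiveBayesTitleClassifier().fit(titles, labels)
print(clf.predict("bluetooth speaker with usb charging"))  # Electronics
print(clf.predict("nonstick pan"))                         # Kitchen
```

A bag-of-words model like this is a common first baseline to compare against; stronger models (for example, fine-tuned transformers) would typically be evaluated against it.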


Interesting Task Ideas:

  1. Train a text classification model to automatically categorize products based on their titles.
  2. Explore the distribution of categories and identify the most frequent and rare ones.
  3. Evaluate and compare different machine learning algorithms and deep learning architectures for product title classification.
  4. Implement transfer learning techniques to improve classification performance when labeled data is limited.
  5. Pretrain language models on this dataset for downstream tasks like product recommendation, search ranking, and sentiment analysis.
  6. Apply clustering techniques to uncover relationships between categories based on the similarity of their product titles.
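Task 2 (exploring the category distribution) can start with a plain frequency count. The sketch below assumes the data is a CSV with a label column; the column name `category` and the helper names `category_counts` and `frequent_and_rare` are hypothetical, so check the actual file header before use. The inline sample is made up for illustration. `csv.DictReader` yields rows lazily, so the same count can be run over a large file opened with `open(path, newline="", encoding="utf-8")` without loading it all into memory.

```python
import csv
import io
from collections import Counter


def category_counts(rows, label_key="category"):
    # label_key is an assumed column name; adjust it to match the real CSV header.
    return Counter(row[label_key] for row in rows)


def frequent_and_rare(counts, k=3):
    # most_common() sorts by descending count; slicing from the end
    # with step -1 returns the k least common categories, rarest first.
    most = counts.most_common(k)
    rare = counts.most_common()[:-k - 1:-1]
    return most, rare


# Made-up sample standing in for the real file (hypothetical columns).
sample = """title,category
Chef knife,Kitchen
Frying pan,Kitchen
Earbuds,Electronics
Cutting board,Kitchen
USB cable,Electronics
Yoga mat,Sports
"""

counts = category_counts(csv.DictReader(io.StringIO(sample)))
most, rare = frequent_and_rare(counts, k=2)
print(most)  # [('Kitchen', 3), ('Electronics', 2)]
print(rare)  # [('Sports', 1), ('Electronics', 2)]
```

With 700+ categories, the ratio between the most and least frequent counts is a quick signal of class imbalance, which can inform choices such as stratified splits or class weighting.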

Photo by Tim Mossholder on Unsplash
