Context
The dataset is from the Amazon ML Challenge 2021.
The Amazon catalog consists of billions of products that belong to thousands of browse nodes (each browse node represents a collection of items for sale). Browse nodes help customers navigate through our website and classify products into product type groups. Hence, it is important to predict the node assignment when a product is listed or when its browse node information is absent.
Content
This dataset is part of the hackathon **"Amazon ML Challenge"**, which was held on July 30, 2021. It contains:
Key column – PRODUCT_ID
Input features – TITLE, DESCRIPTION, BULLET_POINTS, BRAND
Target column – BROWSE_NODE_ID
Train dataset size – 2,903,024
Number of classes in Train – 9,919
Overall Test dataset size – 110,775
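The columns listed above suggest one simple way to prepare model input: join the four text features into a single string per product. The following is a minimal sketch, not the organizers' method. It assumes the data is available as CSV with exactly these column names, and the sample rows are made up for illustration (they are not taken from the real dataset). The helper names `build_text` and `load_examples` are hypothetical.

```python
import csv
import io

# Made-up sample rows for illustration only; NOT taken from the real dataset.
SAMPLE_CSV = """PRODUCT_ID,TITLE,DESCRIPTION,BULLET_POINTS,BRAND,BROWSE_NODE_ID
1,Steel water bottle,Keeps drinks cold,"[Leak proof, 1 litre]",Acme,101
2,Cotton bedsheet,,,HomeCo,202
"""

# Input features named in the dataset description.
TEXT_COLUMNS = ["TITLE", "DESCRIPTION", "BULLET_POINTS", "BRAND"]


def build_text(row):
    """Join the non-empty text features of one row into a single string."""
    parts = [(row.get(col) or "").strip() for col in TEXT_COLUMNS]
    return " ".join(part for part in parts if part)


def load_examples(handle):
    """Return (product_id, text, label) tuples; label is None when absent."""
    examples = []
    for row in csv.DictReader(handle):
        label = row.get("BROWSE_NODE_ID")
        examples.append(
            (row["PRODUCT_ID"], build_text(row), int(label) if label else None)
        )
    return examples


examples = load_examples(io.StringIO(SAMPLE_CSV))
for product_id, text, label in examples:
    print(product_id, label, text)
```

Missing fields are skipped rather than replaced with a placeholder, so a product with only a title and brand still yields a usable string. To read a real file, an open file handle could be passed to `load_examples` in place of the in-memory sample.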
Acknowledgements
All credit for the dataset goes to Amazon and HackerEarth (the platform on which the hackathon was hosted).
Inspiration
The contest used accuracy as the evaluation metric to measure submission quality. Since this is a multiclass classification problem, we are interested in subset accuracy: the set of labels predicted for a sample must exactly match the corresponding set of ground-truth labels. With a single label per sample, this reduces to the fraction of products whose predicted BROWSE_NODE_ID equals the true one.
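As a rough illustration of the metric described above, here is a minimal sketch of accuracy for single-label predictions. The function name `accuracy` and the browse node IDs are hypothetical, chosen only for the example. This is not the contest's official scoring code.

```python
def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label equals the true label."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("cannot compute accuracy on an empty input")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


# Hypothetical browse node IDs, for illustration only.
y_true = [101, 202, 303, 101]
y_pred = [101, 202, 101, 101]

score = accuracy(y_true, y_pred)  # 3 of the 4 predictions match
print(score)
```

Because each product has exactly one label, a prediction is either fully right or fully wrong. There is no partial credit for predicting a nearby or parent browse node.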