Amazon Product Browse Node Classification Data
Browse nodes help customers navigate the website and classify products into product types
@kaggle.subhamjain_amazon_product_browse_node_classification_data
Dataset is from Amazon ML Challenge 2021
The Amazon catalog consists of billions of products that belong to thousands of browse nodes (each browse node represents a collection of items for sale). Browse nodes are used to help customers navigate through our website and to classify products into product type groups. Hence, it is important to predict the node assignment when a product is listed or when its browse node information is absent.
This dataset is part of the hackathon "Amazon ML Challenge", which was held on July 30, 2021. It contains:
Key column – PRODUCT_ID
Input features – TITLE, DESCRIPTION, BULLET_POINTS, BRAND
Target column – BROWSE_NODE_ID
Train dataset size – 2,903,024
Number of classes in Train – 9,919
Overall Test dataset size – 110,775
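As a rough illustration of how the four input features might be fed to a text classifier, the sketch below joins them into one string per product. The function name `build_model_input`, the separator, and the example row are hypothetical and not part of the dataset or the contest; missing values are assumed to arrive as `None` or empty strings.

```python
from typing import Mapping, Optional

# Text feature columns, named as in the table schema (lowercase).
TEXT_FIELDS = ("title", "description", "bullet_points", "brand")


def build_model_input(row: Mapping[str, Optional[str]], sep: str = " | ") -> str:
    """Join the non-empty text features of one product into a single string.

    Assumption: missing values are None or empty/whitespace-only strings.
    """
    parts = []
    for field in TEXT_FIELDS:
        value = row.get(field)
        if value is None:
            continue  # skip absent features
        text = str(value).strip()
        if text:
            parts.append(text)
    return sep.join(parts)


# Hypothetical product row, for illustration only.
example = {
    "title": "Steel Water Bottle",
    "description": None,
    "bullet_points": " 1 litre ",
    "brand": "Acme",
}
print(build_model_input(example))  # Steel Water Bottle | 1 litre | Acme
```

Keeping the features in a fixed order with an explicit separator is one simple choice; a real pipeline might instead weight or encode each field separately.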
Please do UPVOTE it if you find it useful 😊
Currently it has 5k+ views and 300+ downloads. Help it reach out to more users!!
All the credit for the dataset goes to Amazon and HackerEarth (the platform on which the Hackathon was hosted)
The contest used accuracy as the evaluation metric to measure submission quality. Since this is a multiclass classification problem, we are interested in subset accuracy: the set of labels predicted for a sample must exactly match the corresponding set of ground truth labels.
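With exactly one browse node per product, subset accuracy reduces to the fraction of products whose predicted node equals the true node. A minimal sketch of that computation follows; the function name `accuracy` and the sample labels are illustrative and this is not the contest's official scoring code.

```python
from typing import Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of samples whose predicted label equals the true label."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("at least one sample is required")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


# Illustrative browse node IDs: three of four predictions match.
print(accuracy([101, 202, 303, 404], [101, 202, 999, 404]))  # 0.75
```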
CREATE TABLE sample_submission (
"product_id" BIGINT,
"browse_node_id" BIGINT
);
CREATE TABLE test (
"product_id" VARCHAR,
"title" VARCHAR,
"description" VARCHAR,
"bullet_points" VARCHAR,
"brand" VARCHAR
);
CREATE TABLE train (
"title" VARCHAR,
"description" VARCHAR,
"bullet_points" VARCHAR,
"brand" VARCHAR,
"browse_node_id" VARCHAR
);
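The `sample_submission` table above suggests a two-column file of integer IDs. The sketch below writes predictions in that shape using Python's standard `csv` module. The helper name `write_submission` is hypothetical, and the lowercase header names are taken from the schema shown here; the original contest files may have used different casing.

```python
import csv
import io
from typing import Iterable, TextIO, Tuple


def write_submission(rows: Iterable[Tuple[int, int]], out: TextIO) -> int:
    """Write (product_id, browse_node_id) pairs as CSV and return the row count.

    Assumption: header names mirror the sample_submission schema above.
    """
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["product_id", "browse_node_id"])
    count = 0
    for product_id, browse_node_id in rows:
        # Both columns are BIGINT in the schema, so coerce to int.
        writer.writerow([int(product_id), int(browse_node_id)])
        count += 1
    return count


# Illustrative predictions written to an in-memory buffer.
buffer = io.StringIO()
write_submission([(1, 1045), (2, 77)], buffer)
print(buffer.getvalue())
```

To write to disk instead, pass a file opened with `open(path, "w", newline="")` in place of the buffer.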