PriceRunner Product Classification & Clustering
35311 product offers from 10 categories, provided by 306 different merchants
@kaggle.joebeachcapital_pricerunner_product_classification_and_b4e256ef
This dataset was collected from PriceRunner, a popular product comparison platform. It includes 35311 product offers from 10 categories, provided by 306 different merchants. It is well suited to evaluating classification, clustering, and entity matching algorithms. Although the data is product-related, it can also be applied to other problems involving text or short-text mining.
For what purpose was the dataset created?
Product classification, clustering and entity matching.
Short-text clustering algorithms.
Who funded the creation of the dataset?
No funding
What do the instances in this dataset represent?
Product offers from various merchants.
Are there recommended data splits?
No.
Does the dataset contain data that might be considered sensitive in any way?
No.
Was there any data preprocessing performed?
Case folding and punctuation removal were applied to the product titles (column 2).
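The preprocessing described above can be approximated as follows. This is only a minimal sketch: the dataset description does not state the exact rules that were used, so the function name `normalize_title`, the choice of ASCII punctuation, and the replacement of punctuation with spaces (rather than outright deletion) are all assumptions made for illustration.

```python
import string

# Assumption: "punctuation" means the ASCII punctuation characters.
# Each one is mapped to a space so that "64GB/Silver" does not fuse into one token.
_PUNCT_TO_SPACE = str.maketrans(string.punctuation, " " * len(string.punctuation))


def normalize_title(title: str) -> str:
    """Case-fold a title, replace punctuation with spaces, collapse whitespace."""
    folded = title.casefold()
    without_punct = folded.translate(_PUNCT_TO_SPACE)
    # split() with no arguments drops leading/trailing and repeated whitespace.
    return " ".join(without_punct.split())


# Hypothetical raw title, not taken from the dataset.
print(normalize_title("Apple iPhone 8 Plus, 64GB (Silver)!"))
```

Applying the same normalization to any new titles before comparing them with the dataset's titles should keep the two sides consistent, provided the assumptions above are close to what was actually done.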
CREATE TABLE pricerunner_aggregate (
"product_id" BIGINT,
"product_title" VARCHAR,
"n__merchant_id" BIGINT, -- Merchant ID
"n__cluster_id" BIGINT, -- Cluster ID
"n__cluster_label" VARCHAR, -- Cluster Label
"n__category_id" BIGINT, -- Category ID
"n__category_label" VARCHAR -- Category Label
);
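To illustrate how the schema might be queried, the sketch below loads the table definition into an in-memory SQLite database and counts offers and distinct clusters per category. The rows are made up purely for illustration and are not taken from the dataset; the helper name `offers_per_category` is likewise hypothetical, and SQLite is an assumption chosen only because it ships with Python.

```python
import sqlite3

SCHEMA = """
CREATE TABLE pricerunner_aggregate (
    "product_id" BIGINT,
    "product_title" VARCHAR,
    "n__merchant_id" BIGINT,      -- Merchant ID
    "n__cluster_id" BIGINT,       -- Cluster ID
    "n__cluster_label" VARCHAR,   -- Cluster Label
    "n__category_id" BIGINT,      -- Category ID
    "n__category_label" VARCHAR   -- Category Label
)
"""

# Made-up rows for illustration only; IDs and labels are NOT from the dataset.
SAMPLE_ROWS = [
    (1, "acme phone x 64gb black", 10, 100, "Acme Phone X 64GB", 1, "Phones"),
    (2, "acme phone x black 64 gb", 11, 100, "Acme Phone X 64GB", 1, "Phones"),
    (3, "foo tv 55 inch 4k", 10, 200, "Foo TV 55", 2, "TVs"),
]


def offers_per_category(conn: sqlite3.Connection) -> list:
    """Return (category label, offer count, distinct cluster count) per category."""
    query = """
        SELECT "n__category_label",
               COUNT(*) AS offers,
               COUNT(DISTINCT "n__cluster_id") AS clusters
        FROM pricerunner_aggregate
        GROUP BY "n__category_label"
        ORDER BY offers DESC, "n__category_label"
    """
    return conn.execute(query).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
conn.executemany(
    "INSERT INTO pricerunner_aggregate VALUES (?, ?, ?, ?, ?, ?, ?)", SAMPLE_ROWS
)

for label, offers, clusters in offers_per_category(conn):
    print(label, offers, clusters)
```

Offers that share an `n__cluster_id` refer to the same product, so the distinct cluster count gives the number of unique products in each category, while the offer count includes every merchant listing.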