Baselight

PriceRunner Product Classification & Clustering

35311 product offers from 10 categories, provided by 306 different merchants

@kaggle.joebeachcapital_pricerunner_product_classification_and_b4e256ef

About this Dataset

This dataset was collected from PriceRunner, a popular product comparison platform. It includes 35311 product offers from 10 categories, provided by 306 different merchants. It is well suited to evaluating classification, clustering, and entity matching algorithms. Although the data is product-related, it can also be used for other problems involving text or short-text mining.

For what purpose was the dataset created?

Product classification, clustering and entity matching.
Short-text clustering algorithms.

Who funded the creation of the dataset?

No funding

What do the instances in this dataset represent?

Product offers from various merchants.

Are there recommended data splits?

No.

Does the dataset contain data that might be considered sensitive in any way?

No.

Was there any data preprocessing performed?

Case folding and punctuation removal were applied to the titles in column 2.
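
For readers who want to apply comparable normalization to their own titles before matching them against this dataset, here is a minimal Python sketch of case folding followed by punctuation removal. The dataset description does not say which tools or exact rules were used, so everything here is an assumption: the function name normalize_title is hypothetical, punctuation means ASCII punctuation only, punctuation is deleted rather than replaced by a space, and the whitespace collapsing at the end is an extra step not mentioned in the source.

```python
import string

# Translation table that deletes every ASCII punctuation character.
# Assumption: the source does not specify which characters were removed.
_PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def normalize_title(title: str) -> str:
    """Case-fold a title and strip ASCII punctuation (illustrative only)."""
    folded = title.casefold()
    stripped = folded.translate(_PUNCT_TABLE)
    # Extra step (not stated in the source): collapse whitespace runs
    # left behind after punctuation is removed.
    return " ".join(stripped.split())


print(normalize_title("Apple iPhone 8 Plus, 64GB (Silver)"))
# prints: apple iphone 8 plus 64gb silver
```

Deleting punctuation outright merges hyphenated tokens (for example, "Wi-Fi" becomes "wifi"). Replacing punctuation with a space would keep them as separate tokens instead. Which behavior matches the published titles should be checked against the data itself.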
