This dataset was collected from PriceRunner, a popular product comparison platform. It includes 35,311 product offers from 10 categories, provided by 306 different merchants. It is well suited to evaluating classification, clustering, and entity matching algorithms. Although the data is product-related, it can also be applied to other problems involving text or short-text mining.
For what purpose was the dataset created?
Evaluating product classification, clustering, and entity matching algorithms.
Evaluating short-text clustering algorithms.
Who funded the creation of the dataset?
No funding
What do the instances in this dataset represent?
Product offers from various merchants.
Are there recommended data splits?
No
Does the dataset contain data that might be considered sensitive in any way?
No
Was there any data preprocessing performed?
Case folding and punctuation removal were applied to the titles in column 2.
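The preprocessing described above could look roughly like the following Python sketch. The exact rules used by the dataset authors are not documented here, so this is only an assumption: it treats "punctuation" as the ASCII punctuation set, deletes those characters rather than replacing them with spaces, and collapses any leftover whitespace. The function name `normalize_title` and the sample title are hypothetical.

```python
import string

# Translation table that deletes every ASCII punctuation character.
# Assumption: the dataset's notion of punctuation may differ from this set.
_PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def normalize_title(title: str) -> str:
    """Case-fold a title and strip ASCII punctuation (assumed rules)."""
    folded = title.casefold()
    stripped = folded.translate(_PUNCT_TABLE)
    # Collapse whitespace runs left behind by removed punctuation.
    return " ".join(stripped.split())


# Hypothetical title, not taken from the dataset.
print(normalize_title("Apple iPhone 8 Plus, 64GB (Silver)"))
# prints: apple iphone 8 plus 64gb silver
```

Applying the same normalization to any new titles before comparing them with the dataset's titles should keep the two in a consistent form, provided the assumed rules match the ones actually used.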