Product Classification And Clustering
3 datasets for evaluating product classification and categorization algorithms
@kaggle.lakritidis_product_classification_and_categorization
This repository includes 3 datasets for evaluating classification and categorization algorithms. All datasets contain e-commerce data: product IDs, product titles, and the category each product belongs to. They can also be applied to other problems that involve text or short-text mining.
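As a rough illustration of the record layout described above, the following sketch parses a few product records from CSV and XML text into one common structure and counts products per category. The column names (`id`, `title`, `category`), the XML tag and attribute names, and the sample rows are all assumptions made for this example; check the actual files for their real headers and schema before adapting it.

```python
import csv
import io
import xml.etree.ElementTree as ET
from collections import Counter
from typing import NamedTuple


class Product(NamedTuple):
    product_id: str
    title: str
    category: str


# Hypothetical samples: the real files may use different column and tag names.
SAMPLE_CSV = """id,title,category
1,Acme X200 15.6 inch laptop 8GB,Laptops
2,Acme Z5 smartphone 64GB black,Mobile Phones
3,Bolt 14 inch ultrabook 16GB,Laptops
"""

SAMPLE_XML = """<products>
  <product id="1"><title>Acme X200 15.6 inch laptop 8GB</title><category>Laptops</category></product>
  <product id="2"><title>Acme Z5 smartphone 64GB black</title><category>Mobile Phones</category></product>
</products>"""


def read_csv(text):
    """Parse CSV text with an assumed id,title,category header."""
    reader = csv.DictReader(io.StringIO(text))
    return [Product(row["id"], row["title"], row["category"]) for row in reader]


def read_xml(text):
    """Parse XML text with an assumed <product id=...> element layout."""
    root = ET.fromstring(text)
    return [
        Product(node.get("id"), node.findtext("title"), node.findtext("category"))
        for node in root.iter("product")
    ]


csv_products = read_csv(SAMPLE_CSV)
xml_products = read_xml(SAMPLE_XML)

# Category frequencies are a useful first check for class imbalance.
category_counts = Counter(p.category for p in csv_products)
print(category_counts.most_common())
```

To read a real file instead of the in-memory samples, pass the file contents to the same functions after adjusting the assumed field names.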
The data originates from 3 real online electronics stores and product comparison platforms. It was collected by a focused Web crawler developed for this purpose. Each of the 3 datasets is provided in two versions, CSV and XML, and researchers may use whichever they prefer. The following list contains some additional useful information:
The first dataset was used in [1] to evaluate the proposed classifier. Anyone who uses this dataset in their research is kindly asked to cite [1] in the published article. The other two datasets were employed in [2] for entity matching and clustering tasks, and they can also be used in classification/categorization problems. Similarly, anyone who uses either of these 2 datasets is kindly asked to cite [2] in the published article.
The datasets are licensed under the GNU General Public License (GPL 2.0) and can be used by anybody.
Nevertheless, when they are used for research purposes, researchers are kindly requested to include the following articles in the reference list of their published papers:
[1] L. Akritidis, A. Fevgas, P. Bozanis, C. Makris, "A Self-Verifying Clustering Approach to Unsupervised Matching of Product Titles", Artificial Intelligence Review (Springer), pp. 1-44, 2020.
[2] L. Akritidis, P. Bozanis, "Effective Unsupervised Matching of Product Titles with k-Combinations and Permutations", In Proceedings of the 14th IEEE International Conference on Innovations in Intelligent Systems and Applications (INISTA), pp. 1-10, 2018.
[3] L. Akritidis, A. Fevgas, P. Bozanis, "Effective Product Categorization with Importance Scores and Morphological Analysis of the Titles", In Proceedings of the 30th IEEE International Conference on Tools with Artificial Intelligence (ICTAI), pp. 213-220, 2018.