Baselight

Benchmarks Datasets For Cluster Analysis

25 simulated datasets generated by either Gaussian or Uniform distributions

@kaggle.onthada_benchmarks_datasets_for_clustering

About this Dataset

Benchmarks Datasets For Cluster Analysis

25 Artificial Datasets

The datasets are generated using either Gaussian or Uniform distributions. Each dataset contains several known sub-groups intended for testing centroid-based clustering results and cluster validity indices.

Cluster analysis is a popular machine learning used for segmenting datasets with similar data points in the same group. For those who are familiar with R, there is a new R package called "UniversalCVI" https://CRAN.R-project.org/package=UniversalCVI used for cluster evaluation. This package provides algorithms for checking the accuracy of a clustering result with known classes, computing cluster validity indices, and generating plots for comparing them. The package is compatible with K-means, fuzzy C means, EM clustering, and hierarchical clustering (single, average, and complete linkage). To use the "UniversalCVI" package, one can follow the instructions provided in the R documentation.

For more in-depth details of the package and cluster evaluation, please see the papers
https://doi.org/10.1016/j.patcog.2023.109910 and https://arxiv.org/abs/2308.14785

All the datasets are also available on GitHub at

https://github.com/O-PREEDASAWAKUL/FuzzyDatasets.git .

Share link

Anyone who has the link will be able to view this.