Benchmarks Datasets For Cluster Analysis
25 simulated datasets generated by either Gaussian or Uniform distributions
@kaggle.onthada_benchmarks_datasets_for_clustering
The datasets are generated using either Gaussian or Uniform distributions. Each dataset contains several known sub-groups intended for testing centroid-based clustering results and cluster validity indices.
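As a rough illustration of what such a dataset looks like, the sketch below simulates labeled 2-D points around a few centers, drawing each group from either a Gaussian or a uniform distribution. The function name `make_dataset`, the centers, the spread, the group size, and the convention that labels start at 1 are all assumptions made for this example; they are not the parameters used to build the 25 datasets here.

```python
import random

def make_dataset(centers, distribution="gaussian", spread=0.5, n_per_group=100, seed=0):
    """Return a list of (x, y, label) rows, one sub-group per center.

    Hypothetical helper for illustration only; labels start at 1 by assumption.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    rows = []
    for label, (cx, cy) in enumerate(centers, start=1):
        for _ in range(n_per_group):
            if distribution == "gaussian":
                # Isotropic Gaussian around the center, standard deviation = spread.
                x, y = rng.gauss(cx, spread), rng.gauss(cy, spread)
            elif distribution == "uniform":
                # Uniform over a square of half-width = spread around the center.
                x = rng.uniform(cx - spread, cx + spread)
                y = rng.uniform(cy - spread, cy + spread)
            else:
                raise ValueError("distribution must be 'gaussian' or 'uniform'")
            rows.append((x, y, label))
    return rows

# Illustrative centers, not taken from the published datasets.
centers = [(0.0, 0.0), (4.0, 0.0), (2.0, 3.0)]
gaussian_rows = make_dataset(centers, "gaussian")
uniform_rows = make_dataset(centers, "uniform")
print(len(gaussian_rows), len(uniform_rows))
```

Each row has the same shape as the tables in this dataset: two coordinates and the known sub-group label.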
Cluster analysis is a popular machine learning technique for segmenting a dataset so that similar data points fall in the same group. For those who are familiar with R, there is a new R package called "UniversalCVI" (https://CRAN.R-project.org/package=UniversalCVI) for cluster evaluation. The package provides algorithms for checking the accuracy of a clustering result against known classes, computing cluster validity indices, and generating plots for comparing them. It is compatible with K-means, fuzzy C-means, EM clustering, and hierarchical clustering (single, average, and complete linkage). To use "UniversalCVI", follow the instructions provided in its R documentation.
For more in-depth details of the package and cluster evaluation, please see the papers
https://doi.org/10.1016/j.patcog.2023.109910 and https://arxiv.org/abs/2308.14785
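Checking a clustering result against known classes is complicated by the fact that cluster ids are arbitrary: cluster 0 may correspond to class 3. One common way to handle this, sketched below in Python, is to try every one-to-one matching between cluster ids and class labels and keep the one with the most agreements. This is a generic illustration and not necessarily how "UniversalCVI" computes accuracy; the function name `clustering_accuracy` and the toy labels are made up for the example, and the brute-force search is only practical for a small number of groups.

```python
from itertools import permutations

def clustering_accuracy(true_labels, cluster_ids):
    """Fraction of points that agree under the best one-to-one cluster/class matching."""
    if len(true_labels) != len(cluster_ids) or not true_labels:
        raise ValueError("inputs must be non-empty and of equal length")
    classes = sorted(set(true_labels))
    clusters = sorted(set(cluster_ids))
    # Count how often each (cluster id, class label) pair occurs together.
    counts = {}
    for t, c in zip(true_labels, cluster_ids):
        counts[(c, t)] = counts.get((c, t), 0) + 1
    best = 0
    if len(clusters) <= len(classes):
        # Assign every cluster to a distinct class.
        for perm in permutations(classes, len(clusters)):
            matched = sum(counts.get((c, t), 0) for c, t in zip(clusters, perm))
            best = max(best, matched)
    else:
        # More clusters than classes: assign every class to a distinct cluster.
        for perm in permutations(clusters, len(classes)):
            matched = sum(counts.get((c, t), 0) for c, t in zip(perm, classes))
            best = max(best, matched)
    return best / len(true_labels)

# Toy example: 9 points, 3 known classes, one point placed in the wrong cluster.
true_labels = [1, 1, 1, 2, 2, 2, 3, 3, 3]
cluster_ids = [2, 2, 2, 0, 0, 1, 1, 1, 1]
accuracy = clustering_accuracy(true_labels, cluster_ids)
print(round(accuracy, 3))
```

Here the best matching pairs cluster 2 with class 1, cluster 0 with class 2, and cluster 1 with class 3, so 8 of the 9 points agree.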
CREATE TABLE d10_data (
"x" DOUBLE,
"y" DOUBLE,
"label" BIGINT
);
CREATE TABLE d11_data (
"x" DOUBLE,
"y" DOUBLE,
"label" BIGINT
);
CREATE TABLE d12_data (
"x" DOUBLE,
"y" DOUBLE,
"label" BIGINT
);
CREATE TABLE d13_data (
"x" DOUBLE,
"y" DOUBLE,
"label" BIGINT
);
CREATE TABLE d14_data (
"x" DOUBLE,
"y" DOUBLE,
"label" BIGINT
);
CREATE TABLE d15_data (
"x" DOUBLE,
"y" DOUBLE,
"label" BIGINT
);
CREATE TABLE d16_data (
"x" DOUBLE,
"y" DOUBLE,
"label" BIGINT
);
CREATE TABLE d17_data (
"x" DOUBLE,
"y" DOUBLE,
"label" BIGINT
);
CREATE TABLE d18_data (
"x" DOUBLE,
"y" DOUBLE,
"label" BIGINT
);
CREATE TABLE d1_data (
"x" DOUBLE,
"y" DOUBLE,
"label" BIGINT
);
CREATE TABLE d2_data (
"x" DOUBLE,
"y" DOUBLE,
"label" BIGINT
);
CREATE TABLE d3_data (
"x" DOUBLE,
"y" DOUBLE,
"label" BIGINT
);
CREATE TABLE d4_data (
"x" DOUBLE,
"y" DOUBLE,
"label" BIGINT
);
CREATE TABLE d5_data (
"x" DOUBLE,
"y" DOUBLE,
"label" BIGINT
);
CREATE TABLE d6_data (
"x" DOUBLE,
"y" DOUBLE,
"label" BIGINT
);
CREATE TABLE d7_data (
"x" DOUBLE,
"y" DOUBLE,
"label" BIGINT
);
CREATE TABLE d8_data (
"x" DOUBLE,
"y" DOUBLE,
"label" BIGINT
);
CREATE TABLE d9_data (
"x" DOUBLE,
"y" DOUBLE,
"label" BIGINT
);
CREATE TABLE r1_data (
"x" DOUBLE,
"y" DOUBLE,
"label" BIGINT
);
CREATE TABLE r2_data (
"x" DOUBLE,
"y" DOUBLE,
"label" BIGINT
);
CREATE TABLE r3_data (
"x" DOUBLE,
"y" DOUBLE,
"label" BIGINT
);
CREATE TABLE r4_data (
"x" DOUBLE,
"y" DOUBLE,
"label" BIGINT
);
CREATE TABLE r5_data (
"x" DOUBLE,
"y" DOUBLE,
"label" BIGINT
);
CREATE TABLE r6_data (
"x" DOUBLE,
"y" DOUBLE,
"label" BIGINT
);
CREATE TABLE r7_data (
"x" DOUBLE,
"y" DOUBLE,
"label" BIGINT
);
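All 25 tables share the same three columns, so one query pattern works for any of them. The Python sketch below uses the standard-library `sqlite3` module with an in-memory table that mirrors the `d1_data` schema, then summarizes the points per known label. The three rows are made-up placeholders, not values from the dataset, and using SQLite as the engine is an assumption made only to keep the example self-contained.

```python
import sqlite3

# In-memory database that mirrors the d1_data schema above.
conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE d1_data ("x" DOUBLE, "y" DOUBLE, "label" BIGINT)')

# Placeholder rows for illustration only; they are not real dataset values.
sample_rows = [(0.1, 0.2, 1), (0.3, 0.1, 1), (5.0, 5.2, 2)]
conn.executemany("INSERT INTO d1_data VALUES (?, ?, ?)", sample_rows)

# Size and centroid of each known sub-group.
summary = conn.execute(
    'SELECT "label", COUNT(*), AVG("x"), AVG("y") '
    'FROM d1_data GROUP BY "label" ORDER BY "label"'
).fetchall()

for label, n_points, mean_x, mean_y in summary:
    print(label, n_points, round(mean_x, 3), round(mean_y, 3))

conn.close()
```

The per-label means are the true centroids of the sub-groups, which gives a reference point when inspecting the centers found by a centroid-based method such as K-means.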