Baselight

Benchmarks Datasets For Cluster Analysis

25 simulated datasets generated by either Gaussian or Uniform distributions

@kaggle.onthada_benchmarks_datasets_for_clustering

About this Dataset

25 Artificial Datasets

The datasets are generated using either Gaussian or Uniform distributions. Each dataset contains several known sub-groups intended for testing centroid-based clustering results and cluster validity indices.
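
As a rough illustration of this kind of data, the sketch below draws points around a few centers from either a Gaussian or a Uniform distribution and tags each point with its known sub-group. The function name `make_dataset`, the centers, the spread, and the 1-based labels are all assumptions made for the example; this is not the procedure used to produce the 25 datasets.

```python
import random


def make_dataset(centers, n_per_group, spread=1.0, kind="gaussian", seed=0):
    """Return rows of (x, y, label), one known sub-group per center.

    Hypothetical helper for illustration only; the parameters are not
    taken from the published datasets.
    """
    rng = random.Random(seed)  # fixed seed so the toy data is reproducible
    rows = []
    for label, (cx, cy) in enumerate(centers, start=1):
        for _ in range(n_per_group):
            if kind == "gaussian":
                # Normally distributed around the center, std. dev. = spread
                x, y = rng.gauss(cx, spread), rng.gauss(cy, spread)
            elif kind == "uniform":
                # Uniform over a square of half-width `spread` around the center
                x = rng.uniform(cx - spread, cx + spread)
                y = rng.uniform(cy - spread, cy + spread)
            else:
                raise ValueError(f"unknown kind: {kind!r}")
            rows.append((x, y, label))
    return rows


# Three uniform sub-groups of 100 points each, in the same (x, y, label) layout
rows = make_dataset([(0, 0), (10, 0), (0, 10)], n_per_group=100, kind="uniform")
```

Each row has the same three fields as the tables listed on this page, so a clustering result can be compared directly against the `label` field.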

Cluster analysis is a popular machine learning technique for segmenting a dataset so that similar data points fall in the same group. For those who are familiar with R, there is a new R package called "UniversalCVI" (https://CRAN.R-project.org/package=UniversalCVI) for cluster evaluation. This package provides algorithms for checking the accuracy of a clustering result against known classes, computing cluster validity indices, and generating plots for comparing them. The package is compatible with K-means, fuzzy C-means, EM clustering, and hierarchical clustering (single, average, and complete linkage). To use the "UniversalCVI" package, follow the instructions provided in its R documentation.
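
To make "checking the accuracy of a clustering result with known classes" concrete, here is a minimal Python sketch. It does not use or reproduce "UniversalCVI": it runs a plain K-means (Lloyd's algorithm with a deterministic farthest-point initialisation) on made-up, well-separated Gaussian groups and scores the result with purity, a simple accuracy-style measure chosen for this example. All names and parameters are assumptions.

```python
import math
import random
from collections import Counter


def farthest_point_init(points, k):
    """Pick k starting centroids: the first point, then repeatedly the point
    farthest from all centroids chosen so far (deterministic)."""
    centroids = [points[0]]
    while len(centroids) < k:
        nxt = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(nxt)
    return centroids


def kmeans(points, k, max_iter=100):
    """Plain Lloyd's algorithm; returns (assignment, centroids)."""
    centroids = farthest_point_init(points, k)
    assignment = None
    for _ in range(max_iter):
        # Assignment step: index of the nearest centroid for each point
        new_assignment = [
            min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points
        ]
        if new_assignment == assignment:
            break  # no point changed cluster, so the result has converged
        assignment = new_assignment
        # Update step: move each centroid to the mean of its members
        for j in range(k):
            members = [p for p, a in zip(points, assignment) if a == j]
            if members:
                centroids[j] = (
                    sum(x for x, _ in members) / len(members),
                    sum(y for _, y in members) / len(members),
                )
    return assignment, centroids


def purity(assignment, labels):
    """Fraction of points whose cluster's majority label matches their own."""
    matched = 0
    for j in set(assignment):
        counts = Counter(l for a, l in zip(assignment, labels) if a == j)
        matched += counts.most_common(1)[0][1]
    return matched / len(labels)


# Made-up toy data: three Gaussian groups with known labels 1..3
rng = random.Random(42)
points, labels = [], []
for label, (cx, cy) in enumerate([(0, 0), (20, 0), (0, 20)], start=1):
    for _ in range(50):
        points.append((rng.gauss(cx, 1.0), rng.gauss(cy, 1.0)))
        labels.append(label)

assignment, centroids = kmeans(points, k=3)
purity_score = purity(assignment, labels)
print(f"purity = {purity_score:.3f}")
```

Purity only rewards clusters that are dominated by one known class; it is not one of the cluster validity indices discussed in the papers, which judge a clustering without using the labels.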

For more in-depth details of the package and cluster evaluation, please see the papers
https://doi.org/10.1016/j.patcog.2023.109910 and https://arxiv.org/abs/2308.14785

All the datasets are also available on GitHub at https://github.com/O-PREEDASAWAKUL/FuzzyDatasets.git

Tables

D10 Data

@kaggle.onthada_benchmarks_datasets_for_clustering.d10_data
  • 25.38 kB
  • 1,250 rows
  • 3 columns
CREATE TABLE d10_data (
  "x" DOUBLE,
  "y" DOUBLE,
  "label" BIGINT
);

D11 Data

@kaggle.onthada_benchmarks_datasets_for_clustering.d11_data
  • 11.91 kB
  • 500 rows
  • 3 columns
CREATE TABLE d11_data (
  "x" DOUBLE,
  "y" DOUBLE,
  "label" BIGINT
);

D12 Data

@kaggle.onthada_benchmarks_datasets_for_clustering.d12_data
  • 21.29 kB
  • 1,000 rows
  • 3 columns
CREATE TABLE d12_data (
  "x" DOUBLE,
  "y" DOUBLE,
  "label" BIGINT
);

D13 Data

@kaggle.onthada_benchmarks_datasets_for_clustering.d13_data
  • 21.27 kB
  • 1,000 rows
  • 3 columns
CREATE TABLE d13_data (
  "x" DOUBLE,
  "y" DOUBLE,
  "label" BIGINT
);

D14 Data

@kaggle.onthada_benchmarks_datasets_for_clustering.d14_data
  • 21.29 kB
  • 1,000 rows
  • 3 columns
CREATE TABLE d14_data (
  "x" DOUBLE,
  "y" DOUBLE,
  "label" BIGINT
);

D15 Data

@kaggle.onthada_benchmarks_datasets_for_clustering.d15_data
  • 21.29 kB
  • 1,000 rows
  • 3 columns
CREATE TABLE d15_data (
  "x" DOUBLE,
  "y" DOUBLE,
  "label" BIGINT
);

D16 Data

@kaggle.onthada_benchmarks_datasets_for_clustering.d16_data
  • 10.55 kB
  • 425 rows
  • 3 columns
CREATE TABLE d16_data (
  "x" DOUBLE,
  "y" DOUBLE,
  "label" BIGINT
);

D17 Data

@kaggle.onthada_benchmarks_datasets_for_clustering.d17_data
  • 35.62 kB
  • 1,750 rows
  • 3 columns
CREATE TABLE d17_data (
  "x" DOUBLE,
  "y" DOUBLE,
  "label" BIGINT
);

D18 Data

@kaggle.onthada_benchmarks_datasets_for_clustering.d18_data
  • 45.55 kB
  • 2,250 rows
  • 3 columns
CREATE TABLE d18_data (
  "x" DOUBLE,
  "y" DOUBLE,
  "label" BIGINT
);

D1 Data

@kaggle.onthada_benchmarks_datasets_for_clustering.d1_data
  • 30.95 kB
  • 1,500 rows
  • 3 columns
CREATE TABLE d1_data (
  "x" DOUBLE,
  "y" DOUBLE,
  "label" BIGINT
);

D2 Data

@kaggle.onthada_benchmarks_datasets_for_clustering.d2_data
  • 25.31 kB
  • 1,200 rows
  • 3 columns
CREATE TABLE d2_data (
  "x" DOUBLE,
  "y" DOUBLE,
  "label" BIGINT
);

D3 Data

@kaggle.onthada_benchmarks_datasets_for_clustering.d3_data
  • 29.05 kB
  • 1,400 rows
  • 3 columns
CREATE TABLE d3_data (
  "x" DOUBLE,
  "y" DOUBLE,
  "label" BIGINT
);

D4 Data

@kaggle.onthada_benchmarks_datasets_for_clustering.d4_data
  • 48.4 kB
  • 2,400 rows
  • 3 columns
CREATE TABLE d4_data (
  "x" DOUBLE,
  "y" DOUBLE,
  "label" BIGINT
);

D5 Data

@kaggle.onthada_benchmarks_datasets_for_clustering.d5_data
  • 9.16 kB
  • 350 rows
  • 3 columns
CREATE TABLE d5_data (
  "x" DOUBLE,
  "y" DOUBLE,
  "label" BIGINT
);

D6 Data

@kaggle.onthada_benchmarks_datasets_for_clustering.d6_data
  • 23.44 kB
  • 1,100 rows
  • 3 columns
CREATE TABLE d6_data (
  "x" DOUBLE,
  "y" DOUBLE,
  "label" BIGINT
);

D7 Data

@kaggle.onthada_benchmarks_datasets_for_clustering.d7_data
  • 30.95 kB
  • 1,500 rows
  • 3 columns
CREATE TABLE d7_data (
  "x" DOUBLE,
  "y" DOUBLE,
  "label" BIGINT
);

D8 Data

@kaggle.onthada_benchmarks_datasets_for_clustering.d8_data
  • 40.31 kB
  • 2,000 rows
  • 3 columns
CREATE TABLE d8_data (
  "x" DOUBLE,
  "y" DOUBLE,
  "label" BIGINT
);

D9 Data

@kaggle.onthada_benchmarks_datasets_for_clustering.d9_data
  • 19.61 kB
  • 1,000 rows
  • 3 columns
CREATE TABLE d9_data (
  "x" DOUBLE,
  "y" DOUBLE,
  "label" BIGINT
);

R1 Data

@kaggle.onthada_benchmarks_datasets_for_clustering.r1_data
  • 11.02 kB
  • 450 rows
  • 3 columns
CREATE TABLE r1_data (
  "x" DOUBLE,
  "y" DOUBLE,
  "label" BIGINT
);

R2 Data

@kaggle.onthada_benchmarks_datasets_for_clustering.r2_data
  • 35.64 kB
  • 1,750 rows
  • 3 columns
CREATE TABLE r2_data (
  "x" DOUBLE,
  "y" DOUBLE,
  "label" BIGINT
);

R3 Data

@kaggle.onthada_benchmarks_datasets_for_clustering.r3_data
  • 32.88 kB
  • 1,600 rows
  • 3 columns
CREATE TABLE r3_data (
  "x" DOUBLE,
  "y" DOUBLE,
  "label" BIGINT
);

R4 Data

@kaggle.onthada_benchmarks_datasets_for_clustering.r4_data
  • 26.26 kB
  • 1,250 rows
  • 3 columns
CREATE TABLE r4_data (
  "x" DOUBLE,
  "y" DOUBLE,
  "label" BIGINT
);

R5 Data

@kaggle.onthada_benchmarks_datasets_for_clustering.r5_data
  • 25.31 kB
  • 1,200 rows
  • 3 columns
CREATE TABLE r5_data (
  "x" DOUBLE,
  "y" DOUBLE,
  "label" BIGINT
);

R6 Data

@kaggle.onthada_benchmarks_datasets_for_clustering.r6_data
  • 30.95 kB
  • 1,500 rows
  • 3 columns
CREATE TABLE r6_data (
  "x" DOUBLE,
  "y" DOUBLE,
  "label" BIGINT
);

R7 Data

@kaggle.onthada_benchmarks_datasets_for_clustering.r7_data
  • 19.63 kB
  • 1,200 rows
  • 3 columns
CREATE TABLE r7_data (
  "x" DOUBLE,
  "y" DOUBLE,
  "label" BIGINT
);
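
All of the tables above share the same three-column layout (`x`, `y`, `label`). The sketch below shows one way such a table might be summarised per known sub-group. It assumes an in-memory SQLite database as a stand-in for wherever the data is actually hosted, and the three inserted rows are made-up placeholders, not values from `d10_data`.

```python
import sqlite3

# In-memory SQLite database used purely as a local stand-in (assumption)
conn = sqlite3.connect(":memory:")

# Same column layout as the tables listed on this page
conn.execute('CREATE TABLE d10_data ("x" DOUBLE, "y" DOUBLE, "label" BIGINT)')

# Made-up placeholder rows, NOT real values from the dataset
sample_rows = [(0.1, 0.2, 1), (0.3, -0.1, 1), (5.0, 5.2, 2)]
conn.executemany("INSERT INTO d10_data VALUES (?, ?, ?)", sample_rows)

# Size and mean position (centroid) of each known sub-group
summary = conn.execute(
    'SELECT "label", COUNT(*), AVG("x"), AVG("y") '
    "FROM d10_data GROUP BY \"label\" ORDER BY \"label\""
).fetchall()

for label, n, mean_x, mean_y in summary:
    print(f"label={label} n={n} centroid=({mean_x:.2f}, {mean_y:.2f})")

conn.close()
```

The per-label means are the true group centroids, which gives a reference point when inspecting the centroids found by a centroid-based clustering algorithm.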
