Overview
Context
Clustering is the task of grouping similar data points together. You can generate dummy data for clustering with functions from the sklearn package, but doing so takes some effort.
I think this dataset will help users who need hard test cases as clustering examples.
Try to choose a meaningful number of clusters and divide the data into them. Here are some exercises for you.
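One common way to pick the number of clusters is to compare silhouette scores across several candidate values. The sketch below is only an illustration of that idea, not a recommended solution to the exercises: it uses k-means from sklearn, and because the file names are not listed here, it runs on stand-in data from `make_blobs` rather than on one of the CSV files. The helper name `best_k` and the candidate range are assumptions made for this example.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Stand-in data: three well-separated blobs (NOT one of the dataset's files).
# Replace `points` with the x and y columns of a CSV file to try it for real.
points, _ = make_blobs(
    n_samples=300,
    centers=[(-10, -10), (0, 10), (10, -10)],
    cluster_std=1.0,
    random_state=0,
)


def best_k(points, candidates=range(2, 7), seed=0):
    """Return the candidate k with the highest mean silhouette score."""
    scores = {}
    for k in candidates:
        model = KMeans(n_clusters=k, n_init=10, random_state=seed)
        labels = model.fit_predict(points)
        scores[k] = silhouette_score(points, labels)
    return max(scores, key=scores.get), scores


k, scores = best_k(points)
print("chosen k:", k)
for candidate, score in scores.items():
    print(candidate, round(score, 3))
```

K-means assumes roughly round clusters, so on the harder shapes in this dataset the silhouette score may favor a misleading k. Comparing the result against the color column is a reasonable sanity check.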
Dataset
All CSV files contain many rows of x, y, and color values, as shown in the figures above.
If you want to use the positions as integers, scale them and round to the nearest integer, for example x = round(x * 100).
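As a minimal sketch of that conversion, the snippet below reads rows with the standard `csv` module and applies `round(value * 100)` to both coordinates. The inline sample, the presence of a header row, and the helper name `to_int_rows` are assumptions for illustration; only the x, y, and color column names come from the description above.

```python
import csv
import io

# Hypothetical sample in the x, y, color layout (assumes a header row).
sample = io.StringIO("x,y,color\n0.1234,0.5678,0\n-1.25,2.499,1\n")


def to_int_rows(handle, scale=100):
    """Scale x and y by `scale` and round them to integers; keep color as read."""
    rows = []
    for row in csv.DictReader(handle):
        rows.append(
            {
                "x": round(float(row["x"]) * scale),
                "y": round(float(row["y"]) * scale),
                "color": row["color"],
            }
        )
    return rows


rows = to_int_rows(sample)
for row in rows:
    print(row)
```

To use a real file, pass `open(path, newline="")` instead of the sample. Note that Python's `round` rounds exact halves to the nearest even integer, and a larger scale keeps more of the original precision.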
There is also a GUI tool for generating 2D points for clustering, which you can use to make your own dataset: https://www.joonas.io/cluster-paint
Stay tuned for further updates! If you have any ideas, feel free to leave a comment.