Context
The Chinese MNIST dataset uses data collected as part of a project at Newcastle University.
Project Description
One hundred Chinese nationals took part in the data collection. Using a standard black ink pen, each participant wrote all 15 numbers in a table with 15 designated regions drawn on a white A4 sheet. Each participant repeated this process 10 times. Each sheet was scanned at a resolution of 300x300 pixels.
This resulted in a dataset of 15,000 images (100 volunteers x 10 samples per volunteer x 15 characters), each representing one character from a set of 15 characters. The images are grouped in samples, which are in turn grouped in suites.
Further Data Processing
I downloaded the raw images from the original project page. Based on the image names, I created an index for each image, as follows:
original name (example): Locate{1,3,4}.jpg
index extracted: suite_id: 1, sample_id: 3, code: 4
resulted file name: input_1_3_4.jpg
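The renaming shown above can be sketched in a few lines of Python. This is only a minimal illustration, not the script actually used to build the dataset: the function names `parse_original_name` and `build_new_name` are hypothetical, and the pattern assumes every original name follows the `Locate{suite_id,sample_id,code}.jpg` form of the example.

```python
import re

# Pattern for the original names, e.g. "Locate{1,3,4}.jpg".
# Assumption: all raw files follow the form shown in the example above.
ORIGINAL_NAME = re.compile(r"^Locate\{(\d+),(\d+),(\d+)\}\.jpg$")


def parse_original_name(name):
    """Return (suite_id, sample_id, code) extracted from an original file name."""
    match = ORIGINAL_NAME.match(name)
    if match is None:
        raise ValueError(f"unexpected file name: {name!r}")
    suite_id, sample_id, code = (int(part) for part in match.groups())
    return suite_id, sample_id, code


def build_new_name(suite_id, sample_id, code):
    """Build the renamed file name, e.g. "input_1_3_4.jpg"."""
    return f"input_{suite_id}_{sample_id}_{code}.jpg"


print(build_new_name(*parse_original_name("Locate{1,3,4}.jpg")))
# prints: input_1_3_4.jpg
```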
I also added a mapping from each image code to the numeric value of the Chinese number character and to the Chinese character itself.
The mapping is described here.
Content
The dataset contains the following:
- an index file, chinese_mnist.csv
- a folder with 15,000 jpg images, sized 64 x 64. See the images folder description for details.
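As a rough sketch of how the index file and the image folder fit together, the snippet below turns index rows into image paths using only the Python standard library. Several things here are assumptions rather than facts stated above: the column names `suite_id`, `sample_id` and `code` (taken from the index fields described earlier), the folder name `images`, and the helper name `image_paths`. A small in-memory sample stands in for chinese_mnist.csv so the example runs on its own.

```python
import csv
import io
from pathlib import Path

# Stand-in for chinese_mnist.csv.
# Assumption: the index has suite_id, sample_id and code columns.
SAMPLE_INDEX = """suite_id,sample_id,code
1,1,10
1,3,4
"""


def image_paths(index_file, image_dir):
    """Map each index row to the path of its input_<suite>_<sample>_<code>.jpg file."""
    reader = csv.DictReader(index_file)
    return [
        Path(image_dir) / f"input_{row['suite_id']}_{row['sample_id']}_{row['code']}.jpg"
        for row in reader
    ]


# "images" is a hypothetical folder name; adjust it to the actual layout.
paths = image_paths(io.StringIO(SAMPLE_INDEX), "images")
for path in paths:
    print(path.name)
```

With the real dataset, the in-memory sample would be replaced by a file handle such as `open("chinese_mnist.csv", newline="")`, provided the column names match.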
Acknowledgements
I want to express my gratitude to Dr. K Nazarpour and Dr. M Chen from Newcastle University, who collected the data.
Inspiration
You can use this data the same way you would use MNIST, KMNIST or Fashion MNIST: refine your image classification skills, or use GPUs and TPUs to implement CNN architectures for multiclass classification models.