Baselight

Hard To Learn Cryptography Dataset

Predict SHA-256 cryptographic hash values

@kaggle.tukkonen_difficult_to_learn_perfectly_cryptographic_dataset

About this Dataset


A 10-to-10-dimensional, hard-to-learn dataset for machine learning. The dataset was created by generating random 10-character ASCII strings (the input) and computing their SHA-256 hash values, keeping the first 10 bits (the output). In theory, because the mapping is a deterministic mathematical function, a model that learned it perfectly would make no errors. In practice, SHA-256 is a cryptographic hash function with very little exploitable predictability. The outputs have a mean of roughly 0.50 and a standard deviation of roughly 0.50, as expected for nearly uniform random bits.
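The generation procedure can be sketched in standard-library Python. This is a minimal sketch under assumptions the description does not state: characters are drawn from printable, non-whitespace ASCII, each character code is scaled to [0, 1] by dividing by 255, and "the first 10 bits" means the 10 most significant bits of the 256-bit digest. The names `first_bits` and `make_dataset` are hypothetical.

```python
import hashlib
import random
import string
from typing import List, Tuple

# Assumption: characters are drawn from printable, non-whitespace ASCII.
ALPHABET = string.ascii_letters + string.digits + string.punctuation


def first_bits(text: str, n_bits: int = 10) -> List[float]:
    """Return the n_bits most significant bits of SHA-256(text) as 0.0/1.0 floats."""
    digest = hashlib.sha256(text.encode("ascii")).digest()
    value = int.from_bytes(digest, "big")  # 256-bit integer, first byte most significant
    top = value >> (256 - n_bits)          # keep only the leading n_bits
    return [float((top >> (n_bits - 1 - i)) & 1) for i in range(n_bits)]


def make_dataset(n_rows: int, length: int = 10,
                 seed: int = 0) -> List[Tuple[List[float], List[float]]]:
    """Build hypothetical (input, output) rows: random string -> leading hash bits."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n_rows):
        text = "".join(rng.choice(ALPHABET) for _ in range(length))
        # Assumption: each character code is scaled to [0, 1] by dividing by 255.
        inputs = [ord(c) / 255.0 for c in text]
        rows.append((inputs, first_bits(text, n_bits=10)))
    return rows
```

As a sanity check, `first_bits("abc")` returns `[1.0, 0.0, 1.0, 1.0, 1.0, 0.0, 1.0, 0.0, 0.0, 1.0]`, because SHA-256("abc") begins with the bytes `0xba 0x78`. Averaging the output bits of `make_dataset(10000)` should give a value close to 0.50.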

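When a small 10-140-140-10 ReLU neural network (TensorFlow) is trained to overfit the data, its mean absolute error, sum(abs(err))/(N*dim(err)) where N is the number of rows and dim(err) the number of output dimensions, is 0.4899 per dimension. That is only slightly below the 0.5 obtained by always predicting 0.5 for every output bit. The inverse problem, predicting the input string from its hash bits, is even harder: the same neural network gives a mean absolute error of 0.25046, or 64.868 when the error is expressed on the 0-255 scale of ASCII character codes. You can test the learning capacity of a larger neural network by computing its error on this dataset.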
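The error metric sum(abs(err))/(N*dim(err)) averages the absolute error over every row and every output dimension. A minimal pure-Python sketch follows; `mean_absolute_error` is our own illustrative name, not a particular library's API. The last lines evaluate a constant prediction of 0.5 on some example binary targets, a natural baseline to compare reported errors against.

```python
from typing import Sequence


def mean_absolute_error(targets: Sequence[Sequence[float]],
                        predictions: Sequence[Sequence[float]]) -> float:
    """Compute sum(abs(err)) / (N * dim(err)) over all rows and dimensions."""
    n = len(targets)
    dim = len(targets[0])
    if len(predictions) != n:
        raise ValueError("targets and predictions must have the same number of rows")
    total = 0.0
    for t_row, p_row in zip(targets, predictions):
        if len(t_row) != dim or len(p_row) != dim:
            raise ValueError("every row must have the same dimension")
        total += sum(abs(t - p) for t, p in zip(t_row, p_row))
    return total / (n * dim)


# Baseline: predict 0.5 for every bit of some example binary targets.
example_targets = [[0.0, 1.0, 1.0, 0.0], [1.0, 1.0, 0.0, 0.0]]
constant_guess = [[0.5] * 4 for _ in example_targets]
baseline = mean_absolute_error(example_targets, constant_guess)  # 0.5
```

Because every target is 0 or 1, each term of the constant-0.5 baseline contributes exactly 0.5, whatever the targets are.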

The files are plain ASCII text: each row is a vector of floats separated by spaces and terminated by a newline. The files have no header row.
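Since the files are headerless rows of space-separated floats, they can be read with a few lines of standard Python. The sketch below is an illustrative reader (the name `load_vectors` is hypothetical). The dataset's actual file names are not given here, so the example parses an in-memory string instead of a real file.

```python
import io
from typing import List, TextIO


def load_vectors(stream: TextIO) -> List[List[float]]:
    """Parse a headerless file: one vector per line, floats separated by spaces."""
    rows: List[List[float]] = []
    for line in stream:
        line = line.strip()
        if not line:  # tolerate blank lines, e.g. an extra newline at end of file
            continue
        rows.append([float(token) for token in line.split()])
    return rows


# Demonstration on an in-memory string; with a real file, pass open(path) instead.
demo = load_vectors(io.StringIO("0.5 1 0\n0.25 0 1\n"))
```

If NumPy is available, `numpy.loadtxt(path)` should also read this format, because its default delimiter is any whitespace.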
