Baselight

SMILES DataSet For Analysis & Prediction Dataset

@kaggle.yanmaksi_big_molecules_smiles_dataset

About this Dataset

Read this article to unlock the wonderful world of Deep Reinforcement Learning for Drug Design.

ReLeaSE is a public dataset of molecular structures and their corresponding binding affinities to proteins. It was created to evaluate and compare machine learning models that predict protein-ligand binding affinity.

The dataset contains a total of 10,000 molecules and their binding affinity to several target proteins, including thrombin, kinase, and protease. The molecular structures are represented in Simplified Molecular Input Line Entry System (SMILES) notation, a standardized way of writing a molecular structure as a string of characters. The binding affinity is given as pKd, the negative base-10 logarithm of the dissociation constant Kd (in mol/L); a higher pKd means a stronger interaction between the molecule and the target protein.
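The negative-logarithm transform described above can be sketched in a few lines of Python. This is only an illustration: the function name `neg_log10_molar` is hypothetical and not part of the dataset or of any library, and the input is assumed to be a concentration in mol/L. The same transform applied to an IC50 value yields pIC50, which is the quantity named by the `pic50` column in the table schema.

```python
import math


def neg_log10_molar(concentration_molar: float) -> float:
    """Return -log10 of a concentration given in mol/L.

    Applied to a dissociation constant Kd this gives pKd;
    applied to an IC50 value it gives pIC50.
    (Hypothetical helper, for illustration only.)
    """
    if concentration_molar <= 0:
        raise ValueError("concentration must be a positive value in mol/L")
    return -math.log10(concentration_molar)


if __name__ == "__main__":
    # A Kd of 1 nM (1e-9 mol/L) corresponds to a pKd of about 9.
    print(neg_log10_molar(1e-9))
```

Because the scale is logarithmic, an increase of one unit corresponds to a tenfold lower Kd (or IC50), that is, a tenfold stronger binder.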

The ReLeaSE dataset provides a standardized benchmark for evaluating machine learning models for protein-ligand binding affinity prediction. The dataset is publicly available and can be used for research purposes, making it an important resource for the drug discovery community.

Tables

Smiles Big Data Set

@kaggle.yanmaksi_big_molecules_smiles_dataset.smiles_big_data_set
  • 513.99 KB
  • 16087 rows
  • 5 columns

CREATE TABLE smiles_big_data_set (
  "smiles" VARCHAR,
  "pic50" DOUBLE,
  "mol" VARCHAR,
  "num_atoms" BIGINT,
  "logp" DOUBLE
);
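As a rough sketch of how a table with this schema might be queried, the example below recreates it in an in-memory SQLite database using Python's standard `sqlite3` module. Several things here are assumptions: SQLite stands in for whatever engine actually hosts the table, the helper names `build_demo_db` and `rows_at_or_above` are hypothetical, and the three sample rows are made-up placeholders, not values from the dataset.

```python
import sqlite3

# Schema copied from the table definition above.
SCHEMA = """
CREATE TABLE smiles_big_data_set (
  "smiles" VARCHAR,
  "pic50" DOUBLE,
  "mol" VARCHAR,
  "num_atoms" BIGINT,
  "logp" DOUBLE
);
"""

# Placeholder rows for illustration only; NOT taken from the dataset.
SAMPLE_ROWS = [
    ("CCO", 4.2, "<mol placeholder>", 3, -0.1),
    ("c1ccccc1", 5.7, "<mol placeholder>", 6, 2.0),
    ("CC(=O)O", 6.3, "<mol placeholder>", 4, 0.1),
]


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database holding the placeholder rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    conn.executemany(
        "INSERT INTO smiles_big_data_set VALUES (?, ?, ?, ?, ?)",
        SAMPLE_ROWS,
    )
    return conn


def rows_at_or_above(conn: sqlite3.Connection, min_pic50: float):
    """Return (smiles, pic50) pairs with pic50 >= min_pic50, highest first."""
    cursor = conn.execute(
        "SELECT smiles, pic50 FROM smiles_big_data_set "
        "WHERE pic50 >= ? ORDER BY pic50 DESC",
        (min_pic50,),
    )
    return cursor.fetchall()


if __name__ == "__main__":
    demo = build_demo_db()
    for smiles, pic50 in rows_at_or_above(demo, 5.0):
        print(smiles, pic50)
```

Passing the threshold as a bound parameter (`?`) rather than formatting it into the SQL string keeps the query safe to reuse with user-supplied values.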
