Multi-label classification

Background

Enzymes are known to act on molecules that are structurally similar to their native substrates. This behaviour is called promiscuity. Scientists working in drug discovery exploit it to design drugs that either block or promote biological actions. However, correctly predicting the EC class(es) of the substrates associated with enzymes remains a challenge in biology. Since there is no shortage of data, ML techniques can be employed to address this problem.

Points to keep in mind

Substrate molecules can belong to multiple EC classes at the same time, because the same molecule can participate in different types of reactions in biology
The dataset is highly imbalanced in its labels

An algorithm that can handle label imbalance is needed
The smallest label count is 1 and the largest label count is 248
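
The points above can be illustrated with a minimal sketch, using only the Python standard library. It counts how often each label occurs in a multi-label dataset, derives inverse-frequency weights so that rare labels count for more, and builds a binary indicator matrix (one column per label). The sample data, the label names (EC1, EC2, EC3) and the helper function names are hypothetical and are not taken from the dataset files. The weighting formula is one common heuristic, not a method prescribed by this dataset.

```python
from collections import Counter

# Hypothetical toy data: each sample is the set of EC classes of one molecule.
# Real labels would be read from the dataset files instead.
samples = [
    {"EC1", "EC2"},
    {"EC1"},
    {"EC1", "EC3"},
    {"EC2"},
    {"EC1", "EC2"},
]


def label_counts(label_sets):
    """Count how many samples carry each label."""
    counts = Counter()
    for labels in label_sets:
        counts.update(labels)
    return counts


def inverse_frequency_weights(counts, n_samples):
    """Weight each label by n_samples / (n_labels * count).

    Rare labels receive larger weights. This is an assumed heuristic
    for illustration, not the dataset author's method.
    """
    n_labels = len(counts)
    return {label: n_samples / (n_labels * c) for label, c in counts.items()}


def to_indicator(label_sets, classes):
    """Turn label sets into rows of 0/1 values, one column per class."""
    return [[1 if c in labels else 0 for c in classes] for labels in label_sets]


counts = label_counts(samples)
classes = sorted(counts)
weights = inverse_frequency_weights(counts, len(samples))
Y = to_indicator(samples, classes)

for c in classes:
    print(c, counts[c], round(weights[c], 3))
```

Weights of this kind can be passed to a classifier that accepts per-label or per-class weights. With a smallest label count of 1, a plain random train/test split may leave a label entirely out of one side, so the split itself also needs care.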

Content

There are 3 files, named mixed_desc.csv, mixed_ecfp.csv, and mixed_fcfp.csv, containing chemical, structural, and connectivity information.
