Baselight
kaggle

Multi-label Classification Of Enzyme Substrates

@kaggle.gopalns_ec_mixed_class

Multi-label classification

Dataset Description

Background

Enzymes are known to act on molecules that are structurally similar to their substrates. This behaviour is called promiscuity. Scientists working in drug discovery exploit it to target or design drugs that either block or promote biological actions. However, correctly predicting the Enzyme Commission (EC) class(es) associated with a substrate has been a challenge in biology. Since there is no shortage of data, machine learning (ML) techniques can be applied to this problem.

Points to keep in mind

  1. Substrate molecules can belong to multiple EC classes at the same time, because the same molecule can participate in different types of biological reactions
  2. The dataset is highly imbalanced across labels
  • An algorithm that can handle label imbalance is needed
  • The smallest label count is 1 and the largest is 248
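Both points can be made concrete with a small sketch. Because a substrate can carry several EC classes, a common framing is binary relevance: one 0/1 target per class (a multi-hot vector) and an independent binary loss per class. One simple way to counter the imbalance is to up-weight the positive term of each class by #negatives / #positives, the same convention as the `pos_weight` argument of PyTorch's `BCEWithLogitsLoss`. The toy labels and the names `multi_hot`, `positive_weights` and `weighted_bce` are hypothetical, and this is only one option among several (resampling, focal loss, per-label thresholds), not a method prescribed by the dataset.

```python
import math
from collections import Counter

# Toy data (hypothetical): each substrate maps to the set of EC classes it belongs to.
# Real class names and counts come from the dataset files.
CLASSES = ["EC1", "EC2", "EC3"]
LABEL_SETS = [{"EC1"}, {"EC1", "EC2"}, {"EC1"}, {"EC3"}]


def multi_hot(label_sets, classes):
    """Encode each sample's label set as a 0/1 vector with one column per class."""
    index = {c: i for i, c in enumerate(classes)}
    rows = []
    for labels in label_sets:
        row = [0] * len(classes)
        for lab in labels:
            row[index[lab]] = 1
        rows.append(row)
    return rows


def positive_weights(y):
    """Per-label weight = #negatives / #positives, so rare labels weigh more.

    A label with no positive samples gets weight 1.0 to avoid dividing by zero.
    """
    n = len(y)
    weights = []
    for j in range(len(y[0])):
        pos = sum(row[j] for row in y)
        weights.append((n - pos) / pos if pos else 1.0)
    return weights


def weighted_bce(y_true, y_prob, weights, eps=1e-7):
    """Binary cross-entropy summed over labels; each positive term is scaled by its weight."""
    loss = 0.0
    for t, p, w in zip(y_true, y_prob, weights):
        p = min(max(p, eps), 1.0 - eps)  # clamp so log() stays finite
        loss -= w * t * math.log(p) + (1 - t) * math.log(1.0 - p)
    return loss


counts = Counter(lab for labels in LABEL_SETS for lab in labels)
Y = multi_hot(LABEL_SETS, CLASSES)
W = positive_weights(Y)
print(min(counts.values()), max(counts.values()))  # prints: 1 3
print(Y)
print(W)
```

With this formula a label that is positive for every sample would get weight 0, so a floor or clipping on the weights may be worth adding in practice.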

Content

There are 3 files, named mixed_desc.csv, mixed_ecfp.csv and mixed_fcfp.csv, containing chemical, structural and connectivity information.
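The page does not describe the column layout of these files. The sketch below, using only Python's standard `csv` module, shows one way to turn a file into a feature matrix and a multi-hot label matrix. The suffixes presumably refer to molecular descriptors and to ECFP/FCFP (extended-connectivity and functional-class) fingerprints, but that is an inference from the names. The header convention (label columns prefixed with `EC`, an `id` column to drop), the demo columns and the function name `split_features_labels` are hypothetical; check the real headers before relying on it.

```python
import csv
import io


def split_features_labels(f, label_prefix="EC", skip_columns=("id",)):
    """Split one CSV into feature and label matrices.

    ASSUMPTIONS (not documented for this dataset):
      * label columns are the ones whose header starts with `label_prefix`;
      * columns listed in `skip_columns` (e.g. an identifier) are dropped;
      * every remaining column holds a number.
    """
    reader = csv.DictReader(f)
    headers = reader.fieldnames or []
    label_cols = [c for c in headers if c.startswith(label_prefix)]
    feature_cols = [c for c in headers if c not in label_cols and c not in skip_columns]
    X, Y = [], []
    for row in reader:
        X.append([float(row[c]) for c in feature_cols])
        Y.append([int(float(row[c])) for c in label_cols])  # tolerate "1" or "1.0"
    return feature_cols, label_cols, X, Y


# Demo on an in-memory stand-in for a file such as mixed_desc.csv.
# The headers "id", "MolWt", "EC1", "EC2" are hypothetical.
SAMPLE = "id,MolWt,EC1,EC2\nm1,180.2,1,0\nm2,46.1,1,1\n"
features, labels, X, Y = split_features_labels(io.StringIO(SAMPLE))
print(features, labels)  # ['MolWt'] ['EC1', 'EC2']
print(X, Y)              # [[180.2], [46.1]] [[1, 0], [1, 1]]

# For a real file (path assumed to be in the working directory):
# with open("mixed_desc.csv", newline="") as fh:
#     features, labels, X, Y = split_features_labels(fh)
```

If the three files share a row order or an identifier column, their feature columns could be concatenated per molecule; whether they do is not stated on this page.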

