Baselight

COCONUT: The COlleCtion Of Open NatUral ProducTs.

Unlocking Molecule Information

@kaggle.thedevastator_open_source_natural_product_annotations

Open Source Natural Product Annotations

By [source]


About this dataset

This dataset contains molecular annotations for a collection of natural products. Each record includes the molecular formula, a clean SMILES representation, the InChI representation, and the corresponding InChIKey. With this data set, you can explore and gain insight into a wide range of naturally occurring organic molecules. The collection is open, so it can also help identify potential sources of novel compounds for drug discovery and other applications.

How to use the dataset

  • Molecular Formula: This is a string that represents the elemental composition of a molecule. It distinguishes molecules by composition, although isomers with different structures share the same formula.

  • Clean Smiles: This is a simplified molecular-input line-entry system (SMILES) representation of the natural product. It is a simple way to represent atomic and bond connectivity within molecules, making them readable by computers and databases while preserving the necessary chemical information on the compounds.

  • InChI: This is an International Chemical Identifier (InChI) representation for each molecule in this collection. InChIs are designed to capture the important structural characteristics of a chemical compound in a standard form that can be interpreted globally, across data sources and computer systems. The identifier remains valid through format transformations, without loss or alteration of the data when it is regenerated.
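As a small illustration of working with the molecular formula field described above, the following Python sketch counts the atoms of each element in a formula string. The function name `parse_formula` is hypothetical, and the sketch assumes flat formulas made only of element symbols and counts; formulas with charges, parentheses, or isotope labels are rejected rather than interpreted.

```python
import re

# Whole-string check: one or more (element symbol, optional count) pairs.
_FLAT_FORMULA = re.compile(r"(?:[A-Z][a-z]?\d*)+")
# One element symbol (capital letter, optional lowercase letter) plus count.
_TOKEN = re.compile(r"([A-Z][a-z]?)(\d*)")


def parse_formula(formula: str) -> dict:
    """Count atoms per element in a flat formula such as 'C6H12O6'.

    Assumption: no charges, parentheses, or isotope labels. Anything
    outside that simple shape raises ValueError instead of being guessed.
    """
    if not _FLAT_FORMULA.fullmatch(formula):
        raise ValueError(f"unsupported formula: {formula!r}")
    counts = {}
    for symbol, number in _TOKEN.findall(formula):
        # A missing count means a single atom, as in the 'C' of 'CH4'.
        counts[symbol] = counts.get(symbol, 0) + (int(number) if number else 1)
    return counts


if __name__ == "__main__":
    print(parse_formula("C6H12O6"))  # {'C': 6, 'H': 12, 'O': 6}
```

Rejecting unexpected input keeps the sketch honest: whether this file contains charged or otherwise decorated formulas has not been checked here.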

This dataset gives researchers a single place to access the molecular identifiers needed for their research, without special software or hardware, which makes exploration easier. These annotations can support work on molecular structures in applications such as drug discovery and materials science. If you are interested in learning more about other features available within our natural products database, please refer to our repository.

Research Ideas

  • Automatically predicting the effects of natural products on biochemical pathways in biological cells to explore potential therapeutic activities.
  • Analyzing the effects of different mixtures of natural products and their individual components as starting points for drug discovery processes.
  • Using computational exploration to highlight active compounds within a library of natural products, to be used as leads when designing novel drugs that target specific pathways.

Acknowledgements

If you use this dataset in your research, please credit the original authors.

License

License: CC0 1.0 Universal (CC0 1.0) - Public Domain Dedication
No Copyright - You can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission. See Other Information.

Columns

File: COCONUT4MetFrag_april.csv

Column name: Description
  • molecular_formula: The chemical formula of the molecule, listing each element present and the number of atoms of that element. (String)
  • clean_smiles: The Simplified Molecular-Input Line-Entry System (SMILES) representation of the molecule, a line notation that encodes atoms and bonds as a text string. Hydrogen atoms are usually left implicit. (String)
  • inchi: The International Chemical Identifier (InChI) representation of the molecule, a standardized text string built from layers that describe the formula, atom connectivity, hydrogens and, where present, charge, stereochemistry, and isotopes. (String)
  • inchikey: The InChIKey of the molecule, a fixed-length, 27-character hashed form of the InChI made up of uppercase letters and hyphens, which allows easier lookup and comparison across independent resources. (String)
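Because the inchikey column has a fixed 27-character shape, a quick format check can catch truncated or corrupted values before any lookup. The sketch below assumes the standard InChIKey layout of 14, 10, and 1 uppercase letters separated by hyphens; the helper names `is_inchikey_shaped` and `connectivity_block` are hypothetical. The check covers the shape only and does not verify that a key matches its InChI.

```python
import re

# Assumed standard layout: 14 letters, hyphen, 10 letters, hyphen, 1 letter.
_INCHIKEY = re.compile(r"[A-Z]{14}-[A-Z]{10}-[A-Z]")


def is_inchikey_shaped(value: str) -> bool:
    """Return True when the string has the 27-character InChIKey shape."""
    return _INCHIKEY.fullmatch(value) is not None


def connectivity_block(inchikey: str) -> str:
    """Return the first 14-letter block, which hashes the connectivity.

    Keys that differ only in later blocks (for example stereoisomers)
    share this block, so it is handy for coarse grouping.
    """
    if not is_inchikey_shaped(inchikey):
        raise ValueError(f"not an InChIKey-shaped string: {inchikey!r}")
    return inchikey.split("-")[0]


if __name__ == "__main__":
    caffeine = "RYYVLZVUVIJVGH-UHFFFAOYSA-N"  # InChIKey of caffeine
    print(is_inchikey_shaped(caffeine), connectivity_block(caffeine))
```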

Tables

Coconut4metfrag April

@kaggle.thedevastator_open_source_natural_product_annotations.coconut4metfrag_april
  • 70.18 MB
  • 423706 rows
  • 6 columns

CREATE TABLE coconut4metfrag_april (
  "coconut_id" VARCHAR,
  "molecular_formula" VARCHAR,
  "clean_smiles" VARCHAR,
  "inchi" VARCHAR,
  "inchikey" VARCHAR,
  "coconut_id_1" VARCHAR
);
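To try queries against this schema without downloading the full file, the following Python sketch builds the table in an in-memory SQLite database and runs two simple queries. Everything beyond the schema is an assumption made for illustration: the two rows describe well-known molecules rather than rows copied from the dataset, the `EXAMPLE_*` identifiers are placeholders and not real COCONUT IDs, and the helper names are hypothetical. The schema lists `coconut_id_1` without a description, so the second query is one possible way to check whether it simply repeats `coconut_id`.

```python
import sqlite3

SCHEMA = """
CREATE TABLE coconut4metfrag_april (
  "coconut_id" VARCHAR,
  "molecular_formula" VARCHAR,
  "clean_smiles" VARCHAR,
  "inchi" VARCHAR,
  "inchikey" VARCHAR,
  "coconut_id_1" VARCHAR
);
"""

# Illustrative rows only: placeholder IDs, not taken from the dataset.
DEMO_ROWS = [
    ("EXAMPLE_1", "C2H6O", "CCO",
     "InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3",
     "LFQSCWFLJHTTHZ-UHFFFAOYSA-N", "EXAMPLE_1"),
    ("EXAMPLE_2", "C8H10N4O2", "CN1C=NC2=C1C(=O)N(C(=O)N2C)C",
     "InChI=1S/C8H10N4O2/c1-10-4-9-6-5(10)7(13)12(3)8(14)11(6)2/h4H,1-3H3",
     "RYYVLZVUVIJVGH-UHFFFAOYSA-N", "EXAMPLE_2"),
]


def build_demo_db() -> sqlite3.Connection:
    """Create the table in memory and load the illustrative rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    conn.executemany(
        "INSERT INTO coconut4metfrag_april VALUES (?, ?, ?, ?, ?, ?)",
        DEMO_ROWS,
    )
    return conn


def lookup_by_inchikey(conn: sqlite3.Connection, inchikey: str) -> list:
    """Return (molecular_formula, clean_smiles) pairs for one InChIKey."""
    query = (
        "SELECT molecular_formula, clean_smiles "
        "FROM coconut4metfrag_april WHERE inchikey = ?"
    )
    return conn.execute(query, (inchikey,)).fetchall()


def count_id_mismatches(conn: sqlite3.Connection) -> int:
    """Count rows where coconut_id and coconut_id_1 differ.

    IS NOT is SQLite's NULL-safe inequality; other engines spell it
    differently (for example IS DISTINCT FROM).
    """
    query = (
        "SELECT COUNT(*) FROM coconut4metfrag_april "
        "WHERE coconut_id IS NOT coconut_id_1"
    )
    (count,) = conn.execute(query).fetchone()
    return count


if __name__ == "__main__":
    db = build_demo_db()
    print(lookup_by_inchikey(db, "RYYVLZVUVIJVGH-UHFFFAOYSA-N"))
    print(count_id_mismatches(db))
```

On the real table, a mismatch count of zero would suggest that `coconut_id_1` is a duplicate of `coconut_id`; that has not been verified here.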