Compounds For Studying Environmental Exposures by Kaggle | Academic Research

About this Dataset

PubChemLite: Annotation Categories for Translational and Applied Research

The PubChemLite Compound Collection for Exposomics compiles over 371,000 chemicals from a wide range of application domains. For each compound it provides molecular structure and composition (formula, SMILES, InChI, InChIKey, monoisotopic mass), counts of PubMed and patent references, and annotation categories covering agrochemical use, biological pathways, drugs and medicines, food, pharmacology, safety, toxicity, known uses, and associated disorders and diseases. These categories span fields from oncology and drug discovery to nutrition and toxicology. With this range of annotations, the collection can help researchers study how the environment affects human health and pursue evidence-based questions in exposomics.

How to use the dataset

This dataset supports research in fields including oncology, drug discovery, food and nutrition, and toxicology. It can be used to explore relationships between chemicals and their reported biological effects.

To use the PubChemLite Compound Collection for Exposomics effectively, follow these steps:

Familiarize yourself with the columns in the dataset. The table has 23 columns, covering compound identifiers, structure, literature counts and annotation categories for each chemical. Knowing which columns are most relevant helps you focus your investigation on specific areas of interest.
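A minimal sketch of this first step, using only Python's standard library: it reads the CSV header and compares it, ignoring letter case, with the 23 column names from the table schema at the end of this page. The file name comes from the Columns section; the helper names `read_header` and `missing_columns` are hypothetical, and the file is assumed to sit in the working directory.

```python
import csv
from pathlib import Path

# Column names as listed in the CREATE TABLE schema on this page (lowercase).
SCHEMA_COLUMNS = [
    "identifier", "firstblock", "pubmed_count", "patent_count", "related_cids",
    "synonym", "molecularformula", "smiles", "inchi", "inchikey",
    "monoisotopicmass", "compoundname", "annotypecount", "agrocheminfo",
    "biopathway", "drugmedicinfo", "foodrelated", "pharmacoinfo",
    "safetyinfo", "toxicityinfo", "knownuse", "disorderdisease",
    "identification",
]


def read_header(path):
    """Return the first row of a CSV file, i.e. its column names."""
    # utf-8-sig strips a byte-order mark if the file has one.
    with open(path, newline="", encoding="utf-8-sig") as handle:
        return next(csv.reader(handle))


def missing_columns(header):
    """Return schema columns absent from `header`, ignoring letter case."""
    present = {name.strip().lower() for name in header}
    return [name for name in SCHEMA_COLUMNS if name not in present]


if __name__ == "__main__":
    path = Path("PubChemLite_31Oct2020_exposomics.csv")  # assumed local copy
    if path.exists():
        header = read_header(path)
        print(len(header), "columns")
        print("missing:", missing_columns(header))
```

Case-insensitive matching is used because the column table writes names in CamelCase (e.g. `PubMed_Count`) while the schema lowercases them.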

Analyze each column according to its data type. Columns differ in format: PubMed_Count, for example, holds integers, while SMILES holds text. Make sure you understand how these types affect how you interpret the data and which analysis techniques apply. Also decide whether rows should be filtered by specific criteria before you investigate them individually.
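A minimal sketch of type-aware parsing and filtering, again with the standard library only. It assumes the column types given in the CREATE TABLE schema on this page (BIGINT columns parsed as integers, `monoisotopicmass` as a float, everything else as text) and treats empty fields as missing rather than zero. The helper names and the default threshold are hypothetical.

```python
import csv

# Types taken from the CREATE TABLE schema on this page (lowercase names).
INTEGER_COLUMNS = {
    "identifier", "pubmed_count", "patent_count", "annotypecount",
    "agrocheminfo", "biopathway", "drugmedicinfo", "foodrelated",
    "pharmacoinfo", "safetyinfo", "toxicityinfo", "knownuse",
    "disorderdisease", "identification",
}
FLOAT_COLUMNS = {"monoisotopicmass"}


def parse_row(raw):
    """Convert one csv.DictReader row to typed values keyed by lowercase name.

    Empty strings become None so that missing values are not mistaken for 0.
    """
    row = {}
    for key, value in raw.items():
        if key is None:
            continue  # extra fields beyond the header are ignored
        name = key.strip().lower()
        value = (value or "").strip()
        if value == "":
            row[name] = None
        elif name in INTEGER_COLUMNS:
            row[name] = int(float(value))  # also accepts values written as "3.0"
        elif name in FLOAT_COLUMNS:
            row[name] = float(value)
        else:
            row[name] = value
    return row


def iter_rows(path):
    """Stream typed rows from the CSV without loading the whole file."""
    with open(path, newline="", encoding="utf-8-sig") as handle:
        for raw in csv.DictReader(handle):
            yield parse_row(raw)


def filter_rows(rows, min_pubmed=1):
    """Keep rows whose PubMed count is known and at least `min_pubmed`."""
    return [r for r in rows
            if r.get("pubmed_count") is not None and r["pubmed_count"] >= min_pubmed]
```

Streaming with `iter_rows` keeps memory use low for the roughly 70 MB file; `filter_rows(iter_rows(path), min_pubmed=10)` would, for example, keep only compounds with at least ten PubMed references.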

Use visualization to explore patterns within individual variables or relationships between them. Plotting libraries such as seaborn, for example with box plots, can be used where suitable.
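A box plot is drawn from a five-number summary (minimum, first quartile, median, third quartile, maximum). As a sketch, the standard-library function below computes those numbers for any numeric column, for example `monoisotopicmass`, so a distribution can be checked before plotting. The choice of `method="inclusive"` is an assumption made because it matches linear-interpolation percentiles; note that seaborn's box plots by default end the whiskers at the furthest points within 1.5 times the interquartile range, so the minimum and maximum here are not necessarily the whisker ends.

```python
import statistics


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) for a sequence of numbers.

    None entries (missing values) are skipped. method="inclusive" interpolates
    linearly between data points, the same convention as NumPy's default
    percentile calculation.
    """
    data = sorted(v for v in values if v is not None)
    if len(data) < 2:
        raise ValueError("need at least two non-missing values")
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return data[0], q1, median, q3, data[-1]
```

For example, `five_number_summary([5, 1, None, 3, 2, 4])` skips the missing value and returns `(1, 2.0, 3.0, 4.0, 5)`.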

Research Ideas

Developing personalized nutrition plans by correlating individual food intake with the associated chemical compounds, to better understand nutrient absorption and health effects.

Studying reproducibility in drug discovery and drug safety through detailed analysis of the PubMed, patent and toxicity information linked to each compound in the dataset.

Identifying new opportunities for agrochemical research and product development using the AgroChemInfo annotation data linked to key compounds in the dataset.
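Each of these ideas starts by selecting compounds that carry annotations in a given category. A minimal sketch, assuming the annotation columns hold integer counts (as the BIGINT types in the schema on this page indicate) and that rows are already dictionaries keyed by lowercase column name; `top_annotated` is a hypothetical helper, and ranking by PubMed count and then patent count is just one possible choice.

```python
def top_annotated(rows, category, limit=10):
    """Return up to `limit` rows with a positive count in `category`.

    `category` is a lowercase annotation column such as "foodrelated",
    "toxicityinfo" or "agrocheminfo". Rows are ranked by PubMed count, then
    patent count, with missing counts treated as 0 for ranking only.
    """
    hits = [r for r in rows if (r.get(category) or 0) > 0]
    hits.sort(
        key=lambda r: (r.get("pubmed_count") or 0, r.get("patent_count") or 0),
        reverse=True,
    )
    return hits[:limit]
```

Swapping `category` between `"foodrelated"`, `"toxicityinfo"` and `"agrocheminfo"` gives a starting list for each of the three ideas.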

Acknowledgements

If you use this dataset in your research, please credit the original authors.
Data Source

License

License: CC0 1.0 Universal (CC0 1.0) - Public Domain Dedication
No Copyright - You can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission.

Columns

File: PubChemLite_31Oct2020_exposomics.csv

Column name	Description
FirstBlock	A unique identifier for each compound entry: the first block of its InChIKey. (String)
PubMed_Count	The number of times the chemical compound has been mentioned in PubMed. (Integer)
Patent_Count	The number of times the chemical compound has been mentioned in patents. (Integer)
Synonym	A list of alternative names for the chemical compound. (String)
MolecularFormula	The molecular formula of the chemical compound. (String)
SMILES	The simplified molecular-input line-entry system representation of the chemical compound. (String)
InChI	The International Chemical Identifier of the chemical compound. (String)
InChIKey	The InChIKey of the chemical compound. (String)
MonoisotopicMass	The monoisotopic mass of the chemical compound, in daltons (Da). (Float)
CompoundName	The name of the chemical compound. (String)
AnnoTypeCount	The number of annotation types associated with the chemical compound. (Integer)
AgroChemInfo	Number of annotations on the compound's use in agriculture. (Integer)
BioPathway	Number of annotations on biological pathways associated with the compound. (Integer)
DrugMedicInfo	Number of annotations on the compound's use in drugs and medicines. (Integer)
FoodRelated	Number of annotations on the compound's use in food. (Integer)
PharmacoInfo	Number of annotations on the compound's pharmacological properties. (Integer)
SafetyInfo	Number of annotations on the compound's safety. (Integer)
ToxicityInfo	Number of annotations on the compound's toxicity. (Integer)
KnownUse	Number of annotations on the compound's known uses. (Integer)
DisorderDisease	Number of annotations on disorders and diseases associated with the compound. (Integer)
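A standard InChIKey has three hyphen-separated blocks: 14 characters hashing the connectivity (skeleton) layer, 10 characters covering the remaining layers plus standard and version flags, and a one-character protonation indicator. Assuming FirstBlock is that first 14-character block, as its name suggests, a quick consistency check between the FirstBlock and InChIKey columns might look like this; both function names are hypothetical.

```python
import re

# Standard InChIKey layout: 14 uppercase letters, 10 uppercase letters, 1 letter.
INCHIKEY_PATTERN = re.compile(r"^[A-Z]{14}-[A-Z]{10}-[A-Z]$")


def first_block(inchikey):
    """Return the 14-character connectivity block of a standard InChIKey."""
    key = inchikey.strip()
    if not INCHIKEY_PATTERN.match(key):
        raise ValueError(f"not a standard InChIKey: {inchikey!r}")
    return key.split("-", 1)[0]


def firstblock_matches(firstblock, inchikey):
    """True if `firstblock` equals the first block of `inchikey`.

    Malformed keys count as a mismatch instead of raising, so the check can
    be run over every row to flag inconsistent entries.
    """
    try:
        return firstblock == first_block(inchikey)
    except ValueError:
        return False
```

For water (InChIKey `XLYOFNOQVPJJNP-UHFFFAOYSA-N`), `first_block` returns `XLYOFNOQVPJJNP`.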

Tables

Pubchemlite 31oct2020 Exposomics

@kaggle.thedevastator_pubchemlite_compound_collection_for_exposomics_3.pubchemlite_31oct2020_exposomics

70.4 MB
371663 rows
23 columns


CREATE TABLE pubchemlite_31oct2020_exposomics (
  "identifier" BIGINT,
  "firstblock" VARCHAR,
  "pubmed_count" BIGINT,
  "patent_count" BIGINT,
  "related_cids" VARCHAR,
  "synonym" VARCHAR,
  "molecularformula" VARCHAR,
  "smiles" VARCHAR,
  "inchi" VARCHAR,
  "inchikey" VARCHAR,
  "monoisotopicmass" DOUBLE,
  "compoundname" VARCHAR,
  "annotypecount" BIGINT,
  "agrocheminfo" BIGINT,
  "biopathway" BIGINT,
  "drugmedicinfo" BIGINT,
  "foodrelated" BIGINT,
  "pharmacoinfo" BIGINT,
  "safetyinfo" BIGINT,
  "toxicityinfo" BIGINT,
  "knownuse" BIGINT,
  "disorderdisease" BIGINT,
  "identification" BIGINT
);
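Because the schema above is plain SQL, one way to query the CSV without a database server is Python's built-in `sqlite3` module; SQLite accepts the BIGINT, VARCHAR and DOUBLE type names through its type-affinity rules. This sketch, with hypothetical function names, creates the table in memory from a reflowed copy of the DDL, loads CSV rows by matching header names to schema columns case-insensitively, and counts compounds with at least one toxicity annotation (assuming `toxicityinfo` holds a count, as its BIGINT type suggests).

```python
import csv
import sqlite3

TABLE = "pubchemlite_31oct2020_exposomics"

# DDL reflowed from the schema above; SQLite gives BIGINT columns INTEGER
# affinity, VARCHAR columns TEXT affinity and DOUBLE columns REAL affinity.
DDL = f"""
CREATE TABLE {TABLE} (
  "identifier" BIGINT, "firstblock" VARCHAR, "pubmed_count" BIGINT,
  "patent_count" BIGINT, "related_cids" VARCHAR, "synonym" VARCHAR,
  "molecularformula" VARCHAR, "smiles" VARCHAR, "inchi" VARCHAR,
  "inchikey" VARCHAR, "monoisotopicmass" DOUBLE, "compoundname" VARCHAR,
  "annotypecount" BIGINT, "agrocheminfo" BIGINT, "biopathway" BIGINT,
  "drugmedicinfo" BIGINT, "foodrelated" BIGINT, "pharmacoinfo" BIGINT,
  "safetyinfo" BIGINT, "toxicityinfo" BIGINT, "knownuse" BIGINT,
  "disorderdisease" BIGINT, "identification" BIGINT
)
"""


def create_db():
    """Open an in-memory SQLite database containing the empty table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(DDL)
    return conn


def load_csv(conn, lines):
    """Insert rows from an iterable of CSV lines (for example an open file).

    Header names are lowercased to match the schema; empty fields are stored
    as NULL, and column affinity converts numeric text to numbers.
    """
    reader = csv.DictReader(lines)
    columns = [name.strip().lower() for name in reader.fieldnames]
    quoted = ", ".join(f'"{c}"' for c in columns)
    placeholders = ", ".join("?" for _ in columns)
    sql = f"INSERT INTO {TABLE} ({quoted}) VALUES ({placeholders})"
    rows = ([raw[key] or None for key in reader.fieldnames] for raw in reader)
    conn.executemany(sql, rows)
    conn.commit()


def count_with_toxicity(conn):
    """Count compounds whose toxicityinfo count is greater than zero."""
    (count,) = conn.execute(
        f'SELECT COUNT(*) FROM {TABLE} WHERE "toxicityinfo" > 0'
    ).fetchone()
    return count
```

With the real file, `load_csv(conn, open("PubChemLite_31Oct2020_exposomics.csv", newline="", encoding="utf-8-sig"))` would load all rows; a header name that does not appear in the schema makes the INSERT fail, which doubles as a schema check.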