Compounds For Studying Environmental Exposures
PubChemLite: Annotation Categories for Translational and Applied Research
@kaggle.thedevastator_pubchemlite_compound_collection_for_exposomics_3
The PubChemLite Compound Collection for Exposomics is a compilation of over 371,000 chemicals drawn from a wide range of application domains. It provides data on molecular structure and composition, annotation categories, and chemical functionality, along with information about associated disorders and diseases. It spans fields from oncology to drug discovery and from nutrition to toxicology, and each substance is enriched with PubMed and patent information. The collection also includes safety information on the pharmacological effects of each compound, as well as its toxicity profile in vitro or when metabolised by the liver. For food-related substances, the FoodRelated field provides further details on whether their use is suitable for human consumption. With this range of annotation categories, the collection can give insight into how the environment affects human health and gives researchers evidence-backed source data for pursuing questions in exposomics.
This dataset is a resource for research in a range of fields, including oncology, drug discovery, food and nutrition, and toxicology. It can be used to explore the relationships between chemicals and their biological effects.
To use the PubChemLite Compound Collection for Exposomics effectively, follow these key steps:
Familiarize yourself with the columns in the dataset. The column table below documents 20 of them, and the table schema lists 23; together they cover identifiers, literature and patent counts, structure representations, and the annotation types attached to each chemical compound. Knowing which columns are most relevant lets you focus your investigation on specific areas of interest.
Analyze each column according to its type. Columns hold different data types (e.g., integer values for PubMed_Count), and the type determines which analysis techniques apply and how to interpret the results. Also check whether the rows need to be filtered by some criterion before you investigate them individually.
Use visualization tools to explore patterns within individual variables or relationships between them. Plotting libraries such as seaborn can be used where suitable, for example to draw box plots.
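The steps above can be sketched in Python using only the standard library. This is a minimal sketch under stated assumptions, not the dataset authors' workflow: the sample rows are invented, `load_rows` and `five_number_summary` are hypothetical helper names, and the header names are assumed to match the column table. It coerces the numeric columns, filters rows, and computes the five values a box plot would draw.

```python
import csv
import io
import statistics

# Hypothetical excerpt; every value below is made up for illustration only.
SAMPLE_CSV = """FirstBlock,PubMed_Count,Patent_Count,MonoisotopicMass,CompoundName
AAAAAAAAAAAAAA,120,15,180.0634,Example compound A
BBBBBBBBBBBBBB,0,3,46.0419,Example compound B
CCCCCCCCCCCCCC,45,,194.0804,Example compound C
DDDDDDDDDDDDDD,7,210,151.0633,Example compound D
"""

INT_COLUMNS = ("PubMed_Count", "Patent_Count")
FLOAT_COLUMNS = ("MonoisotopicMass",)


def load_rows(handle):
    """Read CSV rows and coerce numeric columns; empty cells become None."""
    rows = []
    for raw in csv.DictReader(handle):
        row = dict(raw)
        for col in INT_COLUMNS:
            row[col] = int(raw[col]) if raw[col] else None
        for col in FLOAT_COLUMNS:
            row[col] = float(raw[col]) if raw[col] else None
        rows.append(row)
    return rows


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max), the numbers a box plot draws."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return (min(values), q1, median, q3, max(values))


rows = load_rows(io.StringIO(SAMPLE_CSV))
# Filter before analysing: keep compounds with at least one PubMed mention.
cited = [r for r in rows if r["PubMed_Count"]]
summary = five_number_summary([r["PubMed_Count"] for r in cited])
print(len(rows), len(cited), summary)
```

To try this on the real file, pass an open file handle to `load_rows` instead of the in-memory sample, for example `open("PubChemLite_31Oct2020_exposomics.csv", newline="", encoding="utf-8")`; the encoding is an assumption.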
- Developing a personalized nutrition plan by correlating individual food intake to the associated chemical compounds for better understanding of nutrient absorption and health effects.
- Understanding reproducibility in drug-discovery and drug safety with detailed analysis of PubMed, Patent and Toxicity information linked to each compound in the dataset.
- Identifying new opportunities for agrochemical research and product development through the AgroChemInfo annotation data linked to key compounds in the dataset.
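As an illustration of the last idea, the sketch below shortlists compounds that carry an agrochemical annotation and ranks them by patent count. It rests on assumptions: the rows are invented, `shortlist` is a hypothetical helper, and a value greater than zero in an annotation column is taken to mean the compound has at least one annotation in that category.

```python
# Hypothetical rows; names and numbers are invented for illustration.
compounds = [
    {"CompoundName": "Example A", "AgroChemInfo": 2, "Patent_Count": 40, "PubMed_Count": 310},
    {"CompoundName": "Example B", "AgroChemInfo": 0, "Patent_Count": 900, "PubMed_Count": 12},
    {"CompoundName": "Example C", "AgroChemInfo": 1, "Patent_Count": 75, "PubMed_Count": 8},
]


def shortlist(rows, category, top_n=10):
    """Return rows annotated in `category`, most-patented first.

    Assumes a value greater than zero means the compound carries at
    least one annotation in that category; missing values count as zero.
    """
    annotated = [r for r in rows if (r.get(category) or 0) > 0]
    ranked = sorted(annotated, key=lambda r: r["Patent_Count"], reverse=True)
    return ranked[:top_n]


agro = shortlist(compounds, "AgroChemInfo")
print([r["CompoundName"] for r in agro])
```

The same helper works for any other annotation column, such as `FoodRelated` or `ToxicityInfo`, by changing the `category` argument.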
If you use this dataset in your research, please credit the original authors.
Data Source
License: CC0 1.0 Universal (CC0 1.0) - Public Domain Dedication
No Copyright - You can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission. See Other Information.
File: PubChemLite_31Oct2020_exposomics.csv
| Column name | Description |
|---|---|
| FirstBlock | A unique identifier for each chemical compound. (String) |
| PubMed_Count | The number of times the chemical compound has been mentioned in PubMed. (Integer) |
| Patent_Count | The number of times the chemical compound has been mentioned in patents. (Integer) |
| Synonym | A list of alternative names for the chemical compound. (String) |
| MolecularFormula | The molecular formula of the chemical compound. (String) |
| SMILES | The simplified molecular-input line-entry system representation of the chemical compound. (String) |
| InChI | The International Chemical Identifier of the chemical compound. (String) |
| InChIKey | The InChIKey of the chemical compound. (String) |
| MonoisotopicMass | The monoisotopic mass of the chemical compound. (Float) |
| CompoundName | The name of the chemical compound. (String) |
| AnnoTypeCount | The number of annotation types associated with the chemical compound. (Integer) |
| AgroChemInfo | Information related to the use of the chemical compound in agriculture. (Integer) |
| BioPathway | Information related to the biological pathways associated with the chemical compound. (Integer) |
| DrugMedicInfo | Information related to the use of the chemical compound in drugs and medicines. (Integer) |
| FoodRelated | Information related to the use of the chemical compound in food. (Integer) |
| PharmacoInfo | Information related to the pharmacological properties of the chemical compound. (Integer) |
| SafetyInfo | Information related to the safety of the chemical compound. (Integer) |
| ToxicityInfo | Information related to the toxicity of the chemical compound. (Integer) |
| KnownUse | Information related to the known uses of the chemical compound. (Integer) |
| DisorderDisease | Information related to the disorders and diseases associated with the chemical compound. (Integer) |
CREATE TABLE pubchemlite_31oct2020_exposomics (
"identifier" BIGINT,
"firstblock" VARCHAR,
"pubmed_count" BIGINT,
"patent_count" BIGINT,
"related_cids" VARCHAR,
"synonym" VARCHAR,
"molecularformula" VARCHAR,
"smiles" VARCHAR,
"inchi" VARCHAR,
"inchikey" VARCHAR,
"monoisotopicmass" DOUBLE,
"compoundname" VARCHAR,
"annotypecount" BIGINT,
"agrocheminfo" BIGINT,
"biopathway" BIGINT,
"drugmedicinfo" BIGINT,
"foodrelated" BIGINT,
"pharmacoinfo" BIGINT,
"safetyinfo" BIGINT,
"toxicityinfo" BIGINT,
"knownuse" BIGINT,
"disorderdisease" BIGINT,
"identification" BIGINT
);
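A table with this schema can be queried with ordinary SQL. The sketch below uses Python's built-in `sqlite3` module with an in-memory database as a stand-in for whatever engine actually hosts the table. It recreates only a subset of the columns, inserts made-up rows, and selects compounds that have both food-related and toxicity annotations; treating a value greater than zero as "annotated" is an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Subset of the columns from the schema; SQLite accepts these type names.
conn.execute(
    """
    CREATE TABLE pubchemlite_31oct2020_exposomics (
        "identifier" BIGINT,
        "compoundname" VARCHAR,
        "monoisotopicmass" DOUBLE,
        "pubmed_count" BIGINT,
        "toxicityinfo" BIGINT,
        "foodrelated" BIGINT
    )
    """
)
# Made-up rows for illustration only; they are not taken from the dataset.
conn.executemany(
    "INSERT INTO pubchemlite_31oct2020_exposomics VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, "Example A", 180.0634, 120, 3, 1),
        (2, "Example B", 46.0419, 0, 0, 2),
        (3, "Example C", 194.0804, 45, 1, 0),
    ],
)
# Compounds annotated in both categories, most-cited first.
query = """
    SELECT compoundname, pubmed_count
    FROM pubchemlite_31oct2020_exposomics
    WHERE foodrelated > 0 AND toxicityinfo > 0
    ORDER BY pubmed_count DESC
"""
result = conn.execute(query).fetchall()
print(result)
conn.close()
```

The `SELECT` statement itself is plain SQL and should carry over to other engines with little or no change.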