Baselight

Picache-McLean CCS Database

3800 CCS Values for Compound Characterization

@kaggle.thedevastator_pichache_mclean_ccs_database

About this Dataset

This dataset contains 3800 experimental collision cross section (CCS) values collected by researchers Jackie Picache and John McLean of Vanderbilt University. The values were generated using drift tube MS and represent the CCS of different compounds. Each measurement is annotated with factors that influence CCS, such as adduct type and charge, and with chemical classification such as kingdom and superclass. The files also include the molecular formula, CAS number, InChIKey, InChI string, PubChem CID, canonical and isomeric SMILES strings, and XLogP values, together with m/z ratios, peak numbers, and the collision cross sections themselves.

How to use the dataset

Guide: How to use the Picache-McLean CCS Database

This dataset contains 3800 experimental collision cross section (CCS) values collected by researchers Jackie Picache and John McLean of Vanderbilt University. These experimental values help users better understand the structure and properties of various compounds and build more effective models for ion mobility studies, where CCS reflects the effective size and shape of an ion in the gas phase.

In order to make optimal use of this dataset, it is important to understand the various columns present in it. This guide provides an overview of these columns and gives examples on how they can be used to get meaningful insights from the data.

The MolecularFormula column gives the elemental composition of a compound (e.g., C6H12O6); it lists the atoms present but not how they are connected. The CanonicalSMILES column holds a SMILES string, a widely used line notation that encodes a molecule's connectivity so that software can store and retrieve chemical structures efficiently. The IsomericSMILES column additionally encodes stereochemistry: stereoisomers share the same connectivity pattern, and so have the same canonical SMILES, but differ in the spatial arrangement of their atoms (e.g., left-handed vs. right-handed forms), so their isomeric SMILES strings differ.

The InChI and InChIKey columns contain the International Chemical Identifier (InChI) string and the fixed-length key derived from it. Both are meant to identify a chemical substance uniquely, including among compounds that contain the same elements but have different structures; two entries with the same InChIKey are treated as the same molecule. IUPACName gives the systematic name of a compound (for example, of caffeine), which helps when the molecular formula alone does not distinguish it from other compounds.

XLogP entries describe the hydrophobicity of a compound as a log octanol-water partition coefficient. ExactMass gives the exact mass of the molecule, which supports more precise calculations and experiment design than a manually estimated mass.
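The difference between the two SMILES columns can be checked directly. The following is a minimal sketch, assuming the CSV header uses the column names listed for S50_CCSCOMPEND_Substances.csv; the two sample rows and the helper name `stereo_annotated` are illustrative and not taken from the dataset.

```python
import csv
import io

# Illustrative rows only (not copied from the dataset). The header names are
# assumed to match the column list for S50_CCSCOMPEND_Substances.csv.
SAMPLE_CSV = """Title,MolecularFormula,CanonicalSMILES,IsomericSMILES
Caffeine,C8H10N4O2,CN1C=NC2=C1C(=O)N(C(=O)N2C)C,CN1C=NC2=C1C(=O)N(C(=O)N2C)C
L-Alanine,C3H7NO2,CC(C(=O)O)N,C[C@@H](C(=O)O)N
"""


def stereo_annotated(rows):
    """Return titles of rows whose isomeric SMILES differs from the canonical one."""
    return [r["Title"] for r in rows if r["IsomericSMILES"] != r["CanonicalSMILES"]]


# Hypothetical helper usage: parse the sample text as if it were the CSV file.
rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
print(stereo_annotated(rows))
```

To run this against the real file, the `io.StringIO(SAMPLE_CSV)` argument would be replaced with an open file handle, provided the header names match.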

Research Ideas

  • Using the collision cross section (CCS) values to predict physical properties of compounds such as boiling point, melting point and partition coefficient.
  • Utilizing the SMILES strings to train an AI/ML model that can quickly and accurately predict CCS values for each compound.
  • Developing a web-based interface utilizing the InChIKeys to allow researchers to search and discover related compounds with similar collision cross sections.
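The search idea in the last item could start from a simple tolerance window around a target CCS. The sketch below is one possible approach under stated assumptions: the function name `similar_ccs`, the 2 percent default tolerance, and the placeholder keys are all hypothetical and not part of the dataset.

```python
def similar_ccs(records, target_ccs, tolerance_pct=2.0):
    """Return records whose CCS lies within tolerance_pct percent of target_ccs.

    Results are ordered from the closest match to the farthest.
    """
    window = target_ccs * tolerance_pct / 100.0
    hits = [r for r in records if abs(r["ccs"] - target_ccs) <= window]
    return sorted(hits, key=lambda r: abs(r["ccs"] - target_ccs))


# Placeholder records: the keys and CCS values are made up for illustration.
records = [
    {"inchikey": "KEY-A", "ccs": 150.0},
    {"inchikey": "KEY-B", "ccs": 152.0},
    {"inchikey": "KEY-C", "ccs": 160.0},
]

matches = similar_ccs(records, target_ccs=151.0, tolerance_pct=2.0)
print([m["inchikey"] for m in matches])
```

In practice it would probably make sense to compare only ions with the same adduct and charge, since CCS values for different ion forms of one compound can differ.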

Acknowledgements

If you use this dataset in your research, please credit the original authors.

License

License: CC0 1.0 Universal (CC0 1.0) - Public Domain Dedication
No Copyright - You can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission. See Other Information.

Columns

File: S50_CCSCOMPEND_Substances.csv

Column name       Description
MolecularFormula  The molecular formula of the compound. (String)
CanonicalSMILES   The canonical SMILES string representation of the compound. (String)
IsomericSMILES    The isomeric SMILES string representation of the compound. (String)
InChI             The InChI representation of the compound. (String)
InChIKey          The InChIKey representation of the compound. (String)
IUPACName         The IUPAC name of the compound. (String)
XLogP             The log octanol-water partition coefficient of the compound. (Float)
ExactMass         The exact mass of the compound. (Float)
Title             The title of the compound. (String)

File: 20190304JAP_CCSdatabase_final_ed.csv

Column name        Description
Compound           The name of the compound. (String)
Neutral.Formula    The molecular formula of the neutral compound. (String)
CAS                The CAS registry number of the compound. (String)
CAS_RN             The CAS registry number of the compound. (String)
InChIKey_original  The InChIKey of the compound. (String)
InChIKey_parent    The InChIKey of the parent compound. (String)
SMILES             The SMILES string representation of the compound. (String)
mz                 The m/z value of the measured ion. (Float)
Adduct             The adduct type of the measured ion. (String)
Charge             The charge of the measured ion. (Integer)
CCS                The collision cross section of the compound. (Float)
SD                 The standard deviation of the CCS. (Float)
RSD                The relative standard deviation of the CCS. (Float)
CCS.z              The normalized CCS of the compound. (Float)
Peak.N             The peak number of the compound. (Integer)
Kingdom            The kingdom name of the compound. (String)
Super.Class        The superclass of the compound. (String)
Subclass           The subclass of the compound. (String)
N.Rep              The number of replicate measurements. (Integer)
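The CCS, SD and RSD columns are related by the usual definition of relative standard deviation, RSD = 100 × SD / mean. The sketch below recomputes RSD from the other two columns as a consistency check. It assumes that RSD is stored as a percentage, which the column description does not state; the function names and the sample row are hypothetical.

```python
def relative_sd_percent(ccs, sd):
    """Relative standard deviation in percent: 100 * SD / mean CCS."""
    if ccs <= 0:
        raise ValueError("CCS must be positive")
    return 100.0 * sd / ccs


def rsd_is_consistent(row, tolerance=0.05):
    """Check a row's reported RSD against the value recomputed from CCS and SD.

    Assumes the RSD column holds a percentage (an assumption, not documented).
    """
    expected = relative_sd_percent(row["CCS"], row["SD"])
    return abs(expected - row["RSD"]) <= tolerance


# Made-up example row, for illustration only.
row = {"CCS": 200.0, "SD": 0.5, "RSD": 0.25}
print(relative_sd_percent(row["CCS"], row["SD"]), rsd_is_consistent(row))
```

If the check fails for most rows of the real file, the likely explanation is that RSD is stored as a fraction rather than a percentage, and the factor of 100 should be dropped.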

Tables

N 20190304jap Ccsdatabase Final Ed

@kaggle.thedevastator_pichache_mclean_ccs_database.n_20190304jap_ccsdatabase_final_ed
  • 164.58 KB
  • 1983 rows
  • 22 columns

CREATE TABLE n_20190304jap_ccsdatabase_final_ed (
  "compound" VARCHAR,
  "neutral_formula" VARCHAR,
  "cas" VARCHAR,
  "cas_rn" VARCHAR,
  "inchikey_original" VARCHAR,
  "inchikey_parent" VARCHAR,
  "pubchem_cid" BIGINT,
  "smiles" VARCHAR,
  "mz" DOUBLE,
  "adduct" VARCHAR,
  "charge" BIGINT,
  "ccs" DOUBLE,
  "sd" DOUBLE,
  "rsd" DOUBLE,
  "ccs_z" DOUBLE,
  "peak_n" BIGINT,
  "kingdom" VARCHAR,
  "super_class" VARCHAR,
  "class" VARCHAR,
  "subclass" VARCHAR,
  "n_rep" BIGINT,
  "sources" VARCHAR
);
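A schema like the one above can be explored with ordinary SQL aggregation. The following sketch uses Python's built-in sqlite3 module with an in-memory database as a stand-in for the hosted table. Only a subset of the columns is created, and the inserted rows are made up for illustration.

```python
import sqlite3

# In-memory stand-in for the hosted table; only four of the 22 columns are used.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE n_20190304jap_ccsdatabase_final_ed (
      "compound" VARCHAR,
      "adduct" VARCHAR,
      "charge" BIGINT,
      "ccs" DOUBLE
    )
    """
)

# Made-up rows for illustration; they are not taken from the dataset.
conn.executemany(
    "INSERT INTO n_20190304jap_ccsdatabase_final_ed VALUES (?, ?, ?, ?)",
    [
        ("compound_a", "[M+H]+", 1, 150.0),
        ("compound_b", "[M+H]+", 1, 170.0),
        ("compound_c", "[M-H]-", -1, 140.0),
    ],
)

# Count measurements and average the CCS for each adduct type.
summary = conn.execute(
    """
    SELECT adduct, COUNT(*) AS n, AVG(ccs) AS mean_ccs
    FROM n_20190304jap_ccsdatabase_final_ed
    GROUP BY adduct
    ORDER BY adduct
    """
).fetchall()

for adduct, n, mean_ccs in summary:
    print(adduct, n, mean_ccs)
```

The same SELECT statement should carry over to other SQL engines with little or no change, since it only uses standard grouping and aggregation.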

S50 Ccscompend Substances

@kaggle.thedevastator_pichache_mclean_ccs_database.s50_ccscompend_substances
  • 165.76 KB
  • 869 rows
  • 10 columns

CREATE TABLE s50_ccscompend_substances (
  "cid" BIGINT,
  "molecularformula" VARCHAR,
  "canonicalsmiles" VARCHAR,
  "isomericsmiles" VARCHAR,
  "inchi" VARCHAR,
  "inchikey" VARCHAR,
  "iupacname" VARCHAR,
  "xlogp" DOUBLE,
  "exactmass" DOUBLE,
  "title" VARCHAR
);
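The two tables appear to share a PubChem compound identifier, so measurements could be enriched with substance properties through a join. The sketch below assumes that "pubchem_cid" in the measurements table and "cid" in the substances table refer to the same identifier, which the listing does not confirm. It uses an in-memory SQLite database with a subset of columns and made-up rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Reduced versions of the two schemas; rows are made up for illustration.
    CREATE TABLE n_20190304jap_ccsdatabase_final_ed (
      "compound" VARCHAR, "pubchem_cid" BIGINT, "adduct" VARCHAR, "ccs" DOUBLE
    );
    CREATE TABLE s50_ccscompend_substances (
      "cid" BIGINT, "xlogp" DOUBLE, "exactmass" DOUBLE, "title" VARCHAR
    );
    INSERT INTO n_20190304jap_ccsdatabase_final_ed VALUES
      ('compound_a', 1, '[M+H]+', 150.0),
      ('compound_b', 2, '[M+H]+', 170.0);
    INSERT INTO s50_ccscompend_substances VALUES
      (1, -0.5, 180.0, 'Compound A');
    """
)

# LEFT JOIN keeps measurements that have no matching substance row;
# their xlogp comes back as None. The join key is an assumption.
joined = conn.execute(
    """
    SELECT m.compound, m.ccs, s.xlogp
    FROM n_20190304jap_ccsdatabase_final_ed AS m
    LEFT JOIN s50_ccscompend_substances AS s ON s.cid = m.pubchem_cid
    ORDER BY m.compound
    """
).fetchall()

for compound, ccs, xlogp in joined:
    print(compound, ccs, xlogp)
```

A left join is used here because the substances table has fewer rows (869) than the measurements table (1983), so some measurements may have no matching substance entry.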
