Name: Pichache-McLean CCS Database
Creator: Kaggle
Published: 2025-02-13T08:25:18.915Z
License: https://creativecommons.org/publicdomain/zero/1.0/

3800 CCS Values for Compound Characterization

Pichache-McLean CCS Database

About this dataset

This dataset contains 3800 experimental collision cross section (CCS) values collected by researchers Jackie Picache and John McLean of Vanderbilt University. The values were measured using drift tube ion mobility-mass spectrometry and represent the CCS of different compounds. Each measurement is annotated with factors that influence CCS, from the adduct type and charge of the ion to the chemical kingdom and superclass of the compound. The files also provide the molecular formula, CAS number, InChI and InChIKey identifiers, canonical and isomeric SMILES strings, and XLogP values, along with measurement details such as m/z ratios, peak numbers, and the collision cross sections themselves.

How to use the dataset

Guide: How to use the Picache-McLean CCS Database

This dataset contains 3800 experimental collision cross section (CCS) values collected by researchers Jackie Picache and John McLean of Vanderbilt University. These experimental values help users better understand the structure and properties of various compounds and build more effective models for ion mobility studies, where CCS serves as a measure of the size and shape of a gas-phase ion.

In order to make optimal use of this dataset, it is important to understand the various columns present in it. This guide provides an overview of these columns and gives examples on how they can be used to get meaningful insights from the data.

The first column, MolecularFormula, gives the elemental composition of a compound as its molecular formula (e.g., C6H12O6). The second column, CanonicalSMILES, holds a SMILES string, a widely used line notation for representing molecules that lets software store and retrieve chemical structures compactly. The IsomericSMILES column additionally encodes stereochemistry: stereoisomers, which share the same connectivity but differ in the spatial arrangement of their atoms (e.g., left-handed vs. right-handed forms), have the same SMILES string once stereochemistry is omitted but different isomeric SMILES strings.
The InChI and InChIKey columns contain the International Chemical Identifier (InChI) string of each compound and the fixed-length hashed key derived from it. Both identify a chemical substance, so two entries with the same InChIKey correspond in practice to the same molecular structure, even when other compounds are built from the same elements. The IUPACName column gives the systematic name of the compound, which helps when a common name (e.g., caffeine) or a molecular formula alone would be ambiguous. XLogP entries describe hydrophobicity as a log octanol-water partition coefficient, whereas ExactMass gives the exact mass of a single molecule of the compound; using it instead of a manually estimated mass lets scientists design more precise experiments, since even small mass errors can change results substantially.
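
To make the relationship between MolecularFormula and ExactMass concrete, the following is a minimal sketch that parses a simple formula and sums monoisotopic masses. It assumes that ExactMass means the monoisotopic mass (the column description only says "exact mass"), it handles only plain element-count formulas without brackets or charges, and the names MONOISOTOPIC, parse_formula and exact_mass are hypothetical helpers rather than anything shipped with the dataset.

```python
import re

# Monoisotopic masses (in u) of the most abundant isotope of a few common
# elements. Extend this table before using other elements; an element that
# is missing here raises a KeyError in exact_mass().
MONOISOTOPIC = {
    "C": 12.0,
    "H": 1.00782503,
    "N": 14.00307401,
    "O": 15.99491462,
}

def parse_formula(formula):
    """Return {element: count} for a simple formula such as 'C6H12O6'."""
    counts = {}
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        # A missing number means a count of one, as in the C of 'CH4'.
        counts[element] = counts.get(element, 0) + (int(number) if number else 1)
    return counts

def exact_mass(formula):
    """Sum monoisotopic masses over all atoms of a simple formula."""
    return sum(MONOISOTOPIC[el] * n for el, n in parse_formula(formula).items())

print(round(exact_mass("C6H12O6"), 4))  # 180.0634
```

Comparing such a recomputed value with the ExactMass column is one quick way to check that a formula and a mass entry belong together.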

Research Ideas

Using the collision cross sections (CCS) to predict physical properties of compounds such as boiling point, melting point and partition coefficient.

Utilizing the SMILES strings to create an AI/ML algorithm that can accurately generate CCS data for each compound quickly and easily.

Developing a web-based interface utilizing the InChIKeys to allow researchers to search and discover related compounds with similar collision cross sections.
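
The lookup behind the last idea can be sketched in a few lines. This is only an illustration under stated assumptions: it uses the column names listed for 20190304JAP_CCSdatabase_final_ed.csv (Compound, InChIKey_original, Adduct, CCS), it assumes every row has a numeric CCS entry, it compares only measurements of the same adduct, and the helper names load_rows and similar_ccs as well as the default 2% tolerance are hypothetical choices, not part of the dataset.

```python
import csv

def load_rows(path):
    """Read the CCS table into a list of dicts, one per row."""
    with open(path, newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))

def similar_ccs(rows, inchikey, tolerance_pct=2.0):
    """Find other compounds whose CCS is close to that of the query compound.

    Only measurements with the same adduct are compared. A row is a hit when
    its CCS lies within tolerance_pct percent of the query CCS.
    """
    queries = [r for r in rows if r["InChIKey_original"] == inchikey]
    hits = []
    for query in queries:
        query_ccs = float(query["CCS"])
        for row in rows:
            # Skip the query compound itself and measurements of other adducts.
            if row["InChIKey_original"] == inchikey or row["Adduct"] != query["Adduct"]:
                continue
            if abs(float(row["CCS"]) - query_ccs) <= query_ccs * tolerance_pct / 100.0:
                hits.append((row["Compound"], row["Adduct"], float(row["CCS"])))
    return hits
```

A call such as similar_ccs(load_rows("20190304JAP_CCSdatabase_final_ed.csv"), some_inchikey) would return (Compound, Adduct, CCS) tuples. A web interface would wrap the same search behind a form field for the InChIKey.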

Acknowledgements

If you use this dataset in your research, please credit the original authors.

License

License: CC0 1.0 Universal (CC0 1.0) - Public Domain Dedication
No Copyright - You can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission. See Other Information.

Columns

File: S50_CCSCOMPEND_Substances.csv

Column name	Description
MolecularFormula	The molecular formula of the compound. (String)
CanonicalSMILES	The canonical SMILES string representation of the compound. (String)
IsomericSMILES	The isomeric SMILES string representation of the compound. (String)
InChI	The InChI representation of the compound. (String)
InChIKey	The InChIKey representation of the compound. (String)
IUPACName	The IUPAC name of the compound. (String)
XLogP	The log octanol-water partition coefficient of the compound. (Float)
ExactMass	The exact mass of the compound. (Float)
Title	The title of the compound. (String)

File: 20190304JAP_CCSdatabase_final_ed.csv

Column name	Description
Compound	The name of the compound. (String)
Neutral.Formula	The molecular formula of the compound. (String)
CAS	The CAS registry number of the compound. (String)
CAS_RN	The CAS registry number of the compound. (String)
InChIKey_original	The InChIKey of the compound. (String)
InChIKey_parent	The InChIKey of the parent compound. (String)
SMILES	The SMILES string representation of the compound. (String)
mz	The mass-to-charge ratio (m/z) of the measured ion. (Float)
Adduct	The adduct type of the measured ion. (String)
Charge	The charge of the measured ion. (Integer)
CCS	The collision cross section of the compound. (Float)
SD	The standard deviation of the CCS. (Float)
RSD	The relative standard deviation of the CCS. (Float)
CCS.z	The charge-normalized CCS of the compound. (Float)
Peak.N	The peak number of the compound. (Integer)
Kingdom	The kingdom name of the compound. (String)
Super.Class	The superclass of the compound. (String)
Subclass	The subclass of the compound. (String)
N.Rep	The number of replicates of the compound. (Integer)
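
Several of these columns are derived from others, which allows a simple sanity check on each row. The sketch below rests on two assumptions that the column descriptions do not confirm: that RSD is expressed in percent (100 × SD / CCS) and that CCS.z is the CCS divided by the absolute charge. The function names and the 1% comparison tolerance are hypothetical.

```python
import math

def relative_sd(ccs, sd):
    """Relative standard deviation in percent (assumed convention)."""
    return 100.0 * sd / ccs

def charge_normalized_ccs(ccs, charge):
    """CCS divided by the absolute charge (assumed meaning of CCS.z)."""
    return ccs / abs(charge)

def row_is_consistent(row, rel_tol=0.01):
    """Compare reported RSD and CCS.z with values recomputed from the row.

    `row` is a dict keyed by the column names above, with string values
    as produced by csv.DictReader.
    """
    ccs = float(row["CCS"])
    sd = float(row["SD"])
    charge = int(float(row["Charge"]))
    return (
        math.isclose(float(row["RSD"]), relative_sd(ccs, sd), rel_tol=rel_tol)
        and math.isclose(
            float(row["CCS.z"]), charge_normalized_ccs(ccs, charge), rel_tol=rel_tol
        )
    )

# Made-up numbers for illustration only, not taken from the dataset.
print(relative_sd(200.0, 0.5))             # 0.25
print(charge_normalized_ccs(200.0, -2))    # 100.0
```

If many rows fail the check, the assumed conventions are probably wrong for this file (for example, RSD stored as a fraction, or rounded values that need a looser tolerance) rather than the data being inconsistent.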
