High-Throughput Comp. Screening Of MOFs
Open Metal Sites, Cavity Diameters and Free Paths
@kaggle.thedevastator_high_throughput_comp_screening_of_mofs
This dataset covers metal-organic frameworks (MOFs) whose atomic coordinates enable high-throughput computational screening for a broad range of applications. The structures are derived from the Cambridge Structural Database (CSD) and from other sources on the web, and the files provide useful parameters for each MOF, such as accessible surface area (ASA), non-accessible surface area (NASA), largest cavity diameter (LCD), pore limiting diameter (PLD), and more.
These parameters can help assess the potential of MOFs as materials for chemical separations, chemical transformations, and functional nanoporous applications. If you find errors in the data, a feedback form is available for reporting them. We appreciate your interest in the project and hope you make good use of the data!
This guide introduces the CoRE MOF 2019 dataset and explains how to use it for high-throughput computational screening, with the background needed to work with it effectively.
The CoRE MOF 2019 dataset contains atomic coordinates for MOFs that can be used as input to simulation software packages, enabling high-throughput computational screening. The structures come from both the Cambridge Structural Database (CSD) and other web sources, and the accompanying data help identify which MOFs are suitable for applications in chemical separations, chemical transformations, and functional nanoporous materials.
To use the dataset efficiently, first familiarize yourself with the available columns. They describe the pore geometry of each MOF: largest cavity diameter (LCD), pore limiting diameter (PLD), largest included sphere along the free path (LFPD), accessible and non-accessible surface area (ASA, NASA), and accessible void fraction (AV_VF). There is also metadata such as public availability, overlap with CSD entries in CoRE or the CCDC database, and the DOI where available. The tables below describe the main fields; for the remaining columns, consult the dataset documentation on Kaggle.
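A quick way to see exactly which named columns a file contains is to read its header. The sketch below uses only Python's standard library; the helper names `is_placeholder` and `read_core_csv` are hypothetical, and it assumes that columns without a name appear either as empty header cells or as `Unnamed: N` (as in the schema at the end of this page) and discards them. All values are kept as strings.

```python
import csv
from typing import Dict, List, TextIO, Tuple


def is_placeholder(name: str) -> bool:
    """True for header cells that carry no real column name (empty or 'Unnamed: N')."""
    name = name.strip()
    return name == "" or name.startswith("Unnamed:")


def read_core_csv(handle: TextIO) -> Tuple[List[str], List[Dict[str, str]]]:
    """Read a CoRE MOF CSV, keeping only named columns.

    Returns the list of named columns and one dict per MOF row.
    Values stay as strings; convert them where needed.
    """
    reader = csv.reader(handle)
    header = next(reader)
    keep = [i for i, name in enumerate(header) if not is_placeholder(name)]
    columns = [header[i].strip() for i in keep]
    rows = []
    for record in reader:
        if not any(cell.strip() for cell in record):
            continue  # skip blank lines
        # Pad short records so every kept index exists.
        record = record + [""] * (len(header) - len(record))
        rows.append({header[i].strip(): record[i] for i in keep})
    return columns, rows
```

For example, opening `2019-11-01-ASR-public_12020.csv` with `open(path, newline="")` and passing the handle to `read_core_csv` returns the named columns and one dictionary per MOF.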
Once you are familiar with the columns, download the CSV files from Kaggle. They can be opened in a spreadsheet program such as Microsoft Excel or loaded with any CSV reader. Each row represents one MOF, and each column holds one property of that MOF, stored as a string, integer, float, or Boolean value. For example, ASA_m2_g gives the surface area accessible to a probe molecule, in m2/g. Because every MOF in a file is described by the same set of parameters in the same units, two frameworks can be compared directly on these values, without the preprocessing that is usually needed when combining values from different datasets or from raw simulation output.

After checking that no data were lost or corrupted while loading the file, it is usually better to analyze the entire set at once, for example by filtering and sorting on whole columns, than to loop over individual rows and compare MOFs one pair at a time, even though the row-by-row approach may look simpler at first.
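As a sketch of the whole-set approach, the function below filters the entire table in one pass, keeping every MOF whose PLD exceeds a chosen probe diameter, and ranks the survivors by ASA_m2_g. Using PLD as a simple size cutoff is an illustrative assumption, not a method prescribed by the dataset, and the names `to_float` and `screen_by_pld` are hypothetical. Rows are assumed to be dictionaries keyed by the CSV column names, such as those produced by `csv.DictReader`.

```python
from typing import Dict, Iterable, List, Optional


def to_float(value: Optional[str]) -> Optional[float]:
    """Parse a numeric cell; return None for missing, blank, or non-numeric values."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return None


def screen_by_pld(rows: Iterable[Dict[str, str]], probe_diameter: float) -> List[Dict[str, str]]:
    """Keep MOFs with PLD > probe_diameter, sorted by ASA_m2_g from largest to smallest."""
    parsed = [(to_float(r.get("PLD")), to_float(r.get("ASA_m2_g")), r) for r in rows]
    passing = [
        (asa, r)
        for pld, asa, r in parsed
        if pld is not None and asa is not None and pld > probe_diameter
    ]
    # Sort on the surface area only, so the row dicts themselves are never compared.
    passing.sort(key=lambda pair: pair[0], reverse=True)
    return [r for _, r in passing]
```

For example, `screen_by_pld(csv.DictReader(f), 4.0)` on an open CSV file returns the MOFs with PLD above an arbitrary 4.0 Å cutoff, largest accessible surface area first. Rows with blank or non-numeric values are skipped rather than guessed.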
Some possible research uses for this dataset:
- Create an open-source library of automated simulations for MOFs that generates results quickly and accurately.
- Extend the existing Porous Materials Database (PMD) software with additional data fields drawn from this dataset, so users can search and filter MOFs by specific structural characteristics.
- Develop a web-based interface that lets researchers visualize MOF structures as realistic 3D images built from the atomic data provided in the dataset.
If you use this dataset in your research, please credit the original authors.
Data Source
License: CC0 1.0 Universal (CC0 1.0) - Public Domain Dedication
No Copyright - You can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission. See Other Information.
File: 2019-11-01-ASR-public_12020.csv
| Column name | Description |
|---|---|
| filename | The name of the file containing the MOF data. (String) |
| LCD | The largest cavity diameter of the MOF, in Å. (Float) |
| PLD | The pore limiting diameter of the MOF, in Å. (Float) |
| LFPD | The diameter of the largest included sphere along the free path of the MOF, in Å. (Float) |
| cm3_g | The volume of the MOF in cm3/g. (Float) |
| ASA_m2_cm3 | The accessible surface area of the MOF in m2/cm3. (Float) |
| ASA_m2_g | The accessible surface area of the MOF in m2/g. (Float) |
| NASA_m2_cm3 | The non-accessible surface area of the MOF in m2/cm3. (Float) |
| NASA_m2_g | The non-accessible surface area of the MOF in m2/g. (Float) |
| AV_VF | The accessible void fraction of the MOF. (Float) |
| AV_cm3_g | The accessible volume of the MOF in cm3/g. (Float) |
| NAV_cm3_g | The non-accessible volume of the MOF in cm3/g. (Float) |
| All_Metals | The metals present in the MOF. (String) |
| Has_OMS | Indicates if the MOF has open metal sites. (Boolean) |
| Open_Metal_Sites | The number of open metal sites in the MOF. (Integer) |
| Extension | The file extension of the MOF data. (String) |
| FSR_overlap | Overlap of the MOF with the FSR (free solvent removed) subset of CoRE MOF 2019. (String) |
| from_CSD | Indicates if the MOF is from the Cambridge Structural Database. (Boolean) |
| public | Indicates if the MOF is publicly available. (Boolean) |
| DISORDER | Indicates if the MOF is disordered. (Boolean) |
| CSD_overlap_inCoRE | Overlap of the MOF's CSD entry with the CoRE dataset. (String) |
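The paired columns above are related through unit conversions: dividing ASA_m2_cm3 (m2/cm3) by ASA_m2_g (m2/g) gives the framework density in g/cm3, and multiplying AV_cm3_g by that density should reproduce the dimensionless AV_VF. The sketch below uses these identities as a consistency check on one row. The function names and the 2% relative tolerance (meant to absorb rounding in the CSV) are assumptions, and the check is skipped (returns None) when ASA is zero or a value is missing.

```python
import math
from typing import Dict, Optional


def implied_density(row: Dict[str, str]) -> Optional[float]:
    """Density in g/cm3 implied by ASA_m2_cm3 / ASA_m2_g; None if ASA is zero or missing."""
    try:
        per_volume = float(row["ASA_m2_cm3"])  # m2 per cm3
        per_mass = float(row["ASA_m2_g"])      # m2 per g
    except (KeyError, TypeError, ValueError):
        return None
    if per_mass == 0:
        return None  # no accessible surface, so density cannot be recovered this way
    return per_volume / per_mass


def void_fraction_consistent(row: Dict[str, str], rel_tol: float = 0.02) -> Optional[bool]:
    """Compare AV_VF with AV_cm3_g * density; None when the check cannot be made."""
    density = implied_density(row)
    if density is None:
        return None
    try:
        reported = float(row["AV_VF"])
        specific_volume = float(row["AV_cm3_g"])  # cm3 per g
    except (KeyError, TypeError, ValueError):
        return None
    expected = specific_volume * density  # (cm3/g) * (g/cm3) = dimensionless fraction
    return math.isclose(reported, expected, rel_tol=rel_tol, abs_tol=1e-6)
```

Rows that fail the check are worth inspecting before screening, since they may indicate a parsing or formatting problem.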
File: 2019-11-01-ASR-internal_14142.csv
| Column name | Description |
|---|---|
| filename | The name of the file containing the MOF data. (String) |
| LCD | The largest cavity diameter of the MOF, in Å. (Float) |
| PLD | The pore limiting diameter of the MOF, in Å. (Float) |
| LFPD | The diameter of the largest included sphere along the free path of the MOF, in Å. (Float) |
| cm3_g | The volume of the MOF in cm3/g. (Float) |
| ASA_m2_cm3 | The accessible surface area of the MOF in m2/cm3. (Float) |
| ASA_m2_g | The accessible surface area of the MOF in m2/g. (Float) |
| NASA_m2_cm3 | The non-accessible surface area of the MOF in m2/cm3. (Float) |
| NASA_m2_g | The non-accessible surface area of the MOF in m2/g. (Float) |
| AV_VF | The accessible void fraction of the MOF. (Float) |
| AV_cm3_g | The accessible volume of the MOF in cm3/g. (Float) |
| NAV_cm3_g | The non-accessible volume of the MOF in cm3/g. (Float) |
| All_Metals | The metals present in the MOF. (String) |
| Has_OMS | Indicates if the MOF has open metal sites. (Boolean) |
| Open_Metal_Sites | The number of open metal sites in the MOF. (Integer) |
| Extension | The file extension of the MOF data. (String) |
| FSR_overlap | Overlap of the MOF with the FSR (free solvent removed) subset of CoRE MOF 2019. (String) |
| from_CSD | Indicates if the MOF is from the Cambridge Structural Database. (Boolean) |
| public | Indicates if the MOF is publicly available. (Boolean) |
| DISORDER | Indicates if the MOF is disordered. (Boolean) |
| CSD_overlap_inCoRE | Overlap of the MOF's CSD entry with the CoRE dataset. (String) |
```sql
CREATE TABLE asr_full_unmodified (
"n_0" BIGINT, -- 0
"n_00958972_2016_1250260_1436516_clean" VARCHAR -- 00958972.2016.1250260–1436516 Clean
);
CREATE TABLE fsr_full_unmodified (
"n_1" BIGINT, -- 1
"n_00958972_2016_1250260_1436516_freeonly" VARCHAR -- 00958972.2016.1250260–1436516 FreeONLY
);
CREATE TABLE n_2019_11_01_asr_internal_14142 (
"filename" VARCHAR,
"lcd" DOUBLE,
"pld" DOUBLE,
"lfpd" DOUBLE,
"cm3_g" DOUBLE,
"asa_m2_cm3" DOUBLE,
"asa_m2_g" DOUBLE,
"nasa_m2_cm3" DOUBLE,
"nasa_m2_g" DOUBLE,
"av_vf" DOUBLE,
"av_cm3_g" DOUBLE,
"nav_cm3_g" DOUBLE,
"all_metals" VARCHAR,
"has_oms" VARCHAR,
"open_metal_sites" VARCHAR,
"extension" VARCHAR,
"fsr_overlap" VARCHAR,
"from_csd" VARCHAR,
"public" VARCHAR,
"disorder" VARCHAR,
"csd_overlap_incore" VARCHAR,
"csd_of_wos_incore" VARCHAR,
"csd_overlap_inccdc" VARCHAR,
"date_csd" VARCHAR,
"doi_public" VARCHAR,
"note" VARCHAR,
"matched_csd_of_core" VARCHAR,
"possible_list_csd_of_core" VARCHAR,
"unnamed_28" VARCHAR, -- Unnamed: 28
"unnamed_29" VARCHAR, -- Unnamed: 29
"unnamed_30" VARCHAR, -- Unnamed: 30
"unnamed_31" VARCHAR, -- Unnamed: 31
"unnamed_32" VARCHAR, -- Unnamed: 32
"unnamed_33" VARCHAR, -- Unnamed: 33
"unnamed_34" VARCHAR, -- Unnamed: 34
"unnamed_35" VARCHAR, -- Unnamed: 35
"unnamed_36" VARCHAR, -- Unnamed: 36
"unnamed_37" VARCHAR, -- Unnamed: 37
"unnamed_38" VARCHAR, -- Unnamed: 38
"unnamed_39" VARCHAR, -- Unnamed: 39
"unnamed_40" VARCHAR, -- Unnamed: 40
"unnamed_41" VARCHAR -- Unnamed: 41
);
CREATE TABLE n_2019_11_01_asr_public_12020 (
"filename" VARCHAR,
"lcd" DOUBLE,
"pld" DOUBLE,
"lfpd" DOUBLE,
"cm3_g" DOUBLE,
"asa_m2_cm3" DOUBLE,
"asa_m2_g" DOUBLE,
"nasa_m2_cm3" DOUBLE,
"nasa_m2_g" DOUBLE,
"av_vf" DOUBLE,
"av_cm3_g" DOUBLE,
"nav_cm3_g" DOUBLE,
"all_metals" VARCHAR,
"has_oms" VARCHAR,
"open_metal_sites" VARCHAR,
"extension" VARCHAR,
"fsr_overlap" VARCHAR,
"from_csd" VARCHAR,
"public" VARCHAR,
"disorder" VARCHAR,
"csd_overlap_incore" VARCHAR,
"csd_of_wos_incore" VARCHAR,
"csd_overlap_inccdc" VARCHAR,
"date_csd" VARCHAR,
"doi_public" VARCHAR,
"note" VARCHAR,
"matched_csd_of_core" VARCHAR,
"possible_list_csd_of_core" VARCHAR,
"unnamed_28" VARCHAR, -- Unnamed: 28
"unnamed_29" VARCHAR, -- Unnamed: 29
"unnamed_30" VARCHAR, -- Unnamed: 30
"unnamed_31" VARCHAR, -- Unnamed: 31
"unnamed_32" VARCHAR, -- Unnamed: 32
"unnamed_33" VARCHAR, -- Unnamed: 33
"unnamed_34" VARCHAR, -- Unnamed: 34
"unnamed_35" VARCHAR, -- Unnamed: 35
"unnamed_36" VARCHAR, -- Unnamed: 36
"unnamed_37" VARCHAR, -- Unnamed: 37
"unnamed_38" VARCHAR, -- Unnamed: 38
"unnamed_39" VARCHAR, -- Unnamed: 39
"unnamed_40" VARCHAR, -- Unnamed: 40
"unnamed_41" VARCHAR -- Unnamed: 41
);
CREATE TABLE n_2019_11_01_fsr_internal_overlap_freeonly_9146 (
"filename" VARCHAR,
"lcd" DOUBLE,
"pld" DOUBLE,
"lfpd" DOUBLE,
"cm3_g" DOUBLE,
"asa_m2_cm3" DOUBLE,
"asa_m2_g" DOUBLE,
"nasa_m2_cm3" DOUBLE,
"nasa_m2_g" DOUBLE,
"av_vf" DOUBLE,
"av_cm3_g" DOUBLE,
"nav_cm3_g" DOUBLE,
"all_metals" VARCHAR,
"has_oms" VARCHAR,
"open_metal_sites" VARCHAR,
"extension" VARCHAR,
"from_csd" VARCHAR,
"public" VARCHAR,
"state" VARCHAR,
"csd_overlap_inccdc" VARCHAR,
"date_csd" VARCHAR,
"doi_public" VARCHAR,
"note" VARCHAR,
"matched_csd_of_core" VARCHAR,
"possible_list_csd_of_core" VARCHAR,
"unnamed_25" VARCHAR, -- Unnamed: 25
"unnamed_26" VARCHAR, -- Unnamed: 26
"unnamed_27" VARCHAR, -- Unnamed: 27
"unnamed_28" VARCHAR, -- Unnamed: 28
"unnamed_29" VARCHAR, -- Unnamed: 29
"unnamed_30" VARCHAR, -- Unnamed: 30
"unnamed_31" VARCHAR, -- Unnamed: 31
"unnamed_32" VARCHAR, -- Unnamed: 32
"unnamed_33" VARCHAR, -- Unnamed: 33
"unnamed_34" VARCHAR, -- Unnamed: 34
"unnamed_35" VARCHAR, -- Unnamed: 35
"unnamed_36" VARCHAR, -- Unnamed: 36
"unnamed_37" VARCHAR -- Unnamed: 37
);
CREATE TABLE n_2019_11_01_fsr_public_7061 (
"filename" VARCHAR,
"lcd" DOUBLE,
"pld" DOUBLE,
"lfpd" DOUBLE,
"cm3_g" DOUBLE,
"asa_m2_cm3" DOUBLE,
"asa_m2_g" DOUBLE,
"nasa_m2_cm3" DOUBLE,
"nasa_m2_g" DOUBLE,
"av_vf" DOUBLE,
"av_cm3_g" DOUBLE,
"nav_cm3_g" DOUBLE,
"all_metals" VARCHAR,
"has_oms" VARCHAR,
"open_metal_sites" VARCHAR,
"extension" VARCHAR,
"from_csd" VARCHAR,
"public" VARCHAR,
"state" VARCHAR,
"csd_overlap_inccdc" VARCHAR,
"date_csd" VARCHAR,
"doi_public" VARCHAR,
"note" VARCHAR,
"matched_csd_of_core" VARCHAR,
"possible_list_csd_of_core" VARCHAR,
"unnamed_25" VARCHAR, -- Unnamed: 25
"unnamed_26" VARCHAR, -- Unnamed: 26
"unnamed_27" VARCHAR, -- Unnamed: 27
"unnamed_28" VARCHAR, -- Unnamed: 28
"unnamed_29" VARCHAR, -- Unnamed: 29
"unnamed_30" VARCHAR, -- Unnamed: 30
"unnamed_31" VARCHAR, -- Unnamed: 31
"unnamed_32" VARCHAR, -- Unnamed: 32
"unnamed_33" VARCHAR, -- Unnamed: 33
"unnamed_34" VARCHAR, -- Unnamed: 34
"unnamed_35" VARCHAR, -- Unnamed: 35
"unnamed_36" VARCHAR, -- Unnamed: 36
"unnamed_37" VARCHAR -- Unnamed: 37
);
```
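In the schema above, flag-like fields such as `has_oms`, `public`, `from_csd` and `disorder` are stored as VARCHAR rather than BOOLEAN, so they need converting before they can be used as filters. A minimal sketch, assuming the cells use common spellings such as Yes/No or True/False (not confirmed here; check the actual values in your copy of the files). The helper names are hypothetical, unrecognized values map to None rather than being guessed, and the column names follow the mixed-case CSV headers from the tables (the SQL schema lowercases them).

```python
from typing import Dict, Iterable, Optional

# Assumption: flag cells are spelled like "Yes"/"No" or "True"/"False".
TRUE_WORDS = {"yes", "y", "true", "t", "1"}
FALSE_WORDS = {"no", "n", "false", "f", "0"}

# Column names as they appear in the CSV headers; FSR files have "state" instead of "DISORDER".
FLAG_COLUMNS = ("Has_OMS", "from_CSD", "public", "DISORDER")


def parse_flag(value: Optional[str]) -> Optional[bool]:
    """Map a text flag to True/False; return None for blanks or unknown spellings."""
    if value is None:
        return None
    word = value.strip().lower()
    if word in TRUE_WORDS:
        return True
    if word in FALSE_WORDS:
        return False
    return None


def normalize_flags(row: Dict[str, str], columns: Iterable[str] = FLAG_COLUMNS) -> Dict[str, object]:
    """Return a copy of row with the listed flag columns converted to bool or None."""
    out: Dict[str, object] = dict(row)
    for name in columns:
        if name in out:  # skip columns this file does not have
            out[name] = parse_flag(row[name])
    return out
```

Returning None for unknown spellings keeps ambiguous entries visible, so they can be counted and inspected instead of silently landing in one group.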