Baselight

High-Throughput Comp. Screening Of MOFs

Open Metal Sites, Cavity Diameters and Free Paths

@kaggle.thedevastator_high_throughput_comp_screening_of_mofs

About this Dataset

This dataset provides atomic coordinates for metal-organic frameworks (MOFs), enabling high-throughput computational screening of MOFs in a broad range of scenarios. The dataset is derived from the Cambridge Structural Database (CSD) and from sources across the web, and it offers a range of useful parameters, such as accessible surface area (ASA), non-accessible surface area (NASA), largest cavity diameter (LCD), pore limiting diameter (PLD), and more.
These data may help in assessing the potential of MOFs for chemical separations, transformations, and functional nanoporous materials. If you find errors in the data, a feedback form is available for reporting them. We appreciate your interest in our project and hope you will make good use of this data!

How to use the dataset

This guide will introduce you to the CoRE MOF 2019 dataset and explain how to properly use it for high-throughput computational screenings. It will provide you with the necessary background information and knowledge for successful use of this dataset.

The CoRE MOF 2019 Dataset contains atomic coordinates for metal-organic frameworks (MOFs) which can be used as inputs for simulation software packages, enabling high-throughput computational screening of these MOFs. This dataset is derived from both the Cambridge Structural Database (CSD) and World Wide Web sources, providing powerful data on which MOF systems are suitable for potential applications in chemical separations, transformations, and functional nanoporous materials.

To make efficient use of this dataset, familiarize yourself with the available columns. They describe each MOF through geometric properties such as LCD (largest cavity diameter), PLD (pore limiting diameter), LFPD (largest sphere along the free path), ASA (accessible surface area), NASA (non-accessible surface area), and AV_VF (void fraction). There is also useful metadata, such as public availability status, CSD overlap references in the CoRE or CCDC databases, and DOI details where available. For the full list of features, refer to the column documentation provided on the Kaggle website.

Once you are familiar with the columns, download the data files from Kaggle. The files are in CSV format and can be opened in a spreadsheet program such as MS Excel. Each row represents a single MOF, and each column holds the value of one parameter (integer, float, or boolean, depending on the column). A single row therefore contains all the recorded information for one framework, for example its accessible surface area (ASA, in m^2/g or m^2/cm^3).

Because every structure is described by the same set of columns, two MOFs can be compared directly, without the preprocessing or manual calculation that is typically needed when comparing values across different datasets. After checking that no data was lost during formatting, consider analyzing the entire table at once rather than looping over individual rows by hand, even though row-by-row comparisons may look simpler at first.
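
As an illustration of screening the whole table in one pass, here is a minimal sketch using only the Python standard library. The sample rows, the threshold value, and the helper names `to_float` and `screen` are hypothetical and are not part of the dataset; only the column names (filename, LCD, PLD, ASA_m2_g, Has_OMS) come from the column list. In practice you would pass an open handle to the downloaded CSV file instead of the inline sample.

```python
import csv
import io

# Hypothetical sample rows; the real files have many more rows and columns.
SAMPLE_CSV = """filename,LCD,PLD,ASA_m2_g,Has_OMS
MOF_A_clean,11.2,7.9,3100.5,Yes
MOF_B_clean,4.1,2.3,0.0,No
MOF_C_clean,8.7,6.4,1850.0,No
MOF_D_clean,15.0,,4200.0,Yes
"""


def to_float(text):
    """Return a float, or None when the cell is empty or not numeric."""
    try:
        return float(text)
    except (TypeError, ValueError):
        return None


def screen(csv_file, min_pld, top_n=10):
    """Keep MOFs with PLD >= min_pld, ranked by ASA_m2_g in descending order."""
    kept = []
    for row in csv.DictReader(csv_file):
        pld = to_float(row.get("PLD"))
        asa = to_float(row.get("ASA_m2_g"))
        if pld is None or asa is None:
            continue  # skip rows with missing values instead of guessing
        if pld >= min_pld:
            kept.append((row["filename"], pld, asa))
    kept.sort(key=lambda item: item[2], reverse=True)
    return kept[:top_n]


# The threshold 3.3 is an arbitrary example value, not a recommendation.
results = screen(io.StringIO(SAMPLE_CSV), min_pld=3.3)
for name, pld, asa in results:
    print(f"{name}: PLD={pld} ASA={asa}")
```

Rows with missing numeric cells are skipped rather than filled in, so the ranking only reflects values that are actually present in the file.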

Research Ideas

  • Create an open source library of automated SIM simulations for MOFs, which can be used to generate results quickly and accurately.
  • Update the existing Porous Materials Database (PMD) software with additional data fields that leverage insights from this dataset, allowing users to easily search and filter MOFs by specific structural characteristics.
  • Develop a web-based interface that allows researchers to visualize different MOF structures using realistic 3D images derived from the atomic data provided in the dataset.

Acknowledgements

If you use this dataset in your research, please credit the original authors.

License

License: CC0 1.0 Universal (CC0 1.0) - Public Domain Dedication
No Copyright - You can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission. See Other Information.

Columns

File: 2019-11-01-ASR-public_12020.csv

Column name         Description
filename            The name of the file containing the MOF data. (String)
LCD                 The largest cavity diameter of the MOF. (Float)
PLD                 The pore limiting diameter of the MOF. (Float)
LFPD                The largest sphere along the free path of the MOF. (Float)
cm3_g               The volume of the MOF in cm3/g. (Float)
ASA_m2_cm3          The accessible surface area of the MOF in m2/cm3. (Float)
ASA_m2_g            The accessible surface area of the MOF in m2/g. (Float)
NASA_m2_cm3         The non-accessible surface area of the MOF in m2/cm3. (Float)
NASA_m2_g           The non-accessible surface area of the MOF in m2/g. (Float)
AV_VF               The void fraction of the MOF. (Float)
AV_cm3_g            The accessible volume of the MOF in cm3/g. (Float)
NAV_cm3_g           The non-accessible volume of the MOF in cm3/g. (Float)
All_Metals          The metals present in the MOF. (String)
Has_OMS             Indicates if the MOF has open metal sites. (Boolean)
Open_Metal_Sites    The open metal sites in the MOF. (Stored as text in the table schema)
Extension           The file extension of the MOF data. (String)
FSR_overlap         Overlap of the MOF with the FSR (free solvent removed) set. (Stored as text in the table schema)
from_CSD            Indicates if the MOF is from the Cambridge Structural Database. (Boolean)
public              Indicates if the MOF is publicly available. (Boolean)
DISORDER            Indicates if the MOF is disordered. (Boolean)
CSD_overlap_inCoRE  The overlap of the MOF in the CoRE dataset.
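
Several columns described as Boolean are declared as VARCHAR in the table schemas on this page, so they will probably arrive as text. The sketch below shows one way to normalize such cells before analysis. The accepted spellings ("Yes", "No", and so on) and the comma separator for All_Metals are assumptions, not documented encodings; check the actual file before relying on them. The helper names `parse_flag` and `split_metals` are hypothetical.

```python
# Assumed spellings of flag values; verify against the real file.
TRUE_WORDS = {"yes", "y", "true", "t", "1"}
FALSE_WORDS = {"no", "n", "false", "f", "0"}


def parse_flag(value):
    """Map a text cell to True/False, or None when empty or unrecognized."""
    if value is None:
        return None
    word = str(value).strip().lower()
    if word in TRUE_WORDS:
        return True
    if word in FALSE_WORDS:
        return False
    return None


def split_metals(value, sep=","):
    """Split an All_Metals cell into element symbols (separator is assumed)."""
    if not value:
        return []
    return [part.strip() for part in str(value).split(sep) if part.strip()]


# Hypothetical row for illustration only.
row = {"Has_OMS": "Yes", "public": " no ", "DISORDER": "", "All_Metals": "Cu, Zn"}
flags = {key: parse_flag(row[key]) for key in ("Has_OMS", "public", "DISORDER")}
metals = split_metals(row["All_Metals"])
print(flags, metals)
```

Returning None for unrecognized cells keeps unknown values distinct from a genuine "no", which matters when filtering on Has_OMS or public.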

File: 2019-11-01-ASR-internal_14142.csv

The columns and their descriptions are the same as those listed for 2019-11-01-ASR-public_12020.csv above.

Tables

N 2019–11–01 Asr Internal 14142

@kaggle.thedevastator_high_throughput_comp_screening_of_mofs.n_2019_11_01_asr_internal_14142
  • 1.12 MB
  • 14142 rows
  • 42 columns

CREATE TABLE n_2019_11_01_asr_internal_14142 (
  "filename" VARCHAR,
  "lcd" DOUBLE,
  "pld" DOUBLE,
  "lfpd" DOUBLE,
  "cm3_g" DOUBLE,
  "asa_m2_cm3" DOUBLE,
  "asa_m2_g" DOUBLE,
  "nasa_m2_cm3" DOUBLE,
  "nasa_m2_g" DOUBLE,
  "av_vf" DOUBLE,
  "av_cm3_g" DOUBLE,
  "nav_cm3_g" DOUBLE,
  "all_metals" VARCHAR,
  "has_oms" VARCHAR,
  "open_metal_sites" VARCHAR,
  "extension" VARCHAR,
  "fsr_overlap" VARCHAR,
  "from_csd" VARCHAR,
  "public" VARCHAR,
  "disorder" VARCHAR,
  "csd_overlap_incore" VARCHAR,
  "csd_of_wos_incore" VARCHAR,
  "csd_overlap_inccdc" VARCHAR,
  "date_csd" VARCHAR,
  "doi_public" VARCHAR,
  "note" VARCHAR,
  "matched_csd_of_core" VARCHAR,
  "possible_list_csd_of_core" VARCHAR,
  "unnamed_28" VARCHAR,
  "unnamed_29" VARCHAR,
  "unnamed_30" VARCHAR,
  "unnamed_31" VARCHAR,
  "unnamed_32" VARCHAR,
  "unnamed_33" VARCHAR,
  "unnamed_34" VARCHAR,
  "unnamed_35" VARCHAR,
  "unnamed_36" VARCHAR,
  "unnamed_37" VARCHAR,
  "unnamed_38" VARCHAR,
  "unnamed_39" VARCHAR,
  "unnamed_40" VARCHAR,
  "unnamed_41" VARCHAR
);
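
To show how a table with this schema might be queried, the following sketch builds a small in-memory copy with Python's built-in sqlite3 module and runs one aggregate query. SQLite is only a stand-in here; the query engine behind this page is not stated. Only four of the 42 columns are recreated, and the inserted rows and the PLD threshold are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# A reduced copy of the schema above: same table and column names, fewer columns.
conn.execute(
    "CREATE TABLE n_2019_11_01_asr_internal_14142 ("
    '"filename" VARCHAR, "lcd" DOUBLE, "pld" DOUBLE, "has_oms" VARCHAR)'
)

# Hypothetical rows for illustration; these are not real dataset entries.
conn.executemany(
    "INSERT INTO n_2019_11_01_asr_internal_14142 VALUES (?, ?, ?, ?)",
    [
        ("MOF_A_clean", 12.0, 8.0, "Yes"),
        ("MOF_B_clean", 4.0, 2.0, "No"),
        ("MOF_C_clean", 8.0, 6.0, "No"),
        ("MOF_D_clean", 10.0, 7.0, "Yes"),
    ],
)

# Count structures and average the LCD per has_oms value, for PLD above a cutoff.
query = """
    SELECT has_oms, COUNT(*) AS n_mofs, AVG(lcd) AS mean_lcd
    FROM n_2019_11_01_asr_internal_14142
    WHERE pld >= ?
    GROUP BY has_oms
    ORDER BY has_oms
"""
summary = conn.execute(query, (5.0,)).fetchall()
for has_oms, n_mofs, mean_lcd in summary:
    print(has_oms, n_mofs, mean_lcd)
conn.close()
```

Passing the cutoff as a bound parameter rather than formatting it into the SQL string keeps the query reusable for different thresholds.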

N 2019–11–01 Asr Public 12020

@kaggle.thedevastator_high_throughput_comp_screening_of_mofs.n_2019_11_01_asr_public_12020
  • 987.89 KB
  • 12020 rows
  • 42 columns

CREATE TABLE n_2019_11_01_asr_public_12020 (
  "filename" VARCHAR,
  "lcd" DOUBLE,
  "pld" DOUBLE,
  "lfpd" DOUBLE,
  "cm3_g" DOUBLE,
  "asa_m2_cm3" DOUBLE,
  "asa_m2_g" DOUBLE,
  "nasa_m2_cm3" DOUBLE,
  "nasa_m2_g" DOUBLE,
  "av_vf" DOUBLE,
  "av_cm3_g" DOUBLE,
  "nav_cm3_g" DOUBLE,
  "all_metals" VARCHAR,
  "has_oms" VARCHAR,
  "open_metal_sites" VARCHAR,
  "extension" VARCHAR,
  "fsr_overlap" VARCHAR,
  "from_csd" VARCHAR,
  "public" VARCHAR,
  "disorder" VARCHAR,
  "csd_overlap_incore" VARCHAR,
  "csd_of_wos_incore" VARCHAR,
  "csd_overlap_inccdc" VARCHAR,
  "date_csd" VARCHAR,
  "doi_public" VARCHAR,
  "note" VARCHAR,
  "matched_csd_of_core" VARCHAR,
  "possible_list_csd_of_core" VARCHAR,
  "unnamed_28" VARCHAR,
  "unnamed_29" VARCHAR,
  "unnamed_30" VARCHAR,
  "unnamed_31" VARCHAR,
  "unnamed_32" VARCHAR,
  "unnamed_33" VARCHAR,
  "unnamed_34" VARCHAR,
  "unnamed_35" VARCHAR,
  "unnamed_36" VARCHAR,
  "unnamed_37" VARCHAR,
  "unnamed_38" VARCHAR,
  "unnamed_39" VARCHAR,
  "unnamed_40" VARCHAR,
  "unnamed_41" VARCHAR
);

N 2019–11–01 Fsr Internal Overlap Freeonly 9146

@kaggle.thedevastator_high_throughput_comp_screening_of_mofs.n_2019_11_01_fsr_internal_overlap_freeonly_9146
  • 749.02 KB
  • 9146 rows
  • 38 columns

CREATE TABLE n_2019_11_01_fsr_internal_overlap_freeonly_9146 (
  "filename" VARCHAR,
  "lcd" DOUBLE,
  "pld" DOUBLE,
  "lfpd" DOUBLE,
  "cm3_g" DOUBLE,
  "asa_m2_cm3" DOUBLE,
  "asa_m2_g" DOUBLE,
  "nasa_m2_cm3" DOUBLE,
  "nasa_m2_g" DOUBLE,
  "av_vf" DOUBLE,
  "av_cm3_g" DOUBLE,
  "nav_cm3_g" DOUBLE,
  "all_metals" VARCHAR,
  "has_oms" VARCHAR,
  "open_metal_sites" VARCHAR,
  "extension" VARCHAR,
  "from_csd" VARCHAR,
  "public" VARCHAR,
  "state" VARCHAR,
  "csd_overlap_inccdc" VARCHAR,
  "date_csd" VARCHAR,
  "doi_public" VARCHAR,
  "note" VARCHAR,
  "matched_csd_of_core" VARCHAR,
  "possible_list_csd_of_core" VARCHAR,
  "unnamed_25" VARCHAR,
  "unnamed_26" VARCHAR,
  "unnamed_27" VARCHAR,
  "unnamed_28" VARCHAR,
  "unnamed_29" VARCHAR,
  "unnamed_30" VARCHAR,
  "unnamed_31" VARCHAR,
  "unnamed_32" VARCHAR,
  "unnamed_33" VARCHAR,
  "unnamed_34" VARCHAR,
  "unnamed_35" VARCHAR,
  "unnamed_36" VARCHAR,
  "unnamed_37" VARCHAR
);

N 2019–11–01 Fsr Public 7061

@kaggle.thedevastator_high_throughput_comp_screening_of_mofs.n_2019_11_01_fsr_public_7061
  • 589.94 KB
  • 7061 rows
  • 38 columns

CREATE TABLE n_2019_11_01_fsr_public_7061 (
  "filename" VARCHAR,
  "lcd" DOUBLE,
  "pld" DOUBLE,
  "lfpd" DOUBLE,
  "cm3_g" DOUBLE,
  "asa_m2_cm3" DOUBLE,
  "asa_m2_g" DOUBLE,
  "nasa_m2_cm3" DOUBLE,
  "nasa_m2_g" DOUBLE,
  "av_vf" DOUBLE,
  "av_cm3_g" DOUBLE,
  "nav_cm3_g" DOUBLE,
  "all_metals" VARCHAR,
  "has_oms" VARCHAR,
  "open_metal_sites" VARCHAR,
  "extension" VARCHAR,
  "from_csd" VARCHAR,
  "public" VARCHAR,
  "state" VARCHAR,
  "csd_overlap_inccdc" VARCHAR,
  "date_csd" VARCHAR,
  "doi_public" VARCHAR,
  "note" VARCHAR,
  "matched_csd_of_core" VARCHAR,
  "possible_list_csd_of_core" VARCHAR,
  "unnamed_25" VARCHAR,
  "unnamed_26" VARCHAR,
  "unnamed_27" VARCHAR,
  "unnamed_28" VARCHAR,
  "unnamed_29" VARCHAR,
  "unnamed_30" VARCHAR,
  "unnamed_31" VARCHAR,
  "unnamed_32" VARCHAR,
  "unnamed_33" VARCHAR,
  "unnamed_34" VARCHAR,
  "unnamed_35" VARCHAR,
  "unnamed_36" VARCHAR,
  "unnamed_37" VARCHAR
);

Asr Full Unmodified

@kaggle.thedevastator_high_throughput_comp_screening_of_mofs.asr_full_unmodified
  • 196.59 KB
  • 13473 rows
  • 2 columns

CREATE TABLE asr_full_unmodified (
  "n_0" BIGINT,
  "n_00958972_2016_1250260_1436516_clean" VARCHAR
);

Fsr Full Unmodified

@kaggle.thedevastator_high_throughput_comp_screening_of_mofs.fsr_full_unmodified
  • 123.47 KB
  • 8295 rows
  • 2 columns

CREATE TABLE fsr_full_unmodified (
  "n_1" BIGINT,
  "n_00958972_2016_1250260_1436516_freeonly" VARCHAR
);
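
The two "full unmodified" tables appear to hold one structure name per row, ending in `_clean` for the ASR list and `_freeonly` for the FSR list. That reading is inferred only from the column names above and is an assumption. Under it, the sketch below finds structures present in both lists by stripping the suffix. The names in the example and the helpers `base_id` and `shared_structures` are hypothetical.

```python
import re

# Assumed suffixes, inferred from the column names of the two tables above.
_SUFFIX = re.compile(r"_(clean|freeonly)$", re.IGNORECASE)


def base_id(name):
    """Strip a trailing _clean or _freeonly suffix from a structure name."""
    return _SUFFIX.sub("", name.strip())


def shared_structures(asr_names, fsr_names):
    """Return the sorted base identifiers that occur in both name lists."""
    asr = {base_id(name) for name in asr_names}
    fsr = {base_id(name) for name in fsr_names}
    return sorted(asr & fsr)


# Hypothetical names for illustration only.
asr_names = ["MOF_A_clean", "MOF_B_clean", "MOF_C_clean"]
fsr_names = ["MOF_B_freeONLY", "MOF_C_freeonly", "MOF_E_freeonly"]
common = shared_structures(asr_names, fsr_names)
print(common)
```

The match is case-insensitive on the suffix only, so differently capitalized base identifiers would still be treated as distinct structures.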
