Baselight

QSAR Molecular Descriptor Predictions

Analyzing Activation Energy in Chemical Compounds

@kaggle.thedevastator_qsar_molecular_descriptor_predictions

About this Dataset

This dataset explores the Quantitative Structure-Activity Relationship (QSAR) of molecules and their properties. It contains 41 variables, known as descriptors, that measure the structure and activity of a molecule in order to predict its activation energy.

This data provides valuable insights into relationships between a molecule's composition and its reactivity, making it applicable for various activities such as drug design. Furthermore, this dataset encourages further research towards understanding how chemical compounds interact and how changes to a molecule's structure can affect its ability to interact with other molecules.

The descriptors used in this dataset are: SpMax_L, J_Dz(e), nHM, F01[N-N], F04[C-N], NssssC, nCb-, C%, nCp, nO, F03[C-N], SdssC, HyWi_B(m), LOC, SM6_L, F03[C-O], Me, Mi, nN-N, nArNO2, nCRX3, SpPosA_B(p), nCIR, B01[C-Br], B03[C-Cl], N-073, SpMax_A, Psi_i_1d, B04[C-Br], SdO, TI2_L, nCrt, C-026, F02[C-N], nHDon, SpMax_B(m), Psi_i_A, nN, SM6_B(m), nArCOOR, and nX, together with the Class label. Using these feature vectors along with the labels for the test rows, predictive models can be built and their accuracy in predicting molecule activation energies evaluated for future applications.

How to use the dataset

This dataset can be used to explore the quantitative structure-activity relationship (QSAR) of molecules and their properties. The data includes 41 variables, or descriptors, that can be used to predict the activation energy of a molecule. By accurately predicting the activation energy, researchers can better understand how changes to a molecule's structure affect its reactivity and ability to interact with other molecules.

To get started with this dataset, first download it from Kaggle or clone this repository. After downloading the data set, import it into your preferred software as a CSV file. Make sure that all variables are properly labelled and categorized. Once your data has been imported, visualize the molecular descriptors with scatter plots or histograms. This will give you better insight into the relationships between different descriptor variables and reveal any outliers in the dataset.
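
The import-and-inspect step can be sketched with only the Python standard library. This is a minimal illustration under stated assumptions: the four rows below are made-up placeholder values, not taken from the dataset, only three descriptor columns are shown, and the CSV header names are assumed to match the names in the Columns section. In practice you would open the downloaded CSV file instead of the inline string.

```python
import csv
import io
import statistics

# Placeholder rows for illustration only (hypothetical values).
# With the real data, use: open("train_rows.csv", newline="")
SAMPLE_CSV = """SpMax_L,C%,nO,Class
3.9,31.4,0,1
4.2,30.8,1,1
5.1,45.0,3,0
4.8,42.9,2,0
"""

def load_rows(handle):
    """Read a CSV into a list of dicts with every value converted to float."""
    return [{name: float(value) for name, value in row.items()}
            for row in csv.DictReader(handle)]

def summarize(rows, column):
    """Return (min, mean, max) for one descriptor column."""
    values = [row[column] for row in rows]
    return min(values), statistics.mean(values), max(values)

def flag_outliers(rows, column, z_threshold=3.0):
    """Return rows lying more than z_threshold sample standard deviations from the mean."""
    values = [row[column] for row in rows]
    mean = statistics.mean(values)
    spread = statistics.stdev(values)
    if spread == 0:
        return []  # a constant column has no outliers
    return [row for row in rows if abs(row[column] - mean) / spread > z_threshold]

rows = load_rows(io.StringIO(SAMPLE_CSV))
low, mean, high = summarize(rows, "nO")
print(f"nO: min={low}, mean={mean}, max={high}")
print("outliers in nO:", flag_outliers(rows, "nO"))
```

The z-score threshold of 3 is a common rule of thumb rather than anything prescribed by the dataset; histograms or scatter plots remain the better tool for judging skewed descriptors.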

Next, select a machine learning algorithm that suits your objectives and fits within your available computational resources. You may need several iterations to find a suitable algorithm, including checking whether each candidate generalizes beyond the training data by measuring its accuracy on the test files supplied with the dataset on Kaggle (test_rows and test_rows_labels). Then set up a cross-validation strategy, using at least 5- or 10-fold CV, before measuring performance metrics and generating preliminary results, and report any signs of overfitting. Finally, write up a report and present your findings and any resulting recommendations where needed.
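
The cross-validation step can be sketched without any third-party library. The example below is only an illustration: it uses a nearest-centroid classifier as a stand-in for whatever algorithm you choose, and it runs on synthetic two-class data rather than the dataset itself. All function names are hypothetical helpers written for this sketch.

```python
import random

def k_fold_indices(n_samples, k, seed=0):
    """Shuffle sample indices and split them into k nearly equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]

def fit_centroids(X, y):
    """Compute the mean feature vector of each class."""
    sums, counts = {}, {}
    for features, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(features))
        for j, value in enumerate(features):
            acc[j] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}

def predict(centroids, features):
    """Return the label whose centroid is closest in squared Euclidean distance."""
    def sq_dist(center):
        return sum((a - b) ** 2 for a, b in zip(features, center))
    return min(centroids, key=lambda label: sq_dist(centroids[label]))

def cross_validate(X, y, k=5, seed=0):
    """Return the held-out accuracy of each of the k folds."""
    scores = []
    for held_out in k_fold_indices(len(X), k, seed):
        held = set(held_out)
        train_idx = [i for i in range(len(X)) if i not in held]
        centroids = fit_centroids([X[i] for i in train_idx],
                                  [y[i] for i in train_idx])
        correct = sum(predict(centroids, X[i]) == y[i] for i in held_out)
        scores.append(correct / len(held_out))
    return scores

# Synthetic, well-separated two-class data (not from the dataset).
rng = random.Random(42)
X = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(50)] + \
    [[rng.gauss(10, 1), rng.gauss(10, 1)] for _ in range(50)]
y = [0] * 50 + [1] * 50

scores = cross_validate(X, y, k=5)
print("fold accuracies:", scores)
print("mean accuracy:", sum(scores) / len(scores))
```

A large gap between training accuracy and these held-out fold accuracies is one sign of overfitting. Because distance-based models are sensitive to feature scale, the real descriptors would likely need standardizing, with the scaling statistics computed inside each training fold only.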

Research Ideas

  • Developing novel models for predicting activation energies of molecules and related properties, such as reactivity or solubility, by combining statistical methods such as regression with machine learning techniques like deep learning.
  • Developing methods for exploring patterns between molecules with known properties and those with unknown properties using unsupervised ML algorithms like hierarchical clustering.
  • Identifying correlations between molecular descriptors and activity using exploratory data analysis techniques, to form hypotheses about which compounds to synthesize in further studies of structure-activity relationships.
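
As a starting point for the correlation idea, the sketch below ranks descriptors by the absolute Pearson correlation with the label. The column values are made up for illustration and the helper names are hypothetical; Pearson correlation against a binary label is only one of several reasonable screening measures.

```python
import math

def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences, or None if undefined."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return None  # correlation is undefined for a constant column
    return cov / math.sqrt(var_x * var_y)

def rank_descriptors(table, target):
    """Sort descriptor names by absolute correlation with the target column."""
    scores = {}
    for name, values in table.items():
        if name == target:
            continue
        r = pearson(values, table[target])
        if r is not None:  # skip constant columns
            scores[name] = r
    return sorted(scores.items(), key=lambda item: abs(item[1]), reverse=True)

# Made-up columns for illustration only (not taken from the dataset).
table = {
    "nO": [0, 1, 3, 2, 4, 5],
    "nN": [2, 0, 1, 2, 0, 1],
    "Class": [1, 1, 0, 1, 0, 0],
}

ranked = rank_descriptors(table, "Class")
for name, r in ranked:
    print(f"{name}: r = {r:.3f}")
```

A strong correlation here only suggests a hypothesis; descriptors that are themselves highly correlated with each other can share the same signal.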

Acknowledgements

If you use this dataset in your research, please credit the original authors.

License

License: CC0 1.0 Universal (CC0 1.0) - Public Domain Dedication
No Copyright - You can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission. See Other Information.

Columns

File: test_rows.csv

Column name Description
SpMax_L Leading eigenvalue from the Laplace matrix. (Numeric)
J_Dz(e) Balaban-like index from the Barysz matrix weighted by Sanderson electronegativity. (Numeric)
nHM Number of heavy atoms. (Numeric)
F01[N-N] Frequency of N-N at topological distance 1. (Numeric)
F04[C-N] Frequency of C-N at topological distance 4. (Numeric)
NssssC Number of atoms of type ssssC. (Numeric)
nCb- Number of substituted benzene C(sp2) atoms. (Numeric)
C% Percentage of carbon atoms in the molecule. (Numeric)
nCp Number of terminal primary C(sp3) atoms. (Numeric)
nO Number of oxygen atoms. (Numeric)
F03[C-N] Frequency of C-N at topological distance 3. (Numeric)
SdssC Sum of dssC E-states. (Numeric)
HyWi_B(m) Hyper-Wiener-like index (log function) from the Burden matrix weighted by mass. (Numeric)
LOC Lopping centric index. (Numeric)
SM6_L Spectral moment of order 6 from the Laplace matrix. (Numeric)
F03[C-O] Frequency of C-O at topological distance 3. (Numeric)
Me Mean atomic Sanderson electronegativity (scaled on the carbon atom). (Numeric)
Mi Mean first ionization potential (scaled on the carbon atom). (Numeric)
nN-N Number of N hydrazines. (Numeric)
nArNO2 Number of aromatic nitro groups. (Numeric)
nCRX3 Number of CRX3 groups. (Numeric)
SpPosA_B(p) Normalized spectral positive sum from the Burden matrix weighted by polarizability. (Numeric)
nCIR Number of circuits. (Numeric)
B01[C-Br] Presence/absence of C-Br at topological distance 1. (Numeric)
B03[C-Cl] Presence/absence of C-Cl at topological distance 3. (Numeric)
N-073 Number of Ar2NH / Ar3N / Ar2N-Al / R..N..R atom-centred fragments. (Numeric)
SpMax_A Leading eigenvalue from the adjacency matrix (Lovasz-Pelikan index). (Numeric)
Psi_i_1d Intrinsic state pseudoconnectivity index, type 1d. (Numeric)
B04[C-Br] Presence/absence of C-Br at topological distance 4. (Numeric)
SdO Sum of dO E-states. (Numeric)
TI2_L Second Mohar index from the Laplace matrix. (Numeric)
nCrt Number of ring tertiary C(sp3) atoms. (Numeric)
C-026 Number of R--CX--R atom-centred fragments. (Numeric)
F02[C-N] Frequency of C-N at topological distance 2. (Numeric)
nHDon Number of donor atoms for H-bonds (N and O). (Numeric)
SpMax_B(m) Leading eigenvalue from the Burden matrix weighted by mass. (Numeric)
Psi_i_A Intrinsic state pseudoconnectivity index, type S average. (Numeric)
nN Number of nitrogen atoms. (Numeric)
SM6_B(m) Spectral moment of order 6 from the Burden matrix weighted by mass. (Numeric)
nArCOOR Number of aromatic ester groups. (Numeric)
nX Number of halogen atoms. (Numeric)

File: test_rows_labels.csv

Column name Description
(all 41 descriptor columns of test_rows.csv, as described above)
Class The class of the molecule. (Numeric)

Tables

Test Rows

@kaggle.thedevastator_qsar_molecular_descriptor_predictions.test_rows
  • 84.39 KB
  • 837 rows
  • 41 columns

CREATE TABLE test_rows (
  "spmax_l" DOUBLE,
  "j_dz_e" DOUBLE,
  "nhm" BIGINT,
  "f01_n_n" BIGINT,
  "f04_c_n" BIGINT,
  "nssssc" BIGINT,
  "ncb" BIGINT,
  "c" DOUBLE,
  "ncp" BIGINT,
  "no" BIGINT,
  "f03_c_n" BIGINT,
  "sdssc" DOUBLE,
  "hywi_b_m" DOUBLE,
  "loc" DOUBLE,
  "sm6_l" DOUBLE,
  "f03_c_o" BIGINT,
  "me" DOUBLE,
  "mi" DOUBLE,
  "nn_n" BIGINT,
  "narno2" BIGINT,
  "ncrx3" BIGINT,
  "spposa_b_p" DOUBLE,
  "ncir" BIGINT,
  "b01_c_br" BIGINT,
  "b03_c_cl" BIGINT,
  "n_073" BIGINT,
  "spmax_a" DOUBLE,
  "psi_i_1d" DOUBLE,
  "b04_c_br" BIGINT,
  "sdo" DOUBLE,
  "ti2_l" DOUBLE,
  "ncrt" BIGINT,
  "c_026" BIGINT,
  "f02_c_n" BIGINT,
  "nhdon" BIGINT,
  "spmax_b_m" DOUBLE,
  "psi_i_a" DOUBLE,
  "nn" BIGINT,
  "sm6_b_m" DOUBLE,
  "narcoor" BIGINT,
  "nx" BIGINT
);

Test Rows Labels

@kaggle.thedevastator_qsar_molecular_descriptor_predictions.test_rows_labels
  • 85.03 KB
  • 837 rows
  • 42 columns

CREATE TABLE test_rows_labels (
  "spmax_l" DOUBLE,
  "j_dz_e" DOUBLE,
  "nhm" BIGINT,
  "f01_n_n" BIGINT,
  "f04_c_n" BIGINT,
  "nssssc" BIGINT,
  "ncb" BIGINT,
  "c" DOUBLE,
  "ncp" BIGINT,
  "no" BIGINT,
  "f03_c_n" BIGINT,
  "sdssc" DOUBLE,
  "hywi_b_m" DOUBLE,
  "loc" DOUBLE,
  "sm6_l" DOUBLE,
  "f03_c_o" BIGINT,
  "me" DOUBLE,
  "mi" DOUBLE,
  "nn_n" BIGINT,
  "narno2" BIGINT,
  "ncrx3" BIGINT,
  "spposa_b_p" DOUBLE,
  "ncir" BIGINT,
  "b01_c_br" BIGINT,
  "b03_c_cl" BIGINT,
  "n_073" BIGINT,
  "spmax_a" DOUBLE,
  "psi_i_1d" DOUBLE,
  "b04_c_br" BIGINT,
  "sdo" DOUBLE,
  "ti2_l" DOUBLE,
  "ncrt" BIGINT,
  "c_026" BIGINT,
  "f02_c_n" BIGINT,
  "nhdon" BIGINT,
  "spmax_b_m" DOUBLE,
  "psi_i_a" DOUBLE,
  "nn" BIGINT,
  "sm6_b_m" DOUBLE,
  "narcoor" BIGINT,
  "nx" BIGINT,
  "class" BIGINT
);

Train Rows

@kaggle.thedevastator_qsar_molecular_descriptor_predictions.train_rows
  • 85.03 KB
  • 837 rows
  • 42 columns

CREATE TABLE train_rows (
  "spmax_l" DOUBLE,
  "j_dz_e" DOUBLE,
  "nhm" BIGINT,
  "f01_n_n" BIGINT,
  "f04_c_n" BIGINT,
  "nssssc" BIGINT,
  "ncb" BIGINT,
  "c" DOUBLE,
  "ncp" BIGINT,
  "no" BIGINT,
  "f03_c_n" BIGINT,
  "sdssc" DOUBLE,
  "hywi_b_m" DOUBLE,
  "loc" DOUBLE,
  "sm6_l" DOUBLE,
  "f03_c_o" BIGINT,
  "me" DOUBLE,
  "mi" DOUBLE,
  "nn_n" BIGINT,
  "narno2" BIGINT,
  "ncrx3" BIGINT,
  "spposa_b_p" DOUBLE,
  "ncir" BIGINT,
  "b01_c_br" BIGINT,
  "b03_c_cl" BIGINT,
  "n_073" BIGINT,
  "spmax_a" DOUBLE,
  "psi_i_1d" DOUBLE,
  "b04_c_br" BIGINT,
  "sdo" DOUBLE,
  "ti2_l" DOUBLE,
  "ncrt" BIGINT,
  "c_026" BIGINT,
  "f02_c_n" BIGINT,
  "nhdon" BIGINT,
  "spmax_b_m" DOUBLE,
  "psi_i_a" DOUBLE,
  "nn" BIGINT,
  "sm6_b_m" DOUBLE,
  "narcoor" BIGINT,
  "nx" BIGINT,
  "class" BIGINT
);
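
To illustrate how a table with this shape might be queried, the following sketch uses Python's built-in sqlite3 module with a trimmed three-column version of the train_rows schema and made-up rows. It is an assumption that the lowercased column names above are the ones to query; SQLite accepts the DOUBLE and BIGINT type names through type affinity, and the engine behind the hosted tables may behave differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Trimmed version of the schema above: two descriptors plus the label.
conn.execute(
    'CREATE TABLE train_rows ("spmax_l" DOUBLE, "no" BIGINT, "class" BIGINT)'
)

# Made-up rows for illustration only (not taken from the dataset).
conn.executemany(
    "INSERT INTO train_rows VALUES (?, ?, ?)",
    [(3.9, 0, 1), (4.2, 1, 1), (5.1, 3, 0), (4.8, 2, 0), (4.0, 1, 1)],
)

# Count rows and average one descriptor per class to check label balance.
summary = conn.execute(
    'SELECT "class", COUNT(*), AVG("no") FROM train_rows '
    'GROUP BY "class" ORDER BY "class"'
).fetchall()

for label, n_rows, mean_no in summary:
    print(label, n_rows, mean_no)

conn.close()
```

Checking the per-class row counts first is useful because a strongly imbalanced label makes plain accuracy a misleading metric.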
