Baselight

Protein Secondary Structure Casp12 Cb513 Ts115

Dataset of CASP12, CB513, and TS115

@kaggle.tamzidhasan_protein_secondary_structure_casp12_cb513_ts115


About this Dataset

Introduction

Protein secondary structure can be calculated from the 3D coordinates of a protein's atoms once its 3D structure has been solved by X-ray crystallography or NMR. The benchmark datasets CASP12, CB513, and TS115 are used here. Every character in a secondary structure string gives the structural state of the amino acid residue at the same position in the sequence, using the following codes:

  1. C: Loops and irregular elements
  2. E: β-strand
  3. H: α-helix
  4. B: β-bridge
  5. G: 3₁₀-helix
  6. I: π-helix
  7. T: Turn
  8. S: Bend

However, X-ray crystallography and NMR are expensive. Ideally, we would like to predict a protein's secondary structure directly from its primary sequence, a problem with a long history.

Dataset

The main dataset lists peptide sequences and their corresponding secondary structures.
Description of columns:
1. seq: the sequence of the peptide
2. sst3: the three-state (Q3) secondary structure
3. sst8: the eight-state (Q8) secondary structure
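As an illustration of how the three columns relate, the following is a minimal sketch of a per-row sanity check. It assumes that `sst3` and `sst8` hold one label per residue, so all three strings have the same length; the source does not state this explicitly. The names `Record` and `check_record` are hypothetical helpers, not part of the dataset.

```python
from dataclasses import dataclass

# Label alphabets taken from the dataset description.
Q3_STATES = set("HEC")
Q8_STATES = set("HBEGITSC")


@dataclass(frozen=True)
class Record:
    """One row of any of the tables (hypothetical helper type)."""
    seq: str
    sst3: str
    sst8: str


def check_record(rec):
    """Return a list of problems found in one row (empty if none)."""
    problems = []
    # Assumption: one structure label per residue in each label string.
    if not (len(rec.seq) == len(rec.sst3) == len(rec.sst8)):
        problems.append("length mismatch")
    if set(rec.sst3) - Q3_STATES:
        problems.append("unexpected sst3 label")
    if set(rec.sst8) - Q8_STATES:
        problems.append("unexpected sst8 label")
    return problems
```

A row that passes returns an empty list, so the check can be used to filter rows before training.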

This is a secondary structure dataset with eight states (H, B, E, G, I, T, S, C). There are 8679 non-redundant chains (25%) in the training data set. The three-state labels are derived from the eight states as follows: H = (G, H, I); E = (B, E); C = (T, S, C).
Dataset link: Protein Secondary Structure
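The eight-to-three state reduction given above can be written as a simple lookup. This is a minimal sketch; the names `Q8_TO_Q3` and `q8_to_q3` are hypothetical, and the example string is made up for illustration rather than taken from the dataset.

```python
# Mapping from the description: H = (G, H, I); E = (B, E); C = (T, S, C).
Q8_TO_Q3 = {
    "G": "H", "H": "H", "I": "H",
    "B": "E", "E": "E",
    "T": "C", "S": "C", "C": "C",
}


def q8_to_q3(sst8):
    """Reduce an eight-state structure string to its three-state form."""
    # A KeyError here means the input contains a label outside the eight states.
    return "".join(Q8_TO_Q3[state] for state in sst8)


# Illustrative input only, not a row from the dataset.
example = "CCHHHGGGTTEEBSC"
print(q8_to_q3(example))  # prints CCHHHHHHCCEEECC
```

Applying `q8_to_q3` to an `sst8` value should, under this mapping, reproduce the corresponding `sst3` value.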

Tables

Test Secondary Structure Casp12

@kaggle.tamzidhasan_protein_secondary_structure_casp12_cb513_ts115.test_secondary_structure_casp12
  • 18.8 KB
  • 21 rows
  • 3 columns

CREATE TABLE test_secondary_structure_casp12 (
  "seq" VARCHAR,
  "sst3" VARCHAR,
  "sst8" VARCHAR
);

Test Secondary Structure Cb513

@kaggle.tamzidhasan_protein_secondary_structure_casp12_cb513_ts115.test_secondary_structure_cb513
  • 187.66 KB
  • 513 rows
  • 3 columns

CREATE TABLE test_secondary_structure_cb513 (
  "seq" VARCHAR,
  "sst3" VARCHAR,
  "sst8" VARCHAR
);

Test Secondary Structure Ts115

@kaggle.tamzidhasan_protein_secondary_structure_casp12_cb513_ts115.test_secondary_structure_ts115
  • 55.55 KB
  • 115 rows
  • 3 columns

CREATE TABLE test_secondary_structure_ts115 (
  "seq" VARCHAR,
  "sst3" VARCHAR,
  "sst8" VARCHAR
);

Training Secondary Structure Train

@kaggle.tamzidhasan_protein_secondary_structure_casp12_cb513_ts115.training_secondary_structure_train
  • 3.6 MB
  • 8678 rows
  • 3 columns

CREATE TABLE training_secondary_structure_train (
  "seq" VARCHAR,
  "sst3" VARCHAR,
  "sst8" VARCHAR
);

Validation Secondary Structure Valid

@kaggle.tamzidhasan_protein_secondary_structure_casp12_cb513_ts115.validation_secondary_structure_valid
  • 929.72 KB
  • 2170 rows
  • 3 columns

CREATE TABLE validation_secondary_structure_valid (
  "seq" VARCHAR,
  "sst3" VARCHAR,
  "sst8" VARCHAR
);
