Baselight

Structural Protein Sequences

Sequence and metadata for various protein structures

@kaggle.shahir_protein_data_set

Pdb Data Seq
@kaggle.shahir_protein_data_set.pdb_data_seq

  • 43.72 MB
  • 467304 rows
  • 5 columns
structureid | chainid | sequence | residuecount | macromoleculetype
--- | --- | --- | --- | ---
100D | A | CCGGCGCCGG | 20 | DNA/RNA Hybrid
100D | B | CCGGCGCCGG | 20 | DNA/RNA Hybrid
101D | A | CGCGAATTCGCG | 24 | DNA
101D | B | CGCGAATTCGCG | 24 | DNA
101M | A | MVLSEGEWQLVLHVWAKVEADVAGHGQDILIRLFKSHPETLEKFDRVKHLKTEAEMKASEDLKKHGVTVLTALGAILKKKGHHEAELKPLAQSHATKHKIPIKYLEFISEAIIHVLHSRHPGNFGADAQGAMNKALELFRKDIAAKYKELGYQG | 154 | Protein
102D | A | CGCAAATTTGCG | 24 | DNA
102D | B | CGCAAATTTGCG | 24 | DNA
102L | A | MNIFEMLRIDEGLRLKIYKDTEGYYTIGIGHLLTKSPSLNAAAKSELDKAIGRNTNGVITKDEAEKLFNQDVDAAVRGILRNAKLKPVYDSLDAVRRAALINMVFQMGETGVAGFTNSLRMLQQKRWDEAAVNLAKSRWYNQTPNRAKRVITTFRTGTWDAYKNL | 165 | Protein
102M | A | MVLSEGEWQLVLHVWAKVEADVAGHGQDILIRLFKSHPETLEKFDRFKHLKTEAEMKASEDLKKAGVTVLTALGAILKKKGHHEAELKPLAQSHATKHKIPIKYLEFISEAIIHVLHSRHPGNFGADAQGAMNKALELFRKDIAAKYKELGYQG | 154 | Protein
103D | A | GTGGAATGGAAC | 24 | DNA

CREATE TABLE pdb_data_seq (
  "structureid" VARCHAR,
  "chainid" VARCHAR,
  "sequence" VARCHAR,
  "residuecount" BIGINT,
  "macromoleculetype" VARCHAR
);
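
As a rough illustration of how this schema can be queried, the sketch below loads a few of the short nucleic-acid rows from the preview on this page into an in-memory SQLite database and summarizes them per structure. SQLite is only a stand-in engine chosen so the example is self-contained; it is an assumption, not the engine behind this dataset, and the helper names `load_sample` and `chains_per_structure` are hypothetical.

```python
import sqlite3

# Schema copied from the page. SQLite is an assumed stand-in engine.
DDL = """
CREATE TABLE pdb_data_seq (
  "structureid" VARCHAR,
  "chainid" VARCHAR,
  "sequence" VARCHAR,
  "residuecount" BIGINT,
  "macromoleculetype" VARCHAR
);
"""

# A handful of rows taken from the preview; not the full 467304-row table.
SAMPLE_ROWS = [
    ("100D", "A", "CCGGCGCCGG", 20, "DNA/RNA Hybrid"),
    ("100D", "B", "CCGGCGCCGG", 20, "DNA/RNA Hybrid"),
    ("101D", "A", "CGCGAATTCGCG", 24, "DNA"),
    ("101D", "B", "CGCGAATTCGCG", 24, "DNA"),
    ("103D", "A", "GTGGAATGGAAC", 24, "DNA"),
]


def load_sample():
    """Create the table in memory and insert the sample rows (hypothetical helper)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(DDL)
    conn.executemany(
        "INSERT INTO pdb_data_seq VALUES (?, ?, ?, ?, ?)", SAMPLE_ROWS
    )
    return conn


def chains_per_structure(conn):
    """Return (structureid, chain count, summed chain length, residuecount) per structure."""
    query = """
        SELECT structureid,
               COUNT(*)              AS chains,
               SUM(LENGTH(sequence)) AS summed_chain_length,
               MAX(residuecount)     AS residuecount
        FROM pdb_data_seq
        GROUP BY structureid
        ORDER BY structureid
    """
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    for row in chains_per_structure(load_sample()):
        print(row)
```

In this sample, the summed chain length equals residuecount for 100D and 101D, where both chains are present, which suggests that residuecount may be a per-structure total and not a per-chain length. 103D differs only because the sample holds a single chain for it. This reading is inferred from the preview rows and should be checked against the full table.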
