Context
This is a protein data set retrieved from the Research Collaboratory for Structural Bioinformatics (RCSB) Protein Data Bank (PDB).
The PDB archive is a repository of atomic coordinates and other information describing proteins and other important biological macromolecules. Structural biologists use methods such as X-ray crystallography, NMR spectroscopy, and cryo-electron microscopy to determine the location of each atom relative to the others in the molecule. They then deposit this information, which the wwPDB annotates and publicly releases into the archive.
The constantly-growing PDB is a reflection of the research that is happening in laboratories across the world. This can make it both exciting and challenging to use the database in research and education. Structures are available for many of the proteins and nucleic acids involved in the central processes of life, so you can go to the PDB archive to find structures for ribosomes, oncogenes, drug targets, and even whole viruses. However, it can be a challenge to find the information that you need, since the PDB archives so many different structures. You will often find multiple structures for a given molecule, or partial structures, or structures that have been modified or inactivated from their native form.
Content
There are two data files. Both are keyed on the protein's "structureId":
- pdb_data_no_dups.csv contains protein metadata, including details on protein classification, extraction methods, etc.
- data_seq.csv contains more than 400,000 protein structure sequences.
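Because both files share the "structureId" column, they can be combined on it. The following is a minimal sketch using only the Python standard library. The function name `join_on_structure_id` is hypothetical, and the sketch assumes both files are UTF-8 CSVs with a header row and that one structure may have several sequence rows; no column other than "structureId" is assumed.

```python
import csv
from collections import defaultdict


def join_on_structure_id(meta_path, seq_path, key="structureId"):
    """Return {structureId: {"meta": row, "sequences": [row, ...]}}.

    Hypothetical helper: only the shared key column named in the
    description is assumed; all other columns are passed through as-is.
    """
    # Index metadata rows by the shared key. If a key repeats,
    # the last row read wins.
    with open(meta_path, newline="", encoding="utf-8") as f:
        meta = {row[key]: row for row in csv.DictReader(f)}

    # Assumption: a structure may have more than one sequence row,
    # so sequence rows are collected into a list per key.
    seqs = defaultdict(list)
    with open(seq_path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            seqs[row[key]].append(row)

    # Inner join: keep only structure IDs present in both files.
    return {
        sid: {"meta": meta[sid], "sequences": seqs[sid]}
        for sid in meta
        if sid in seqs
    }
```

With the two files in the working directory, a call such as `join_on_structure_id("pdb_data_no_dups.csv", "data_seq.csv")` would return one entry per structure found in both files. An inner join is used here so that every entry has both metadata and at least one sequence; structures missing from either file are dropped.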
Acknowledgements
Original data set downloaded from http://www.rcsb.org/pdb/
Inspiration
Protein databases have helped the life science community study different diseases and come up with new drugs and solutions that aid human survival.