My Complete Genome
6,000 Base-Pairs of Phenotype SNPs - Complete Raw Data
@kaggle.zusmani_mygenome
Zeeshan-ul-hassan Usmani’s Genome Phenotype SNPs Raw Data
Genomics is a branch of molecular biology concerned with the structure, function, variation, evolution, and mapping of genomes. Several companies offer DNA testing of the human genome, from next-generation sequencing of the complete 3 billion base pairs to genotyping of a few thousand Phenotype SNPs. I used 23andMe (genotyped on the Illumina HumanOmniExpress-24 chip) to obtain my DNA's Phenotype SNPs. I am sharing the entire raw dataset here with the international research community for the following reasons:
I am a firm believer in open datasets, transparency, and the right to learn, research, explore, and educate. I do not want to restrict the flow of knowledge over mere privacy concerns. Hence, I am offering my entire raw DNA data for the world to use in research without worrying about privacy. I call it a copyleft dataset.
Most of the test datasets available for research come from the Western world, and we don't see many from developing countries. I decided to share my data to help bridge this gap, and I expect others to follow the trend.
I would be the happiest man on earth if a life could be saved, knowledge gained, an idea explored, or a fact discovered using my DNA data. Please use it however you see fit.
Name: Zeeshan-ul-hassan Usmani
Age: 38 Years
Country of Birth: Pakistan
Country of Ancestors: India (Uttar Pradesh - UP)
File: GenomeZeeshanUsmani.csv
Size: 15 MB
Sources: 23andMe Personalized Genome Report
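As a starting point, the sketch below loads the CSV file and counts SNPs per chromosome. It is a minimal sketch under assumptions: the column names rsid, chromosome, position and genotype are borrowed from the usual 23andMe raw-data layout and are not confirmed for this particular file, and the helper names load_genotypes and snps_per_chromosome are hypothetical.

```python
import csv
import io
from collections import Counter

# ASSUMPTION: column names follow the common 23andMe raw-data layout;
# verify them against the header of GenomeZeeshanUsmani.csv.
EXPECTED_COLUMNS = ("rsid", "chromosome", "position", "genotype")


def load_genotypes(handle):
    """Read SNP rows from an open CSV handle into a list of dicts.

    Lines starting with '#' (comment headers in 23andMe exports) are skipped.
    Raises ValueError if any expected column is missing.
    """
    data_lines = (line for line in handle if not line.startswith("#"))
    reader = csv.DictReader(data_lines)
    missing = set(EXPECTED_COLUMNS) - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    return [
        {
            "rsid": row["rsid"],
            "chromosome": row["chromosome"],
            "position": int(row["position"]),
            "genotype": row["genotype"],
        }
        for row in reader
    ]


def snps_per_chromosome(rows):
    """Count SNP rows per chromosome label (e.g. '1'..'22', 'X', 'Y', 'MT')."""
    return Counter(row["chromosome"] for row in rows)


if __name__ == "__main__":
    # Tiny made-up sample, NOT taken from the real dataset.
    sample = io.StringIO(
        "# comment line\n"
        "rsid,chromosome,position,genotype\n"
        "rs_example1,1,1000,AG\n"
        "rs_example2,1,2000,CC\n"
        "rs_example3,MT,3000,T\n"
    )
    rows = load_genotypes(sample)
    print(snps_per_chromosome(rows))
```

To run it on the real file, open it with `open("GenomeZeeshanUsmani.csv", newline="")` and pass the handle to `load_genotypes`; if the header differs, adjust `EXPECTED_COLUMNS` and the keys read in the list comprehension.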
The research community is still actively working in this domain, and professionals generally agree that genomics is still in its infancy. This dataset gives you the chance to explore this novel domain and become one of the few early adopters of genomics.
The dataset is the complete raw genotype data extracted from www.23andme.com. It is represented as a sequence of SNP calls using the following symbols: A (adenine), C (cytosine), G (guanine), T (thymine), D (base deletion), I (base insertion), and '_' or '-' if the SNP at a particular location is not available. It covers chromosomes 1-22, X, Y, and mitochondrial DNA.
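To make the symbol list concrete, here is a sketch that classifies individual genotype calls using exactly the symbols described above. It assumes, which the description does not state, that each call is a one- or two-character string: two alleles on the autosomes, and a single allele where only one copy is reported (for example mitochondrial DNA). The function names classify_genotype and summarize_calls are hypothetical.

```python
from collections import Counter

# Symbols as listed in the dataset description.
ALLELES = set("ACGTDI")   # four bases plus D (deletion) and I (insertion)
NO_CALL = set("-_")       # SNP not available at this location


def classify_genotype(call):
    """Classify one genotype call string.

    ASSUMPTION: a call has one or two symbols. Returns 'no_call',
    'single_allele', 'homozygous' or 'heterozygous'.
    """
    call = call.strip().upper()
    if not call or any(ch in NO_CALL for ch in call):
        return "no_call"
    bad = [ch for ch in call if ch not in ALLELES]
    if bad:
        raise ValueError(f"unexpected symbol(s) {bad} in call {call!r}")
    if len(call) == 1:
        return "single_allele"
    if len(call) == 2:
        return "homozygous" if call[0] == call[1] else "heterozygous"
    raise ValueError(f"call {call!r} has more than two symbols")


def summarize_calls(calls):
    """Tally the classification of many calls."""
    return Counter(classify_genotype(c) for c in calls)


if __name__ == "__main__":
    print(summarize_calls(["AG", "CC", "--", "DI", "T"]))
```

Treating D and I as ordinary alleles keeps indel calls such as DI in the same homozygous/heterozygous scheme as base calls; if you prefer to count indels separately, add a branch before the length checks.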
A complete list of the exact SNPs (base pairs) available and their dataset indices can be found at
https://api.23andme.com/res/txt/snps.b4e00fe1db50.data
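The index list linked above can in principle be used to turn a packed genome string, as returned by the 23andMe genomes endpoint, into per-SNP calls. The sketch below assumes, without confirmation from this description, that the index file is whitespace-separated with columns index, snp, chromosome and chromosome_position (comment lines starting with '#'), and that SNP number i occupies two characters at offset 2*i of the packed string. Check both assumptions against the linked documentation before relying on the output; the function names parse_snp_index and decode_packed_genome are hypothetical.

```python
def parse_snp_index(text):
    """Parse the SNP index listing into {index: (snp_id, chromosome, position)}.

    ASSUMPTION: whitespace-separated columns index, snp, chromosome,
    chromosome_position; '#' lines are comments; an optional header row
    starting with 'index' is skipped.
    """
    entries = {}
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split()
        if fields[0] == "index":
            continue
        idx, snp, chrom, pos = fields[:4]
        entries[int(idx)] = (snp, chrom, int(pos))
    return entries


def decode_packed_genome(packed, index):
    """Map each SNP id to its two-character call in a packed genome string.

    ASSUMPTION: SNP number i occupies packed[2*i : 2*i + 2]. Indices beyond
    the end of the string are skipped rather than guessed.
    """
    calls = {}
    for idx, (snp, _chrom, _pos) in index.items():
        call = packed[2 * idx : 2 * idx + 2]
        if len(call) == 2:
            calls[snp] = call
    return calls


if __name__ == "__main__":
    # Made-up two-SNP example, NOT the real index file or genome.
    index_text = (
        "# comment\n"
        "index\tsnp\tchromosome\tchromosome_position\n"
        "0\trs_example1\t1\t100\n"
        "1\trs_example2\t1\t200\n"
    )
    index = parse_snp_index(index_text)
    print(decode_packed_genome("AGCC", index))
```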
For more information about how the dataset was extracted, see https://api.23andme.com/docs/reference/#genomes
For a more detailed description of the dataset content, see https://api.23andme.com/docs/reference/#genotypes
Users may use, copy, and distribute the dataset, and should cite it as follows: “Zeeshan-ul-hassan Usmani, Genome Phenotype SNPs Raw Data File by 23andMe, Kaggle Dataset Repository, Jan 25, 2017.”
You may use the following human genome database sites for help:
Some ideas worth exploring:
Please check out the following reports to see what can be done with this data:
Ancestry:
https://www.23andme.com/published-report/eeb4f9bbd6b5474f/?share_id=f6c5562848e84586
Weight Report:
https://you.23andme.com/published/reports/65c9af9f8223456d/?share_id=0126f129e4f3458b