Genome Information By Organism
genomes including sequences, maps, chromosomes, assemblies and annotations
@kaggle.lsind18_genome_information_by_organism
Information on genomes, including sequences, maps, chromosomes, assemblies, and annotations, organized by the National Center for Biotechnology Information (NCBI), U.S. National Library of Medicine.
This directory contains summary reports describing the scope of organisms covered, along with detailed genome project reports grouped by major taxonomic division.
genomes.csv (4.25 MB): Comprehensive report of organisms that have one or more genome sequencing projects, which may be complete, in progress, or planned.
eukaryotes.csv (3.68 MB): Eukaryotic genome sequencing projects, excluding projects that represent only organelles.
prokaryotes.csv (84.9 MB): Prokaryotic genome sequencing projects, excluding projects that represent only plasmids.
viruses.csv (8.92 MB): Viral genome sequencing projects.
organelles.csv (2.39 MB): Organelle genome sequencing projects.
plasmids.csv (3.84 MB): Plasmid genome sequencing projects.
These files correspond to the tables available online at NCBI.
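The schema uses snake_case column names, while the comments on `size_mb` and `gc` appear to preserve original headers such as `Size(Mb)` and `GC%`. A minimal Python sketch for reading one of the CSV files into rows keyed by the schema names, assuming the other raw headers follow the same pattern (for example `Organism Groups` becoming `organism_groups`). The helper names `normalize_header` and `load_rows` are hypothetical, and the raw headers should be checked against the actual files.

```python
import csv
import re


def normalize_header(raw: str) -> str:
    """Convert a raw CSV header such as 'Size(Mb)' or 'GC%' to snake_case.

    Assumption: the schema names were derived by lowercasing and collapsing
    runs of non-alphanumeric characters into single underscores.
    """
    collapsed = re.sub(r"[^0-9a-z]+", "_", raw.strip().lower())
    return collapsed.strip("_")


def load_rows(path: str) -> list[dict[str, str]]:
    """Read a CSV file and return each row as a dict keyed by normalized headers.

    Values are left as strings; numeric and date conversion is up to the caller.
    """
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle)
        header = [normalize_header(h) for h in next(reader)]
        return [dict(zip(header, row)) for row in reader]
```

For example, `normalize_header("Size(Mb)")` yields `"size_mb"` and `normalize_header("GC%")` yields `"gc"`, matching the column names in the schema. For the larger files such as prokaryotes.csv, iterating over `csv.DictReader` lazily may be preferable to building the full list in memory.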
CREATE TABLE eukaryotes (
"organism_name" VARCHAR,
"organism_groups" VARCHAR,
"strain" VARCHAR,
"biosample" VARCHAR,
"bioproject" VARCHAR,
"assembly" VARCHAR,
"level" VARCHAR,
"size_mb" DOUBLE, -- Size(Mb)
"gc" DOUBLE, -- GC%
"replicons" VARCHAR,
"wgs" VARCHAR,
"scaffolds" BIGINT,
"cds" BIGINT,
"release_date" TIMESTAMP,
"genbank_ftp" VARCHAR,
"refseq_ftp" VARCHAR,
"genes" BIGINT
);
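As an illustration of querying the `eukaryotes` table, a minimal sketch using Python's standard-library `sqlite3` module, assuming the table has already been created and loaded into an SQLite database (loading is not shown). It counts rows per value of the `level` column, presumably the assembly level. The helper name `count_by_level` is hypothetical; the same query applies to the other tables that have a `level` column.

```python
import sqlite3

# Tables in this schema that include a "level" column.
LEVEL_TABLES = {"eukaryotes", "prokaryotes", "viruses"}


def count_by_level(conn: sqlite3.Connection, table: str = "eukaryotes") -> list[tuple[str, int]]:
    """Return (level, row_count) pairs, most common level first."""
    # Table names cannot be bound as SQL parameters, so restrict them to a known set.
    if table not in LEVEL_TABLES:
        raise ValueError(f"unexpected table: {table}")
    query = f'''
        SELECT "level", COUNT(*) AS n
        FROM {table}
        GROUP BY "level"
        ORDER BY n DESC, "level"
    '''
    return conn.execute(query).fetchall()
```

Whitelisting the table name keeps the f-string safe from injection while still allowing reuse across `eukaryotes`, `prokaryotes`, and `viruses`.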
CREATE TABLE genomes (
"organism_name" VARCHAR,
"organism_groups" VARCHAR,
"size_mb" DOUBLE, -- Size(Mb)
"chromosomes" BIGINT,
"organelles" BIGINT,
"plasmids" BIGINT,
"assemblies" BIGINT
);
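The `genomes` table holds one row per organism with counts of chromosomes, organelles, plasmids, and assemblies. A minimal `sqlite3` sketch that ranks organisms by the `assemblies` count, assuming, as the column name suggests, that it holds the number of assemblies per organism and that the table is already loaded. The helper name `top_by_assemblies` is hypothetical.

```python
import sqlite3


def top_by_assemblies(conn: sqlite3.Connection, limit: int = 10) -> list[tuple[str, int]]:
    """Return (organism_name, assemblies) for the organisms with the most assemblies."""
    query = '''
        SELECT "organism_name", "assemblies"
        FROM genomes
        WHERE "assemblies" IS NOT NULL
        ORDER BY "assemblies" DESC, "organism_name"
        LIMIT ?
    '''
    # The limit is passed as a bound parameter rather than formatted into the SQL.
    return conn.execute(query, (limit,)).fetchall()
```

Rows with a missing count are skipped, and ties are broken alphabetically so the result is deterministic.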
CREATE TABLE organelles (
"organism_name" VARCHAR,
"organism_groups" VARCHAR,
"strain" VARCHAR,
"biosample" VARCHAR,
"bioproject" VARCHAR,
"size_mb" DOUBLE, -- Size(Mb)
"gc" DOUBLE, -- GC%
"type" VARCHAR,
"replicons" VARCHAR,
"cds" BIGINT,
"release_date" TIMESTAMP,
"genes" BIGINT
);
CREATE TABLE plasmids (
"organism_name" VARCHAR,
"organism_groups" VARCHAR,
"strain" VARCHAR,
"biosample" VARCHAR,
"bioproject" VARCHAR,
"size_mb" DOUBLE, -- Size(Mb)
"gc" DOUBLE, -- GC%
"replicons" VARCHAR,
"cds" BIGINT,
"neighbors" BIGINT,
"release_date" TIMESTAMP,
"genes" BIGINT
);
CREATE TABLE prokaryotes (
"organism_name" VARCHAR,
"organism_groups" VARCHAR,
"strain" VARCHAR,
"biosample" VARCHAR,
"bioproject" VARCHAR,
"assembly" VARCHAR,
"level" VARCHAR,
"size_mb" DOUBLE, -- Size(Mb)
"gc" DOUBLE, -- GC%
"replicons" VARCHAR,
"wgs" VARCHAR,
"scaffolds" BIGINT,
"cds" BIGINT,
"release_date" TIMESTAMP,
"genbank_ftp" VARCHAR,
"refseq_ftp" VARCHAR,
"genes" BIGINT
);
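The `size_mb` and `gc` columns can be combined to estimate how many G and C bases a record contains: bases = size_mb × 1,000,000 × gc / 100. A minimal sketch of that arithmetic, assuming `Size(Mb)` means millions of bases and `GC%` is a percentage from 0 to 100. The function name `gc_base_count` is hypothetical, and because the reported size and GC% are themselves rounded, the result is only an approximation.

```python
def gc_base_count(size_mb: float, gc_percent: float) -> int:
    """Approximate the number of G and C bases from a size in Mb and a GC percentage.

    Assumes size_mb is in millions of bases and gc_percent is within 0-100.
    """
    if size_mb < 0 or not 0 <= gc_percent <= 100:
        raise ValueError("size must be non-negative and GC% must be within 0-100")
    # Convert megabases to bases, then take the GC fraction of that total.
    return round(size_mb * 1_000_000 * gc_percent / 100)
```

For example, a 5.0 Mb record with 50% GC gives `gc_base_count(5.0, 50.0) == 2500000`.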
CREATE TABLE viruses (
"organism_name" VARCHAR,
"organism_groups" VARCHAR,
"biosample" VARCHAR,
"bioproject" VARCHAR,
"assembly" VARCHAR,
"level" VARCHAR,
"size_mb" DOUBLE, -- Size(Mb)
"gc" DOUBLE, -- GC%
"host" VARCHAR,
"cds" BIGINT,
"neighbors" DOUBLE,
"release_date" TIMESTAMP,
"genbank_ftp" VARCHAR,
"refseq_ftp" VARCHAR,
"genes" BIGINT,
"scaffolds" BIGINT
);
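The `viruses` table includes a `host` column that is likely to be missing or blank for some records. A minimal `sqlite3` sketch that counts viral records per host, assuming the table is already loaded; NULL or blank hosts are grouped under a placeholder label `'unknown'`, which is an illustrative choice rather than a value from the dataset. The helper name `count_by_host` is hypothetical.

```python
import sqlite3


def count_by_host(conn: sqlite3.Connection) -> list[tuple[str, int]]:
    """Return (host, row_count) pairs from the viruses table, most common first."""
    query = '''
        SELECT COALESCE(NULLIF(TRIM("host"), ''), 'unknown') AS host_label,
               COUNT(*) AS n
        FROM viruses
        GROUP BY host_label
        ORDER BY n DESC, host_label
    '''
    # NULLIF turns blank strings into NULL; COALESCE then maps NULL to 'unknown'.
    return conn.execute(query).fetchall()
```

Normalizing blanks and NULLs in SQL keeps them in a single bucket instead of producing separate `None` and `''` groups.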