Baselight

Genome Information By Organism

Genomes, including sequences, maps, chromosomes, assemblies and annotations

@kaggle.lsind18_genome_information_by_organism

About this Dataset

Context 🔬

Information on genomes, including sequences, maps, chromosomes, assemblies, and annotations, organized by the National Center for Biotechnology Information (NCBI), U.S. National Library of Medicine.

Content

This directory contains summary reports conveying the organism scope and detailed genome project reports grouped by major taxonomic divisions.

  1. genomes.csv (4.25 MB): Comprehensive report of organisms that have one or more genome sequencing projects, which may be complete, in progress, or planned.

  2. eukaryotes.csv (3.68 MB): Eukaryotic genome sequencing projects excluding projects that represent only organelles.

  3. prokaryotes.csv (84.9 MB): Prokaryotic genome sequencing projects excluding projects that represent only plasmids.

  4. viruses.csv (8.92 MB): Viral genome sequencing projects.

  5. organelles.csv (2.39 MB): Organelle genome sequencing projects.

  6. plasmids.csv (3.84 MB): Plasmid genome sequencing projects.

Acknowledgements

These files correspond to the tables available online at NCBI.

Tables

Eukaryotes

@kaggle.lsind18_genome_information_by_organism.eukaryotes
  • 1.3 MB
  • 11160 rows
  • 17 columns

CREATE TABLE eukaryotes (
  "organism_name" VARCHAR,
  "organism_groups" VARCHAR,
  "strain" VARCHAR,
  "biosample" VARCHAR,
  "bioproject" VARCHAR,
  "assembly" VARCHAR,
  "level" VARCHAR,
  "size_mb" DOUBLE,
  "gc" DOUBLE,
  "replicons" VARCHAR,
  "wgs" VARCHAR,
  "scaffolds" BIGINT,
  "cds" BIGINT,
  "release_date" TIMESTAMP,
  "genbank_ftp" VARCHAR,
  "refseq_ftp" VARCHAR,
  "genes" BIGINT
);
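
As a rough illustration of how this schema might be queried, the sketch below loads the DDL above into an in-memory SQLite database and summarizes assemblies by `level`. The inserted rows are placeholders rather than real NCBI records, the `level` values are assumed, and the helper name `summarize_by_level` is hypothetical. SQLite is used only because it ships with Python; the hosted table may run on a different engine.

```python
import sqlite3

# Schema copied from the CREATE TABLE statement above. SQLite accepts these
# type names but applies its own type-affinity rules.
EUKARYOTES_DDL = """
CREATE TABLE eukaryotes (
  "organism_name" VARCHAR,
  "organism_groups" VARCHAR,
  "strain" VARCHAR,
  "biosample" VARCHAR,
  "bioproject" VARCHAR,
  "assembly" VARCHAR,
  "level" VARCHAR,
  "size_mb" DOUBLE,
  "gc" DOUBLE,
  "replicons" VARCHAR,
  "wgs" VARCHAR,
  "scaffolds" BIGINT,
  "cds" BIGINT,
  "release_date" TIMESTAMP,
  "genbank_ftp" VARCHAR,
  "refseq_ftp" VARCHAR,
  "genes" BIGINT
)
"""


def summarize_by_level(conn):
    """Return (level, row_count, mean_size_mb, mean_gc) for each assembly level."""
    return conn.execute(
        """
        SELECT level, COUNT(*) AS n, AVG(size_mb), AVG(gc)
        FROM eukaryotes
        GROUP BY level
        ORDER BY level
        """
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute(EUKARYOTES_DDL)

# Placeholder rows for illustration only; names and numbers are made up.
conn.executemany(
    "INSERT INTO eukaryotes (organism_name, level, size_mb, gc) VALUES (?, ?, ?, ?)",
    [
        ("Example organism A", "Scaffold", 100.0, 40.0),
        ("Example organism B", "Scaffold", 300.0, 50.0),
        ("Example organism C", "Chromosome", 12.0, 38.0),
    ],
)

level_summary = summarize_by_level(conn)
for row in level_summary:
    print(row)
```

Columns left out of the INSERT are stored as NULL, which is worth keeping in mind when aggregating the real table, since `AVG` skips NULL values.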

Genomes

@kaggle.lsind18_genome_information_by_organism.genomes
  • 1.1 MB
  • 49903 rows
  • 7 columns

CREATE TABLE genomes (
  "organism_name" VARCHAR,
  "organism_groups" VARCHAR,
  "size_mb" DOUBLE,
  "chromosomes" BIGINT,
  "organelles" BIGINT,
  "plasmids" BIGINT,
  "assemblies" BIGINT
);
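
The `genomes` table holds one summary row per organism, so a natural first query is to total `assemblies` by top-level group. The sketch below assumes that `organism_groups` is a semicolon-separated lineage string (for example `Eukaryota;Animals;Mammals`); that format is an assumption about the data, not something the schema guarantees. The rows and the helper name `assemblies_by_top_group` are hypothetical.

```python
import sqlite3
from collections import Counter

conn = sqlite3.connect(":memory:")
# Schema copied from the CREATE TABLE statement above.
conn.execute(
    """
    CREATE TABLE genomes (
      "organism_name" VARCHAR,
      "organism_groups" VARCHAR,
      "size_mb" DOUBLE,
      "chromosomes" BIGINT,
      "organelles" BIGINT,
      "plasmids" BIGINT,
      "assemblies" BIGINT
    )
    """
)

# Placeholder rows for illustration only; not real NCBI records.
conn.executemany(
    "INSERT INTO genomes (organism_name, organism_groups, assemblies) VALUES (?, ?, ?)",
    [
        ("Example organism A", "Eukaryota;Animals;Mammals", 3),
        ("Example organism B", "Eukaryota;Fungi;Ascomycetes", 2),
        ("Example organism C", "Bacteria;Proteobacteria", 5),
        ("Example organism D", None, 1),
    ],
)


def assemblies_by_top_group(conn):
    """Sum the assemblies column per first element of organism_groups."""
    totals = Counter()
    query = "SELECT organism_groups, assemblies FROM genomes"
    for groups, assemblies in conn.execute(query):
        # Assumed format: 'Top;Middle;Leaf'. Missing values go to 'Unknown'.
        top = (groups or "Unknown").split(";")[0]
        totals[top] += assemblies or 0
    return dict(totals)


group_totals = assemblies_by_top_group(conn)
print(group_totals)
```

The split is done in Python rather than SQL so that rows with no separator or a NULL group are handled explicitly.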

Organelles

@kaggle.lsind18_genome_information_by_organism.organelles
  • 907.16 KB
  • 16357 rows
  • 12 columns

CREATE TABLE organelles (
  "organism_name" VARCHAR,
  "organism_groups" VARCHAR,
  "strain" VARCHAR,
  "biosample" VARCHAR,
  "bioproject" VARCHAR,
  "size_mb" DOUBLE,
  "gc" DOUBLE,
  "type" VARCHAR,
  "replicons" VARCHAR,
  "cds" BIGINT,
  "release_date" TIMESTAMP,
  "genes" BIGINT
);

Plasmids

@kaggle.lsind18_genome_information_by_organism.plasmids
  • 1.06 MB
  • 21175 rows
  • 12 columns

CREATE TABLE plasmids (
  "organism_name" VARCHAR,
  "organism_groups" VARCHAR,
  "strain" VARCHAR,
  "biosample" VARCHAR,
  "bioproject" VARCHAR,
  "size_mb" DOUBLE,
  "gc" DOUBLE,
  "replicons" VARCHAR,
  "cds" BIGINT,
  "neighbors" BIGINT,
  "release_date" TIMESTAMP,
  "genes" BIGINT
);

Prokaryotes

@kaggle.lsind18_genome_information_by_organism.prokaryotes
  • 24.87 MB
  • 249357 rows
  • 17 columns

CREATE TABLE prokaryotes (
  "organism_name" VARCHAR,
  "organism_groups" VARCHAR,
  "strain" VARCHAR,
  "biosample" VARCHAR,
  "bioproject" VARCHAR,
  "assembly" VARCHAR,
  "level" VARCHAR,
  "size_mb" DOUBLE,
  "gc" DOUBLE,
  "replicons" VARCHAR,
  "wgs" VARCHAR,
  "scaffolds" BIGINT,
  "cds" BIGINT,
  "release_date" TIMESTAMP,
  "genbank_ftp" VARCHAR,
  "refseq_ftp" VARCHAR,
  "genes" BIGINT
);
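
Because `release_date` is a TIMESTAMP, the table lends itself to counting releases over time. The sketch below runs against an in-memory SQLite copy of the schema and assumes timestamps are stored as ISO-8601 text, which is how SQLite's date functions expect them. Other engines would use their own date functions. The rows are placeholders and `releases_per_year` is a hypothetical helper name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Schema copied from the CREATE TABLE statement above.
conn.execute(
    """
    CREATE TABLE prokaryotes (
      "organism_name" VARCHAR,
      "organism_groups" VARCHAR,
      "strain" VARCHAR,
      "biosample" VARCHAR,
      "bioproject" VARCHAR,
      "assembly" VARCHAR,
      "level" VARCHAR,
      "size_mb" DOUBLE,
      "gc" DOUBLE,
      "replicons" VARCHAR,
      "wgs" VARCHAR,
      "scaffolds" BIGINT,
      "cds" BIGINT,
      "release_date" TIMESTAMP,
      "genbank_ftp" VARCHAR,
      "refseq_ftp" VARCHAR,
      "genes" BIGINT
    )
    """
)

# Placeholder rows for illustration only; not real NCBI records.
conn.executemany(
    "INSERT INTO prokaryotes (organism_name, level, release_date) VALUES (?, ?, ?)",
    [
        ("Example bacterium A", "Complete Genome", "2018-03-14 00:00:00"),
        ("Example bacterium B", "Contig", "2019-07-02 00:00:00"),
        ("Example bacterium C", "Contig", "2019-11-20 00:00:00"),
        ("Example bacterium D", "Scaffold", None),
    ],
)


def releases_per_year(conn):
    """Count rows per release year, ignoring rows without a release date."""
    return conn.execute(
        """
        SELECT strftime('%Y', release_date) AS year, COUNT(*) AS n
        FROM prokaryotes
        WHERE release_date IS NOT NULL
        GROUP BY year
        ORDER BY year
        """
    ).fetchall()


yearly_counts = releases_per_year(conn)
print(yearly_counts)
```

Note that `strftime` returns the year as a string, so the result pairs a text year with an integer count.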

Viruses

@kaggle.lsind18_genome_information_by_organism.viruses
  • 1.71 MB
  • 34747 rows
  • 16 columns

CREATE TABLE viruses (
  "organism_name" VARCHAR,
  "organism_groups" VARCHAR,
  "biosample" VARCHAR,
  "bioproject" VARCHAR,
  "assembly" VARCHAR,
  "level" VARCHAR,
  "size_mb" DOUBLE,
  "gc" DOUBLE,
  "host" VARCHAR,
  "cds" BIGINT,
  "neighbors" DOUBLE,
  "release_date" TIMESTAMP,
  "genbank_ftp" VARCHAR,
  "refseq_ftp" VARCHAR,
  "genes" BIGINT,
  "scaffolds" BIGINT
);
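
The `host` column is specific to this table, so one plausible use is counting viral genomes per host category. The sketch below is a minimal example against an in-memory SQLite copy of the schema. The host labels and rows are made up for illustration, and `genomes_per_host` is a hypothetical helper name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Schema copied from the CREATE TABLE statement above.
conn.execute(
    """
    CREATE TABLE viruses (
      "organism_name" VARCHAR,
      "organism_groups" VARCHAR,
      "biosample" VARCHAR,
      "bioproject" VARCHAR,
      "assembly" VARCHAR,
      "level" VARCHAR,
      "size_mb" DOUBLE,
      "gc" DOUBLE,
      "host" VARCHAR,
      "cds" BIGINT,
      "neighbors" DOUBLE,
      "release_date" TIMESTAMP,
      "genbank_ftp" VARCHAR,
      "refseq_ftp" VARCHAR,
      "genes" BIGINT,
      "scaffolds" BIGINT
    )
    """
)

# Placeholder rows for illustration only; not real NCBI records.
conn.executemany(
    "INSERT INTO viruses (organism_name, host) VALUES (?, ?)",
    [
        ("Example phage A", "bacteria"),
        ("Example phage B", "bacteria"),
        ("Example plant virus", "plants"),
        ("Example unclassified virus", None),
    ],
)


def genomes_per_host(conn):
    """Count rows per host, grouping missing hosts under 'unknown'."""
    return conn.execute(
        """
        SELECT COALESCE(host, 'unknown') AS host_label, COUNT(*) AS n
        FROM viruses
        GROUP BY host_label
        ORDER BY n DESC, host_label
        """
    ).fetchall()


host_counts = genomes_per_host(conn)
print(host_counts)
```

`COALESCE` keeps rows with a NULL host visible in the result instead of leaving them in a separate NULL group.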
