Baselight

CORD-19 Research Article Dataset

Unlocking Insights to Combat COVID-19 Through Natural Language Processing

@kaggle.thedevastator_cord_19_research_article_dataset

About this Dataset

The COVID-19 Open Research Dataset (CORD-19) contains research articles on the coronavirus family of viruses and its impact on human health. It is intended to bring together medical experts, data scientists, and artificial intelligence professionals in a collaborative effort to develop biomedical advances that could help defeat COVID-19.

This research dataset has been released to the public to provide broad access to scholarly knowledge about the pandemic. It features full-text content from sources such as journals, the PMC open access corpus, WHO publications, and bioRxiv and medRxiv preprints, written by many different authors. A comprehensive metadata file is included, with identifiers that reference PubMed, Microsoft Academic and WHO databases so that records can be cross-referenced.

The intention is for these scholarly writings to serve as a resource for researchers around the world who are working to advance the understanding of infectious diseases and, ultimately, to overcome this virus.

CORD-19 brings together contributions from several quarters: individuals who have made financial donations, corporations offering technological services pro bono, and companies working to improve AI technologies, all committed to uncovering biological insights that could lead to therapies or even cures. Working from a shared resource gives talented people across scientific disciplines and geographical boundaries access to the same material, with the aim of faster progress.

How to use the dataset

This dataset is designed with natural language processing in mind, making it easier to generate new insights in support of the fight against COVID-19. The steps below outline one way to explore it.

  • Downloading: The first step is to download a copy of the CORD-19 Research Article Dataset. It is available online from Kaggle or GitHub repositories, and a copy may also be provided by a college or university library or another relevant organization.

  • Exploring Metadata: To understand what information is available, explore the metadata file included with the dataset. It records authorship information (including author names), publication time, and journals such as PLoS ONE and BMC Medicine, along with the license associated with each article.

  • Discovering Abstracts: Find articles on specific topics or keywords through their abstracts, using text-processing methods such as TF-IDF. Natural language processing techniques such as sentiment analysis can also be applied across many publications at once to gauge how authors regard particular COVID-19 topics, without extensive manual work.

  • Finding Full Text Content: The full_text_file column points to each article's full text, which covers sections such as the introduction, overview, results, implications and discussion. This helps researchers who know what they are looking for within a paper, such as specific scientific terminology or strategies used, but who have so far been unable to find sources focused solely on their topic.

  • Locating Additional Resources: Finally, the dataset references additional resources that can broaden your research, including PubMed, Microsoft Academic and the WHO database. Use them alongside any other relevant COVID-19 resources.
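
The metadata and abstract steps above can be sketched in a few lines of Python. This is a minimal illustration rather than a recommended pipeline: it uses only the standard library, implements a simple smoothed TF-IDF score by hand, and runs on a tiny made-up sample in place of the real metadata.csv. The sample rows and the helper names (tokenize, load_rows, rank_by_tfidf) are assumptions introduced for this example.

```python
import csv
import io
import math
import re
from collections import Counter

# Tiny made-up sample standing in for metadata.csv (illustrative rows only).
SAMPLE_CSV = """cord_uid,title,abstract,journal
a1,Spike protein binding,The spike protein binds the ACE2 receptor on host cells.,Journal A
b2,Hospital ventilation,Ventilation in hospital wards reduces airborne transmission.,Journal B
c3,Receptor structure,Structure of the ACE2 receptor and spike protein complex.,Journal A
d4,No abstract here,,Journal C
"""


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def load_rows(handle):
    """Read metadata rows, keeping only those with a non-empty abstract."""
    return [row for row in csv.DictReader(handle) if row.get("abstract", "").strip()]


def rank_by_tfidf(rows, query):
    """Rank rows by the summed TF-IDF weight of the query terms in each abstract."""
    docs = [tokenize(row["abstract"]) for row in rows]
    n_docs = len(docs)
    # Document frequency: number of abstracts that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    scored = []
    for row, doc in zip(rows, docs):
        counts = Counter(doc)
        score = 0.0
        for term in set(tokenize(query)):
            if counts[term]:
                tf = counts[term] / len(doc)
                idf = math.log((1 + n_docs) / (1 + df[term])) + 1  # smoothed IDF
                score += tf * idf
        if score > 0:
            scored.append((score, row["cord_uid"], row["title"]))
    # Highest score first.
    return sorted(scored, reverse=True)


rows = load_rows(io.StringIO(SAMPLE_CSV))
ranking = rank_by_tfidf(rows, "ACE2 receptor")
for score, cord_uid, title in ranking:
    print(f"{score:.3f}  {cord_uid}  {title}")
```

To try this on the real file, one option is to replace io.StringIO(SAMPLE_CSV) with open("metadata.csv", newline="", encoding="utf-8"), assuming the file sits in the working directory. For larger experiments a dedicated library would likely be a better fit than this hand-rolled scoring.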

Research Ideas

  • Identifying emerging trends in the spread of COVID-19 by integrating epidemiological and public health data with large-scale data analysis methods, such as natural language processing, to gain insight into the research conducted to combat this disease.
  • Developing intelligent sentiment analysis algorithms based on articles that can be used to monitor public opinion and sentiment towards COVID-19.
  • Creating interactive visualizations of recently published scientific studies to show relationships among authors, journals, institutions, topics and countries, in order to better understand the current research landscape and the advances being made in combating the pandemic.

Acknowledgements

If you use this dataset in your research, please credit the original authors.

License

License: CC0 1.0 Universal (CC0 1.0) - Public Domain Dedication
No Copyright - You can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission. See Other Information.

Columns

File: metadata.csv

Column name         Type      Description
sha                 String    A unique identifier for each article.
source_x            String    The source of the article.
title               String    The title of the article.
doi                 String    The Digital Object Identifier for the article.
license             String    The license associated with the article.
abstract            String    A summary of the article.
publish_time        Date      The date the article was published.
authors             String    The authors of the article.
journal             String    The journal the article was published in.
has_pdf_parse       Boolean   Whether a parsed PDF version of the article is available.
has_pmc_xml_parse   Boolean   Whether a parsed PMC XML version of the article is available.
full_text_file      String    The full-text subset (archive) that holds the article's text; the text itself is not stored in this column.
url                 String    The URL of the article.
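
Because a CSV file carries every value as text, the Date and Boolean columns above usually need converting after loading. The sketch below shows one way to do that with the standard library. The accepted date formats ("YYYY-MM-DD" and a bare "YYYY") and the accepted flag spellings are assumptions, not something the dataset description specifies, and both helper names are hypothetical.

```python
from datetime import date, datetime


def parse_publish_time(value):
    """Return a date for 'YYYY-MM-DD' or 'YYYY' strings, otherwise None.

    A bare year is mapped to 1 January of that year (an assumption made
    for illustration only).
    """
    value = (value or "").strip()
    for fmt in ("%Y-%m-%d", "%Y"):
        try:
            return datetime.strptime(value, fmt).date()
        except ValueError:
            continue
    return None


def parse_flag(value):
    """Interpret common textual spellings of a boolean flag."""
    return str(value).strip().lower() in {"true", "1", "yes"}


print(parse_publish_time("2020-03-15"))
print(parse_publish_time("2019"))
print(parse_flag("True"), parse_flag("False"))
```

Values that match neither date format come back as None, so they can be counted or inspected separately instead of being silently coerced.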

Tables

Metadata

@kaggle.thedevastator_cord_19_research_article_dataset.metadata
  • 47.77 MB
  • 59887 rows
  • 19 columns

CREATE TABLE metadata (
  "cord_uid" VARCHAR,
  "sha" VARCHAR,
  "source_x" VARCHAR,
  "title" VARCHAR,
  "doi" VARCHAR,
  "pmcid" VARCHAR,
  "pubmed_id" DOUBLE,
  "license" VARCHAR,
  "abstract" VARCHAR,
  "publish_time" VARCHAR,
  "authors" VARCHAR,
  "journal" VARCHAR,
  "microsoft_academic_paper_id" DOUBLE,
  "who_covidence" VARCHAR,
  "arxiv_id" VARCHAR,
  "has_pdf_parse" BOOLEAN,
  "has_pmc_xml_parse" BOOLEAN,
  "full_text_file" VARCHAR,
  "url" VARCHAR
);
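
To experiment with this schema locally, the sketch below loads it into an in-memory SQLite database through Python's built-in sqlite3 module and runs a small aggregate query. SQLite is used here only as a convenient stand-in and is probably not the engine behind the hosted table. The three inserted rows, including their source and subset values, are made up for illustration.

```python
import sqlite3

# Schema copied from the table definition above.
DDL = """
CREATE TABLE metadata (
  "cord_uid" VARCHAR,
  "sha" VARCHAR,
  "source_x" VARCHAR,
  "title" VARCHAR,
  "doi" VARCHAR,
  "pmcid" VARCHAR,
  "pubmed_id" DOUBLE,
  "license" VARCHAR,
  "abstract" VARCHAR,
  "publish_time" VARCHAR,
  "authors" VARCHAR,
  "journal" VARCHAR,
  "microsoft_academic_paper_id" DOUBLE,
  "who_covidence" VARCHAR,
  "arxiv_id" VARCHAR,
  "has_pdf_parse" BOOLEAN,
  "has_pmc_xml_parse" BOOLEAN,
  "full_text_file" VARCHAR,
  "url" VARCHAR
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(DDL)

# Made-up rows; columns that are not listed are left as NULL.
sample_rows = [
    ("u1", "PMC", "Paper one", "custom_license", 1, 1),
    ("u2", "PMC", "Paper two", "comm_use_subset", 0, 1),
    ("u3", "biorxiv", "Paper three", "biorxiv_medrxiv", 1, 0),
]
conn.executemany(
    "INSERT INTO metadata "
    "(cord_uid, source_x, title, full_text_file, has_pdf_parse, has_pmc_xml_parse) "
    "VALUES (?, ?, ?, ?, ?, ?)",
    sample_rows,
)

# Count papers per source and how many of them have a parsed PDF.
result = conn.execute(
    "SELECT source_x, COUNT(*) AS n_papers, SUM(has_pdf_parse) AS n_pdf "
    "FROM metadata GROUP BY source_x ORDER BY n_papers DESC, source_x"
).fetchall()

for source, n_papers, n_pdf in result:
    print(source, n_papers, n_pdf)
```

The boolean flags are stored as 1 and 0 here because SQLite has no separate boolean storage class; other engines may accept true and false directly.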
