COVID-19 Numerical Claims Open Research Dataset

Numerical claims related to COVID-19

@kaggle.dshah1612_covid19_numerical_claims_open_research_dataset

About this Dataset

The COVID-19 Numerical Claims Open Research Dataset (CONCORD) is a comprehensive, open-source dataset of numerical claims extracted from academic papers on COVID-19-related research. CONCORD contains approximately 203,000 numerical claims pertinent to COVID-19, extracted from more than 57,000 scientific research articles published between January 2020 and May 2022. The claims were extracted from full-text research articles annotated using a white-box, weakly supervised model. We used the CORD-19 repository as the raw data source for this work.

Why numerical claims?

  • Adding a numerical entity often increases a claim’s credibility while providing fine-grained, tangible, and valuable information that can be of immense use, especially in the biomedical domain.

Thumbnail Image source: https://indianexpress.com/article/cities/bangalore/unsustainable-urbanisation-coronavirus-variants-8062078/