The COVID-19 Numerical Claims Open Research Dataset (CONCORD) is a comprehensive, open-source dataset of numerical claims extracted from academic papers on COVID-19-related research. CONCORD contains approximately 203,000 numerical claims pertinent to COVID-19, extracted from more than 57,000 scientific research articles published between January 2020 and May 2022. The claims are extracted from full-text research articles annotated using a white-box, weakly supervised model. We used the CORD-19 repository as the raw dataset for our work.
Why numerical claims?
- Adding a numerical entity often increases a claim’s credibility while providing fine-grained, tangible information that can be immensely useful, especially in the biomedical domain.
Thumbnail Image source: https://indianexpress.com/article/cities/bangalore/unsustainable-urbanisation-coronavirus-variants-8062078/