SciFact (Scientific Claims)
1.4K Expert-Written Claims with Structured Annotations
@kaggle.thedevastator_unlock_insight_into_scientific_claims_with_scifa
By Huggingface Hub [source]
The SciFact dataset is a unique and valuable resource for research into sentiment, fact-checking, and the trustworthiness of scientific claims. It pairs 1.4K expert-written scientific claims with evidence-containing abstracts, along with human-generated structured annotations that supply labels and rationales, giving researchers ample opportunity to explore the nuances of science communication. Dive in to uncover how scientists express their ideas through precise language choices and persuasive arguments!
How to use the SciFact Dataset
The SciFact dataset is a valuable resource for exploring the sentiment, fact-checking, and trustworthiness of scientific claims and evidence. It includes 1,400 expert-written scientific claims (claims_train.csv) paired with evidence-containing abstracts (corpus_train.csv). Each claim has been manually annotated with structured labels and rationales to help researchers obtain meaningful insights into the subject matter. Possible uses include:
- Understanding the sentiment of scientific claims by measuring the trustworthiness or accuracy of their evidence.
- Developing algorithms that fact-check scientific claims against the associated evidence labels and rationales.
- Training predictive models that automatically generate structured annotations linking claims to evidence in the abstracts of a given corpus (see the loading sketch after this list).
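A minimal loading sketch with pandas, assuming the CSV files named in this card sit in the working directory:

```python
import pandas as pd

# Load the training claims and the evidence corpus (file names as listed in this card).
claims_train = pd.read_csv("claims_train.csv")
corpus_train = pd.read_csv("corpus_train.csv")

# Inspect the columns of each file.
print(claims_train.columns.tolist())  # id, claim, evidence_doc_id, evidence_label, ...
print(corpus_train.columns.tolist())  # doc_id, title, abstract, structured

# How the evidence labels are distributed across the training claims.
print(claims_train["evidence_label"].value_counts(dropna=False))
```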
If you use this dataset in your research, please credit the original authors.
Data Source
License: CC0 1.0 Universal (CC0 1.0) - Public Domain Dedication
No Copyright - You can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission. See Other Information.
File: corpus_train.csv
| Column name | Description |
|---|---|
| doc_id | Unique identifier of the abstract. (Integer) |
| title | The title of the abstract. (String) |
| abstract | The abstract text containing background information related to the claim. (String) |
| structured | Whether the abstract is a structured abstract. (Boolean) |
File: claims_validation.csv
| Column name | Description |
|---|---|
| claim | The claim or statement made by an expert. (String) |
File: claims_test.csv
| Column name | Description |
|---|---|
| claim | The claim or statement made by an expert. (String) |
File: claims_train.csv
| Column name | Description |
|---|---|
| claim | The claim or statement made by an expert. (String) |
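The three claim splits share the same column layout (see the table schemas further down). A short sketch, assuming the file names above, that loads each split and reports its size:

```python
import pandas as pd

# Load each claim split and report how many claims it contains.
for path in ["claims_train.csv", "claims_validation.csv", "claims_test.csv"]:
    split = pd.read_csv(path)
    print(f"{path}: {len(split)} claims")
```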
If you use this dataset in your research, please credit Huggingface Hub.
CREATE TABLE claims_test (
    "id" BIGINT,
    "claim" VARCHAR,
    "evidence_doc_id" VARCHAR,
    "evidence_label" VARCHAR,
    "evidence_sentences" VARCHAR,
    "cited_doc_ids" VARCHAR
);

CREATE TABLE claims_train (
    "id" BIGINT,
    "claim" VARCHAR,
    "evidence_doc_id" DOUBLE,
    "evidence_label" VARCHAR,
    "evidence_sentences" VARCHAR,
    "cited_doc_ids" VARCHAR
);

CREATE TABLE claims_validation (
    "id" BIGINT,
    "claim" VARCHAR,
    "evidence_doc_id" DOUBLE,
    "evidence_label" VARCHAR,
    "evidence_sentences" VARCHAR,
    "cited_doc_ids" VARCHAR
);

CREATE TABLE corpus_train (
    "doc_id" BIGINT,
    "title" VARCHAR,
    "abstract" VARCHAR,
    "structured" BOOLEAN
);
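Given these schemas, each claim can be joined to its evidence abstract via evidence_doc_id and doc_id. A minimal pandas sketch, assuming the train-split file names above; note that evidence_doc_id is stored as DOUBLE in the train and validation splits (empty when no evidence document is linked), so both join keys are cast to a nullable integer before merging:

```python
import pandas as pd

claims = pd.read_csv("claims_train.csv")
corpus = pd.read_csv("corpus_train.csv")

# evidence_doc_id is a float with missing values in the train split;
# cast both join keys to pandas' nullable Int64 so the merge keys match.
claims["evidence_doc_id"] = claims["evidence_doc_id"].astype("Int64")
corpus["doc_id"] = corpus["doc_id"].astype("Int64")

# Attach the title and abstract of the linked evidence document to each claim.
merged = claims.merge(
    corpus[["doc_id", "title", "abstract"]],
    left_on="evidence_doc_id",
    right_on="doc_id",
    how="left",
)

print(merged[["claim", "evidence_label", "title"]].head())
```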