Baselight

SciFact (Scientific Claims)

1.4K Expert-Written Claims with Structured Annotations

@kaggle.thedevastator_unlock_insight_into_scientific_claims_with_scifa


About this Dataset

By Huggingface Hub


The SciFact dataset is a resource for research on fact-checking and the trustworthiness of scientific claims. It contains 1.4K expert-written scientific claims paired with evidence-containing abstracts, along with human-generated annotations that carry labels and rationales. Researchers can use it to study how scientific claims are worded and how well the cited evidence supports them.

How to use the dataset

The dataset includes about 1.4K expert-written scientific claims, stored in claims_train.csv, claims_validation.csv, and claims_test.csv, together with a corpus of evidence-containing abstracts (corpus_train.csv). Claims are manually annotated with labels and rationales, which appear in the evidence_label and evidence_sentences columns of the claims tables. The evidence_doc_id column of a claim refers to a doc_id in the corpus.

Research Ideas

  • Assessing how trustworthy or accurate a scientific claim is by examining the evidence associated with it.
  • Developing algorithms that fact-check scientific claims against the associated evidence labels and rationales.
  • Training predictive models that automatically generate the evidence annotations (label and rationale sentences) for a claim, given the abstracts in the corpus.
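
As a starting point for the fact-checking idea above, the following is a minimal sketch of a lexical-overlap retriever that ranks abstracts against a claim. It is a deliberately simple baseline, not the method used by the dataset authors. The function names and the toy rows are hypothetical; only the doc_id and abstract field names come from the corpus_train table.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def overlap_score(claim, abstract):
    """Fraction of distinct claim tokens that also occur in the abstract."""
    claim_tokens = set(tokenize(claim))
    if not claim_tokens:
        return 0.0
    abstract_tokens = set(tokenize(abstract))
    return len(claim_tokens & abstract_tokens) / len(claim_tokens)


def rank_abstracts(claim, corpus, top_k=3):
    """Return up to top_k (doc_id, score) pairs, highest score first.

    corpus is an iterable of dicts with "doc_id" and "abstract" keys,
    for example rows read from corpus_train.csv with csv.DictReader.
    """
    scored = [(doc["doc_id"], overlap_score(claim, doc["abstract"])) for doc in corpus]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy rows invented for illustration; they are not taken from the dataset.
toy_corpus = [
    {"doc_id": 1, "abstract": "Aspirin reduces the risk of heart attack in adults."},
    {"doc_id": 2, "abstract": "Zebrafish fin regeneration depends on Wnt signalling."},
]
ranking = rank_abstracts("Aspirin lowers heart attack risk", toy_corpus, top_k=1)
print(ranking)
```

A retriever like this only proposes candidate abstracts; deciding whether an abstract supports or contradicts the claim would need a separate model trained on the evidence_label column.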

License

License: CC0 1.0 Universal (CC0 1.0) - Public Domain Dedication
No Copyright - You can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission. See Other Information.

Columns

File: corpus_train.csv

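Column name    Description
doc_id         Identifier of the document. (BIGINT in the corpus_train table schema)
title          The title of the document the abstract belongs to. (String)
abstract       The abstract text, which may contain evidence for a claim. (String)
structured     A flag stored as BOOLEAN in the corpus_train table schema; it appears to mark whether the abstract is a structured abstract. The labels and rationales themselves are in the evidence_label and evidence_sentences columns of the claims tables.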

File: claims_validation.csv

Column name    Description
claim          The claim or statement made by an expert. (String)

File: claims_test.csv

Column name    Description
claim          The claim or statement made by an expert. (String)

File: claims_train.csv

Column name    Description
claim          The claim or statement made by an expert. (String)

Acknowledgements

If you use this dataset in your research, please credit the original authors and Huggingface Hub.

Tables

Claims Test

@kaggle.thedevastator_unlock_insight_into_scientific_claims_with_scifa.claims_test
  • 22.72 KB
  • 300 rows
  • 6 columns

CREATE TABLE claims_test (
  "id" BIGINT,
  "claim" VARCHAR,
  "evidence_doc_id" VARCHAR,
  "evidence_label" VARCHAR,
  "evidence_sentences" VARCHAR,
  "cited_doc_ids" VARCHAR
);

Claims Train

@kaggle.thedevastator_unlock_insight_into_scientific_claims_with_scifa.claims_train
  • 61.22 KB
  • 1261 rows
  • 6 columns

CREATE TABLE claims_train (
  "id" BIGINT,
  "claim" VARCHAR,
  "evidence_doc_id" DOUBLE,
  "evidence_label" VARCHAR,
  "evidence_sentences" VARCHAR,
  "cited_doc_ids" VARCHAR
);

Claims Validation

@kaggle.thedevastator_unlock_insight_into_scientific_claims_with_scifa.claims_validation
  • 29.93 KB
  • 450 rows
  • 6 columns

CREATE TABLE claims_validation (
  "id" BIGINT,
  "claim" VARCHAR,
  "evidence_doc_id" DOUBLE,
  "evidence_label" VARCHAR,
  "evidence_sentences" VARCHAR,
  "cited_doc_ids" VARCHAR
);
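
The three claims schemas declare evidence_doc_id as VARCHAR in claims_test but as DOUBLE in claims_train and claims_validation, so the column needs normalizing before the splits are combined or matched against the BIGINT doc_id of the corpus. The helper below is a minimal sketch of one way to do that. The name normalize_doc_id is hypothetical, and it assumes that missing evidence shows up as an empty string, None, or NaN, which the source does not confirm.

```python
import math


def normalize_doc_id(value):
    """Convert an evidence_doc_id value to int, or None when it is missing.

    Handles floats (DOUBLE columns), numeric text such as "123.0"
    (VARCHAR columns or raw CSV cells), and empty or NaN values.
    """
    if value is None:
        return None
    if isinstance(value, float):
        return None if math.isnan(value) else int(value)
    if isinstance(value, int):
        return value
    text = str(value).strip()
    if not text or text.lower() == "nan":
        return None
    try:
        return int(float(text))
    except ValueError:
        # Assumption: anything non-numeric is treated as "no evidence document".
        return None


# Values invented for illustration.
examples = [14717500.0, "14717500.0", "", float("nan"), "not-an-id"]
normalized = [normalize_doc_id(v) for v in examples]
print(normalized)
```

Applying the same helper to every split keeps the identifier type consistent regardless of how each table declares the column.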

Corpus Train

@kaggle.thedevastator_unlock_insight_into_scientific_claims_with_scifa.corpus_train
  • 4.37 MB
  • 5183 rows
  • 4 columns

CREATE TABLE corpus_train (
  "doc_id" BIGINT,
  "title" VARCHAR,
  "abstract" VARCHAR,
  "structured" BOOLEAN
);
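
To show how the claims and corpus tables fit together, here is a small sketch that joins claims_train to corpus_train on the document identifier, using an in-memory SQLite database from Python. The column names follow the CREATE TABLE statements above, but the types are SQLite approximations of them, and every inserted row, including the label value and the format of evidence_sentences and cited_doc_ids, is invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# SQLite stand-ins for the claims_train and corpus_train schemas.
conn.executescript(
    """
    CREATE TABLE claims_train (
      id INTEGER, claim TEXT, evidence_doc_id REAL,
      evidence_label TEXT, evidence_sentences TEXT, cited_doc_ids TEXT
    );
    CREATE TABLE corpus_train (
      doc_id INTEGER, title TEXT, abstract TEXT, structured INTEGER
    );
    """
)

# Toy rows, not taken from the dataset.
conn.execute(
    "INSERT INTO corpus_train VALUES (?, ?, ?, ?)",
    (101, "Toy title", "Toy abstract text.", 0),
)
conn.executemany(
    "INSERT INTO claims_train VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, "Toy claim with evidence.", 101.0, "SUPPORT", "[0]", "[101]"),
        (2, "Toy claim without evidence.", None, None, "[]", "[101]"),
    ],
)

# LEFT JOIN keeps claims that have no evidence document;
# the CAST aligns the floating-point evidence_doc_id with the integer doc_id.
rows = conn.execute(
    """
    SELECT c.id, c.claim, c.evidence_label, d.title
    FROM claims_train AS c
    LEFT JOIN corpus_train AS d
      ON CAST(c.evidence_doc_id AS INTEGER) = d.doc_id
    ORDER BY c.id
    """
).fetchall()

for row in rows:
    print(row)
```

A LEFT JOIN is used so that claims without an evidence document still appear in the result, with an empty title.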
