Context
The issue of “fake news” has arisen recently as a potential threat to high-quality journalism
and well-informed public discourse. The Fake News Challenge was organized in early
2017 to encourage development of machine learning-based classification systems that
perform “stance detection” -- i.e. identifying whether a particular news headline “agrees”
with, “disagrees” with, “discusses,” or is unrelated to a particular news article -- in order to
allow journalists and others to more easily find and investigate possible instances of “fake
news.”
Content
The data is provided as (headline, body, stance) instances, where stance is one of {unrelated, discuss, agree, disagree}. The dataset comprises two CSVs:

train_bodies.csv: contains the body text of articles (the articleBody column) with corresponding IDs (the Body ID column).

train_stances.csv: contains the labeled stances (the Stance column) for pairs of article headlines (Headline) and article bodies (Body ID, referring to entries in train_bodies.csv).
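To work with the data, each stance row must be joined to the body text it references via Body ID. A minimal sketch using only the standard library; the column names match the files described above, but the sample rows here are invented stand-ins for the real CSVs:

```python
import csv
import io

# Invented sample rows standing in for train_bodies.csv and
# train_stances.csv (the column names match the dataset).
bodies_csv = """Body ID,articleBody
0,"Police say the report was unfounded."
1,"Scientists discussed the claim at a press conference."
"""
stances_csv = """Headline,Body ID,Stance
Report was false,0,agree
Scientists weigh in on claim,1,discuss
"""

# Index bodies by ID so each stance row can look up its article text.
bodies = {row["Body ID"]: row["articleBody"]
          for row in csv.DictReader(io.StringIO(bodies_csv))}

# Join: each (headline, body, stance) instance pairs a stance row
# with the body text referenced by its Body ID.
instances = [(row["Headline"], bodies[row["Body ID"]], row["Stance"])
             for row in csv.DictReader(io.StringIO(stances_csv))]

for headline, body, stance in instances:
    print(stance, "|", headline)
```

With the real files, the two string buffers would simply be replaced by `open("train_bodies.csv")` and `open("train_stances.csv")`.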
Distribution of the data
The distribution of Stance classes in train_stances.csv is as follows:

| rows  | unrelated | discuss | agree     | disagree  |
|-------|-----------|---------|-----------|-----------|
| 49972 | 0.73131   | 0.17828 | 0.0736012 | 0.0168094 |
There are four possible classifications:
- The article text agrees with the headline.
- The article text disagrees with the headline.
- The article text is a discussion of the headline, without taking a position on it.
- The article text is unrelated to the headline (i.e. it doesn’t address the same topic).
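The class proportions in the table above can be recomputed directly from the Stance column. A sketch with a made-up label sample (the real file has 49,972 rows):

```python
from collections import Counter

# Invented sample of stance labels, standing in for the Stance
# column of train_stances.csv.
stances = ["unrelated"] * 7 + ["discuss"] * 2 + ["agree"] * 1

counts = Counter(stances)
total = sum(counts.values())

# Proportion of each class; Counter returns 0 for absent labels,
# so "disagree" is handled even though it never occurs here.
proportions = {label: counts[label] / total
               for label in ("unrelated", "discuss", "agree", "disagree")}
print(proportions)
```

Run against the real train_stances.csv, this reproduces the skewed distribution shown above, which is worth accounting for when training a classifier.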
Acknowledgements
For details of the task, see FakeNewsChallenge.org