Baselight

Fake News Challenge

Detecting abnormal news articles

@kaggle.abhinavkrjha_fake_news_challenge

About this Dataset

Fake News Challenge

Context

The issue of “fake news” has arisen recently as a potential threat to high-quality journalism
and well-informed public discourse. The Fake News Challenge was organized in early
2017 to encourage development of machine learning-based classification systems that
perform “stance detection” -- i.e. identifying whether a particular news headline “agrees”
with, “disagrees” with, “discusses,” or is unrelated to a particular news article -- in order to
allow journalists and others to more easily find and investigate possible instances of “fake
news.”

Content

The data consists of (headline, body, stance) instances, where stance is one of {unrelated, discuss, agree, disagree}. The dataset is provided as two CSV files:

train_bodies.csv

This file contains the body text of articles (the articleBody column) with corresponding IDs (the Body ID column).

train_stances.csv

This file contains the labeled stances (the Stance column) for pairs of article headlines (Headline) and article bodies (Body ID, referring to entries in train_bodies.csv).
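The link between the two files can be sketched as follows. This is a minimal sketch using only the Python standard library. The sample rows are invented for illustration, and it assumes the CSV headers match the column names given above (Body ID, articleBody, Headline, Stance); the helper names load_bodies and load_instances are hypothetical.

```python
import csv
import io

# Tiny invented samples standing in for the real files (assumption: the
# real CSV headers are "Body ID", "articleBody", "Headline", "Stance").
BODIES_CSV = '''Body ID,articleBody
0,"A small meteorite reportedly landed near the city."
1,"Officials denied that the bridge had been closed."
'''

STANCES_CSV = '''Headline,Body ID,Stance
"Meteorite lands near city",0,agree
"Bridge closed after storm",1,disagree
"Meteorite lands near city",1,unrelated
'''


def load_bodies(handle):
    """Map each Body ID to its article text."""
    return {int(row["Body ID"]): row["articleBody"] for row in csv.DictReader(handle)}


def load_instances(stances_handle, bodies):
    """Build (headline, body, stance) tuples by looking up each Body ID."""
    return [
        (row["Headline"], bodies[int(row["Body ID"])], row["Stance"])
        for row in csv.DictReader(stances_handle)
    ]


bodies = load_bodies(io.StringIO(BODIES_CSV))
instances = load_instances(io.StringIO(STANCES_CSV), bodies)
for headline, body, stance in instances:
    print(f"{stance:>9} | {headline} | {body[:40]}")
```

To run this against the real files, the io.StringIO objects could be swapped for open("train_bodies.csv", newline="", encoding="utf-8") and the equivalent call for train_stances.csv, assuming the files are in the working directory.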

Distribution of the data

The distribution of Stance classes in train_stances.csv is as follows:

rows    unrelated   discuss   agree       disagree
49972   0.73131     0.17828   0.0736012   0.0168094
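The fractions above can be converted back into approximate row counts, which makes the class imbalance easier to see. The sketch below only multiplies the reported fractions by the reported row total and rounds; the counts are derived values, not figures quoted from the source.

```python
# Figures copied from the distribution table above.
TOTAL_ROWS = 49972
FRACTIONS = {
    "unrelated": 0.73131,
    "discuss": 0.17828,
    "agree": 0.0736012,
    "disagree": 0.0168094,
}

# Derived (not quoted) counts: fraction * total, rounded to the nearest row.
counts = {label: round(TOTAL_ROWS * share) for label, share in FRACTIONS.items()}

for label, count in counts.items():
    print(f"{label:<10} {count:>6}")
print(f"{'total':<10} {sum(counts.values()):>6}")
```

Under this rounding the counts come out to roughly 36,545 unrelated, 8,909 discuss, 3,678 agree, and 840 disagree, which sum back to 49,972. The disagree class has fewer than a thousand examples, so a classifier evaluated on plain accuracy could score well while largely ignoring it.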

There are 4 possible classifications:

  1. The article text agrees with the headline.
  2. The article text disagrees with the headline.
  3. The article text is a discussion of the headline, without taking a position on it.
  4. The article text is unrelated to the headline (i.e. it doesn’t address the same topic).

Acknowledgements

For details of the task, see FakeNewsChallenge.org.

Tables

Competition Test Bodies

@kaggle.abhinavkrjha_fake_news_challenge.competition_test_bodies
  • 1.18 MB
  • 904 rows
  • 2 columns

CREATE TABLE competition_test_bodies (
  "body_id" BIGINT,
  "articlebody" VARCHAR
);

Competition Test Stances Unlabeled

@kaggle.abhinavkrjha_fake_news_challenge.competition_test_stances_unlabeled
  • 109.29 KB
  • 25413 rows
  • 2 columns

CREATE TABLE competition_test_stances_unlabeled (
  "headline" VARCHAR,
  "body_id" BIGINT
);

Test Bodies

@kaggle.abhinavkrjha_fake_news_challenge.test_bodies
  • 1.18 MB
  • 904 rows
  • 2 columns

CREATE TABLE test_bodies (
  "body_id" BIGINT,
  "articlebody" VARCHAR
);

Test Stances Unlabeled

@kaggle.abhinavkrjha_fake_news_challenge.test_stances_unlabeled
  • 109.29 KB
  • 25413 rows
  • 2 columns

CREATE TABLE test_stances_unlabeled (
  "headline" VARCHAR,
  "body_id" BIGINT
);

Train Bodies

@kaggle.abhinavkrjha_fake_news_challenge.train_bodies
  • 2.21 MB
  • 1683 rows
  • 2 columns

CREATE TABLE train_bodies (
  "body_id" BIGINT,
  "articlebody" VARCHAR
);

Train Stances

@kaggle.abhinavkrjha_fake_news_challenge.train_stances
  • 235.46 KB
  • 49972 rows
  • 3 columns

CREATE TABLE train_stances (
  "headline" VARCHAR,
  "body_id" BIGINT,
  "stance" VARCHAR
);
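The table schemas above use lowercase column names (body_id, articlebody, headline, stance) rather than the original CSV headers. As a rough illustration of how train_stances and train_bodies fit together in SQL, the sketch below recreates the two schemas in an in-memory SQLite database. SQLite is an assumption chosen as a convenient local stand-in, not necessarily the engine behind these tables, and the inserted rows are invented.

```python
import sqlite3

# Assumption: SQLite serves only as a local stand-in for the hosted tables.
# The two CREATE TABLE statements are copied from the listings above.
conn = sqlite3.connect(":memory:")
conn.executescript('''
CREATE TABLE train_bodies (
  "body_id" BIGINT,
  "articlebody" VARCHAR
);
CREATE TABLE train_stances (
  "headline" VARCHAR,
  "body_id" BIGINT,
  "stance" VARCHAR
);
''')

# Invented sample rows, for illustration only.
conn.executemany(
    "INSERT INTO train_bodies VALUES (?, ?)",
    [(0, "A small meteorite reportedly landed near the city."),
     (1, "Officials denied that the bridge had been closed.")],
)
conn.executemany(
    "INSERT INTO train_stances VALUES (?, ?, ?)",
    [("Meteorite lands near city", 0, "agree"),
     ("Meteorite lands near city", 1, "unrelated"),
     ("Bridge closed after storm", 0, "unrelated")],
)

# Pair each headline with its article body through body_id.
joined = conn.execute('''
    SELECT s.headline, b.articlebody, s.stance
    FROM train_stances AS s
    JOIN train_bodies AS b ON b.body_id = s.body_id
    ORDER BY s.rowid
''').fetchall()

# Count rows per stance label.
stance_counts = dict(conn.execute(
    "SELECT stance, COUNT(*) FROM train_stances GROUP BY stance"
).fetchall())

print(joined[0])
print(stance_counts)
```

The same join shape should apply to the unlabeled test tables, which share the headline and body_id columns but have no stance column.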
