Fake News Challenge
Detecting abnormal news articles
@kaggle.abhinavkrjha_fake_news_challenge
The issue of “fake news” has arisen recently as a potential threat to high-quality journalism
and well-informed public discourse. The Fake News Challenge was organized in early
2017 to encourage development of machine learning-based classification systems that
perform “stance detection” -- i.e. identifying whether a particular news headline “agrees”
with, “disagrees” with, “discusses,” or is unrelated to a particular news article -- in order to
allow journalists and others to more easily find and investigate possible instances of “fake
news.”
The data consists of (headline, body, stance) instances, where stance is one of {unrelated, discuss, agree, disagree}. The training data is provided as two CSVs:
- train_bodies.csv: This file contains the body text of articles (the articleBody column) with corresponding IDs (Body ID).
- train_stances.csv: This file contains the labeled stances (the Stance column) for pairs of article headlines (Headline) and article bodies (Body ID, referring to entries in train_bodies.csv).
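Since each labeled pair only stores a Body ID, the two files have to be joined before a (headline, body, stance) instance can be used. The following is a minimal sketch of that join using Python's standard `csv` module. The sample rows are invented for illustration, and `load_pairs` is a hypothetical helper name, not part of the dataset or the challenge tooling.

```python
import csv
import io

# Tiny inline stand-ins for the two files; these rows are invented for illustration.
BODIES_CSV = '''Body ID,articleBody
0,"A small meteorite landed near the city, officials said."
1,"The company denied reports that it plans to close stores."
'''

STANCES_CSV = '''Headline,Body ID,Stance
Meteorite lands near city,0,agree
Company to close all stores,1,disagree
Local team wins championship,0,unrelated
'''


def load_pairs(bodies_file, stances_file):
    """Return (headline, body, stance) tuples by resolving each Body ID."""
    # Map Body ID -> article text so each stance row can look up its body.
    bodies = {row["Body ID"]: row["articleBody"] for row in csv.DictReader(bodies_file)}
    return [
        (row["Headline"], bodies[row["Body ID"]], row["Stance"])
        for row in csv.DictReader(stances_file)
    ]


pairs = load_pairs(io.StringIO(BODIES_CSV), io.StringIO(STANCES_CSV))
for headline, body, stance in pairs:
    print(f"{stance:>9} | {headline} | {body[:40]}")
```

To run this against the real files, the `io.StringIO` objects could be replaced with `open("train_bodies.csv", newline="", encoding="utf-8")` and the equivalent call for train_stances.csv, assuming the files are in the working directory.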
The distribution of Stance classes in train_stances.csv is as follows:
| rows | unrelated | discuss | agree | disagree |
|---|---|---|---|---|
| 49972 | 0.73131 | 0.17828 | 0.0736012 | 0.0168094 |
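Reading the table entries as fractions of the 49972 rows, the per-class counts can be recovered by multiplying and rounding. The short sketch below does that arithmetic and also reports the ratio between the largest and smallest class, which gives a sense of how imbalanced the labels are. The variable names are illustrative only.

```python
# Values copied from the distribution table above.
ROWS = 49972
FRACTIONS = {
    "unrelated": 0.73131,
    "discuss": 0.17828,
    "agree": 0.0736012,
    "disagree": 0.0168094,
}

# Convert each fraction to a whole number of labeled pairs.
counts = {stance: round(ROWS * frac) for stance, frac in FRACTIONS.items()}

# Ratio of the most common class to the rarest one.
imbalance = counts["unrelated"] / counts["disagree"]

for stance, n in counts.items():
    print(f"{stance:>9}: {n}")
print(f"total: {sum(counts.values())}, unrelated/disagree ratio: {imbalance:.1f}")
```

With the table's values this gives 36545 unrelated, 8909 discuss, 3678 agree, and 840 disagree pairs, which sum back to 49972. The unrelated class is therefore more than 40 times larger than the disagree class.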
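There are 4 possible classifications:
- agree: the headline agrees with the article body
- disagree: the headline disagrees with the article body
- discuss: the headline discusses the article body
- unrelated: the headline is unrelated to the article body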
For details of the task, see FakeNewsChallenge.org
CREATE TABLE competition_test_bodies (
"body_id" BIGINT,
"articlebody" VARCHAR
);
CREATE TABLE competition_test_stances_unlabeled (
"headline" VARCHAR,
"body_id" BIGINT
);
CREATE TABLE test_bodies (
"body_id" BIGINT,
"articlebody" VARCHAR
);
CREATE TABLE test_stances_unlabeled (
"headline" VARCHAR,
"body_id" BIGINT
);
CREATE TABLE train_bodies (
"body_id" BIGINT,
"articlebody" VARCHAR
);
CREATE TABLE train_stances (
"headline" VARCHAR,
"body_id" BIGINT,
"stance" VARCHAR
);
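As a sketch of how the training tables above might be queried, the example below recreates `train_bodies` and `train_stances` in an in-memory SQLite database, joins them on `body_id`, and counts rows per stance. SQLite is used here only as a convenient stand-in, since the source does not say which SQL engine hosts these tables. The inserted rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Same column names and types as the train_* tables in the schema listing.
conn.executescript("""
CREATE TABLE train_bodies (
    "body_id" BIGINT,
    "articlebody" VARCHAR
);
CREATE TABLE train_stances (
    "headline" VARCHAR,
    "body_id" BIGINT,
    "stance" VARCHAR
);
""")

# Invented sample rows, not taken from the dataset.
conn.executemany(
    "INSERT INTO train_bodies VALUES (?, ?)",
    [(0, "A small meteorite landed near the city."),
     (1, "The company denied plans to close stores.")],
)
conn.executemany(
    "INSERT INTO train_stances VALUES (?, ?, ?)",
    [("Meteorite lands near city", 0, "agree"),
     ("Company to close all stores", 1, "disagree"),
     ("Local team wins championship", 0, "unrelated")],
)

# Resolve each labeled headline to its article text.
rows = conn.execute("""
    SELECT s.stance, s.headline, b.articlebody
    FROM train_stances AS s
    JOIN train_bodies AS b ON b.body_id = s.body_id
    ORDER BY s.stance, s.headline
""").fetchall()

# Count labeled pairs per stance class.
stance_counts = dict(
    conn.execute("SELECT stance, COUNT(*) FROM train_stances GROUP BY stance")
)

for stance, headline, body in rows:
    print(stance, "|", headline, "|", body)
print(stance_counts)
```

The same join should apply to the unlabeled test tables, which share the `body_id` key but have no `stance` column.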