Dataset: Cyber/Non-Cyber Fraud News Classification

About this Dataset

Context

We created this dataset to help classify news articles about financial fraud into cyber and non-cyber categories. We hope it is useful to others working on similar problems.

Content

This dataset has two CSV files. The cyber.csv file contains snippets of news articles that talk about cyber-fraud in the finance domain. The noncyber.csv file contains snippets of news articles that talk about complementary subjects. Each file has 250 distinct entries compiled from the New York Times and the Times of India. The two files are meant for classifying articles related to financial fraud, i.e. for the case where only finance-related news is fed into the binary classifier. Anyone who wants the non-cyber-fraud data to cover a broader, more generic domain can borrow documents from the nonfraud.csv file in this dataset.
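As a rough illustration of how the two files could be combined into one labeled set for a binary classifier, here is a minimal Python sketch. It assumes each CSV starts with a header row naming the url, title and summary columns; the function names and the 1/0 label encoding are hypothetical choices for this example, not part of the dataset.

```python
import csv
import random


def read_snippets(handle, label):
    """Read one CSV (assumed header: url,title,summary) and attach a label."""
    rows = []
    for row in csv.DictReader(handle):
        # Join title and summary into one text field; tolerate missing values.
        title = row.get("title") or ""
        summary = row.get("summary") or ""
        rows.append({"text": f"{title} {summary}".strip(), "label": label})
    return rows


def build_dataset(cyber_handle, noncyber_handle, seed=0):
    """Merge both files into one shuffled list (1 = cyber, 0 = non-cyber)."""
    rows = read_snippets(cyber_handle, 1) + read_snippets(noncyber_handle, 0)
    # A fixed seed keeps the shuffle reproducible between runs.
    random.Random(seed).shuffle(rows)
    return rows
```

With the files downloaded locally, the handles could be opened with open("cyber.csv", newline="", encoding="utf-8") and open("noncyber.csv", newline="", encoding="utf-8"), then passed to build_dataset. The resulting list can be split into training and test portions before fitting a classifier of your choice.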

Author

Created by: Sayan Biswas (sayanb@sahaj.ai).

Tables

Cyber

@kaggle.bitswazsky_cybernoncyber_news_classification.cyber

53.33 KB
250 rows
3 columns


CREATE TABLE cyber (
  "url" VARCHAR,
  "title" VARCHAR,
  "summary" VARCHAR
);

Noncyber

@kaggle.bitswazsky_cybernoncyber_news_classification.noncyber

54.94 KB
250 rows
3 columns


CREATE TABLE noncyber (
  "url" VARCHAR,
  "title" VARCHAR,
  "summary" VARCHAR
);
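Since both tables share the same three columns, they can be stacked into one labeled result with a UNION ALL. The sketch below uses Python's built-in sqlite3 module with an in-memory database as a stand-in for wherever the tables are actually hosted; the is_cyber column name and the helper functions are assumptions made for this example.

```python
import sqlite3

# Same column layout as the two CREATE TABLE statements above.
SCHEMA = """
CREATE TABLE cyber (url VARCHAR, title VARCHAR, summary VARCHAR);
CREATE TABLE noncyber (url VARCHAR, title VARCHAR, summary VARCHAR);
"""

# Stack both tables and add a hypothetical is_cyber label column.
LABELED_QUERY = """
SELECT url, title, summary, 1 AS is_cyber FROM cyber
UNION ALL
SELECT url, title, summary, 0 AS is_cyber FROM noncyber
"""


def make_connection():
    """Create an empty in-memory database with both tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def labeled_rows(conn):
    """Return (url, title, summary, is_cyber) tuples from both tables."""
    return conn.execute(LABELED_QUERY).fetchall()
```

After loading the CSV contents into the two tables, labeled_rows(conn) should return one tuple per article, 500 in total if both files hold 250 rows as listed.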