Baselight

Phishing Email Dataset

Phish No More: The Enron, Ling, CEAS, Nazario, Nigerian & SpamAssassin Datasets

@kaggle.naserabdullahalam_phishing_email_dataset


About this Dataset

This dataset was compiled by researchers to study phishing email tactics. It combines emails from six sources (Enron, Ling, CEAS, Nazario, Nigerian Fraud, and SpamAssassin) into a single resource for analysis.

Initial Datasets:

  • Enron and Ling Datasets: These datasets focus on the core content of the emails. Each record contains the subject line, the body text, and a label indicating whether the email is spam (phishing) or legitimate.

  • CEAS, Nazario, Nigerian Fraud, and SpamAssassin Datasets: These datasets add context to the subject and body text. Each record also includes the sender, the recipient, the date, a urls field, and a label for spam/legitimate classification.

Final Dataset:

The final dataset (the phishing_email table) merges the initial datasets into a single resource with two columns: the combined email text and a label. It contains:

  • 82,486 emails in total
  • 42,891 spam emails
  • 39,595 legitimate emails

This dataset allows researchers to study the content of phishing emails and the context in which they are sent to improve detection methods.
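
As a quick consistency check on the figures above, the short Python sketch below adds the two class counts and compares the result with the row counts listed for the six source tables on this page. The variable names are illustrative and are not part of the dataset.

```python
# Class counts as stated in the dataset description.
spam = 42_891
legitimate = 39_595

# Row counts as listed for the six source tables on this page.
source_rows = {
    "ceas_08": 39_154,
    "enron": 29_767,
    "ling": 2_859,
    "nazario": 1_565,
    "nigerian_fraud": 3_332,
    "spamassasin": 5_809,
}

total = spam + legitimate
spam_share = spam / total

print(f"total emails: {total}")  # 82486
print(f"rows across source tables: {sum(source_rows.values())}")
print(f"spam share: {spam_share:.1%}")
```

Both totals come to 82,486, which matches the row count of the phishing_email table, so the two classes are close to balanced.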

Please cite the following if you use this dataset:

  • Al-Subaiey, A., Al-Thani, M., Alam, N. A., Antora, K. F., Khandakar, A., & Zaman, S. A. U. (2024, May 19). Novel Interpretable and Robust Web-based AI Platform for Phishing Email Detection. ArXiv.org. https://arxiv.org/abs/2405.11619

Tables

Ceas 08

@kaggle.naserabdullahalam_phishing_email_dataset.ceas_08
  • 30.01 MB
  • 39,154 rows
  • 7 columns
CREATE TABLE ceas_08 (
  "sender" VARCHAR,
  "receiver" VARCHAR,
  "date" VARCHAR,
  "subject" VARCHAR,
  "body" VARCHAR,
  "label" BIGINT,
  "urls" BIGINT
);

Enron

@kaggle.naserabdullahalam_phishing_email_dataset.enron
  • 23.99 MB
  • 29,767 rows
  • 3 columns
CREATE TABLE enron (
  "subject" VARCHAR,
  "body" VARCHAR,
  "label" BIGINT
);

Ling

@kaggle.naserabdullahalam_phishing_email_dataset.ling
  • 5.18 MB
  • 2,859 rows
  • 3 columns
CREATE TABLE ling (
  "subject" VARCHAR,
  "body" VARCHAR,
  "label" BIGINT
);

Nazario

@kaggle.naserabdullahalam_phishing_email_dataset.nazario
  • 4.96 MB
  • 1,565 rows
  • 7 columns
CREATE TABLE nazario (
  "sender" VARCHAR,
  "receiver" VARCHAR,
  "date" VARCHAR,
  "subject" VARCHAR,
  "body" VARCHAR,
  "urls" BIGINT,
  "label" BIGINT
);

Nigerian Fraud

@kaggle.naserabdullahalam_phishing_email_dataset.nigerian_fraud
  • 5.08 MB
  • 3,332 rows
  • 7 columns
CREATE TABLE nigerian_fraud (
  "sender" VARCHAR,
  "receiver" VARCHAR,
  "date" VARCHAR,
  "subject" VARCHAR,
  "body" VARCHAR,
  "urls" BIGINT,
  "label" BIGINT
);

Phishing Email

@kaggle.naserabdullahalam_phishing_email_dataset.phishing_email
  • 52.53 MB
  • 82,486 rows
  • 2 columns
CREATE TABLE phishing_email (
  "text_combined" VARCHAR,
  "label" BIGINT
);
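
One way to explore this schema locally is sketched below, using Python's built-in sqlite3 module and an in-memory database. The three sample rows are invented for illustration, and the encoding of label (1 for spam/phishing, 0 for legitimate) is an assumption that this page does not state; check it against the real data before relying on it.

```python
import sqlite3

# In-memory database that reuses the column names and types from the schema above.
conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE phishing_email ("text_combined" VARCHAR, "label" BIGINT)')

# Hypothetical rows; ASSUMPTION: label 1 = spam/phishing, 0 = legitimate.
sample_rows = [
    ("urgent verify your account now", 1),
    ("meeting notes attached for tuesday", 0),
    ("you have won a prize claim here", 1),
]
conn.executemany("INSERT INTO phishing_email VALUES (?, ?)", sample_rows)

# Count emails per label value.
label_counts = dict(
    conn.execute(
        'SELECT "label", COUNT(*) FROM phishing_email GROUP BY "label"'
    ).fetchall()
)
print(label_counts)
```

Run against the full table, the same GROUP BY query should reproduce the spam and legitimate counts given in the dataset description.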

Spamassasin

@kaggle.naserabdullahalam_phishing_email_dataset.spamassasin
  • 7.9 MB
  • 5,809 rows
  • 7 columns
CREATE TABLE spamassasin (
  "sender" VARCHAR,
  "receiver" VARCHAR,
  "date" VARCHAR,
  "subject" VARCHAR,
  "body" VARCHAR,
  "label" BIGINT,
  "urls" BIGINT
);
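
The seven-column tables do not share one column order: ceas_08 and spamassasin end with label, urls, while nazario and nigerian_fraud end with urls, label. The sketch below, again using Python's sqlite3 module, shows one way to stack source tables on the columns they have in common by selecting columns by name instead of by position. It illustrates schema alignment only and is not the authors' procedure for building phishing_email; all sample values are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ceas_08 ("sender" VARCHAR, "receiver" VARCHAR, "date" VARCHAR,
                      "subject" VARCHAR, "body" VARCHAR, "label" BIGINT, "urls" BIGINT);
CREATE TABLE nazario ("sender" VARCHAR, "receiver" VARCHAR, "date" VARCHAR,
                      "subject" VARCHAR, "body" VARCHAR, "urls" BIGINT, "label" BIGINT);
CREATE TABLE enron ("subject" VARCHAR, "body" VARCHAR, "label" BIGINT);
""")

# Hypothetical rows. Note that the last two values swap meaning between tables:
# ceas_08 stores (label, urls), nazario stores (urls, label).
conn.execute(
    "INSERT INTO ceas_08 VALUES (?, ?, ?, ?, ?, ?, ?)",
    ("a@example.com", "b@example.com", "2008-08-05", "hello", "body one", 1, 1),
)
conn.execute(
    "INSERT INTO nazario VALUES (?, ?, ?, ?, ?, ?, ?)",
    ("c@example.com", "d@example.com", "2006-01-01", "account", "body two", 0, 1),
)
conn.execute("INSERT INTO enron VALUES (?, ?, ?)", ("report", "body three", 0))

# Select the shared columns by name so the differing column order is harmless.
combined = conn.execute("""
SELECT 'ceas_08' AS source, "subject", "body", "label" FROM ceas_08
UNION ALL
SELECT 'nazario', "subject", "body", "label" FROM nazario
UNION ALL
SELECT 'enron', "subject", "body", "label" FROM enron
""").fetchall()

for row in combined:
    print(row)
```

A positional `SELECT *` across these tables would silently mix up urls and label for nazario and nigerian_fraud, which is why the columns are named explicitly here.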
