Emails For Spam Or Ham Classification (Trec 2007)

2007 TREC Public Spam Corpus

@kaggle.bayes2003_emails_for_spam_or_ham_classification_trec_2007

About this Dataset

This dataset contains emails labeled as spam or ham (non-spam) for classification. It is derived from the 2007 TREC Public Spam Corpus and consists of three files:

  1. email_origin.csv: Original raw emails with labels.
    Columns:
  • label: Int type, 1 for spam and 0 for ham
  • origin: String type, original raw email
  2. email_text.csv: Processed email bodies with labels.
    Columns:
  • label: Int type, 1 for spam and 0 for ham
  • text: String type, processed email body
  3. trec07p.tgz: Original compressed archive downloaded from the source.
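
As a rough illustration of the column layout described above, the following sketch reads (label, text) pairs with Python's standard csv module and counts the labels. It assumes the CSV files have a header row naming the columns as listed (label, text or origin). The function names and the sample rows are made up for this example and are not taken from the corpus.

```python
import csv
import io
from collections import Counter


def read_labeled_rows(fileobj, text_column="text"):
    """Yield (label, text) pairs from a CSV with a `label` column and a text column.

    Assumes a header row; pass text_column="origin" for email_origin.csv.
    """
    reader = csv.DictReader(fileobj)
    for row in reader:
        yield int(row["label"]), row[text_column]


def label_counts(rows):
    """Count how many rows carry each label (1 = spam, 0 = ham)."""
    return Counter(label for label, _ in rows)


# Tiny in-memory stand-in for email_text.csv (made-up rows, not from the corpus).
sample = io.StringIO(
    "label,text\n"
    "1,cheap pills buy now\n"
    "0,meeting moved to friday\n"
    "1,\"you won, claim your prize\"\n"
)
rows = list(read_labeled_rows(sample))
counts = label_counts(rows)
print(counts[1], "spam,", counts[0], "ham")
```

For the real files, the same function could be called on open("email_text.csv", newline="", encoding="utf-8"); the encoding is an assumption. Raw emails in email_origin.csv can be long, so raising the limit with csv.field_size_limit may be necessary.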

How I processed the emails (from email_origin.csv to email_text.csv):

Email Processing
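
The exact processing steps are not reproduced on this page. As a hedged sketch of what turning a raw email into a processed body might look like, the example below uses Python's standard email package to pick the plain-text (or, failing that, HTML) body, strips HTML tags with a simple regular expression, collapses whitespace, and lowercases the result. The function name extract_body and every cleaning step are assumptions made for illustration, not the dataset author's actual method.

```python
import re
from email import message_from_string
from email.policy import default


def extract_body(raw_email: str) -> str:
    """Return a normalized text body from a raw RFC 5322 style email string.

    Hypothetical helper: the cleaning choices here are illustrative only.
    """
    msg = message_from_string(raw_email, policy=default)
    # Prefer a text/plain part, fall back to text/html.
    part = msg.get_body(preferencelist=("plain", "html"))
    if part is None:
        return ""
    text = part.get_content()
    if part.get_content_type() == "text/html":
        # Crude tag removal; a real pipeline might use an HTML parser instead.
        text = re.sub(r"<[^>]+>", " ", text)
    # Collapse runs of whitespace and lowercase (both are assumed steps).
    return re.sub(r"\s+", " ", text).strip().lower()


raw_plain = "From: a@example.com\nSubject: Hi\n\nHello   World\nSecond line\n"
raw_html = "Content-Type: text/html\n\n<p>Buy <b>NOW</b></p>\n"
plain_body = extract_body(raw_plain)
html_body = extract_body(raw_html)
print(plain_body)
print(html_body)
```

Real spam often carries malformed headers or unknown character sets, so a production version would likely need error handling around get_content.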

More datasets for spam or ham classification:

Emails for spam or ham classification (Trec 2006)

Emails for spam or ham classification (Trec 2005)

Emails for spam or ham classification (Enron 2006)

Emails for spam or ham classification (SpamAssassin)

Source:
https://plg.uwaterloo.ca/~gvcormac/treccorpus07/about.html
