Spam-Ham Text Dataset (TREC 2007)
@kaggle.abhaykr0111_spam_ham_text_datasettrec_2007
This dataset contains a total of 16,869 email messages, categorized into two classes: spam and ham (non-spam). Among these, 9,548 emails are labeled as spam and 7,321 emails are labeled as ham.
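The stated counts are consistent (9,548 + 7,321 = 16,869). They also show that spam is the majority class, at roughly 56.6% of messages. The short sketch below works through that arithmetic. The function name `class_balance` is purely illustrative. One practical consequence is that a trivial model that always predicts "spam" would already reach about 56.6% accuracy, so that figure is a reasonable floor for any trained classifier.

```python
# Class counts as stated in the dataset description.
SPAM_COUNT = 9_548
HAM_COUNT = 7_321

def class_balance(spam: int, ham: int) -> dict:
    """Return the total, per-class proportions, and majority-class baseline accuracy."""
    total = spam + ham
    return {
        "total": total,
        "spam_fraction": spam / total,
        "ham_fraction": ham / total,
        # Accuracy of a classifier that always predicts the larger class.
        "majority_baseline": max(spam, ham) / total,
    }

stats = class_balance(SPAM_COUNT, HAM_COUNT)
print(stats["total"])                   # 16869
print(f"{stats['spam_fraction']:.3f}")  # 0.566
print(f"{stats['ham_fraction']:.3f}")   # 0.434
```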
The dataset is a curated subset of the 2007 TREC Public Spam Corpus, a well-known benchmark collection widely used for research in spam detection, text classification, and natural language processing (NLP). Each entry in the dataset consists of the email text along with its corresponding label, making it suitable for building and evaluating machine learning models for binary email classification tasks.
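Because each entry pairs an email's text with a spam/ham label, a classic baseline for this kind of data is multinomial naive Bayes. The minimal pure-Python sketch below illustrates that technique. This page does not specify the file format or column names, so the example does not read the dataset itself. It trains on four made-up, hypothetical messages instead. The `NaiveBayes` class, the `tokenize` helper, and the toy data are illustrative assumptions and are not part of the dataset. In practice, you would replace the toy lists with the email texts and labels loaded from the actual files.

```python
import math
import re
from collections import Counter

def tokenize(text: str) -> list[str]:
    # Lowercase and split on non-alphanumeric characters (a deliberately simple tokenizer).
    return re.findall(r"[a-z0-9]+", text.lower())

class NaiveBayes:
    """Hypothetical multinomial naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.classes = sorted(set(labels))
        self.doc_counts = Counter(labels)
        self.word_counts = {c: Counter() for c in self.classes}
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = set()
        for counts in self.word_counts.values():
            self.vocab.update(counts)
        self.total_words = {c: sum(self.word_counts[c].values()) for c in self.classes}
        return self

    def predict(self, text):
        n_docs = sum(self.doc_counts.values())
        v = len(self.vocab)
        best_label, best_score = None, -math.inf
        for c in self.classes:
            # Log prior plus the sum of smoothed log likelihoods for each known token.
            score = math.log(self.doc_counts[c] / n_docs)
            for tok in tokenize(text):
                if tok not in self.vocab:
                    continue  # skip words never seen during training
                score += math.log((self.word_counts[c][tok] + 1) / (self.total_words[c] + v))
            if score > best_score:
                best_label, best_score = c, score
        return best_label

# Hypothetical toy examples standing in for the real email texts and labels.
train_texts = [
    "win a free prize now",
    "claim your free money today",
    "meeting agenda for monday",
    "please review the attached report",
]
train_labels = ["spam", "spam", "ham", "ham"]

model = NaiveBayes().fit(train_texts, train_labels)
print(model.predict("free prize money"))      # spam
print(model.predict("monday report review"))  # ham
```

Working in log space avoids floating-point underflow when many small word probabilities are multiplied. The add-one smoothing keeps a word that was seen in only one class from driving the other class's probability to zero.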
This dataset can be used for: