SMS Spam Collection (Text Classification)
Labeled SMS messages collected for mobile phone spam research
Source
Hugging Face Hub: link
About this dataset
The SMS Spam Collection v.1 is a set of SMS messages that have been collected and labeled as either spam or not spam (ham). The dataset contains 5574 real, non-encoded English messages and is intended for mobile phone spam research.
How to use the dataset
Research Ideas
- This dataset could be used to train a machine learning model to classify SMS messages as spam or not spam.
- This dataset could be used to develop a tool that can automatically identify and block spam messages.
- This dataset could be used to study the characteristics of spam messages and develop strategies for identifying and avoiding them.
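As a starting point for the first idea, here is a minimal sketch of a spam/ham classifier. It uses a hand-written multinomial Naive Bayes with Laplace smoothing, chosen only because it needs nothing beyond the Python standard library; it is not a method prescribed by the dataset authors. The function names (`tokenize`, `train_naive_bayes`, `predict`) and the toy messages are hypothetical and written for this example. They are not rows from the dataset.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lowercase the message and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def train_naive_bayes(messages, labels):
    """Count messages and tokens per class; return the statistics for prediction."""
    class_counts = Counter(labels)
    token_counts = {label: Counter() for label in class_counts}
    for text, label in zip(messages, labels):
        token_counts[label].update(tokenize(text))
    vocabulary = set()
    for counts in token_counts.values():
        vocabulary.update(counts)
    return class_counts, token_counts, vocabulary

def predict(model, text):
    """Return the class with the highest log-posterior, using Laplace smoothing."""
    class_counts, token_counts, vocabulary = model
    total_messages = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_messages in class_counts.items():
        score = math.log(n_messages / total_messages)  # log prior
        total_tokens = sum(token_counts[label].values())
        for token in tokenize(text):
            if token not in vocabulary:
                continue  # ignore tokens never seen during training
            count = token_counts[label][token]
            score += math.log((count + 1) / (total_tokens + len(vocabulary)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Toy examples invented for this sketch; they are NOT taken from the dataset.
toy_messages = [
    "Win a free prize now, call this number",
    "Free entry to win cash, text now",
    "Are we still meeting for lunch today?",
    "I will call you when I get home",
]
toy_labels = ["spam", "spam", "ham", "ham"]

model = train_naive_bayes(toy_messages, toy_labels)
prediction_spam = predict(model, "free cash prize, text to win")
prediction_ham = predict(model, "see you at home for lunch")
```

To work with the real data, the toy lists would be replaced by the `sms` and `label` columns, with part of the data held out to measure accuracy.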
Acknowledgements
_This dataset can be used to train a machine learning model to classify SMS messages as spam or not spam.
The SMS Spam Collection v.1 is a public set of labeled SMS messages that have been collected for mobile phone spam research. It contains 5574 real, non-encoded English messages, each tagged as legitimate (ham) or spam. The dataset was collected from various sources and is released under the CC BY-SA 4.0 license by Almeida et al._
License
> License: CC0 1.0 Universal (CC0 1.0) - Public Domain Dedication
> No Copyright - You can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission. See Other Information.
Columns
File: train.csv
| Column name | Description |
|---|---|
| sms | The text of the SMS message. (String) |
| label | The label for the SMS message, indicating whether it is ham or spam. (String) |
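The two columns can be read with Python's standard `csv` module. The sketch below assumes the file has a header row named `sms,label` and that labels are stored as the strings `ham` and `spam`; the actual encoding in `train.csv` may differ and should be checked. The helper name `load_messages` is hypothetical, and the in-memory sample rows are invented so the example runs without the real file.

```python
import csv
import io
from collections import Counter

def load_messages(file_obj):
    """Read rows with 'sms' and 'label' columns into (text, label) pairs."""
    reader = csv.DictReader(file_obj)
    return [(row["sms"].strip(), row["label"].strip()) for row in reader]

# Stand-in for train.csv so the sketch is self-contained.
# These rows are invented for illustration, NOT taken from the dataset.
sample_csv = io.StringIO(
    "sms,label\n"
    '"Free prize, call now",spam\n'
    '"See you at lunch",ham\n'
)
rows = load_messages(sample_csv)

# Class balance is worth checking before training any model.
label_counts = Counter(label for _, label in rows)

# With the real file (path and encoding are assumptions):
# with open("train.csv", newline="", encoding="utf-8") as f:
#     rows = load_messages(f)
```

Quoting the `sms` field matters because message text often contains commas; `csv.DictReader` handles quoted fields, whereas splitting lines on commas would not.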