Email Spam Classification Dataset CSV
CSV file containing spam/not spam information about 5172 emails.
@kaggle.nubrin_email_spam_classification_dataset
Introduction
This CSV file contains information about 5172 randomly selected email files, together with their labels for spam/not-spam classification.
About the Dataset
The CSV file contains 5172 rows, one per email, and 3002 columns. The first column holds the email name; names are given as numbers rather than recipients' names to protect privacy. The last column holds the prediction labels: 1 for spam, 0 for not spam. The remaining 3000 columns correspond to the 3000 most common words across all the emails, after excluding non-alphabetical characters/words. Each cell stores the number of times that column's word occurs in that row's email. Thus, information about all 5172 emails is stored in a single compact dataframe rather than as separate text files.
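The layout described above can be illustrated with a minimal sketch (Python standard library only) that builds a word-count table from raw text and reads it back. It assumes a simple tokenizer that lowercases the text and keeps runs of ASCII letters; the dataset's exact preprocessing is not documented beyond excluding non-alphabetical characters/words. The function names, the `label` header, and the sample emails are hypothetical.

```python
from __future__ import annotations

import csv
import io
import re
from collections import Counter

# Assumed tokenizer: lowercase the text and keep runs of ASCII letters,
# dropping digits, punctuation, and other non-alphabetical characters.
TOKEN_RE = re.compile(r"[a-z]+")


def tokenize(text: str) -> list[str]:
    """Return the alphabetical tokens of one email, lowercased."""
    return TOKEN_RE.findall(text.lower())


def build_vocabulary(emails: list[str], size: int) -> list[str]:
    """Return the `size` most common words across all emails."""
    totals = Counter()
    for text in emails:
        totals.update(tokenize(text))
    return [word for word, _ in totals.most_common(size)]


def count_row(text: str, vocab: list[str]) -> list[int]:
    """Count how often each vocabulary word occurs in one email."""
    counts = Counter(tokenize(text))
    return [counts[word] for word in vocab]


def write_table(emails: list[str], labels: list[int], vocab: list[str]) -> str:
    """Write CSV text: email name first, one count column per word, label last."""
    out = io.StringIO()
    writer = csv.writer(out)
    # Hypothetical header names; only the column order follows the description.
    writer.writerow(["Email No.", *vocab, "label"])
    for i, (text, label) in enumerate(zip(emails, labels), start=1):
        # Names are sequential numbers, not recipients' names.
        writer.writerow([f"Email {i}", *count_row(text, vocab), label])
    return out.getvalue()


def read_table(csv_text: str) -> tuple[list[str], list[tuple[str, list[int], int]]]:
    """Split each data row into (name, word counts, label)."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    vocab = header[1:-1]  # every column between the name and the label
    rows = [(row[0], [int(c) for c in row[1:-1]], int(row[-1])) for row in reader]
    return vocab, rows


if __name__ == "__main__":
    # Made-up sample emails for illustration only.
    emails = ["Win a FREE prize now!!! Free money.", "Meeting at 10 to review the gas deal."]
    labels = [1, 0]  # 1 = spam, 0 = not spam
    print(write_table(emails, labels, ["free", "gas", "deal"]))
```

Keeping the name first and the label last means `row[1:-1]` always selects exactly the word-count columns, whatever the vocabulary size (3000 in this dataset).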
CREATE TABLE emails (
"email_no" VARCHAR, -- Email No.
"the" BIGINT,
"to" BIGINT,
"ect" BIGINT,
"and" BIGINT,
"for" BIGINT,
"of" BIGINT,
"a" BIGINT,
"you" BIGINT,
"hou" BIGINT,
"in" BIGINT,
"on" BIGINT,
"is" BIGINT,
"this" BIGINT,
"enron" BIGINT,
"i" BIGINT,
"be" BIGINT,
"that" BIGINT,
"will" BIGINT,
"have" BIGINT,
"with" BIGINT,
"your" BIGINT,
"at" BIGINT,
"we" BIGINT,
"s" BIGINT,
"are" BIGINT,
"it" BIGINT,
"by" BIGINT,
"com" BIGINT,
"as" BIGINT,
"from" BIGINT,
"gas" BIGINT,
"or" BIGINT,
"not" BIGINT,
"me" BIGINT,
"deal" BIGINT,
"if" BIGINT,
"meter" BIGINT,
"hpl" BIGINT,
"please" BIGINT,
"re" BIGINT,
"e" BIGINT,
"any" BIGINT,
"our" BIGINT,
"corp" BIGINT,
"can" BIGINT,
"d" BIGINT,
"all" BIGINT,
"has" BIGINT,
"was" BIGINT,
"know" BIGINT,
"need" BIGINT,
"an" BIGINT,
"forwarded" BIGINT,
"new" BIGINT,
"t" BIGINT,
"may" BIGINT,
"up" BIGINT,
"j" BIGINT,
"mmbtu" BIGINT,
"should" BIGINT,
"do" BIGINT,
"am" BIGINT,
"get" BIGINT,
"out" BIGINT,
"see" BIGINT,
"no" BIGINT,
"there" BIGINT,
"price" BIGINT,
"daren" BIGINT,
"but" BIGINT,
"been" BIGINT,
"company" BIGINT,
"l" BIGINT,
"these" BIGINT,
"let" BIGINT,
"so" BIGINT,
"would" BIGINT,
"m" BIGINT,
"into" BIGINT,
"xls" BIGINT,
"farmer" BIGINT,
"attached" BIGINT,
"us" BIGINT,
"information" BIGINT,
"they" BIGINT,
"message" BIGINT,
"day" BIGINT,
"time" BIGINT,
"my" BIGINT,
"one" BIGINT,
"what" BIGINT,
"only" BIGINT,
"http" BIGINT,
"th" BIGINT,
"volume" BIGINT,
"mail" BIGINT,
"contract" BIGINT,
"which" BIGINT,
"month" BIGINT
);
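Because every word column holds a plain integer count, word statistics reduce to ordinary aggregate queries. The sketch below uses Python's built-in `sqlite3` module (SQLite accepts the `VARCHAR` and `BIGINT` type names) with an abridged copy of the table containing only three of the word columns and two made-up rows. The helper names are hypothetical, and no label column is included because the schema above does not list one.

```python
import sqlite3

# Abridged, hypothetical version of the schema above: the email identifier
# plus three of the word-count columns. The comma must come before the "--"
# comment, because SQL ignores everything after "--" on a line.
DDL = """
CREATE TABLE emails (
    "email_no" VARCHAR, -- Email No.
    "the" BIGINT,
    "to" BIGINT,
    "enron" BIGINT
)
"""


def load(rows):
    """Create an in-memory database and insert (email_no, the, to, enron) tuples."""
    conn = sqlite3.connect(":memory:")
    conn.execute(DDL)
    conn.executemany(
        'INSERT INTO emails ("email_no", "the", "to", "enron") VALUES (?, ?, ?, ?)',
        rows,
    )
    return conn


def total_counts(conn):
    """Total occurrences of each word column across all emails."""
    return conn.execute('SELECT SUM("the"), SUM("to"), SUM("enron") FROM emails').fetchone()


def emails_containing_enron(conn):
    """Names of emails in which the word "enron" occurs at least once."""
    cur = conn.execute('SELECT "email_no" FROM emails WHERE "enron" > 0 ORDER BY "email_no"')
    return [name for (name,) in cur]


if __name__ == "__main__":
    # Made-up rows for illustration only; not taken from the dataset.
    conn = load([("Email 1", 3, 2, 0), ("Email 2", 1, 0, 4)])
    print(total_counts(conn))
    print(emails_containing_enron(conn))
```

Column names such as "to" are quoted because they collide with SQL keywords; double quotes make them plain identifiers.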