Email-blog
Classified emails taken from the public Enron repository.
@kaggle.mikeschmidtavemac_emailblog
Supervised classification dataset produced as part of a blog series on classifying corporate email for morale and professional alignment.
The series covers raw data extraction, analysis, unsupervised topic discovery, and supervised model development.
The blog posts are available at:
Part 1. Raw email processing. https://www.avemacconsulting.com/2021/08/24/email-insights-from-data-science-techniques-part-1/
Part 2. Data analysis. https://www.avemacconsulting.com/2021/08/27/email-insights-from-data-science-part-2/
Part 3. Unsupervised topic classification (creates this dataset). https://www.avemacconsulting.com/2021/09/23/email-insights-from-data-science-part-3/
Part 4. Supervised modeling (uses this dataset). https://www.avemacconsulting.com/2021/10/12/email-insights-from-data-science-part-4/
Note: This data is part of a blog series and has not been fully vetted. In particular, the unsupervised topic extraction step should be tuned further for accuracy.
Original email content was taken from the public Enron email repository located at https://www.cs.cmu.edu/~enron/.
The dataset contains email body text, various supporting features (email addresses, date/time, etc.), and multiple classification labels.
Three (3) labels were generated for sentiment, each with three (3) classes (positive, negative, neutral/unknown).
Three (3) labels were also created for alignment (business/personal), each with two (2) classes (fun, work).
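As a rough illustration of how a word-list lexicon can yield one of the three sentiment classes above, here is a minimal sketch. The word sets, the `sentiment_class` function, and the tie-breaking rule are assumptions made for illustration; they are not taken from the blog series and do not reproduce its labeling code.

```python
import re

# Hypothetical miniature word lists, for illustration only.
# A real opinion lexicon contains thousands of entries.
POSITIVE = {"good", "great", "thanks", "happy", "pleased"}
NEGATIVE = {"bad", "problem", "angry", "unhappy", "fail"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z']+", text.lower())


def sentiment_class(text):
    """Return 'positive', 'negative', or 'neutral/unknown' by counting lexicon hits."""
    tokens = tokenize(text)
    pos = sum(t in POSITIVE for t in tokens)
    neg = sum(t in NEGATIVE for t in tokens)
    if pos > neg:
        return "positive"
    if neg > pos:
        return "negative"
    # Ties and texts with no lexicon hits fall into the combined class (an assumption).
    return "neutral/unknown"
```

Calling `sentiment_class(body)` on an email body returns a single class string. Texts with no matching words land in neutral/unknown, which is one plausible reading of why that class is combined.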
Uses sentiment lexicon from http://www.cs.uic.edu/~liub/FBS/sentiment-analysis.html
Minqing Hu and Bing Liu. "Mining and Summarizing Customer Reviews."
Proceedings of the ACM SIGKDD International Conference on Knowledge
Discovery and Data Mining (KDD-2004), Aug 22-25, 2004, Seattle, Washington, USA.
Uses VADER from https://www.nltk.org/api/nltk.sentiment.html?highlight=vader#module-nltk.sentiment.vader
Hutto, C.J. & Gilbert, E.E. (2014). VADER: A Parsimonious Rule-based Model for Sentiment Analysis of Social Media Text.
Eighth International Conference on Weblogs and Social Media (ICWSM-14). Ann Arbor, MI, June 2014.
Uses AFINN from http://corpustext.com/reference/sentiment_afinn.html
Finn Årup Nielsen. "A new ANEW: Evaluation of a word list for sentiment analysis in microblogs."
Proceedings of the ESWC2011 Workshop on 'Making Sense of Microposts': Big things come in small packages. Volume 718 in CEUR Workshop
Proceedings: 93-98. May 2011.
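AFINN-style word lists assign each word an integer valence between -5 and +5, and a text is commonly scored by summing the valences of its words. The sketch below shows that idea only. The `TOY_VALENCES` table and its values are made up for illustration and are not entries copied from AFINN, and `valence_score` is a hypothetical helper, not part of any library.

```python
import re

# Hypothetical valence table in the AFINN style (integers from -5 to +5).
# The words and values here are illustrative, not taken from the real list.
TOY_VALENCES = {
    "outstanding": 5,
    "good": 3,
    "like": 2,
    "bad": -3,
    "terrible": -3,
}


def valence_score(text, valences=TOY_VALENCES):
    """Sum the valence of every known word; unknown words contribute 0."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return sum(valences.get(token, 0) for token in tokens)
```

A positive total suggests positive sentiment, a negative total suggests negative sentiment, and a total of zero is ambiguous, since it covers both texts with no known words and texts whose valences cancel out.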