Context
The data is intended for natural language processing tasks such as sentiment analysis. Tweets were collected if they referenced any of the five banks below:
- Standard Bank
- FNB
- Capitec
- ABSA
- Nedbank
Search Dictionary used for scraping
Any tweet that references a bank was collected, using the mapping bank: search value:
{"FNB": "FNBSA", "StandardBank": "StandardBankZA OR \"Standard Bank\" OR \"standard bank\"", "Nedbank": "Nedbank OR nedbank", "ABSA": "Absa OR ABSA OR absa OR AbsaSouthAfrica", "Capitec": "CapitecBankSA OR Capitec or capitec"}
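As a rough illustration of how a dictionary like this could drive a scrape, the sketch below turns each entry into one job per bank. The helper name `build_jobs` and the exact `since`/`until` days are assumptions made for the example (the description only gives 2019 to September 2021), and nothing here calls Twint itself.

```python
# Search dictionary as listed above (bank name -> search query).
search_terms = {
    "FNB": "FNBSA",
    "StandardBank": 'StandardBankZA OR "Standard Bank" OR "standard bank"',
    "Nedbank": "Nedbank OR nedbank",
    "ABSA": "Absa OR ABSA OR absa OR AbsaSouthAfrica",
    "Capitec": "CapitecBankSA OR Capitec or capitec",
}


def build_jobs(terms, since="2019-01-01", until="2021-09-30"):
    """Return one scrape job per bank.

    Hypothetical helper: the date bounds are assumed, not taken from the
    original scraping code.
    """
    return [
        {"bank": bank, "query": query, "since": since, "until": until}
        for bank, query in terms.items()
    ]


jobs = build_jobs(search_terms)
for job in jobs:
    print(job["bank"], "->", job["query"])
```

Keeping one job per bank means a failed run can be restarted for a single bank without repeating the others.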
Content
- Twint was used to scrape tweets from 2019 up to the collection date (September 2021)
- "Tweet" column contains the raw tweet string (unprocessed)
- "Cleaned_tweet" column contains the cleaned version of the tweet
Note: At the time of running, there were multiple issues with Twint that would cause the process to stop. The scraping process was completed on AWS EC2 servers.
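A minimal sketch of reading the two text columns, assuming the data is distributed as a CSV file with "Tweet" and "Cleaned_tweet" headers. The row below is a placeholder invented for the example, not a record from the dataset, and `read_tweet_pairs` is a hypothetical helper name.

```python
import csv
import io

# Placeholder content standing in for the real file (illustration only).
sample_csv = io.StringIO(
    "Tweet,Cleaned_tweet\n"
    "raw tweet text goes here,cleaned tweet text goes here\n"
)


def read_tweet_pairs(handle):
    """Yield (raw, cleaned) text pairs from a CSV with the two documented columns."""
    for row in csv.DictReader(handle):
        yield row["Tweet"], row["Cleaned_tweet"]


pairs = list(read_tweet_pairs(sample_csv))
for raw, cleaned in pairs:
    print(repr(raw), "->", repr(cleaned))
```

For a file on disk, the same function should work with a handle opened via `open(path, newline="", encoding="utf-8")`, where `path` is wherever the dataset was saved.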
Cleaning process and POC
An initial proof of concept (test run), including cleaning, sentiment, and analysis, can be found here.
Acknowledgements
Thanks to the Twint forums for assistance in overcoming the issues experienced.
Inspiration
I currently work at one of the banks. The initial project was to check whether Customer Satisfaction surveys are a true reflection of general customer sentiment (such as Twitter sentiment).
A follow-up project will look at this correlation.