Dataset: Tweets Of The Top 5 Banks In South Africa

About this Dataset

Context

Tweets were collected if they reference any of the 5 banks listed below. The data is intended for natural language processing tasks such as sentiment analysis.

Standard Bank
FNB
Capitec
ABSA
Nedbank

Search Dictionary used for scraping

Any tweet that matches a bank's search value is included. The dictionary maps each bank to its search value (bank: value):

{"FNB":"FNBSA", "StandardBank":"StandardBankZA OR "Standard Bank" OR "standard bank"","Nedbank":"Nedbank OR nedbank","ABSA": "Absa OR ABSA OR absa OR AbsaSouthAfrica","Capitec":"CapitecBankSA OR Capitec or capitec"}
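As printed, the dictionary above has unescaped inner quotes, so it cannot be pasted directly into code. A minimal sketch of how it could be written as a valid Python dict follows. The names `SEARCH_DICTIONARY` and `split_terms` are assumptions made for illustration and are not taken from the original scraper. Twitter's search syntax documents the `OR` operator in uppercase, so the lowercase `or` in the Capitec entry may have been treated as an ordinary word; the sketch leaves it untouched.

```python
# Search values copied from the dictionary above, with single-quoted
# Python strings so the inner double quotes stay intact.
SEARCH_DICTIONARY = {
    "FNB": 'FNBSA',
    "StandardBank": 'StandardBankZA OR "Standard Bank" OR "standard bank"',
    "Nedbank": 'Nedbank OR nedbank',
    "ABSA": 'Absa OR ABSA OR absa OR AbsaSouthAfrica',
    "Capitec": 'CapitecBankSA OR Capitec or capitec',
}


def split_terms(query):
    """Split a search value on the uppercase OR operator (hypothetical helper).

    Surrounding double quotes are stripped from each term. A lowercase
    "or" is not treated as an operator and stays inside its term.
    """
    return [term.strip().strip('"') for term in query.split(" OR ")]


for bank, query in SEARCH_DICTIONARY.items():
    print(bank, split_terms(query))
```

Splitting the values this way makes it easy to see which individual terms each bank's search covers.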

Content

Twint was used to scrape tweets from 2019 to the date of collection (September 2021).
The "Tweet" column (listed as "base_tweet" in the table schemas below) contains the raw, unprocessed tweet string.
The "Cleaned_tweet" column contains the cleaned version of the tweet.

Note: At the time of running there were multiple issues with Twint that would cause the process to stop. The scraping process was completed on AWS EC2 servers.
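One common way to cope with a scraper that stops unexpectedly is to split the date range into small windows and rerun only the windows that failed. The sketch below illustrates that idea under stated assumptions. It is not the author's actual process, `month_windows` and `run_with_resume` are hypothetical names, and `scrape` stands for whatever function performs the real scrape of one window.

```python
from datetime import date


def month_windows(start, end):
    """Yield (since, until) date pairs covering [start, end), split at month starts."""
    current = start
    while current < end:
        # First day of the month after `current`.
        if current.month == 12:
            next_month = date(current.year + 1, 1, 1)
        else:
            next_month = date(current.year, current.month + 1, 1)
        until = min(next_month, end)
        yield current, until
        current = until


def run_with_resume(windows, scrape, done):
    """Call scrape(since, until) for every window not yet in `done`.

    Successful windows are added to `done`; failures are collected and
    returned so that only those windows need to be rerun.
    """
    failed = []
    for window in windows:
        if window in done:
            continue
        try:
            scrape(*window)
            done.add(window)
        except Exception as exc:  # a crash only costs this one window
            failed.append((window, repr(exc)))
    return failed


# January 2019 up to and including September 2021.
windows = list(month_windows(date(2019, 1, 1), date(2021, 10, 1)))
print(len(windows))
```

Persisting the `done` set between runs (for example to a file) would let a restarted server pick up where the previous run stopped.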

Cleaning process and POC

An initial proof of concept/test run, with cleaning, sentiment and analysis, can be found here.

Acknowledgements

Thanks to the Twint forums for assisting in overcoming the issues experienced.

Inspiration

I currently work at one of the banks. The initial project was to check whether Customer Satisfaction surveys are a true reflection of general customer sentiment (such as Twitter sentiment).
A follow-up project will look at this correlation.

Tables

Full 2019

@kaggle.slythe_twitter_scrape_of_the_top_5_banks_in_south_africa.full_2019

67.71 MB
297949 rows
40 columns


CREATE TABLE full_2019 (
  "unnamed_0" BIGINT,
  "id" DOUBLE,
  "conversation_id" DOUBLE,
  "created_at" DOUBLE,
  "date" TIMESTAMP,
  "timezone" BIGINT,
  "place" VARCHAR,
  "base_tweet" VARCHAR,
  "cleaned_tweet" VARCHAR,
  "language" VARCHAR,
  "hashtags" VARCHAR,
  "cashtags" VARCHAR,
  "user_id" DOUBLE,
  "user_id_str" DOUBLE,
  "username" VARCHAR,
  "name" VARCHAR,
  "day" BIGINT,
  "hour" BIGINT,
  "link" VARCHAR,
  "urls" VARCHAR,
  "photos" VARCHAR,
  "video" BIGINT,
  "thumbnail" VARCHAR,
  "retweet" BOOLEAN,
  "nlikes" BIGINT,
  "nreplies" BIGINT,
  "nretweets" BIGINT,
  "quote_url" VARCHAR,
  "search" VARCHAR,
  "near" VARCHAR,
  "geo" VARCHAR,
  "source" VARCHAR,
  "user_rt_id" VARCHAR,
  "user_rt" VARCHAR,
  "retweet_id" VARCHAR,
  "reply_to" VARCHAR,
  "retweet_date" VARCHAR,
  "translate" VARCHAR,
  "trans_src" VARCHAR,
  "trans_dest" VARCHAR
);
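The schema stores "id", "conversation_id" and "user_id" as DOUBLE. A double has a 53-bit mantissa, while tweet IDs from this period are far larger than 2^53, so the stored values may not match the original IDs exactly. The sketch below demonstrates the effect with a made-up ID and shows one possible workaround: reading the ID from the "link" column instead. It assumes that links have the form `https://twitter.com/<username>/status/<id>`, which should be checked against the data. `id_from_link` is a hypothetical helper.

```python
import re

# Made-up 19-digit ID, similar in size to tweet IDs issued in 2019-2021.
example_id = 1100000000000000001

# Round-tripping through a double loses the low-order digits.
as_double = float(example_id)
print(int(as_double) == example_id)  # False

# Assumed link shape: https://twitter.com/<username>/status/<id>
STATUS_ID = re.compile(r"/status/(\d+)")


def id_from_link(link):
    """Return the tweet ID embedded in a status link as an int, or None."""
    match = STATUS_ID.search(link)
    return int(match.group(1)) if match else None


print(id_from_link("https://twitter.com/some_user/status/1100000000000000001"))
```

Because Python integers have arbitrary precision, the ID recovered from the link string is exact.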

Full 2020

@kaggle.slythe_twitter_scrape_of_the_top_5_banks_in_south_africa.full_2020

83.95 MB
376540 rows
40 columns


CREATE TABLE full_2020 (
  "unnamed_0" BIGINT,
  "id" DOUBLE,
  "conversation_id" DOUBLE,
  "created_at" DOUBLE,
  "date" TIMESTAMP,
  "timezone" BIGINT,
  "place" VARCHAR,
  "base_tweet" VARCHAR,
  "cleaned_tweet" VARCHAR,
  "language" VARCHAR,
  "hashtags" VARCHAR,
  "cashtags" VARCHAR,
  "user_id" DOUBLE,
  "user_id_str" DOUBLE,
  "username" VARCHAR,
  "name" VARCHAR,
  "day" BIGINT,
  "hour" BIGINT,
  "link" VARCHAR,
  "urls" VARCHAR,
  "photos" VARCHAR,
  "video" BIGINT,
  "thumbnail" VARCHAR,
  "retweet" BOOLEAN,
  "nlikes" BIGINT,
  "nreplies" BIGINT,
  "nretweets" BIGINT,
  "quote_url" VARCHAR,
  "search" VARCHAR,
  "near" VARCHAR,
  "geo" VARCHAR,
  "source" VARCHAR,
  "user_rt_id" VARCHAR,
  "user_rt" VARCHAR,
  "retweet_id" VARCHAR,
  "reply_to" VARCHAR,
  "retweet_date" VARCHAR,
  "translate" VARCHAR,
  "trans_src" VARCHAR,
  "trans_dest" VARCHAR
);

Full 2021

@kaggle.slythe_twitter_scrape_of_the_top_5_banks_in_south_africa.full_2021

74.27 MB
411240 rows
40 columns


CREATE TABLE full_2021 (
  "unnamed_0" BIGINT,
  "id" DOUBLE,
  "conversation_id" DOUBLE,
  "created_at" DOUBLE,
  "date" TIMESTAMP,
  "timezone" BIGINT,
  "place" VARCHAR,
  "base_tweet" VARCHAR,
  "cleaned_tweet" VARCHAR,
  "language" VARCHAR,
  "hashtags" VARCHAR,
  "cashtags" VARCHAR,
  "user_id" DOUBLE,
  "user_id_str" DOUBLE,
  "username" VARCHAR,
  "name" VARCHAR,
  "day" BIGINT,
  "hour" BIGINT,
  "link" VARCHAR,
  "urls" VARCHAR,
  "photos" VARCHAR,
  "video" BIGINT,
  "thumbnail" VARCHAR,
  "retweet" BOOLEAN,
  "nlikes" BIGINT,
  "nreplies" BIGINT,
  "nretweets" BIGINT,
  "quote_url" VARCHAR,
  "search" VARCHAR,
  "near" VARCHAR,
  "geo" VARCHAR,
  "source" VARCHAR,
  "user_rt_id" VARCHAR,
  "user_rt" VARCHAR,
  "retweet_id" VARCHAR,
  "reply_to" VARCHAR,
  "retweet_date" VARCHAR,
  "translate" VARCHAR,
  "trans_src" VARCHAR,
  "trans_dest" VARCHAR
);
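Because the three yearly tables share the same 40 columns, they can be stacked with UNION ALL and queried as one. The sketch below shows the pattern on a reduced three-column version of the schema, using Python's built-in SQLite as a stand-in engine. The rows are made up for illustration, and the assumption that the "search" column holds the search value used for each tweet should be verified against the data. `strftime('%Y', ...)` is SQLite syntax and may differ in other engines.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Reduced stand-ins for the three tables (3 of the 40 columns).
for year in (2019, 2020, 2021):
    conn.execute(
        f'CREATE TABLE full_{year} ("date" TIMESTAMP, "search" VARCHAR, "nlikes" BIGINT)'
    )

# Made-up sample rows, for illustration only.
conn.executemany(
    "INSERT INTO full_2019 VALUES (?, ?, ?)",
    [
        ("2019-03-01 10:00:00", "FNBSA", 2),
        ("2019-07-15 08:30:00", "FNBSA", 1),
        ("2019-07-15 09:00:00", "Nedbank OR nedbank", 0),
    ],
)
conn.execute("INSERT INTO full_2020 VALUES ('2020-01-05 12:00:00', 'Nedbank OR nedbank', 4)")
conn.execute("INSERT INTO full_2021 VALUES ('2021-09-01 18:45:00', 'FNBSA', 3)")

# Stack the yearly tables, then count tweets and likes per year and search value.
QUERY = """
SELECT strftime('%Y', "date") AS year,
       "search",
       COUNT(*) AS tweets,
       SUM("nlikes") AS likes
FROM (
    SELECT "date", "search", "nlikes" FROM full_2019
    UNION ALL
    SELECT "date", "search", "nlikes" FROM full_2020
    UNION ALL
    SELECT "date", "search", "nlikes" FROM full_2021
)
GROUP BY year, "search"
ORDER BY year, "search"
"""

rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

UNION ALL is used rather than UNION so that identical rows are kept instead of being deduplicated.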