Tweets Of The Top 5 Banks In South Africa
Tweets related to Capitec, Standard Bank, ABSA, FNB, Nedbank for 2019-2021 Sept
@kaggle.slythe_twitter_scrape_of_the_top_5_banks_in_south_africa
Tweets were collected if they reference any of the 5 banks below. The data is intended for natural language processing, such as sentiment analysis.
Any tweet that references one of the banks was collected, using the following search terms (bank: search value):
{"FNB": "FNBSA", "StandardBank": "StandardBankZA OR \"Standard Bank\" OR \"standard bank\"", "Nedbank": "Nedbank OR nedbank", "ABSA": "Absa OR ABSA OR absa OR AbsaSouthAfrica", "Capitec": "CapitecBankSA OR Capitec or capitec"}
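To label which bank a collected tweet refers to, the search terms above can be reused as a lookup. The following is a minimal sketch, not the author's code: `SEARCH_TERMS` and `banks_mentioned` are hypothetical names, and case-insensitive substring matching is an assumption about how one might want to match.

```python
# Search terms per bank, transcribed from the mapping above.
SEARCH_TERMS = {
    "FNB": ["FNBSA"],
    "StandardBank": ["StandardBankZA", "Standard Bank", "standard bank"],
    "Nedbank": ["Nedbank", "nedbank"],
    "ABSA": ["Absa", "ABSA", "absa", "AbsaSouthAfrica"],
    "Capitec": ["CapitecBankSA", "Capitec", "capitec"],
}


def banks_mentioned(tweet):
    """Return the banks whose search terms occur in the tweet text.

    Matching is a case-insensitive substring test (an assumption), so a
    short term such as "absa" can also match inside a longer word.
    """
    text = tweet.lower()
    return [
        bank
        for bank, terms in SEARCH_TERMS.items()
        if any(term.lower() in text for term in terms)
    ]
```

A tweet can mention more than one bank, which is why the helper returns a list rather than a single label.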
Note: At the time of running, there were multiple issues with Twint that would cause the process to stop. The scraping process was completed on AWS EC2 servers.
An initial proof of concept/test run, with cleaning, sentiment and analysis, can be found here.
Thanks to the Twint forums for assisting in overcoming the issues experienced.
I currently work at one of the banks. The initial project was to check whether Customer Satisfaction surveys are a true reflection of general customer sentiment (such as Twitter sentiment).
A follow-up project will look at this correlation.
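One way such a comparison could be set up is a Pearson correlation between two equal-length series, for example a survey score per month and a mean tweet sentiment score per month. This is only a sketch under that assumption: the source does not say which method the follow-up will use, and `pearson` is a hypothetical helper.

```python
from math import sqrt
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = mean(xs), mean(ys)
    # Sum of products of deviations, and each series' root sum of squares.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / (spread_x * spread_y)
```

A value near 1 would suggest the surveys move with Twitter sentiment, a value near 0 that they do not; neither outcome is claimed here.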
CREATE TABLE full_2019 (
"unnamed_0" BIGINT, -- Unnamed: 0
"id" DOUBLE,
"conversation_id" DOUBLE,
"created_at" DOUBLE,
"date" TIMESTAMP,
"timezone" BIGINT,
"place" VARCHAR,
"base_tweet" VARCHAR,
"cleaned_tweet" VARCHAR,
"language" VARCHAR,
"hashtags" VARCHAR,
"cashtags" VARCHAR,
"user_id" DOUBLE,
"user_id_str" DOUBLE,
"username" VARCHAR,
"name" VARCHAR,
"day" BIGINT,
"hour" BIGINT,
"link" VARCHAR,
"urls" VARCHAR,
"photos" VARCHAR,
"video" BIGINT,
"thumbnail" VARCHAR,
"retweet" BOOLEAN,
"nlikes" BIGINT,
"nreplies" BIGINT,
"nretweets" BIGINT,
"quote_url" VARCHAR,
"search" VARCHAR,
"near" VARCHAR,
"geo" VARCHAR,
"source" VARCHAR,
"user_rt_id" VARCHAR,
"user_rt" VARCHAR,
"retweet_id" VARCHAR,
"reply_to" VARCHAR,
"retweet_date" VARCHAR,
"translate" VARCHAR,
"trans_src" VARCHAR,
"trans_dest" VARCHAR
);
CREATE TABLE full_2020 (
"unnamed_0" BIGINT, -- Unnamed: 0
"id" DOUBLE,
"conversation_id" DOUBLE,
"created_at" DOUBLE,
"date" TIMESTAMP,
"timezone" BIGINT,
"place" VARCHAR,
"base_tweet" VARCHAR,
"cleaned_tweet" VARCHAR,
"language" VARCHAR,
"hashtags" VARCHAR,
"cashtags" VARCHAR,
"user_id" DOUBLE,
"user_id_str" DOUBLE,
"username" VARCHAR,
"name" VARCHAR,
"day" BIGINT,
"hour" BIGINT,
"link" VARCHAR,
"urls" VARCHAR,
"photos" VARCHAR,
"video" BIGINT,
"thumbnail" VARCHAR,
"retweet" BOOLEAN,
"nlikes" BIGINT,
"nreplies" BIGINT,
"nretweets" BIGINT,
"quote_url" VARCHAR,
"search" VARCHAR,
"near" VARCHAR,
"geo" VARCHAR,
"source" VARCHAR,
"user_rt_id" VARCHAR,
"user_rt" VARCHAR,
"retweet_id" VARCHAR,
"reply_to" VARCHAR,
"retweet_date" VARCHAR,
"translate" VARCHAR,
"trans_src" VARCHAR,
"trans_dest" VARCHAR
);
CREATE TABLE full_2021 (
"unnamed_0" BIGINT, -- Unnamed: 0
"id" DOUBLE,
"conversation_id" DOUBLE,
"created_at" DOUBLE,
"date" TIMESTAMP,
"timezone" BIGINT,
"place" VARCHAR,
"base_tweet" VARCHAR,
"cleaned_tweet" VARCHAR,
"language" VARCHAR,
"hashtags" VARCHAR,
"cashtags" VARCHAR,
"user_id" DOUBLE,
"user_id_str" DOUBLE,
"username" VARCHAR,
"name" VARCHAR,
"day" BIGINT,
"hour" BIGINT,
"link" VARCHAR,
"urls" VARCHAR,
"photos" VARCHAR,
"video" BIGINT,
"thumbnail" VARCHAR,
"retweet" BOOLEAN,
"nlikes" BIGINT,
"nreplies" BIGINT,
"nretweets" BIGINT,
"quote_url" VARCHAR,
"search" VARCHAR,
"near" VARCHAR,
"geo" VARCHAR,
"source" VARCHAR,
"user_rt_id" VARCHAR,
"user_rt" VARCHAR,
"retweet_id" VARCHAR,
"reply_to" VARCHAR,
"retweet_date" VARCHAR,
"translate" VARCHAR,
"trans_src" VARCHAR,
"trans_dest" VARCHAR
);
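Because the schema types "id" as DOUBLE, tweet IDs from this period (roughly 19 digits) are larger than 2^53 and cannot all be represented exactly. If exact IDs are needed, one option is to parse them from the "link" column. The sketch below assumes each link ends in a `/status/<id>` path, which is an assumption about the scraped data and not something the source states; `tweet_id_from_link` is a hypothetical helper.

```python
import re

# Assumed link shape: https://twitter.com/<username>/status/<tweet id>
_STATUS_ID = re.compile(r"/status/(\d+)")


def tweet_id_from_link(link):
    """Return the tweet ID embedded in a status link as an int, or None."""
    match = _STATUS_ID.search(link)
    return int(match.group(1)) if match else None
```

Python integers are arbitrary precision, so the parsed value keeps every digit, whereas the same ID passed through a float would be rounded.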