US Election 2020 Tweets
Oct 15th 2020 - Nov 8th 2020, 1.72M Tweets
@kaggle.manchunhui_us_election_2020_tweets
The 2020 US election is happening on 3 November 2020, and its impact on the world will no doubt be large, irrespective of which candidate is elected! After reading the two papers, here and here, I was inspired to attempt a similar sentiment analysis myself!
Tweets were collected using the Twitter API statuses_lookup and snscrape for keywords, with the original intention of updating this dataset daily so that the timeframe would eventually cover 15.10.2020 to 04.11.2020. Added 06.11.2020: with the events of the election still ongoing as of the date this comment was added, I've decided to keep updating the dataset with tweets until at least the end of the 6th Nov. Added 08.11.2020: just one more version is pending, to include tweets until the end of the 8th Nov.
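The collection step appears to be two-stage: snscrape finds tweet IDs matching the keywords, and statuses_lookup then fetches ("hydrates") the full tweets for those IDs. The Twitter API v1.1 statuses/lookup endpoint returns at most 100 tweets per request, so the IDs have to be sent in batches. The sketch below shows only that batching step; the API call itself is omitted, and `chunk_ids` is a hypothetical helper, not part of either tool.

```python
from typing import Iterator, List, Sequence

# Twitter API v1.1 statuses/lookup accepts at most 100 tweet IDs per request.
MAX_IDS_PER_LOOKUP = 100


def chunk_ids(tweet_ids: Sequence[int], size: int = MAX_IDS_PER_LOOKUP) -> Iterator[List[int]]:
    """Yield consecutive batches of at most `size` tweet IDs (hypothetical helper)."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(tweet_ids), size):
        yield list(tweet_ids[start:start + size])


# 250 made-up IDs, standing in for IDs found by a keyword scrape.
ids = list(range(1, 251))
print([len(batch) for batch in chunk_ids(ids)])
```

Each batch would then be passed to one lookup request; keeping batches at the endpoint's maximum size minimises the number of rate-limited calls.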
Columns are as follows:
- created_at: Date and time of tweet creation
- tweet_id: Unique ID of the tweet
- tweet: Full tweet text
- likes: Number of likes
- retweet_count: Number of retweets
- source: Utility used to post the tweet
- user_id: User ID of the tweet creator
- user_name: Username of the tweet creator
- user_screen_name: Screen name of the tweet creator
- user_description: Tweet creator's description of themselves
- user_join_date: Date the tweet creator joined Twitter
- user_followers_count: Follower count of the tweet creator
- user_location: Location given on the tweet creator's profile
- lat: Latitude parsed from user_location
- long: Longitude parsed from user_location
- city: City parsed from user_location
- country: Country parsed from user_location
- state: State parsed from user_location
- state_code: State code parsed from user_location
- collected_at: Date and time the tweet data was mined from Twitter*
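The column list maps naturally onto a typed record. Below is a minimal sketch using only Python's standard library. It assumes the data is a CSV whose header uses the column names above, and that timestamps follow the `%Y-%m-%d %H:%M:%S` layout (an assumption; the layout is not stated on this page). It also flags rows whose collected_at predates 21.10.2020, which the Version 7 note marks as an estimate. `read_tweets` and the two sample rows are hypothetical.

```python
import csv
import io
from datetime import datetime
from typing import Any, Dict, Iterator, Optional, TextIO

# ASSUMPTION: timestamps look like "2020-10-15 00:00:01"; adjust if the CSV differs.
TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S"
# Per the Version 7 note, collected_at is only accurate from 21.10.2020 onwards.
COLLECTED_AT_RELIABLE_FROM = datetime(2020, 10, 21)


def _to_int(value: str) -> Optional[int]:
    # Counts may be written as "12" or "12.0"; empty cells become None.
    return int(float(value)) if value.strip() else None


def _to_float(value: str) -> Optional[float]:
    # Empty cells (no geolocation could be parsed) become None.
    return float(value) if value.strip() else None


def read_tweets(stream: TextIO) -> Iterator[Dict[str, Any]]:
    """Yield rows with timestamps, counts and coordinates converted to Python types."""
    for row in csv.DictReader(stream):
        typed: Dict[str, Any] = dict(row)
        for col in ("created_at", "collected_at"):
            typed[col] = datetime.strptime(row[col], TIMESTAMP_FORMAT)
        for col in ("likes", "retweet_count", "user_followers_count"):
            typed[col] = _to_int(row[col])
        for col in ("lat", "long"):
            typed[col] = _to_float(row[col])
        # True only when collected_at falls in the window described as accurate.
        typed["collected_at_reliable"] = typed["collected_at"] >= COLLECTED_AT_RELIABLE_FROM
        yield typed


# Two hypothetical rows using a subset of the documented columns.
sample = io.StringIO(
    "created_at,tweet_id,tweet,likes,retweet_count,user_followers_count,lat,long,collected_at\n"
    "2020-10-15 00:00:01,1,Hypothetical tweet,2.0,1.0,10,,,2020-10-21 00:00:00\n"
    "2020-10-20 12:30:00,2,\"Another, quoted tweet\",0.0,0.0,5,40.71,-74.0,2020-10-20 18:00:00\n"
)
rows = list(read_tweets(sample))
```

Using `csv.DictReader` rather than splitting on commas matters here, because tweet text regularly contains commas and quotes.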
- @taniaj and her great Australian Election 2019 Tweets dataset, which I used as a template for this dataset.
- Thanks to Twitter for providing the free API, and to snscrape for allowing collection of the tweet_ids.
Cover photo by Jorge Alcala on Unsplash
Unsplash Images are distributed under a unique Unsplash License.
My primary interest in creating this dataset is to ascertain whether there is a correlation between the sentiment of users on Twitter and the eventual election results. Other ideas that might be interesting to investigate include:
- Can we detect whether there are, or were, any attempts to manipulate the election?
- Can we predict the candidate from the tweet text alone?
- Can we predict the election outcome in each state?
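One way to frame the primary question is at state level: compute a mean sentiment score per state and correlate it with a candidate's vote margin in that state. The description does not say which measure the author intends to use; the sketch below assumes the Pearson correlation coefficient, and the per-state numbers are invented purely for illustration.

```python
from math import sqrt
from typing import Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two samples of equal length (at least 2 points)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / sqrt(var_x * var_y)


# Invented per-state values, NOT taken from the dataset or real results:
# mean tweet sentiment toward one candidate, and that candidate's vote margin (points).
state_sentiment = [0.10, -0.05, 0.20, 0.02]
state_margin = [3.0, -2.0, 8.0, 0.5]
print(f"r = {pearson(state_sentiment, state_margin):.3f}")
```

If the relationship is expected to be monotonic but not linear, a rank correlation such as Spearman's would be a reasonable alternative.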
I have also included ideas from the Australian Election 2019 Tweets dataset that are still valid and interesting:
- Take into account retweets and favourites to weight overall sentiment analysis.
- Which parts of the world, apart from the US, are interested in (i.e. tweet about) the US election?
- How do the users who tweet about this sort of thing tend to describe themselves?
- Is there a correlation between when the user joined Twitter and their political views (this assumes the sentiment analysis is already working well)?
- Predict gender from username/screen name and segment tweet count and sentiment by gender.
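The first two ideas in this list can be combined: weight each tweet's sentiment by its engagement and aggregate per country, leaving out the US. This is a sketch under several assumptions. The `sentiment` field is hypothetical (the dataset has no sentiment column, so it would come from a model of your choice); the weight `1 + likes + retweet_count` is one arbitrary choice among many; and the exact string the country column uses for the US is not documented here, so it is passed in as a parameter.

```python
from collections import defaultdict
from typing import Dict, Iterable, Mapping, Optional


def weighted_sentiment_by_country(
    tweets: Iterable[Mapping],
    exclude: Optional[str] = None,
) -> Dict[str, float]:
    """Engagement-weighted mean sentiment per country.

    Each tweet needs "country", "likes", "retweet_count" and a hypothetical
    "sentiment" score (e.g. in [-1, 1]) produced by a sentiment model.
    Weight = 1 + likes + retweet_count, so tweets with no engagement still count.
    """
    totals: Dict[str, float] = defaultdict(float)
    weights: Dict[str, float] = defaultdict(float)
    for t in tweets:
        country = t.get("country")
        if not country or country == exclude:
            continue  # skip tweets with no parsed country, or the excluded country
        w = 1 + (t.get("likes") or 0) + (t.get("retweet_count") or 0)
        totals[country] += w * t["sentiment"]
        weights[country] += w
    return {c: totals[c] / weights[c] for c in totals}


# Hypothetical tweets with made-up sentiment scores.
tweets = [
    {"country": "United Kingdom", "likes": 9, "retweet_count": 0, "sentiment": 0.5},
    {"country": "United Kingdom", "likes": 0, "retweet_count": 0, "sentiment": -0.5},
    {"country": "United States of America", "likes": 3, "retweet_count": 1, "sentiment": 0.2},
    {"country": None, "likes": 1, "retweet_count": 1, "sentiment": 1.0},
]
result = weighted_sentiment_by_country(tweets, exclude="United States of America")
print(result)
```

Here the well-liked positive UK tweet dominates the unliked negative one, which is exactly the effect the weighting is meant to capture; whether likes should count as much as retweets is a modelling decision.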
- Version 3 - 355,000 tweets collected using the Twitter API statuses_lookup and snscrape for keywords between 15.10.2020 and 22.10.2020.
- Version 5 - New tweets collected for the date of 23.10.2020, with a new total number of tweets at around 387,000 tweets.
- Version 6 - New tweets collected for the date of 24.10.2020, with a new total number of tweets at around 418,000 tweets. Additionally, the "coordinates" column was removed, and "lat" and "long" columns were added for geolocation data (where possible).
- Version 7 - New tweets collected for the date of 25.10.2020, with a new total number of tweets at around 456,000 tweets. Added column "collected_at" to indicate when the data was mined from Twitter. *Note: this data is only accurate from 21.10.2020 onwards; data in this column before that date is an estimate.
- Version 8 - New tweets collected for the date of 26.10.2020, with a new total number of tweets at around 492,000 tweets.
- Version 9 - New tweets collected for the date of 27.10.2020, with a new total number of tweets at around 533,000 tweets.
- Version 10 - New tweets collected for the date of 28.10.2020, with a new total number of tweets at around 568,000 tweets. Added new geolocation features: city, country, continent, state, state_code.
- Version 11 - New tweets collected from 30.10.2020 to 31.10.2020, with a new total number of tweets at around 641,000 tweets.
- Version 12 - New tweets collected for the date of 01.11.2020, with a new total number of tweets at around 689,000 tweets.
- Version 13 - New tweets collected for the date of 02.11.2020, with a new total number of tweets at around 741,000 tweets.
- Version 14 - New tweets collected for the date of 03.11.2020, with a new total number of tweets at around 809,000 tweets.
- Version 15 - New tweets collected for the date of 04.11.2020, with a new total number of tweets at around 1,093,000 tweets.
- Version 16 - New tweets collected for the date of 05.11.2020, with a new total number of tweets at around 1,210,000 tweets.
- Version 17 - New tweets collected for the date of 06.11.2020, with a new total number of tweets at around 1,346,000 tweets.
- Version 18 - New tweets collected for the date of 07.11.2020, with a new total number of tweets at around 1,598,000 tweets.
- Version 19 - New tweets collected for the date of 08.11.2020, with a new total number of tweets at around 1,727,000 tweets.
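The approximate totals in this version history also show how collection volume changed around election day. The sketch below takes the difference between consecutive listed totals. Because Version 4 is not listed and the totals are rounded ("around"), each delta is only approximate, and it counts tweets added by that version rather than tweets posted on that date. `increments` is a hypothetical helper.

```python
from typing import List, Tuple

# (version, dates covered, approximate cumulative total) from the version history.
# Version 4 is not listed, so Version 5 is compared against Version 3.
TOTALS: List[Tuple[int, str, int]] = [
    (3, "15.10.2020-22.10.2020", 355_000),
    (5, "23.10.2020", 387_000),
    (6, "24.10.2020", 418_000),
    (7, "25.10.2020", 456_000),
    (8, "26.10.2020", 492_000),
    (9, "27.10.2020", 533_000),
    (10, "28.10.2020", 568_000),
    (11, "30.10.2020-31.10.2020", 641_000),
    (12, "01.11.2020", 689_000),
    (13, "02.11.2020", 741_000),
    (14, "03.11.2020", 809_000),
    (15, "04.11.2020", 1_093_000),
    (16, "05.11.2020", 1_210_000),
    (17, "06.11.2020", 1_346_000),
    (18, "07.11.2020", 1_598_000),
    (19, "08.11.2020", 1_727_000),
]


def increments(totals: List[Tuple[int, str, int]]) -> List[Tuple[int, str, int]]:
    """Tweets added by each version relative to the previous listed version."""
    return [
        (version, dates, total - prev_total)
        for (_, _, prev_total), (version, dates, total) in zip(totals, totals[1:])
    ]


for version, dates, added in increments(TOTALS):
    print(f"Version {version:>2} ({dates}): +{added:,}")
```

With these figures, Version 15 (04.11.2020, the day after the election) adds the most, roughly 284,000 tweets.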