Context
During the 2019 Australian election, I noticed that almost everything I was seeing on Twitter was unusually left-wing, so I decided to scrape some data and investigate. Unfortunately, my sentiment analysis has so far been too inaccurate to support any useful conclusions. I am sharing the data so that others can help with the sentiment analysis or try any other interesting analysis.
Content
Over 180,000 tweets, collected using the Twitter API keyword search between 10 May 2019 and 20 May 2019.
Columns are as follows:
- created_at: Date and time of tweet creation
- id: Unique ID of the tweet
- full_text: Full tweet text
- retweet_count: Number of retweets
- favorite_count: Number of likes
- user_id: User ID of tweet creator
- user_name: Username of tweet creator
- user_screen_name: Screen name of tweet creator
- user_description: Description on tweet creator's profile
- user_location: Location given on tweet creator's profile
- user_created_at: Date the tweet creator joined Twitter
The latitude and longitude of each user_location are also available in location_geocode.csv. This information was retrieved using the Google Geocode API.
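As a rough illustration of how the two files might be combined, here is a minimal sketch using only the Python standard library. The column names assumed for location_geocode.csv (`user_location`, `lat`, `lng`) are guesses, not confirmed by the dataset, and the rows below are made-up stand-ins for the real files, so check the actual headers before relying on it.

```python
import csv
import io


def read_rows(handle):
    """Read an open CSV file (or file-like object) into a list of dicts."""
    return list(csv.DictReader(handle))


def attach_geocodes(tweets, geocodes):
    """Left-join tweets to geocodes on user_location.

    Assumes the geocode rows have 'user_location', 'lat' and 'lng' columns
    (hypothetical names). Tweets without a match get lat/lng of None.
    """
    lookup = {
        g["user_location"]: (float(g["lat"]), float(g["lng"]))
        for g in geocodes
    }
    joined = []
    for t in tweets:
        lat, lng = lookup.get(t["user_location"], (None, None))
        joined.append({**t, "lat": lat, "lng": lng})
    return joined


# Made-up sample rows standing in for the real CSV files.
TWEETS_CSV = """id,full_text,retweet_count,favorite_count,user_location
1,Example tweet A,3,10,"Sydney, Australia"
2,Example tweet B,0,1,Nowhere
"""
GEOCODE_CSV = """user_location,lat,lng
"Sydney, Australia",-33.87,151.21
"""

tweets = read_rows(io.StringIO(TWEETS_CSV))
geocodes = read_rows(io.StringIO(GEOCODE_CSV))
joined = attach_geocodes(tweets, geocodes)
```

With the real data, `io.StringIO(...)` would be replaced by `open(path, newline="", encoding="utf-8")` for each file.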
Acknowledgements
Thanks to Twitter for providing the free API.
Inspiration
There are a lot of interesting things that could be investigated with this data. Primarily, I was interested in doing sentiment analysis before and after the election results were known, to determine whether Twitter users are indeed a left-leaning bunch. Did the tweets become more negative once the results were known?
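One possible way to frame the before/after comparison is sketched below. It assumes each tweet already has a numeric `sentiment` score from some classifier (not part of the dataset), that `created_at` has been converted to an ISO 8601 string, and that a single cutoff time separates "before" from "after". The cutoff used here is an assumption: polling day was 18 May 2019, but the moment the result counts as "known" is a judgment call.

```python
from datetime import datetime, timezone
from statistics import mean

# Assumed cutoff (hypothetical): late evening on election night in
# eastern Australia, expressed in UTC. Adjust to taste.
CUTOFF = datetime(2019, 5, 18, 12, 0, tzinfo=timezone.utc)


def mean_sentiment_by_period(rows, cutoff=CUTOFF):
    """Return the mean sentiment before and after the cutoff.

    Each row needs an ISO 8601 'created_at' with a UTC offset and a
    numeric 'sentiment'. A period with no tweets maps to None.
    """
    before, after = [], []
    for row in rows:
        created = datetime.fromisoformat(row["created_at"])
        (before if created < cutoff else after).append(row["sentiment"])
    return {
        "before": mean(before) if before else None,
        "after": mean(after) if after else None,
    }


# Made-up rows for illustration only.
sample = [
    {"created_at": "2019-05-17T03:00:00+00:00", "sentiment": 0.5},
    {"created_at": "2019-05-18T02:00:00+00:00", "sentiment": 0.25},
    {"created_at": "2019-05-19T01:00:00+00:00", "sentiment": -0.5},
]
result = mean_sentiment_by_period(sample)
```

A difference in means is only a starting point; tweet volume also changes sharply around the result, so the two periods may not be comparable without further checks.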
Other ideas for investigation include:
- Take retweets and favourites into account to weight the overall sentiment analysis.
- Which parts of the world, apart from Australia, are interested in (i.e. tweet about) the Australian elections?
- How do the users who tweet about this sort of thing tend to describe themselves?
- Is there a correlation between when a user joined Twitter and their political views? (This assumes the sentiment analysis is already working well.)
- Predict gender from username/screen name, then segment tweet count and sentiment by gender.
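The idea of weighting sentiment by retweets and favourites could be sketched as follows. The weighting formula (1 + retweet_count + favorite_count) is just one plausible choice, not something the dataset prescribes, and the `sentiment` field is again assumed to come from a separate classifier.

```python
def engagement_weighted_sentiment(rows):
    """Weighted mean of sentiment, with weight 1 + retweets + favourites.

    The '1 +' keeps tweets with no engagement from being ignored.
    Counts are cast with int() because CSV readers return strings.
    Returns None for an empty input.
    """
    total_weight = 0
    weighted_sum = 0.0
    for row in rows:
        weight = 1 + int(row["retweet_count"]) + int(row["favorite_count"])
        total_weight += weight
        weighted_sum += weight * row["sentiment"]
    if total_weight == 0:
        return None
    return weighted_sum / total_weight


# Made-up rows for illustration only.
sample_rows = [
    {"retweet_count": "1", "favorite_count": "2", "sentiment": 1.0},
    {"retweet_count": "0", "favorite_count": "0", "sentiment": -1.0},
]
weighted = engagement_weighted_sentiment(sample_rows)
```

Because a few viral tweets can dominate a linear weight, a damped weight such as the logarithm of the engagement count may be worth comparing.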