Name: Tokyo Olympics 2020 Tweets
Creator: Kaggle
License: https://creativecommons.org/publicdomain/zero/1.0/

Tweets about Tokyo 2020 Olympics Venues, Events, Athletes and Results

Context

I collect recent tweets about the Tokyo 2020 Olympics.

Data collection

The data is collected with the tweepy Python package, which accesses the Twitter API. I search for a single term relevant to the topic: #Tokyo2020.
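The search step could look roughly like the sketch below. It is not the author's script: it assumes tweepy 4.x and the Twitter API v2 recent-search endpoint, and the names `build_query` and `fetch_recent_tweets`, the optional retweet filter, and the selected tweet fields are all hypothetical choices made for illustration.

```python
from __future__ import annotations


def build_query(hashtag: str = "#Tokyo2020", exclude_retweets: bool = False) -> str:
    """Build a recent-search query for one hashtag.

    The retweet filter is an optional assumption; the dataset description
    only mentions the #Tokyo2020 search term.
    """
    query = hashtag
    if exclude_retweets:
        query += " -is:retweet"
    return query


def fetch_recent_tweets(bearer_token: str, query: str, max_results: int = 100) -> list[dict]:
    """Fetch one small batch of recent tweets matching `query`."""
    # Imported here so build_query() works even without tweepy installed.
    import tweepy

    client = tweepy.Client(bearer_token=bearer_token)
    response = client.search_recent_tweets(
        query=query,
        max_results=max_results,  # the v2 recent-search endpoint accepts 10 to 100
        tweet_fields=["created_at", "lang"],
    )
    tweets = response.data or []  # data is None when nothing matches
    return [
        {"id": t.id, "text": t.text, "created_at": t.created_at}
        for t in tweets
    ]
```

Calling `fetch_recent_tweets(token, build_query())` needs a valid bearer token and network access; `build_query()` on its own can be checked offline.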

Data collection frequency

The data is collected continuously by a script that fetches a small number of recent tweets (using the Twitter API and tweepy), waits for a predefined time (currently 2 minutes), and then restarts the process. The batch obtained at each sampling step is merged with the previously collected dataset, and the result is saved to disk in CSV format. The script runs on Google Cloud on a small Jupyter instance. Once or several times per day, the accumulated dataset is uploaded to Kaggle as a new version of the tweets dataset.
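A minimal sketch of this fetch, merge, save, and wait cycle is shown below. It uses only the standard library and assumes that merging means de-duplicating by tweet id, which the description does not state. The column set `FIELDS` and the function names are hypothetical, and `fetch` stands in for whatever function returns one batch of tweets as dictionaries.

```python
import csv
import time
from pathlib import Path
from typing import Callable, Dict, Iterable

FIELDS = ["id", "text", "created_at"]  # hypothetical column set


def load_rows(path: Path) -> Dict[str, dict]:
    """Read the stored CSV into a dict keyed by tweet id (empty if no file yet)."""
    if not path.exists():
        return {}
    with path.open(newline="", encoding="utf-8") as f:
        return {row["id"]: row for row in csv.DictReader(f)}


def merge_batch(path: Path, batch: Iterable[dict]) -> int:
    """Merge one batch into the CSV at `path` and return the total row count.

    Assumption: a tweet seen again replaces the stored row with the same id.
    """
    rows = load_rows(path)
    for tweet in batch:
        rows[str(tweet["id"])] = {k: str(tweet.get(k, "")) for k in FIELDS}
    with path.open("w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(rows.values())
    return len(rows)


def collect_forever(fetch: Callable[[], Iterable[dict]], path: Path,
                    wait_seconds: int = 120) -> None:
    """Fetch a batch, merge it, sleep, and repeat until interrupted."""
    while True:
        merge_batch(path, fetch())
        time.sleep(wait_seconds)  # 2 minutes, matching the current setting
```

Keying rows by id keeps the file from growing with repeats when consecutive batches overlap, which is likely when the same recent tweets are returned by searches two minutes apart.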

Inspiration

You can analyze the Tokyo Olympics 2020 tweets in many ways. Here are a few suggestions:

Study the subjects of recent tweets about the Tokyo Olympics.
Perform NLP tasks on this data source, such as topic modelling and sentiment analysis.
Identify tweets about particular sports, countries, or athletes.
Follow the trends in the news about the Olympics.
Perform sentiment analysis on the whole tweet corpus, and also broken down by sport, country, and so on.
Study the distribution of the hashtags associated with the tweets.
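As a starting point for the hashtag suggestion, the sketch below counts hashtags found in raw tweet text. It assumes the text is available as plain strings and extracts tags with a simple regular expression; the names `HASHTAG_RE` and `hashtag_counts` are hypothetical, and a dedicated hashtags column in the dataset, if there is one, would be a more reliable source.

```python
import re
from collections import Counter
from typing import Iterable

# A '#' followed by word characters; a simplification of Twitter's own rules.
HASHTAG_RE = re.compile(r"#(\w+)")


def hashtag_counts(texts: Iterable[str]) -> Counter:
    """Count hashtag occurrences across tweets, ignoring case."""
    counts: Counter = Counter()
    for text in texts:
        counts.update(tag.lower() for tag in HASHTAG_RE.findall(text))
    return counts


sample = [
    "Gold for #Japan at #Tokyo2020!",
    "#tokyo2020 opening ceremony",
    "no tags here",
]
print(hashtag_counts(sample).most_common(5))
```

Lower-casing merges variants such as `#Tokyo2020` and `#tokyo2020`, which would otherwise split the count for the same tag.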

Related Datasets

Reddit Tokyo2020 (@kaggle)
Sports Pitches (@ukgov)
Belfast Sport Pitches Playing Fields (@ukgov)
SFC2014 - REACT EU Overview Allocation Vs Decided (@esifunds)
Belfast Council Car Parks (@ukgov)
Performance Indicators : Sport (@ukgov)