A Twitter dataset about the FIFA World Cup 2022
Dataset Description
Football is one of the most loved sports worldwide. The FIFA World Cup, a global football tournament held every four years, takes place in Qatar in 2022. This dataset contains 30,000 tweets from the first day of the FIFA World Cup 2022.
Data Collection
The dataset was created using Snscrape to collect the tweets and the cardiffnlp/twitter-roberta-base-sentiment-latest model from the Hugging Face Hub to label their sentiment.
Data Preprocessing
The dataset includes tweets in English containing the hashtag #WorldCup2022. For data preprocessing, we used a tokenizer for the cardiffnlp/twitter-roberta-base-sentiment-latest model and the following function:
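    def preprocess(text):
        # Replace user mentions and links with generic placeholders.
        new_text = []
        for t in text.split(" "):
            # A lone "@" is kept as-is; any longer mention becomes "@user".
            t = '@user' if t.startswith('@') and len(t) > 1 else t
            # Any token starting with "http" (a URL) becomes "http".
            t = 'http' if t.startswith('http') else t
            new_text.append(t)
        return " ".join(new_text)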
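To make the effect of this function concrete, the sketch below applies it to a made-up tweet and then shows one plausible way to pass preprocessed text to the sentiment model. The sample tweet and the `label_sentiment` helper are assumptions for illustration, not the exact script used to build the dataset; the helper relies on the `pipeline` API of the `transformers` library and downloads the model on first use.

```python
def preprocess(text):
    """Mask user mentions and links, as in the function above."""
    new_text = []
    for t in text.split(" "):
        t = '@user' if t.startswith('@') and len(t) > 1 else t
        t = 'http' if t.startswith('http') else t
        new_text.append(t)
    return " ".join(new_text)


def label_sentiment(tweets):
    """Hypothetical helper: classify tweets with the model named above."""
    # Imported lazily so preprocess() can be tried without transformers installed.
    from transformers import pipeline

    classifier = pipeline(
        "sentiment-analysis",
        model="cardiffnlp/twitter-roberta-base-sentiment-latest",
    )
    results = classifier([preprocess(t) for t in tweets])
    # Each result is a dict with a "label" and a "score"; keep only the label.
    return [r["label"] for r in results]


# Made-up example tweet (not taken from the dataset).
sample = "@FIFAWorldCup what a goal! https://t.co/abc #WorldCup2022"
masked = preprocess(sample)
print(masked)  # @user what a goal! http #WorldCup2022
```

Hashtags are left untouched, so #WorldCup2022 remains part of the text the model sees.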
Data Storage
The collected tweets have been consolidated into a single dataset and shared as a comma-separated values (CSV) file, "fifa_world_cup_2022_tweets.csv".
Content
The dataset contains the following columns:
- Date Created
- Number of Likes
- Source of Tweet
- Tweet
- Sentiment
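As a quick way to explore these columns, the following sketch counts how many tweets fall under each sentiment label using only the Python standard library. It assumes the CSV header names match the column names listed above exactly and that the labels are strings such as "positive", "neutral", and "negative"; the `sentiment_counts` helper and the sample rows are made up for illustration.

```python
import csv
import io
from collections import Counter


def sentiment_counts(csv_file):
    """Count tweets per value of the "Sentiment" column (assumed header name)."""
    reader = csv.DictReader(csv_file)
    return Counter(row["Sentiment"] for row in reader)


# Tiny made-up sample with the same columns as the dataset.
sample_csv = io.StringIO(
    "Date Created,Number of Likes,Source of Tweet,Tweet,Sentiment\n"
    "2022-11-20 18:00:00+00:00,4,Twitter for iPhone,Great opening match! #WorldCup2022,positive\n"
    "2022-11-20 18:01:00+00:00,0,Twitter Web App,Kickoff is at 19:00 #WorldCup2022,neutral\n"
    "2022-11-20 18:02:00+00:00,2,Twitter for Android,Loving the atmosphere #WorldCup2022,positive\n"
)

counts = sentiment_counts(sample_csv)
print(counts.most_common())

# With the real file, the same function could be used like this:
# with open("fifa_world_cup_2022_tweets.csv", newline="", encoding="utf-8") as f:
#     print(sentiment_counts(f).most_common())
```

Because rows are read by column name, any extra column in the real file (for example an index column) would not affect the count.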
For more information about this dataset, see this blog post.
License
License: CC0 1.0 Universal (CC0 1.0) - Public Domain Dedication
No Copyright - You can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission.
Happy learning 😀
Related Datasets
- Fifa World Cup 2022 (@kaggle)
- FIFA World Cup Audience And Economic Impact (@fivethirtyeight)
- AI Index Report (2022) (@owid)