A Twitter dataset about the FIFA World Cup 2022

Football is one of the most loved sports worldwide. The FIFA World Cup, a global football tournament held every four years, is hosted by Qatar in 2022. This dataset contains 30,000 tweets posted on the first day of the FIFA World Cup 2022.

Data Collection

The dataset was created using Snscrape to collect the tweets and the cardiffnlp/twitter-roberta-base-sentiment-latest model from the Hugging Face Hub to label their sentiment.
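The Sentiment column is derived from the classifier's output for each tweet. The sketch below shows one way a single prediction can be turned into a label. It assumes that the model returns three raw logits in the order negative, neutral, positive. That order is an assumption here, so check it against the model's `config.id2label` before relying on it. The model call itself is omitted, and `logits_to_sentiment` is a hypothetical helper name.

```python
import math
from typing import List, Sequence, Tuple

# Assumed label order; verify against the model's config.id2label.
LABELS = ("negative", "neutral", "positive")


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax: subtract the max logit before exponentiating."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def logits_to_sentiment(logits: Sequence[float]) -> Tuple[str, float]:
    """Hypothetical helper: map one row of raw logits to (label, probability)."""
    if len(logits) != len(LABELS):
        raise ValueError(f"expected {len(LABELS)} logits, got {len(logits)}")
    probs = softmax(logits)
    # Index of the most probable class.
    best = max(range(len(probs)), key=probs.__getitem__)
    return LABELS[best], probs[best]


if __name__ == "__main__":
    print(logits_to_sentiment([-1.2, 0.3, 2.1]))
```

Taking the argmax of the softmax gives the same label as taking the argmax of the raw logits. The softmax step is useful only when you also want a confidence score for each tweet.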

Data Preprocessing

The dataset includes English-language tweets containing the hashtag #WorldCup2022. For preprocessing, we used the tokenizer of the cardiffnlp/twitter-roberta-base-sentiment-latest model together with the following function, which masks user mentions and links:

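def preprocess(text):
    # Mask user mentions and links so the model sees generic placeholders
    new_text = []
    for t in text.split(" "):
        # "@someone" becomes "@user"; a lone "@" is left unchanged
        t = '@user' if t.startswith('@') and len(t) > 1 else t
        # Any token starting with "http" (a URL) becomes "http"
        t = 'http' if t.startswith('http') else t
        new_text.append(t)
    return " ".join(new_text)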
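The selection criteria (English tweets containing #WorldCup2022) and the preprocessing function can be chained into a small pipeline. The sketch below makes several assumptions. Each scraped tweet is a dict with hypothetical `lang` and `content` keys, and the field names your scraper emits may differ. The hashtag match is case-insensitive. `select_and_clean` is a hypothetical name. The block repeats `preprocess` so that it runs on its own.

```python
from typing import Dict, Iterable, List

# Compared case-insensitively (an assumption about how tweets were selected).
HASHTAG = "#worldcup2022"


def preprocess(text: str) -> str:
    """Same masking rules as above: mentions -> '@user', links -> 'http'."""
    new_text = []
    for t in text.split(" "):
        t = "@user" if t.startswith("@") and len(t) > 1 else t
        t = "http" if t.startswith("http") else t
        new_text.append(t)
    return " ".join(new_text)


def select_and_clean(tweets: Iterable[Dict[str, str]]) -> List[str]:
    """Hypothetical helper: keep English tweets with the hashtag, then mask them."""
    cleaned = []
    for tweet in tweets:
        # 'lang' and 'content' are assumed field names for scraped records.
        if tweet.get("lang") != "en":
            continue
        text = tweet.get("content", "")
        if HASHTAG not in text.lower():
            continue
        cleaned.append(preprocess(text))
    return cleaned


sample = [
    {"lang": "en", "content": "@FIFAcom Kickoff! https://t.co/abc #WorldCup2022"},
    {"lang": "es", "content": "Vamos! #WorldCup2022"},
    {"lang": "en", "content": "No hashtag here"},
]
print(select_and_clean(sample))
# ['@user Kickoff! http #WorldCup2022']
```

The hashtag itself is kept in the cleaned text. The masking only targets mentions and URLs.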

Data Storage

The collected tweets have been consolidated into a single dataset and shared as a comma-separated values (CSV) file, "fifa_world_cup_2022_tweets.csv".
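The sketch below loads the file with Python's standard `csv` module and counts the tweets for each value of the Sentiment column. It assumes that the header row uses the column names listed under Content exactly as written and that the file is UTF-8 encoded. Adjust both if the actual file differs. `sentiment_counts` is a hypothetical helper, and `DEMO_CSV` holds made-up rows for illustration only.

```python
import csv
import io
from collections import Counter
from typing import TextIO

# Made-up rows for illustration; header names are assumed to match the Content list.
DEMO_CSV = (
    "Date Created,Number of Likes,Source of Tweet,Tweet,Sentiment\n"
    '2022-11-20 10:00:00+00:00,3,Twitter for iPhone,"Great start, #WorldCup2022",positive\n'
    "2022-11-20 10:05:00+00:00,0,Twitter Web App,Kickoff soon #WorldCup2022,neutral\n"
    "2022-11-20 10:07:00+00:00,1,Twitter for Android,What a miss #WorldCup2022,negative\n"
    "2022-11-20 10:09:00+00:00,5,Twitter for iPhone,Brilliant goal #WorldCup2022,positive\n"
)


def sentiment_counts(handle: TextIO) -> Counter:
    """Count tweets per sentiment label; assumes a 'Sentiment' header column."""
    reader = csv.DictReader(handle)
    return Counter(row["Sentiment"] for row in reader)


if __name__ == "__main__":
    # With the real file (encoding is an assumption):
    # with open("fifa_world_cup_2022_tweets.csv", newline="", encoding="utf-8") as f:
    #     print(sentiment_counts(f))
    print(sentiment_counts(io.StringIO(DEMO_CSV)))
```

Opening the file with `newline=""` follows the `csv` module's recommendation. It lets quoted fields that contain line breaks, which are common in tweets, parse correctly.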

Content

The dataset contains the following columns:

Date Created
Number of Likes
Source of Tweet
Tweet
Sentiment
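These columns map naturally onto a typed record. The sketch below uses a dataclass and rests on a few assumptions. The CSV header names match the list above. Date Created is an ISO 8601 timestamp such as `2022-11-20 23:59:21+00:00`, which is parsed with `datetime.fromisoformat`. Number of Likes is an integer. `TweetRecord` and `parse_row` are hypothetical names.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict


@dataclass(frozen=True)
class TweetRecord:
    """One row of the CSV, with the column types assumed in the lead-in."""
    date_created: datetime
    number_of_likes: int
    source_of_tweet: str
    tweet: str
    sentiment: str


def parse_row(row: Dict[str, str]) -> TweetRecord:
    """Convert a csv.DictReader row (all strings) into a typed TweetRecord."""
    return TweetRecord(
        # Assumes ISO 8601 timestamps; other formats would need strptime.
        date_created=datetime.fromisoformat(row["Date Created"]),
        number_of_likes=int(row["Number of Likes"]),
        source_of_tweet=row["Source of Tweet"],
        tweet=row["Tweet"],
        sentiment=row["Sentiment"],
    )


example = parse_row({
    "Date Created": "2022-11-20 23:59:21+00:00",
    "Number of Likes": "4",
    "Source of Tweet": "Twitter for iPhone",
    "Tweet": "What a game! #WorldCup2022",
    "Sentiment": "positive",
})
print(example.date_created.year, example.number_of_likes)  # 2022 4
```

`parse_row` can be applied to each row yielded by `csv.DictReader`. This converts the values in one place instead of scattering `int()` and date parsing throughout the analysis code.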

For more information about this dataset, see this blog post.

License

License: CC0 1.0 Universal (CC0 1.0) - Public Domain Dedication
No Copyright - You can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission.

Happy learning 😀
