Baselight

Twitter Vaccination Dataset

Tweets on vaccination, 2019-03-10 to 2019-06-21

@kaggle.keplaxo_twitter_vaccination_dataset

About this Dataset

Context

Social media sentiment and data offer a lot more than mere likes and shares, especially where health care is concerned. This dataset is part of the data collected for the Vaccine hesitancy challenge on JOGL. We believe it is important to capture the views and trends of the public, and social media sites like Twitter provide a good window into this area.

Content

We collected all tweets containing the search string "vaccination". Along with the tweet text, we downloaded the date and time when the tweet was published and the location of the user (if provided). We also downloaded the user id, follower ids, and friend ids. The followers of a user A are those users who will receive messages from user A. The friends of a user A are those users from whom user A receives messages. Thus, information flows from a user to their followers. We collected tweets using the open source information tool TWINT (https://github.com/twintproject) and a Python algorithm.
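
The follower/friend relationship described above can be sketched as a small directed graph. This is a minimal illustration only: the input shape (a dict of user id to follower ids), the function name `build_flow_graph`, and the toy data are assumptions, not the layout of the collected data.

```python
from collections import defaultdict

def build_flow_graph(followers_by_user):
    """Build both directions of the follower relation.

    followers_by_user: dict of user id -> iterable of follower ids
    (hypothetical input shape; the collected data may be laid out differently).
    """
    sends_to = defaultdict(set)       # user -> their followers (who receive their tweets)
    receives_from = defaultdict(set)  # user -> their friends (whose tweets they receive)
    for user, followers in followers_by_user.items():
        for follower in followers:
            sends_to[user].add(follower)
            receives_from[follower].add(user)
    return sends_to, receives_from

# Toy data: B and C follow A; C also follows B.
followers_by_user = {"A": ["B", "C"], "B": ["C"]}
sends_to, receives_from = build_flow_graph(followers_by_user)

print(sorted(sends_to["A"]))       # followers of A
print(sorted(receives_from["C"]))  # friends of C
```

Keeping both mappings makes it cheap to ask either question (who hears from A, or whom C hears from) without rescanning the edge list.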

In contrast to the open Twitter Search API, which only allows one to query tweets posted within the last seven days, Twint makes it possible to collect a much larger sample of Twitter posts, spanning several years. We queried Twint for different key terms related to the topic of vaccination, covering the period from 2006 to 30 November 2019, and stored the results in an aggregated CSV file.
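
One way the aggregation step could look is sketched below. It assumes one CSV export per key term and assumes that a tweet matching several terms should be kept only once; neither detail is stated in the description above, and the function name `aggregate_rows` and the sample rows are hypothetical.

```python
import csv
import io

def aggregate_rows(csv_texts):
    """Merge rows from several CSV exports, keeping the first copy of each tweet id."""
    seen = set()
    merged = []
    for text in csv_texts:
        for row in csv.DictReader(io.StringIO(text)):
            # Ids are compared as strings, so large ids are never converted to floats.
            if row["id"] not in seen:
                seen.add(row["id"])
                merged.append(row)
    return merged

# Two made-up per-keyword exports that share one tweet (id 2).
export_a = "id,tweet\n1,vaccination works\n2,flu vaccine today\n"
export_b = "id,tweet\n2,flu vaccine today\n3,vaccine schedule\n"

merged = aggregate_rows([export_a, export_b])
print([row["id"] for row in merged])
```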

Acknowledgements

We wouldn't be here without the help of others.

Inspiration

To my knowledge, no program is currently carrying out qualitative analysis of Twitter data for sentiment associated with vaccination. However, a number of studies have analysed Twitter for social media trends on vaccination.

The dataset can be used for analyses including:

  • Topic modelling of the dataset.
  • Graph analysis.
  • Machine/deep learning models.
  • Descriptive analysis of Twitter vaccination data alongside epidemiological data.
  • Model simulations to assess the effects of changing vaccine sentiment on outbreaks and disease spread.
  • Extracting high-quality content from the tweets of users that our system has identified as key influencers, and using it to train an LDA model that will then be used to classify other users.
  • Extracting topics per location using topic modelling.
  • Providing a filtering process for identifying polarising tweets.
  • Developing an iterative methodology that builds on the intelligence extracted from the already available high-quality content (top tweets – top URLs) to identify new trends and dynamically update the keywords used to track tweets of specific content.
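
As a starting point for the descriptive analysis mentioned in the list above, daily tweet volume can be computed from the "date" column. The sketch below is illustrative only: the timestamp layout, the function name `tweets_per_day`, and the sample rows are assumptions and are not taken from the dataset.

```python
from collections import Counter
from datetime import datetime

def tweets_per_day(rows):
    """Count tweets per calendar day from rows carrying a "date" string."""
    counts = Counter()
    for row in rows:
        # Assumed timestamp layout; adjust the format string to the real export.
        day = datetime.strptime(row["date"], "%Y-%m-%d %H:%M:%S").date()
        counts[day.isoformat()] += 1
    # Return the days in chronological order.
    return dict(sorted(counts.items()))

# Made-up rows using two of the column names from the table schema.
rows = [
    {"date": "2019-03-10 08:15:00", "likes_count": 3},
    {"date": "2019-03-11 09:00:00", "likes_count": 0},
    {"date": "2019-03-10 21:40:00", "likes_count": 12},
]

daily = tweets_per_day(rows)
print(daily)
```

A series like this can then be lined up against epidemiological data reported at the same daily or weekly resolution.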

Tables

Master

@kaggle.keplaxo_twitter_vaccination_dataset.master
  • 423.01 MB
  • 2195108 rows
  • 31 columns

CREATE TABLE master (
  "id" DOUBLE,
  "conversation_id" DOUBLE,
  "created_at" DOUBLE,
  "date" TIMESTAMP,
  "time" VARCHAR,
  "timezone" VARCHAR,
  "user_id" DOUBLE,
  "username" VARCHAR,
  "name" VARCHAR,
  "place" VARCHAR,
  "tweet" VARCHAR,
  "mentions" VARCHAR,
  "urls" VARCHAR,
  "photos" VARCHAR,
  "replies_count" BIGINT,
  "retweets_count" BIGINT,
  "likes_count" BIGINT,
  "hashtags" VARCHAR,
  "cashtags" VARCHAR,
  "link" VARCHAR,
  "retweet" BOOLEAN,
  "quote_url" VARCHAR,
  "video" BIGINT,
  "near" VARCHAR,
  "geo" VARCHAR,
  "source" VARCHAR,
  "user_rt_id" VARCHAR,
  "user_rt" VARCHAR,
  "retweet_id" VARCHAR,
  "reply_to" VARCHAR,
  "retweet_date" VARCHAR
);
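
Note that this table declares "id", "conversation_id", "created_at" and "user_id" as DOUBLE, whereas the vaccination2 table declares them as BIGINT. A 64-bit float represents every integer exactly only up to 2^53, so larger ids may not survive a round trip through DOUBLE. The sketch below illustrates the effect; the example ids are hypothetical and not taken from the dataset, and whether a given id in this table is affected should be checked against the source CSV.

```python
MAX_EXACT = 2 ** 53  # above this, not every integer is representable as a 64-bit float

def survives_double(value):
    """Return True if an integer id round-trips through a 64-bit float unchanged."""
    return int(float(value)) == value

small_id = 123456789            # well below 2**53
large_id = 1104000000000000001  # hypothetical 19-digit id, above 2**53

print(survives_double(small_id))
print(survives_double(large_id))
print(large_id > MAX_EXACT)
```

When exact ids matter, for example when joining on "id" or rebuilding tweet links, reading them as integers or strings from the original export is the safer choice.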

Vaccination2

@kaggle.keplaxo_twitter_vaccination_dataset.vaccination2
  • 27.53 MB
  • 89973 rows
  • 31 columns

CREATE TABLE vaccination2 (
  "id" BIGINT,
  "conversation_id" BIGINT,
  "created_at" BIGINT,
  "date" TIMESTAMP,
  "time" VARCHAR,
  "timezone" VARCHAR,
  "user_id" BIGINT,
  "username" VARCHAR,
  "name" VARCHAR,
  "place" VARCHAR,
  "tweet" VARCHAR,
  "mentions" VARCHAR,
  "urls" VARCHAR,
  "photos" VARCHAR,
  "replies_count" BIGINT,
  "retweets_count" BIGINT,
  "likes_count" BIGINT,
  "hashtags" VARCHAR,
  "cashtags" VARCHAR,
  "link" VARCHAR,
  "retweet" BOOLEAN,
  "quote_url" VARCHAR,
  "video" BIGINT,
  "near" VARCHAR,
  "geo" VARCHAR,
  "source" VARCHAR,
  "user_rt_id" VARCHAR,
  "user_rt" VARCHAR,
  "retweet_id" VARCHAR,
  "reply_to" VARCHAR,
  "retweet_date" VARCHAR
);
