Baselight

Coachella 2019 Tweets

@kaggle.pdp2600_coachella_2019_tweets

About this Dataset

Context

This was my first attempt at gathering social media data related to music artists. My long-term goal is to find ways that artists (especially independent artists who lack the reach of megastars) can use available data to gain insights that help guide their career choices.

Content

All Tweet data was gathered via Twitter's 7-day API (which provides at least 7 days of Tweets at the time of request), using the search string: '#coachella OR #coachella2019 OR #coachella19 OR Coachella -filter:retweets'

The query for each Coachella weekend was run the Tuesday evening after. There was some overlap in Tweets in the two weekend datasets, and duplicates were removed from the 2nd weekend dataset with the following code (applied to the Pandas dataframes the data was stored in):

import pandas as pd

# Weekend 2 Tweets posted before this cutoff may also appear in the weekend 1 data.
weekend_2_possible_duplicates = weekend_2_tweet_df.loc[pd.to_datetime(weekend_2_tweet_df['tweeted_at_pst']) < '2019-04-16 17:34:37-07:00'].id_str

# Weekend 1 Tweets posted after this cutoff may also appear in the weekend 2 data.
weekend_1_possible_duplicates = weekend_1_tweet_df.loc[pd.to_datetime(weekend_1_tweet_df['tweeted_at_pst']) > '2019-04-14 11:53:44-07:00'].id_str

# IDs present in both candidate sets are the duplicates.
tweet_duplicates_id_str = list(set(weekend_1_possible_duplicates.tolist()) & set(weekend_2_possible_duplicates.tolist()))

# Keep only the weekend 2 rows whose ID is not a duplicate.
weekend_2_no_dups_df = weekend_2_tweet_df[~weekend_2_tweet_df['id_str'].isin(tweet_duplicates_id_str)]
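
To make the overlap logic concrete, here is a minimal sketch of the same approach on two toy dataframes. The IDs and timestamps in the toy data are made up for illustration; only the two cutoff timestamps and the column names come from the description above. Passing `utc=True` is an addition of this sketch, so that comparisons still work if offsets are mixed.

```python
import pandas as pd

# Toy stand-ins for the two weekend dataframes (IDs and times are invented).
weekend_1_tweet_df = pd.DataFrame({
    "id_str": [101, 102, 103],
    "tweeted_at_pst": [
        "2019-04-13 20:00:00-07:00",  # before the overlap window
        "2019-04-15 09:00:00-07:00",  # inside the overlap window
        "2019-04-16 10:00:00-07:00",  # inside the overlap window
    ],
})
weekend_2_tweet_df = pd.DataFrame({
    "id_str": [102, 103, 104],
    "tweeted_at_pst": [
        "2019-04-15 09:00:00-07:00",  # also present in weekend 1
        "2019-04-16 10:00:00-07:00",  # also present in weekend 1
        "2019-04-20 21:00:00-07:00",  # only in weekend 2
    ],
})

# The two cutoffs quoted in the dataset description.
overlap_start = pd.Timestamp("2019-04-14 11:53:44-07:00")
overlap_end = pd.Timestamp("2019-04-16 17:34:37-07:00")

# Parse to timezone-aware datetimes (normalised to UTC for comparison).
w1_times = pd.to_datetime(weekend_1_tweet_df["tweeted_at_pst"], utc=True)
w2_times = pd.to_datetime(weekend_2_tweet_df["tweeted_at_pst"], utc=True)

# Candidate duplicates: rows from each weekend that fall in the overlap window.
w1_candidates = weekend_1_tweet_df.loc[w1_times > overlap_start, "id_str"]
w2_candidates = weekend_2_tweet_df.loc[w2_times < overlap_end, "id_str"]

# IDs seen in both candidate sets are true duplicates.
duplicate_ids = set(w1_candidates.tolist()) & set(w2_candidates.tolist())

# Drop the duplicates from weekend 2 only, leaving weekend 1 untouched.
weekend_2_no_dups_df = weekend_2_tweet_df[
    ~weekend_2_tweet_df["id_str"].isin(sorted(duplicate_ids))
]

print(sorted(duplicate_ids))
print(weekend_2_no_dups_df["id_str"].tolist())
```

Restricting both sides to the overlap window before intersecting keeps the ID sets small, which matters more on the full tables than on this toy data.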

The last three columns contain data assembled by additional processes outside of the Twitter API. In particular, the full_tweet_text column has some complexity that isn't fully covered in the column description.

full_tweet_text
If the Tweet wasn't truncated (truncated=0), the value of the text column was copied over.
If the Tweet was truncated (truncated=1), I attempted to scrape the full Tweet text with a script. If the scrape succeeded, this column holds the full Tweet text; if it failed, the column value is one of the following tokens:

==============Full Tweet inaccessible (deleted or protected)================

==============Full Tweet inaccessible (account suspended)================

==============Full Tweet inaccessible (copyright complaint)================

==============Full Tweet inaccessible (undetermined reason)================

Overall, only a small number of Tweets (a few hundred in total) couldn't be accessed and had their value replaced by these tokens.
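
Anyone analysing full_tweet_text will likely want to separate real Tweet text from the placeholder tokens. The following is a rough sketch on a toy dataframe; the Tweet texts are invented, and matching on the substring "Full Tweet inaccessible" is an assumption of this sketch (it would also catch a genuine Tweet that happened to contain that phrase).

```python
import pandas as pd

# Substring shared by all four placeholder tokens (matching on it is an assumption).
INACCESSIBLE_MARKER = "Full Tweet inaccessible"

# Toy rows with invented text; column names follow the dataset schema.
tweets_df = pd.DataFrame({
    "truncated": [0, 1, 1, 1],
    "full_tweet_text": [
        "short tweet #coachella",
        "a longer tweet that was scraped successfully #coachella2019",
        "==============Full Tweet inaccessible (deleted or protected)================",
        "==============Full Tweet inaccessible (account suspended)================",
    ],
})

# Flag rows whose full text is one of the placeholder tokens.
is_placeholder = tweets_df["full_tweet_text"].str.contains(
    INACCESSIBLE_MARKER, regex=False, na=False
)

# Pull out the reason given in parentheses and count each one.
reasons = tweets_df.loc[is_placeholder, "full_tweet_text"].str.extract(
    r"\((.+?)\)", expand=False
)
reason_counts = reasons.value_counts()

# Keep only rows with usable Tweet text for downstream analysis.
usable_df = tweets_df[~is_placeholder]

print(reason_counts)
print(len(usable_df))
```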

Inspiration

Data was gathered for a project which attempted to tie the number of Coachella-related Tweets to either the performance times or the streamed performance times, and to get a sense of which artists/bands potentially generated the most Twitter activity.

Tables

Coachella 2019 Tweets Weekend 1: 2019-04-07 to 2019-04-16

@kaggle.pdp2600_coachella_2019_tweets.coachella_2019_tweets_weekend_1_2019_04_07_to_2019_04_16
  • 157.63 MB
  • 568189 rows
  • 24 columns

CREATE TABLE coachella_2019_tweets_weekend_1_2019_04_07_to_2019_04_16 (
  "created_at" VARCHAR,
  "id_str" BIGINT,
  "text" VARCHAR,
  "truncated" BIGINT,
  "reply_to_tweet_id" DOUBLE,
  "reply_to_user_id" DOUBLE,
  "reply_to_screen_name" VARCHAR,
  "is_quote_status" BIGINT,
  "retweet_count" BIGINT,
  "favorite_count" BIGINT,
  "user_id_str" BIGINT,
  "user_name" VARCHAR,
  "user_screen_name" VARCHAR,
  "user_location" VARCHAR,
  "user_description" VARCHAR,
  "user_profile_url" VARCHAR,
  "user_follower_count" BIGINT,
  "user_friends_count" BIGINT,
  "user_created_at" VARCHAR,
  "user_statuses_count" BIGINT,
  "user_language" VARCHAR,
  "tweet_url" VARCHAR,
  "full_tweet_text" VARCHAR,
  "tweeted_at_pst" TIMESTAMP
);

Coachella 2019 Tweets Weekend 2: 2019-04-14 to 2019-04-23

@kaggle.pdp2600_coachella_2019_tweets.coachella_2019_tweets_weekend_2_2019_04_14_to_2019_04_23
  • 71.53 MB
  • 239144 rows
  • 24 columns

CREATE TABLE coachella_2019_tweets_weekend_2_2019_04_14_to_2019_04_23 (
  "created_at" VARCHAR,
  "id_str" BIGINT,
  "text" VARCHAR,
  "truncated" BIGINT,
  "reply_to_tweet_id" DOUBLE,
  "reply_to_user_id" DOUBLE,
  "reply_to_screen_name" VARCHAR,
  "is_quote_status" BIGINT,
  "retweet_count" BIGINT,
  "favorite_count" BIGINT,
  "user_id_str" BIGINT,
  "user_name" VARCHAR,
  "user_screen_name" VARCHAR,
  "user_location" VARCHAR,
  "user_description" VARCHAR,
  "user_profile_url" VARCHAR,
  "user_follower_count" BIGINT,
  "user_friends_count" BIGINT,
  "user_created_at" VARCHAR,
  "user_statuses_count" BIGINT,
  "user_language" VARCHAR,
  "tweet_url" VARCHAR,
  "full_tweet_text" VARCHAR,
  "tweeted_at_pst" TIMESTAMP
);
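
One caveat suggested by both schemas: reply_to_tweet_id and reply_to_user_id are typed DOUBLE, while id_str is BIGINT. A 64-bit float represents integers exactly only up to 2^53, and Tweet IDs from 2019 are roughly 19 digits long, so reply IDs stored this way may not match id_str values exactly. The sketch below illustrates the effect with a made-up ID; whether any given row in these tables is affected is not something the description states.

```python
# Hypothetical 19-digit Tweet ID, made up purely for illustration.
tweet_id = 1117000000000000123

# Simulate storing the ID in a DOUBLE column and reading it back as an integer.
as_double = float(tweet_id)
round_tripped = int(as_double)

# Integers up to 2**53 survive the round trip; larger ones may not.
exact_limit = 2**53
limit_survives = int(float(exact_limit)) == exact_limit

print("ID survives round trip:", round_tripped == tweet_id)
print("difference after round trip:", abs(tweet_id - round_tripped))
print("2**53 survives round trip:", limit_survives)
```

If reply chains need to be joined back to id_str, it may be safer to treat such matches as approximate or to re-fetch the reply IDs from another source.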
