Reddit Dataset
Over 400K Reddit submissions across 4 of the most visited subreddits
@kaggle.butterfly232_reddit_dataset
This dataset consists of over 400K Reddit posts scraped from four subreddits: r/technology, r/worldnews, r/entertainment, and r/sports. The data has NOT been cleaned of duplicates, advertisements, or deleted posts. It was collected using the Pushshift API. The dataset is intended for NER trend analysis and sentiment analysis of the most sensitive topics on r/worldnews.
I will upload the revised dataset soon.
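Until a revised version is available, a first cleaning pass can be sketched as follows. This is only a minimal illustration, not the author's cleaning procedure. It assumes that posts from deleted accounts carry the author value "[deleted]" and that a duplicate is a repeated (title, url) pair. Both are assumptions, the sample rows are hypothetical, and advertisements are not handled because the dataset has no field that marks them.

```python
import sqlite3

# Table names from this dataset; used to validate input before building SQL.
TABLES = ("entertainment", "sports", "technology", "worldnews")

# Hypothetical sample rows, for illustration only.
SAMPLE_ROWS = [
    ("worldnews", "Example headline A", "https://example.com/a", "id1", "user_a", "2022-01-01 10:00:00"),
    ("worldnews", "Example headline A", "https://example.com/a", "id2", "user_a", "2022-01-01 10:05:00"),
    ("worldnews", "Example headline B", "https://example.com/b", "id3", "[deleted]", "2022-01-02 09:00:00"),
    ("worldnews", "Example headline C", "https://example.com/c", "id4", "user_b", "2022-01-03 12:00:00"),
]


def clean_posts(conn, table):
    """Drop rows by deleted authors and keep the earliest row per (title, url)."""
    if table not in TABLES:
        raise ValueError(f"unknown table: {table}")
    cursor = conn.execute(
        f"SELECT subreddit, title, url, id, author, utc_datetime_str "
        f"FROM {table} ORDER BY utc_datetime_str, id"
    )
    seen = set()
    kept = []
    for row in cursor:
        _, title, url, _, author, _ = row
        if author == "[deleted]":  # assumed marker for deleted accounts
            continue
        key = ((title or "").strip().lower(), url)  # assumed duplicate key
        if key in seen:
            continue
        seen.add(key)
        kept.append(row)
    return kept


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE worldnews (subreddit TEXT, title TEXT, url TEXT, "
    "id TEXT, author TEXT, utc_datetime_str TEXT)"
)
conn.executemany("INSERT INTO worldnews VALUES (?, ?, ?, ?, ?, ?)", SAMPLE_ROWS)
cleaned = clean_posts(conn, "worldnews")
cleaned_ids = [row[3] for row in cleaned]
print(cleaned_ids)
```

Keying on (title, url) treats reposts of the same link under the same title as duplicates. A stricter or looser key (for example url alone) may suit a given analysis better.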
CREATE TABLE entertainment (
"subreddit" VARCHAR,
"title" VARCHAR,
"url" VARCHAR,
"id" VARCHAR,
"author" VARCHAR,
"utc_datetime_str" TIMESTAMP
);

CREATE TABLE sports (
"subreddit" VARCHAR,
"title" VARCHAR,
"url" VARCHAR,
"id" VARCHAR,
"author" VARCHAR,
"utc_datetime_str" TIMESTAMP
);

CREATE TABLE technology (
"subreddit" VARCHAR,
"title" VARCHAR,
"url" VARCHAR,
"id" VARCHAR,
"author" VARCHAR,
"utc_datetime_str" TIMESTAMP
);

CREATE TABLE worldnews (
"subreddit" VARCHAR,
"title" VARCHAR,
"url" VARCHAR,
"id" VARCHAR,
"author" VARCHAR,
"utc_datetime_str" TIMESTAMP
);
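Because the four tables share the same six columns, they can be combined with UNION ALL for cross-subreddit trend queries. The sketch below counts posts per subreddit per month using an in-memory SQLite database. It assumes that utc_datetime_str values look like "YYYY-MM-DD HH:MM:SS"; the inserted rows are hypothetical and the helper names are illustrative only.

```python
import sqlite3

TABLES = ("entertainment", "sports", "technology", "worldnews")


def build_demo_db():
    """Create empty in-memory tables that mirror the dataset's schema."""
    conn = sqlite3.connect(":memory:")
    for table in TABLES:
        conn.execute(
            f"CREATE TABLE {table} (subreddit TEXT, title TEXT, url TEXT, "
            f"id TEXT, author TEXT, utc_datetime_str TEXT)"
        )
    return conn


def monthly_counts(conn):
    """Count posts per subreddit and calendar month across all four tables."""
    union = " UNION ALL ".join(
        f"SELECT subreddit, utc_datetime_str FROM {table}" for table in TABLES
    )
    query = f"""
        SELECT subreddit,
               strftime('%Y-%m', utc_datetime_str) AS month,
               COUNT(*) AS posts
        FROM ({union}) AS all_posts
        GROUP BY 1, 2
        ORDER BY 1, 2
    """
    return conn.execute(query).fetchall()


conn = build_demo_db()
# Hypothetical rows, for illustration only.
conn.executemany(
    "INSERT INTO sports VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("sports", "Match report", "https://example.com/1", "s1", "user_a", "2022-01-05 08:00:00"),
        ("sports", "Transfer news", "https://example.com/2", "s2", "user_b", "2022-01-20 17:30:00"),
        ("sports", "Final result", "https://example.com/3", "s3", "user_c", "2022-02-02 21:15:00"),
    ],
)
conn.execute(
    "INSERT INTO technology VALUES (?, ?, ?, ?, ?, ?)",
    ("technology", "Chip launch", "https://example.com/4", "t1", "user_d", "2022-01-11 12:00:00"),
)
counts = monthly_counts(conn)
for subreddit, month, posts in counts:
    print(subreddit, month, posts)
```

If the timestamps are stored in a different format, strftime returns NULL and the month column would need a different extraction step.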