Baselight

Reddit Dataset

Over 400K Reddit submissions across 4 of the most visited subreddits

@kaggle.butterfly232_reddit_dataset

About this Dataset

This dataset consists of over 400K Reddit posts scraped from four subreddits: r/technology, r/worldnews, r/entertainment, and r/sports. The data has NOT been cleaned: it may still contain duplicates, advertisements, and deleted posts. The data was collected using the Pushshift API. The purpose of this dataset is to support NER trend analysis and sentiment analysis of the most sensitive topics on r/worldnews.

I will upload the revised dataset soon.
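
Until a revised version is available, a first cleaning pass could look roughly like the sketch below. It is only an illustration, not the author's cleaning procedure. It assumes rows have been loaded as dictionaries keyed by the column names listed under Tables, that duplicates share the same "id", and that deleted posts carry the author value "[deleted]" (a common Reddit convention, not confirmed by this dataset). The name clean_posts is hypothetical. Advertisements are not handled, since the dataset gives no marker for them.

```python
# Assumption: deleted posts are marked with this author value.
DELETED_AUTHOR = "[deleted]"


def clean_posts(rows):
    """Return rows with deleted-author posts and repeated ids removed.

    rows: iterable of dicts with at least the keys "id" and "author".
    The first occurrence of each id is kept; input order is preserved.
    """
    seen_ids = set()
    cleaned = []
    for row in rows:
        if row["author"] == DELETED_AUTHOR:
            continue  # skip posts whose author was deleted
        if row["id"] in seen_ids:
            continue  # skip repeated submissions
        seen_ids.add(row["id"])
        cleaned.append(row)
    return cleaned
```

Keeping the first occurrence of each id is a design choice; keeping the most recent one instead would require comparing utc_datetime_str.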

Tables

Entertainment

@kaggle.butterfly232_reddit_dataset.entertainment
  • 1.6 MB
  • 89,989 rows
  • 6 columns
CREATE TABLE entertainment (
  "subreddit" VARCHAR,
  "title" VARCHAR,
  "url" VARCHAR,
  "id" VARCHAR,
  "author" VARCHAR,
  "utc_datetime_str" TIMESTAMP
);

Sports

@kaggle.butterfly232_reddit_dataset.sports
  • 1.35 MB
  • 118,000 rows
  • 6 columns
CREATE TABLE sports (
  "subreddit" VARCHAR,
  "title" VARCHAR,
  "url" VARCHAR,
  "id" VARCHAR,
  "author" VARCHAR,
  "utc_datetime_str" TIMESTAMP
);

Technology

@kaggle.butterfly232_reddit_dataset.technology
  • 7.13 MB
  • 117,904 rows
  • 6 columns
CREATE TABLE technology (
  "subreddit" VARCHAR,
  "title" VARCHAR,
  "url" VARCHAR,
  "id" VARCHAR,
  "author" VARCHAR,
  "utc_datetime_str" TIMESTAMP
);

Worldnews

@kaggle.butterfly232_reddit_dataset.worldnews
  • 10.68 MB
  • 117,886 rows
  • 6 columns
CREATE TABLE worldnews (
  "subreddit" VARCHAR,
  "title" VARCHAR,
  "url" VARCHAR,
  "id" VARCHAR,
  "author" VARCHAR,
  "utc_datetime_str" TIMESTAMP
);
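
Because the four tables share the same six columns, they can be stacked with UNION ALL and analyzed together. The sketch below shows one way to do that. It assumes the tables have been copied into a local SQLite database under their short names (entertainment, sports, technology, worldnews) rather than queried through their fully qualified @kaggle identifiers. The helper names build_union_query and posts_per_subreddit are hypothetical.

```python
import sqlite3

# Short table names as they appear in the CREATE TABLE statements above.
TABLES = ("entertainment", "sports", "technology", "worldnews")
COLUMNS = '"subreddit", "title", "url", "id", "author", "utc_datetime_str"'


def build_union_query(tables=TABLES):
    """Build a SELECT that stacks all tables into one result set."""
    selects = [f"SELECT {COLUMNS} FROM {table}" for table in tables]
    return "\nUNION ALL\n".join(selects)


def posts_per_subreddit(conn):
    """Count rows per subreddit across all tables, sorted by subreddit name."""
    query = (
        'SELECT "subreddit", COUNT(*) '
        f"FROM ({build_union_query()}) AS all_posts "
        'GROUP BY "subreddit" ORDER BY "subreddit"'
    )
    return conn.execute(query).fetchall()
```

UNION ALL is used rather than UNION so that duplicate rows are preserved and the counts reflect the uncleaned data as published.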
