Baselight

The Reddit Dataset Dataset

A meta dataset of Reddit's own /r/datasets community.

@kaggle.pavellexyr_the_reddit_dataset_dataset


About this Dataset

Context

Datasets... In a way, the Kaggle community is built around them. You can't analyze data without having it. Here, we aim to create a meta-corpus of datasets posted to Reddit. A dataset dataset, if you will.

Content

This dataset is a comprehensive corpus of all posts and comments made on Reddit's /r/datasets board, from its inception through March 1, 2022.

The dataset was procured using SocialGrep.

To preserve users' anonymity and to prevent targeted harassment, the data does not include usernames.

Acknowledgements

We would like to thank Chris Liverani for generously providing the cover image for this dataset.

Inspiration

Datasets are nice - we like our data.

Tables

The Reddit Dataset Dataset Comments

@kaggle.pavellexyr_the_reddit_dataset_dataset.the_reddit_dataset_dataset_comments
  • 8.89 MB
  • 54848 rows
  • 10 columns

CREATE TABLE the_reddit_dataset_dataset_comments (
  "type" VARCHAR,
  "id" VARCHAR,
  "subreddit_id" VARCHAR,
  "subreddit_name" VARCHAR,
  "subreddit_nsfw" BOOLEAN,
  "created_utc" BIGINT,
  "permalink" VARCHAR,
  "body" VARCHAR,
  "sentiment" DOUBLE,
  "score" BIGINT
);
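
As an illustration of how this schema might be queried, the sketch below loads a few rows into an in-memory SQLite database and aggregates comment sentiment by year. The sample rows are hypothetical, only a subset of the columns is used, and treating "created_utc" as Unix epoch seconds is an assumption based on the column name, not something this page states.

```python
import sqlite3

# Hypothetical sample rows, NOT taken from the dataset:
# (id, created_utc, body, sentiment, score)
SAMPLE_COMMENTS = [
    ("c1", 1577836800, "Great dataset, thanks!", 0.8, 5),
    ("c2", 1580515200, "The link is dead.", -0.4, 2),
    ("c3", 1609459200, "Does anyone have a mirror?", 0.0, 1),
]


def yearly_comment_stats(rows):
    """Return (year, comment count, mean sentiment) tuples, ordered by year."""
    conn = sqlite3.connect(":memory:")
    # Subset of the comments schema shown above.
    conn.execute(
        "CREATE TABLE the_reddit_dataset_dataset_comments ("
        '"id" VARCHAR, "created_utc" BIGINT, "body" VARCHAR, '
        '"sentiment" DOUBLE, "score" BIGINT)'
    )
    conn.executemany(
        "INSERT INTO the_reddit_dataset_dataset_comments VALUES (?, ?, ?, ?, ?)",
        rows,
    )
    # Assumption: created_utc holds Unix epoch seconds.
    query = """
        SELECT strftime('%Y', created_utc, 'unixepoch') AS yr,
               COUNT(*) AS n_comments,
               AVG(sentiment) AS avg_sentiment
        FROM the_reddit_dataset_dataset_comments
        GROUP BY yr
        ORDER BY yr
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for yr, n_comments, avg_sentiment in yearly_comment_stats(SAMPLE_COMMENTS):
        print(yr, n_comments, round(avg_sentiment, 3))
```

SQLite is used here only because it ships with Python; the same SELECT would need its date function adapted for other SQL engines.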

The Reddit Dataset Dataset Posts

@kaggle.pavellexyr_the_reddit_dataset_dataset.the_reddit_dataset_dataset_posts
  • 4.74 MB
  • 20292 rows
  • 12 columns

CREATE TABLE the_reddit_dataset_dataset_posts (
  "type" VARCHAR,
  "id" VARCHAR,
  "subreddit_id" VARCHAR,
  "subreddit_name" VARCHAR,
  "subreddit_nsfw" BOOLEAN,
  "created_utc" BIGINT,
  "permalink" VARCHAR,
  "domain" VARCHAR,
  "url" VARCHAR,
  "selftext" VARCHAR,
  "title" VARCHAR,
  "score" BIGINT
);
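
Neither schema declares a key linking comments to posts. One possible approach, sketched below, is to recover the parent post identifier from each comment's "permalink". This assumes permalinks follow Reddit's usual URL shape, with a "/comments/<post_id>/" segment; whether the "id" column of the posts table stores that same bare identifier is a further assumption to verify against the data. The function names and sample permalinks are hypothetical.

```python
import re

# Assumption: permalinks contain "/comments/<post_id>/" as Reddit URLs usually do.
POST_ID_RE = re.compile(r"/comments/([a-z0-9]+)/")


def post_id_from_permalink(permalink):
    """Return the post identifier embedded in a permalink, or None if absent."""
    match = POST_ID_RE.search(permalink)
    return match.group(1) if match else None


def group_comments_by_post(comments):
    """Map post identifier -> list of comment ids.

    `comments` is an iterable of dicts with "id" and "permalink" keys.
    Comments whose permalink does not match the assumed shape are skipped.
    """
    grouped = {}
    for comment in comments:
        post_id = post_id_from_permalink(comment["permalink"])
        if post_id is not None:
            grouped.setdefault(post_id, []).append(comment["id"])
    return grouped


# Hypothetical rows, NOT taken from the dataset.
SAMPLE = [
    {"id": "c1", "permalink": "/r/datasets/comments/abc123/some_title/c1/"},
    {"id": "c2", "permalink": "/r/datasets/comments/abc123/some_title/c2/"},
    {"id": "c3", "permalink": "/r/datasets/comments/xyz789/other_title/c3/"},
]

if __name__ == "__main__":
    print(group_comments_by_post(SAMPLE))
```

The resulting mapping could then be matched against post rows to compute, for example, comment counts per post.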
