The Reddit Dataset Dataset
A meta dataset of Reddit's own /r/datasets community.
@kaggle.pavellexyr_the_reddit_dataset_dataset
Datasets... In a way, the Kaggle community is built around them. You can't analyze data without having it. Here, we aim to create a meta-corpus of datasets posted to Reddit. A dataset dataset, if you will.
This dataset is a comprehensive corpus of all posts and comments made on Reddit's /r/datasets board, from its inception up to March 1, 2022.
The dataset was procured using SocialGrep.
To preserve users' anonymity and to prevent targeted harassment, the data does not include usernames.
We would like to thank Chris Liverani for generously providing the cover image for this dataset.
Datasets are nice - we like our data.
CREATE TABLE the_reddit_dataset_dataset_comments (
"type" VARCHAR,
"id" VARCHAR,
"subreddit_id" VARCHAR,
"subreddit_name" VARCHAR,
"subreddit_nsfw" BOOLEAN,
"created_utc" BIGINT,
"permalink" VARCHAR,
"body" VARCHAR,
"sentiment" DOUBLE,
"score" BIGINT
);

CREATE TABLE the_reddit_dataset_dataset_posts (
"type" VARCHAR,
"id" VARCHAR,
"subreddit_id" VARCHAR,
"subreddit_name" VARCHAR,
"subreddit_nsfw" BOOLEAN,
"created_utc" BIGINT,
"permalink" VARCHAR,
"domain" VARCHAR,
"url" VARCHAR,
"selftext" VARCHAR,
"title" VARCHAR,
"score" BIGINT
);
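
As a rough illustration of how the comments schema above might be queried, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in engine (SQLite accepts the type names used in the DDL). It assumes that "created_utc" holds Unix epoch seconds and that "sentiment" is a signed numeric score; neither is stated in the description. The rows in SAMPLE_ROWS, including their ids and permalinks, are made-up placeholders and are not taken from the corpus.

```python
import sqlite3

# Schema copied from the comments table in this dataset card.
DDL = """
CREATE TABLE the_reddit_dataset_dataset_comments (
    "type" VARCHAR,
    "id" VARCHAR,
    "subreddit_id" VARCHAR,
    "subreddit_name" VARCHAR,
    "subreddit_nsfw" BOOLEAN,
    "created_utc" BIGINT,
    "permalink" VARCHAR,
    "body" VARCHAR,
    "sentiment" DOUBLE,
    "score" BIGINT
);
"""

# Made-up placeholder rows for illustration only, NOT real data from the corpus.
# Timestamps are 2022-01-01, 2022-01-02 and 2022-02-01 (00:00 UTC).
SAMPLE_ROWS = [
    ("comment", "aaa1", "t5_x", "datasets", 0, 1640995200, "/r/datasets/c/aaa1", "Thanks!", 0.5, 3),
    ("comment", "aaa2", "t5_x", "datasets", 0, 1641081600, "/r/datasets/c/aaa2", "Broken link.", -0.25, 1),
    ("comment", "aaa3", "t5_x", "datasets", 0, 1643673600, "/r/datasets/c/aaa3", "Great find.", 0.75, 7),
]

# Assumption: created_utc is Unix epoch seconds, so SQLite's 'unixepoch'
# modifier can turn it into a calendar month.
QUERY = """
SELECT strftime('%Y-%m', "created_utc", 'unixepoch') AS month,
       COUNT(*) AS n_comments,
       AVG("sentiment") AS avg_sentiment
FROM the_reddit_dataset_dataset_comments
GROUP BY month
ORDER BY month;
"""


def monthly_sentiment(conn):
    """Return (month, comment count, mean sentiment) tuples, oldest month first."""
    return conn.execute(QUERY).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute(DDL)
conn.executemany(
    "INSERT INTO the_reddit_dataset_dataset_comments VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?)",
    SAMPLE_ROWS,
)
result = monthly_sentiment(conn)
for row in result:
    print(row)
```

The same SELECT should carry over to other engines with small changes to the timestamp conversion, since strftime with 'unixepoch' is SQLite-specific.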