Reddit r/AskScience Flair Dataset
Dataset for Predicting Post Flair Categories on r/AskScience Subreddit
@kaggle.sumitm004_reddit_raskscience_flair_dataset
Reddit is a large platform for news, content, and discussion, with millions of daily active users. Among its many subreddits, we focus on r/AskScience, a community where users ask and discuss science questions.
This dataset is derived from the r/AskScience subreddit and covers posts submitted between January 1, 2016, and May 20, 2022. It contains 612,668 rows across 22 columns, including the question text, submission description, associated flair, NSFW/SFW status (`over_18`), year of submission, and more. The data was extracted with Python and the Pushshift API, then cleaned with NumPy and pandas. Detailed column descriptions are also available.
CREATE TABLE flair_data (
"id" VARCHAR,
"author" VARCHAR,
"author_fullname" VARCHAR,
"domain" VARCHAR,
"question" VARCHAR,
"link_flair_css_class" VARCHAR,
"link_flair_text" VARCHAR,
"description" VARCHAR,
"contest_mode" VARCHAR,
"created_utc" TIMESTAMP,
"year" BIGINT,
"edited" DOUBLE,
"retrieved_on" TIMESTAMP,
"over_18" BOOLEAN,
"is_self" BOOLEAN,
"locked" BOOLEAN,
"num_comments" BIGINT,
"score" BIGINT,
"spoiler" VARCHAR,
"stickied" BOOLEAN,
"thumbnail" VARCHAR,
"banned" VARCHAR
);
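As an illustration of how this schema might be queried for the flair-prediction task, the sketch below creates a trimmed copy of `flair_data` in an in-memory SQLite database and counts posts per `link_flair_text` value. It assumes the data has been loaded into a SQL table matching the schema above. SQLite, the helper name `flair_counts`, and the sample rows are hypothetical choices for illustration, not part of the dataset.

```python
import sqlite3

# Trimmed, hypothetical copy of the flair_data schema above, keeping only the
# columns this sketch uses. SQLite accepts VARCHAR/BIGINT/BOOLEAN as type names.
DDL = """
CREATE TABLE flair_data (
    "id" VARCHAR,
    "question" VARCHAR,
    "link_flair_text" VARCHAR,
    "year" BIGINT,
    "over_18" BOOLEAN
);
"""


def flair_counts(conn):
    """Return (flair, count) pairs, most frequent first, skipping unflaired posts.

    Ties in count are broken alphabetically by flair text.
    """
    query = """
        SELECT "link_flair_text", COUNT(*) AS n
        FROM flair_data
        WHERE "link_flair_text" IS NOT NULL
        GROUP BY "link_flair_text"
        ORDER BY n DESC, "link_flair_text"
    """
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(DDL)
    # Hypothetical sample rows, NOT taken from the dataset.
    rows = [
        ("a1", "Why is the sky blue?", "Physics", 2016, False),
        ("a2", "How do black holes evaporate?", "Physics", 2018, False),
        ("a3", "Why do cells divide?", "Biology", 2020, False),
        ("a4", "Untagged question", None, 2021, False),
    ]
    conn.executemany("INSERT INTO flair_data VALUES (?, ?, ?, ?, ?)", rows)
    print(flair_counts(conn))  # [('Physics', 2), ('Biology', 1)]
```

Excluding `NULL` flairs is a deliberate choice here: an unflaired post carries no target label for a classifier. Dropping the `WHERE` clause shows how many posts lack a flair.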