Baselight

Ten Million Reddit Answers

Ten million answers and their /r/AskReddit questions

@kaggle.pavellexyr_ten_million_reddit_answers

About this Dataset

Context

The spiritual successor to our One Million Reddit Questions, this dataset presents ten million question-answer pairs, labelled by score and pre-analyzed sentiment.

Content

This dataset contains ten million comments on /r/AskReddit and their associated parent posts, procured using SocialGrep.
The posts and the comments are labelled with their date of creation and their score.
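
The creation date is stored in the `created_utc` column, which both table schemas declare as `BIGINT`. A minimal sketch of turning such a value into a readable timestamp, assuming the column holds Unix epoch seconds in UTC (the usual Reddit convention, not something this page states); the helper name is hypothetical:

```python
from datetime import datetime, timezone


def created_utc_to_datetime(created_utc: int) -> datetime:
    """Convert an epoch-seconds value into a timezone-aware UTC datetime.

    Assumption: created_utc is Unix epoch seconds, not milliseconds.
    """
    return datetime.fromtimestamp(created_utc, tz=timezone.utc)


# 1609459200 is midnight on 1 January 2021, UTC.
print(created_utc_to_datetime(1609459200).isoformat())
# → 2021-01-01T00:00:00+00:00
```

If the converted dates come out thousands of years in the future, the column is probably in milliseconds and should be divided by 1000 first.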

Acknowledgements

We would like to thank the Kaggle community. Without you, this dataset would not exist.

Inspiration

This dataset presents a novel corpus, ripe for training question-answering language models and much more. What can you do with it, reader? The sky is the limit.

Tables

Ten Million Reddit Answers Questions

@kaggle.pavellexyr_ten_million_reddit_answers.ten_million_reddit_answers_questions
  • 76.55 MB
  • 631063 rows
  • 12 columns

CREATE TABLE ten_million_reddit_answers_questions (
  "type" VARCHAR,
  "id" VARCHAR,
  "subreddit_id" VARCHAR,
  "subreddit_name" VARCHAR,
  "subreddit_nsfw" BOOLEAN,
  "created_utc" BIGINT,
  "permalink" VARCHAR,
  "domain" VARCHAR,
  "url" VARCHAR,
  "selftext" VARCHAR,
  "title" VARCHAR,
  "score" BIGINT
);
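
In this schema a question is split across `title` and `selftext`. A rough sketch of combining the two into a single question string follows. It assumes that `selftext` may be empty, missing, or one of Reddit's `[deleted]` / `[removed]` placeholders; whether those placeholders actually occur in this table has not been checked, and the function name is hypothetical.

```python
from typing import Optional

# Assumption: these values mean "no usable post body".
PLACEHOLDERS = {"", "[deleted]", "[removed]"}


def question_text(title: str, selftext: Optional[str]) -> str:
    """Return the title alone, or the title plus body when a body exists."""
    body = (selftext or "").strip()
    if body in PLACEHOLDERS:
        return title.strip()
    return f"{title.strip()}\n\n{body}"


print(question_text("What is your favourite book?", "[removed]"))
# → What is your favourite book?
```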

Ten Million Reddit Answers

@kaggle.pavellexyr_ten_million_reddit_answers.ten_million_reddit_answers
  • 1.14 GB
  • 10000000 rows
  • 10 columns

CREATE TABLE ten_million_reddit_answers (
  "type" VARCHAR,
  "id" VARCHAR,
  "subreddit_id" VARCHAR,
  "subreddit_name" VARCHAR,
  "subreddit_nsfw" BOOLEAN,
  "created_utc" BIGINT,
  "permalink" VARCHAR,
  "body" VARCHAR,
  "sentiment" DOUBLE,
  "score" BIGINT
);
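
The answers table has no explicit parent-post column, so pairing an answer with its question needs some other key. One possible approach, sketched below, is to read the post ID out of the comment's `permalink`. It assumes that permalinks follow Reddit's usual `/comments/<post_id>/<slug>/<comment_id>/` layout and that `id` in the questions table is the bare post ID; neither is confirmed by this page. The in-memory tables use shortened, hypothetical names and made-up rows purely for illustration.

```python
import re
import sqlite3
from typing import Optional

# Assumption: the post ID is the path segment right after "/comments/".
POST_ID_RE = re.compile(r"/comments/([a-z0-9]+)/")


def post_id_from_permalink(permalink: str) -> Optional[str]:
    """Extract the parent post ID from a comment permalink, if present."""
    match = POST_ID_RE.search(permalink)
    return match.group(1) if match else None


conn = sqlite3.connect(":memory:")
# Expose the Python helper to SQL as post_id(permalink).
conn.create_function("post_id", 1, post_id_from_permalink)

# Simplified stand-ins for the two tables, with made-up sample rows.
conn.executescript("""
CREATE TABLE questions (id TEXT, title TEXT, score INTEGER);
CREATE TABLE answers (id TEXT, permalink TEXT, body TEXT, sentiment REAL, score INTEGER);
INSERT INTO questions VALUES ('abc123', 'What is your favourite book?', 42);
INSERT INTO answers VALUES
  ('x1', 'https://old.reddit.com/r/AskReddit/comments/abc123/what_is_your_favourite_book/x1/', 'Dune.', 0.0, 7),
  ('x2', 'https://old.reddit.com/r/AskReddit/comments/abc123/what_is_your_favourite_book/x2/', 'No idea.', -0.3, 1);
""")

# Keep only answers scoring at least 5, highest first.
pairs = conn.execute("""
SELECT q.title, a.body, a.score
FROM answers AS a
JOIN questions AS q ON q.id = post_id(a.permalink)
WHERE a.score >= 5
ORDER BY a.score DESC
""").fetchall()

print(pairs)
# → [('What is your favourite book?', 'Dune.', 7)]
```

Filtering on `score` (and, if desired, `sentiment`) at join time is one way to keep only higher-quality pairs for question-answering experiments. The threshold of 5 here is arbitrary.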
