Reddit Usernames

The usernames of all 26 million Reddit accounts that had commented as of Dec. 2017

@kaggle.colinmorris_reddit_usernames

About this Dataset

Content

This dataset contains the username of every Reddit account that has left at least one comment, along with the number of comments each account has made.

This data was grabbed in December 2017 from the Reddit comments dataset hosted on Google BigQuery. It should be current up to November 2017.

Quick stats

  • 26 million users
  • 8 million have left only a single comment
  • 13 million (50%) have left no more than 5 comments
  • 42,000 usernames demand something via PM (e.g. PM_ME_PIX_OF_UR_CAT, PM_me_your_successes, PMmeyourRGB, and lots of less wholesome ones)
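Figures like the ones above could be recomputed with a short script. The following is a minimal sketch, not the method actually used to produce these stats: it assumes the data has already been loaded as (username, comment count) pairs, and the regular expression for "demands something via PM" is a guessed heuristic (a name starting with "PM" followed by "me"). The names quick_stats and PM_PATTERN are hypothetical.

```python
import re

# Assumed heuristic (not from the dataset author): a username "demands
# something via PM" if it starts with "pm", then optional underscores or
# hyphens, then "me", ignoring case.
PM_PATTERN = re.compile(r"^pm[_-]*me", re.IGNORECASE)


def quick_stats(rows):
    """Summarize an iterable of (username, comment_count) pairs."""
    stats = {
        "users": 0,
        "single_comment": 0,
        "at_most_5_comments": 0,
        "pm_usernames": 0,
    }
    for username, n_comments in rows:
        stats["users"] += 1
        if n_comments == 1:
            stats["single_comment"] += 1
        if n_comments <= 5:
            stats["at_most_5_comments"] += 1
        if PM_PATTERN.match(username):
            stats["pm_usernames"] += 1
    return stats


# Tiny made-up sample; the real dataset has about 26 million rows.
sample = [
    ("PM_ME_PIX_OF_UR_CAT", 120),
    ("PMmeyourRGB", 1),
    ("quiet_lurker", 1),
    ("pmarshall", 4),
    ("prolific_poster", 9000),
]
print(quick_stats(sample))
```

The heuristic is deliberately narrow: "pmarshall" is not counted, but names that phrase the demand differently (for example "send_me_a_pm") would be missed, so a count produced this way need not match the 42,000 quoted above.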

Acknowledgements

Thanks to /u/Stuck_In_the_Matrix, who collected and maintains the original comments dataset.

Inspiration

  • What words commonly appear in Reddit usernames?
  • Can you identify frequently occurring username 'recipes' using clustering techniques?
  • What numbers most commonly appear as suffixes in Reddit usernames?
  • Can you train a generative language model to output new usernames based on this dataset?
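
As a possible starting point for the word and numeric-suffix questions above, here is a minimal sketch using only the Python standard library. It assumes the usernames are available as a plain list of strings; the helper names numeric_suffix, suffix_counts, and word_counts are hypothetical. Words are found by lowercasing and splitting on anything that is not a letter, which is a simplification: it will not separate words that are run together, as in "PMmeyourRGB".

```python
import re
from collections import Counter

SUFFIX_RE = re.compile(r"(\d+)$")  # run of digits at the very end
WORD_RE = re.compile(r"[a-z]+")    # runs of letters, applied after lowercasing


def numeric_suffix(username):
    """Return the trailing digits as a string, or None if there are none.

    The suffix stays a string so that leading zeros ("007") are preserved.
    """
    match = SUFFIX_RE.search(username)
    return match.group(1) if match else None


def suffix_counts(usernames):
    """Count how often each numeric suffix appears."""
    suffixes = (numeric_suffix(name) for name in usernames)
    return Counter(s for s in suffixes if s is not None)


def word_counts(usernames):
    """Count letter-only tokens across all usernames."""
    counts = Counter()
    for name in usernames:
        counts.update(WORD_RE.findall(name.lower()))
    return counts


# Made-up sample usernames for illustration only.
names = ["cat_lover_99", "the_cat_lady", "gamer123", "dave99", "agent007"]
print(suffix_counts(names).most_common(3))
print(word_counts(names).most_common(3))
```

On the full dataset the same counters could be weighted by each account's comment count, depending on whether the question is about accounts or about activity.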