Reddit: /r/NotTheOnion
Discriminating Truth and Satire
By Reddit [source]
About this dataset
This dataset offers an inside look at the often humorous world of news media. With content that combines truth and satire, it's all too easy to be fooled by false, outlandish stories or satirical headlines. But with this dataset you can better understand what makes the headlines found on r/NotTheOnion so funny--and differentiate between fact and fiction.
How to use the dataset
This dataset contains Reddit posts from the r/NotTheOnion subreddit, which collects real news stories that read like satire. Because such headlines sound implausible, readers can find it difficult to distinguish false stories from real ones, and the purpose of this dataset is to help with that discernment.
The following columns are included in this dataset: title, score, url, comms_num, created, body and timestamp.
Here’s how you can use each column in your analysis:
- title: This will help you get an idea of what the post is about before you dive into reading it.
- score: This indicates how well liked a particular post is among viewers, measured in upvotes; keep in mind that a high score reflects attention and can accompany either favorable or critical reactions.
- url: URL links lead users directly to articles or websites which confirm the story's legitimacy (or lack thereof).
- comms_num: This reflects the number of comments a post has received; typically higher comms_nums suggest stories with more intrigue and controversy.
- created: The date and time the post was created. This gives context, since certain events recur at particular times of year, and seasons or holidays around the world may affect how readers respond to posted content.
- body: The text accompanying the post. It gives readers insight beyond the headline, including the details needed to interpret jokes or sarcasm that can change how a piece is perceived. Word choice may reveal bias, and length may be a rough signal as well, since longer texts typically include more background information than shorter ones.
- timestamp: The exact moment the post was uploaded or updated. Precise timestamps, rather than recollection, make it possible to study temporal correlations among posts.
Using these columns together will give readers more context when evaluating if a story is humorous fact or funny fiction!
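As a minimal sketch of combining columns, the snippet below reads the CSV with pandas and derives a comments-per-upvote ratio from comms_num and score. The helper name load_posts, the derived column comments_per_upvote, and the three sample rows are made up for illustration and are not part of the dataset; in practice you would pass the path to nottheonion.csv instead of the in-memory sample.

```python
import io

import pandas as pd


def load_posts(source):
    """Read the posts from a path or file-like object and add one derived column."""
    posts = pd.read_csv(source)
    # Hypothetical derived column: comments per upvote.
    # Scores of 0 or below are treated as missing to avoid dividing by zero.
    valid_score = posts["score"].where(posts["score"] > 0)
    posts["comments_per_upvote"] = posts["comms_num"] / valid_score
    return posts


# Made-up rows standing in for nottheonion.csv (same column names as the dataset).
SAMPLE_CSV = """title,score,url,comms_num,created,body,timestamp
Made-up headline one,200,https://example.com/one,50,2020-01-01 12:00:00,,1577880000
Made-up headline two,0,https://example.com/two,3,2020-01-02 08:30:00,,1577953800
Made-up headline three,40,https://example.org/three,10,2020-01-03 09:15:00,,1578042900
"""

posts = load_posts(io.StringIO(SAMPLE_CSV))
print(posts[["title", "score", "comms_num", "comments_per_upvote"]])
```

A ratio like this is only one possible way to relate discussion to popularity; whether it tracks "intrigue and controversy" is something to check against the data rather than assume.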
Research Ideas
- Analyzing the most popular stories and examining their implications for news reporting.
- Identifying the narratives which capture discussions about current events and topics of interest to readers in a humorous way.
- Comparing different subreddits to view how r/NotTheOnion contributes to the international conversation around news media and what people find funny or entertaining when it comes to news content.
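The first idea could be started with something like the sketch below, which ranks posts by score and counts which domains the linked articles come from. The function names top_posts and domain_counts and the demo rows are hypothetical; only the column names come from the dataset.

```python
from urllib.parse import urlparse

import pandas as pd


def top_posts(posts, n=10):
    """Return the n posts with the highest score, best first."""
    return posts.sort_values("score", ascending=False).head(n)


def domain_counts(posts):
    """Count how often each domain appears in the url column."""
    domains = posts["url"].dropna().map(lambda u: urlparse(u).netloc.lower())
    return domains.value_counts()


# Made-up rows for illustration only.
demo = pd.DataFrame(
    {
        "title": ["Made-up A", "Made-up B", "Made-up C"],
        "score": [50, 900, 300],
        "url": [
            "https://example.com/a",
            "https://example.org/b",
            "https://example.com/c",
        ],
        "comms_num": [5, 120, 40],
    }
)

top_two = top_posts(demo, n=2)
counts = domain_counts(demo)
print(top_two[["title", "score"]])
print(counts)
```

Counting domains is a rough proxy for which outlets supply the most-shared stories; it says nothing on its own about their reliability.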
Acknowledgements
If you use this dataset in your research, please credit the original authors.
License
License: CC0 1.0 Universal (CC0 1.0) - Public Domain Dedication
No Copyright - You can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission. See Other Information.
Columns
File: nottheonion.csv
| Column name | Description |
|---|---|
| title | The title of the post. (String) |
| score | The number of upvotes the post has received. (Integer) |
| url | The URL of the post. (String) |
| comms_num | The number of comments the post has received. (Integer) |
| created | The date and time the post was created. (DateTime) |
| body | The body of the post. (String) |
| timestamp | The timestamp of the post. (Integer) |
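If the file is read with pandas, the documented types can be applied explicitly. The following is a sketch under assumptions: coerce_types is a hypothetical helper, the sample values are made up, and created and timestamp are left untouched because their exact on-disk format is not stated here.

```python
import pandas as pd

# Columns grouped by the types given in the table above.
NUMERIC_COLUMNS = ["score", "comms_num"]
TEXT_COLUMNS = ["title", "url", "body"]


def coerce_types(posts):
    """Return a copy with integer and string columns cast to nullable pandas dtypes."""
    out = posts.copy()
    for col in NUMERIC_COLUMNS:
        # Unparseable values become missing instead of raising an error.
        out[col] = pd.to_numeric(out[col], errors="coerce").astype("Int64")
    for col in TEXT_COLUMNS:
        out[col] = out[col].astype("string")
    return out


# Made-up rows, including one malformed score, for illustration only.
raw = pd.DataFrame(
    {
        "title": ["Made-up A", "Made-up B", "Made-up C"],
        "score": ["12", "n/a", "7"],
        "url": ["https://example.com/a", None, "https://example.com/c"],
        "comms_num": [3, 0, 9],
        "body": [None, "some text", None],
    }
)

typed = coerce_types(raw)
print(typed.dtypes)
```

Nullable dtypes keep missing values visible as `<NA>` rather than silently turning integer columns into floats.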
Acknowledgements
If you use this dataset in your research, please credit Reddit.