This dataset provides a collection of world news headlines spanning May 2018 to April 2023. The headlines were sourced from Reddit using PMAW (Pushshift Multithread API Wrapper).
PMAW retrieves Reddit posts and comments through the Pushshift API. The dataset covers a wide range of topics, including international news, politics, economics, science, technology, and entertainment.
The dataset consists of 26 columns:
Date: The date of the news headlines.
Top1 to Top25: The top 25 news headlines for each respective date. The headlines are ranked in order of importance, with Top1 being the most important headline.
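The wide layout described above (one row per date, one column per rank) is often easier to analyze as one row per headline. The following is a minimal sketch using pandas, not an official loader for this dataset. The sample data is made up and shows only Top1 to Top3 for brevity, and the helper name `to_long` and the output column names `Rank` and `Headline` are assumptions chosen for illustration. With the real file, the in-memory sample would be replaced by the actual CSV path.

```python
import io

import pandas as pd

# Made-up sample in the described layout (only Top1..Top3 for brevity).
SAMPLE_CSV = """Date,Top1,Top2,Top3
2018-05-01,Headline A,Headline B,Headline C
2018-05-02,Headline D,Headline E,Headline F
"""


def to_long(df: pd.DataFrame) -> pd.DataFrame:
    """Reshape one-row-per-date into one-row-per-headline (hypothetical helper)."""
    # Every column whose name starts with "Top" is treated as a ranked headline.
    top_cols = [c for c in df.columns if c.startswith("Top")]
    long_df = df.melt(
        id_vars="Date",
        value_vars=top_cols,
        var_name="Rank",
        value_name="Headline",
    )
    # Turn "Top7" into the integer 7 so rows can be sorted by rank.
    long_df["Rank"] = long_df["Rank"].str.replace("Top", "", regex=False).astype(int)
    return long_df.sort_values(["Date", "Rank"]).reset_index(drop=True)


wide = pd.read_csv(io.StringIO(SAMPLE_CSV), parse_dates=["Date"])
long_df = to_long(wide)
print(long_df)
```

The long form makes it straightforward to filter by rank (for example, only Top1 headlines) or to group headlines by date.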
Data Cleaning Considerations:
Users are encouraged to perform their own data cleaning and preprocessing based on their specific requirements and quality standards. This might include handling missing values, removing duplicates, standardizing text formats, or applying other relevant data cleaning techniques to ensure the accuracy and consistency of the data.
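As one possible starting point, the cleaning steps mentioned above can be sketched with pandas. This is an illustrative example under stated assumptions, not a recommended pipeline: it assumes the headlines are held in a table with one headline per row and columns named `Date` and `Headline` (hypothetical names), and the helper `clean_headlines` and the sample rows are made up for the example.

```python
import pandas as pd


def clean_headlines(df: pd.DataFrame) -> pd.DataFrame:
    """Basic cleaning for a table with 'Date' and 'Headline' columns (assumed names)."""
    out = df.copy()
    # Standardize text: trim the ends and collapse runs of whitespace.
    out["Headline"] = (
        out["Headline"].str.strip().str.replace(r"\s+", " ", regex=True)
    )
    # Handle missing values: drop rows with no headline, then drop empty strings.
    out = out.dropna(subset=["Headline"])
    out = out[out["Headline"] != ""]
    # Remove duplicates: keep the first copy of each headline within a date.
    out = out.drop_duplicates(subset=["Date", "Headline"])
    return out.reset_index(drop=True)


# Made-up rows covering each case: messy whitespace, a duplicate, None, and "".
sample = pd.DataFrame(
    {
        "Date": ["2018-05-01"] * 4,
        "Headline": ["  Markets   rally ", "Markets rally", None, ""],
    }
)
cleaned = clean_headlines(sample)
print(cleaned)
```

Whether to drop or keep rows with missing headlines depends on the analysis, so the thresholds and rules here should be adjusted to the user's own quality standards.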