Global News Dataset
A Comprehensive Collection of more than 1 Million News Articles
@kaggle.everydaycodings_global_news_dataset
This dataset comprises news articles collected over the past few months using the NewsAPI. The primary motivation behind curating this dataset was to develop and experiment with various natural language processing (NLP) models. The dataset aims to support the creation of text summarization models, sentiment analysis models, and other NLP applications.
The data is sourced from the NewsAPI, a comprehensive and up-to-date news aggregation service. The API provides access to a wide range of news articles from various reputable sources, making it a valuable resource for constructing a diverse and informative dataset.
The data for this dataset was collected using a custom Python script, dailyWorker.py. The script uses the NewsAPI to gather information on news articles over a specified period.
Feel free to explore and modify the script to suit your data collection needs. If you have any questions or suggestions for improvement, please don't hesitate to reach out.
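As a rough illustration of what a retrieval step like this can look like (dailyWorker.py itself is not reproduced here), the sketch below builds a request for the NewsAPI `/v2/everything` endpoint and flattens each returned article into the column names used by the tables in this dataset. The function names, the default page size, and the choice to store the search term as `category` are assumptions made for illustration; the endpoint, its query parameters, and the response field names follow the public NewsAPI documentation.

```python
import json
import urllib.parse
import urllib.request

NEWSAPI_URL = "https://newsapi.org/v2/everything"


def build_request_url(api_key, query, date_from, date_to, page=1, page_size=100):
    """Build the URL for one page of NewsAPI 'everything' results."""
    params = {
        "q": query,
        "from": date_from,   # ISO 8601 date, e.g. "2023-10-01"
        "to": date_to,
        "language": "en",
        "pageSize": page_size,
        "page": page,
        "apiKey": api_key,
    }
    return NEWSAPI_URL + "?" + urllib.parse.urlencode(params)


def flatten_article(article, category):
    """Map one NewsAPI article object onto the dataset's column names."""
    source = article.get("source") or {}
    return {
        "source_id": source.get("id"),
        "source_name": source.get("name"),
        "author": article.get("author"),
        "title": article.get("title"),
        "description": article.get("description"),
        "url": article.get("url"),
        "url_to_image": article.get("urlToImage"),
        "published_at": article.get("publishedAt"),
        "content": article.get("content"),
        # Assumption: the search term doubles as the category label.
        "category": category,
    }


def fetch_page(api_key, query, date_from, date_to, page=1):
    """Fetch one page of results and return a list of flattened rows."""
    url = build_request_url(api_key, query, date_from, date_to, page)
    with urllib.request.urlopen(url, timeout=30) as resp:
        payload = json.load(resp)
    if payload.get("status") != "ok":
        raise RuntimeError(payload.get("message", "NewsAPI request failed"))
    return [flatten_article(a, category=query) for a in payload["articles"]]
```

Calling `fetch_page` requires a valid NewsAPI key and network access; `build_request_url` and `flatten_article` can be tried offline with a hand-written article dictionary.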
The inspiration behind collecting this dataset stems from the growing interest in NLP applications and the need for high-quality, real-world data to train and evaluate these models effectively. By leveraging the NewsAPI, we aim to contribute to the development of robust text summarization and sentiment analysis models that can better understand and process news content.
Note:
Please refer to the NewsAPI documentation for terms of use and ensure compliance with their policies when using this dataset.
CREATE TABLE data (
"article_id" BIGINT,
"source_id" VARCHAR,
"source_name" VARCHAR,
"author" VARCHAR,
"title" VARCHAR,
"description" VARCHAR,
"url" VARCHAR,
"url_to_image" VARCHAR,
"published_at" VARCHAR,
"content" VARCHAR,
"category" VARCHAR,
"full_content" VARCHAR
);
CREATE TABLE rating (
"article_id" BIGINT,
"source_id" VARCHAR,
"source_name" VARCHAR,
"author" VARCHAR,
"title" VARCHAR,
"description" VARCHAR,
"url" VARCHAR,
"url_to_image" VARCHAR,
"published_at" VARCHAR,
"content" VARCHAR,
"category" VARCHAR,
"article" VARCHAR,
"title_sentiment" VARCHAR
);
CREATE TABLE raw_data (
"article_id" VARCHAR,
"source_id" VARCHAR,
"source_name" VARCHAR,
"author" VARCHAR,
"title" VARCHAR,
"description" VARCHAR,
"url" VARCHAR,
"url_to_image" VARCHAR,
"published_at" TIMESTAMP,
"content" VARCHAR,
"category" VARCHAR
);
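To show how the schema can be queried, here is a minimal sketch that recreates a cut-down `rating` table in an in-memory SQLite database and counts headlines per category and sentiment label. The sample rows and the label values ("Positive", "Negative", "Neutral") are made up for illustration, and only the three columns the query needs are created; the column names themselves come from the schema above.

```python
import sqlite3


def sentiment_counts(rows):
    """Count articles per (category, title_sentiment) pair.

    rows: iterable of (article_id, category, title_sentiment) tuples.
    """
    conn = sqlite3.connect(":memory:")
    try:
        # Cut-down version of the `rating` table: only the columns used below.
        conn.execute(
            """
            CREATE TABLE rating (
                "article_id" BIGINT,
                "category" VARCHAR,
                "title_sentiment" VARCHAR
            )
            """
        )
        conn.executemany(
            'INSERT INTO rating ("article_id", "category", "title_sentiment") '
            "VALUES (?, ?, ?)",
            rows,
        )
        return conn.execute(
            """
            SELECT "category", "title_sentiment", COUNT(*) AS n
            FROM rating
            GROUP BY "category", "title_sentiment"
            ORDER BY "category", "title_sentiment"
            """
        ).fetchall()
    finally:
        conn.close()


# Hypothetical sample rows, not taken from the dataset.
sample_rows = [
    (1, "Technology", "Positive"),
    (2, "Technology", "Negative"),
    (3, "Technology", "Positive"),
    (4, "Health", "Neutral"),
]

for category, sentiment, n in sentiment_counts(sample_rows):
    print(category, sentiment, n)
```

The same `SELECT` statement should carry over to the full `rating` table in any engine that accepts standard SQL with double-quoted identifiers.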