Comprehensive News Articles Dataset

An All-Encompassing Dataset of News Articles for Multi-Domain Analysis

@kaggle.khushikhushikhushi_comprehensive_news_articles_dataset

About this Dataset

This dataset is a collection of news articles gathered from various sources, spanning seven categories: Technology, Sports, Finance, Politics, Education, Health, and Entertainment. It is designed to provide diverse data for natural language processing (NLP) tasks such as sentiment analysis and topic modeling, as well as other machine learning applications.

Dataset Overview

The dataset includes articles from the following categories:

  • Technology
  • Sports
  • Finance
  • Politics
  • Education
  • Health
  • Entertainment

Each article is accompanied by the following attributes:

  • source: The source from which the article was retrieved.
  • author: The author of the article.
  • title: The title of the article.
  • description: A brief description or summary of the article.
  • url: The URL of the full article.
  • urlToImage: The URL to an image associated with the article.
  • publishedAt: The publication date of the article.
  • content: The full content of the article.
  • category: The category to which the article belongs.
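
As an illustration of how these attributes might be read into typed records, here is a minimal Python sketch. It assumes the data is available as a CSV file whose column names match the attribute names above, and that publishedAt holds an ISO 8601 timestamp; neither is confirmed by this page. The Article class, the helper functions, and the two sample rows are hypothetical and exist only for the example.

```python
import csv
import io
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass
class Article:
    """One row of the dataset; field names mirror the attribute list."""
    source: str
    author: Optional[str]
    title: str
    description: Optional[str]
    url: str
    urlToImage: Optional[str]
    publishedAt: Optional[datetime]
    content: Optional[str]
    category: str


def parse_timestamp(value: str) -> Optional[datetime]:
    # Assumption: timestamps look like 2024-01-15T09:30:00Z.
    # Anything that does not parse is kept as None rather than guessed.
    if not value:
        return None
    try:
        return datetime.fromisoformat(value.replace("Z", "+00:00"))
    except ValueError:
        return None


def read_articles(handle) -> List[Article]:
    """Read articles from an open CSV file or any file-like object."""
    articles = []
    for row in csv.DictReader(handle):
        articles.append(Article(
            source=row["source"],
            author=row["author"] or None,  # empty cell becomes None
            title=row["title"],
            description=row["description"] or None,
            url=row["url"],
            urlToImage=row["urlToImage"] or None,
            publishedAt=parse_timestamp(row["publishedAt"]),
            content=row["content"] or None,
            category=row["category"],
        ))
    return articles


# Hypothetical sample rows, not taken from the dataset.
SAMPLE_CSV = """source,author,title,description,url,urlToImage,publishedAt,content,category
Example Wire,Jane Doe,Chip sales rise,Quarterly summary,https://example.com/a,https://example.com/a.jpg,2024-01-15T09:30:00Z,Chip sales rose this quarter.,Technology
Example Wire,,Cup final preview,,https://example.com/b,,not-a-date,The final is on Sunday.,Sports
"""

articles = read_articles(io.StringIO(SAMPLE_CSV))
for article in articles:
    print(article.category, article.title, article.publishedAt)
```

To use the sketch with a real export, pass an opened file to read_articles in place of the in-memory sample, for example open(path, newline="", encoding="utf-8"), where path is wherever the file was saved.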

Usage

This dataset can be used for a variety of tasks including:

  • Text classification
  • Sentiment analysis
  • Topic modeling
  • Named entity recognition
  • Other NLP and machine learning tasks
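
To make the text classification use case concrete, the following is a small sketch of a multinomial Naive Bayes baseline that predicts category from article text, using only the Python standard library. It is a toy illustration under stated assumptions, not a method associated with the dataset: the training sentences are hypothetical, and the function names (tokenize, train, predict) are invented for this example. A real experiment would train on the title, description, or content columns and evaluate on held-out articles.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and keep runs of ASCII letters as tokens."""
    return re.findall(r"[a-z]+", text.lower())


def train(examples):
    """Count words per category from (text, category) pairs."""
    word_counts = defaultdict(Counter)  # category -> token -> count
    doc_counts = Counter()              # category -> number of documents
    vocab = set()
    for text, category in examples:
        tokens = tokenize(text)
        word_counts[category].update(tokens)
        doc_counts[category] += 1
        vocab.update(tokens)
    return word_counts, doc_counts, vocab


def predict(model, text):
    """Return the category with the highest log-probability score."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    best_category, best_score = None, -math.inf
    for category in doc_counts:
        # Log prior from the share of training documents in this category.
        score = math.log(doc_counts[category] / total_docs)
        total_words = sum(word_counts[category].values())
        for token in tokenize(text):
            if token not in vocab:
                continue  # ignore words never seen in training
            # Add-one (Laplace) smoothing avoids log(0) for unseen pairs.
            count = word_counts[category][token]
            score += math.log((count + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_category, best_score = category, score
    return best_category


# Hypothetical training sentences, not taken from the dataset.
TRAINING = [
    ("The new smartphone chip doubles battery life", "Technology"),
    ("Startup releases open source software update", "Technology"),
    ("The striker scored twice in the cup final", "Sports"),
    ("Team wins the league after a late goal", "Sports"),
]

model = train(TRAINING)
sports_guess = predict(model, "Late goal decides the final")
tech_guess = predict(model, "Software update improves battery life")
print(sports_guess, tech_guess)
```

With only four training sentences the model is far too small to be useful; the point is the shape of the pipeline (tokenize, count per category, score with smoothing), which carries over to the full set of categories.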
