Baselight

IngestRSS-News-Data

A Large-Scale Dataset of 130K+ Articles - Content Analysis & News Classification

@kaggle.chasegormley_ingestrss_news_data_november

About this Dataset

Global RSS Feed Dataset: A Comprehensive News and Content Collection

This dataset contains over 130,000 article entries (about 37,800 of them unique) collected from RSS feeds across various domains. The data spans news sources, tech publications, sports websites, and other content providers, making it suitable for text analysis, content classification, and trend detection.

Ingestion System

This dataset was ingested using IngestRSS, an open-source, real-time, low-cost news ingestion system.

Dataset Details

Total entries: 130,388
Unique articles: 37,817
Unique RSS sources: 688
Time period: November 2024
Format: CSV files split into multiple parts (November-1.csv, November-2.csv, November-3.csv)
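
As a rough sketch of how the figures above could be re-derived from the CSV parts, the following Python reads the parts in sequence and counts entries, distinct articles, and distinct sources. It assumes the CSV files carry the same column names as the table schema on this page (`article_id`, `rss_id`), and that uniqueness is defined by those two ID columns; neither is confirmed by the dataset description. The helper names `iter_rows` and `summarize_rows` are hypothetical.

```python
import csv

# Article bodies can be long, so raise the csv module's default per-field limit.
csv.field_size_limit(10**9)


def iter_rows(paths):
    """Yield one dict per CSV row, reading the given part files in order."""
    for path in paths:
        with open(path, newline="", encoding="utf-8") as handle:
            yield from csv.DictReader(handle)


def summarize_rows(rows):
    """Count entries, distinct article_id values, and distinct rss_id values.

    Assumption: 'article_id' and 'rss_id' identify articles and feeds.
    """
    total = 0
    article_ids = set()
    rss_ids = set()
    for row in rows:
        total += 1
        article_ids.add(row["article_id"])
        rss_ids.add(row["rss_id"])
    return {
        "total_entries": total,
        "unique_articles": len(article_ids),
        "unique_rss_sources": len(rss_ids),
    }
```

With the files downloaded locally, `summarize_rows(iter_rows(["November-1.csv", "November-2.csv", "November-3.csv"]))` would return the three counts. Streaming rows one at a time avoids holding the full article text in memory.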

Tables

Aggregated November

@kaggle.chasegormley_ingestrss_news_data_november.aggregated_november
  • 383.14 MB
  • 130388 rows
  • 8 columns

CREATE TABLE aggregated_november (
  "link" VARCHAR,
  "rss" VARCHAR,
  "title" VARCHAR,
  "content" VARCHAR,
  "unixtime" BIGINT,
  "rss_id" VARCHAR,
  "article_id" VARCHAR,
  "unixtime_24f134" BIGINT
);
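
To illustrate how this schema might be queried, here is a minimal sketch that recreates the table in an in-memory SQLite database and counts distinct articles per calendar day. It assumes `unixtime` holds seconds since the Unix epoch in UTC, which the column name suggests but the page does not state; the purpose of `unixtime_24f134` is not documented, so it is left unused. SQLite types stand in for `VARCHAR` and `BIGINT`, and the helper names `make_db` and `articles_per_day` are hypothetical.

```python
import sqlite3

# Same columns as the table on this page, expressed with SQLite type names.
SCHEMA = """
CREATE TABLE aggregated_november (
  "link" TEXT,
  "rss" TEXT,
  "title" TEXT,
  "content" TEXT,
  "unixtime" INTEGER,
  "rss_id" TEXT,
  "article_id" TEXT,
  "unixtime_24f134" INTEGER
)
"""


def make_db(rows):
    """Create an in-memory database and load 8-column row tuples into it."""
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    conn.executemany(
        "INSERT INTO aggregated_november VALUES (?, ?, ?, ?, ?, ?, ?, ?)", rows
    )
    return conn


def articles_per_day(conn):
    """Return (day, distinct article count) pairs ordered by day.

    Assumption: "unixtime" is epoch seconds, interpreted as UTC.
    """
    query = """
        SELECT date("unixtime", 'unixepoch') AS day,
               COUNT(DISTINCT "article_id") AS articles
        FROM aggregated_november
        GROUP BY day
        ORDER BY day
    """
    return conn.execute(query).fetchall()
```

Counting `DISTINCT "article_id"` rather than rows keeps repeated entries for the same article from inflating the daily totals.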
