IngestRSS-News-Data
A Large-Scale Dataset of 130K+ Articles - Content Analysis & News Classification
@kaggle.chasegormley_ingestrss_news_data_november
This dataset contains over 130,000 articles and content pieces collected from RSS feeds across various domains. The data spans multiple news sources, tech publications, sports websites, and other content providers, offering a rich source for text analysis, content classification, and trend detection.
The dataset was collected with IngestRSS, an open-source, real-time, low-cost news ingestion system.
Total entries: 130,388
Unique articles: 37,817
Unique RSS sources: 688
Time period: November 2024
Format: CSV files split into multiple parts (November-1.csv, November-2.csv, November-3.csv)
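Because the data is split across several CSV parts and the total entry count is much larger than the unique article count, a typical first step is to read the parts together and drop repeats. The following is a minimal sketch using only the Python standard library. It assumes that each CSV file has a header row whose names match the table columns, and that `article_id` identifies a unique article; neither is confirmed by the dataset description. The helper names `read_parts` and `unique_by` are hypothetical.

```python
import csv
from pathlib import Path


def read_parts(paths):
    """Yield one dict per CSV row, reading the part files in the order given."""
    for path in paths:
        with Path(path).open(newline="", encoding="utf-8") as handle:
            # DictReader takes column names from the header row (assumed present).
            yield from csv.DictReader(handle)


def unique_by(rows, key="article_id"):
    """Keep the first row seen for each distinct value of `key`.

    Assumption: `article_id` marks a unique article; pass key="link"
    to deduplicate on the URL instead.
    """
    seen = set()
    kept = []
    for row in rows:
        if row[key] not in seen:
            seen.add(row[key])
            kept.append(row)
    return kept


# Usage with the file names listed above (paths relative to the download folder):
#   rows = list(read_parts(["November-1.csv", "November-2.csv", "November-3.csv"]))
#   articles = unique_by(rows)
```

If very long `content` values trigger a field-size error from the `csv` module, raising the limit with `csv.field_size_limit` before reading is one possible workaround.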
CREATE TABLE aggregated_november (
"link" VARCHAR,
"rss" VARCHAR,
"title" VARCHAR,
"content" VARCHAR,
"unixtime" BIGINT,
"rss_id" VARCHAR,
"article_id" VARCHAR,
"unixtime_24f134" BIGINT -- Unixtime
);
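The `unixtime` and `rss` columns lend themselves to simple time and source breakdowns. The sketch below assumes that `unixtime` holds seconds since the Unix epoch (not milliseconds) and that `rss` holds the feed URL an entry came from; both are inferences from the column names rather than documented facts. The helper names `to_utc` and `entries_per_feed` are hypothetical, and rows are expected as dicts keyed by column name.

```python
from collections import Counter
from datetime import datetime, timezone


def to_utc(unixtime):
    """Convert a `unixtime` value to a timezone-aware UTC datetime.

    Assumption: the value is in seconds since the Unix epoch.
    Accepts an int or a numeric string, as read from a CSV file.
    """
    return datetime.fromtimestamp(int(unixtime), tz=timezone.utc)


def entries_per_feed(rows):
    """Count how many entries each feed contributed, keyed by the `rss` value."""
    return Counter(row["rss"] for row in rows)
```

For example, `to_utc(1730419200)` returns midnight UTC on 1 November 2024, and `entries_per_feed(rows).most_common(10)` lists the ten feeds with the most entries.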