Yet Another Chinese News Dataset
With Article Titles, Descriptions, Cover Images, and Links.
@kaggle.ceshine_yet_another_chinese_news_dataset
A collection of news articles in Traditional and Simplified Chinese. It includes some Internet news outlets that are NOT Chinese state media (state media deserve a separate dataset).
Complete coverage is not guaranteed, so this dataset is not suitable for analyzing event coverage. It is meant to be used as a corpus for NLP algorithms.
Note: Only minimal text cleaning has been performed on the meta tags.
title: og:title or twitter:title meta tag.
desc: twitter:description or og:description meta tag.
image: twitter:image or og:image meta tag.
This dataset does not provide the full text of the articles. You'll need to scrape it yourself using the links provided.
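The meta-tag mapping above can be sketched in a few lines of Python. This is only an illustration, not the dataset author's actual scraper: the class and function names (`MetaTagParser`, `first_meta`, `extract_fields`) are hypothetical, and treating the first tag named in each pair as the preferred one is an assumption. It uses only the standard library and works on an HTML string you have already downloaded.

```python
from html.parser import HTMLParser


class MetaTagParser(HTMLParser):
    """Collect the content of <meta property=...> and <meta name=...> tags."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attrs = dict(attrs)
        # Open Graph tags usually use "property"; Twitter tags usually use "name".
        key = attrs.get("property") or attrs.get("name")
        content = attrs.get("content")
        # Keep the first occurrence of each key and skip empty values.
        if key and content and key not in self.meta:
            self.meta[key] = content.strip()


def first_meta(meta, *keys):
    """Return the value of the first key present in meta, or None."""
    for key in keys:
        if key in meta:
            return meta[key]
    return None


def extract_fields(html):
    """Map meta tags to the title/desc/image columns (assumed preference order)."""
    parser = MetaTagParser()
    parser.feed(html)
    m = parser.meta
    return {
        "title": first_meta(m, "og:title", "twitter:title"),
        "desc": first_meta(m, "twitter:description", "og:description"),
        "image": first_meta(m, "twitter:image", "og:image"),
    }
```

Calling `extract_fields(page_html)` returns a dict with `None` for any field whose tags are missing from the page. Fetching the page and extracting the article body are left out, since both depend on the individual news site.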
CREATE TABLE news_collection (
"title" VARCHAR,
"desc" VARCHAR,
"image" VARCHAR,
"url" VARCHAR,
"source" VARCHAR,
"date" BIGINT
);
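As a rough sketch of how the schema above might be used, the following Python loads it into an in-memory SQLite database and counts articles per source. The helper names (`build_demo_db`, `articles_per_source`) are hypothetical, SQLite is just a convenient stand-in for whatever engine hosts the table, and rows must be supplied by the caller. The encoding of the `date` column is not stated here, so the sketch does not interpret it.

```python
import sqlite3

# Schema copied from the table definition above; "desc" and "date" stay quoted
# because DESC is a reserved word in SQL.
SCHEMA = """
CREATE TABLE news_collection (
    "title" VARCHAR,
    "desc" VARCHAR,
    "image" VARCHAR,
    "url" VARCHAR,
    "source" VARCHAR,
    "date" BIGINT
);
"""


def build_demo_db(rows):
    """Create an in-memory database and insert (title, desc, image, url, source, date) rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    conn.executemany(
        "INSERT INTO news_collection VALUES (?, ?, ?, ?, ?, ?)", rows
    )
    return conn


def articles_per_source(conn):
    """Return a {source: article_count} dict."""
    cur = conn.execute(
        'SELECT "source", COUNT(*) FROM news_collection '
        'GROUP BY "source" ORDER BY "source"'
    )
    return dict(cur.fetchall())
```

Because coverage is incomplete, per-source counts like these describe the corpus itself, not how much each outlet actually published.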