Yet Another Chinese News Dataset
@kaggle.ceshine_yet_another_chinese_news_dataset
A collection of news articles in Traditional and Simplified Chinese. It includes some Internet news outlets that are NOT Chinese state media (they deserve a separate dataset).
Complete coverage is not guaranteed, so this dataset is not suitable for analyzing event coverage. It is meant to be used as a corpus for NLP algorithms.
Note: Only minimal text cleaning has been performed on the meta tags.
og:title or twitter:title meta tag.
twitter:description or og:description meta tag.
twitter:image or og:image meta tag.
This dataset does not provide the full text of the articles. You'll need to scrape it yourself using the links provided.
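As a rough illustration of how the meta tags named here can be read from an article page, the sketch below parses an HTML string with Python's standard-library `html.parser`. The output keys `title`, `description`, and `image` are hypothetical names chosen for this example, and the fallback order (the first tag named in each pair is preferred) is an assumption rather than a documented property of the dataset. Fetching the page from a link in the dataset is left out.

```python
from html.parser import HTMLParser


class MetaTagParser(HTMLParser):
    """Collects <meta> tags keyed by their `property` or `name` attribute."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        # Open Graph tags usually use `property`, Twitter tags usually use `name`.
        key = attr.get("property") or attr.get("name")
        content = attr.get("content")
        # Keep the first occurrence of each tag and ignore empty values.
        if key and content and key not in self.meta:
            self.meta[key] = content.strip()


def extract_fields(html):
    """Return the three meta-derived fields (hypothetical key names)."""
    parser = MetaTagParser()
    parser.feed(html)
    meta = parser.meta
    return {
        # Assumed preference order: first tag named in each pair wins.
        "title": meta.get("og:title") or meta.get("twitter:title"),
        "description": meta.get("twitter:description") or meta.get("og:description"),
        "image": meta.get("twitter:image") or meta.get("og:image"),
    }


if __name__ == "__main__":
    # Made-up HTML for illustration only; not taken from the dataset.
    sample_html = (
        '<html><head>'
        '<meta property="og:title" content="Example headline">'
        '<meta name="twitter:description" content="Example summary">'
        '<meta property="og:image" content="https://example.com/cover.jpg">'
        '</head><body></body></html>'
    )
    print(extract_fields(sample_html))
```

A field whose tags are both missing from the page comes back as `None`, so downstream code should handle that case.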