Baselight

Vietnamese Online News .csv Dataset

A dataset of 150K+ news articles

@kaggle.sarahhimeko_vietnamese_online_news_csv_dataset

About this Dataset

The dataset was originally in .json format, so I converted it to .csv to make it easier to process.

If you want to work with the original .json file, you can find it at: https://www.kaggle.com/datasets/haitranquangofficial/vietnamese-online-news-dataset
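The .json-to-.csv conversion described above can be sketched with Python's standard library. This is a minimal sketch, not the author's actual script: it assumes the original file's top level is a JSON array of flat article objects (the layout of the Kaggle .json file is not described here), and the function name `json_records_to_csv` and any file paths are hypothetical.

```python
import csv
import json


def json_records_to_csv(json_path: str, csv_path: str) -> int:
    """Convert a JSON array of article objects into a CSV file.

    Assumes (hypothetically) that the JSON top level is a list of flat
    dicts. Returns the number of rows written.
    """
    with open(json_path, encoding="utf-8") as f:
        records = json.load(f)

    # Collect every key seen in any record, in first-seen order, so records
    # with missing fields still line up with the header row.
    fieldnames: dict[str, None] = {}
    for record in records:
        for key in record:
            fieldnames.setdefault(key, None)

    # newline="" lets the csv module control line endings; UTF-8 keeps
    # Vietnamese diacritics intact.
    with open(csv_path, "w", encoding="utf-8", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=list(fieldnames))
        writer.writeheader()
        writer.writerows(records)  # missing keys are written as ""

    return len(records)
```

Nested JSON values, if any, would be written as their Python string form. `json.load` also reads the whole file into memory, which is workable at a few hundred megabytes but may call for a streaming parser on larger inputs.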

"Online articles from the 25 most popular news sites in Vietnam in July 2022, suitable for practicing Natural Language Processing in Vietnamese.

Online news outlets are an unavoidable part of our society today because they are easy to access and mostly free. Their effects on the way communities think and act are becoming a concern for many groups, including legislators, content creators, and marketers, to name a few. Beyond these effects, what is written in the news should be a good reflection of people's will, attention, and even cultural standards.

In Vietnam, even though journalists have received much criticism, especially in recent years, news outlets still receive a large share of traffic (27%) compared with other sources of information."

Tables

Original

@kaggle.sarahhimeko_vietnamese_online_news_csv_dataset.original
  • 24.51 KB
  • 32 rows
  • 25 columns

CREATE TABLE original (
  "laodong" VARCHAR,
  "zingnews" VARCHAR,
  "thanhnien_vn" VARCHAR,
  "n_24h_com_vn" VARCHAR,
  "vtv_vn" VARCHAR,
  "vnexpress" VARCHAR,
  "danviet" VARCHAR,
  "dantri" VARCHAR,
  "vov_vn" VARCHAR,
  "tienphong" VARCHAR,
  "tuoitre_vn" VARCHAR,
  "soha" VARCHAR,
  "nld" VARCHAR,
  "kenh14" VARCHAR,
  "vietnamnet_vn" VARCHAR,
  "docbao_vn" VARCHAR,
  "cafebiz" VARCHAR,
  "vtc_vn" VARCHAR,
  "bongdaplus" VARCHAR,
  "baoquocte" VARCHAR,
  "anninhthudo" VARCHAR,
  "eva_vn" VARCHAR,
  "thethaovanhoa" VARCHAR,
  "baochinhphu" VARCHAR,
  "qdnd_vn" VARCHAR
);
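The column names in `original` (and `target`) look like news-site names normalized into SQL identifiers, e.g. `thanhnien_vn` for thanhnien.vn and `n_24h_com_vn` for 24h.com.vn. The sketch below reproduces that apparent rule so raw headers can be matched to column names. The rule is inferred from the names in these schemas, not documented by the source, and the helper `to_column_name` is hypothetical.

```python
import re


def to_column_name(header: str) -> str:
    """Guess the SQL column name derived from a raw CSV header.

    Inferred (undocumented) rule: lowercase, replace each run of
    non-alphanumeric characters with "_", trim stray underscores, and
    prefix "n_" when the result starts with a digit.
    """
    name = re.sub(r"[^0-9a-z]+", "_", header.lower()).strip("_")
    if name[:1].isdigit():
        # SQL identifiers generally cannot start with a digit.
        name = "n_" + name
    return name
```

The same rule also appears to explain the `unnamed_0` and `unnamed_0_1` columns in the news tables, which would come from headers such as `Unnamed: 0` and `Unnamed: 0.1`.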

Original News Dataset

@kaggle.sarahhimeko_vietnamese_online_news_csv_dataset.original_news_dataset
  • 307.76 MB
  • 184539 rows
  • 11 columns

CREATE TABLE original_news_dataset (
  "unnamed_0" BIGINT,
  "id" BIGINT,
  "author" VARCHAR,
  "content" VARCHAR,
  "picture_count" BIGINT,
  "processed" BIGINT,
  "source" VARCHAR,
  "title" VARCHAR,
  "topic" VARCHAR,
  "url" VARCHAR,
  "crawled_at" TIMESTAMP
);
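To illustrate querying `original_news_dataset`, here is a sketch that loads the schema above into an in-memory SQLite database and counts articles per `source`. SQLite is only a stand-in for whatever engine you actually use; the inserted rows are made-up placeholders, not records from the dataset, and the `source` values shown are assumed.

```python
import sqlite3

# DDL copied from the original_news_dataset schema; SQLite accepts these
# type names (BIGINT, VARCHAR, TIMESTAMP) through its type-affinity rules.
SCHEMA = """
CREATE TABLE original_news_dataset (
  "unnamed_0" BIGINT,
  "id" BIGINT,
  "author" VARCHAR,
  "content" VARCHAR,
  "picture_count" BIGINT,
  "processed" BIGINT,
  "source" VARCHAR,
  "title" VARCHAR,
  "topic" VARCHAR,
  "url" VARCHAR,
  "crawled_at" TIMESTAMP
);
"""


def articles_per_source(conn: sqlite3.Connection) -> list[tuple[str, int]]:
    """Return (source, number of articles) pairs, most articles first."""
    return conn.execute(
        """
        SELECT source, COUNT(*) AS n
        FROM original_news_dataset
        GROUP BY source
        ORDER BY n DESC, source
        """
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
# Made-up placeholder rows for illustration; not records from the dataset.
conn.executemany(
    'INSERT INTO original_news_dataset ("id", "source", "title") VALUES (?, ?, ?)',
    [(1, "vnexpress", "placeholder A"),
     (2, "vnexpress", "placeholder B"),
     (3, "dantri", "placeholder C")],
)
print(articles_per_source(conn))  # [('vnexpress', 2), ('dantri', 1)]
```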

Fixed News Dataset

@kaggle.sarahhimeko_vietnamese_online_news_csv_dataset.fixed_news_dataset
  • 217.59 MB
  • 184539 rows
  • 12 columns

CREATE TABLE fixed_news_dataset (
  "unnamed_0_1" BIGINT,
  "unnamed_0" BIGINT,
  "id" BIGINT,
  "author" VARCHAR,
  "content" VARCHAR,
  "picture_count" BIGINT,
  "processed" BIGINT,
  "source" VARCHAR,
  "title" VARCHAR,
  "topic" VARCHAR,
  "url" VARCHAR,
  "crawled_at" TIMESTAMP
);
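`fixed_news_dataset` has the same columns as `original_news_dataset` plus `unnamed_0_1`. Both `unnamed_*` columns look like row indices left over from saving a table to CSV together with its index (pandas labels an unnamed CSV column `Unnamed: 0`, and a duplicated label gets a `.1` suffix). That origin is an inference, not something the source states. The sketch below drops any column whose raw header starts with `Unnamed:` while reading with the standard `csv` module; the helper name and the sample data are hypothetical.

```python
import csv
import io
from typing import Iterable, Iterator


def read_without_index_columns(lines: Iterable[str]) -> Iterator[dict[str, str]]:
    """Yield CSV rows as dicts, skipping leftover index columns.

    Assumes the raw headers behind unnamed_0 / unnamed_0_1 start with
    "Unnamed:" (the pandas convention); this is an inference.
    """
    reader = csv.DictReader(lines)
    # Accessing fieldnames reads the header row; it is None for empty input.
    keep = [name for name in (reader.fieldnames or []) if not name.startswith("Unnamed:")]
    for row in reader:
        yield {name: row[name] for name in keep}


# Hypothetical sample mimicking the fixed_news_dataset header layout.
sample = io.StringIO(
    "Unnamed: 0.1,Unnamed: 0,id,source,title\n"
    "0,0,101,vnexpress,placeholder title\n"
)
rows = list(read_without_index_columns(sample))
print(rows)  # [{'id': '101', 'source': 'vnexpress', 'title': 'placeholder title'}]
```

For the real file you would pass an open file object such as `open(path, encoding="utf-8", newline="")`. If an article body exceeds the `csv` module's default field size limit, `csv.field_size_limit` can raise it.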

Target

@kaggle.sarahhimeko_vietnamese_online_news_csv_dataset.target
  • 19.94 KB
  • 32 rows
  • 25 columns

CREATE TABLE target (
  "laodong" VARCHAR,
  "zingnews" VARCHAR,
  "thanhnien_vn" VARCHAR,
  "n_24h_com_vn" VARCHAR,
  "vtv_vn" VARCHAR,
  "vnexpress" VARCHAR,
  "danviet" VARCHAR,
  "dantri" VARCHAR,
  "vov_vn" VARCHAR,
  "tienphong" VARCHAR,
  "tuoitre_vn" VARCHAR,
  "soha" VARCHAR,
  "nld" VARCHAR,
  "kenh14" VARCHAR,
  "vietnamnet_vn" VARCHAR,
  "docbao_vn" VARCHAR,
  "cafebiz" VARCHAR,
  "vtc_vn" VARCHAR,
  "bongdaplus" VARCHAR,
  "baoquocte" VARCHAR,
  "anninhthudo" VARCHAR,
  "eva_vn" VARCHAR,
  "thethaovanhoa" VARCHAR,
  "baochinhphu" VARCHAR,
  "qdnd_vn" VARCHAR
);
