The Bangla News Classification dataset is a collection of news articles in the Bengali (Bangla) language from the Jamuna TV website. It contains over 11,000 rows and is designed for machine learning and natural language processing (NLP) tasks.

Key Features:

Text Articles: News articles covering events and updates across a range of topics.

Categories: Articles are organized into five main categories:

Sports
All-Bangladesh
International
Entertainment
National

Metadata: Each article includes:

Title: The headline.
Published Date: The date and time of publication.
Reporter: The name of the reporter (if available).
Category: The category of the article.
URL: The link to the full article.
Content: A brief summary or excerpt.
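
As a rough illustration of the fields listed above, one row could be modeled as a small typed record. This is only a sketch: the column names ("Title", "Published Date", "Reporter", "Category", "URL", "Content") are assumed to match the CSV header exactly, which is not confirmed here, and the `Article` class and `article_from_row` helper are hypothetical names introduced for this example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Article:
    title: str
    published_date: str       # kept as raw text; the date format is not documented here
    reporter: Optional[str]   # None when no reporter is given
    category: str
    url: str
    content: str


def article_from_row(row: dict) -> Article:
    """Build an Article from one CSV row keyed by the ASSUMED column names."""

    def clean(key: str) -> str:
        # Missing or None values become an empty string before stripping.
        return (row.get(key) or "").strip()

    return Article(
        title=clean("Title"),
        published_date=clean("Published Date"),
        reporter=clean("Reporter") or None,  # empty string -> None
        category=clean("Category"),
        url=clean("URL"),
        content=clean("Content"),
    )
```

Treating an empty Reporter value as `None` reflects the "if available" note on that field; the publication date is left unparsed because its format would need to be checked against the actual file.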

Language: The dataset is entirely in Bengali, which makes it suited to NLP tasks specific to this language.

Applications:

Text Classification: Training models to automatically categorize articles.
Sentiment Analysis: Assessing the sentiment expressed in articles.
Information Retrieval: Developing systems to find relevant articles based on queries.
Language Modeling: Creating language models and tools for Bengali.
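
To make the text classification use case more concrete, here is a minimal sketch of a multinomial Naive Bayes classifier written with only the Python standard library. It is not a method prescribed by the dataset; the `NaiveBayes` class and `tokenize` function are hypothetical names, and whitespace splitting is a crude stand-in for a proper Bengali tokenizer.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Assumption: whitespace splitting; real Bengali text would likely need
    # punctuation handling and normalization as well.
    return text.split()


class NaiveBayes:
    """Multinomial Naive Bayes with additive (Laplace) smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha
        self.class_counts = Counter()            # documents seen per label
        self.word_counts = defaultdict(Counter)  # token counts per label
        self.vocab = set()

    def fit(self, texts, labels):
        for text, label in zip(texts, labels):
            self.class_counts[label] += 1
            for tok in tokenize(text):
                self.word_counts[label][tok] += 1
                self.vocab.add(tok)
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        vocab_size = len(self.vocab)
        tokens = [t for t in tokenize(text) if t in self.vocab]  # skip unseen words
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(n_docs / total_docs)
            denom = sum(self.word_counts[label].values()) + self.alpha * vocab_size
            for tok in tokens:
                score += math.log((self.word_counts[label][tok] + self.alpha) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label
```

In practice the model would be fitted on the article text with the category as the label, for example `NaiveBayes().fit(texts, labels)`, and evaluated on held-out rows. A stronger baseline would probably use TF-IDF features or a pretrained multilingual model, but the sketch shows the basic train-and-predict loop.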

Usage:

Research: A useful resource for NLP research related to the Bengali language.
Education: Suitable for teaching machine learning and NLP in educational settings.
Application Development: Assists in developing applications for processing Bengali text, such as news aggregators and recommendation systems.

Availability:

Access: Usually available through academic institutions, research repositories, or directly from publishers.
Format: Provided in CSV format for easy integration with NLP tools.
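
Since the data ships as CSV, a first sanity check might be to count rows per category. The sketch below uses only the standard library and assumes a header column named "Category" and a UTF-8 encoded file; both are assumptions, and `count_by_category` and `count_file` are hypothetical helper names.

```python
import csv
from collections import Counter
from typing import Iterable


def count_by_category(lines: Iterable[str], column: str = "Category") -> Counter:
    """Count rows per label in CSV text whose first line is the header."""
    counts = Counter()
    for row in csv.DictReader(lines):
        label = (row.get(column) or "").strip()
        if label:  # skip rows with a missing or empty label
            counts[label] += 1
    return counts


def count_file(path: str) -> Counter:
    # utf-8-sig also accepts files that start with a byte-order mark;
    # newline="" is the setting the csv module documentation recommends.
    with open(path, encoding="utf-8-sig", newline="") as handle:
        return count_by_category(handle)
```

Calling `count_file` with the local path of the downloaded file returns a `Counter` whose keys can be compared against the five categories listed in this description, which also reveals how balanced the classes are.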

Conclusion:

The Bangla News Classification dataset from Jamuna TV is a valuable resource for advancing research and applications in NLP for the Bengali language. It helps improve text classification, sentiment analysis, and the understanding of linguistic nuances in Bangladeshi media.
