Baselight

Over 11,500 Bangla News For NLP

This dataset, which contains over 11,500 news articles, is designed for machine learning and natural language processing (NLP) tasks.

@kaggle.durjoychandrapaul_over_11500_bangla_news_for_nlp

About this Dataset

The Bangla News Classification dataset is a large collection of news articles in Bengali (Bangla), collected from the Jamuna TV website. With over 11,500 rows, it is designed for machine learning and natural language processing (NLP) tasks.

Key Features:

Text Articles: News articles covering events, updates, and a broad range of topics.

Categories: Articles are organized into five main categories:

  • Sports
  • All-Bangladesh
  • International
  • Entertainment
  • National

Metadata: Each article includes:

  • Title: The headline.
  • Published Date: The date and time of publication.
  • Reporter: The name of the reporter (if available).
  • Category: The category of the article.
  • URL: The link to the full article.
  • Content: A brief summary or excerpt.
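
As a minimal sketch of how these fields might be read and inspected, the snippet below uses only the Python standard library. The column names are assumptions copied from the list above, and the sample rows are made-up placeholders rather than dataset content; the real CSV header may be spelled differently.

```python
import csv
import io
from collections import Counter

# Assumed column names, taken from the metadata list; the real header may differ.
EXPECTED_COLUMNS = ["Title", "Published Date", "Reporter", "Category", "URL", "Content"]


def read_articles(handle):
    """Read articles from an open CSV file handle into a list of dicts."""
    reader = csv.DictReader(handle)
    missing = [c for c in EXPECTED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    return list(reader)


def category_counts(articles):
    """Count how many articles fall into each category."""
    return Counter(row["Category"].strip() for row in articles)


# Tiny in-memory stand-in for the real file (placeholder rows, hypothetical URLs).
SAMPLE_CSV = """Title,Published Date,Reporter,Category,URL,Content
শিরোনাম এক,2024-01-01 10:00,,Sports,https://example.com/1,খেলার খবর
শিরোনাম দুই,2024-01-02 11:30,,National,https://example.com/2,জাতীয় খবর
শিরোনাম তিন,2024-01-03 09:15,,Sports,https://example.com/3,আরও খেলার খবর
"""

articles = read_articles(io.StringIO(SAMPLE_CSV))
counts = category_counts(articles)
print(counts.most_common())
```

To run this against the downloaded file, open it with `open(path, encoding="utf-8", newline="")` and pass the handle to `read_articles`; the path is whatever name the CSV was saved under. The Reporter field is empty in the sample rows because the list above marks it as optional.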

Language: The dataset is entirely in Bengali, which makes it suited to NLP tasks specific to this language.

Applications:

  • Text Classification: Training models to automatically categorize articles.
  • Sentiment Analysis: Assessing the sentiment expressed in articles.
  • Information Retrieval: Developing systems to find relevant articles based on queries.
  • Language Modeling: Creating language models and tools for Bengali.
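
To illustrate the text classification use case, here is a rough baseline sketch: a multinomial Naive Bayes classifier with add-one smoothing, written with the standard library only. It is not a method prescribed by the dataset; the whitespace tokenizer is a simplifying assumption, and the training sentences are toy placeholders standing in for article content and category labels.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Split on whitespace; a crude baseline tokenizer for space-separated Bengali words."""
    return text.split()


def train_naive_bayes(texts, labels):
    """Collect per-class document counts, per-class word counts, and the vocabulary."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in zip(texts, labels):
        tokens = tokenize(text)
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab


def predict(model, text):
    """Return the class with the highest log-probability under add-one smoothing."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in class_counts.items():
        score = math.log(n_docs / total_docs)  # log prior
        total_words = sum(word_counts[label].values())
        for token in tokenize(text):
            if token not in vocab:
                continue  # ignore words never seen during training
            score += math.log((word_counts[label][token] + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Toy placeholder data; in practice these would be the Content and Category columns.
texts = ["ক্রিকেট ম্যাচ জয়", "ফুটবল ম্যাচ গোল", "সিনেমা মুক্তি নায়ক", "নতুন গান মুক্তি"]
labels = ["Sports", "Sports", "Entertainment", "Entertainment"]

model = train_naive_bayes(texts, labels)
prediction = predict(model, "ফুটবল ম্যাচ")
print(prediction)
```

A baseline like this is mainly useful as a sanity check before moving to stronger models. For a real evaluation, the articles should be split into separate training and test sets so that accuracy is measured on unseen text.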

Usage:

  • Research: A useful resource for NLP research related to the Bengali language.
  • Education: Suitable for teaching machine learning and NLP in educational settings.
  • Application Development: Supports the development of applications that process Bengali text, such as news aggregators and recommendation systems.

Availability:

  • Access: Usually available through academic institutions, research repositories, or directly from publishers.
  • Format: Provided in CSV format for easy integration with NLP tools.

Conclusion:

The Bangla News Classification dataset from Jamuna TV is a valuable resource for advancing research and applications in NLP for the Bengali language. It helps improve text classification, sentiment analysis, and the understanding of linguistic nuances in Bangladeshi media.

File format: csv