BBC Datasets For Sentiment Analysis

@kaggle.amunsentom_article_dataset_2

About this Dataset

Dataset Name: BBC Articles Sentiment Analysis Dataset

Source: BBC News

Description:
This dataset consists of articles from the BBC News website, containing a diverse range of topics such as business, politics, entertainment, technology, sports, and more. The dataset includes articles from various time periods and categories, along with labels representing the sentiment of the article. The sentiment labels indicate whether the tone of the article is positive, negative, or neutral, making it suitable for sentiment analysis tasks.

Number of Instances: Not specified in the source (2,225 articles is given only as an example figure).

Number of Features: 2 modelling features (the remaining columns are listed under Data Fields):

  1. Article Text: The content of the article (string).
  2. Sentiment Label: The sentiment classification of the article. The possible labels are:
    • Positive
    • Negative
    • Neutral

Data Fields:

  • id: Unique identifier for each article.
  • category: The category or topic of the article (e.g., business, politics, sports).
  • title: The title of the article.
  • content: The full text of the article.
  • sentiment: The sentiment label (positive, negative, or neutral).

Example:

id | category | title | content | sentiment
1 | Business | "Stock Market Surge" | "The stock market has surged to new highs, driven by strong earnings..." | Positive
2 | Politics | "Election Results" | "The election results were a mixed bag, with some surprises along the way." | Neutral
3 | Sports | "Team Wins Championship" | "The team won the championship after a thrilling final match." | Positive
4 | Technology | "New Smartphone Release" | "The new smartphone release has received mixed reactions from users." | Negative
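
The fields and example rows above can be checked programmatically before any modelling. The following is a minimal sketch, assuming the data is available as a CSV file whose header matches the five field names listed under Data Fields; the storage format is not stated in this description, and `load_articles`, `EXPECTED_FIELDS`, and `SAMPLE_CSV` are hypothetical names introduced for illustration. The sample text reuses two of the example rows.

```python
import csv
import io

# Column names taken from the "Data Fields" list; CSV storage is an assumption.
EXPECTED_FIELDS = ["id", "category", "title", "content", "sentiment"]
ALLOWED_SENTIMENTS = {"positive", "negative", "neutral"}

# Two example rows from the description, written out as CSV for illustration.
# A real file would be opened with open(path, newline="", encoding="utf-8").
SAMPLE_CSV = '''id,category,title,content,sentiment
1,Business,"Stock Market Surge","The stock market has surged to new highs, driven by strong earnings...",Positive
2,Politics,"Election Results","The election results were a mixed bag, with some surprises along the way.",Neutral
'''


def load_articles(handle):
    """Read rows, check the schema, and normalise label and category casing."""
    reader = csv.DictReader(handle)
    missing = set(EXPECTED_FIELDS) - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")

    rows = []
    for row in reader:
        label = row["sentiment"].strip().lower()
        if label not in ALLOWED_SENTIMENTS:
            raise ValueError(
                f"unexpected sentiment {row['sentiment']!r} for id {row['id']}"
            )
        # The examples use "Positive" while the field list uses "positive",
        # so both label and category are lowercased here.
        row["sentiment"] = label
        row["category"] = row["category"].strip().lower()
        rows.append(row)
    return rows


articles = load_articles(io.StringIO(SAMPLE_CSV))
print(len(articles), articles[0]["sentiment"])
```

Lowercasing the labels is a design choice made here so that "Positive" in the examples and "positive" in the field list map to one class.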

Preprocessing Notes:

  • The text has been preprocessed to remove special characters and any HTML tags that might have been included in the original articles.
  • Tokenization or further text cleaning (e.g., lowercasing, stopword removal) may be necessary depending on the model and method used for sentiment classification.
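
The further cleaning steps mentioned above (lowercasing, tokenization, stopword removal) might look like the sketch below. It uses only the standard library; the stopword list is a small illustrative assumption and not part of the dataset, and `preprocess` is a hypothetical helper name. Many models, such as pretrained transformers, ship their own tokenizers and would not need this step.

```python
import re

# A deliberately tiny stopword list for illustration only (an assumption);
# real projects usually take a fuller list from an NLP library.
STOPWORDS = {
    "a", "an", "and", "the", "to", "of", "has", "by",
    "with", "were", "was", "in", "on",
}

# Lowercase alphanumeric runs, optionally followed by an apostrophe suffix
# (so "article's" stays one token).
TOKEN_RE = re.compile(r"[a-z0-9]+(?:'[a-z]+)?")


def preprocess(text, stopwords=STOPWORDS):
    """Lowercase the text, split it into tokens, and drop stopwords."""
    tokens = TOKEN_RE.findall(text.lower())
    return [tok for tok in tokens if tok not in stopwords]


tokens = preprocess(
    "The stock market has surged to new highs, driven by strong earnings..."
)
print(tokens)
# ['stock', 'market', 'surged', 'new', 'highs', 'driven', 'strong', 'earnings']
```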

Use Case:
This dataset is ideal for training and evaluating machine learning models for sentiment classification, where the goal is to predict the sentiment (positive, negative, or neutral) based on the article's text.
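
As a rough illustration of this use case, the sketch below trains a small multinomial Naive Bayes classifier with add-one smoothing and measures accuracy. Naive Bayes is chosen here only as a simple baseline; the description does not prescribe a model. The training sentences are made-up toy examples, not rows from the dataset, and `NaiveBayes`, `tokenize`, and `accuracy` are hypothetical names.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Whitespace tokenization on lowercased text; enough for a toy example.
    return text.lower().split()


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.label_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        self.vocab = set()
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, text):
        total_docs = sum(self.label_counts.values())
        best_label, best_score = None, -math.inf
        for label, count in self.label_counts.items():
            score = math.log(count / total_docs)  # log prior
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            for tok in tokenize(text):
                if tok in self.vocab:  # words never seen in training are skipped
                    score += math.log((self.word_counts[label][tok] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def accuracy(model, texts, labels):
    correct = sum(model.predict(t) == y for t, y in zip(texts, labels))
    return correct / len(labels)


# Toy data invented for this sketch (NOT taken from the dataset).
train_texts = [
    "profits surged and shares rallied",
    "team celebrates thrilling victory",
    "shares plunged amid heavy losses",
    "fans angry after crushing defeat",
    "committee publishes annual report",
    "council meeting scheduled for tuesday",
]
train_labels = ["positive", "positive", "negative", "negative", "neutral", "neutral"]

model = NaiveBayes().fit(train_texts, train_labels)
test_texts = ["shares rallied after victory", "heavy losses after defeat"]
test_labels = ["positive", "negative"]
print(accuracy(model, test_texts, test_labels))
```

With the real data, the `content` and `sentiment` fields would replace the toy lists, and a held-out split would be needed for a meaningful accuracy figure.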
