Dataset Name: BBC Articles Sentiment Analysis Dataset
Source: BBC News
Description:
This dataset consists of articles from the BBC News website covering topics such as business, politics, entertainment, technology, and sports. The articles span multiple time periods and categories, and each one carries a sentiment label indicating whether its tone is positive, negative, or neutral, which makes the dataset suitable for sentiment analysis tasks.
Number of Instances: [Specify the number of articles in the dataset, for example, 2,225 articles]
Number of Features:
- Article Text: The content of the article (string).
- Sentiment Label: The sentiment classification of the article. The possible labels are:
  - Positive
  - Negative
  - Neutral
Data Fields:
- id: Unique identifier for each article.
- category: The category or topic of the article (e.g., business, politics, sports).
- title: The title of the article.
- content: The full text of the article.
- sentiment: The sentiment label (positive, negative, or neutral).
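The fields above can be sketched as a small typed record with a loader. This is a minimal illustration, not the dataset's official loading code: it assumes the data is stored as a CSV file whose header row uses the field names listed above, and the names `Article`, `load_articles`, and `VALID_SENTIMENTS` are hypothetical.

```python
import csv
import io
from dataclasses import dataclass

# Assumed label set, normalized to lowercase for comparison.
VALID_SENTIMENTS = {"positive", "negative", "neutral"}


@dataclass(frozen=True)
class Article:
    """One record, mirroring the data fields listed above."""
    id: int
    category: str
    title: str
    content: str
    sentiment: str


def load_articles(handle):
    """Read articles from an open CSV handle (assumed storage format)."""
    articles = []
    for row in csv.DictReader(handle):
        sentiment = row["sentiment"].strip().lower()
        if sentiment not in VALID_SENTIMENTS:
            raise ValueError(f"unexpected sentiment label: {row['sentiment']!r}")
        articles.append(
            Article(
                id=int(row["id"]),
                category=row["category"].strip().lower(),
                title=row["title"].strip(),
                content=row["content"].strip(),
                sentiment=sentiment,
            )
        )
    return articles


# Usage with an in-memory sample built from two of the example rows.
sample = io.StringIO(
    "id,category,title,content,sentiment\n"
    '1,Business,"Stock Market Surge","The stock market has surged to new highs, driven by strong earnings...",Positive\n'
    '3,Sports,"Team Wins Championship","The team won the championship after a thrilling final match.",Positive\n'
)
articles = load_articles(sample)
print(articles[0].category, articles[0].sentiment)
```

Normalizing category and sentiment to lowercase is a design choice made here so that "Positive" and "positive" compare equal; drop it if the original casing matters.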
Example:
id | category | title | content | sentiment
1 | Business | "Stock Market Surge" | "The stock market has surged to new highs, driven by strong earnings..." | Positive
2 | Politics | "Election Results" | "The election results were a mixed bag, with some surprises along the way." | Neutral
3 | Sports | "Team Wins Championship" | "The team won the championship after a thrilling final match." | Positive
4 | Technology | "New Smartphone Release" | "The new smartphone release has received mixed reactions from users." | Negative
Preprocessing Notes:
- The text has been preprocessed to remove special characters and any HTML tags present in the original articles.
- Tokenization or further text cleaning (e.g., lowercasing, stopword removal) may be necessary depending on the model and method used for sentiment classification.
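The cleaning steps mentioned in these notes might look roughly like the sketch below. It is an illustration under stated assumptions rather than the pipeline actually applied to this dataset: the regular expressions, the tiny `STOPWORDS` set, and the function names `clean_text` and `tokenize` are all hypothetical, and a real project would likely use a fuller stopword list.

```python
import html
import re

# Illustrative stopword list only; not the list used to build the dataset.
STOPWORDS = {"a", "an", "and", "the", "to", "of", "in", "by", "has", "with", "after", "were", "from"}

TAG_RE = re.compile(r"<[^>]+>")          # anything that looks like an HTML tag
NON_ALNUM_RE = re.compile(r"[^a-z0-9\s]")  # note: also drops non-ASCII letters


def clean_text(text):
    """Strip HTML tags and entities, lowercase, and remove special characters."""
    text = TAG_RE.sub(" ", text)
    text = html.unescape(text)
    text = text.lower()
    text = NON_ALNUM_RE.sub(" ", text)
    return " ".join(text.split())  # collapse repeated whitespace


def tokenize(text, remove_stopwords=True):
    """Split cleaned text on whitespace, optionally dropping stopwords."""
    tokens = clean_text(text).split()
    if remove_stopwords:
        tokens = [t for t in tokens if t not in STOPWORDS]
    return tokens


print(tokenize("<p>The team won the championship after a thrilling final match.</p>"))
# ['team', 'won', 'championship', 'thrilling', 'final', 'match']
```

Whether to lowercase or remove stopwords depends on the model: bag-of-words classifiers often benefit from both, while pretrained transformer tokenizers usually expect the text largely untouched.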
Use Case:
This dataset is suited to training and evaluating machine learning models for sentiment classification, where the goal is to predict the sentiment label (positive, negative, or neutral) from the article's text.
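As a rough idea of what such a model could look like, the sketch below implements a multinomial Naive Bayes baseline with add-one smoothing using only the standard library. It is not a recommended or reference model for this dataset: the class name `NaiveBayesSentiment`, the `accuracy` helper, and the three training strings are hypothetical toy values invented for illustration, not records from the dataset.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesSentiment:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.label_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(text.lower().split())
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, text):
        total_docs = sum(self.label_counts.values())
        best_label, best_score = None, -math.inf
        for label, doc_count in self.label_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(doc_count / total_docs)  # log prior
            for word in text.lower().split():
                if word in self.vocab:  # ignore words never seen in training
                    score += math.log((counts[word] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def accuracy(model, texts, labels):
    """Fraction of texts whose predicted label matches the given label."""
    correct = sum(model.predict(t) == y for t, y in zip(texts, labels))
    return correct / len(labels)


# Toy training data (hypothetical, not taken from the dataset).
train_texts = [
    "profits surged strong growth",
    "shares fell sharply losses",
    "markets were unchanged today",
]
train_labels = ["positive", "negative", "neutral"]

model = NaiveBayesSentiment().fit(train_texts, train_labels)
print(model.predict("strong growth"))
```

In practice the articles would be split into training and held-out sets before calling `accuracy`, so that the reported score reflects unseen text.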