Dataset: Twitter Tweets Sentiment Dataset

About this Dataset

Description:

Twitter is an online social media platform where people share their thoughts as tweets. Some people misuse it to post hateful content. Twitter is trying to tackle this problem, and we can help by building a strong NLP-based classifier that identifies negative tweets so they can be blocked. Can you build such a classifier?

Each row contains the text of a tweet and a sentiment label. In the training set you are provided with a word or phrase drawn from the tweet (selected_text) that encapsulates the provided sentiment.

Make sure, when parsing the CSV, to remove the beginning / ending quotes from the text field, to ensure that you don't include them in your training.
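
A minimal sketch of the quote handling described above, assuming the file is standard CSV read with Python's csv module. The sample rows and the helper name load_rows are made up for illustration. The csv reader already removes the quotes that delimit a field, so the extra strip only catches quote characters left at the start or end of the value itself.

```python
import csv
import io

# Hypothetical sample rows in the same column layout as train.csv.
SAMPLE = '''textID,text,selected_text,sentiment
a1,"I love this, really",love this,positive
b2,"""so sad today""",so sad,negative
'''

def load_rows(handle):
    """Read rows as dicts and strip leading/trailing double quotes from the text fields."""
    rows = []
    for row in csv.DictReader(handle):
        for key in ("text", "selected_text"):
            # csv has already removed the field-delimiting quotes;
            # this strips any literal quotes left around the value.
            row[key] = (row[key] or "").strip('"')
        rows.append(row)
    return rows

rows = load_rows(io.StringIO(SAMPLE))
for r in rows:
    print(repr(r["text"]), r["sentiment"])
```

Whitespace is deliberately left untouched here, so that character positions inside the tweet stay the same as in the file.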

You're attempting to predict the word or phrase from the tweet that exemplifies the provided sentiment. The word or phrase should include all characters within that span (i.e. including commas, spaces, etc.)
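
To illustrate what "all characters within that span" means, here is a small sketch that locates selected_text inside text by character offset. The helper find_span and the example tweet are hypothetical; it assumes the phrase appears verbatim in the tweet and returns the first occurrence.

```python
def find_span(text, selected_text):
    """Return (start, end) character offsets of selected_text in text, or None if absent."""
    start = text.find(selected_text)
    if start == -1:
        return None
    return start, start + len(selected_text)

# Made-up example: the span keeps the comma and the spaces between words.
text = "The weather is great, truly great today"
selected = "great, truly great"

span = find_span(text, selected)
print(span)
print(text[span[0]:span[1]] == selected)
```

Slicing the tweet with the returned offsets gives back the phrase exactly, punctuation and spaces included.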

Columns:

textID - unique ID for each piece of text
text - the text of the tweet
selected_text - the word or phrase from the tweet that encapsulates the sentiment
sentiment - the general sentiment of the tweet

Acknowledgement:

The dataset was downloaded from Kaggle Competitions:
https://www.kaggle.com/c/tweet-sentiment-extraction/data?select=train.csv

Objective:

Understand the Dataset & cleanup (if required).
Build classification models to predict the sentiment of tweets.
Compare the evaluation metrics of various classification algorithms.
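
One way to compare models on the last objective is to score their predictions against the same held-out labels. The sketch below computes accuracy and macro-averaged F1 in plain Python. The model names, the predictions, and the label values (positive, negative, neutral) are assumptions for illustration, not results from this dataset.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)

def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the class never occurs.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)

# Hypothetical held-out labels and predictions from two placeholder models.
y_true = ["positive", "negative", "neutral", "negative", "positive", "neutral"]
predictions = {
    "model_a": ["positive", "negative", "neutral", "neutral", "positive", "neutral"],
    "model_b": ["positive", "negative", "negative", "negative", "neutral", "neutral"],
}

for name, y_pred in predictions.items():
    print(f"{name}: accuracy={accuracy(y_true, y_pred):.3f} "
          f"macro_f1={macro_f1(y_true, y_pred):.3f}")
```

Macro F1 weights every class equally, which makes it a useful complement to accuracy when the sentiment classes are not evenly represented.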

Tables

Tweets

@kaggle.yasserh_twitter_tweets_sentiment_dataset.tweets

2.55 MB
27481 rows
4 columns


CREATE TABLE tweets (
  "textid" VARCHAR,
  "text" VARCHAR,
  "selected_text" VARCHAR,
  "sentiment" VARCHAR
);
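
As a quick check of the schema above, the following sketch creates the same table in an in-memory SQLite database and counts tweets per sentiment. The inserted rows are made up for illustration, and SQLite is only an assumption here; the same query should work on any engine that hosts the table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Same columns as the tweets table defined above.
conn.execute(
    'CREATE TABLE tweets ('
    '"textid" VARCHAR, "text" VARCHAR, "selected_text" VARCHAR, "sentiment" VARCHAR)'
)

# Hypothetical rows, not taken from the dataset.
sample_rows = [
    ("a1", "I love this, really", "love this", "positive"),
    ("b2", "so sad today", "so sad", "negative"),
    ("c3", "going to the store", "going to the store", "neutral"),
    ("d4", "what a great day", "great", "positive"),
]
conn.executemany("INSERT INTO tweets VALUES (?, ?, ?, ?)", sample_rows)

# Class distribution: number of tweets per sentiment label.
counts = conn.execute(
    "SELECT sentiment, COUNT(*) FROM tweets GROUP BY sentiment ORDER BY sentiment"
).fetchall()
print(counts)
```

Checking the class distribution first shows whether the labels are balanced before any model is trained.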