Baselight

Stock Market TWEETS NLP

Stock Market TWEETS Labelled With GCP NLP

@kaggle.dawoodaijaz_stock_market_tweets_labelled_with_gcp_nlp


About this Dataset

Stock Market TWEETS NLP

Context:

Stock Market TWEETS Sentiment Analysis

Data Set Overview:

This is a dataset of tweets related to the stock market. It is derived from an existing dataset, Stock Market TWEETS Data-NLP-2021. The original dataset contains 943,672 tweets collected between April 9 and July 16, 2020, using the S&P 500 tag (#SPX500), references to the top 25 companies in the S&P 500 index, and the Bloomberg tag (#stocks). However, only 1,300 of those tweets are labeled with a sentiment score, which is a very small percentage (roughly 0.14%).

I wanted to label the remaining tweets with my own model, but its output could not be trusted. Instead, I used the Google Cloud Natural Language API. Its pre-trained models let developers apply natural language understanding (NLU) in their applications, with features including sentiment analysis, entity analysis, entity sentiment analysis, content classification, and syntax analysis.
I used the sentiment analysis feature of this API. Out of 943,672 data points, I labeled only 12,591, because the cost of labeling all of them is very high.
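
The score returned by the sentiment analysis feature is a number between -1.0 (negative) and 1.0 (positive). For many uses it helps to bucket that number into a coarse label. The following is a minimal sketch of one way to do so; the function name `label_sentiment` and the 0.25 cut-off are assumptions made for illustration and are not part of the dataset or of the API.

```python
def label_sentiment(score: float, threshold: float = 0.25) -> str:
    """Map a sentiment score in [-1.0, 1.0] to a coarse label.

    The threshold is an illustrative assumption, not a value
    prescribed by the dataset or by the API.
    """
    if not -1.0 <= score <= 1.0:
        raise ValueError(f"score out of range: {score}")
    if score >= threshold:
        return "positive"
    if score <= -threshold:
        return "negative"
    return "neutral"


# Made-up scores, used only to show the mapping.
example_scores = [0.8, -0.6, 0.1, 0.0]
labels = [label_sentiment(s) for s in example_scores]
print(labels)
```

A wider threshold sends more tweets to the neutral bucket, so the cut-off is worth tuning against the 1,300 hand-labeled tweets from the original dataset if those are available.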

The Google Cloud Natural Language API can reasonably be trusted, so its sentiment scores can be treated as correct, or at least as better than those from a self-trained model.

Inspiration

Data Science, NLP, Artificial Intelligence, Stock Market, Financial Analysis, and others

Tables

Labelled Tweets

@kaggle.dawoodaijaz_stock_market_tweets_labelled_with_gcp_nlp.labelled_tweets
  • 1.15 MB
  • 12420 rows
  • 4 columns

CREATE TABLE labelled_tweets (
  "id" BIGINT,
  "created_at" VARCHAR,
  "full_text" VARCHAR,
  "score" DOUBLE
);
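
As a rough sketch of how this schema might be queried, the example below recreates the table in an in-memory SQLite database using Python's standard `sqlite3` module and computes a simple sentiment summary. The rows, the `created_at` format, and the helper name `sentiment_summary` are all hypothetical placeholders, not values taken from the dataset.

```python
import sqlite3

# In-memory database; SQLite accepts the same column declarations.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE labelled_tweets (
      "id" BIGINT,
      "created_at" VARCHAR,
      "full_text" VARCHAR,
      "score" DOUBLE
    )
    """
)

# Hypothetical rows for illustration only; not real dataset records.
rows = [
    (1, "2020-04-09 12:00:00", "placeholder tweet A", 0.6),
    (2, "2020-05-01 09:30:00", "placeholder tweet B", -0.4),
    (3, "2020-06-15 16:45:00", "placeholder tweet C", 0.0),
    (4, "2020-07-16 08:15:00", "placeholder tweet D", 0.2),
]
conn.executemany("INSERT INTO labelled_tweets VALUES (?, ?, ?, ?)", rows)


def sentiment_summary(connection: sqlite3.Connection) -> dict:
    """Return the row count, mean score, and counts of positive/negative scores."""
    total, mean_score, positive, negative = connection.execute(
        """
        SELECT COUNT(*),
               AVG(score),
               SUM(CASE WHEN score > 0 THEN 1 ELSE 0 END),
               SUM(CASE WHEN score < 0 THEN 1 ELSE 0 END)
        FROM labelled_tweets
        """
    ).fetchone()
    return {
        "total": total,
        "mean_score": mean_score,
        "positive": positive,
        "negative": negative,
    }


summary = sentiment_summary(conn)
print(summary)
```

The same `SELECT` statement should carry over to other SQL engines with little or no change, since it uses only standard aggregates.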
