Stock Comments (Twits) - Tinkoff Pulse

1 million anonymous posts about stocks on the Russian market

@kaggle.ilovebeer228_stock_comments_twits_tinkoff_pulse

About this Dataset

Hello there!

This is the data I used for my bachelor's thesis research at HSE University in 2024. I parsed all comments (you can also call them stock twits) from T-pulse threads from 1 January 2019 (launch of the platform) to 30 March 2024. A total of 10 tickers were covered: SBER, GAZP, YNDX, TCSG, SGZH, PIKK, RTKM, MVID, KMAZ, BANE. The chosen period includes changes in the CCP of the Bank of Russia and the introduction of sanctions by Western countries against the Russian Federation.

Language: Russian (mostly) and English

Columns

  • inserted - date the comment (or post) was published;
  • likesCount - number of likes on the comment (or post);
  • commentsCount - number of comments under the comment (or post);
  • text - raw text of the parsed comment (you will probably want to clean it of emoji, etc.);
  • reactions_counters - list of dicts with the type and number of reactions on the comment. Reactions are emoji-like, such as "rocket", "like", "dislike", "not-convinced", "buy-up".
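
As a starting point for working with the text and reactions_counters columns, here is a minimal sketch that strips emoji from a post and turns the reaction list into a plain mapping. It rests on assumptions that are not confirmed by the dataset description: that reactions_counters is stored as a stringified Python list of dicts, and that the dict keys are named "type" and "count" (both key names are hypothetical and passed as parameters so you can change them). The emoji ranges are a rough filter, not an exhaustive one.

```python
import ast
import re

# Assumption: emoji fall in these Unicode ranges; a rough filter, not exhaustive.
EMOJI_RE = re.compile(
    "["
    "\U0001F300-\U0001FAFF"  # pictographs, emoticons, transport, extended symbols
    "\u2600-\u27BF"          # miscellaneous symbols and dingbats
    "\uFE0F\u200D"           # variation selector and zero-width joiner
    "]+"
)


def clean_text(text):
    """Remove emoji and collapse repeated whitespace."""
    without_emoji = EMOJI_RE.sub(" ", text)
    return re.sub(r"\s+", " ", without_emoji).strip()


def parse_reactions(raw, type_key="type", count_key="count"):
    """Turn a (possibly stringified) list of dicts into {reaction_type: count}.

    The key names are hypothetical defaults; check them against the real file.
    """
    if not raw:
        return {}
    items = ast.literal_eval(raw) if isinstance(raw, str) else raw
    return {item[type_key]: int(item[count_key]) for item in items}


# Made-up sample values for illustration only; not taken from the dataset.
sample_text = "SBER to the moon 🚀🚀  buying more"
sample_reactions = "[{'type': 'rocket', 'count': 3}, {'type': 'like', 'count': 5}]"

print(clean_text(sample_text))
print(parse_reactions(sample_reactions))
```

If the column turns out to be JSON rather than a Python literal, json.loads would be the natural replacement for ast.literal_eval.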

Additionally

I have also added a df_labelled_llm.csv dataset of labelled posts: around 1,000 for each ticker mentioned above, about 10K posts in total. About 90% of the labelling was done with an LLM and 10% manually, for posts written in slang. You can use it as a starting point for your research.

Areas of application

  • Sentiment analysis of stock twits;
  • Fine-tuning BERT-based models;
  • Testing algotrading strategies based on sentiment analysis;
  • Research.
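
For the algotrading use case, one common first step is to aggregate post-level sentiment into a daily signal. The sketch below shows one possible way to do that, under assumptions that the description does not confirm: that the inserted value is an ISO 8601 timestamp string and that each post already has a numeric sentiment score (for example -1, 0, 1). The function name daily_mean_sentiment and the sample rows are hypothetical.

```python
from collections import defaultdict
from datetime import datetime


def daily_mean_sentiment(rows):
    """Average sentiment score per calendar day.

    rows: iterable of (inserted, score) pairs, where inserted is an
    ISO 8601 timestamp string (assumed format) and score is numeric.
    """
    totals = defaultdict(lambda: [0.0, 0])  # date -> [sum of scores, post count]
    for inserted, score in rows:
        day = datetime.fromisoformat(inserted).date()
        totals[day][0] += score
        totals[day][1] += 1
    # Sort by date so the result reads as a time series.
    return {day: total / count for day, (total, count) in sorted(totals.items())}


# Made-up rows for illustration only; not taken from the dataset.
rows = [
    ("2024-03-01T10:15:00", 1),
    ("2024-03-01T18:40:00", -1),
    ("2024-03-02T09:05:00", 1),
]
result = daily_mean_sentiment(rows)
print(result)
```

The daily series can then be joined with price data for the same ticker. A plain mean is the simplest choice; weighting by likesCount would be another option.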

This data was gathered for educational purposes only. No exact names, phone numbers or addresses of the authors of posts/comments were included in the dataset.
