Baselight

Stock Comments (Twits) - Tinkoff Pulse

Nearly 1 million anonymous posts about stocks on the Russian market

@kaggle.ilovebeer228_stock_comments_twits_tinkoff_pulse


About this Dataset


Hello there!

This is the data I used for my bachelor's thesis research at HSE University in 2024. I parsed all comments (you can also call them stock twits) from Tinkoff Pulse (T-Pulse) threads posted between 1 January 2019, when the platform launched, and 30 March 2024. Ten tickers are covered: SBER, GAZP, YNDX, TCSG, SGZH, PIKK, RTKM, MVID, KMAZ, BANE. The period includes shifts in the Bank of Russia's monetary policy and the introduction of Western sanctions against the Russian Federation.

Language: mostly Russian, with some English

Columns

  • inserted - date and time when the comment (or post) was published;
  • likesCount - number of likes on the comment (or post);
  • commentsCount - number of comments (replies) under the comment (or post);
  • text - raw text of the parsed comment (you will probably want to clean it of emoji etc. before analysis);
  • reactions_counters - list of dicts giving the type and count of each reaction on the comment. Reactions are emoji-like, e.g. "rocket", "like", "dislike", "not-convinced", "buy-up". In the tables below this column is stored as a string (VARCHAR), and likesCount/commentsCount appear in lowercase (likescount, commentscount).
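The two string columns usually need light processing before analysis. Below is a minimal Python sketch, assuming that reactions_counters holds either a JSON array or a Python-literal list of dicts; the key names "type" and "count" are hypothetical placeholders, so check a few real rows first. The emoji pattern covers only common pictograph ranges and is an assumption, not a complete emoji definition. clean_text and parse_reactions are illustrative names.

```python
import ast
import json
import re

# Common emoji/pictograph ranges plus variation selector and zero-width joiner.
# This is an approximation, not an exhaustive emoji definition.
EMOJI_RE = re.compile(
    "[\U0001F300-\U0001FAFF\U00002600-\U000027BF\U0001F1E6-\U0001F1FF\uFE0F\u200D]+"
)
URL_RE = re.compile(r"https?://\S+")


def clean_text(text: str) -> str:
    """Strip URLs and emoji, then collapse whitespace. Cyrillic letters are kept."""
    text = URL_RE.sub(" ", text)
    text = EMOJI_RE.sub(" ", text)
    return re.sub(r"\s+", " ", text).strip()


def parse_reactions(raw: str) -> dict[str, int]:
    """Turn a reactions_counters string into {reaction_type: count}.

    Assumes a JSON array or a Python-literal list of dicts with the
    hypothetical keys "type" and "count"; adjust after inspecting real rows.
    """
    if not raw or not raw.strip():
        return {}
    try:
        items = json.loads(raw)
    except json.JSONDecodeError:
        # Fallback for Python-style reprs such as "[{'type': 'like', 'count': 3}]"
        items = ast.literal_eval(raw)
    if not items:
        return {}
    return {item["type"]: int(item["count"]) for item in items}
```

Cyrillic text passes through clean_text unchanged, so it is safe for the mostly Russian posts; whether to also lowercase the text or drop tickers and numbers depends on the downstream model.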

Additionally
I have added df_labelled_llm.csv, a dataset of labelled posts: around 1,000 per ticker listed above, about 10K posts in total. About 90% of the posts were labelled with an LLM and the remaining 10%, slang posts, were labelled manually. You can use it as a starting point for your research.
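For fine-tuning, the labelled file can be split into training and validation parts while keeping the label proportions. A minimal sketch using only the standard library, assuming the CSV is UTF-8 and has the text_preprocessed and label columns shown in the df_labelled_llm schema below; the function names and the 20% validation share are illustrative choices, not the author's setup.

```python
import csv
import random
from collections import defaultdict


def load_labelled(path: str) -> list[tuple[str, str]]:
    """Read (text_preprocessed, label) pairs; extra columns such as an index are ignored."""
    with open(path, newline="", encoding="utf-8") as f:
        return [(row["text_preprocessed"], row["label"]) for row in csv.DictReader(f)]


def stratified_split(
    rows: list[tuple[str, str]], val_fraction: float = 0.2, seed: int = 42
) -> tuple[list[tuple[str, str]], list[tuple[str, str]]]:
    """Split rows so that each label keeps roughly the same share in both parts."""
    by_label: dict[str, list[tuple[str, str]]] = defaultdict(list)
    for row in rows:
        by_label[row[1]].append(row)
    rng = random.Random(seed)  # fixed seed -> reproducible split
    train: list[tuple[str, str]] = []
    val: list[tuple[str, str]] = []
    for label in sorted(by_label):  # sorted for a deterministic order
        group = by_label[label]
        rng.shuffle(group)
        n_val = round(len(group) * val_fraction)
        val.extend(group[:n_val])
        train.extend(group[n_val:])
    return train, val
```

Typical use would be train, val = stratified_split(load_labelled("df_labelled_llm.csv")). A label with only a handful of examples may end up with none in the validation part, so it is worth checking the label counts first.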

Areas of application

  • Sentiment analysis of stock twits;
  • Fine-tuning BERT-based models;
  • Testing algotrading strategies based on sentiment analysis;
  • Research.
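For testing sentiment-based trading strategies, post-level labels are usually aggregated into a time series. A minimal sketch of a like-weighted daily sentiment score, assuming each post already has a label (for example from a model trained on df_labelled_llm); the label names in LABEL_SCORE are hypothetical because the label values are not documented here, and the weight of 1 + likes is an illustrative choice.

```python
from collections import defaultdict
from collections.abc import Iterable
from datetime import date, datetime

# Hypothetical label values; replace with the labels actually used in your data.
LABEL_SCORE = {"positive": 1.0, "neutral": 0.0, "negative": -1.0}


def daily_sentiment(
    posts: Iterable[tuple[datetime, str, int]],
    label_score: dict[str, float] = LABEL_SCORE,
) -> dict[date, float]:
    """Map (inserted, label, likescount) records to a daily score in [-1, 1].

    Each post is weighted by 1 + likes, so posts nobody liked still count.
    Posts whose label is missing from label_score are skipped.
    """
    total: dict[date, float] = defaultdict(float)
    weight: dict[date, float] = defaultdict(float)
    for inserted, label, likes in posts:
        if label not in label_score:
            continue
        w = 1.0 + max(likes, 0)
        day = inserted.date()
        total[day] += w * label_score[label]
        weight[day] += w
    return {day: total[day] / weight[day] for day in sorted(weight)}
```

In a backtest, a day's score should only drive trades once that day's posts would actually have been available, to avoid look-ahead bias; the time zone of inserted is not stated in the source, which matters when aligning scores with trading sessions.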

This data was gathered for educational purposes only. No names, phone numbers, or addresses of the authors of posts/comments are included in the dataset.

Tables

Df Bane Data

@kaggle.ilovebeer228_stock_comments_twits_tinkoff_pulse.df_bane_data
  • 5.87 MB
  • 8187 rows
  • 5 columns

CREATE TABLE df_bane_data (
  "inserted" TIMESTAMP,
  "likescount" BIGINT,
  "commentscount" BIGINT,
  "text" VARCHAR,
  "reactions_counters" VARCHAR
);

Df Gazp Data

@kaggle.ilovebeer228_stock_comments_twits_tinkoff_pulse.df_gazp_data
  • 105.45 MB
  • 263340 rows
  • 5 columns

CREATE TABLE df_gazp_data (
  "inserted" TIMESTAMP,
  "likescount" BIGINT,
  "commentscount" BIGINT,
  "text" VARCHAR,
  "reactions_counters" VARCHAR
);

Df Kmaz Data

@kaggle.ilovebeer228_stock_comments_twits_tinkoff_pulse.df_kmaz_data
  • 7.83 MB
  • 15786 rows
  • 5 columns

CREATE TABLE df_kmaz_data (
  "inserted" TIMESTAMP,
  "likescount" BIGINT,
  "commentscount" BIGINT,
  "text" VARCHAR,
  "reactions_counters" VARCHAR
);

Df Labelled Llm

@kaggle.ilovebeer228_stock_comments_twits_tinkoff_pulse.df_labelled_llm
  • 2.89 MB
  • 10108 rows
  • 2 columns

CREATE TABLE df_labelled_llm (
  "text_preprocessed" VARCHAR,
  "label" VARCHAR
);

Df Mvid Data

@kaggle.ilovebeer228_stock_comments_twits_tinkoff_pulse.df_mvid_data
  • 9.2 MB
  • 18840 rows
  • 5 columns

CREATE TABLE df_mvid_data (
  "inserted" TIMESTAMP,
  "likescount" BIGINT,
  "commentscount" BIGINT,
  "text" VARCHAR,
  "reactions_counters" VARCHAR
);

Df Pikk Data

@kaggle.ilovebeer228_stock_comments_twits_tinkoff_pulse.df_pikk_data
  • 14.1 MB
  • 28699 rows
  • 5 columns

CREATE TABLE df_pikk_data (
  "inserted" TIMESTAMP,
  "likescount" BIGINT,
  "commentscount" BIGINT,
  "text" VARCHAR,
  "reactions_counters" VARCHAR
);

Df Rtkm Data

@kaggle.ilovebeer228_stock_comments_twits_tinkoff_pulse.df_rtkm_data
  • 14.49 MB
  • 19578 rows
  • 5 columns

CREATE TABLE df_rtkm_data (
  "inserted" TIMESTAMP,
  "likescount" BIGINT,
  "commentscount" BIGINT,
  "text" VARCHAR,
  "reactions_counters" VARCHAR
);

Df Sber Data

@kaggle.ilovebeer228_stock_comments_twits_tinkoff_pulse.df_sber_data
  • 107.32 MB
  • 244255 rows
  • 5 columns

CREATE TABLE df_sber_data (
  "inserted" TIMESTAMP,
  "likescount" BIGINT,
  "commentscount" BIGINT,
  "text" VARCHAR,
  "reactions_counters" VARCHAR
);

Df Sgzh Data

@kaggle.ilovebeer228_stock_comments_twits_tinkoff_pulse.df_sgzh_data
  • 18 MB
  • 52045 rows
  • 5 columns

CREATE TABLE df_sgzh_data (
  "inserted" TIMESTAMP,
  "likescount" BIGINT,
  "commentscount" BIGINT,
  "text" VARCHAR,
  "reactions_counters" VARCHAR
);

Df Tcs Data

@kaggle.ilovebeer228_stock_comments_twits_tinkoff_pulse.df_tcs_data
  • 45.42 MB
  • 118998 rows
  • 5 columns

CREATE TABLE df_tcs_data (
  "inserted" TIMESTAMP,
  "likescount" BIGINT,
  "commentscount" BIGINT,
  "text" VARCHAR,
  "reactions_counters" VARCHAR
);

Df Yndx Data

@kaggle.ilovebeer228_stock_comments_twits_tinkoff_pulse.df_yndx_data
  • 66.75 MB
  • 184287 rows
  • 5 columns

CREATE TABLE df_yndx_data (
  "inserted" TIMESTAMP,
  "likescount" BIGINT,
  "commentscount" BIGINT,
  "text" VARCHAR,
  "reactions_counters" VARCHAR
);
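All ten ticker tables share the same five-column schema, so they can be stacked into one table with a ticker column. A minimal sketch using Python's built-in sqlite3, assuming the tables have been loaded into a local SQLite database under their short names (df_sber_data and so on) and that inserted is stored as an ISO 8601 string; all_posts, create_all_posts_view and monthly_counts are hypothetical names. Note that TCSG is stored as df_tcs_data.

```python
import sqlite3

# Ticker -> table name, as listed on this page (TCSG lives in df_tcs_data).
TICKER_TABLES = {
    "BANE": "df_bane_data", "GAZP": "df_gazp_data", "KMAZ": "df_kmaz_data",
    "MVID": "df_mvid_data", "PIKK": "df_pikk_data", "RTKM": "df_rtkm_data",
    "SBER": "df_sber_data", "SGZH": "df_sgzh_data", "TCSG": "df_tcs_data",
    "YNDX": "df_yndx_data",
}


def create_all_posts_view(conn: sqlite3.Connection) -> None:
    """Create a view that stacks the ten per-ticker tables with an extra ticker column."""
    selects = [
        f"SELECT '{ticker}' AS ticker, inserted, likescount, commentscount, "
        f'"text", reactions_counters FROM {table}'
        for ticker, table in TICKER_TABLES.items()
    ]
    conn.execute("CREATE VIEW IF NOT EXISTS all_posts AS " + " UNION ALL ".join(selects))


def monthly_counts(conn: sqlite3.Connection) -> list[tuple[str, str, int]]:
    """Count posts per ticker per calendar month (YYYY-MM) using SQLite's strftime."""
    return conn.execute(
        "SELECT ticker, strftime('%Y-%m', inserted) AS month, COUNT(*) "
        "FROM all_posts GROUP BY ticker, month ORDER BY ticker, month"
    ).fetchall()
```

With the full data loaded, SELECT COUNT(*) FROM all_posts should return 954,015, the sum of the row counts listed above for the ten ticker tables.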
