Baselight

Russian Social Media Text Classification

Text classification data from VK CUP 2022

@kaggle.mikhailma_russian_social_media_text_classification


About this Dataset


VKontakte communities can belong to one of several predefined categories, but even sports communities are divided fairly sharply by subject. The same authors may write about a single sport or about many at once.
The task: given a set of posts, determine the topic, that is, which sport is being discussed in the selected community.

Here is a list of available categories:

  1. athletics,
  2. autosport,
  3. basketball,
  4. boardgames,
  5. esport,
  6. extreme,
  7. football,
  8. hockey,
  9. martial arts,
  10. motosport,
  11. tennis,
  12. volleyball,
  13. winter_sport

The evaluation metric looks like this:

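def score(true, pred, n_samples):
    # true and pred are assumed to be equal-length sequences of category labels
    counter = 0
    for t, p in zip(true, pred):
        if t == p:
            counter += 1  # correct prediction
        else:
            counter -= 1  # wrong prediction
    # Normalise by the number of samples
    return counter / n_samples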
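
As a rough illustration of how this metric behaves, the sketch below applies the same +1/-1 rule to whole lists of labels. The name `balanced_hit_score` and the toy labels are assumptions made for this example, not part of the competition code. When the divisor equals the number of predictions, the result equals 2 * accuracy - 1, so it ranges from -1 (all wrong) to 1 (all correct).

```python
def balanced_hit_score(true_labels, pred_labels):
    """Return (hits - misses) / n for two equal-length label sequences."""
    if len(true_labels) != len(pred_labels):
        raise ValueError("true_labels and pred_labels must have the same length")
    n_samples = len(true_labels)
    if n_samples == 0:
        raise ValueError("at least one sample is required")
    hits = sum(t == p for t, p in zip(true_labels, pred_labels))
    misses = n_samples - hits
    return (hits - misses) / n_samples


# Toy labels, invented for illustration only.
true_labels = ["hockey", "tennis", "football", "esport"]
pred_labels = ["hockey", "tennis", "hockey", "esport"]

example_score = balanced_hit_score(true_labels, pred_labels)
accuracy = sum(t == p for t, p in zip(true_labels, pred_labels)) / len(true_labels)
print(example_score)       # 3 hits, 1 miss -> (3 - 1) / 4 = 0.5
print(2 * accuracy - 1)    # same value expressed through accuracy
```

A practical consequence is that a score of 0 corresponds to 50% accuracy, not to random guessing over 13 categories.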

Tables

Sample Submission

@kaggle.mikhailma_russian_social_media_text_classification.sample_submission
  • 22.6 KB
  • 2626 rows
  • 2 columns

CREATE TABLE sample_submission (
  "oid" BIGINT,
  "category" VARCHAR
);
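
The sample submission has one row per `oid` with a predicted `category`. A minimal sketch of writing such a file with Python's standard `csv` module follows. The function name `write_submission`, the file name, and the example IDs are hypothetical, and the header row is assumed to match the two column names in the schema above.

```python
import csv


def write_submission(predictions, path):
    """Write {oid: category} pairs as a two-column CSV with a header row."""
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(["oid", "category"])
        # Sort by oid so the output is deterministic.
        for oid in sorted(predictions):
            writer.writerow([oid, predictions[oid]])


# Hypothetical community IDs and predictions, for illustration only.
predictions = {1002: "tennis", 1001: "hockey"}
write_submission(predictions, "submission.csv")
```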

Test

@kaggle.mikhailma_russian_social_media_text_classification.test
  • 9.65 MB
  • 26260 rows
  • 2 columns

CREATE TABLE test (
  "oid" BIGINT,
  "text" VARCHAR
);
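
The test table has 26260 rows while the sample submission has 2626, which suggests that each community (`oid`) contributes several posts and needs a single label. One possible way to collapse per-post predictions into one label per community is a majority vote, sketched below. The function name `vote_per_community`, the tie-breaking rule (the label seen first wins), and the toy rows are assumptions for illustration; the per-post predictions themselves would come from whatever classifier is used.

```python
from collections import Counter, defaultdict


def vote_per_community(post_predictions):
    """Collapse (oid, predicted_category) pairs into one label per oid."""
    votes = defaultdict(Counter)
    for oid, category in post_predictions:
        votes[oid][category] += 1
    # most_common(1) returns the label with the highest count; ties go to
    # the label that was seen first for that oid.
    return {oid: counts.most_common(1)[0][0] for oid, counts in votes.items()}


# Toy per-post predictions (hypothetical oids), for illustration only.
post_predictions = [
    (1001, "hockey"),
    (1001, "hockey"),
    (1001, "football"),
    (1002, "tennis"),
]
community_labels = vote_per_community(post_predictions)
print(community_labels)  # {1001: 'hockey', 1002: 'tennis'}
```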

Train

@kaggle.mikhailma_russian_social_media_text_classification.train
  • 14.1 MB
  • 38740 rows
  • 3 columns

CREATE TABLE train (
  "oid" BIGINT,
  "category" VARCHAR,
  "text" VARCHAR
);
