Baselight

Code-Mixed Sentences Dataset (Hinglish)

Sentences are categorised as cyberbullying (1) or not cyberbullying (0).

@kaggle.pankaazshah_code_mixed_text_dataset_hinglish

About this Dataset

This dataset consists of 25,000 code-mixed Hindi-English (Hinglish) text samples, created to support the development and evaluation of machine learning models for cyberbullying detection. The dataset reflects the informal, Roman-script nature of Hinglish as used on social media, messaging platforms, and online forums.

Tables

Hinglish Cyberbullying Dataset 25000

@kaggle.pankaazshah_code_mixed_text_dataset_hinglish.hinglish_cyberbullying_dataset_25000
  • 165.98 KB
  • 25000 rows
  • 3 columns

CREATE TABLE hinglish_cyberbullying_dataset_25000 (
  "id" BIGINT,
  "text" VARCHAR,
  "label" BIGINT
);
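A quick sanity check on a table like this is to count rows per label and confirm that only the documented values 0 and 1 appear. The sketch below does that with Python's standard-library `sqlite3` module in an in-memory database. It reuses the table definition above. The `CHECK` constraint, the `LABEL_NAMES` mapping and the `label_distribution` helper are hypothetical additions, not part of the published schema. The inserted rows are placeholders and are not taken from the dataset. Loading the real 25,000 rows is left out because the export format is not described here.

```python
import sqlite3

# Table definition from the schema above. SQLite accepts the BIGINT/VARCHAR
# type names and maps them to INTEGER/TEXT affinity. The CHECK constraint is
# an assumption added to enforce the documented 0/1 labelling.
SCHEMA = """
CREATE TABLE hinglish_cyberbullying_dataset_25000 (
  "id" BIGINT,
  "text" VARCHAR,
  "label" BIGINT CHECK ("label" IN (0, 1))
);
"""

# Hypothetical human-readable names for the two label values.
LABEL_NAMES = {0: "not cyberbullying", 1: "cyberbullying"}


def label_distribution(conn: sqlite3.Connection) -> dict[str, int]:
    """Count rows per label, keyed by the readable label name."""
    rows = conn.execute(
        'SELECT "label", COUNT(*) FROM hinglish_cyberbullying_dataset_25000 '
        'GROUP BY "label" ORDER BY "label"'
    ).fetchall()
    return {LABEL_NAMES[label]: count for label, count in rows}


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    # Placeholder rows for illustration only; not real dataset content.
    conn.executemany(
        "INSERT INTO hinglish_cyberbullying_dataset_25000 VALUES (?, ?, ?)",
        [
            (1, "placeholder text a", 0),
            (2, "placeholder text b", 1),
            (3, "placeholder text c", 0),
        ],
    )
    print(label_distribution(conn))
```

With the CHECK constraint in place, any row whose label is not 0 or 1 is rejected with `sqlite3.IntegrityError` at insert time rather than surfacing later during training.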
