Dataset: Cyberbullying Dataset

About this Dataset

Context

This dataset combines several datasets, from different sources, on the automatic detection of cyberbullying. The data come from Kaggle, Twitter, Wikipedia Talk pages, and YouTube, and cover different types of cyberbullying, including hate speech, aggression, insults, and toxicity.

Content

The collection comprises eight tables, listed under Tables. Every table has a "text" column and an "oh_label" column; each text is labeled as bullying or not.

Acknowledgements

Elsafoury, Fatma (2020), “Cyberbullying datasets”, Mendeley Data, V1, doi: 10.17632/jf4pzyvnpj.1

Tables

Aggression Parsed Dataset

@kaggle.saurabhshahane_cyberbullying_dataset.aggression_parsed_dataset

28.15 MB
115864 rows
5 columns


CREATE TABLE aggression_parsed_dataset (
  "index" BIGINT,
  "text" VARCHAR,
  "ed_label_0" DOUBLE,
  "ed_label_1" DOUBLE,
  "oh_label" BIGINT
);
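
A common first check on a table like this one is the class balance of "oh_label". The following is a minimal sketch, assuming the table has been loaded into a local SQLite database. It uses an in-memory database and three invented placeholder rows (not real dataset records), and the helper name label_counts is hypothetical.

```python
import sqlite3

# Schema copied from the CREATE TABLE statement above.
SCHEMA = """
CREATE TABLE aggression_parsed_dataset (
  "index" BIGINT,
  "text" VARCHAR,
  "ed_label_0" DOUBLE,
  "ed_label_1" DOUBLE,
  "oh_label" BIGINT
);
"""


def label_counts(conn, table):
    """Return {oh_label: row count} for one table (hypothetical helper)."""
    # Table names cannot be bound as parameters, so pass trusted names only.
    query = f'SELECT "oh_label", COUNT(*) FROM "{table}" GROUP BY "oh_label"'
    return dict(conn.execute(query).fetchall())


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)

# Invented placeholder rows for illustration only; the ed_label values are
# arbitrary numbers and carry no meaning here.
conn.executemany(
    "INSERT INTO aggression_parsed_dataset VALUES (?, ?, ?, ?, ?)",
    [
        (0, "placeholder comment a", 1.0, 0.0, 0),
        (1, "placeholder comment b", 0.0, 1.0, 1),
        (2, "placeholder comment c", 1.0, 0.0, 0),
    ],
)

counts = label_counts(conn, "aggression_parsed_dataset")
print(counts)
```

The same helper should work for any table in this collection, since each one has an "oh_label" column.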

Attack Parsed Dataset

@kaggle.saurabhshahane_cyberbullying_dataset.attack_parsed_dataset

28.14 MB
115864 rows
5 columns


CREATE TABLE attack_parsed_dataset (
  "index" BIGINT,
  "text" VARCHAR,
  "ed_label_0" DOUBLE,
  "ed_label_1" DOUBLE,
  "oh_label" BIGINT
);
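
The attack and aggression tables report the same row count and the same columns, which suggests, but does not confirm, that they describe the same texts. The sketch below shows one way to test that assumption by joining on "index" and counting rows whose text differs. It runs on an in-memory SQLite database with invented placeholder rows; the helper name shared_index_mismatches is hypothetical.

```python
import sqlite3

# Column list shared by both CREATE TABLE statements above.
COLUMNS = (
    '("index" BIGINT, "text" VARCHAR, "ed_label_0" DOUBLE, '
    '"ed_label_1" DOUBLE, "oh_label" BIGINT)'
)


def shared_index_mismatches(conn, left, right):
    """Count rows whose "index" is in both tables but whose text differs."""
    # IS NOT is SQLite's NULL-safe inequality, so a NULL text on one side
    # only is also counted as a mismatch.
    query = f'''
        SELECT COUNT(*)
        FROM "{left}" AS l
        JOIN "{right}" AS r ON l."index" = r."index"
        WHERE l."text" IS NOT r."text"
    '''
    return conn.execute(query).fetchone()[0]


conn = sqlite3.connect(":memory:")
for name in ("aggression_parsed_dataset", "attack_parsed_dataset"):
    conn.execute(f'CREATE TABLE "{name}" {COLUMNS}')

# Invented placeholder rows; index 1 deliberately differs between tables.
conn.executemany(
    "INSERT INTO aggression_parsed_dataset VALUES (?, ?, ?, ?, ?)",
    [(0, "same placeholder", 1.0, 0.0, 0), (1, "placeholder x", 0.0, 1.0, 1)],
)
conn.executemany(
    "INSERT INTO attack_parsed_dataset VALUES (?, ?, ?, ?, ?)",
    [(0, "same placeholder", 1.0, 0.0, 0), (1, "placeholder y", 1.0, 0.0, 0)],
)

mismatches = shared_index_mismatches(
    conn, "aggression_parsed_dataset", "attack_parsed_dataset"
)
print(mismatches)
```

A result of zero on the real tables would support treating the two label sets as annotations of the same texts; any other result means the tables should be handled separately.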

Kaggle Parsed Dataset

@kaggle.saurabhshahane_cyberbullying_dataset.kaggle_parsed_dataset

1.15 MB
8799 rows
4 columns


CREATE TABLE kaggle_parsed_dataset (
  "index" BIGINT,
  "oh_label" BIGINT,
  "date" VARCHAR,
  "text" VARCHAR
);

Toxicity Parsed Dataset

@kaggle.saurabhshahane_cyberbullying_dataset.toxicity_parsed_dataset

38.33 MB
159686 rows
5 columns


CREATE TABLE toxicity_parsed_dataset (
  "index" BIGINT,
  "text" VARCHAR,
  "ed_label_0" DOUBLE,
  "ed_label_1" DOUBLE,
  "oh_label" BIGINT
);

Twitter Parsed Dataset

@kaggle.saurabhshahane_cyberbullying_dataset.twitter_parsed_dataset

1.69 MB
16851 rows
5 columns


CREATE TABLE twitter_parsed_dataset (
  "index" VARCHAR,
  "id" VARCHAR,
  "text" VARCHAR,
  "annotation" VARCHAR,
  "oh_label" DOUBLE
);
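
Unlike most tables in this collection, this one stores "oh_label" as DOUBLE rather than BIGINT. A floating-point label column often indicates missing values, though that is an assumption here and not something the schema confirms. The sketch below shows one way to normalize such labels to 0, 1, or None before training; the helper name to_binary_label is hypothetical, and it assumes valid labels are exactly 0 and 1.

```python
import math
from typing import Optional


def to_binary_label(value: Optional[float]) -> Optional[int]:
    """Map a DOUBLE oh_label to 0, 1, or None (hypothetical helper)."""
    # Treat SQL NULL (None) and NaN as a missing label.
    if value is None or math.isnan(value):
        return None
    # Assumption: valid labels are exactly 0 and 1; fail loudly otherwise.
    if value not in (0.0, 1.0):
        raise ValueError(f"unexpected oh_label: {value!r}")
    return int(value)


labels = [to_binary_label(v) for v in (0.0, 1.0, None, float("nan"))]
print(labels)
```

Raising on unexpected values, instead of rounding them, keeps silent label corruption out of downstream models.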

Twitter Racism Parsed Dataset

@kaggle.saurabhshahane_cyberbullying_dataset.twitter_racism_parsed_dataset

1.18 MB
13471 rows
5 columns


CREATE TABLE twitter_racism_parsed_dataset (
  "index" DOUBLE,
  "id" DOUBLE,
  "text" VARCHAR,
  "annotation" VARCHAR,
  "oh_label" BIGINT
);

Twitter Sexism Parsed Dataset

@kaggle.saurabhshahane_cyberbullying_dataset.twitter_sexism_parsed_dataset

1.46 MB
14881 rows
5 columns


CREATE TABLE twitter_sexism_parsed_dataset (
  "index" VARCHAR,
  "id" VARCHAR,
  "text" VARCHAR,
  "annotation" VARCHAR,
  "oh_label" DOUBLE
);

Youtube Parsed Dataset

@kaggle.saurabhshahane_cyberbullying_dataset.youtube_parsed_dataset

2.54 MB
3464 rows
10 columns


CREATE TABLE youtube_parsed_dataset (
  "index" BIGINT,
  "userindex" VARCHAR,
  "text" VARCHAR,
  "number_of_comments" BIGINT,
  "number_of_subscribers" BIGINT,
  "membership_duration" BIGINT,
  "number_of_uploads" BIGINT,
  "profanity_in_userid" BIGINT,
  "age" BIGINT,
  "oh_label" BIGINT
);
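
Because every table has "text" and "oh_label" columns, the tables can be stacked into a single corpus despite their differing extra columns and label types. The following is a minimal sketch of that idea using SQLite and UNION ALL, shown for two of the tables with invented placeholder rows. The helper name unified_query and the output column name source are hypothetical.

```python
import sqlite3

# Per-table SELECT; the CAST evens out BIGINT and DOUBLE label columns.
TEMPLATE = (
    'SELECT \'{t}\' AS source, "text", '
    'CAST("oh_label" AS INTEGER) AS oh_label FROM "{t}"'
)


def unified_query(tables):
    """Build a UNION ALL query yielding (source, text, oh_label) rows."""
    # Table names are interpolated directly, so pass trusted names only.
    return "\nUNION ALL\n".join(TEMPLATE.format(t=t) for t in tables)


conn = sqlite3.connect(":memory:")
# Schemas copied from the CREATE TABLE statements above.
conn.executescript("""
CREATE TABLE kaggle_parsed_dataset (
  "index" BIGINT, "oh_label" BIGINT, "date" VARCHAR, "text" VARCHAR
);
CREATE TABLE twitter_parsed_dataset (
  "index" VARCHAR, "id" VARCHAR, "text" VARCHAR,
  "annotation" VARCHAR, "oh_label" DOUBLE
);
""")

# Invented placeholder rows for illustration only.
conn.execute(
    "INSERT INTO kaggle_parsed_dataset VALUES (?, ?, ?, ?)",
    (0, 1, None, "placeholder text k"),
)
conn.executemany(
    "INSERT INTO twitter_parsed_dataset VALUES (?, ?, ?, ?, ?)",
    [
        ("0", "placeholder-id-1", "placeholder text t", "placeholder", 0.0),
        ("1", "placeholder-id-2", "placeholder text u", None, None),
    ],
)

tables = ["kaggle_parsed_dataset", "twitter_parsed_dataset"]
rows = conn.execute(unified_query(tables)).fetchall()
for row in rows:
    print(row)
```

Keeping the source column makes it possible to split or weight the combined corpus by origin later, which matters when the tables differ in size as much as they do here.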