Cyberbullying Classification

47k tweets belonging to 6 balanced classes.

@kaggle.andrewmvd_cyberbullying_classification

About this Dataset

Abstract

With the rise of social media, coupled with the Covid-19 pandemic, cyberbullying has reached all-time highs. We can combat this by building models that automatically flag potentially harmful tweets and that break down the patterns of hatred.

About this dataset

As social media usage becomes increasingly prevalent in every age group, a vast majority of citizens rely on this essential medium for day-to-day communication. Social media’s ubiquity means that cyberbullying can effectively impact anyone at any time or anywhere, and the relative anonymity of the internet makes such personal attacks more difficult to stop than traditional bullying.

On April 15th, 2020, UNICEF issued a warning in response to the increased risk of cyberbullying during the COVID-19 pandemic due to widespread school closures, increased screen time, and decreased face-to-face social interaction. The statistics of cyberbullying are outright alarming: 36.5% of middle and high school students have felt cyberbullied and 87% have observed cyberbullying, with effects ranging from decreased academic performance to depression to suicidal thoughts.

In light of this, the dataset contains more than 47,000 tweets labelled according to the class of cyberbullying:

  • Age;
  • Ethnicity;
  • Gender;
  • Religion;
  • Other type of cyberbullying;
  • Not cyberbullying

The data has been balanced so that each class contains roughly 8,000 tweets.

Trigger Warning: These tweets either describe a bullying event or are themselves the offense, so explore the data only as far as you feel comfortable.

How to use this dataset

  • Create a multiclass classification model to predict cyberbullying type;
  • Create a binary classification model to flag potentially harmful tweets;
  • Explore words and patterns associated with each type of cyberbullying.
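
As a starting point for the second and third ideas above, here is a minimal sketch using only the Python standard library. It collapses the six classes into a binary harmful/not-harmful label and counts the most frequent words per class. The column names `tweet_text` and `cyberbullying_type`, the label string `not_cyberbullying`, and the sample rows are assumptions made for illustration; check them against the actual file before relying on them.

```python
import re
from collections import Counter

# Assumed label string for the benign class; verify against the real data.
BENIGN_LABEL = "not_cyberbullying"


def to_binary_label(label, benign_label=BENIGN_LABEL):
    """Map a multiclass label to 1 (potentially harmful) or 0 (benign)."""
    return 0 if label == benign_label else 1


def tokenize(text):
    """Lowercase the text and keep runs of letters and apostrophes."""
    return re.findall(r"[a-z']+", text.lower())


def top_words_by_class(rows, n=10):
    """Return the n most frequent tokens for each class label.

    `rows` is an iterable of dicts with the assumed keys
    "tweet_text" and "cyberbullying_type".
    """
    counts = {}
    for row in rows:
        label = row["cyberbullying_type"]
        counts.setdefault(label, Counter()).update(tokenize(row["tweet_text"]))
    return {label: [word for word, _ in c.most_common(n)]
            for label, c in counts.items()}


# Hypothetical placeholder rows, not taken from the dataset.
sample_rows = [
    {"tweet_text": "placeholder placeholder text one", "cyberbullying_type": "age"},
    {"tweet_text": "placeholder text two", "cyberbullying_type": "age"},
    {"tweet_text": "harmless harmless harmless note", "cyberbullying_type": "not_cyberbullying"},
]

binary_labels = [to_binary_label(r["cyberbullying_type"]) for r in sample_rows]
top_words = top_words_by_class(sample_rows, n=2)
```

For the real data, the rows could be read with `csv.DictReader` from the downloaded file. In practice you would likely also remove stop words, user mentions, and URLs before counting, since those tend to dominate raw frequency lists.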

Acknowledgements

If you use this dataset in your research, please credit the authors.

Citation

J. Wang, K. Fu, C.T. Lu, “SOSNet: A Graph Convolutional Network Approach to Fine-Grained Cyberbullying Detection,” Proceedings of the 2020 IEEE International Conference on Big Data (IEEE BigData 2020), December 10-13, 2020.

License

CC BY 4.0

Splash banner

Icons by Freepik and Juicy Fish.
