Comments, authors and conversations
Dataset Description
reddit_comments.json. A JSON array in which every element represents one comment. Each comment has several attributes available for analysis; I treat only body, which holds the comment text, as mandatory. A label is also available in the is_hate variable, coded as hate speech (1) or not hate (0).
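A minimal Python sketch of loading this file, assuming it is a single JSON array of objects as described. Only body and is_hate come from the description. The helper names parse_comments and label_counts and the inline sample are hypothetical illustrations. The sketch drops comments whose body is missing or empty, since body is the one mandatory attribute, and then counts how many comments fall under each is_hate label.

```python
import json
from collections import Counter


def parse_comments(raw: str) -> list[dict]:
    """Parse the JSON array and keep only comments with a non-empty body (hypothetical helper)."""
    comments = json.loads(raw)
    return [c for c in comments if c.get("body")]


def label_counts(comments: list[dict]) -> Counter:
    """Count comments per is_hate value: 1 = hate speech, 0 = not hate."""
    return Counter(c["is_hate"] for c in comments if "is_hate" in c)


# Hypothetical sample in the described shape; the third comment has an empty body and is dropped.
sample = (
    '[{"body": "first comment", "is_hate": 0},'
    ' {"body": "second comment", "is_hate": 1},'
    ' {"body": "", "is_hate": 0}]'
)
comments = parse_comments(sample)
print(len(comments))  # 2
print(label_counts(comments))
```

For the real file, the contents of open("reddit_comments.json", encoding="utf-8") can be read and passed to parse_comments in place of the sample string.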
conversations.csv. Each row of the file represents one conversational thread, and commas separate the different comments that belong to that thread.
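A minimal sketch of reading the threads, assuming each CSV row is one thread and each comma-separated field is one comment entry. The description does not say whether the entries are comment identifiers or comment texts, so the sample values and the parse_threads helper are hypothetical. The csv module is used instead of a plain split(",") so that quoted fields containing commas still parse as a single entry.

```python
import csv
import io


def parse_threads(raw: str) -> list[list[str]]:
    """Return one list per conversational thread; each item is one comment entry from that row."""
    reader = csv.reader(io.StringIO(raw))
    threads = [[field.strip() for field in row if field.strip()] for row in reader]
    # Skip blank rows, which csv.reader yields as empty lists.
    return [t for t in threads if t]


# Hypothetical content: two threads, the first with three comments and the second with two.
sample = "c1,c2,c3\nc4,c5\n"
threads = parse_threads(sample)
print(len(threads))  # 2
print(threads[0])    # ['c1', 'c2', 'c3']
```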
reddit_authors.json. A JSON array in which every element represents one author. It complements the information in reddit_comments.json with all the author-related attributes. Some authors may be missing from this file because their accounts may have been suspended by Reddit.
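Because some authors can be absent, any join between comments and authors has to tolerate missing matches. The sketch below is a minimal illustration under a loud assumption: the description does not name the fields that link the two files, so the keys "author" (on a comment) and "name" (on an author) are hypothetical placeholders, as are the attach_authors helper and the sample records. Comments whose author is not found get author_info set to None rather than being dropped, so they can still be used for analysis.

```python
import json

# Hypothetical linking keys: adjust to the real field names in the two files.
COMMENT_AUTHOR_KEY = "author"
AUTHOR_ID_KEY = "name"


def attach_authors(comments: list[dict], authors: list[dict]) -> list[dict]:
    """Return copies of the comments with an 'author_info' entry.

    author_info is None when the author is not in reddit_authors.json,
    for example because the account was suspended.
    """
    by_id = {a[AUTHOR_ID_KEY]: a for a in authors if AUTHOR_ID_KEY in a}
    return [{**c, "author_info": by_id.get(c.get(COMMENT_AUTHOR_KEY))} for c in comments]


# Hypothetical samples: "bob" has no entry in the authors list.
comments = json.loads(
    '[{"body": "hi", "is_hate": 0, "author": "alice"},'
    ' {"body": "yo", "is_hate": 1, "author": "bob"}]'
)
authors = json.loads('[{"name": "alice"}]')
joined = attach_authors(comments, authors)
missing = [c["author"] for c in joined if c["author_info"] is None]
print(missing)  # ['bob']
```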