Dataset: Reddit Comments

About this Dataset

Reddit Comments

reddit_comments.json. A JSON array in which each element represents one comment. Each comment has several attributes to analyze. Only body, which holds the comment text, is treated as mandatory. A label is also available in the is_hate variable, coded as hate speech (1) or not hate (0).
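As a rough illustration of the structure described above, the following sketch parses a JSON array of comments, keeps the ones that have a body, and counts the is_hate labels. It assumes the label is stored as an integer; the sample records are made up and only stand in for the real file contents.

```python
import json
from collections import Counter


def load_comments(json_text):
    """Parse a JSON array of comment objects and keep those with a non-empty body."""
    comments = json.loads(json_text)
    # body is the only attribute the description treats as mandatory.
    return [c for c in comments if c.get("body")]


def label_counts(comments):
    """Count comments per is_hate value (1 = hate speech, 0 = not hate)."""
    return Counter(c.get("is_hate") for c in comments)


# Made-up sample standing in for the contents of reddit_comments.json.
# Assumption: is_hate is stored as an integer, not as a string.
sample = json.dumps([
    {"body": "first example comment", "is_hate": 0},
    {"body": "second example comment", "is_hate": 1},
    {"body": "", "is_hate": 0},
])

comments = load_comments(sample)
counts = label_counts(comments)
print(len(comments), counts[0], counts[1])
```

With the real file, the same functions could be applied to the text read from reddit_comments.json.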

conversations.csv. Each row of the file represents one conversational thread. A comma separates the different comments in the same thread.
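A minimal sketch of reading this layout with Python's csv module is shown below. It assumes the file has no header row and that rows may have different lengths. The description does not say whether each field is a comment identifier or the comment text, so the sample values (c1, c2, ...) are placeholders.

```python
import csv
import io


def read_threads(csv_text):
    """Return one list of comment fields per non-empty row (one row = one thread)."""
    reader = csv.reader(io.StringIO(csv_text))
    # Skip blank lines and drop empty trailing fields within a row.
    return [[field for field in row if field] for row in reader if row]


# Made-up sample standing in for the contents of conversations.csv.
# Assumption: no header row; the values are placeholder comment entries.
sample = "c1,c2,c3\nc4,c5\n\nc6\n"

threads = read_threads(sample)
for thread in threads:
    print(len(thread), thread)
```

Using csv.reader rather than splitting on commas by hand keeps quoted fields that contain commas intact, which matters if the fields turn out to hold comment text.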

reddit_authors.json. A JSON array in which each element represents one author. It complements the information in reddit_comments.json with all the attributes related to the authors. Some authors may be missing from the file, because their accounts may have been suspended by Reddit.
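Because some authors may be absent, any join between comments and authors has to tolerate misses. The sketch below shows one way to do that. The key name author is hypothetical (the description does not name the field that links the two files), and the sample records are made up.

```python
def attach_authors(comments, authors, key="author"):
    """Split comments into those with a known author record and those without.

    `key` is a hypothetical field name assumed to exist in both files.
    """
    index = {a[key]: a for a in authors if key in a}
    matched, missing = [], []
    for comment in comments:
        info = index.get(comment.get(key))
        if info is None:
            # Author not in the authors file, e.g. a suspended account.
            missing.append(comment)
        else:
            matched.append({**comment, "author_info": info})
    return matched, missing


# Made-up records; field names other than body and is_hate are assumptions.
sample_comments = [
    {"body": "first example comment", "is_hate": 0, "author": "user_a"},
    {"body": "second example comment", "is_hate": 1, "author": "user_b"},
]
sample_authors = [{"author": "user_a", "example_attribute": 123}]

matched, missing = attach_authors(sample_comments, sample_authors)
print(len(matched), len(missing))
```

Keeping the unmatched comments in a separate list, rather than dropping them, makes it easy to check how many comments lack author information before analysis.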

Tables

Melbourne House Prices Less

@kaggle.ignaciorusso_reddit_comments.melbourne_house_prices_less

1.2 MB
63023 rows
13 columns


CREATE TABLE melbourne_house_prices_less (
  "suburb" VARCHAR,
  "address" VARCHAR,
  "rooms" BIGINT,
  "type" VARCHAR,
  "price" DOUBLE,
  "method" VARCHAR,
  "sellerg" VARCHAR,
  "date" TIMESTAMP,
  "postcode" BIGINT,
  "regionname" VARCHAR,
  "propertycount" BIGINT,
  "distance" DOUBLE,
  "councilarea" VARCHAR
);

Melbourne Housing Full

@kaggle.ignaciorusso_reddit_comments.melbourne_housing_full

1.09 MB
34857 rows
21 columns


CREATE TABLE melbourne_housing_full (
  "suburb" VARCHAR,
  "address" VARCHAR,
  "rooms" BIGINT,
  "type" VARCHAR,
  "price" DOUBLE,
  "method" VARCHAR,
  "sellerg" VARCHAR,
  "date" TIMESTAMP,
  "distance" DOUBLE,
  "postcode" DOUBLE,
  "bedroom2" DOUBLE,
  "bathroom" DOUBLE,
  "car" DOUBLE,
  "landsize" DOUBLE,
  "buildingarea" DOUBLE,
  "yearbuilt" DOUBLE,
  "councilarea" VARCHAR,
  "lattitude" DOUBLE,
  "longtitude" DOUBLE,
  "regionname" VARCHAR,
  "propertycount" DOUBLE
);