Cleaned Toxic Comments
Preprocessed data for Toxic Comments Classification Challenge
@kaggle.fizzbuzz_cleaned_toxic_comments
The main obstacle I faced in the Toxic Comments Classification Challenge was preprocessing. Getting the preprocessing right is an easy way to improve leaderboard (LB) performance.
This is a preprocessed version of the Toxic Comments Classification Challenge dataset. The preprocessing code is available at https://www.kaggle.com/fizzbuzz/toxic-data-preprocessing
CREATE TABLE test_preprocessed (
"comment_text" VARCHAR,
"id" VARCHAR,
"identity_hate" VARCHAR,
"insult" VARCHAR,
"obscene" VARCHAR,
"set" VARCHAR,
"severe_toxic" VARCHAR,
"threat" VARCHAR,
"toxic" VARCHAR,
"toxicity" VARCHAR
);

CREATE TABLE train_preprocessed (
"comment_text" VARCHAR,
"id" VARCHAR,
"identity_hate" DOUBLE,
"insult" DOUBLE,
"obscene" DOUBLE,
"set" VARCHAR,
"severe_toxic" DOUBLE,
"threat" DOUBLE,
"toxic" DOUBLE,
"toxicity" DOUBLE
);
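
As a rough sketch of how the schema above might be queried, the snippet below rebuilds `train_preprocessed` in an in-memory SQLite database and computes the share of positive rows per label. Several things here are assumptions rather than part of the dataset: SQLite is only a stand-in for whatever engine hosts the tables, the four rows are made-up toy data, `label_rates` is a hypothetical helper name, and the labels are assumed to be 0/1 flags. The `CAST` is there because `test_preprocessed` declares its label columns as VARCHAR.

```python
import sqlite3

# The six label columns from the schema above.
LABELS = ["toxic", "severe_toxic", "obscene", "threat", "insult", "identity_hate"]


def label_rates(conn, table="train_preprocessed"):
    """Hypothetical helper: mean of each label column (share of 1s for 0/1 labels)."""
    rates = {}
    for label in LABELS:
        # CAST keeps the query usable when a label column is stored as text.
        query = f'SELECT AVG(CAST("{label}" AS REAL)) FROM "{table}"'
        rates[label] = conn.execute(query).fetchone()[0]
    return rates


# In-memory stand-in for the real table (assumption: same column names and types).
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE train_preprocessed (
        "comment_text" VARCHAR, "id" VARCHAR,
        "identity_hate" DOUBLE, "insult" DOUBLE, "obscene" DOUBLE,
        "set" VARCHAR,
        "severe_toxic" DOUBLE, "threat" DOUBLE, "toxic" DOUBLE,
        "toxicity" DOUBLE
    )
    """
)

# Toy rows, invented for illustration only; "set" and "toxicity" are left NULL.
toy_rows = [
    ("thanks for the fix", "a1", 0, 0, 0, 0, 0, 0),
    ("you are an idiot", "a2", 1, 0, 0, 0, 1, 0),
    ("see the talk page", "a3", 0, 0, 0, 0, 0, 0),
    ("placeholder rude comment", "a4", 1, 0, 1, 0, 0, 0),
]
conn.executemany(
    'INSERT INTO train_preprocessed '
    '("comment_text", "id", "toxic", "severe_toxic", "obscene", '
    '"threat", "insult", "identity_hate") VALUES (?, ?, ?, ?, ?, ?, ?, ?)',
    toy_rows,
)

rates = label_rates(conn)
for label, rate in rates.items():
    print(f"{label}: {rate:.2f}")
```

On the real data, the same query would give a quick view of class imbalance across the six labels, which is worth checking before choosing a metric or sampling strategy.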