WritingQuality|MemoryReduction
Memory reduced training data for the Writing Process-Writing Quality competition
@kaggle.ravi20076_writingqualitymemoryreduction
This is a memory-reduced dataset for the Writing Process-Writing Quality competition. I encoded the text columns as np.int8 codes and binned categories with extremely low occurrence counts into a common bin. I also downcast certain columns based on their min-max values to save memory. I saved the train-logs data in a binary format, together with the encoded text strings and their categories, since one may need them when running inference on the test data.
This is also available in my baseline data prep kernel.
We will use this data as input for all future steps, including EDA, model development and inference. With this approach, we hope to avoid memory errors.
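The encoding and down-casting described above can be sketched roughly as follows. This is a minimal illustration, not the code from the data prep kernel: the function names, the `min_count` threshold, and the choice of code 0 for the common bin are all assumptions made for the example.

```python
import numpy as np
import pandas as pd


def encode_with_rare_bin(col: pd.Series, min_count: int = 2):
    """Map strings to small integer codes; rare values share one bin.

    Hypothetical helper: `min_count` and the use of code 0 for the
    common bin are illustrative assumptions, not the dataset's values.
    """
    counts = col.value_counts()
    frequent = counts[counts >= min_count].index
    # Codes start at 1 so that 0 can serve as the shared "rare" bin.
    mapping = {value: code for code, value in enumerate(frequent, start=1)}
    # np.int8 only holds values up to 127, so this assumes few categories.
    codes = col.map(mapping).fillna(0).astype(np.int8)
    return codes, mapping


def downcast_integers(df: pd.DataFrame) -> pd.DataFrame:
    """Shrink each integer column to the smallest dtype that fits its range."""
    out = df.copy()
    for name in out.select_dtypes(include=[np.integer]).columns:
        out[name] = pd.to_numeric(out[name], downcast="integer")
    return out


# Toy data with made-up values, only to show the calls.
logs = pd.DataFrame({
    "down_event": ["q", "q", "Space", "q", "F12", "Space"],
    "down_time": [100, 250, 400, 550, 700, 850],
})
logs["down_event_nb"], event_mapping = encode_with_rare_bin(logs["down_event"])
logs = downcast_integers(logs.drop(columns="down_event"))
print(logs.dtypes)
```

Keeping the returned mapping alongside the encoded data is what makes it possible to encode unseen data the same way later.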
All the best for the competition!
CREATE TABLE downevents_v1 (
"down_event" VARCHAR,
"counts" BIGINT,
"down_event_nb" BIGINT
);
CREATE TABLE textchange_v1 (
"text_change" VARCHAR,
"counts" BIGINT,
"text_change_nb" BIGINT
);
CREATE TABLE trainlogsid642 (
"unnamed_0" BIGINT, -- Unnamed: 0
"down_time" BIGINT,
"up_time" BIGINT,
"action_time" BIGINT,
"activity" BIGINT,
"cursor_position" BIGINT,
"word_count" BIGINT,
"id_nb" BIGINT,
"down_event_nb" BIGINT,
"up_event_nb" BIGINT,
"text_change_nb" BIGINT
);
CREATE TABLE trainscore_v1 (
"id_nb" BIGINT,
"id" VARCHAR,
"score" BIGINT
);
CREATE TABLE upevents_v1 (
"up_event" VARCHAR,
"counts" BIGINT,
"up_event_nb" BIGINT
);
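At inference time, the lookup tables (such as downevents_v1, with its text column and matching `_nb` code column) can be used to encode raw test strings in the same way as the training data. The sketch below assumes the lookup has been loaded into a pandas DataFrame; the helper name `apply_saved_encoding` and the `unseen_code` fallback of 0 are hypothetical, and the toy values are made up. Check the actual code of the common bin in the tables before relying on it.

```python
import pandas as pd


def apply_saved_encoding(raw: pd.Series, lookup: pd.DataFrame,
                         text_col: str, code_col: str,
                         unseen_code: int = 0) -> pd.Series:
    """Encode raw strings using a saved text-to-code lookup table.

    Hypothetical helper: `unseen_code` is an assumed fallback for strings
    absent from the lookup, not a value documented by the dataset.
    """
    mapping = dict(zip(lookup[text_col], lookup[code_col]))
    return raw.map(mapping).fillna(unseen_code).astype("int8")


# Toy lookup with made-up values, shaped like the downevents_v1 table.
lookup = pd.DataFrame({
    "down_event": ["q", "Space"],
    "counts": [3, 2],
    "down_event_nb": [1, 2],
})
test_events = pd.Series(["Space", "q", "Home"])
encoded = apply_saved_encoding(test_events, lookup,
                               "down_event", "down_event_nb")
print(encoded.tolist())
```

The same call would apply to the up-event and text-change lookups by swapping the column names.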