Dataset: WritingQuality|MemoryReduction

About this Dataset

This is a memory-reduced dataset for the Writing Process-Writing Quality competition. I encoded the text columns as np.int8 and binned categories with extremely low occurrence counts into a common bin. I also downcast certain columns based on their min-max values to save memory. The train-logs data is saved in a binary format, along with the encoded text strings and their categories, since these may be needed when running inference on the test data.
This is also available in my baseline data prep kernel.
We will use this data as the input for all later steps, including EDA, model development and inference. This approach should help us avoid memory errors.
All the best for the competition!
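
The memory reduction described above can be sketched roughly as follows. This is a minimal illustration, not the code from the data prep kernel: the helper names `downcast_by_range` and `encode_with_rare_bin`, the `min_count` threshold and the `"rare"` label are all assumptions made for the example.

```python
import numpy as np
import pandas as pd


def downcast_by_range(df: pd.DataFrame) -> pd.DataFrame:
    """Cast each integer column to the smallest signed integer dtype
    that can hold the column's observed min and max (hypothetical helper)."""
    out = df.copy()
    for col in out.columns:
        if not pd.api.types.is_integer_dtype(out[col]):
            continue  # leave non-integer columns untouched
        lo, hi = out[col].min(), out[col].max()
        for dtype in (np.int8, np.int16, np.int32, np.int64):
            info = np.iinfo(dtype)
            if info.min <= lo and hi <= info.max:
                out[col] = out[col].astype(dtype)
                break
    return out


def encode_with_rare_bin(values: pd.Series, min_count: int = 5,
                         rare_label: str = "rare"):
    """Replace categories seen fewer than min_count times with one shared
    label, then integer-encode the result as np.int8 (hypothetical helper).
    The threshold and label are assumptions, not the dataset's settings."""
    counts = values.value_counts()
    keep = counts[counts >= min_count].index
    binned = values.where(values.isin(keep), rare_label)
    codes, categories = pd.factorize(binned)
    if len(categories) > 128:
        raise ValueError("too many categories to store codes as np.int8")
    return codes.astype(np.int8), categories
```

The returned `categories` play the role of the saved lookup tables: they are what allows the same encoding to be applied to the test data at inference time.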

Tables

Downevents V1

@kaggle.ravi20076_writingqualitymemoryreduction.downevents_v1

4.3 KB
131 rows
3 columns


CREATE TABLE downevents_v1 (
  "down_event" VARCHAR,
  "counts" BIGINT,
  "down_event_nb" BIGINT
);

Textchange V1

@kaggle.ravi20076_writingqualitymemoryreduction.textchange_v1

118.71 KB
4111 rows
3 columns


CREATE TABLE textchange_v1 (
  "text_change" VARCHAR,
  "counts" BIGINT,
  "text_change_nb" BIGINT
);

Trainlogsid642

@kaggle.ravi20076_writingqualitymemoryreduction.trainlogsid642

72.15 KB
2706 rows
11 columns


CREATE TABLE trainlogsid642 (
  "unnamed_0" BIGINT,
  "down_time" BIGINT,
  "up_time" BIGINT,
  "action_time" BIGINT,
  "activity" BIGINT,
  "cursor_position" BIGINT,
  "word_count" BIGINT,
  "id_nb" BIGINT,
  "down_event_nb" BIGINT,
  "up_event_nb" BIGINT,
  "text_change_nb" BIGINT
);

Trainscore V1

@kaggle.ravi20076_writingqualitymemoryreduction.trainscore_v1

42.4 KB
2471 rows
3 columns


CREATE TABLE trainscore_v1 (
  "id_nb" BIGINT,
  "id" VARCHAR,
  "score" BIGINT
);

Upevents V1

@kaggle.ravi20076_writingqualitymemoryreduction.upevents_v1

4.29 KB
130 rows
3 columns


CREATE TABLE upevents_v1 (
  "up_event" VARCHAR,
  "counts" BIGINT,
  "up_event_nb" BIGINT
);
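
To recover the original strings from the encoded columns, the lookup tables above can be joined back onto the logs. The sketch below uses small invented frames in place of trainlogsid642 and downevents_v1, and assumes that `down_event_nb` in the lookup table is the same code stored in the logs (the shared column name suggests this, but it is not stated explicitly).

```python
import pandas as pd

# Toy stand-ins for trainlogsid642 and downevents_v1; all values are invented.
logs = pd.DataFrame({
    "id_nb": [0, 0, 0],
    "down_event_nb": [1, 0, 1],
    "word_count": [0, 1, 1],
})
down_lookup = pd.DataFrame({
    "down_event": ["q", "Space"],
    "counts": [1, 2],
    "down_event_nb": [0, 1],
})

# Left join keeps every log row; validate checks that each code
# appears at most once in the lookup table.
decoded = logs.merge(
    down_lookup[["down_event_nb", "down_event"]],
    on="down_event_nb",
    how="left",
    validate="many_to_one",
)
print(decoded)
```

The same pattern should apply to `up_event_nb` with upevents_v1 and to `text_change_nb` with textchange_v1, under the same assumption.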