Baselight

WritingQuality|MemoryReduction

Memory reduced training data for the Writing Process-Writing Quality competition

@kaggle.ravi20076_writingqualitymemoryreduction

About this Dataset


This is a memory-reduced version of the training data for the Writing Process-Writing Quality competition. I encoded the text columns as np.int8 and binned categories with extremely low occurrence counts into a common bin. I also downcast certain numeric columns to smaller dtypes based on their min-max values. The train-logs data are saved in a binary format, along with the encoded text strings and their categories, since these may be needed to encode the test data the same way at inference time.
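The steps above can be sketched in pandas roughly as follows. This is a minimal illustration under stated assumptions, not the author's kernel: the function names (`encode_text_column`, `apply_mapping`, `downcast_int_columns`), the `min_count` threshold, the `"RARE"` bin label and the toy data are all hypothetical. It assumes the text column has no missing values.

```python
import numpy as np
import pandas as pd


def encode_text_column(s, min_count=10, rare_label="RARE"):
    """Encode a string column as np.int8 codes, pooling rare values.

    Values seen fewer than `min_count` times share one `rare_label` bin
    (hypothetical threshold and label). Returns the codes and the
    label -> code mapping, which must be kept to encode test data identically.
    Assumes the column has no missing values.
    """
    counts = s.value_counts()
    rare = counts.index[counts < min_count]
    pooled = s.where(~s.isin(rare), rare_label)
    # Always include the rare bin so labels unseen in train still get a code.
    labels = sorted(set(pooled.unique()) | {rare_label})
    if len(labels) > np.iinfo(np.int8).max + 1:
        raise ValueError("too many categories for np.int8 codes")
    mapping = {label: code for code, label in enumerate(labels)}
    return pooled.map(mapping).astype(np.int8), mapping


def apply_mapping(s, mapping, rare_label="RARE"):
    """Encode new (e.g. test) data with a mapping built on the train data."""
    pooled = s.where(s.isin(list(mapping)), rare_label)
    return pooled.map(mapping).astype(np.int8)


def downcast_int_columns(df):
    """Cast each integer column to the smallest signed dtype holding its min and max."""
    out = df.copy()
    for col in out.select_dtypes(include=np.integer).columns:
        lo, hi = out[col].min(), out[col].max()
        for dtype in (np.int8, np.int16, np.int32, np.int64):
            info = np.iinfo(dtype)
            if info.min <= lo and hi <= info.max:
                out[col] = out[col].astype(dtype)
                break
    return out


# Toy stand-in for the train logs (made-up values, illustrative only).
train = pd.DataFrame({
    "activity": ["Input"] * 5 + ["Remove/Cut"] * 3 + ["Paste"],
    "event_id": range(1, 10),
    "cursor_position": [0, 100, 200, 300, 400, 500, 600, 700, 1000],
})
train["activity"], activity_map = encode_text_column(train["activity"], min_count=2)
train = downcast_int_columns(train)
print(train.dtypes.to_dict())
print(activity_map)  # {'Input': 0, 'RARE': 1, 'Remove/Cut': 2}
```

Here the single "Paste" row falls below the toy threshold and lands in the shared bin, `event_id` fits in int8 and `cursor_position` needs int16. The mapping dictionary could be stored next to the data (for example with `pickle`) so that `apply_mapping` encodes test rows consistently, sending any unseen label to the rare bin.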
The same processing is also available in my baseline data prep kernel.
We will use this data as the input for all subsequent steps, including EDA, model development and inference. We hope this approach helps us avoid out-of-memory errors.
All the best for the competition!
