Preprocessing-1 Of Titanic Dataset

New features such as FamilySize, IsAlone, CabinFloor, and CabinNumber, plus numeric mapping of all features.

@kaggle.spektrum_preprocessing1_of_titanic_dataset

About this Dataset

Context

Preprocessed dataset for the Titanic disaster competition.

Content

The train_set contains the following features: PassengerId, Survived, Pclass, Sex, Age, SibSp, Parch, Fare, Embarked, CabinFloor, CabinNumber, FamilySize, IsAlone, Title.

  • Age, Fare and Title have been simplified into different categories.
  • Cabin has been split into CabinFloor and CabinNumber.
  • CabinNumber has been simplified into two sides.
  • SibSp & Parch have been used to create the new features FamilySize & IsAlone.
  • Name, Cabin and Ticket have been dropped.
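
The feature derivations listed above could look roughly like the following sketch. It is not the dataset author's code: the function name `derive_features`, the output key `CabinSide`, the formula FamilySize = SibSp + Parch + 1, and the reading of "two sides" as odd versus even cabin numbers are all assumptions borrowed from common Titanic tutorials.

```python
import re


def derive_features(row):
    """Derive family and cabin features from one raw passenger record (a dict).

    Hypothetical helper; the formulas below are assumptions, not the
    dataset author's confirmed method.
    """
    # Assumption: family size counts the passenger as well as relatives.
    family_size = row["SibSp"] + row["Parch"] + 1
    is_alone = 1 if family_size == 1 else 0

    # Cabin may be missing, or list several cabins such as "C23 C25 C27";
    # this sketch only looks at the first one.
    cabin = row.get("Cabin") or ""
    first = cabin.split()[0] if cabin.strip() else ""
    match = re.match(r"([A-Za-z])(\d+)?", first)
    floor = match.group(1).upper() if match else None
    number = int(match.group(2)) if match and match.group(2) else None

    # Assumption: "two sides" means odd (1) versus even (0) cabin numbers.
    side = None if number is None else number % 2

    return {
        "FamilySize": family_size,
        "IsAlone": is_alone,
        "CabinFloor": floor,
        "CabinSide": side,
    }


if __name__ == "__main__":
    print(derive_features({"SibSp": 1, "Parch": 0, "Cabin": "C85"}))
```

Missing cabins are returned as None here; how the published tables encode them is not stated in the description.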

All features have been mapped to floats (the table schemas list them as BIGINT).
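
As a minimal sketch of what such a mapping might look like: the code below bins a continuous value into a category index and maps string categories to floats. The bin edges in `AGE_EDGES`, the codes in `SEX_MAP` and `EMBARKED_MAP`, and the helper names are hypothetical; the actual values used for this dataset are not given in the description.

```python
import bisect

# Hypothetical encodings; the dataset's real codes are not documented.
SEX_MAP = {"male": 0.0, "female": 1.0}
EMBARKED_MAP = {"S": 0.0, "C": 1.0, "Q": 2.0}
AGE_EDGES = [16, 32, 48, 64]  # hypothetical upper bounds of the first four bins


def to_bin(value, edges):
    """Return the index of the bin containing value, as a float.

    Bin i covers edges[i-1] < value <= edges[i]; values above the last
    edge fall into the final bin, len(edges).
    """
    return float(bisect.bisect_left(edges, value))


def encode(row):
    """Map one passenger record (a dict) to float-coded features."""
    return {
        "Sex": SEX_MAP[row["Sex"]],
        "Age": to_bin(row["Age"], AGE_EDGES),
        "Embarked": EMBARKED_MAP[row["Embarked"]],
    }


if __name__ == "__main__":
    print(encode({"Sex": "female", "Age": 22, "Embarked": "S"}))
```

Fare and Title could be handled the same way, with their own edges or lookup table.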

Acknowledgements

Most of the ideas come from tutorials.

Tables

Preproc2 Test

@kaggle.spektrum_preprocessing1_of_titanic_dataset.preproc2_test
  • 12.9 KB
  • 418 rows
  • 13 columns

CREATE TABLE preproc2_test (
  "passengerid" BIGINT,
  "pclass" BIGINT,
  "sex" BIGINT,
  "age" BIGINT,
  "sibsp" BIGINT,
  "parch" BIGINT,
  "fare" BIGINT,
  "embarked" BIGINT,
  "cabinfloor" BIGINT,
  "cabinnumber" BIGINT,
  "familysize" BIGINT,
  "isalone" BIGINT,
  "title" BIGINT
);

Preproc2 Train

@kaggle.spektrum_preprocessing1_of_titanic_dataset.preproc2_train
  • 17.81 KB
  • 891 rows
  • 14 columns

CREATE TABLE preproc2_train (
  "passengerid" BIGINT,
  "survived" BIGINT,
  "pclass" BIGINT,
  "sex" BIGINT,
  "age" BIGINT,
  "sibsp" BIGINT,
  "parch" BIGINT,
  "fare" BIGINT,
  "embarked" BIGINT,
  "cabinfloor" BIGINT,
  "cabinnumber" BIGINT,
  "familysize" BIGINT,
  "isalone" BIGINT,
  "title" BIGINT
);
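
To illustrate how the train table might be queried, here is a small sketch that recreates the schema above in an in-memory SQLite database and computes the survival rate for passengers travelling alone versus with family. The function name `survival_rate_by_isalone` and the four sample rows are made up for the example; they are not taken from the dataset.

```python
import sqlite3

SCHEMA = """
CREATE TABLE preproc2_train (
  "passengerid" BIGINT, "survived" BIGINT, "pclass" BIGINT, "sex" BIGINT,
  "age" BIGINT, "sibsp" BIGINT, "parch" BIGINT, "fare" BIGINT,
  "embarked" BIGINT, "cabinfloor" BIGINT, "cabinnumber" BIGINT,
  "familysize" BIGINT, "isalone" BIGINT, "title" BIGINT
)
"""


def survival_rate_by_isalone(rows):
    """Load rows into an in-memory table and average survived per isalone value."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(SCHEMA)
        placeholders = ",".join("?" * 14)  # one placeholder per column
        conn.executemany(
            f"INSERT INTO preproc2_train VALUES ({placeholders})", rows
        )
        query = (
            "SELECT isalone, AVG(survived) FROM preproc2_train "
            "GROUP BY isalone ORDER BY isalone"
        )
        return conn.execute(query).fetchall()
    finally:
        conn.close()


# Made-up rows for illustration only, in schema column order.
SAMPLE_ROWS = [
    (1, 0, 3, 0, 1, 1, 0, 0, 0, 0, 0, 2, 0, 1),
    (2, 1, 1, 1, 2, 1, 0, 3, 1, 3, 1, 2, 0, 3),
    (3, 1, 3, 1, 1, 0, 0, 1, 0, 0, 0, 1, 1, 2),
    (4, 0, 3, 0, 2, 0, 0, 1, 0, 0, 0, 1, 1, 1),
]

if __name__ == "__main__":
    for isalone, rate in survival_rate_by_isalone(SAMPLE_ROWS):
        print(isalone, rate)
```

The same SELECT statement should work against the hosted table, though the exact SQL dialect of the hosting platform is not specified here.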