Baselight

Home Loan Prediction Prepared Datasets

Prepared datasets for home loan default risk prediction

@kaggle.defcodeking_home_loan_prediction_prepared_datasets

About this Dataset

This is a prepared dataset for the AI511-HOMELOAN-2022 competition. It contains 4 pairs of files, where each pair consists of a training set and a test set.

  • train_with_new_features.csv and test_with_new_features.csv - These files have all the new features that have been engineered. There is no further preprocessing. This is useful if you want to run your own experiments with missing value imputation and scaling.
  • train_scaled_no_nulls.csv and test_scaled_no_nulls.csv - These files have the new features. Missing values have been imputed using the median for numerical features and the mode for categorical features, and the numerical columns have been scaled using min-max scaling. This is useful if you want to experiment with ways to encode categorical features.
  • train_le_scaled_no_nulls.csv and test_le_scaled_no_nulls.csv - In addition to all of the above, the categorical features have been label encoded. Note that for some features, categories in the training set are not present in the test set and vice versa. All such categories have been encoded with -1 in the test set.
  • train_ohe_scaled_no_nulls.csv and test_ohe_scaled_no_nulls.csv - Same as above except categorical features are one-hot encoded.
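
The preprocessing steps described in the list above can be sketched roughly as follows. This is a minimal illustration in plain Python, not the code that produced the files. It assumes that all statistics (median, mode, minimum, maximum, category codes) are computed on the training column only and then applied to the test column, and that category codes follow sorted order; neither detail is stated in the dataset description. The function names and sample values are hypothetical.

```python
from statistics import median, mode


def impute(train_col, test_col, numeric):
    """Fill None with the training median (numeric) or mode (categorical)."""
    observed = [v for v in train_col if v is not None]
    fill = median(observed) if numeric else mode(observed)
    filled_train = [fill if v is None else v for v in train_col]
    filled_test = [fill if v is None else v for v in test_col]
    return filled_train, filled_test


def min_max_scale(train_col, test_col):
    """Scale with the training min and max (assumption: fit on train only)."""
    lo, hi = min(train_col), max(train_col)
    span = (hi - lo) or 1.0  # guard against constant columns
    scaled_train = [(v - lo) / span for v in train_col]
    scaled_test = [(v - lo) / span for v in test_col]
    return scaled_train, scaled_test


def label_encode(train_col, test_col):
    """Map training categories to integers; unseen test categories become -1."""
    codes = {cat: i for i, cat in enumerate(sorted(set(train_col)))}
    encoded_train = [codes[c] for c in train_col]
    encoded_test = [codes.get(c, -1) for c in test_col]
    return encoded_train, encoded_test


# Made-up values, for illustration only.
income_train, income_test = impute([100.0, None, 300.0], [None, 400.0], numeric=True)
# income_train == [100.0, 200.0, 300.0], income_test == [200.0, 400.0]
scaled_train, scaled_test = min_max_scale(income_train, income_test)
# scaled_train == [0.0, 0.5, 1.0], scaled_test == [0.5, 1.5]
enc_train, enc_test = label_encode(["Working", "Student", "Working"],
                                   ["Student", "Unemployed"])
# enc_train == [1, 0, 1], enc_test == [0, -1]
```

Under the fit-on-train assumption, scaled test values can fall outside the [0, 1] range, as the last test value above does.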

The new features that have been engineered are as follows:

  • loan_rate - This is the ratio of amt_credit to amt_annuity. Thus, it is a rough measure of the duration of the loan in months, since amt_annuity is the amount paid each month.
  • loan_income_ratio - This is the ratio of amt_credit to amt_income_total. Thus, it acts as an indirect measure of the liability that the applicant is taking on. A higher value indicates that the loan amount is much bigger than the applicant's income, suggesting that they have taken on a bigger liability than someone with a smaller ratio.
  • annuity_income_ratio - This is the ratio of amt_annuity to amt_income_total. It is another measure of liability, but at a finer timescale (per month vs. over multiple months), since amt_annuity is the amount to be paid each month against the loan.
  • application_is_incomplete - Denotes whether the applicant submitted an incomplete application or not.
  • es1_is_missing - True if ext_source_1 is missing for the applicant.
  • es3_is_missing - True if ext_source_3 is missing for the applicant.
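
As a rough sketch, the ratio and missing-value features listed above could be derived from a single application row as shown below. The function name and the sample values are hypothetical, and the sketch assumes that amt_annuity and amt_income_total are present and non-zero. application_is_incomplete is left out because the description does not say how it is determined.

```python
def add_engineered_features(row):
    """Return a copy of `row` with the ratio and missing-indicator features."""
    out = dict(row)
    out["loan_rate"] = row["amt_credit"] / row["amt_annuity"]
    out["loan_income_ratio"] = row["amt_credit"] / row["amt_income_total"]
    out["annuity_income_ratio"] = row["amt_annuity"] / row["amt_income_total"]
    # Missing values are represented as None in this sketch.
    out["es1_is_missing"] = row["ext_source_1"] is None
    out["es3_is_missing"] = row["ext_source_3"] is None
    return out


# Made-up applicant, for illustration only.
applicant = {
    "amt_credit": 240000.0,
    "amt_annuity": 12000.0,
    "amt_income_total": 120000.0,
    "ext_source_1": None,
    "ext_source_3": 0.51,
}
features = add_engineered_features(applicant)
# loan_rate is 20.0 (about 20 monthly payments), loan_income_ratio is 2.0,
# es1_is_missing is True and es3_is_missing is False.
```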

Additionally, the train set in the last 2 pairs of files has a kfold column which has been generated using Stratified K-Fold (due to target imbalance) and can be used in a CV loop.
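
A CV loop over the kfold column might look like the sketch below. It uses plain Python on a handful of made-up rows; the helper name cv_splits is hypothetical, and the model fitting step is left as a comment because the description does not prescribe a model.

```python
def cv_splits(rows, fold_key="kfold"):
    """Yield (fold, train_rows, valid_rows) for each distinct fold value."""
    for fold in sorted({row[fold_key] for row in rows}):
        train_rows = [r for r in rows if r[fold_key] != fold]
        valid_rows = [r for r in rows if r[fold_key] == fold]
        yield fold, train_rows, valid_rows


# Made-up rows, for illustration only; real rows carry all feature columns.
rows = [
    {"sk_id_curr": 1, "target": 0, "kfold": 0},
    {"sk_id_curr": 2, "target": 1, "kfold": 0},
    {"sk_id_curr": 3, "target": 0, "kfold": 1},
    {"sk_id_curr": 4, "target": 1, "kfold": 1},
    {"sk_id_curr": 5, "target": 0, "kfold": 2},
    {"sk_id_curr": 6, "target": 1, "kfold": 2},
]

for fold, train_rows, valid_rows in cv_splits(rows):
    # Fit a model on train_rows and evaluate it on valid_rows here.
    print(fold, len(train_rows), len(valid_rows))
```

Remember to drop kfold (and sk_id_curr) from the feature set before fitting, so that the fold assignment does not leak into the model as a feature.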

Tables

Numerical Features

@kaggle.defcodeking_home_loan_prediction_prepared_datasets.numerical_features
  • 2.71 KB
  • 65 rows
  • 2 columns

CREATE TABLE numerical_features (
  "col" VARCHAR,
  "mn" VARCHAR
);

Test Le Scaled No Null

@kaggle.defcodeking_home_loan_prediction_prepared_datasets.test_le_scaled_no_null
  • 12.17 MB
  • 123005 rows
  • 127 columns

CREATE TABLE test_le_scaled_no_null (
  "sk_id_curr" BIGINT,
  "name_contract_type" DOUBLE,
  "code_gender" DOUBLE,
  "flag_own_car" DOUBLE,
  "flag_own_realty" DOUBLE,
  "cnt_children" DOUBLE,
  "amt_income_total" DOUBLE,
  "amt_credit" DOUBLE,
  "amt_annuity" DOUBLE,
  "amt_goods_price" DOUBLE,
  "name_type_suite" DOUBLE,
  "name_income_type" DOUBLE,
  "name_education_type" DOUBLE,
  "name_family_status" DOUBLE,
  "name_housing_type" DOUBLE,
  "region_population_relative" DOUBLE,
  "days_birth" DOUBLE,
  "days_employed" DOUBLE,
  "days_registration" DOUBLE,
  "days_id_publish" DOUBLE,
  "own_car_age" DOUBLE,
  "flag_mobil" DOUBLE,
  "flag_emp_phone" DOUBLE,
  "flag_work_phone" DOUBLE,
  "flag_cont_mobile" DOUBLE,
  "flag_phone" DOUBLE,
  "flag_email" DOUBLE,
  "occupation_type" DOUBLE,
  "cnt_fam_members" DOUBLE,
  "region_rating_client" DOUBLE,
  "region_rating_client_w_city" DOUBLE,
  "weekday_appr_process_start" DOUBLE,
  "hour_appr_process_start" DOUBLE,
  "reg_region_not_live_region" DOUBLE,
  "reg_region_not_work_region" DOUBLE,
  "live_region_not_work_region" DOUBLE,
  "reg_city_not_live_city" DOUBLE,
  "reg_city_not_work_city" DOUBLE,
  "live_city_not_work_city" DOUBLE,
  "organization_type" DOUBLE,
  "ext_source_1" DOUBLE,
  "ext_source_2" DOUBLE,
  "ext_source_3" DOUBLE,
  "apartments_avg" DOUBLE,
  "basementarea_avg" DOUBLE,
  "years_beginexpluatation_avg" DOUBLE,
  "years_build_avg" DOUBLE,
  "commonarea_avg" DOUBLE,
  "elevators_avg" DOUBLE,
  "entrances_avg" DOUBLE,
  "floorsmax_avg" DOUBLE,
  "floorsmin_avg" DOUBLE,
  "landarea_avg" DOUBLE,
  "livingapartments_avg" DOUBLE,
  "livingarea_avg" DOUBLE,
  "nonlivingapartments_avg" DOUBLE,
  "nonlivingarea_avg" DOUBLE,
  "apartments_mode" DOUBLE,
  "basementarea_mode" DOUBLE,
  "years_beginexpluatation_mode" DOUBLE,
  "years_build_mode" DOUBLE,
  "commonarea_mode" DOUBLE,
  "elevators_mode" DOUBLE,
  "entrances_mode" DOUBLE,
  "floorsmax_mode" DOUBLE,
  "floorsmin_mode" DOUBLE,
  "landarea_mode" DOUBLE,
  "livingapartments_mode" DOUBLE,
  "livingarea_mode" DOUBLE,
  "nonlivingapartments_mode" DOUBLE,
  "nonlivingarea_mode" DOUBLE,
  "apartments_medi" DOUBLE,
  "basementarea_medi" DOUBLE,
  "years_beginexpluatation_medi" DOUBLE,
  "years_build_medi" DOUBLE,
  "commonarea_medi" DOUBLE,
  "elevators_medi" DOUBLE,
  "entrances_medi" DOUBLE,
  "floorsmax_medi" DOUBLE,
  "floorsmin_medi" DOUBLE,
  "landarea_medi" DOUBLE,
  "livingapartments_medi" DOUBLE,
  "livingarea_medi" DOUBLE,
  "nonlivingapartments_medi" DOUBLE,
  "nonlivingarea_medi" DOUBLE,
  "fondkapremont_mode" DOUBLE,
  "housetype_mode" DOUBLE,
  "totalarea_mode" DOUBLE,
  "wallsmaterial_mode" DOUBLE,
  "emergencystate_mode" DOUBLE,
  "obs_30_cnt_social_circle" DOUBLE,
  "def_30_cnt_social_circle" DOUBLE,
  "obs_60_cnt_social_circle" DOUBLE,
  "def_60_cnt_social_circle" DOUBLE,
  "days_last_phone_change" DOUBLE,
  "flag_document_2" DOUBLE,
  "flag_document_3" DOUBLE,
  "flag_document_4" DOUBLE,
  "flag_document_5" DOUBLE,
  "flag_document_6" DOUBLE
);

Test Ohe Scaled No Null

@kaggle.defcodeking_home_loan_prediction_prepared_datasets.test_ohe_scaled_no_null
  • 14.51 MB
  • 123005 rows
  • 345 columns

CREATE TABLE test_ohe_scaled_no_null (
  "sk_id_curr" BIGINT,
  "amt_income_total" DOUBLE,
  "amt_credit" DOUBLE,
  "amt_annuity" DOUBLE,
  "amt_goods_price" DOUBLE,
  "region_population_relative" DOUBLE,
  "days_birth" DOUBLE,
  "days_employed" DOUBLE,
  "days_registration" DOUBLE,
  "days_id_publish" DOUBLE,
  "own_car_age" DOUBLE,
  "hour_appr_process_start" DOUBLE,
  "ext_source_1" DOUBLE,
  "ext_source_2" DOUBLE,
  "ext_source_3" DOUBLE,
  "apartments_avg" DOUBLE,
  "basementarea_avg" DOUBLE,
  "years_beginexpluatation_avg" DOUBLE,
  "years_build_avg" DOUBLE,
  "commonarea_avg" DOUBLE,
  "elevators_avg" DOUBLE,
  "entrances_avg" DOUBLE,
  "floorsmax_avg" DOUBLE,
  "floorsmin_avg" DOUBLE,
  "landarea_avg" DOUBLE,
  "livingapartments_avg" DOUBLE,
  "livingarea_avg" DOUBLE,
  "nonlivingapartments_avg" DOUBLE,
  "nonlivingarea_avg" DOUBLE,
  "apartments_mode" DOUBLE,
  "basementarea_mode" DOUBLE,
  "years_beginexpluatation_mode" DOUBLE,
  "years_build_mode" DOUBLE,
  "commonarea_mode" DOUBLE,
  "elevators_mode" DOUBLE,
  "entrances_mode" DOUBLE,
  "floorsmax_mode" DOUBLE,
  "floorsmin_mode" DOUBLE,
  "landarea_mode" DOUBLE,
  "livingapartments_mode" DOUBLE,
  "livingarea_mode" DOUBLE,
  "nonlivingapartments_mode" DOUBLE,
  "nonlivingarea_mode" DOUBLE,
  "apartments_medi" DOUBLE,
  "basementarea_medi" DOUBLE,
  "years_beginexpluatation_medi" DOUBLE,
  "years_build_medi" DOUBLE,
  "commonarea_medi" DOUBLE,
  "elevators_medi" DOUBLE,
  "entrances_medi" DOUBLE,
  "floorsmax_medi" DOUBLE,
  "floorsmin_medi" DOUBLE,
  "landarea_medi" DOUBLE,
  "livingapartments_medi" DOUBLE,
  "livingarea_medi" DOUBLE,
  "nonlivingapartments_medi" DOUBLE,
  "nonlivingarea_medi" DOUBLE,
  "totalarea_mode" DOUBLE,
  "obs_30_cnt_social_circle" DOUBLE,
  "def_30_cnt_social_circle" DOUBLE,
  "obs_60_cnt_social_circle" DOUBLE,
  "def_60_cnt_social_circle" DOUBLE,
  "days_last_phone_change" DOUBLE,
  "loan_rate" DOUBLE,
  "loan_income_ratio" DOUBLE,
  "annuity_income_ratio" DOUBLE,
  "name_contract_type_revolving_loans" DOUBLE,
  "code_gender_m" DOUBLE,
  "name_type_suite_children" DOUBLE,
  "name_type_suite_family" DOUBLE,
  "name_type_suite_group_of_people" DOUBLE,
  "name_type_suite_other_a" DOUBLE,
  "name_type_suite_other_b" DOUBLE,
  "name_type_suite_spouse_partner" DOUBLE,
  "name_type_suite_unaccompanied" DOUBLE,
  "name_income_type_businessman" DOUBLE,
  "name_income_type_commercial_associate" DOUBLE,
  "name_income_type_maternity_leave" DOUBLE,
  "name_income_type_pensioner" DOUBLE,
  "name_income_type_state_servant" DOUBLE,
  "name_income_type_student" DOUBLE,
  "name_income_type_unemployed" DOUBLE,
  "name_income_type_working" DOUBLE,
  "name_education_type_academic_degree" DOUBLE,
  "name_education_type_higher_education" DOUBLE,
  "name_education_type_incomplete_higher" DOUBLE,
  "name_education_type_lower_secondary" DOUBLE,
  "name_education_type_secondary_secondary_special" DOUBLE,
  "name_family_status_civil_marriage" DOUBLE,
  "name_family_status_married" DOUBLE,
  "name_family_status_separated" DOUBLE,
  "name_family_status_single_not_married" DOUBLE,
  "name_family_status_widow" DOUBLE,
  "name_housing_type_co_op_apartment" DOUBLE,
  "name_housing_type_house_apartment" DOUBLE,
  "name_housing_type_municipal_apartment" DOUBLE,
  "name_housing_type_office_apartment" DOUBLE,
  "name_housing_type_rented_apartment" DOUBLE,
  "name_housing_type_with_parents" DOUBLE,
  "occupation_type_accountants" DOUBLE
);

Test Scaled No Nulls

@kaggle.defcodeking_home_loan_prediction_prepared_datasets.test_scaled_no_nulls
  • 13.05 MB
  • 123005 rows
  • 127 columns

CREATE TABLE test_scaled_no_nulls (
  "sk_id_curr" BIGINT,
  "name_contract_type" VARCHAR,
  "code_gender" VARCHAR,
  "flag_own_car" VARCHAR,
  "flag_own_realty" VARCHAR,
  "cnt_children" BIGINT,
  "amt_income_total" DOUBLE,
  "amt_credit" DOUBLE,
  "amt_annuity" DOUBLE,
  "amt_goods_price" DOUBLE,
  "name_type_suite" VARCHAR,
  "name_income_type" VARCHAR,
  "name_education_type" VARCHAR,
  "name_family_status" VARCHAR,
  "name_housing_type" VARCHAR,
  "region_population_relative" DOUBLE,
  "days_birth" DOUBLE,
  "days_employed" DOUBLE,
  "days_registration" DOUBLE,
  "days_id_publish" DOUBLE,
  "own_car_age" DOUBLE,
  "flag_mobil" BIGINT,
  "flag_emp_phone" BIGINT,
  "flag_work_phone" BIGINT,
  "flag_cont_mobile" BIGINT,
  "flag_phone" BIGINT,
  "flag_email" BIGINT,
  "occupation_type" VARCHAR,
  "cnt_fam_members" DOUBLE,
  "region_rating_client" BIGINT,
  "region_rating_client_w_city" BIGINT,
  "weekday_appr_process_start" VARCHAR,
  "hour_appr_process_start" DOUBLE,
  "reg_region_not_live_region" BIGINT,
  "reg_region_not_work_region" BIGINT,
  "live_region_not_work_region" BIGINT,
  "reg_city_not_live_city" BIGINT,
  "reg_city_not_work_city" BIGINT,
  "live_city_not_work_city" BIGINT,
  "organization_type" VARCHAR,
  "ext_source_1" DOUBLE,
  "ext_source_2" DOUBLE,
  "ext_source_3" DOUBLE,
  "apartments_avg" DOUBLE,
  "basementarea_avg" DOUBLE,
  "years_beginexpluatation_avg" DOUBLE,
  "years_build_avg" DOUBLE,
  "commonarea_avg" DOUBLE,
  "elevators_avg" DOUBLE,
  "entrances_avg" DOUBLE,
  "floorsmax_avg" DOUBLE,
  "floorsmin_avg" DOUBLE,
  "landarea_avg" DOUBLE,
  "livingapartments_avg" DOUBLE,
  "livingarea_avg" DOUBLE,
  "nonlivingapartments_avg" DOUBLE,
  "nonlivingarea_avg" DOUBLE,
  "apartments_mode" DOUBLE,
  "basementarea_mode" DOUBLE,
  "years_beginexpluatation_mode" DOUBLE,
  "years_build_mode" DOUBLE,
  "commonarea_mode" DOUBLE,
  "elevators_mode" DOUBLE,
  "entrances_mode" DOUBLE,
  "floorsmax_mode" DOUBLE,
  "floorsmin_mode" DOUBLE,
  "landarea_mode" DOUBLE,
  "livingapartments_mode" DOUBLE,
  "livingarea_mode" DOUBLE,
  "nonlivingapartments_mode" DOUBLE,
  "nonlivingarea_mode" DOUBLE,
  "apartments_medi" DOUBLE,
  "basementarea_medi" DOUBLE,
  "years_beginexpluatation_medi" DOUBLE,
  "years_build_medi" DOUBLE,
  "commonarea_medi" DOUBLE,
  "elevators_medi" DOUBLE,
  "entrances_medi" DOUBLE,
  "floorsmax_medi" DOUBLE,
  "floorsmin_medi" DOUBLE,
  "landarea_medi" DOUBLE,
  "livingapartments_medi" DOUBLE,
  "livingarea_medi" DOUBLE,
  "nonlivingapartments_medi" DOUBLE,
  "nonlivingarea_medi" DOUBLE,
  "fondkapremont_mode" VARCHAR,
  "housetype_mode" VARCHAR,
  "totalarea_mode" DOUBLE,
  "wallsmaterial_mode" VARCHAR,
  "emergencystate_mode" VARCHAR,
  "obs_30_cnt_social_circle" DOUBLE,
  "def_30_cnt_social_circle" DOUBLE,
  "obs_60_cnt_social_circle" DOUBLE,
  "def_60_cnt_social_circle" DOUBLE,
  "days_last_phone_change" DOUBLE,
  "flag_document_2" BIGINT,
  "flag_document_3" BIGINT,
  "flag_document_4" BIGINT,
  "flag_document_5" BIGINT,
  "flag_document_6" BIGINT
);

Test With New Features

@kaggle.defcodeking_home_loan_prediction_prepared_datasets.test_with_new_features
  • 11.56 MB
  • 123005 rows
  • 127 columns

CREATE TABLE test_with_new_features (
  "sk_id_curr" BIGINT,
  "name_contract_type" VARCHAR,
  "code_gender" VARCHAR,
  "flag_own_car" VARCHAR,
  "flag_own_realty" VARCHAR,
  "cnt_children" BIGINT,
  "amt_income_total" DOUBLE,
  "amt_credit" DOUBLE,
  "amt_annuity" DOUBLE,
  "amt_goods_price" DOUBLE,
  "name_type_suite" VARCHAR,
  "name_income_type" VARCHAR,
  "name_education_type" VARCHAR,
  "name_family_status" VARCHAR,
  "name_housing_type" VARCHAR,
  "region_population_relative" DOUBLE,
  "days_birth" BIGINT,
  "days_employed" BIGINT,
  "days_registration" DOUBLE,
  "days_id_publish" BIGINT,
  "own_car_age" DOUBLE,
  "flag_mobil" BIGINT,
  "flag_emp_phone" BIGINT,
  "flag_work_phone" BIGINT,
  "flag_cont_mobile" BIGINT,
  "flag_phone" BIGINT,
  "flag_email" BIGINT,
  "occupation_type" VARCHAR,
  "cnt_fam_members" DOUBLE,
  "region_rating_client" BIGINT,
  "region_rating_client_w_city" BIGINT,
  "weekday_appr_process_start" VARCHAR,
  "hour_appr_process_start" BIGINT,
  "reg_region_not_live_region" BIGINT,
  "reg_region_not_work_region" BIGINT,
  "live_region_not_work_region" BIGINT,
  "reg_city_not_live_city" BIGINT,
  "reg_city_not_work_city" BIGINT,
  "live_city_not_work_city" BIGINT,
  "organization_type" VARCHAR,
  "ext_source_1" DOUBLE,
  "ext_source_2" DOUBLE,
  "ext_source_3" DOUBLE,
  "apartments_avg" DOUBLE,
  "basementarea_avg" DOUBLE,
  "years_beginexpluatation_avg" DOUBLE,
  "years_build_avg" DOUBLE,
  "commonarea_avg" DOUBLE,
  "elevators_avg" DOUBLE,
  "entrances_avg" DOUBLE,
  "floorsmax_avg" DOUBLE,
  "floorsmin_avg" DOUBLE,
  "landarea_avg" DOUBLE,
  "livingapartments_avg" DOUBLE,
  "livingarea_avg" DOUBLE,
  "nonlivingapartments_avg" DOUBLE,
  "nonlivingarea_avg" DOUBLE,
  "apartments_mode" DOUBLE,
  "basementarea_mode" DOUBLE,
  "years_beginexpluatation_mode" DOUBLE,
  "years_build_mode" DOUBLE,
  "commonarea_mode" DOUBLE,
  "elevators_mode" DOUBLE,
  "entrances_mode" DOUBLE,
  "floorsmax_mode" DOUBLE,
  "floorsmin_mode" DOUBLE,
  "landarea_mode" DOUBLE,
  "livingapartments_mode" DOUBLE,
  "livingarea_mode" DOUBLE,
  "nonlivingapartments_mode" DOUBLE,
  "nonlivingarea_mode" DOUBLE,
  "apartments_medi" DOUBLE,
  "basementarea_medi" DOUBLE,
  "years_beginexpluatation_medi" DOUBLE,
  "years_build_medi" DOUBLE,
  "commonarea_medi" DOUBLE,
  "elevators_medi" DOUBLE,
  "entrances_medi" DOUBLE,
  "floorsmax_medi" DOUBLE,
  "floorsmin_medi" DOUBLE,
  "landarea_medi" DOUBLE,
  "livingapartments_medi" DOUBLE,
  "livingarea_medi" DOUBLE,
  "nonlivingapartments_medi" DOUBLE,
  "nonlivingarea_medi" DOUBLE,
  "fondkapremont_mode" VARCHAR,
  "housetype_mode" VARCHAR,
  "totalarea_mode" DOUBLE,
  "wallsmaterial_mode" VARCHAR,
  "emergencystate_mode" VARCHAR,
  "obs_30_cnt_social_circle" DOUBLE,
  "def_30_cnt_social_circle" DOUBLE,
  "obs_60_cnt_social_circle" DOUBLE,
  "def_60_cnt_social_circle" DOUBLE,
  "days_last_phone_change" DOUBLE,
  "flag_document_2" BIGINT,
  "flag_document_3" BIGINT,
  "flag_document_4" BIGINT,
  "flag_document_5" BIGINT,
  "flag_document_6" BIGINT
);

Train Le Scaled No Null

@kaggle.defcodeking_home_loan_prediction_prepared_datasets.train_le_scaled_no_null
  • 18.49 MB
  • 182681 rows
  • 129 columns

CREATE TABLE train_le_scaled_no_null (
  "sk_id_curr" BIGINT,
  "name_contract_type" DOUBLE,
  "code_gender" DOUBLE,
  "flag_own_car" DOUBLE,
  "flag_own_realty" DOUBLE,
  "cnt_children" DOUBLE,
  "amt_income_total" DOUBLE,
  "amt_credit" DOUBLE,
  "amt_annuity" DOUBLE,
  "amt_goods_price" DOUBLE,
  "name_type_suite" DOUBLE,
  "name_income_type" DOUBLE,
  "name_education_type" DOUBLE,
  "name_family_status" DOUBLE,
  "name_housing_type" DOUBLE,
  "region_population_relative" DOUBLE,
  "days_birth" DOUBLE,
  "days_employed" DOUBLE,
  "days_registration" DOUBLE,
  "days_id_publish" DOUBLE,
  "own_car_age" DOUBLE,
  "flag_mobil" DOUBLE,
  "flag_emp_phone" DOUBLE,
  "flag_work_phone" DOUBLE,
  "flag_cont_mobile" DOUBLE,
  "flag_phone" DOUBLE,
  "flag_email" DOUBLE,
  "occupation_type" DOUBLE,
  "cnt_fam_members" DOUBLE,
  "region_rating_client" DOUBLE,
  "region_rating_client_w_city" DOUBLE,
  "weekday_appr_process_start" DOUBLE,
  "hour_appr_process_start" DOUBLE,
  "reg_region_not_live_region" DOUBLE,
  "reg_region_not_work_region" DOUBLE,
  "live_region_not_work_region" DOUBLE,
  "reg_city_not_live_city" DOUBLE,
  "reg_city_not_work_city" DOUBLE,
  "live_city_not_work_city" DOUBLE,
  "organization_type" DOUBLE,
  "ext_source_1" DOUBLE,
  "ext_source_2" DOUBLE,
  "ext_source_3" DOUBLE,
  "apartments_avg" DOUBLE,
  "basementarea_avg" DOUBLE,
  "years_beginexpluatation_avg" DOUBLE,
  "years_build_avg" DOUBLE,
  "commonarea_avg" DOUBLE,
  "elevators_avg" DOUBLE,
  "entrances_avg" DOUBLE,
  "floorsmax_avg" DOUBLE,
  "floorsmin_avg" DOUBLE,
  "landarea_avg" DOUBLE,
  "livingapartments_avg" DOUBLE,
  "livingarea_avg" DOUBLE,
  "nonlivingapartments_avg" DOUBLE,
  "nonlivingarea_avg" DOUBLE,
  "apartments_mode" DOUBLE,
  "basementarea_mode" DOUBLE,
  "years_beginexpluatation_mode" DOUBLE,
  "years_build_mode" DOUBLE,
  "commonarea_mode" DOUBLE,
  "elevators_mode" DOUBLE,
  "entrances_mode" DOUBLE,
  "floorsmax_mode" DOUBLE,
  "floorsmin_mode" DOUBLE,
  "landarea_mode" DOUBLE,
  "livingapartments_mode" DOUBLE,
  "livingarea_mode" DOUBLE,
  "nonlivingapartments_mode" DOUBLE,
  "nonlivingarea_mode" DOUBLE,
  "apartments_medi" DOUBLE,
  "basementarea_medi" DOUBLE,
  "years_beginexpluatation_medi" DOUBLE,
  "years_build_medi" DOUBLE,
  "commonarea_medi" DOUBLE,
  "elevators_medi" DOUBLE,
  "entrances_medi" DOUBLE,
  "floorsmax_medi" DOUBLE,
  "floorsmin_medi" DOUBLE,
  "landarea_medi" DOUBLE,
  "livingapartments_medi" DOUBLE,
  "livingarea_medi" DOUBLE,
  "nonlivingapartments_medi" DOUBLE,
  "nonlivingarea_medi" DOUBLE,
  "fondkapremont_mode" DOUBLE,
  "housetype_mode" DOUBLE,
  "totalarea_mode" DOUBLE,
  "wallsmaterial_mode" DOUBLE,
  "emergencystate_mode" DOUBLE,
  "obs_30_cnt_social_circle" DOUBLE,
  "def_30_cnt_social_circle" DOUBLE,
  "obs_60_cnt_social_circle" DOUBLE,
  "def_60_cnt_social_circle" DOUBLE,
  "days_last_phone_change" DOUBLE,
  "flag_document_2" DOUBLE,
  "flag_document_3" DOUBLE,
  "flag_document_4" DOUBLE,
  "flag_document_5" DOUBLE,
  "flag_document_6" DOUBLE
);

Train Ohe Scaled No Null

@kaggle.defcodeking_home_loan_prediction_prepared_datasets.train_ohe_scaled_no_null
  • 21.8 MB
  • 184491 rows
  • 347 columns

CREATE TABLE train_ohe_scaled_no_null (
  "sk_id_curr" DOUBLE,
  "amt_income_total" DOUBLE,
  "amt_credit" DOUBLE,
  "amt_annuity" DOUBLE,
  "amt_goods_price" DOUBLE,
  "region_population_relative" DOUBLE,
  "days_birth" DOUBLE,
  "days_employed" DOUBLE,
  "days_registration" DOUBLE,
  "days_id_publish" DOUBLE,
  "own_car_age" DOUBLE,
  "hour_appr_process_start" DOUBLE,
  "ext_source_1" DOUBLE,
  "ext_source_2" DOUBLE,
  "ext_source_3" DOUBLE,
  "apartments_avg" DOUBLE,
  "basementarea_avg" DOUBLE,
  "years_beginexpluatation_avg" DOUBLE,
  "years_build_avg" DOUBLE,
  "commonarea_avg" DOUBLE,
  "elevators_avg" DOUBLE,
  "entrances_avg" DOUBLE,
  "floorsmax_avg" DOUBLE,
  "floorsmin_avg" DOUBLE,
  "landarea_avg" DOUBLE,
  "livingapartments_avg" DOUBLE,
  "livingarea_avg" DOUBLE,
  "nonlivingapartments_avg" DOUBLE,
  "nonlivingarea_avg" DOUBLE,
  "apartments_mode" DOUBLE,
  "basementarea_mode" DOUBLE,
  "years_beginexpluatation_mode" DOUBLE,
  "years_build_mode" DOUBLE,
  "commonarea_mode" DOUBLE,
  "elevators_mode" DOUBLE,
  "entrances_mode" DOUBLE,
  "floorsmax_mode" DOUBLE,
  "floorsmin_mode" DOUBLE,
  "landarea_mode" DOUBLE,
  "livingapartments_mode" DOUBLE,
  "livingarea_mode" DOUBLE,
  "nonlivingapartments_mode" DOUBLE,
  "nonlivingarea_mode" DOUBLE,
  "apartments_medi" DOUBLE,
  "basementarea_medi" DOUBLE,
  "years_beginexpluatation_medi" DOUBLE,
  "years_build_medi" DOUBLE,
  "commonarea_medi" DOUBLE,
  "elevators_medi" DOUBLE,
  "entrances_medi" DOUBLE,
  "floorsmax_medi" DOUBLE,
  "floorsmin_medi" DOUBLE,
  "landarea_medi" DOUBLE,
  "livingapartments_medi" DOUBLE,
  "livingarea_medi" DOUBLE,
  "nonlivingapartments_medi" DOUBLE,
  "nonlivingarea_medi" DOUBLE,
  "totalarea_mode" DOUBLE,
  "obs_30_cnt_social_circle" DOUBLE,
  "def_30_cnt_social_circle" DOUBLE,
  "obs_60_cnt_social_circle" DOUBLE,
  "def_60_cnt_social_circle" DOUBLE,
  "days_last_phone_change" DOUBLE,
  "target" DOUBLE,
  "loan_rate" DOUBLE,
  "loan_income_ratio" DOUBLE,
  "annuity_income_ratio" DOUBLE,
  "kfold" DOUBLE,
  "name_contract_type_revolving_loans" DOUBLE,
  "code_gender_m" DOUBLE,
  "name_type_suite_children" DOUBLE,
  "name_type_suite_family" DOUBLE,
  "name_type_suite_group_of_people" DOUBLE,
  "name_type_suite_other_a" DOUBLE,
  "name_type_suite_other_b" DOUBLE,
  "name_type_suite_spouse_partner" DOUBLE,
  "name_type_suite_unaccompanied" DOUBLE,
  "name_income_type_businessman" DOUBLE,
  "name_income_type_commercial_associate" DOUBLE,
  "name_income_type_maternity_leave" DOUBLE,
  "name_income_type_pensioner" DOUBLE,
  "name_income_type_state_servant" DOUBLE,
  "name_income_type_student" DOUBLE,
  "name_income_type_unemployed" DOUBLE,
  "name_income_type_working" DOUBLE,
  "name_education_type_academic_degree" DOUBLE,
  "name_education_type_higher_education" DOUBLE,
  "name_education_type_incomplete_higher" DOUBLE,
  "name_education_type_lower_secondary" DOUBLE,
  "name_education_type_secondary_secondary_special" DOUBLE,
  "name_family_status_civil_marriage" DOUBLE,
  "name_family_status_married" DOUBLE,
  "name_family_status_separated" DOUBLE,
  "name_family_status_single_not_married" DOUBLE,
  "name_family_status_widow" DOUBLE,
  "name_housing_type_co_op_apartment" DOUBLE,
  "name_housing_type_house_apartment" DOUBLE,
  "name_housing_type_municipal_apartment" DOUBLE,
  "name_housing_type_office_apartment" DOUBLE,
  "name_housing_type_rented_apartment" DOUBLE
);

Train Scaled No Nulls

@kaggle.defcodeking_home_loan_prediction_prepared_datasets.train_scaled_no_nulls
  • 19.45 MB
  • 182681 rows
  • 129 columns

CREATE TABLE train_scaled_no_nulls (
  "sk_id_curr" BIGINT,
  "name_contract_type" VARCHAR,
  "code_gender" VARCHAR,
  "flag_own_car" VARCHAR,
  "flag_own_realty" VARCHAR,
  "cnt_children" BIGINT,
  "amt_income_total" DOUBLE,
  "amt_credit" DOUBLE,
  "amt_annuity" DOUBLE,
  "amt_goods_price" DOUBLE,
  "name_type_suite" VARCHAR,
  "name_income_type" VARCHAR,
  "name_education_type" VARCHAR,
  "name_family_status" VARCHAR,
  "name_housing_type" VARCHAR,
  "region_population_relative" DOUBLE,
  "days_birth" DOUBLE,
  "days_employed" DOUBLE,
  "days_registration" DOUBLE,
  "days_id_publish" DOUBLE,
  "own_car_age" DOUBLE,
  "flag_mobil" BIGINT,
  "flag_emp_phone" BIGINT,
  "flag_work_phone" BIGINT,
  "flag_cont_mobile" BIGINT,
  "flag_phone" BIGINT,
  "flag_email" BIGINT,
  "occupation_type" VARCHAR,
  "cnt_fam_members" DOUBLE,
  "region_rating_client" BIGINT,
  "region_rating_client_w_city" BIGINT,
  "weekday_appr_process_start" VARCHAR,
  "hour_appr_process_start" DOUBLE,
  "reg_region_not_live_region" BIGINT,
  "reg_region_not_work_region" BIGINT,
  "live_region_not_work_region" BIGINT,
  "reg_city_not_live_city" BIGINT,
  "reg_city_not_work_city" BIGINT,
  "live_city_not_work_city" BIGINT,
  "organization_type" VARCHAR,
  "ext_source_1" DOUBLE,
  "ext_source_2" DOUBLE,
  "ext_source_3" DOUBLE,
  "apartments_avg" DOUBLE,
  "basementarea_avg" DOUBLE,
  "years_beginexpluatation_avg" DOUBLE,
  "years_build_avg" DOUBLE,
  "commonarea_avg" DOUBLE,
  "elevators_avg" DOUBLE,
  "entrances_avg" DOUBLE,
  "floorsmax_avg" DOUBLE,
  "floorsmin_avg" DOUBLE,
  "landarea_avg" DOUBLE,
  "livingapartments_avg" DOUBLE,
  "livingarea_avg" DOUBLE,
  "nonlivingapartments_avg" DOUBLE,
  "nonlivingarea_avg" DOUBLE,
  "apartments_mode" DOUBLE,
  "basementarea_mode" DOUBLE,
  "years_beginexpluatation_mode" DOUBLE,
  "years_build_mode" DOUBLE,
  "commonarea_mode" DOUBLE,
  "elevators_mode" DOUBLE,
  "entrances_mode" DOUBLE,
  "floorsmax_mode" DOUBLE,
  "floorsmin_mode" DOUBLE,
  "landarea_mode" DOUBLE,
  "livingapartments_mode" DOUBLE,
  "livingarea_mode" DOUBLE,
  "nonlivingapartments_mode" DOUBLE,
  "nonlivingarea_mode" DOUBLE,
  "apartments_medi" DOUBLE,
  "basementarea_medi" DOUBLE,
  "years_beginexpluatation_medi" DOUBLE,
  "years_build_medi" DOUBLE,
  "commonarea_medi" DOUBLE,
  "elevators_medi" DOUBLE,
  "entrances_medi" DOUBLE,
  "floorsmax_medi" DOUBLE,
  "floorsmin_medi" DOUBLE,
  "landarea_medi" DOUBLE,
  "livingapartments_medi" DOUBLE,
  "livingarea_medi" DOUBLE,
  "nonlivingapartments_medi" DOUBLE,
  "nonlivingarea_medi" DOUBLE,
  "fondkapremont_mode" VARCHAR,
  "housetype_mode" VARCHAR,
  "totalarea_mode" DOUBLE,
  "wallsmaterial_mode" VARCHAR,
  "emergencystate_mode" VARCHAR,
  "obs_30_cnt_social_circle" DOUBLE,
  "def_30_cnt_social_circle" DOUBLE,
  "obs_60_cnt_social_circle" DOUBLE,
  "def_60_cnt_social_circle" DOUBLE,
  "days_last_phone_change" DOUBLE,
  "flag_document_2" BIGINT,
  "flag_document_3" BIGINT,
  "flag_document_4" BIGINT,
  "flag_document_5" BIGINT,
  "flag_document_6" BIGINT
);

Train With New Features

@kaggle.defcodeking_home_loan_prediction_prepared_datasets.train_with_new_features
  • 17.19 MB
  • 182681 rows
  • 129 columns

CREATE TABLE train_with_new_features (
  "sk_id_curr" BIGINT,
  "name_contract_type" VARCHAR,
  "code_gender" VARCHAR,
  "flag_own_car" VARCHAR,
  "flag_own_realty" VARCHAR,
  "cnt_children" BIGINT,
  "amt_income_total" DOUBLE,
  "amt_credit" DOUBLE,
  "amt_annuity" DOUBLE,
  "amt_goods_price" DOUBLE,
  "name_type_suite" VARCHAR,
  "name_income_type" VARCHAR,
  "name_education_type" VARCHAR,
  "name_family_status" VARCHAR,
  "name_housing_type" VARCHAR,
  "region_population_relative" DOUBLE,
  "days_birth" BIGINT,
  "days_employed" DOUBLE,
  "days_registration" DOUBLE,
  "days_id_publish" BIGINT,
  "own_car_age" DOUBLE,
  "flag_mobil" BIGINT,
  "flag_emp_phone" BIGINT,
  "flag_work_phone" BIGINT,
  "flag_cont_mobile" BIGINT,
  "flag_phone" BIGINT,
  "flag_email" BIGINT,
  "occupation_type" VARCHAR,
  "cnt_fam_members" DOUBLE,
  "region_rating_client" BIGINT,
  "region_rating_client_w_city" BIGINT,
  "weekday_appr_process_start" VARCHAR,
  "hour_appr_process_start" BIGINT,
  "reg_region_not_live_region" BIGINT,
  "reg_region_not_work_region" BIGINT,
  "live_region_not_work_region" BIGINT,
  "reg_city_not_live_city" BIGINT,
  "reg_city_not_work_city" BIGINT,
  "live_city_not_work_city" BIGINT,
  "organization_type" VARCHAR,
  "ext_source_1" DOUBLE,
  "ext_source_2" DOUBLE,
  "ext_source_3" DOUBLE,
  "apartments_avg" DOUBLE,
  "basementarea_avg" DOUBLE,
  "years_beginexpluatation_avg" DOUBLE,
  "years_build_avg" DOUBLE,
  "commonarea_avg" DOUBLE,
  "elevators_avg" DOUBLE,
  "entrances_avg" DOUBLE,
  "floorsmax_avg" DOUBLE,
  "floorsmin_avg" DOUBLE,
  "landarea_avg" DOUBLE,
  "livingapartments_avg" DOUBLE,
  "livingarea_avg" DOUBLE,
  "nonlivingapartments_avg" DOUBLE,
  "nonlivingarea_avg" DOUBLE,
  "apartments_mode" DOUBLE,
  "basementarea_mode" DOUBLE,
  "years_beginexpluatation_mode" DOUBLE,
  "years_build_mode" DOUBLE,
  "commonarea_mode" DOUBLE,
  "elevators_mode" DOUBLE,
  "entrances_mode" DOUBLE,
  "floorsmax_mode" DOUBLE,
  "floorsmin_mode" DOUBLE,
  "landarea_mode" DOUBLE,
  "livingapartments_mode" DOUBLE,
  "livingarea_mode" DOUBLE,
  "nonlivingapartments_mode" DOUBLE,
  "nonlivingarea_mode" DOUBLE,
  "apartments_medi" DOUBLE,
  "basementarea_medi" DOUBLE,
  "years_beginexpluatation_medi" DOUBLE,
  "years_build_medi" DOUBLE,
  "commonarea_medi" DOUBLE,
  "elevators_medi" DOUBLE,
  "entrances_medi" DOUBLE,
  "floorsmax_medi" DOUBLE,
  "floorsmin_medi" DOUBLE,
  "landarea_medi" DOUBLE,
  "livingapartments_medi" DOUBLE,
  "livingarea_medi" DOUBLE,
  "nonlivingapartments_medi" DOUBLE,
  "nonlivingarea_medi" DOUBLE,
  "fondkapremont_mode" VARCHAR,
  "housetype_mode" VARCHAR,
  "totalarea_mode" DOUBLE,
  "wallsmaterial_mode" VARCHAR,
  "emergencystate_mode" VARCHAR,
  "obs_30_cnt_social_circle" DOUBLE,
  "def_30_cnt_social_circle" DOUBLE,
  "obs_60_cnt_social_circle" DOUBLE,
  "def_60_cnt_social_circle" DOUBLE,
  "days_last_phone_change" DOUBLE,
  "flag_document_2" BIGINT,
  "flag_document_3" BIGINT,
  "flag_document_4" BIGINT,
  "flag_document_5" BIGINT,
  "flag_document_6" BIGINT
);
