Home Loan Prediction Prepared Datasets
Prepared datasets for home loan default risk prediction
@kaggle.defcodeking_home_loan_prediction_prepared_datasets
This is a prepared dataset for the AI511-HOMELOAN-2022 competition. There are 4 pairs of files; each pair contains a training set and a test set.
train_with_new_features.csv and test_with_new_features.csv - These files have all the new features that have been engineered. There is no further preprocessing. This is useful if you want to run your own experiments with missing value imputation and scaling.
train_scaled_no_nulls.csv and test_scaled_no_nulls.csv - These files have the new features, all the missing values have been imputed using median for numerical features and mode for categorical features, and the numerical columns have been scaled using min-max scaling. This is useful if you want to experiment with ways to encode categorical features.
train_le_scaled_no_nulls.csv and test_le_scaled_no_nulls.csv - In addition to all of the above, the categorical features have been label encoded. Note that there are some features where categories in the training set are not present in the test set and vice-versa. All such categories have been encoded with -1 in the test set.
train_ohe_scaled_no_nulls.csv and test_ohe_scaled_no_nulls.csv - Same as above except categorical features are one-hot encoded.

The new features that have been engineered are as follows:
loan_rate - This is the ratio between amt_credit and amt_annuity. Thus, it is a direct measure of the duration of the loan in months.
loan_income_ratio - This is the ratio between amt_credit and amt_income_total. Thus, it acts as an indirect measure of the liability that the applicant is taking on. A higher value indicates that the loan amount is much bigger than the applicant's income, suggesting that they have taken on a bigger liability than someone with a smaller ratio.
annuity_income_ratio - This is the ratio between amt_annuity and amt_income_total. It is another measure of liability, but at a finer timescale (per month rather than over the whole loan), since amt_annuity is the amount to be paid each month against the loan.
application_is_incomplete - Denotes whether the applicant submitted an incomplete application or not.
es1_is_missing - True if ext_source_1 is missing for the applicant.
es3_is_missing - True if ext_source_3 is missing for the applicant.

Additionally, the train set in the last 2 pairs of files has a kfold column, which has been generated using Stratified K-Fold (due to target imbalance) and can be used in a CV loop.
CREATE TABLE numerical_features (
"col" VARCHAR,
"mn" VARCHAR
);

CREATE TABLE test_le_scaled_no_null (
"sk_id_curr" BIGINT,
"name_contract_type" DOUBLE,
"code_gender" DOUBLE,
"flag_own_car" DOUBLE,
"flag_own_realty" DOUBLE,
"cnt_children" DOUBLE,
"amt_income_total" DOUBLE,
"amt_credit" DOUBLE,
"amt_annuity" DOUBLE,
"amt_goods_price" DOUBLE,
"name_type_suite" DOUBLE,
"name_income_type" DOUBLE,
"name_education_type" DOUBLE,
"name_family_status" DOUBLE,
"name_housing_type" DOUBLE,
"region_population_relative" DOUBLE,
"days_birth" DOUBLE,
"days_employed" DOUBLE,
"days_registration" DOUBLE,
"days_id_publish" DOUBLE,
"own_car_age" DOUBLE,
"flag_mobil" DOUBLE,
"flag_emp_phone" DOUBLE,
"flag_work_phone" DOUBLE,
"flag_cont_mobile" DOUBLE,
"flag_phone" DOUBLE,
"flag_email" DOUBLE,
"occupation_type" DOUBLE,
"cnt_fam_members" DOUBLE,
"region_rating_client" DOUBLE,
"region_rating_client_w_city" DOUBLE,
"weekday_appr_process_start" DOUBLE,
"hour_appr_process_start" DOUBLE,
"reg_region_not_live_region" DOUBLE,
"reg_region_not_work_region" DOUBLE,
"live_region_not_work_region" DOUBLE,
"reg_city_not_live_city" DOUBLE,
"reg_city_not_work_city" DOUBLE,
"live_city_not_work_city" DOUBLE,
"organization_type" DOUBLE,
"ext_source_1" DOUBLE,
"ext_source_2" DOUBLE,
"ext_source_3" DOUBLE,
"apartments_avg" DOUBLE,
"basementarea_avg" DOUBLE,
"years_beginexpluatation_avg" DOUBLE,
"years_build_avg" DOUBLE,
"commonarea_avg" DOUBLE,
"elevators_avg" DOUBLE,
"entrances_avg" DOUBLE,
"floorsmax_avg" DOUBLE,
"floorsmin_avg" DOUBLE,
"landarea_avg" DOUBLE,
"livingapartments_avg" DOUBLE,
"livingarea_avg" DOUBLE,
"nonlivingapartments_avg" DOUBLE,
"nonlivingarea_avg" DOUBLE,
"apartments_mode" DOUBLE,
"basementarea_mode" DOUBLE,
"years_beginexpluatation_mode" DOUBLE,
"years_build_mode" DOUBLE,
"commonarea_mode" DOUBLE,
"elevators_mode" DOUBLE,
"entrances_mode" DOUBLE,
"floorsmax_mode" DOUBLE,
"floorsmin_mode" DOUBLE,
"landarea_mode" DOUBLE,
"livingapartments_mode" DOUBLE,
"livingarea_mode" DOUBLE,
"nonlivingapartments_mode" DOUBLE,
"nonlivingarea_mode" DOUBLE,
"apartments_medi" DOUBLE,
"basementarea_medi" DOUBLE,
"years_beginexpluatation_medi" DOUBLE,
"years_build_medi" DOUBLE,
"commonarea_medi" DOUBLE,
"elevators_medi" DOUBLE,
"entrances_medi" DOUBLE,
"floorsmax_medi" DOUBLE,
"floorsmin_medi" DOUBLE,
"landarea_medi" DOUBLE,
"livingapartments_medi" DOUBLE,
"livingarea_medi" DOUBLE,
"nonlivingapartments_medi" DOUBLE,
"nonlivingarea_medi" DOUBLE,
"fondkapremont_mode" DOUBLE,
"housetype_mode" DOUBLE,
"totalarea_mode" DOUBLE,
"wallsmaterial_mode" DOUBLE,
"emergencystate_mode" DOUBLE,
"obs_30_cnt_social_circle" DOUBLE,
"def_30_cnt_social_circle" DOUBLE,
"obs_60_cnt_social_circle" DOUBLE,
"def_60_cnt_social_circle" DOUBLE,
"days_last_phone_change" DOUBLE,
"flag_document_2" DOUBLE,
"flag_document_3" DOUBLE,
"flag_document_4" DOUBLE,
"flag_document_5" DOUBLE,
"flag_document_6" DOUBLE
);

CREATE TABLE test_ohe_scaled_no_null (
"sk_id_curr" BIGINT,
"amt_income_total" DOUBLE,
"amt_credit" DOUBLE,
"amt_annuity" DOUBLE,
"amt_goods_price" DOUBLE,
"region_population_relative" DOUBLE,
"days_birth" DOUBLE,
"days_employed" DOUBLE,
"days_registration" DOUBLE,
"days_id_publish" DOUBLE,
"own_car_age" DOUBLE,
"hour_appr_process_start" DOUBLE,
"ext_source_1" DOUBLE,
"ext_source_2" DOUBLE,
"ext_source_3" DOUBLE,
"apartments_avg" DOUBLE,
"basementarea_avg" DOUBLE,
"years_beginexpluatation_avg" DOUBLE,
"years_build_avg" DOUBLE,
"commonarea_avg" DOUBLE,
"elevators_avg" DOUBLE,
"entrances_avg" DOUBLE,
"floorsmax_avg" DOUBLE,
"floorsmin_avg" DOUBLE,
"landarea_avg" DOUBLE,
"livingapartments_avg" DOUBLE,
"livingarea_avg" DOUBLE,
"nonlivingapartments_avg" DOUBLE,
"nonlivingarea_avg" DOUBLE,
"apartments_mode" DOUBLE,
"basementarea_mode" DOUBLE,
"years_beginexpluatation_mode" DOUBLE,
"years_build_mode" DOUBLE,
"commonarea_mode" DOUBLE,
"elevators_mode" DOUBLE,
"entrances_mode" DOUBLE,
"floorsmax_mode" DOUBLE,
"floorsmin_mode" DOUBLE,
"landarea_mode" DOUBLE,
"livingapartments_mode" DOUBLE,
"livingarea_mode" DOUBLE,
"nonlivingapartments_mode" DOUBLE,
"nonlivingarea_mode" DOUBLE,
"apartments_medi" DOUBLE,
"basementarea_medi" DOUBLE,
"years_beginexpluatation_medi" DOUBLE,
"years_build_medi" DOUBLE,
"commonarea_medi" DOUBLE,
"elevators_medi" DOUBLE,
"entrances_medi" DOUBLE,
"floorsmax_medi" DOUBLE,
"floorsmin_medi" DOUBLE,
"landarea_medi" DOUBLE,
"livingapartments_medi" DOUBLE,
"livingarea_medi" DOUBLE,
"nonlivingapartments_medi" DOUBLE,
"nonlivingarea_medi" DOUBLE,
"totalarea_mode" DOUBLE,
"obs_30_cnt_social_circle" DOUBLE,
"def_30_cnt_social_circle" DOUBLE,
"obs_60_cnt_social_circle" DOUBLE,
"def_60_cnt_social_circle" DOUBLE,
"days_last_phone_change" DOUBLE,
"loan_rate" DOUBLE,
"loan_income_ratio" DOUBLE,
"annuity_income_ratio" DOUBLE,
"name_contract_type_revolving_loans" DOUBLE,
"code_gender_m" DOUBLE,
"name_type_suite_children" DOUBLE,
"name_type_suite_family" DOUBLE,
"name_type_suite_group_of_people" DOUBLE,
"name_type_suite_other_a" DOUBLE,
"name_type_suite_other_b" DOUBLE,
"name_type_suite_spouse_partner" DOUBLE, -- Name Type Suite Spouse, Partner
"name_type_suite_unaccompanied" DOUBLE,
"name_income_type_businessman" DOUBLE,
"name_income_type_commercial_associate" DOUBLE,
"name_income_type_maternity_leave" DOUBLE,
"name_income_type_pensioner" DOUBLE,
"name_income_type_state_servant" DOUBLE,
"name_income_type_student" DOUBLE,
"name_income_type_unemployed" DOUBLE,
"name_income_type_working" DOUBLE,
"name_education_type_academic_degree" DOUBLE,
"name_education_type_higher_education" DOUBLE,
"name_education_type_incomplete_higher" DOUBLE,
"name_education_type_lower_secondary" DOUBLE,
"name_education_type_secondary_secondary_special" DOUBLE, -- Name Education Type Secondary / Secondary Special
"name_family_status_civil_marriage" DOUBLE,
"name_family_status_married" DOUBLE,
"name_family_status_separated" DOUBLE,
"name_family_status_single_not_married" DOUBLE, -- Name Family Status Single / Not Married
"name_family_status_widow" DOUBLE,
"name_housing_type_co_op_apartment" DOUBLE,
"name_housing_type_house_apartment" DOUBLE, -- Name Housing Type House / Apartment
"name_housing_type_municipal_apartment" DOUBLE,
"name_housing_type_office_apartment" DOUBLE,
"name_housing_type_rented_apartment" DOUBLE,
"name_housing_type_with_parents" DOUBLE,
"occupation_type_accountants" DOUBLE
);

CREATE TABLE test_scaled_no_nulls (
"sk_id_curr" BIGINT,
"name_contract_type" VARCHAR,
"code_gender" VARCHAR,
"flag_own_car" VARCHAR,
"flag_own_realty" VARCHAR,
"cnt_children" BIGINT,
"amt_income_total" DOUBLE,
"amt_credit" DOUBLE,
"amt_annuity" DOUBLE,
"amt_goods_price" DOUBLE,
"name_type_suite" VARCHAR,
"name_income_type" VARCHAR,
"name_education_type" VARCHAR,
"name_family_status" VARCHAR,
"name_housing_type" VARCHAR,
"region_population_relative" DOUBLE,
"days_birth" DOUBLE,
"days_employed" DOUBLE,
"days_registration" DOUBLE,
"days_id_publish" DOUBLE,
"own_car_age" DOUBLE,
"flag_mobil" BIGINT,
"flag_emp_phone" BIGINT,
"flag_work_phone" BIGINT,
"flag_cont_mobile" BIGINT,
"flag_phone" BIGINT,
"flag_email" BIGINT,
"occupation_type" VARCHAR,
"cnt_fam_members" DOUBLE,
"region_rating_client" BIGINT,
"region_rating_client_w_city" BIGINT,
"weekday_appr_process_start" VARCHAR,
"hour_appr_process_start" DOUBLE,
"reg_region_not_live_region" BIGINT,
"reg_region_not_work_region" BIGINT,
"live_region_not_work_region" BIGINT,
"reg_city_not_live_city" BIGINT,
"reg_city_not_work_city" BIGINT,
"live_city_not_work_city" BIGINT,
"organization_type" VARCHAR,
"ext_source_1" DOUBLE,
"ext_source_2" DOUBLE,
"ext_source_3" DOUBLE,
"apartments_avg" DOUBLE,
"basementarea_avg" DOUBLE,
"years_beginexpluatation_avg" DOUBLE,
"years_build_avg" DOUBLE,
"commonarea_avg" DOUBLE,
"elevators_avg" DOUBLE,
"entrances_avg" DOUBLE,
"floorsmax_avg" DOUBLE,
"floorsmin_avg" DOUBLE,
"landarea_avg" DOUBLE,
"livingapartments_avg" DOUBLE,
"livingarea_avg" DOUBLE,
"nonlivingapartments_avg" DOUBLE,
"nonlivingarea_avg" DOUBLE,
"apartments_mode" DOUBLE,
"basementarea_mode" DOUBLE,
"years_beginexpluatation_mode" DOUBLE,
"years_build_mode" DOUBLE,
"commonarea_mode" DOUBLE,
"elevators_mode" DOUBLE,
"entrances_mode" DOUBLE,
"floorsmax_mode" DOUBLE,
"floorsmin_mode" DOUBLE,
"landarea_mode" DOUBLE,
"livingapartments_mode" DOUBLE,
"livingarea_mode" DOUBLE,
"nonlivingapartments_mode" DOUBLE,
"nonlivingarea_mode" DOUBLE,
"apartments_medi" DOUBLE,
"basementarea_medi" DOUBLE,
"years_beginexpluatation_medi" DOUBLE,
"years_build_medi" DOUBLE,
"commonarea_medi" DOUBLE,
"elevators_medi" DOUBLE,
"entrances_medi" DOUBLE,
"floorsmax_medi" DOUBLE,
"floorsmin_medi" DOUBLE,
"landarea_medi" DOUBLE,
"livingapartments_medi" DOUBLE,
"livingarea_medi" DOUBLE,
"nonlivingapartments_medi" DOUBLE,
"nonlivingarea_medi" DOUBLE,
"fondkapremont_mode" VARCHAR,
"housetype_mode" VARCHAR,
"totalarea_mode" DOUBLE,
"wallsmaterial_mode" VARCHAR,
"emergencystate_mode" VARCHAR,
"obs_30_cnt_social_circle" DOUBLE,
"def_30_cnt_social_circle" DOUBLE,
"obs_60_cnt_social_circle" DOUBLE,
"def_60_cnt_social_circle" DOUBLE,
"days_last_phone_change" DOUBLE,
"flag_document_2" BIGINT,
"flag_document_3" BIGINT,
"flag_document_4" BIGINT,
"flag_document_5" BIGINT,
"flag_document_6" BIGINT
);

CREATE TABLE test_with_new_features (
"sk_id_curr" BIGINT,
"name_contract_type" VARCHAR,
"code_gender" VARCHAR,
"flag_own_car" VARCHAR,
"flag_own_realty" VARCHAR,
"cnt_children" BIGINT,
"amt_income_total" DOUBLE,
"amt_credit" DOUBLE,
"amt_annuity" DOUBLE,
"amt_goods_price" DOUBLE,
"name_type_suite" VARCHAR,
"name_income_type" VARCHAR,
"name_education_type" VARCHAR,
"name_family_status" VARCHAR,
"name_housing_type" VARCHAR,
"region_population_relative" DOUBLE,
"days_birth" BIGINT,
"days_employed" BIGINT,
"days_registration" DOUBLE,
"days_id_publish" BIGINT,
"own_car_age" DOUBLE,
"flag_mobil" BIGINT,
"flag_emp_phone" BIGINT,
"flag_work_phone" BIGINT,
"flag_cont_mobile" BIGINT,
"flag_phone" BIGINT,
"flag_email" BIGINT,
"occupation_type" VARCHAR,
"cnt_fam_members" DOUBLE,
"region_rating_client" BIGINT,
"region_rating_client_w_city" BIGINT,
"weekday_appr_process_start" VARCHAR,
"hour_appr_process_start" BIGINT,
"reg_region_not_live_region" BIGINT,
"reg_region_not_work_region" BIGINT,
"live_region_not_work_region" BIGINT,
"reg_city_not_live_city" BIGINT,
"reg_city_not_work_city" BIGINT,
"live_city_not_work_city" BIGINT,
"organization_type" VARCHAR,
"ext_source_1" DOUBLE,
"ext_source_2" DOUBLE,
"ext_source_3" DOUBLE,
"apartments_avg" DOUBLE,
"basementarea_avg" DOUBLE,
"years_beginexpluatation_avg" DOUBLE,
"years_build_avg" DOUBLE,
"commonarea_avg" DOUBLE,
"elevators_avg" DOUBLE,
"entrances_avg" DOUBLE,
"floorsmax_avg" DOUBLE,
"floorsmin_avg" DOUBLE,
"landarea_avg" DOUBLE,
"livingapartments_avg" DOUBLE,
"livingarea_avg" DOUBLE,
"nonlivingapartments_avg" DOUBLE,
"nonlivingarea_avg" DOUBLE,
"apartments_mode" DOUBLE,
"basementarea_mode" DOUBLE,
"years_beginexpluatation_mode" DOUBLE,
"years_build_mode" DOUBLE,
"commonarea_mode" DOUBLE,
"elevators_mode" DOUBLE,
"entrances_mode" DOUBLE,
"floorsmax_mode" DOUBLE,
"floorsmin_mode" DOUBLE,
"landarea_mode" DOUBLE,
"livingapartments_mode" DOUBLE,
"livingarea_mode" DOUBLE,
"nonlivingapartments_mode" DOUBLE,
"nonlivingarea_mode" DOUBLE,
"apartments_medi" DOUBLE,
"basementarea_medi" DOUBLE,
"years_beginexpluatation_medi" DOUBLE,
"years_build_medi" DOUBLE,
"commonarea_medi" DOUBLE,
"elevators_medi" DOUBLE,
"entrances_medi" DOUBLE,
"floorsmax_medi" DOUBLE,
"floorsmin_medi" DOUBLE,
"landarea_medi" DOUBLE,
"livingapartments_medi" DOUBLE,
"livingarea_medi" DOUBLE,
"nonlivingapartments_medi" DOUBLE,
"nonlivingarea_medi" DOUBLE,
"fondkapremont_mode" VARCHAR,
"housetype_mode" VARCHAR,
"totalarea_mode" DOUBLE,
"wallsmaterial_mode" VARCHAR,
"emergencystate_mode" VARCHAR,
"obs_30_cnt_social_circle" DOUBLE,
"def_30_cnt_social_circle" DOUBLE,
"obs_60_cnt_social_circle" DOUBLE,
"def_60_cnt_social_circle" DOUBLE,
"days_last_phone_change" DOUBLE,
"flag_document_2" BIGINT,
"flag_document_3" BIGINT,
"flag_document_4" BIGINT,
"flag_document_5" BIGINT,
"flag_document_6" BIGINT
);

CREATE TABLE train_le_scaled_no_null (
"sk_id_curr" BIGINT,
"name_contract_type" DOUBLE,
"code_gender" DOUBLE,
"flag_own_car" DOUBLE,
"flag_own_realty" DOUBLE,
"cnt_children" DOUBLE,
"amt_income_total" DOUBLE,
"amt_credit" DOUBLE,
"amt_annuity" DOUBLE,
"amt_goods_price" DOUBLE,
"name_type_suite" DOUBLE,
"name_income_type" DOUBLE,
"name_education_type" DOUBLE,
"name_family_status" DOUBLE,
"name_housing_type" DOUBLE,
"region_population_relative" DOUBLE,
"days_birth" DOUBLE,
"days_employed" DOUBLE,
"days_registration" DOUBLE,
"days_id_publish" DOUBLE,
"own_car_age" DOUBLE,
"flag_mobil" DOUBLE,
"flag_emp_phone" DOUBLE,
"flag_work_phone" DOUBLE,
"flag_cont_mobile" DOUBLE,
"flag_phone" DOUBLE,
"flag_email" DOUBLE,
"occupation_type" DOUBLE,
"cnt_fam_members" DOUBLE,
"region_rating_client" DOUBLE,
"region_rating_client_w_city" DOUBLE,
"weekday_appr_process_start" DOUBLE,
"hour_appr_process_start" DOUBLE,
"reg_region_not_live_region" DOUBLE,
"reg_region_not_work_region" DOUBLE,
"live_region_not_work_region" DOUBLE,
"reg_city_not_live_city" DOUBLE,
"reg_city_not_work_city" DOUBLE,
"live_city_not_work_city" DOUBLE,
"organization_type" DOUBLE,
"ext_source_1" DOUBLE,
"ext_source_2" DOUBLE,
"ext_source_3" DOUBLE,
"apartments_avg" DOUBLE,
"basementarea_avg" DOUBLE,
"years_beginexpluatation_avg" DOUBLE,
"years_build_avg" DOUBLE,
"commonarea_avg" DOUBLE,
"elevators_avg" DOUBLE,
"entrances_avg" DOUBLE,
"floorsmax_avg" DOUBLE,
"floorsmin_avg" DOUBLE,
"landarea_avg" DOUBLE,
"livingapartments_avg" DOUBLE,
"livingarea_avg" DOUBLE,
"nonlivingapartments_avg" DOUBLE,
"nonlivingarea_avg" DOUBLE,
"apartments_mode" DOUBLE,
"basementarea_mode" DOUBLE,
"years_beginexpluatation_mode" DOUBLE,
"years_build_mode" DOUBLE,
"commonarea_mode" DOUBLE,
"elevators_mode" DOUBLE,
"entrances_mode" DOUBLE,
"floorsmax_mode" DOUBLE,
"floorsmin_mode" DOUBLE,
"landarea_mode" DOUBLE,
"livingapartments_mode" DOUBLE,
"livingarea_mode" DOUBLE,
"nonlivingapartments_mode" DOUBLE,
"nonlivingarea_mode" DOUBLE,
"apartments_medi" DOUBLE,
"basementarea_medi" DOUBLE,
"years_beginexpluatation_medi" DOUBLE,
"years_build_medi" DOUBLE,
"commonarea_medi" DOUBLE,
"elevators_medi" DOUBLE,
"entrances_medi" DOUBLE,
"floorsmax_medi" DOUBLE,
"floorsmin_medi" DOUBLE,
"landarea_medi" DOUBLE,
"livingapartments_medi" DOUBLE,
"livingarea_medi" DOUBLE,
"nonlivingapartments_medi" DOUBLE,
"nonlivingarea_medi" DOUBLE,
"fondkapremont_mode" DOUBLE,
"housetype_mode" DOUBLE,
"totalarea_mode" DOUBLE,
"wallsmaterial_mode" DOUBLE,
"emergencystate_mode" DOUBLE,
"obs_30_cnt_social_circle" DOUBLE,
"def_30_cnt_social_circle" DOUBLE,
"obs_60_cnt_social_circle" DOUBLE,
"def_60_cnt_social_circle" DOUBLE,
"days_last_phone_change" DOUBLE,
"flag_document_2" DOUBLE,
"flag_document_3" DOUBLE,
"flag_document_4" DOUBLE,
"flag_document_5" DOUBLE,
"flag_document_6" DOUBLE
);

CREATE TABLE train_ohe_scaled_no_null (
"sk_id_curr" DOUBLE,
"amt_income_total" DOUBLE,
"amt_credit" DOUBLE,
"amt_annuity" DOUBLE,
"amt_goods_price" DOUBLE,
"region_population_relative" DOUBLE,
"days_birth" DOUBLE,
"days_employed" DOUBLE,
"days_registration" DOUBLE,
"days_id_publish" DOUBLE,
"own_car_age" DOUBLE,
"hour_appr_process_start" DOUBLE,
"ext_source_1" DOUBLE,
"ext_source_2" DOUBLE,
"ext_source_3" DOUBLE,
"apartments_avg" DOUBLE,
"basementarea_avg" DOUBLE,
"years_beginexpluatation_avg" DOUBLE,
"years_build_avg" DOUBLE,
"commonarea_avg" DOUBLE,
"elevators_avg" DOUBLE,
"entrances_avg" DOUBLE,
"floorsmax_avg" DOUBLE,
"floorsmin_avg" DOUBLE,
"landarea_avg" DOUBLE,
"livingapartments_avg" DOUBLE,
"livingarea_avg" DOUBLE,
"nonlivingapartments_avg" DOUBLE,
"nonlivingarea_avg" DOUBLE,
"apartments_mode" DOUBLE,
"basementarea_mode" DOUBLE,
"years_beginexpluatation_mode" DOUBLE,
"years_build_mode" DOUBLE,
"commonarea_mode" DOUBLE,
"elevators_mode" DOUBLE,
"entrances_mode" DOUBLE,
"floorsmax_mode" DOUBLE,
"floorsmin_mode" DOUBLE,
"landarea_mode" DOUBLE,
"livingapartments_mode" DOUBLE,
"livingarea_mode" DOUBLE,
"nonlivingapartments_mode" DOUBLE,
"nonlivingarea_mode" DOUBLE,
"apartments_medi" DOUBLE,
"basementarea_medi" DOUBLE,
"years_beginexpluatation_medi" DOUBLE,
"years_build_medi" DOUBLE,
"commonarea_medi" DOUBLE,
"elevators_medi" DOUBLE,
"entrances_medi" DOUBLE,
"floorsmax_medi" DOUBLE,
"floorsmin_medi" DOUBLE,
"landarea_medi" DOUBLE,
"livingapartments_medi" DOUBLE,
"livingarea_medi" DOUBLE,
"nonlivingapartments_medi" DOUBLE,
"nonlivingarea_medi" DOUBLE,
"totalarea_mode" DOUBLE,
"obs_30_cnt_social_circle" DOUBLE,
"def_30_cnt_social_circle" DOUBLE,
"obs_60_cnt_social_circle" DOUBLE,
"def_60_cnt_social_circle" DOUBLE,
"days_last_phone_change" DOUBLE,
"target" DOUBLE,
"loan_rate" DOUBLE,
"loan_income_ratio" DOUBLE,
"annuity_income_ratio" DOUBLE,
"kfold" DOUBLE,
"name_contract_type_revolving_loans" DOUBLE,
"code_gender_m" DOUBLE,
"name_type_suite_children" DOUBLE,
"name_type_suite_family" DOUBLE,
"name_type_suite_group_of_people" DOUBLE,
"name_type_suite_other_a" DOUBLE,
"name_type_suite_other_b" DOUBLE,
"name_type_suite_spouse_partner" DOUBLE, -- Name Type Suite Spouse, Partner
"name_type_suite_unaccompanied" DOUBLE,
"name_income_type_businessman" DOUBLE,
"name_income_type_commercial_associate" DOUBLE,
"name_income_type_maternity_leave" DOUBLE,
"name_income_type_pensioner" DOUBLE,
"name_income_type_state_servant" DOUBLE,
"name_income_type_student" DOUBLE,
"name_income_type_unemployed" DOUBLE,
"name_income_type_working" DOUBLE,
"name_education_type_academic_degree" DOUBLE,
"name_education_type_higher_education" DOUBLE,
"name_education_type_incomplete_higher" DOUBLE,
"name_education_type_lower_secondary" DOUBLE,
"name_education_type_secondary_secondary_special" DOUBLE, -- Name Education Type Secondary / Secondary Special
"name_family_status_civil_marriage" DOUBLE,
"name_family_status_married" DOUBLE,
"name_family_status_separated" DOUBLE,
"name_family_status_single_not_married" DOUBLE, -- Name Family Status Single / Not Married
"name_family_status_widow" DOUBLE,
"name_housing_type_co_op_apartment" DOUBLE,
"name_housing_type_house_apartment" DOUBLE, -- Name Housing Type House / Apartment
"name_housing_type_municipal_apartment" DOUBLE,
"name_housing_type_office_apartment" DOUBLE,
"name_housing_type_rented_apartment" DOUBLE
);

CREATE TABLE train_scaled_no_nulls (
"sk_id_curr" BIGINT,
"name_contract_type" VARCHAR,
"code_gender" VARCHAR,
"flag_own_car" VARCHAR,
"flag_own_realty" VARCHAR,
"cnt_children" BIGINT,
"amt_income_total" DOUBLE,
"amt_credit" DOUBLE,
"amt_annuity" DOUBLE,
"amt_goods_price" DOUBLE,
"name_type_suite" VARCHAR,
"name_income_type" VARCHAR,
"name_education_type" VARCHAR,
"name_family_status" VARCHAR,
"name_housing_type" VARCHAR,
"region_population_relative" DOUBLE,
"days_birth" DOUBLE,
"days_employed" DOUBLE,
"days_registration" DOUBLE,
"days_id_publish" DOUBLE,
"own_car_age" DOUBLE,
"flag_mobil" BIGINT,
"flag_emp_phone" BIGINT,
"flag_work_phone" BIGINT,
"flag_cont_mobile" BIGINT,
"flag_phone" BIGINT,
"flag_email" BIGINT,
"occupation_type" VARCHAR,
"cnt_fam_members" DOUBLE,
"region_rating_client" BIGINT,
"region_rating_client_w_city" BIGINT,
"weekday_appr_process_start" VARCHAR,
"hour_appr_process_start" DOUBLE,
"reg_region_not_live_region" BIGINT,
"reg_region_not_work_region" BIGINT,
"live_region_not_work_region" BIGINT,
"reg_city_not_live_city" BIGINT,
"reg_city_not_work_city" BIGINT,
"live_city_not_work_city" BIGINT,
"organization_type" VARCHAR,
"ext_source_1" DOUBLE,
"ext_source_2" DOUBLE,
"ext_source_3" DOUBLE,
"apartments_avg" DOUBLE,
"basementarea_avg" DOUBLE,
"years_beginexpluatation_avg" DOUBLE,
"years_build_avg" DOUBLE,
"commonarea_avg" DOUBLE,
"elevators_avg" DOUBLE,
"entrances_avg" DOUBLE,
"floorsmax_avg" DOUBLE,
"floorsmin_avg" DOUBLE,
"landarea_avg" DOUBLE,
"livingapartments_avg" DOUBLE,
"livingarea_avg" DOUBLE,
"nonlivingapartments_avg" DOUBLE,
"nonlivingarea_avg" DOUBLE,
"apartments_mode" DOUBLE,
"basementarea_mode" DOUBLE,
"years_beginexpluatation_mode" DOUBLE,
"years_build_mode" DOUBLE,
"commonarea_mode" DOUBLE,
"elevators_mode" DOUBLE,
"entrances_mode" DOUBLE,
"floorsmax_mode" DOUBLE,
"floorsmin_mode" DOUBLE,
"landarea_mode" DOUBLE,
"livingapartments_mode" DOUBLE,
"livingarea_mode" DOUBLE,
"nonlivingapartments_mode" DOUBLE,
"nonlivingarea_mode" DOUBLE,
"apartments_medi" DOUBLE,
"basementarea_medi" DOUBLE,
"years_beginexpluatation_medi" DOUBLE,
"years_build_medi" DOUBLE,
"commonarea_medi" DOUBLE,
"elevators_medi" DOUBLE,
"entrances_medi" DOUBLE,
"floorsmax_medi" DOUBLE,
"floorsmin_medi" DOUBLE,
"landarea_medi" DOUBLE,
"livingapartments_medi" DOUBLE,
"livingarea_medi" DOUBLE,
"nonlivingapartments_medi" DOUBLE,
"nonlivingarea_medi" DOUBLE,
"fondkapremont_mode" VARCHAR,
"housetype_mode" VARCHAR,
"totalarea_mode" DOUBLE,
"wallsmaterial_mode" VARCHAR,
"emergencystate_mode" VARCHAR,
"obs_30_cnt_social_circle" DOUBLE,
"def_30_cnt_social_circle" DOUBLE,
"obs_60_cnt_social_circle" DOUBLE,
"def_60_cnt_social_circle" DOUBLE,
"days_last_phone_change" DOUBLE,
"flag_document_2" BIGINT,
"flag_document_3" BIGINT,
"flag_document_4" BIGINT,
"flag_document_5" BIGINT,
"flag_document_6" BIGINT
);

CREATE TABLE train_with_new_features (
"sk_id_curr" BIGINT,
"name_contract_type" VARCHAR,
"code_gender" VARCHAR,
"flag_own_car" VARCHAR,
"flag_own_realty" VARCHAR,
"cnt_children" BIGINT,
"amt_income_total" DOUBLE,
"amt_credit" DOUBLE,
"amt_annuity" DOUBLE,
"amt_goods_price" DOUBLE,
"name_type_suite" VARCHAR,
"name_income_type" VARCHAR,
"name_education_type" VARCHAR,
"name_family_status" VARCHAR,
"name_housing_type" VARCHAR,
"region_population_relative" DOUBLE,
"days_birth" BIGINT,
"days_employed" DOUBLE,
"days_registration" DOUBLE,
"days_id_publish" BIGINT,
"own_car_age" DOUBLE,
"flag_mobil" BIGINT,
"flag_emp_phone" BIGINT,
"flag_work_phone" BIGINT,
"flag_cont_mobile" BIGINT,
"flag_phone" BIGINT,
"flag_email" BIGINT,
"occupation_type" VARCHAR,
"cnt_fam_members" DOUBLE,
"region_rating_client" BIGINT,
"region_rating_client_w_city" BIGINT,
"weekday_appr_process_start" VARCHAR,
"hour_appr_process_start" BIGINT,
"reg_region_not_live_region" BIGINT,
"reg_region_not_work_region" BIGINT,
"live_region_not_work_region" BIGINT,
"reg_city_not_live_city" BIGINT,
"reg_city_not_work_city" BIGINT,
"live_city_not_work_city" BIGINT,
"organization_type" VARCHAR,
"ext_source_1" DOUBLE,
"ext_source_2" DOUBLE,
"ext_source_3" DOUBLE,
"apartments_avg" DOUBLE,
"basementarea_avg" DOUBLE,
"years_beginexpluatation_avg" DOUBLE,
"years_build_avg" DOUBLE,
"commonarea_avg" DOUBLE,
"elevators_avg" DOUBLE,
"entrances_avg" DOUBLE,
"floorsmax_avg" DOUBLE,
"floorsmin_avg" DOUBLE,
"landarea_avg" DOUBLE,
"livingapartments_avg" DOUBLE,
"livingarea_avg" DOUBLE,
"nonlivingapartments_avg" DOUBLE,
"nonlivingarea_avg" DOUBLE,
"apartments_mode" DOUBLE,
"basementarea_mode" DOUBLE,
"years_beginexpluatation_mode" DOUBLE,
"years_build_mode" DOUBLE,
"commonarea_mode" DOUBLE,
"elevators_mode" DOUBLE,
"entrances_mode" DOUBLE,
"floorsmax_mode" DOUBLE,
"floorsmin_mode" DOUBLE,
"landarea_mode" DOUBLE,
"livingapartments_mode" DOUBLE,
"livingarea_mode" DOUBLE,
"nonlivingapartments_mode" DOUBLE,
"nonlivingarea_mode" DOUBLE,
"apartments_medi" DOUBLE,
"basementarea_medi" DOUBLE,
"years_beginexpluatation_medi" DOUBLE,
"years_build_medi" DOUBLE,
"commonarea_medi" DOUBLE,
"elevators_medi" DOUBLE,
"entrances_medi" DOUBLE,
"floorsmax_medi" DOUBLE,
"floorsmin_medi" DOUBLE,
"landarea_medi" DOUBLE,
"livingapartments_medi" DOUBLE,
"livingarea_medi" DOUBLE,
"nonlivingapartments_medi" DOUBLE,
"nonlivingarea_medi" DOUBLE,
"fondkapremont_mode" VARCHAR,
"housetype_mode" VARCHAR,
"totalarea_mode" DOUBLE,
"wallsmaterial_mode" VARCHAR,
"emergencystate_mode" VARCHAR,
"obs_30_cnt_social_circle" DOUBLE,
"def_30_cnt_social_circle" DOUBLE,
"obs_60_cnt_social_circle" DOUBLE,
"def_60_cnt_social_circle" DOUBLE,
"days_last_phone_change" DOUBLE,
"flag_document_2" BIGINT,
"flag_document_3" BIGINT,
"flag_document_4" BIGINT,
"flag_document_5" BIGINT,
"flag_document_6" BIGINT
);
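
As a usage note for the kfold column mentioned in the description, a CV loop only needs to hold out one fold value at a time. The sketch below is an illustration under stated assumptions: `iter_folds` is a hypothetical helper, the toy rows are made up, and the fold labels are written as floats because the train_ohe_scaled_no_null schema lists kfold as DOUBLE. In practice the rows would come from one of the training files that carries the column.

```python
def iter_folds(rows, fold_key="kfold"):
    """Yield (fold, train_rows, valid_rows) for each distinct fold label.

    The folds are taken from the precomputed column as-is; nothing is
    re-split here, so the stratification of the original assignment is kept.
    """
    folds = sorted({row[fold_key] for row in rows})
    for fold in folds:
        train_rows = [row for row in rows if row[fold_key] != fold]
        valid_rows = [row for row in rows if row[fold_key] == fold]
        yield fold, train_rows, valid_rows


# Made-up rows standing in for a training file with target and kfold columns.
toy_rows = [
    {"sk_id_curr": 1, "target": 0.0, "kfold": 0.0},
    {"sk_id_curr": 2, "target": 1.0, "kfold": 0.0},
    {"sk_id_curr": 3, "target": 0.0, "kfold": 1.0},
    {"sk_id_curr": 4, "target": 1.0, "kfold": 1.0},
    {"sk_id_curr": 5, "target": 0.0, "kfold": 2.0},
    {"sk_id_curr": 6, "target": 1.0, "kfold": 2.0},
]

for fold, train_rows, valid_rows in iter_folds(toy_rows):
    # Fit a model on train_rows and score it on valid_rows here.
    print(f"fold={fold}: train={len(train_rows)} valid={len(valid_rows)}")
```

Dropping the kfold column (and sk_id_curr) from the feature set before fitting avoids leaking the fold label into the model.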