Baselight

Loan Approval Classification Dataset

Synthetic Data for binary classification on Loan Approval

@kaggle.taweilo_loan_approval_classification_data

About this Dataset

1. Data Source

This dataset is a synthetic version inspired by the original Credit Risk dataset on Kaggle and enriched with additional variables based on Financial Risk for Loan Approval data. SMOTENC was used to generate additional synthetic data points and increase the number of instances. The dataset contains both categorical and continuous features.

2. Metadata

The dataset contains 45,000 records and 14 variables, each described below:

Column | Description | Type
person_age | Age of the person | Float
person_gender | Gender of the person | Categorical
person_education | Highest education level | Categorical
person_income | Annual income | Float
person_emp_exp | Years of employment experience | Integer
person_home_ownership | Home ownership status (e.g., rent, own, mortgage) | Categorical
loan_amnt | Loan amount requested | Float
loan_intent | Purpose of the loan | Categorical
loan_int_rate | Loan interest rate | Float
loan_percent_income | Loan amount as a percentage of annual income | Float
cb_person_cred_hist_length | Length of credit history in years | Float
credit_score | Credit score of the person | Integer
previous_loan_defaults_on_file | Indicator of previous loan defaults | Categorical
loan_status | Target variable. Loan approval status: 1 = approved; 0 = rejected | Integer
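
The column types listed above can be applied when reading the data from a CSV export. The following is a minimal sketch using only the Python standard library. It assumes the file has a header row with exactly these 14 column names; the sample row and the category spellings in it (such as "RENT" or "No") are hypothetical, not taken from the dataset.

```python
import csv
import io

# Column name -> Python type, following the Type column of the metadata table.
# Categorical columns are kept as plain strings.
COLUMN_TYPES = {
    "person_age": float,
    "person_gender": str,
    "person_education": str,
    "person_income": float,
    "person_emp_exp": int,
    "person_home_ownership": str,
    "loan_amnt": float,
    "loan_intent": str,
    "loan_int_rate": float,
    "loan_percent_income": float,
    "cb_person_cred_hist_length": float,
    "credit_score": int,
    "previous_loan_defaults_on_file": str,
    "loan_status": int,
}


def parse_rows(text_stream):
    """Yield one dict per CSV row with every value cast to its declared type."""
    for raw in csv.DictReader(text_stream):
        yield {name: cast(raw[name]) for name, cast in COLUMN_TYPES.items()}


# Hypothetical one-row sample standing in for the real file.
SAMPLE_CSV = (
    ",".join(COLUMN_TYPES) + "\n"
    "30.0,female,Bachelor,52000.0,6,RENT,8000.0,EDUCATION,11.5,0.15,5.0,640,No,1\n"
)

rows = list(parse_rows(io.StringIO(SAMPLE_CSV)))
print(rows[0]["credit_score"], type(rows[0]["person_age"]).__name__)  # 640 float
```

If an export stores integer columns with a decimal point (for example "6.0"), the int casts would need to go through float first.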

3. Data Usage

The dataset can be used for multiple purposes:

  • Exploratory Data Analysis (EDA): Analyze key features, distribution patterns, and relationships to understand credit risk factors.
  • Classification: Build predictive models to classify the loan_status variable (approved/not approved) for potential applicants.
  • Regression: Develop regression models to predict the credit_score variable based on individual and loan-related attributes.
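
As a starting point for the EDA use case, one simple check is the approval rate within each category of a feature. The sketch below assumes rows are already available as dictionaries with a numeric loan_status (1 = approved, 0 = rejected); the helper name approval_rate_by and the sample rows are hypothetical.

```python
from collections import defaultdict


def approval_rate_by(rows, column):
    """Return {category: share of rows in that category with loan_status == 1}."""
    totals = defaultdict(int)
    approved = defaultdict(int)
    for row in rows:
        key = row[column]
        totals[key] += 1
        approved[key] += row["loan_status"]
    return {key: approved[key] / totals[key] for key in totals}


# Hypothetical rows for illustration only; they are not taken from the dataset.
sample_rows = [
    {"loan_intent": "EDUCATION", "loan_status": 1},
    {"loan_intent": "EDUCATION", "loan_status": 0},
    {"loan_intent": "MEDICAL", "loan_status": 0},
    {"loan_intent": "MEDICAL", "loan_status": 0},
]

rates = approval_rate_by(sample_rows, "loan_intent")
print(rates)  # {'EDUCATION': 0.5, 'MEDICAL': 0.0}
```

The same helper works for any categorical column, such as person_home_ownership or previous_loan_defaults_on_file.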

Be aware of data-quality issues inherited from the original data, such as records where person_age exceeds 100 years.
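
One way to handle the age issue is to separate implausible records before modeling so they can be inspected rather than silently dropped. This is a minimal sketch: the cutoff of 100 years mirrors the note above, while the helper name split_by_age and the sample rows are hypothetical.

```python
MAX_PLAUSIBLE_AGE = 100  # assumed cutoff, chosen to match the note above


def split_by_age(rows, max_age=MAX_PLAUSIBLE_AGE):
    """Return (kept, flagged): rows with person_age <= max_age and rows above it."""
    kept = [row for row in rows if row["person_age"] <= max_age]
    flagged = [row for row in rows if row["person_age"] > max_age]
    return kept, flagged


# Hypothetical rows for illustration only.
sample_rows = [
    {"person_age": 25.0, "loan_status": 1},
    {"person_age": 144.0, "loan_status": 0},
    {"person_age": 100.0, "loan_status": 0},
]

kept, flagged = split_by_age(sample_rows)
print(len(kept), len(flagged))  # 2 1
```

Whether flagged records should be removed, capped, or corrected depends on the analysis; the sketch only separates them.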

This dataset provides a rich basis for understanding financial risk factors and simulating predictive modeling processes for loan approval and credit scoring.

Feel free to leave comments on the discussion. I'd appreciate your upvote if you find my dataset useful! 😀

Tables

Loan Data

@kaggle.taweilo_loan_approval_classification_data.loan_data
  • 646.97 KB
  • 45000 rows
  • 14 columns

CREATE TABLE loan_data (
  "person_age" DOUBLE,
  "person_gender" VARCHAR,
  "person_education" VARCHAR,
  "person_income" DOUBLE,
  "person_emp_exp" BIGINT,
  "person_home_ownership" VARCHAR,
  "loan_amnt" DOUBLE,
  "loan_intent" VARCHAR,
  "loan_int_rate" DOUBLE,
  "loan_percent_income" DOUBLE,
  "cb_person_cred_hist_length" DOUBLE,
  "credit_score" BIGINT,
  "previous_loan_defaults_on_file" VARCHAR,
  "loan_status" BIGINT
);
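
To experiment with the schema locally, the statement above can be loaded into an in-memory database and queried. The sketch below uses Python's built-in sqlite3 module as a stand-in engine, which is an assumption: the hosted table may run on a different SQL dialect. The three inserted rows are hypothetical.

```python
import sqlite3

# Same schema as the CREATE TABLE statement above.
DDL = """
CREATE TABLE loan_data (
  "person_age" DOUBLE,
  "person_gender" VARCHAR,
  "person_education" VARCHAR,
  "person_income" DOUBLE,
  "person_emp_exp" BIGINT,
  "person_home_ownership" VARCHAR,
  "loan_amnt" DOUBLE,
  "loan_intent" VARCHAR,
  "loan_int_rate" DOUBLE,
  "loan_percent_income" DOUBLE,
  "cb_person_cred_hist_length" DOUBLE,
  "credit_score" BIGINT,
  "previous_loan_defaults_on_file" VARCHAR,
  "loan_status" BIGINT
);
"""

# Hypothetical rows for illustration only; they are not taken from the dataset.
SAMPLE_ROWS = [
    (30.0, "female", "Bachelor", 52000.0, 6, "RENT", 8000.0, "EDUCATION", 11.5, 0.15, 5.0, 700, "No", 1),
    (41.0, "male", "Master", 90000.0, 15, "MORTGAGE", 12000.0, "MEDICAL", 9.0, 0.13, 12.0, 650, "No", 1),
    (23.0, "male", "High School", 30000.0, 1, "RENT", 9000.0, "PERSONAL", 15.0, 0.30, 2.0, 580, "Yes", 0),
]

conn = sqlite3.connect(":memory:")
conn.execute(DDL)
placeholders = ",".join("?" * 14)  # one placeholder per column
conn.executemany(f"INSERT INTO loan_data VALUES ({placeholders})", SAMPLE_ROWS)

# Row count and average credit score per value of the target variable.
summary = conn.execute(
    """
    SELECT loan_status, COUNT(*) AS n, AVG(credit_score) AS avg_credit_score
    FROM loan_data
    GROUP BY loan_status
    ORDER BY loan_status
    """
).fetchall()
conn.close()

print(summary)  # [(0, 1, 580.0), (1, 2, 675.0)]
```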
