Bank Churn Pre-processed Dataset
A complete pipeline from raw data to model-ready features 🚀
@kaggle.faizanyousafonly_bank_churn_pre_processed_dataset
This dataset is a transformed and preprocessed version of the Bank Churn Dataset from a Kaggle competition. The original dataset was designed to predict customer churn in the banking industry, containing key customer attributes such as credit score, age, account balance, and activity status.
In this version, I have applied a complete data preprocessing pipeline, ensuring the dataset is cleaned, structured, and optimized for machine learning models. This includes handling missing values, encoding categorical features, scaling numerical attributes, detecting and treating outliers, and feature engineering. The processed dataset is now ready for training and evaluation, making it an ideal resource for anyone working on churn prediction, customer retention strategies, or financial analytics.
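The description above names the preprocessing steps but not the specific methods used for each. As a rough, dependency-free sketch of what such steps can look like, the block below assumes median imputation, IQR-based clipping for outliers, z-score scaling, and one-hot encoding. These choices, the helper names, and the toy values are all illustrative assumptions, not the author's actual pipeline.

```python
import statistics


def impute_median(values):
    """Replace None entries with the median of the observed values."""
    observed = [v for v in values if v is not None]
    med = statistics.median(observed)
    return [med if v is None else v for v in values]


def clip_iqr(values, k=1.5):
    """Clip values to [Q1 - k*IQR, Q3 + k*IQR], one common outlier treatment."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    lo, hi = q1 - k * iqr, q3 + k * iqr
    return [min(max(v, lo), hi) for v in values]


def standardize(values):
    """Scale to zero mean and unit variance (population standard deviation)."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return [(v - mean) / std for v in values]


def one_hot(values):
    """Encode each category as a dict of 0/1 indicator columns."""
    categories = sorted(set(values))
    return [{c: int(v == c) for c in categories} for v in values]


# Hypothetical toy rows for illustration; not taken from the dataset.
ages = [25.0, None, 41.0, 38.0, 92.0, 33.0]
geography = ["France", "Spain", "Germany", "France", "Spain", "France"]

ages_clean = standardize(clip_iqr(impute_median(ages)))
geo_encoded = one_hot(geography)
```

In practice the same steps are usually done with pandas and scikit-learn; the plain-Python version is only meant to make the order of operations explicit (impute, treat outliers, then scale).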
This work was inspired by the need for high-quality, well-prepared datasets that enable better model performance and reduce preprocessing time for data scientists and machine learning practitioners. 🚀
Below is the refined breakdown of the dataset columns, incorporating feature engineering and transformations:
Column Name | Description | Data Type |
---|---|---|
CustomerId | Unique identifier for each customer. | int64 |
Surname | Last name of the customer (not used in ML modeling). | object |
CreditScore | Customer's credit score, ranging from 350 to 850. | int64 |
Geography | Country of the customer (France, Germany, or Spain). | object |
Gender | Gender of the customer (Male or Female). | object |
Age | Age of the customer (18-92 years). | float64 |
Tenure | Number of years the customer has been with the bank (0-10). | int64 |
Balance | Account balance of the customer (0.0 to 250,898.09). | float64 |
NumOfProducts | Number of products the customer uses (1-4). | int64 |
HasCrCard | Whether the customer owns a credit card (1 = Yes, 0 = No). | int64 |
IsActiveMember | Whether the customer is an active bank member (1 = Yes, 0 = No). | int64 |
EstimatedSalary | Estimated annual salary of the customer (11.58 to 199,992.48). | float64 |
Exited (only in train_preprocessed.csv) | Target variable indicating if the customer churned (1 = Yes, 0 = No). | int64 |
AgeGroup | Categorized age group (Child, Teen, Young Adult, Middle-Aged Adult, Senior). | object |
BalanceCategory | Categorized balance levels (No Balance, 0-100K, ..., 900K-1M). | object |
SalaryCategory | Categorized salary levels (Zero Income, Low Income, ..., Very High Income). | object |
CreditScoreCategory | Categorized credit score (Low, Fair, Good, High, Exceptional). | object |
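The category columns in the table are binned versions of numeric columns, but the bin boundaries are not given. The sketch below shows one way such a column could be derived, using CreditScoreCategory as the example. The labels come from the table; the cut points in `CREDIT_EDGES` and the `categorize` helper are hypothetical and may not match the thresholds actually used.

```python
from bisect import bisect_right

# Labels are from the column description; the cut points are ASSUMED
# for illustration and are not stated in the dataset documentation.
CREDIT_EDGES = [580, 670, 740, 800]
CREDIT_LABELS = ["Low", "Fair", "Good", "High", "Exceptional"]


def categorize(value, edges, labels):
    """Return the label of the bin that `value` falls into.

    With sorted `edges`, bin 0 covers value < edges[0], bin i covers
    edges[i-1] <= value < edges[i], and the last bin covers
    value >= edges[-1].
    """
    if len(labels) != len(edges) + 1:
        raise ValueError("need exactly one more label than edges")
    return labels[bisect_right(edges, value)]


# Hypothetical credit scores within the documented 350-850 range.
scores = [350, 579, 580, 705, 850]
categories = [categorize(s, CREDIT_EDGES, CREDIT_LABELS) for s in scores]
```

The same helper would work for AgeGroup, BalanceCategory, and SalaryCategory given their own edges and labels.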
This breakdown provides a comprehensive overview of the dataset's structure and transformations. 🚀
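Since Exited appears only in train_preprocessed.csv and Surname is not used for modeling, a loader has to handle rows with and without the target. The following is a minimal sketch of that split using the standard `csv` module. Dropping CustomerId as well is an assumption (it is an identifier, not a predictor), and the inline sample rows and the `split_features_target` helper are hypothetical.

```python
import csv
import io

# Surname is documented as unused in modeling; dropping CustomerId too
# is an assumption made here because it is only an identifier.
NON_FEATURES = {"CustomerId", "Surname"}
TARGET = "Exited"


def split_features_target(rows):
    """Split row dicts into feature dicts and a parallel list of targets.

    Rows without the target column (e.g. test data) get None as target.
    Feature values are left as strings, exactly as read from the CSV.
    """
    features, targets = [], []
    for row in rows:
        targets.append(int(row[TARGET]) if TARGET in row else None)
        features.append(
            {k: v for k, v in row.items()
             if k not in NON_FEATURES and k != TARGET}
        )
    return features, targets


# Hypothetical two-row sample standing in for train_preprocessed.csv.
sample = io.StringIO(
    "CustomerId,Surname,CreditScore,Geography,Exited\n"
    "1,Doe,650,France,0\n"
    "2,Roe,720,Spain,1\n"
)
X, y = split_features_target(csv.DictReader(sample))
```

To read the real file, the `io.StringIO` sample would be replaced with `open("train_preprocessed.csv", newline="")`.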