Bank Churn Pre-processed Dataset

A complete pipeline from raw data to model-ready features 🚀

@kaggle.faizanyousafonly_bank_churn_pre_processed_dataset

About this Dataset

Description

This dataset is a transformed and preprocessed version of the Bank Churn Dataset from a Kaggle competition. The original dataset was designed to predict customer churn in the banking industry, containing key customer attributes such as credit score, age, account balance, and activity status.

In this version, I have applied a complete data preprocessing pipeline so that the dataset is cleaned, structured, and ready for machine learning models. The pipeline covers handling missing values, encoding categorical features, scaling numerical attributes, detecting and treating outliers, and engineering new features. The processed dataset is ready for training and evaluation, which makes it a useful resource for anyone working on churn prediction, customer retention strategies, or financial analytics.
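The preprocessing steps named above can be illustrated with a small, standard-library-only sketch. This is not the code used to build the dataset: the helper names (`impute_median`, `clip_outliers_iqr`, `min_max_scale`, `one_hot`), the choice of median imputation, Tukey IQR fences, and min-max scaling, and the toy rows are all assumptions made for illustration.

```python
from statistics import median, quantiles

def impute_median(values):
    """Replace None entries with the median of the observed values."""
    observed = [v for v in values if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in values]

def clip_outliers_iqr(values, k=1.5):
    """Clip values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey fences)."""
    q1, _, q3 = quantiles(values, n=4)
    low, high = q1 - k * (q3 - q1), q3 + k * (q3 - q1)
    return [min(max(v, low), high) for v in values]

def min_max_scale(values):
    """Rescale values linearly to the [0, 1] range."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def one_hot(values):
    """Return one 0/1 indicator list per distinct category, keyed by category."""
    categories = sorted(set(values))
    return {c: [1 if v == c else 0 for v in values] for c in categories}

# Toy rows, invented for illustration only (not taken from the dataset).
ages = [25.0, None, 41.0, 38.0, 92.0]
geography = ["France", "Spain", "Germany", "France", "Spain"]

# Hypothetical ordering: impute, then treat outliers, then scale.
ages_clean = min_max_scale(clip_outliers_iqr(impute_median(ages)))
geo_encoded = one_hot(geography)
```

In practice these steps are usually done with pandas and scikit-learn; the plain-Python version is only meant to make each transformation explicit.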

This work was inspired by the need for high-quality, well-prepared datasets that enable better model performance and reduce preprocessing time for data scientists and machine learning practitioners. 🚀

Column Descriptions

Below is the refined breakdown of the dataset columns, incorporating feature engineering and transformations:

| Column Name | Description | Data Type |
| --- | --- | --- |
| CustomerId | Unique identifier for each customer. | int64 |
| Surname | Last name of the customer (not used in ML modeling). | object |
| CreditScore | Customer's credit score, ranging from 350 to 850. | int64 |
| Geography | Country of the customer (France, Germany, or Spain). | object |
| Gender | Gender of the customer (Male or Female). | object |
| Age | Age of the customer (18-92 years). | float64 |
| Tenure | Number of years the customer has been with the bank (0-10). | int64 |
| Balance | Account balance of the customer (0.0 to 250,898.09). | float64 |
| NumOfProducts | Number of products the customer uses (1-4). | int64 |
| HasCrCard | Whether the customer owns a credit card (1 = Yes, 0 = No). | int64 |
| IsActiveMember | Whether the customer is an active bank member (1 = Yes, 0 = No). | int64 |
| EstimatedSalary | Estimated annual salary of the customer (11.58 to 199,992.48). | float64 |
| Exited | (Only in train_preprocessed.csv) Target variable indicating if the customer churned (1 = Yes, 0 = No). | int64 |
| AgeGroup | Categorized age group (Child, Teen, Young Adult, Middle-Aged Adult, Senior). | object |
| BalanceCategory | Categorized balance levels (No Balance, 0-100K, ..., 900K-1M). | object |
| SalaryCategory | Categorized salary levels (Zero Income, Low Income, ..., Very High Income). | object |
| CreditScoreCategory | Categorized credit score (Low, Fair, Good, High, Exceptional). | object |

This breakdown provides a comprehensive overview of the dataset's structure and transformations. 🚀
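The derived category columns (AgeGroup, BalanceCategory, SalaryCategory, CreditScoreCategory) amount to binning a numeric column into labeled ranges. As a minimal sketch of that idea, the snippet below uses Python's `bisect` module. The label names come from the column descriptions, but the cut points (`CREDIT_EDGES`, `AGE_EDGES`) and the `categorize` helper are hypothetical, since the thresholds actually used are not documented here.

```python
from bisect import bisect_right

# Labels come from the column descriptions; the cut points below are
# illustrative assumptions, not the thresholds used to build the dataset.
CREDIT_EDGES = [580, 670, 740, 800]
CREDIT_LABELS = ["Low", "Fair", "Good", "High", "Exceptional"]

AGE_EDGES = [13, 20, 40, 60]
AGE_LABELS = ["Child", "Teen", "Young Adult", "Middle-Aged Adult", "Senior"]

def categorize(value, edges, labels):
    """Map a number to a label; each edge is the inclusive lower bound of the next bin."""
    if len(labels) != len(edges) + 1:
        raise ValueError("need exactly one more label than edges")
    return labels[bisect_right(edges, value)]

# Example: label a few hypothetical customers.
credit_category = categorize(702, CREDIT_EDGES, CREDIT_LABELS)
age_group = categorize(35.0, AGE_EDGES, AGE_LABELS)
```

Under these assumed edges, a value equal to an edge falls into the higher bin, so a credit score of exactly 580 would be labeled Fair rather than Low.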
