Dataset: Credit Card Fraud

About this Dataset

Credit Card Fraud Detection
Introduction
Credit card fraud detection is a critical challenge in the financial sector. This project builds a machine learning model to identify fraudulent credit card transactions in the dataset described below.

Dataset Overview
The dataset contains credit card transactions made by European cardholders in September 2013. It is strongly imbalanced: non-fraudulent transactions make up the large majority.

Features:

Time: Seconds elapsed between this transaction and the first transaction in the dataset.
V1 to V28: Anonymized features resulting from a PCA transformation.
Amount: Transaction amount.
Class: Target variable (1 for fraud, 0 for non-fraud).
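The column layout above can be expressed as a small Python sketch that splits one record into its 30 feature values and its 0/1 label. It assumes each record arrives as a dictionary keyed by the lowercase column names used in the table schema. The label is cast to an integer because the schema stores class as VARCHAR. FEATURE_COLUMNS and split_row are hypothetical names for illustration and are not part of the original project.

```python
# Feature columns in table order: time, v1..v28, amount (30 in total).
FEATURE_COLUMNS = ["time"] + [f"v{i}" for i in range(1, 29)] + ["amount"]
TARGET_COLUMN = "class"


def split_row(row: dict) -> tuple[list[float], int]:
    """Split one transaction record into a feature vector and a 0/1 label.

    Assumes `row` is keyed by the lowercase column names of the table schema;
    the label is cast to int because the schema stores it as text.
    """
    features = [float(row[name]) for name in FEATURE_COLUMNS]
    label = int(row[TARGET_COLUMN])
    if label not in (0, 1):
        raise ValueError(f"unexpected class value: {row[TARGET_COLUMN]!r}")
    return features, label


if __name__ == "__main__":
    # 30 feature columns + 1 target column = 31 columns, matching the table.
    print(len(FEATURE_COLUMNS) + 1)  # 31
```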
Steps Taken

Data Preprocessing
Standardization: Scaled the numeric features to zero mean and unit variance to put them on a common scale and improve model performance.
Handling Imbalance: Applied SMOTE (Synthetic Minority Over-sampling Technique), which generates synthetic fraud examples, to balance the two classes so the model is trained on enough examples of each.
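Both preprocessing steps can be illustrated with a minimal NumPy sketch. It implements a simplified, from-scratch version of SMOTE's core idea: each synthetic fraud example is a random point on the segment between a fraud sample and one of its k nearest fraud neighbours. It is not the imbalanced-learn implementation, and the source does not say which library was used. The data, the function names fit_standardizer and smote, and k=5 are all assumptions.

The sketch standardizes before oversampling because SMOTE's neighbour search uses Euclidean distance, and an unscaled large-valued column would otherwise dominate it. It also resamples only a training split, so no synthetic rows can reach the data used for evaluation.

```python
import numpy as np


def fit_standardizer(X_train):
    """Per-column mean and standard deviation, computed on training data only."""
    mean = X_train.mean(axis=0)
    std = X_train.std(axis=0)
    std[std == 0] = 1.0  # leave constant columns unscaled instead of dividing by 0
    return mean, std


def smote(X_min, n_new, k=5, rng=None):
    """Simplified SMOTE: interpolate between minority samples and their neighbours."""
    rng = np.random.default_rng(0) if rng is None else rng
    n = len(X_min)
    if n < 2:
        raise ValueError("need at least two minority samples")
    k = min(k, n - 1)
    # Pairwise Euclidean distances among minority samples only.
    d = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)                 # a sample is not its own neighbour
    neighbours = np.argsort(d, axis=1)[:, :k]   # k nearest minority neighbours
    base = rng.integers(0, n, size=n_new)                   # random minority samples
    nbr = neighbours[base, rng.integers(0, k, size=n_new)]  # one neighbour for each
    gap = rng.random((n_new, 1))                            # position on the segment
    return X_min[base] + gap * (X_min[nbr] - X_min[base])


rng = np.random.default_rng(1)
# Hypothetical training split: 500 non-fraud rows, 20 fraud rows, 2 features
# on very different scales (think of a Time-like and an Amount-like column).
X_train = np.vstack([
    rng.normal([50_000, 80], [20_000, 60], size=(500, 2)),
    rng.normal([90_000, 300], [20_000, 100], size=(20, 2)),
])
y_train = np.array([0] * 500 + [1] * 20)

# 1) Standardize with training statistics (reuse mean/std on any test split).
mean, std = fit_standardizer(X_train)
Z_train = (X_train - mean) / std

# 2) Oversample the fraud class until both classes have the same count.
n_new = int((y_train == 0).sum() - (y_train == 1).sum())
Z_syn = smote(Z_train[y_train == 1], n_new, k=5, rng=rng)
Z_bal = np.vstack([Z_train, Z_syn])
y_bal = np.concatenate([y_train, np.ones(n_new, dtype=y_train.dtype)])
print(np.bincount(y_bal))  # [500 500]
```

A test split would be transformed with the same mean and std but left at its original class ratio.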
Exploratory Data Analysis
Correlation Analysis: Examined correlations between features to understand relationships and their potential impact on the model.
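One form of correlation analysis ranks features by their Pearson correlation with the Class label, and a minimal sketch of that follows. The source does not say whether correlations were computed between features, against the target, or both. The three synthetic columns are hypothetical placeholders: v2 is built to be strongly related to the label, v3 weakly and negatively related, and v1 unrelated. For a 0/1 target, this Pearson value is the point-biserial correlation.

```python
import numpy as np


def rank_by_target_correlation(X, y, names):
    """Return (name, Pearson r with the target) pairs, largest |r| first."""
    # Append y as the last column and correlate every column with every other.
    corr = np.corrcoef(np.column_stack([X, y]), rowvar=False)
    r_with_target = corr[:-1, -1]  # last column: each feature against y
    order = np.argsort(-np.abs(r_with_target))
    return [(names[i], float(r_with_target[i])) for i in order]


rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=1000)  # hypothetical 0/1 labels
X = np.column_stack([
    rng.normal(size=1000),             # v1: unrelated to y
    2.0 * y + rng.normal(size=1000),   # v2: strongly, positively related
    -0.5 * y + rng.normal(size=1000),  # v3: weakly, negatively related
])
ranked = rank_by_target_correlation(X, y, ["v1", "v2", "v3"])
for name, r in ranked:
    print(f"{name}: {r:+.2f}")
```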
Model Building
Algorithm Used: Random Forest Classifier, chosen for its robustness and high performance.
Hyperparameter Tuning: Used RandomizedSearchCV, which evaluates a random sample of hyperparameter combinations with cross-validation, to select the best hyperparameters and improve model accuracy.
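The model-building steps can be sketched with scikit-learn, which provides both RandomForestClassifier and RandomizedSearchCV. RandomizedSearchCV evaluates n_iter randomly sampled parameter combinations with cross-validation and then refits the best one on the full training set. The synthetic data, the search space, n_iter=5, cv=3 and scoring="f1" are assumptions, not the project's settings. F1 is used here because accuracy says little on imbalanced data: with about 2% positives, a model that never predicts fraud is already about 98% accurate.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV, train_test_split

# Synthetic, imbalanced stand-in for the transactions (about 2% positives).
X, y = make_classification(
    n_samples=2000, n_features=30, n_informative=8,
    weights=[0.98, 0.02], random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0,
)

# Hypothetical search space; the original ranges are not given in the source.
param_distributions = {
    "n_estimators": [50, 100, 200],
    "max_depth": [None, 10, 20],
    "min_samples_leaf": [1, 2, 4],
    "max_features": ["sqrt", "log2"],
}

search = RandomizedSearchCV(
    RandomForestClassifier(random_state=0),
    param_distributions=param_distributions,
    n_iter=5,        # number of sampled combinations
    scoring="f1",    # assumption: F1 on the positive (fraud) class
    cv=3,            # stratified 3-fold cross-validation for classifiers
    random_state=0,
)
search.fit(X_train, y_train)  # refits the best combination on all of X_train
print(search.best_params_)
print("held-out F1:", round(search.score(X_test, y_test), 3))
```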
Model Evaluation
Confusion Matrix & Classification Report: Evaluated the model’s performance using key metrics such as precision, recall, F1-score, and overall accuracy.
Feature Importance: Analyzed feature importances to identify which features contribute most to detecting fraud.
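The evaluation steps map onto standard scikit-learn calls, and a minimal sketch follows. It trains a Random Forest on synthetic stand-in data, prints the confusion matrix counts and the classification report, and ranks the impurity-based feature importances. The data and the reuse of the table's column names as feature labels are hypothetical.

classification_report prints two decimals by default, so a score of 0.999 appears as 1.00. The sketch passes digits=3 to show one more decimal. Impurity-based importances can favour continuous, high-cardinality features, and sklearn.inspection.permutation_importance is a common cross-check.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import train_test_split

# Hypothetical labels reusing the table's 30 feature column names.
feature_names = ["time"] + [f"v{i}" for i in range(1, 29)] + ["amount"]

# Synthetic stand-in data; the real transactions are not used here.
X, y = make_classification(n_samples=2000, n_features=30, n_informative=6,
                           weights=[0.9, 0.1], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

# For labels [0, 1], ravel() returns the counts in the order TN, FP, FN, TP.
tn, fp, fn, tp = confusion_matrix(y_test, y_pred, labels=[0, 1]).ravel()
print(f"TN={tn} FP={fp} FN={fn} TP={tp}")

# Per-class precision, recall, F1 plus overall accuracy, with 3 decimals.
print(classification_report(y_test, y_pred, digits=3))

# Impurity-based importances: one value per feature, summing to 1.
ranked = sorted(zip(feature_names, model.feature_importances_),
                key=lambda pair: pair[1], reverse=True)
for name, importance in ranked[:5]:
    print(f"{name:>6}: {importance:.3f}")
```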
Results
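The model achieved the following results:

Accuracy: 99.91% (19,663 of 19,680 transactions classified correctly), which rounds to 100%
Precision, Recall, F1-score: 1.00 for both classes (each is above 0.999 before rounding)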
Confusion Matrix:
True Negatives (TN): 9906
False Positives (FP): 8
False Negatives (FN): 9
True Positives (TP): 9757
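As a worked check, every metric above follows from the four reported counts. The short script below recomputes them and assumes nothing beyond the counts themselves.

```python
# Counts reported in the confusion matrix above.
tn, fp, fn, tp = 9906, 8, 9, 9757

total = tn + fp + fn + tp
accuracy = (tp + tn) / total

# Fraud (class 1) as the positive class.
precision_1 = tp / (tp + fp)
recall_1 = tp / (tp + fn)
f1_1 = 2 * precision_1 * recall_1 / (precision_1 + recall_1)

# Non-fraud (class 0) as the positive class: the roles of FP and FN swap.
precision_0 = tn / (tn + fn)
recall_0 = tn / (tn + fp)
f1_0 = 2 * precision_0 * recall_0 / (precision_0 + recall_0)

print(f"total={total} accuracy={accuracy:.4f}")
# total=19680 accuracy=0.9991
print(f"class 1: precision={precision_1:.4f} recall={recall_1:.4f} f1={f1_1:.4f}")
# class 1: precision=0.9992 recall=0.9991 f1=0.9991
print(f"class 0: precision={precision_0:.4f} recall={recall_0:.4f} f1={f1_0:.4f}")
# class 0: precision=0.9991 recall=0.9992 f1=0.9991
```

Each score lies between 0.999 and 1.000, which is why they round to 1.00 and 100%. The counts also show that the evaluation set is nearly balanced (9,914 non-fraud and 9,766 fraud transactions). That differs from the heavily imbalanced raw data described in the Dataset Overview.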
Conclusion
This project demonstrates the effectiveness of machine learning in detecting fraudulent credit card transactions. The key steps, including data preprocessing, handling class imbalance, and hyperparameter tuning, were crucial in achieving high model performance. The feature importance analysis provided valuable insights into the key indicators of fraudulent activity.

Check out the full code and detailed analysis in the GitHub Repository.

Tables

Table: creditcardfraud (@kaggle.oscaryezfeijo_credit_card_fraud.creditcardfraud)
Size: 13.59 MB, 49,692 rows, 31 columns

CREATE TABLE creditcardfraud (
  "time" BIGINT,
  "v1" DOUBLE,
  "v2" DOUBLE,
  "v3" DOUBLE,
  "v4" DOUBLE,
  "v5" DOUBLE,
  "v6" DOUBLE,
  "v7" DOUBLE,
  "v8" DOUBLE,
  "v9" DOUBLE,
  "v10" DOUBLE,
  "v11" DOUBLE,
  "v12" DOUBLE,
  "v13" DOUBLE,
  "v14" DOUBLE,
  "v15" DOUBLE,
  "v16" DOUBLE,
  "v17" DOUBLE,
  "v18" DOUBLE,
  "v19" DOUBLE,
  "v20" DOUBLE,
  "v21" DOUBLE,
  "v22" DOUBLE,
  "v23" DOUBLE,
  "v24" DOUBLE,
  "v25" DOUBLE,
  "v26" DOUBLE,
  "v27" DOUBLE,
  "v28" DOUBLE,
  "amount" DOUBLE,
  "class" VARCHAR
);
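The schema stores class as VARCHAR, so any query that treats it as a number needs a cast. The sketch below uses Python's built-in sqlite3 module as a stand-in engine, since the source does not say where the table is hosted. It recreates the 31-column schema, loads three hypothetical rows, and counts transactions per class. SQLite accepts the BIGINT, DOUBLE and VARCHAR type names as written.

```python
import sqlite3

# Rebuild the 31-column schema above: time, v1..v28, amount, class.
columns = (['"time" BIGINT']
           + [f'"v{i}" DOUBLE' for i in range(1, 29)]
           + ['"amount" DOUBLE', '"class" VARCHAR'])
ddl = f"CREATE TABLE creditcardfraud ({', '.join(columns)})"

conn = sqlite3.connect(":memory:")
conn.execute(ddl)

# Hypothetical rows; unspecified columns (v2..v28) are left NULL.
conn.executemany(
    'INSERT INTO creditcardfraud ("time", "v1", "amount", "class") '
    "VALUES (?, ?, ?, ?)",
    [(0, 0.1, 10.0, "0"), (5, -0.2, 25.5, "0"), (9, 1.7, 1.0, "1")],
)

# "class" is stored as text, so cast it to an integer before grouping.
rows = conn.execute(
    'SELECT CAST("class" AS INTEGER) AS label, COUNT(*) '
    "FROM creditcardfraud GROUP BY label ORDER BY label"
).fetchall()
counts = dict(rows)
print(counts)  # {0: 2, 1: 1}
```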