Credit Card Fraud Detection Dataset

Kaggle

@kaggle.rehanliaqat17_creidct_card

An Anonymized Dataset for Credit Card Fraud Detection and Transaction Analysis

Dataset Description

The dataset consists of 284,807 credit card transactions made by European cardholders over a period of two days. Each transaction is stored as 31 columns: 30 input features plus the target label. Most of the input features have been transformed through Principal Component Analysis (PCA) to protect cardholder privacy and confidentiality.
The features labeled V1 to V28 are the PCA components; they capture underlying transaction patterns without exposing sensitive financial information. The two untransformed features are Time, the number of seconds elapsed between each transaction and the first transaction in the dataset, and Amount, the monetary value of each transaction.
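As a rough illustration of the column layout described here, the sketch below checks a CSV header against the expected names (Time, V1 to V28, Amount, and the Class target). The function name check_header and the assumption that the data ships as a comma-separated file with exactly these header names are illustrative, not something the dataset page states.

```python
import csv
import io

# Column names as described in the text: Time, V1..V28, Amount, and the Class target.
# The exact header spelling and order in the file are assumptions.
EXPECTED_COLUMNS = ["Time"] + [f"V{i}" for i in range(1, 29)] + ["Amount", "Class"]


def check_header(csv_text: str) -> list[str]:
    """Return a list of problems found in the CSV header (empty if it matches).

    Hypothetical helper for illustration; only the first row is inspected.
    """
    header = next(csv.reader(io.StringIO(csv_text)))
    problems = []
    missing = [c for c in EXPECTED_COLUMNS if c not in header]
    extra = [c for c in header if c not in EXPECTED_COLUMNS]
    if missing:
        problems.append(f"missing columns: {missing}")
    if extra:
        problems.append(f"unexpected columns: {extra}")
    return problems
```

Passing the first line of the downloaded file to check_header should return an empty list if the layout matches; any mismatch is reported by name rather than raising.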
This dataset contains real-world credit card transaction data collected for the purpose of detecting fraudulent activities. It is a large-scale, clean, and highly imbalanced dataset commonly used for binary classification problems in fraud detection. The main objective is to accurately distinguish between legitimate and fraudulent transactions using anonymized numerical features.

  • Real-world credit card transaction data
  • Designed for fraud detection (binary classification)
  • Highly imbalanced target variable
  • Clean dataset with no missing values
  • Widely used for machine learning benchmarking

The target column, Class, identifies whether a transaction is legitimate (0) or fraudulent (1). Fraudulent transactions are extremely rare (in the original Kaggle release of this data, 492 of the 284,807 transactions, or about 0.17%), which makes this dataset particularly valuable for studying class imbalance, anomaly detection, and precision-recall optimization in machine learning models.
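To see why precision and recall matter more than accuracy on data this imbalanced, here is a minimal sketch in plain Python. The helper precision_recall and the toy labels are made up for illustration; they are not drawn from the dataset.

```python
def precision_recall(y_true, y_pred):
    """Return (precision, recall, accuracy) for binary labels, with 1 = fraud."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and the same length")
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            tp += 1
        elif truth == 0 and pred == 1:
            fp += 1
        elif truth == 1 and pred == 0:
            fn += 1
        else:
            tn += 1
    # Define the ratio as 0.0 when its denominator is empty.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    accuracy = (tp + tn) / len(y_true)
    return precision, recall, accuracy


# Toy labels (hypothetical): 2 frauds among 1,000 transactions.
y_true = [1, 1] + [0] * 998
# A "model" that labels every transaction as legitimate.
always_legit = [0] * 1000

print(precision_recall(y_true, always_legit))  # (0.0, 0.0, 0.998)
```

The always-legitimate predictor reaches 99.8% accuracy on the toy data while catching no fraud at all, which is why recall and precision on the Class = 1 rows are the more informative measures here.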
Because of its realistic structure, privacy-preserving features, and challenging class imbalance, this dataset is used extensively in academic research, industry projects, and data science competitions.

