Financial Fraud Detection Dataset
Synthetic Mobile Money Transactions for Fraud Detection Research
@kaggle.sriharshaeedala_financial_fraud_detection_dataset
This dataset is a synthetic representation of mobile money transactions, built to mirror the complexity of real-world financial activity while injecting fraudulent behavior for research purposes. It was generated by a simulator named PaySim, which uses aggregated data from the actual financial logs of a mobile money service in an African country, and it aims to fill the gap in publicly available financial datasets for fraud detection studies. It covers five transaction types (CASH-IN, CASH-OUT, DEBIT, PAYMENT, and TRANSFER) over a simulated period of 30 days, providing an environment for evaluating fraud detection methods. Because the data is synthetic, it avoids the privacy concerns inherent in real financial transactions, which makes it a useful resource for researchers and analysts working on financial security and fraud detection. It has been scaled to 1/4 of the original dataset size for efficient use on the Kaggle platform. Please note that transactions marked as fraudulent have been annulled, so fraud analysis should rely on the non-balance columns. This dataset is a contribution from the "Scalable resource-efficient systems for big data analytics" project, funded by the Knowledge Foundation in Sweden.
PaySim synthesizes mobile money transactions using data derived from a month's worth of financial logs from a mobile money service operating in an African country. These logs were provided by a multinational company that offers this financial service across more than 14 countries globally.
This synthetic dataset has been scaled to one-quarter the size of the original dataset and is specifically tailored for Kaggle.
Important Note: Transactions identified as fraudulent are annulled. For that reason, the following columns should not be used for fraud detection analysis: oldbalanceOrg, newbalanceOrig, oldbalanceDest, newbalanceDest.
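As a minimal sketch of the note above, the four balance columns can be dropped before any features are built. The example uses only the Python standard library. The sample rows are made up for illustration and are not taken from the dataset, and the column names `type`, `amount`, and `isFraud` are assumptions about the file layout; only the four balance column names come from this description.

```python
import csv
import io

# Balance columns that the dataset description says to exclude from fraud analysis.
BALANCE_COLUMNS = (
    "oldbalanceOrg",
    "newbalanceOrig",
    "oldbalanceDest",
    "newbalanceDest",
)


def drop_balance_columns(rows):
    """Yield each row as a dict without the four balance columns."""
    for row in rows:
        yield {key: value for key, value in row.items() if key not in BALANCE_COLUMNS}


# Made-up sample rows, for illustration only (not real dataset records).
# "type", "amount" and "isFraud" are assumed column names.
SAMPLE_CSV = """type,amount,oldbalanceOrg,newbalanceOrig,oldbalanceDest,newbalanceDest,isFraud
PAYMENT,100.0,500.0,400.0,0.0,0.0,0
TRANSFER,250.0,250.0,0.0,0.0,0.0,1
"""

reader = csv.DictReader(io.StringIO(SAMPLE_CSV))
features = list(drop_balance_columns(reader))
print(features[0])
```

To apply the same filter to the real file, the `io.StringIO(SAMPLE_CSV)` object could be replaced with an open file handle for the downloaded CSV. Because the function is a generator, rows are processed one at a time instead of being loaded into memory all at once.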
This dataset has been generated through multiple runs of the PaySim simulator, each simulating a month of real-time transactions over 744 steps. Each run produced approximately 24 million financial records across the five transaction categories.
This project is part of the "Scalable resource-efficient systems for big data analytics" research, supported by the Knowledge Foundation (grant: 20140032) in Sweden.
For citations and further references, please use:
E. A. Lopez-Rojas, A. Elmir, and S. Axelsson. "PaySim: A financial mobile money simulator for fraud detection". In: The 28th European Modeling and Simulation Symposium (EMSS), Larnaca, Cyprus. 2016.