Objective: Suppose a hypothetical company has decided to roll out one or more modules in its app and wants to understand the impact of these new modules on users, transaction frequency, transaction amount, complications with the new modules, and so on. Try out your A/B testing skills on this dataset and report back your findings!
Navigation: Both datasets contain information from an online cryptocurrency app, before and after a module is introduced. The df_pre_mod dataset covers the period before the module was introduced, while df_post_mod covers the period after. I will also share the original creation notebook (for the data generation) as well as a follow-up notebook to help you kickstart your experimentation!
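As a rough starting point, a pre/post comparison of a single metric could be sketched as below. This is only an illustration under stated assumptions: the sample values are made up, the names `pre_amounts` and `post_amounts` are hypothetical stand-ins for a numeric column taken from df_pre_mod and df_post_mod, and Welch's t-test is just one reasonable choice of test, not necessarily the approach used in the accompanying notebooks. It uses only the Python standard library, so the p-value is a large-sample normal approximation rather than an exact t-distribution value.

```python
from math import sqrt
from statistics import NormalDist, mean, variance


def welch_t(pre, post):
    """Return (t, df) for Welch's unequal-variance t-test of post vs. pre."""
    n1, n2 = len(pre), len(post)
    # Squared standard error contributed by each group.
    v1, v2 = variance(pre) / n1, variance(post) / n2
    t = (mean(post) - mean(pre)) / sqrt(v1 + v2)
    # Welch-Satterthwaite approximation of the degrees of freedom.
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


def approx_p_value(t):
    """Two-sided p-value using a normal approximation (assumes large samples)."""
    return 2 * (1 - NormalDist().cdf(abs(t)))


# Hypothetical transaction amounts; in practice these would come from one
# numeric column of df_pre_mod and df_post_mod (column name is an assumption).
pre_amounts = [10.0, 12.0, 11.0, 13.0, 9.0]
post_amounts = [14.0, 15.0, 13.0, 16.0, 12.0]

t_stat, dof = welch_t(pre_amounts, post_amounts)
p_approx = approx_p_value(t_stat)
print(f"t = {t_stat:.3f}, df = {dof:.1f}, approx. p = {p_approx:.4f}")
```

With real data, the same function could be applied to each metric of interest (for example transaction frequency or transaction amount), keeping in mind that a pre/post comparison is not a randomized experiment and that testing many metrics calls for a multiple-comparison correction.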
Overview: In this project I have created synthetic datasets that aim to closely resemble actual cryptocurrency data from financial apps and/or digital wallets. I did this mainly because I wanted to experiment with and study certain techniques but couldn't find appropriate crypto data. I've decided to make it public in case it can assist other users as well.
Inspiration: Several real financial datasets and data patterns were used as inspiration for this project in order to understand the schema, data ranges, data types, and underlying patterns of users and transactions.
Randomization and manipulation: However, the data bears no similarity to the actual, real projects. User IDs and Trading IDs have been randomized, and the data patterns have been manipulated to give the impression of real fluctuations.
Limitations: Please note that this project is for illustration purposes only and contains no actual customer information. There are still a few limitations in the data ranges and patterns, which have been intentionally left in the code and merit additional examination.
Further improvements: There is an almost endless stream of improvements that could be made in further iterations if users find this useful, such as statistical regression modelling for each individual variable, injection of outliers, and hidden insights buried in the data. Should you find this interesting, let me know.