This project is no longer maintained and probably never will be again. All the Python code used for data scraping is in a notebook attached to this project. I'm sorry for the inconvenience.
Context
This data warehouse was created as a university project over three months (March to May 2018).
I couldn't find any useful databases containing historical prices of a wide range of computer parts (CPUs, GPUs and RAM), so I had to scrape the data from price comparison engines.
For web scraping I used Python with the BeautifulSoup library as well as PCPartPicker-API.
Things I'd like to add but have no time for:
- Compare RAM prices with the rising demand for memory chips in the smartphone industry and in data centers for cloud computing
- Scrape more price comparison engines
- Add a new dimension level for DIM_PROD_GPU -> Series. I made this dimension too shallow, which limited my analysis; an additional level between Manufacturer and Product Name would make analytics easier.
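One possible way to derive the missing Series level is to parse it out of the product name. The sketch below is only an illustration: the function name `extract_series`, the list of model prefixes, and the rule of zeroing out the last two digits of the model number are all assumptions, and the product names in the actual dump may need different patterns.

```python
import re

# Hypothetical pattern: a known model prefix followed by a 3-4 digit model number.
# The prefix list is an assumption and is far from complete.
_MODEL = re.compile(r"\b(GTX|RTX|GT|RX|R9|R7)\s*(\d{3,4})\b", re.IGNORECASE)


def extract_series(product_name: str) -> str:
    """Return a coarse series label such as 'GTX 1000', or 'Unknown'."""
    match = _MODEL.search(product_name)
    if match is None:
        return "Unknown"
    prefix = match.group(1).upper()
    model = match.group(2)
    # Zero out the last two digits: 1080 -> 1000, 580 -> 500.
    return f"{prefix} {model[:-2]}00"


# Example rows: (Manufacturer, Product Name) pairs, made up for illustration.
products = [
    ("Nvidia", "GeForce GTX 1080 Ti"),
    ("AMD", "Radeon RX 580"),
    ("Nvidia", "Quadro P4000"),
]

# Manufacturer -> Series -> Product Name hierarchy.
hierarchy = [(maker, extract_series(name), name) for maker, name in products]
```

Names that match none of the assumed prefixes fall into an "Unknown" bucket, which would need manual review before loading into the dimension.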
Content
Database contains data about:
- 15 most popular cryptocurrencies and their rates
- 1664 CPU Products
- 2054 GPU Products
- 3706 RAM Products
- 6,000,000+ records of historical product prices
Acknowledgements
Thanks to the few Redditors who helped me find libraries and datasets for inspiration.
My request post on r/datasets
Inspiration
As a gamer I saw huge price spikes on the GPU market. I hadn't found many analyses of this phenomenon, and because it was very recent, I thought it would be a good topic for a university project in a Data Warehouse course.
Photo Source
I'm releasing this data online so it doesn't go to waste. Maybe you will find more interesting results than I did.
Installation
The uploaded CSV files were dumped from my data warehouse. Below you can see the whole ERD diagram.
The only difference: DIM_REGION contains one additional column that I didn't include in my data warehouse.
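As a rough sketch of how the dumps could be used, the snippet below joins price records to a product dimension with only the Python standard library. Everything specific in it is an assumption made for illustration: the column names (`Id`, `ProdId`, `Date`, `PriceUSD`, and so on), the sample rows, and the in-memory strings that stand in for the real CSV files. Check the ERD and the actual file headers before adapting it.

```python
import csv
import io

# Tiny in-memory stand-ins for the CSV dumps.
# Column names and rows are hypothetical, not taken from the real files.
DIM_PROD_GPU_CSV = """Id,Manufacturer,ProductName
1,Nvidia,GeForce GTX 1080 Ti
2,AMD,Radeon RX 580
"""

FACT_PRICE_CSV = """ProdId,Date,PriceUSD
1,2018-03-01,899.99
1,2018-04-01,849.99
2,2018-03-01,329.99
"""


def load_rows(text):
    """Parse CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))


def join_prices_with_products(fact_rows, dim_rows):
    """Attach the product name to each price record via the product key."""
    products = {row["Id"]: row for row in dim_rows}
    joined = []
    for fact in fact_rows:
        product = products.get(fact["ProdId"])
        if product is None:
            continue  # skip price records with no matching dimension row
        joined.append({
            "product": product["ProductName"],
            "date": fact["Date"],
            "price": float(fact["PriceUSD"]),
        })
    return joined


joined = join_prices_with_products(
    load_rows(FACT_PRICE_CSV), load_rows(DIM_PROD_GPU_CSV)
)
```

To run this against the real dumps, replace `io.StringIO(text)` with a file opened using `open(path, newline="")`. With 6,000,000+ price records, a database or a dataframe library may be a better fit than plain dictionaries.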