Description
Individual player data for all games from the start of the 1996-97 season (when my source began tracking individual plus-minus) through December 31, 2020
- Scraped from Basketball Reference using Python (bs4), with additional columns added using Python and a VBA macro
- Passed all assertion tests during scraping but not verified rigorously (I take no responsibility for screwing up your analysis hehe)
- Users can append GAME_ID to "https://www.basketball-reference.com" to get the URL of the game's webpage
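As a minimal sketch of the GAME_ID tip above: the helper below just concatenates the site root and a GAME_ID value. The function name `game_url` and the placeholder ID in the usage line are made up for illustration, and the sketch assumes GAME_ID already starts with a "/" (check a few values in the file before relying on it).

```python
BASE_URL = "https://www.basketball-reference.com"

def game_url(game_id: str) -> str:
    """Build a game page URL by appending GAME_ID to the site root.

    Assumes game_id already begins with "/" (an assumption, not verified
    against the dataset); surrounding whitespace is stripped.
    """
    return BASE_URL + game_id.strip()

# Placeholder value for illustration only, not a real GAME_ID from the data.
print(game_url("/boxscores/EXAMPLE.html"))
```

If the GAME_ID values turn out not to include the leading slash, add one between the two parts.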
Limitations
- Does not include information about whether it is a playoff game
- MP may not be entirely accurate (sometimes does not add up to 48)
- TOTAL_MINS and STARTER columns are only available in the primary "games.csv" file
- Had to round MP for one game due to TOTAL_MINS inaccuracy
- A few players are identified with the wrong team in the decade files (these have been manually corrected in the primary file)
Credits
If you use this data, I'd love to hear what projects my fellow basketball nerds are up to, or even collaborate! Shoot me an email at kh19@princeton.edu and please credit me with a link to this Kaggle page or my website. This is my first venture into scraping, so any suggestions or tips would be greatly appreciated. Enjoy!
Coding Time: 20 hours
Scraping Time: 9 hours