Structured IPL ball-level records extracted from Cricsheet JSONs
Dataset Description
Context
Raw cricket data is notoriously difficult to work with. The original match files from Cricsheet are deeply nested JSON structures that must be unpacked with nested loops over innings, overs, and deliveries. This dataset removes that step by providing a fully flattened, tabular, ball-by-ball record of every Indian Premier League (IPL) match from 2008 through 2026.
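To make the flattening step concrete, here is a minimal sketch of how one nested match file could be unpacked into one row per delivery. The `match` dictionary is a tiny hand-written stand-in modeled on Cricsheet's JSON layout, not real data, and the output column names (`runs_batter`, `batting_team`, and so on) are assumptions for illustration; the columns in this dataset may be named differently.

```python
import pandas as pd

# Hand-written stand-in for one match file (illustrative values only).
match = {
    "info": {"venue": "Example Ground", "season": "2008"},
    "innings": [
        {
            "team": "Team A",
            "overs": [
                {
                    "over": 0,
                    "deliveries": [
                        {"batter": "P1", "bowler": "Q1",
                         "runs": {"batter": 4, "extras": 0, "total": 4}},
                        {"batter": "P1", "bowler": "Q1",
                         "runs": {"batter": 0, "extras": 1, "total": 1},
                         "extras": {"wides": 1}},
                    ],
                }
            ],
        }
    ],
}


def flatten_match(match):
    """Return one row per delivery, with match-level info copied onto each row."""
    info = match["info"]
    rows = []
    for innings_no, innings in enumerate(match["innings"], start=1):
        for over in innings["overs"]:
            for ball_no, delivery in enumerate(over["deliveries"], start=1):
                rows.append({
                    "venue": info.get("venue"),
                    "season": info.get("season"),
                    "innings": innings_no,
                    "batting_team": innings["team"],
                    "over": over["over"],
                    "ball": ball_no,
                    "batter": delivery["batter"],
                    "bowler": delivery["bowler"],
                    "runs_batter": delivery["runs"]["batter"],
                    "runs_extras": delivery["runs"]["extras"],
                    "runs_total": delivery["runs"]["total"],
                })
    return pd.DataFrame(rows)


balls = flatten_match(match)
print(balls)
```

With this dataset the loop is unnecessary, since every row already has this shape.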
Content & Granularity
The dataset is structured at the delivery level. Every row represents a unique ball bowled in the IPL.
To maximize usability, match-level metadata has been broadcast down to the ball level. This allows you to immediately filter or group deliveries by venue, city, season, toss decisions, or match winners without needing to perform any complex relational joins.
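Because the match metadata sits on every row, filtering and grouping need no joins. The sketch below uses a toy DataFrame in place of the real file; the column names `match_id`, `venue`, `toss_decision`, and `runs_total` are hypothetical, so check the actual headers before adapting it.

```python
import pandas as pd

# Toy stand-in for the ball-level table (hypothetical column names and values).
balls = pd.DataFrame({
    "match_id": [1, 1, 1, 2, 2],
    "venue": ["Ground X", "Ground X", "Ground X", "Ground Y", "Ground Y"],
    "toss_decision": ["field", "field", "field", "bat", "bat"],
    "runs_total": [4, 1, 0, 6, 2],
})

# Filter deliveries directly on a match-level attribute.
fielded_first = balls[balls["toss_decision"] == "field"]

# Group deliveries by another match-level attribute.
runs_by_venue = balls.groupby("venue")["runs_total"].sum()
print(runs_by_venue)
```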
Data Engineering & Cleaning Highlights
Unlike raw dumps, this dataset has undergone rigorous programmatic cleaning to ensure a perfect 10.0 Usability Score:
- Missing Value Integrity: Deliveries without extras or wickets maintain structural correctness. Numerical counts default to 0 (preventing errors during mathematical operations), while missing categorical strings (like wicket types on non-wicket balls) are left safely as nulls.
- Geospatial Imputation: Historical gaps in raw data such as missing city records for international IPL fixtures played in the United Arab Emirates have been programmatically repaired based on stadium locations.
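The two cleaning rules above could look roughly like the following. This is a sketch under assumptions, not the pipeline actually used to build the dataset: the column names (`extras_wides`, `wicket_kind`, `city`, `venue`) and the lookup table `venue_to_city` are hypothetical.

```python
import pandas as pd

# Toy raw rows with the kinds of gaps described above (hypothetical columns).
raw = pd.DataFrame({
    "venue": ["Dubai International Cricket Stadium",
              "Wankhede Stadium",
              "Sheikh Zayed Stadium"],
    "city": [None, "Mumbai", None],
    "extras_wides": [None, 1, None],
    "wicket_kind": [None, None, "bowled"],
})

# Assumed lookup from stadium name to host city for UAE venues.
venue_to_city = {
    "Dubai International Cricket Stadium": "Dubai",
    "Sheikh Zayed Stadium": "Abu Dhabi",
    "Sharjah Cricket Stadium": "Sharjah",
}

clean = raw.copy()

# Numeric counts: a missing value means "none on this ball", so default to 0.
clean["extras_wides"] = clean["extras_wides"].fillna(0).astype(int)

# Categorical strings such as wicket_kind are deliberately left as nulls.

# City: fill only the missing entries, using the stadium lookup.
clean["city"] = clean["city"].fillna(clean["venue"].map(venue_to_city))
print(clean)
```

Leaving `wicket_kind` null keeps non-wicket balls distinguishable from wicket balls, so a check like `clean["wicket_kind"].notna()` acts as a wicket flag.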
Inspiration & Project Ideas
This dataset is a perfect playground for portfolios. Here are a few questions you can try to answer:
- Predictive Modeling: Can you predict the final score of an innings using only the performance metrics of the 6-over Powerplay?
- Player Valuation: Who are the most economical bowlers in the death overs (overs 16-20), and which batters strike at >150 against spin?
- Venue Analysis: How significantly does winning the toss and choosing to field first impact the match outcome at specific stadiums like the M. Chinnaswamy or Wankhede?
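As a starting point for the death-overs question, here is a small sketch on toy data. It assumes hypothetical columns `bowler`, `over` (zero-indexed, so overs 16-20 are stored as 15-19), `runs_total`, and `is_legal` (False for wides and no-balls). It is also simplified: a proper economy rate excludes byes and leg byes from the bowler's runs, which this version does not do.

```python
import pandas as pd

# Toy deliveries (hypothetical columns and values, not taken from the dataset).
balls = pd.DataFrame({
    "bowler": ["A"] * 6 + ["B"] * 7 + ["A"],
    "over": [15] * 6 + [19] * 7 + [3],
    "runs_total": [1, 0, 4, 0, 1, 0,
                   6, 1, 1, 0, 2, 4, 1,
                   6],
    "is_legal": [True] * 6
                + [True, False, True, True, True, True, True]
                + [True],
})

# Keep only death-overs deliveries (zero-indexed overs 15 to 19 inclusive).
death = balls[balls["over"].between(15, 19)]

# Runs conceded and legal balls bowled per bowler.
per_bowler = death.groupby("bowler").agg(
    runs=("runs_total", "sum"),
    legal_balls=("is_legal", "sum"),
)

# Economy rate = runs conceded per six legal balls.
per_bowler["economy"] = per_bowler["runs"] / (per_bowler["legal_balls"] / 6)
print(per_bowler.sort_values("economy"))
```

On real data it would be sensible to require a minimum number of legal balls before ranking bowlers, since small samples produce extreme rates.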
Acknowledgements
All raw files are generously provided by Cricsheet.org under open data licenses. If you use this dataset, please remember to credit their incredible work in preserving public sports analytics data.