Baselight

MLB Game Data

Pitch and other game data for 2016-present

@kaggle.josephvm_mlb_game_data

About this Dataset

Content

This dataset currently contains data from the start of the 2016 season through the end of the 2021 postseason. I plan to update it roughly weekly.

pitches.csv is one of the main files. It contains information about each pitch, as found on ESPN. I've noticed that some games are missing some at-bats; I believe the most recent of these is from 2019. games.csv and events.csv are two other files of note.

Some of the files contain information that can also be gleaned from other files. Over time I may try to cut down on this to reduce the number of files. I may also try to reduce the overall size of the dataset by changing several fields to use IDs instead of strings (just a heads-up).

Files

  • games.csv - general game info

  • hittersByGame.csv - how each hitter did in each game

  • pitchersByGame.csv - how each pitcher did in each game

  • plays.csv - batter events (batter singled, batter struck out, etc.)

  • events.csv - general events; each has an event ID (unique per game) to join with the next file, pitches.csv

  • pitches.csv - one row per pitch per game

  • inningScore.csv - score per inning

  • inningHighlights.csv - number of runs, hits, and errors per inning

  • hittingNotes.csv

  • pitchingNotes.csv

  • baserunningNotes.csv

  • fieldingNotes.csv

  • letterNotes.csv - for notes attached to batters (and maybe pitchers)

  • awards directory
    -- one file per award
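Because event IDs are only unique within a game, joining events.csv to pitches.csv needs both the game ID and the event ID as the key. The following is a minimal sketch using only the Python standard library. The column names `gameId` and `eventId` are assumptions, not confirmed names from the files, and the small in-memory CSVs are hypothetical toy rows used only to show the shape of the join.

```python
import csv
import io

# Hypothetical key column names; check the real headers in events.csv
# and pitches.csv and adjust these before use.
GAME_KEY = "gameId"
EVENT_KEY = "eventId"


def join_pitches_to_events(events_file, pitches_file):
    """Attach each pitch row to its event row via (game ID, event ID).

    Event IDs repeat across games, so the join key includes the game ID.
    """
    # Index every event by the composite (game, event) key.
    events = {}
    for row in csv.DictReader(events_file):
        events[(row[GAME_KEY], row[EVENT_KEY])] = row

    joined = []
    for pitch in csv.DictReader(pitches_file):
        event = events.get((pitch[GAME_KEY], pitch[EVENT_KEY]))
        if event is None:
            continue  # inner join: drop pitches with no matching event
        # On column-name clashes, the pitch row's value wins.
        joined.append({**event, **pitch})
    return joined


# Hypothetical toy data standing in for the real files.
events_csv = io.StringIO(
    "gameId,eventId,description\n1,1,Strikeout\n1,2,Single\n2,1,Walk\n"
)
pitches_csv = io.StringIO(
    "gameId,eventId,pitchType\n1,1,Fastball\n1,1,Slider\n2,1,Curveball\n"
)
rows = join_pitches_to_events(events_csv, pitches_csv)
```

With the real files, pass `open("events.csv", newline="")` and `open("pitches.csv", newline="")` instead of the `io.StringIO` objects.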


Inspiration

  • Which pitchers are best at getting out of a 3-2 count?
  • Which hitters are most likely to drive in the runner on third, when there is one?
  • What is the highest number of pitches each team has thrown in one inning over the last few years?
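The last question reduces to counting pitch rows per (game, pitching team, inning) and keeping each team's maximum. This is a minimal sketch, assuming pitches.csv has columns named `gameId`, `inning`, and `pitchingTeam`; those names are hypothetical, and if the pitching team is not stored on each pitch row it would have to be looked up from another file first. The toy rows are invented for illustration only.

```python
import csv
import io
from collections import Counter


def max_pitches_in_an_inning(pitches_file):
    """Return {team: (pitch count, gameId, inning)} for each team's busiest inning."""
    # One pitch row = one pitch, so counting rows counts pitches.
    # A team pitches in only one half of each inning, so
    # (game, team, inning) identifies a single half-inning.
    counts = Counter()
    for row in csv.DictReader(pitches_file):
        counts[(row["gameId"], row["pitchingTeam"], row["inning"])] += 1

    best = {}
    for (game, team, inning), n in counts.items():
        # Strict ">" keeps the first-seen inning when counts tie.
        if team not in best or n > best[team][0]:
            best[team] = (n, game, inning)
    return best


# Hypothetical toy data standing in for pitches.csv.
toy_pitches = io.StringIO(
    "gameId,inning,pitchingTeam\n"
    "1,1,NYY\n1,1,NYY\n1,1,NYY\n1,1,BOS\n1,2,NYY\n2,1,BOS\n2,1,BOS\n"
)
result = max_pitches_in_an_inning(toy_pitches)
```

Restricting to "the last few years" would only need a filter on the game date, for example by joining against games.csv.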
