Baselight

MLB Game Data

Pitch and other game data for 2016-present

@kaggle.josephvm_mlb_game_data

About this Dataset

MLB Game Data

Content

This dataset currently contains data from the start of the 2016 season through the end of the 2021 postseason. I plan to update it roughly weekly.

pitches.csv is one of the main files. It contains information about each pitch, as found on ESPN. I've noticed that some games are missing some at-bats, the most recent of which is from 2019, I believe. games.csv and events.csv are two other files of note.

Some of the files contain information that can be gleaned from other files. Over time I may try to cut down on this redundancy to reduce the number of files. I may also try to reduce the overall size of the dataset by changing several fields to use IDs instead of strings (just a heads-up).

Files

  • games.csv - general game info

  • hittersByGame.csv - how each hitter did in each game

  • pitchersByGame.csv - how each pitcher did in each game

  • plays.csv - batter events (batter singled, batter struck out, etc.)

  • events.csv - general events; each has an event ID (per game) that can be used to join with pitches.csv

  • pitches.csv - one row per pitch per game

  • inningScore.csv - score per inning

  • inningHighlights.csv - # of runs, hits, and errors per inning

  • hittingNotes.csv

  • pitchingNotes.csv

  • baserunningNotes.csv

  • fieldingNotes.csv

  • letterNotes.csv - for notes attached to batters (and maybe pitchers)

  • awards directory
    -- one file per award
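
The events and pitches tables both carry "game" and "event_id" columns, so a join on that pair is one way to attach pitches to the event they belong to. The following is a minimal sketch using Python's built-in sqlite3 module. The in-memory tables hold only a subset of the real columns, and the rows are invented for illustration; they are not taken from the dataset.

```python
import sqlite3


def pitches_per_event(conn):
    """Count pitches for each event by joining on (game, event_id)."""
    query = """
        SELECT e.game, e.event_id, e.events, COUNT(p.num) AS pitch_count
        FROM events AS e
        LEFT JOIN pitches AS p
          ON p.game = e.game AND p.event_id = e.event_id
        GROUP BY e.game, e.event_id, e.events
        ORDER BY e.game, e.event_id
    """
    return conn.execute(query).fetchall()


# Toy tables with made-up rows (assumption: only the join columns matter here).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (game INTEGER, event_id INTEGER, events TEXT)")
conn.execute("CREATE TABLE pitches (game INTEGER, event_id INTEGER, num INTEGER)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?)",
    [(1, 0, "example event A"), (1, 1, "example event B"), (2, 0, "example event C")],
)
conn.executemany(
    "INSERT INTO pitches VALUES (?, ?, ?)",
    [(1, 0, 1), (1, 0, 2), (1, 1, 1)],
)

rows = pitches_per_event(conn)
for row in rows:
    print(row)
```

Because event IDs are per game, joining on "event_id" alone would match events from different games. The LEFT JOIN keeps events that have no pitch rows, which may matter given that some games are missing at-bats.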

Inspiration

  • Which pitchers are best at getting out of a 3-2 count?
  • Which hitters are most likely to score a runner from third when there is one?
  • What is the highest number of pitches each team has thrown in one inning over the last few years?
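
As a rough sketch of how the last question might be approached, the snippet below counts rows in a pitches table per (game, pitching_team, inning) and then takes the maximum per team. It uses Python's built-in sqlite3 module with a toy table; the team codes and inning labels are invented placeholders, since the actual format of the "inning" column is not described here.

```python
import sqlite3


def max_pitches_in_an_inning(conn):
    """Return {pitching_team: most pitches thrown in a single inning}."""
    query = """
        SELECT pitching_team, MAX(pitch_count) AS max_pitches
        FROM (
            SELECT game, pitching_team, inning, COUNT(*) AS pitch_count
            FROM pitches
            GROUP BY game, pitching_team, inning
        )
        GROUP BY pitching_team
        ORDER BY pitching_team
    """
    return dict(conn.execute(query).fetchall())


# Toy data: one row per pitch, with made-up teams and inning labels.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pitches (game INTEGER, pitching_team TEXT, inning TEXT)")
conn.executemany(
    "INSERT INTO pitches VALUES (?, ?, ?)",
    [
        (1, "AAA", "inning 1"), (1, "AAA", "inning 1"), (1, "AAA", "inning 1"),
        (1, "AAA", "inning 2"),
        (2, "AAA", "inning 1"), (2, "AAA", "inning 1"),
        (1, "BBB", "inning 1"), (1, "BBB", "inning 1"),
    ],
)

result = max_pitches_in_an_inning(conn)
print(result)
```

Since some games are missing at-bats, counts obtained this way are best read as lower bounds for those games.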

Tables

Baserunningnotes

@kaggle.josephvm_mlb_game_data.baserunningnotes
  • 417.59 kB
  • 15,195 rows
  • 4 columns
CREATE TABLE baserunningnotes (
  "game" BIGINT,
  "team" VARCHAR,
  "stat" VARCHAR,
  "data" VARCHAR
);

Events

@kaggle.josephvm_mlb_game_data.events
  • 18.19 MB
  • 1,410,131 rows
  • 8 columns
CREATE TABLE events (
  "game" BIGINT,
  "pitching_team" VARCHAR,
  "batting_team" VARCHAR,
  "inning" VARCHAR,
  "event_id" BIGINT,
  "events" VARCHAR,
  "away" VARCHAR,
  "home" VARCHAR
);

Fieldingnotes

@kaggle.josephvm_mlb_game_data.fieldingnotes
  • 583.78 kB
  • 32,602 rows
  • 4 columns
CREATE TABLE fieldingnotes (
  "game" BIGINT,
  "team" VARCHAR,
  "stat" VARCHAR,
  "data" VARCHAR
);

Games

@kaggle.josephvm_mlb_game_data.games
  • 1.18 MB
  • 13,439 rows
  • 43 columns
CREATE TABLE games (
  "game" BIGINT,
  "away" VARCHAR,
  "away_record" VARCHAR,
  "awayaway_record" VARCHAR,
  "home" VARCHAR,
  "home_record" VARCHAR,
  "homehome_record" VARCHAR,
  "away_score" DOUBLE,
  "home_score" DOUBLE,
  "postseason_info" VARCHAR,
  "walks_issued_away" DOUBLE,  -- Walks Issued - Away
  "walks_issued_home" DOUBLE,  -- Walks Issued - Home
  "stolen_bases_away" DOUBLE,  -- Stolen Bases - Away
  "stolen_bases_home" DOUBLE,  -- Stolen Bases - Home
  "strikeouts_thrown_away" DOUBLE,  -- Strikeouts Thrown - Away
  "strikeouts_thrown_home" DOUBLE,  -- Strikeouts Thrown - Home
  "total_bases_away" DOUBLE,  -- Total Bases - Away
  "total_bases_home" DOUBLE,  -- Total Bases - Home
  "stadium" VARCHAR,
  "date" VARCHAR,
  "location" VARCHAR,
  "odds" VARCHAR,
  "o_u" VARCHAR,
  "attendance" DOUBLE,
  "capacity" DOUBLE,
  "duration" VARCHAR,
  "umpires" VARCHAR,
  "win_pitcher_stats" VARCHAR,  -- WIN - Pitcher - Stats
  "win_pitcher_id" DOUBLE,  -- WIN - Pitcher - Id
  "win_pitcher_name" VARCHAR,  -- WIN - Pitcher - Name
  "win_pitcher_abbrname" VARCHAR,  -- WIN - Pitcher - AbbrName
  "win_pitcher_record" VARCHAR,  -- WIN - Pitcher - Record
  "loss_pitcher_stats" VARCHAR,  -- LOSS - Pitcher - Stats
  "loss_pitcher_id" DOUBLE,  -- LOSS - Pitcher - Id
  "loss_pitcher_name" VARCHAR,  -- LOSS - Pitcher - Name
  "loss_pitcher_abbrname" VARCHAR,  -- LOSS - Pitcher - AbbrName
  "loss_pitcher_record" VARCHAR,  -- LOSS - Pitcher - Record
  "save_pitcher_stats" VARCHAR,  -- SAVE - Pitcher - Stats
  "save_pitcher_id" DOUBLE,  -- SAVE - Pitcher - Id
  "save_pitcher_name" VARCHAR,  -- SAVE - Pitcher - Name
  "save_pitcher_abbrname" VARCHAR,  -- SAVE - Pitcher - AbbrName
  "save_pitcher_record" VARCHAR,  -- SAVE - Pitcher - Record
  "extra_innings" VARCHAR
);

Hittersbygame

@kaggle.josephvm_mlb_game_data.hittersbygame
  • 4.75 MB
  • 361,864 rows
  • 16 columns
CREATE TABLE hittersbygame (
  "hitters" VARCHAR,
  "h_ab" VARCHAR,
  "ab" VARCHAR,
  "r" VARCHAR,
  "h" VARCHAR,
  "rbi" VARCHAR,
  "bb" VARCHAR,
  "k" VARCHAR,
  "n__p" VARCHAR,  -- #P
  "avg" VARCHAR,
  "obp" VARCHAR,
  "slg" VARCHAR,
  "game" BIGINT,
  "team" VARCHAR,
  "position" VARCHAR,
  "hitter_id" VARCHAR
);

Hittingnotes

@kaggle.josephvm_mlb_game_data.hittingnotes
  • 3.41 MB
  • 163,667 rows
  • 4 columns
CREATE TABLE hittingnotes (
  "game" BIGINT,
  "team" VARCHAR,
  "stat" VARCHAR,
  "data" VARCHAR
);

Inninghighlights

@kaggle.josephvm_mlb_game_data.inninghighlights
  • 364.17 kB
  • 238,483 rows
  • 5 columns
CREATE TABLE inninghighlights (
  "inning" VARCHAR,
  "runs" BIGINT,
  "hits" BIGINT,
  "errors" BIGINT,
  "game" BIGINT
);

Inningscore

@kaggle.josephvm_mlb_game_data.inningscore
  • 275.69 kB
  • 26,868 rows
  • 24 columns
CREATE TABLE inningscore (
  "team" VARCHAR,
  "n_1" BIGINT,  -- 1
  "n_2" BIGINT,  -- 2
  "n_3" BIGINT,  -- 3
  "n_4" BIGINT,  -- 4
  "n_5" BIGINT,  -- 5
  "n_6" VARCHAR,  -- 6
  "n_7" VARCHAR,  -- 7
  "n_8" VARCHAR,  -- 8
  "n_9" VARCHAR,  -- 9
  "r" BIGINT,
  "h" BIGINT,
  "e" BIGINT,
  "game" BIGINT,
  "n_10" DOUBLE,  -- 10
  "n_11" DOUBLE,  -- 11
  "n_12" DOUBLE,  -- 12
  "n_13" DOUBLE,  -- 13
  "n_14" DOUBLE,  -- 14
  "n_15" DOUBLE,  -- 15
  "n_16" DOUBLE,  -- 16
  "n_17" DOUBLE,  -- 17
  "n_18" DOUBLE,  -- 18
  "n_19" DOUBLE  -- 19
  "n_19" DOUBLE  -- 19
);
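
The inning columns in this table have mixed types: "n_1" through "n_5" are BIGINT, "n_6" through "n_9" are VARCHAR, and "n_10" through "n_19" are DOUBLE. The VARCHAR type suggests that some values in innings 6 to 9 are not plain numbers, and the extra-inning columns are presumably empty for most games; neither point is documented here, so both are assumptions. Under those assumptions, a tolerant per-row conversion might look like the sketch below, where the "x" placeholder and the sample row are hypothetical.

```python
def runs_by_inning(row):
    """Return {inning_number: runs} for innings that hold a numeric value.

    Values that cannot be read as a number (None, NaN, or a text
    placeholder) are skipped rather than treated as zero.
    """
    result = {}
    for inning in range(1, 20):  # columns n_1 .. n_19
        value = row.get(f"n_{inning}")
        try:
            result[inning] = int(float(value))
        except (TypeError, ValueError, OverflowError):
            continue
    return result


# Hypothetical row: "x" stands in for whatever non-numeric marker may occur.
row = {
    "team": "AAA", "game": 1, "r": 6,
    "n_1": 0, "n_2": 2, "n_3": 0, "n_4": 0, "n_5": 1,
    "n_6": "0", "n_7": "3", "n_8": "0", "n_9": "x",
    "n_10": None,
}

by_inning = runs_by_inning(row)
print(by_inning)
print(sum(by_inning.values()) == row["r"])
```

Comparing the summed innings against the "r" column, as in the last line, is a cheap consistency check before relying on the per-inning values.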

Letternotes

@kaggle.josephvm_mlb_game_data.letternotes
  • 559.03 kB
  • 30,454 rows
  • 4 columns
CREATE TABLE letternotes (
  "game" BIGINT,
  "player_id" BIGINT,
  "player_note_id" BIGINT,
  "note" VARCHAR
);

Pitchersbygame

@kaggle.josephvm_mlb_game_data.pitchersbygame
  • 1.94 MB
  • 143,687 rows
  • 15 columns
CREATE TABLE pitchersbygame (
  "pitchers" VARCHAR,
  "ip" DOUBLE,
  "h" BIGINT,
  "r" BIGINT,
  "er" BIGINT,
  "bb" BIGINT,
  "k" BIGINT,
  "hr" BIGINT,
  "pc_st" VARCHAR,
  "era" VARCHAR,
  "pc" VARCHAR,
  "game" BIGINT,
  "team" VARCHAR,
  "extra" VARCHAR,
  "pitcher_id" VARCHAR
);

Pitches

@kaggle.josephvm_mlb_game_data.pitches
  • 29.69 MB
  • 3,953,197 rows
  • 13 columns
CREATE TABLE pitches (
  "num" BIGINT,
  "pitch" VARCHAR,
  "type" VARCHAR,
  "mph" VARCHAR,
  "play_hitzone" VARCHAR,
  "play_bases" DOUBLE,
  "play_field" VARCHAR,
  "pitcher" VARCHAR,
  "pitching_team" VARCHAR,
  "batting_team" VARCHAR,
  "inning" VARCHAR,
  "event_id" BIGINT,
  "game" BIGINT
);

Pitchingnotes

@kaggle.josephvm_mlb_game_data.pitchingnotes
  • 3.3 MB
  • 126,421 rows
  • 4 columns
CREATE TABLE pitchingnotes (
  "game" BIGINT,
  "team" VARCHAR,
  "stat" VARCHAR,
  "data" VARCHAR
);

Plays

@kaggle.josephvm_mlb_game_data.plays
  • 16.48 MB
  • 1,061,113 rows
  • 5 columns
CREATE TABLE plays (
  "game" BIGINT,
  "team" VARCHAR,
  "batter_id" VARCHAR,
  "batter" VARCHAR,
  "event" VARCHAR
);
