The Data
There are two CSV files with match information, matches.csv and events.csv, covering every match from the 2001-2002 season up to roughly the current point of the 2021-2022 season. matches.csv contains information such as the teams playing, the final score, the date, and the lineups. events.csv lists the events that happened in each game, when they happened, and which game they belong to.
There is one CSV file with table information, all_tables.csv, which contains the league tables from the 2001-2002 season up to roughly the current point of the 2021-2022 season.
There are three CSV files with aggregated stats in the agg_stats folder. They cover 2002 through the present.
I plan on updating this dataset with data approximately weekly while a season is ongoing.
Note: The Year column in matches.csv contains the year that the season started in, not the year that the match took place.
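Because Year holds the season's start year, it can help to derive an explicit season label before grouping or plotting. The sketch below is a minimal illustration using only the standard library. It assumes Year is stored as an integer-like string, and the added "Season" field and helper names are hypothetical, not part of the dataset.

```python
import csv
from typing import Dict, Iterable, Iterator


def season_label(year: int) -> str:
    """Map a season start year (the Year column) to a label like '2001-2002'."""
    return f"{year}-{year + 1}"


def add_season_labels(rows: Iterable[Dict[str, str]]) -> Iterator[Dict[str, str]]:
    """Yield each row with a hypothetical 'Season' field derived from Year."""
    for row in rows:
        row["Season"] = season_label(int(row["Year"]))
        yield row


if __name__ == "__main__":
    # Assumes matches.csv sits in the working directory.
    with open("matches.csv", newline="") as f:
        for row in add_season_labels(csv.DictReader(f)):
            print(row["Year"], row["Season"])
            break
```

A match played in, say, March 2002 has Year 2001 and therefore gets the label 2001-2002.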
Note: 107 of 380 matches in the 2001-2002 season have no commentary.
Data Source
Match data was scraped by me from: https://www.espn.com/soccer/fixtures/_/date/20210413/league/eng.1
Tables were scraped by me from: https://www.espn.com/soccer/standings/_/league/ENG.1/season/2020
The code used to scrape the data is located here.
Inspiration
- How having crowds, or the long break, affected teams
- Which teams win or lose by the most goals
- Whether a team is more likely to get yellow cards against certain other teams
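As a starting point for the goal-margin question, here is a minimal sketch that finds each team's largest winning and losing margins from a list of match records. The field names home_team, away_team, home_goals and away_goals are hypothetical placeholders. The actual column names in matches.csv may differ and would need to be substituted.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def margin_extremes(matches: Iterable[Dict[str, str]]) -> Dict[str, Tuple[int, int]]:
    """Return {team: (largest_win_margin, largest_loss_margin)}.

    Each match is a dict with hypothetical keys home_team, away_team,
    home_goals and away_goals; rename them to the real matches.csv columns.
    """
    best = defaultdict(lambda: [0, 0])  # [largest win margin, largest loss margin]
    for m in matches:
        diff = int(m["home_goals"]) - int(m["away_goals"])
        # Score the match from each side's perspective.
        for team, margin in ((m["home_team"], diff), (m["away_team"], -diff)):
            entry = best[team]  # touch every team so draw-only teams still appear
            if margin > 0:
                entry[0] = max(entry[0], margin)
            elif margin < 0:
                entry[1] = max(entry[1], -margin)
    return {team: (v[0], v[1]) for team, v in best.items()}
```

Sorting the result by the first or second element of each tuple then ranks teams by their biggest wins or heaviest defeats.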