
European Soccer Dataset

Season 2017/18, 2018/19 and 2019/20

@kaggle.alessiasimone_european_soccer_dataset_season_20172020

About this Dataset


The soccer market is one of the most competitive markets, and coaches often face critical decisions about the winning game strategy for their players and teams.
The dataset contains the real-life performances of teams and players in the five leading European leagues over three seasons. Can you answer the following questions by analyzing the dataset?

• What are the top European leagues across the seasons, and why?
• Which teams contributed the most to these leagues?
• Who are the best goalscorers and goalkeepers of those teams?
• Is it possible to define a winning game strategy?
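As a starting point for the goalscorer question, here is a minimal sketch in plain Python. It assumes the rows have already been loaded as dictionaries keyed by the column names "Squad", "Player" and "Goals"; the sample rows and the helper name `top_scorer_per_squad` are hypothetical placeholders, not real data or part of the dataset.

```python
from collections import defaultdict

# Hypothetical sample rows: team names, player names and goal counts
# are placeholders, not values taken from the dataset.
rows = [
    {"Squad": "Team A", "Player": "Player 1", "Goals": 12},
    {"Squad": "Team A", "Player": "Player 2", "Goals": 7},
    {"Squad": "Team B", "Player": "Player 3", "Goals": 9},
    {"Squad": "Team A", "Player": "Player 1", "Goals": 10},  # same player, another season
]


def top_scorer_per_squad(rows):
    """Sum goals per (squad, player) and return the top scorer of each squad."""
    totals = defaultdict(int)
    for row in rows:
        totals[(row["Squad"], row["Player"])] += row["Goals"]

    best = {}
    for (squad, player), goals in totals.items():
        # Keep the player with the highest total seen so far for this squad.
        if squad not in best or goals > best[squad][1]:
            best[squad] = (player, goals)
    return best


result = top_scorer_per_squad(rows)
```

The same grouping idea can be applied to "Saves" for goalkeepers, or to "Pts" per "League" and "Season" for the league-level questions.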

This dataset was produced in a staging area where an ETL process was applied with Tableau Prep. You can read and download my entire report here or play with my interactive Tableau dashboard here.

From the three original datasets, the columns ending in 'm', which refer to 90 minutes only, were deselected (to save space, as they are easy to compute in a preprocessing step). The datasets were then unified and the final dataset was cleaned: duplicate columns introduced by the unification step were removed; typos in players' and teams' names were corrected; "position" and "nationality" abbreviations were expanded for better understanding; a rollup operation was applied to the "position" column; the numerical binary values for the Champions League were converted to the strings "Yes" or "No"; and "season" was converted to a datetime.
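Two of these cleaning steps can be sketched in plain Python. This is only an illustration, not the Tableau Prep flow that was actually used: the abbreviation table `POSITION_NAMES` and both function names are hypothetical, and the real mapping applied to the data may differ.

```python
# Hypothetical abbreviation table, assumed for illustration only;
# the mapping actually applied in the ETL step is not given here.
POSITION_NAMES = {
    "GK": "Goalkeeper",
    "DF": "Defender",
    "MF": "Midfielder",
    "FW": "Forward",
}


def champions_league_flag(value):
    """Convert a numeric binary flag (1 or 0, or "1"/"0") to "Yes" or "No"."""
    return "Yes" if int(value) == 1 else "No"


def expand_position(abbreviation):
    """Expand a position abbreviation; unknown values are returned unchanged."""
    key = abbreviation.strip().upper()
    return POSITION_NAMES.get(key, abbreviation)


flags = [champions_league_flag(v) for v in (1, 0, "1")]
positions = [expand_position(p) for p in ("GK", "fw", "Winger")]
```

Returning unknown abbreviations unchanged keeps the step safe to rerun on data that has already been expanded.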
The final dataset contains 6824 observations of 46 features:

  • Squad, Team's name, String
  • Season, Season, Date
  • Pts, Number of standing points, Integer
  • GF, Number of goals scored, Integer
  • GA, Number of goals conceded, Integer
  • Attendance, Number of attendees, Integer
  • CL, Team's presence at Champions League matches, Binary
  • WinCL, Wins of teams that played at Champions League matches, Binary
  • CLBestScorer, Whether the player was the best scorer in the Champions League, Binary
  • MP, Number of matches played by the team, Integer
  • W, Number of matches won by the team, Integer
  • D, Number of matches drawn by the team, Integer
  • L, Number of matches lost by the team, Integer
  • Player, Player's first and last name, String
  • Age, Age of each player, Integer
  • Height, Height of each player (cm), Integer
  • Nationality, Nation where the player was born, String
  • Value, Value of the player in the soccer market (€), Float
  • Position, Player's position on the pitch, String
  • League, League's name, String
  • Lg Rk, League's ranking points, Integer
  • Games, Number of matches played, Integer
  • Games starts, Number of matches in which the player was in the starting lineup, Integer
  • Minutes, Number of minutes played, Integer
  • Balls recovery, Number of balls recovered, Integer
  • Yellow cards, Number of yellow cards, Integer
  • Passes completed, Number of successful passes, Integer
  • Fouls, Number of fouls committed, Integer
  • Fouled, Number of fouls suffered, Integer
  • Offsides, Number of offsides, Integer
  • Crosses, Number of crosses, Integer
  • Throw balls, Number of throw-balls, Integer
  • Shots, Number of shots, Integer
  • Goals, Number of goals, Integer
  • Assists, Number of assists, Integer
  • Penalties, Number of penalties made, Integer
  • Touches, Number of touches, Integer
  • Dribbles, Number of dribbles, Integer
  • Sca, Number of shot-creating actions, Integer
  • Gca, Number of goal-creating actions, Integer
  • Tackle, Number of tackles, Integer
  • Block, Number of blocks, Integer
  • Pressure, Number of pressures, Integer
  • Shots on target against, Number of shots on target conceded, Integer
  • Saves, Number of saves, Integer
  • Goals against, Number of goals conceded, Integer
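The 90-minute columns that were dropped can be rebuilt from the totals above. The sketch below assumes those columns were per-90-minute rates, computed as total × 90 / minutes; the helper name `per_90` and the sample record are hypothetical.

```python
def per_90(total, minutes):
    """Return a per-90-minute rate; players with no minutes get 0.0."""
    if minutes <= 0:
        return 0.0
    return total * 90 / minutes


# Hypothetical player record using the column names listed above.
player = {"Minutes": 1800, "Goals": 10, "Assists": 5}

goals_per_90 = per_90(player["Goals"], player["Minutes"])
assists_per_90 = per_90(player["Assists"], player["Minutes"])
```

Guarding against zero minutes avoids a division error for squad members who never played.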

Please refer to the source for more details about the features.
Enjoy!
