Baselight

NBA Player Data (1996-2024)

NBA Dataset containing height in cm, weight in kg, stats and bio data

@kaggle.damirdizdarevic_nba_dataset_eda_and_ml_compatible

About this Dataset

This NBA dataset covers 1996 to 2024 and contains players' physical attributes, bio information, basic and advanced stats, and positions.

There are no missing values, but some preprocessing will be needed depending on the task.

Data was gathered from nba.com and Basketball Reference, starting with the 1996/97 season and running up to the latest season, 2023/24.

The dataset offers many options for EDA and ML: analyzing how physical attributes change by position, how the number of 3-point shots has changed over the years, and how the number of foreign players has increased; or using machine learning to predict a player's points, rebounds, and assists, predict a player's position, cluster players, and so on.

The issue with the source data was that player height and weight were recorded in the Imperial system, so the scatterplot of heights and weights did not look good: there were only around 20 distinct values for height and around 150 for weight, which is quite poor for a dataset of about 13,000 player records.

I created a script that assigns each player a random height between two adjacent Imperial heights (say, between 200.66 cm and 203.2 cm, which are 6-7 and 6-8 in the Imperial system). I did it so that 80% of values fall 5% to 35% of the way up from the lower height, which still keeps the integrity of the data (the average height of the whole dataset increased by less than 1 cm).

I did the same for weight: since consecutive pound values are roughly 0.45 kg apart, I assign each player a random weight within about ±0.22 kg of his original weight. Here the average weight of the whole dataset changed by only around 0.09 kg, which is insignificant.
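The original script is not part of this page, so the following is only a minimal sketch of the jitter described above, under stated assumptions. It reads the "5 to 35%" band as a fraction of the 2.54 cm gap to the next inch, and it assumes the remaining 20% of players are spread uniformly over the rest of that gap; the function names, the seed, and the sample values are hypothetical.

```python
import random

CM_PER_INCH = 2.54
WEIGHT_JITTER_KG = 0.22  # half-width of the weight jitter quoted in the text


def jitter_height(height_cm, rng):
    """Move a whole-inch height somewhere into the gap before the next inch."""
    if rng.random() < 0.8:
        # 80% of players land 5% to 35% of the way to the next inch.
        fraction = rng.uniform(0.05, 0.35)
    else:
        # Assumption: the other 20% fall uniformly outside that band,
        # i.e. in [0, 0.05) or (0.35, 1.0], weighted by interval length.
        u = rng.uniform(0.0, 0.70)
        fraction = u if u < 0.05 else u + 0.30
    return height_cm + fraction * CM_PER_INCH


def jitter_weight(weight_kg, rng):
    """Shift a weight by a uniform amount within +/- 0.22 kg."""
    return weight_kg + rng.uniform(-WEIGHT_JITTER_KG, WEIGHT_JITTER_KG)


# Hypothetical sample: 10,000 players listed at 6-7 (200.66 cm) and 100 kg.
rng = random.Random(42)
heights = [jitter_height(200.66, rng) for _ in range(10_000)]
weights = [jitter_weight(100.0, rng) for _ in range(10_000)]

mean_height_shift = sum(heights) / len(heights) - 200.66
mean_weight_shift = sum(weights) / len(weights) - 100.0
print(f"mean height shift: {mean_height_shift:.2f} cm")
print(f"mean weight shift: {mean_weight_shift:+.3f} kg")
```

Under these assumptions every jittered height stays between the two adjacent Imperial values and the mean height shift stays below 1 cm, in line with the figure quoted above. The weight jitter is symmetric, so its mean shift should sit close to zero.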

Unfortunately, the NBA doesn't provide the data in cm and kg. Although this approach is not perfectly accurate, it is still much better than having only about 20 distinct heights in a dataset of about 13,000 player records.

Tables

Final Dataset Master

@kaggle.damirdizdarevic_nba_dataset_eda_and_ml_compatible.final_dataset_master
  • 571.87 KB
  • 13391 rows
  • 36 columns

CREATE TABLE final_dataset_master (
  "normalized_name" VARCHAR,
  "age" BIGINT,
  "player_height" DOUBLE,
  "player_weight" DOUBLE,
  "college" VARCHAR,
  "country" VARCHAR,
  "draft_year" VARCHAR,
  "draft_round" VARCHAR,
  "draft_number" VARCHAR,
  "pts" DOUBLE,
  "reb" DOUBLE,
  "ast" DOUBLE,
  "season" VARCHAR,
  "pos_x" VARCHAR,
  "mp_x" DOUBLE,
  "g_x" BIGINT,
  "efg" DOUBLE,
  "x3p" DOUBLE,
  "x3pa" DOUBLE,
  "x3p_0f195a" DOUBLE,
  "x3par" DOUBLE,
  "x2p" DOUBLE,
  "x2pa" DOUBLE,
  "x2p_35b594" DOUBLE,
  "ft" DOUBLE,
  "fta" DOUBLE,
  "ft_ba25d5" DOUBLE,
  "per" DOUBLE,
  "ts" DOUBLE,
  "trb" DOUBLE,
  "ast_d11618" DOUBLE,
  "tov" DOUBLE,
  "usg" DOUBLE,
  "ws" DOUBLE,
  "vorp" DOUBLE,
  "bpm" DOUBLE
);
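
As one illustration of the EDA ideas mentioned earlier, the sketch below averages a numeric column per season using only the column names from the schema above. It assumes that "x3pa" holds 3-point attempts and that rows are available as dictionaries keyed by column name; the sample rows and the season labels are made up for illustration and are not taken from the dataset.

```python
from collections import defaultdict
from statistics import mean


def mean_by_season(rows, column):
    """Average a numeric column per season; rows are dicts keyed by column name."""
    grouped = defaultdict(list)
    for row in rows:
        # float() also accepts the string values a CSV reader would return.
        grouped[row["season"]].append(float(row[column]))
    return {season: mean(values) for season, values in sorted(grouped.items())}


# Made-up rows for illustration only; real rows would come from final_dataset_master.
sample_rows = [
    {"season": "1996-97", "x3pa": 1.0},
    {"season": "1996-97", "x3pa": 3.0},
    {"season": "2023-24", "x3pa": 4.0},
    {"season": "2023-24", "x3pa": 6.0},
]

trend = mean_by_season(sample_rows, "x3pa")
print(trend)
```

The same function could be pointed at rows read with `csv.DictReader` from an exported copy of the table, or at another column such as "player_height" to look at physical attributes over time.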
