
NFL LEAGUE DATA

NFL League Stats by Team (2000-2021)

@kaggle.chancev_nfl_league_data

About this Dataset


This dataset was scraped from the league standings tables at https://www.nfl.com/standings/league/2021/REG (and the equivalent page for each year). It contains one file of LEAGUE data by team for each individual year, plus a master file that compiles all years into a single CSV for analysis and comparison across years. Column descriptions are at the bottom of this section. If you are interested in the code used to scrape the data, you can view the full project details at https://github.com/cvandergeugten/NFL-LEAGUE-DATA/blob/main/nfl_league_data_scraper.py

Challenge:

This dataset replicates the table found on the NFL's website exactly. Some columns can be cleaned up, renamed, or altered to make them usable for analysis, and others can be used to create new features. For those who want practice tidying up a dataset and using it for predictive modeling or exploratory analysis, here is a list of objectives you can try to accomplish with this data:

1. Rename the percentage columns (PCT, Pct, Pct.1) so that each name reflects which record the percentage is calculated for.

2. Ideas for feature engineering (creating new features):

  • Extract information from the 'record' columns (Home, Road, Division). These columns are not formatted to be used directly for analysis, so you can create new columns that hold each statistic individually. For example, you can create a new column called "Home Wins" and then write some code to extract the number of wins from the 'Home' column. Repeat with 'Home Losses' and 'Home Ties'. If you do this for each record column, you will have transformed all that information into usable data for modeling and analysis.

  • Create a feature called 'Undefeated', which will be a binary categorical variable. Set it to 1 if the team never lost a game in that particular record column, and to 0 if the team had any losses within that record. Repeat for each of the record columns (you might want to specify the record in the variable name, like this: 'Undefeated Home').

  • Create new columns for the lengths of winning and losing streaks. You can name the two columns 'Win Streak #' and 'Lose Streak #' and then write some code that will extract that information from the 'Strk' column. If a team was on a winning streak, then the value for their 'Lose Streak #' should be 0, and vice versa.

  • Create new columns that indicate which division a team is in!

  • Have some fun and engineer some of your own features!!
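The record, 'Undefeated', and streak ideas above can be sketched in plain Python. This is only an illustration under stated assumptions: the description does not show the exact cell formats, so the code assumes record cells look like "7-1-0" or "7 - 1 - 0" (wins-losses-ties) and streak cells look like "3W" or "W3". The helper names (parse_record, is_undefeated, parse_streak, add_features) are hypothetical and do not come from the dataset or its scraper.

```python
import re

# Assumed cell formats (not confirmed by the dataset description):
#   record columns (Home, Road, Div) look like "7-1-0" or "7 - 1 - 0"
#   Strk looks like "3W" / "2L" or "W3" / "L2"


def parse_record(text):
    """Split a wins-losses-ties record string into three integers."""
    parts = [int(p) for p in re.findall(r"\d+", text)]
    if len(parts) == 2:  # tolerate a record written without a ties field
        parts.append(0)
    if len(parts) != 3:
        raise ValueError(f"unrecognized record: {text!r}")
    wins, losses, ties = parts
    return {"Wins": wins, "Losses": losses, "Ties": ties}


def is_undefeated(text):
    """Return 1 if the record contains zero losses, otherwise 0."""
    return 1 if parse_record(text)["Losses"] == 0 else 0


def parse_streak(text):
    """Return (win_streak, lose_streak); the streak not in progress is 0.

    Anything that is not a W or L streak (for example a tie) yields (0, 0).
    """
    letter = re.search(r"[WLwl]", text)
    number = re.search(r"\d+", text)
    if not letter or not number:
        return (0, 0)
    length = int(number.group())
    return (length, 0) if letter.group().upper() == "W" else (0, length)


def add_features(row, record_columns=("Home", "Road", "Div")):
    """Return a copy of one table row (a dict) with the new feature columns."""
    out = dict(row)
    for col in record_columns:
        for key, value in parse_record(row[col]).items():
            out[f"{col} {key}"] = value  # e.g. "Home Wins"
        out[f"Undefeated {col}"] = is_undefeated(row[col])
    out["Win Streak #"], out["Lose Streak #"] = parse_streak(row["Strk"])
    return out
```

Each row read with csv.DictReader can be passed through add_features. If the real cells use a different layout, only the two regular expressions should need adjusting.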

3. Use the data to answer these questions:

  • Across the seasons covered (2000-2021), which teams have been the best and worst performers?
  • Which teams perform better at home and which teams perform better on the road?
  • Which teams tie the most?
  • Pick your favorite team! What were the best years for this team in terms of performance? Did they ever go undefeated?
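One possible starting point for the best/worst and "most ties" questions is to total each team's W, L, and T across all years and rank by win percentage. The sketch below assumes the master file has been read into a list of dicts keyed by the column names listed under Column Info (for example with csv.DictReader). The function name summarize_teams is hypothetical, and the win percentage is recomputed using the NFL convention of counting a tie as half a win.

```python
from collections import defaultdict


def summarize_teams(rows):
    """Total W/L/T per team over all rows and rank by win percentage."""
    totals = defaultdict(lambda: {"W": 0, "L": 0, "T": 0})
    for row in rows:
        team = totals[row["NFL Team"]]
        for key in ("W", "L", "T"):
            team[key] += int(row[key])  # CSV readers return strings

    summary = []
    for name, t in totals.items():
        games = t["W"] + t["L"] + t["T"]
        # A tie counts as half a win and half a loss.
        pct = (t["W"] + 0.5 * t["T"]) / games if games else 0.0
        summary.append({"NFL Team": name, **t, "Win Pct": pct})

    # Best-performing team first, worst last.
    return sorted(summary, key=lambda r: r["Win Pct"], reverse=True)
```

The first and last entries of the returned list are the best and worst performers over the period, and max(summary, key=lambda r: r["T"]) picks out the team with the most ties.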

Column Info:

  • NFL Team: Team name (includes the name of the home city)
  • W: Total number of wins
  • L: Total number of losses
  • T: Number of ties
  • PCT: Win percentage (for overall record)
  • PF: Total points scored by the team
  • PA: Total points scored against the team
  • Net Pts: Net points (PF minus PA)
  • Home: Home record
  • Road: Road record
  • Div: Division record
  • Pct: Win percentage (for division record)
  • Conf: Conference record
  • Pct.1: Win percentage (for conference record)
  • Non-conf: Non-Conference record
  • Strk: Current winning or losing streak
  • Last 5: Record from last 5 games played
  • Year: Year of the stats
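For the first objective (renaming the percentage columns), the three look-alike names PCT, Pct, and Pct.1 can be mapped to clearer ones. The following is a minimal sketch that works on a single row represented as a dict. The replacement names are illustrative choices, not names used by the dataset, and rename_columns is a hypothetical helper.

```python
# Old names come from the column descriptions above; the new names are
# illustrative only and can be changed freely.
PCT_RENAMES = {
    "PCT": "Win PCT",     # overall win percentage
    "Pct": "Div PCT",     # win percentage for the division record
    "Pct.1": "Conf PCT",  # win percentage for the conference record
}


def rename_columns(row, renames=PCT_RENAMES):
    """Return a copy of `row` with the ambiguous percentage keys renamed."""
    return {renames.get(key, key): value for key, value in row.items()}
```

Keys not listed in the mapping pass through unchanged, so the same function can be applied to every row of the table.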
