GTA V Vehicle Dataset

Datapoints on vehicles currently available in GTA V

@kaggle.lucasokwudishu_gta_v_vehicle_dataset

About this Dataset

Context

This dataset contains information about vehicles currently available in GTA V. When the files are merged, you should have about 567 rows with complete data. There are 807 vehicles in GTA V in total, but the webscraping script failed on some of the vehicle URLs.

Acknowledgments

This dataset was retrieved via webscraping from gtabase.com. The data is publicly available to everyone.

Content

You'll notice there are 6 CSV files. Here are the contents. When fully merged, there should be 567 rows and 36 columns. The files are separate due to the nature of the webscraping script. I'm still new to webscraping so I scraped the data in batches. Additionally, I forgot to scrape the upgrade_cost in the original script so I had to do that piece separately as well.

Vehicle Links:

  • vehicle_links: url for each vehicle in GTA V. May contain duplicates.

GTA Data Batch (1, 2 & 3):

  • v1: Row ID. This can be dropped.
  • title: Name of the vehicle.
  • vehicle_class: Vehicle category (Planes, Utility, SUVs, Sports, Super, etc).
  • manufacturer: Vehicle manufacturer.
  • features: Vehicle features.
  • acquisition: Method of obtaining the vehicle in game.
  • price: Vehicle price.
  • storage_location: Where the vehicle can be stored in game.
  • delivery_method: How the vehicle is delivered in game.
  • modifications: Where the vehicle can be modified in game.
  • resale_flag: If the vehicle can be resold in game.
  • resale_price: Resale price of vehicle. Contains 2 values, resale price and resale price when fully upgraded.
  • race_availability: Whether the vehicle can be used in races.
  • top_speed_in_game: Vehicle top speed in game. Contains values for MPH and KMH.
  • based_on: The real life vehicle that this vehicle is based on.
  • seats: Number of seats in the vehicle.
  • weight_in_kg: Vehicle weight (KG).
  • drive_train: Vehicle drivetrain.
  • gears: Number of gears in the vehicle.
  • release_date: Vehicle release date in game.
  • release_dlc: Name of the DLC the vehicle was released in.
  • top_speed_real: I believe this is the top speed of the real life vehicle, not the GTA V version.
  • lap_time: Vehicle lap time in minutes and seconds, in game.
  • bulletproof: If the vehicle is bulletproof or not.
  • weapon1_resistance: Resistance to HOMING LAUNCHER / OPPRESSOR MISSILES / JET MISSILES.
  • weapon2_resistance: Resistance to RPG / GRENADES / STICKY BOMB / MOC CANNON.
  • weapon3_resistance: Resistance to EXPLOSIVE ROUNDS (HEAVY SNIPER MK II).
  • weapon4_resistance: Resistance to TANK CANNON (RHINO / APC).
  • weapon5_resistance: Resistance to ANTI-AIRCRAFT TRAILER DUAL 20MM FLAK.
  • speed: Vehicle speed score.
  • acceleration: Vehicle acceleration score.
  • braking: Vehicle braking score.
  • handling: Vehicle handling score.
  • overall: Vehicle overall score.
  • vehicle_url: Vehicle url.

GTA Data Upgrade Cost (1 & 2):

  • ...1: Row ID. This can be dropped.
  • upgrade_cost: Upgrade Cost for vehicle.
  • vehicle_url: Vehicle url.

How To Merge Datasets

The common key in all datasets is vehicle_url. This may also be called vehicle_link.

  1. Merge the gta_data_batch csvs by rows.
  2. Merge the gta_data_upgrade_cost csvs by rows.
  3. Left join the gta_data_upgrade_cost to the gta_data_batch using the vehicle_url as the common key.
  4. Any vehicle URL in the vehicle_links csv that has no data in the gta_data_batch or gta_data_upgrade_cost files is one of the URLs that failed in the script.
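
The merge steps above can be sketched in a few lines of Python. This is only an illustration using the standard library, not the script used to build the dataset. It assumes each CSV has already been loaded as a list of dicts (for example with csv.DictReader). The helper names (stack_rows, left_join, failed_urls), the example.com URLs, and the sample rows are all made up for the demo. If you use pandas instead, the same steps roughly map to pd.concat followed by DataFrame.merge with how="left".

```python
ROW_ID_COLUMNS = ("v1", "...1")  # row-ID columns the description says can be dropped


def stack_rows(*tables):
    """Steps 1 and 2: append tables row-wise, dropping row IDs and exact duplicates."""
    seen, stacked = set(), []
    for table in tables:
        for row in table:
            row = {k: v for k, v in row.items() if k not in ROW_ID_COLUMNS}
            fingerprint = tuple(sorted(row.items()))
            if fingerprint not in seen:
                seen.add(fingerprint)
                stacked.append(row)
    return stacked


def left_join(left, right, key="vehicle_url"):
    """Step 3: keep every left row; add right-hand columns where the key matches."""
    lookup = {}
    for row in right:
        lookup.setdefault(row[key], row)  # first match wins if a key repeats
    extra_cols = [c for c in (right[0] if right else {}) if c != key]
    joined = []
    for row in left:
        match = lookup.get(row[key], {})
        joined.append({**row, **{c: match.get(c) for c in extra_cols}})
    return joined


def failed_urls(vehicle_links, merged, key="vehicle_url"):
    """Step 4: links that never produced a scraped row."""
    scraped = {row[key] for row in merged}
    return sorted(set(vehicle_links) - scraped)


# Made-up sample rows standing in for the real CSV contents.
batch_1 = [{"v1": "1", "title": "Vehicle A", "vehicle_url": "https://example.com/a"}]
batch_2 = [
    {"v1": "1", "title": "Vehicle B", "vehicle_url": "https://example.com/b"},
    # Same vehicle as in batch_1; becomes an exact duplicate once v1 is dropped.
    {"v1": "2", "title": "Vehicle A", "vehicle_url": "https://example.com/a"},
]
cost_1 = [{"...1": "1", "upgrade_cost": "$100", "vehicle_url": "https://example.com/a"}]
cost_2 = []
links = [
    "https://example.com/a",
    "https://example.com/b",
    "https://example.com/c",
    "https://example.com/a",  # vehicle_links may contain duplicates
]

vehicles = stack_rows(batch_1, batch_2)
costs = stack_rows(cost_1, cost_2)
merged = left_join(vehicles, costs)
missing = failed_urls(links, merged)
```

Here merged keeps one row per vehicle, with upgrade_cost left as None where no cost was scraped, and missing lists the links that have no row at all.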

Data Cleaning Tips

The dataset will require some data cleaning. I decided NOT to clean the data before posting, to add an additional challenge. Some files may contain duplicates. Be sure to remove duplicates after merging. Some additional tips on data cleaning -

  1. title: Remove the string pattern "GTA 5:"
  2. acquisition: Remove the string pattern "/ found"
  3. resale_price: Separate into 2 columns to get the normal resale price and the resale price when fully upgraded.
  4. top_speed_in_game: Drop the km/h value; you only need the mph value.
  5. upgrade_cost: Remove all non numeric elements.
  6. numeric values: Remove all non numeric elements from columns that should be numeric. Convert to numeric.
  7. Remove leading and trailing white spaces from columns as necessary.
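
A few of the tips above, sketched as small Python helpers. The exact raw formats in the CSVs are not described here, so the sample strings below are guesses for illustration only, and the function names are made up. The sketch assumes resale_price holds two numbers written with thousands separators and that top_speed_in_game writes the mph value directly before the text "mph". Check a few real rows and adjust the patterns before relying on them.

```python
import re


def clean_title(title):
    """Tip 1: remove the "GTA 5:" pattern and surrounding white space."""
    return title.replace("GTA 5:", "").strip()


def clean_acquisition(text):
    """Tip 2: remove the "/ found" pattern and surrounding white space."""
    return text.replace("/ found", "").strip()


def digits_to_int(text):
    """Tips 5 and 6: strip every non-digit character, then convert to int."""
    digits = re.sub(r"\D", "", text)
    return int(digits) if digits else None


def split_resale_price(text):
    """Tip 3: return (normal, fully_upgraded); assumes two numbers in the string."""
    numbers = [int(n.replace(",", "")) for n in re.findall(r"\d[\d,]*", text)]
    numbers += [None] * (2 - len(numbers))  # pad if a value is missing
    return numbers[0], numbers[1]


def mph_only(text):
    """Tip 4: keep only the number written directly before 'mph'."""
    match = re.search(r"(\d+(?:\.\d+)?)\s*mph", text, re.IGNORECASE)
    return float(match.group(1)) if match else None


# Hypothetical raw values, not copied from the dataset.
title = clean_title("GTA 5: Vehicle A ")
acquisition = clean_acquisition("Can be purchased / found")
upgrade_cost = digits_to_int("$ 1,234,500")
resale = split_resale_price("$600,000 / $900,000")
top_speed_mph = mph_only("120.25 mph (193.52 km/h)")
```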

Inspiration

This dataset can be used for exploratory data analysis on GTA V vehicles. Some ideas -

  • Counts: View counts of vehicles by vehicle class, manufacturer, release dlc, etc.
  • Resale Value: Which vehicles have the best resale value after taking upgrade cost into account?
  • Speed: What are the best vehicles in terms of in-game pace and racing?
  • Price: What are the most expensive vehicles by vehicle class, manufacturer, etc.?
  • Price: What is the distribution of vehicle prices by vehicle class?
  • Price: GTA V appears to have been steadily increasing the price of vehicles with each release dlc and release year. Is this price increase correlated with any new features in the vehicles being released? Can you predict how much GTA players can expect to pay for vehicles in the next dlc?
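
As a starting point for the first two ideas, here is a small sketch on made-up rows that are assumed to be merged and cleaned to numbers already. The column resale_upgraded is a hypothetical name for the fully upgraded half of resale_price. The "recovery ratio" (upgraded resale price divided by purchase price plus upgrade cost) is just one possible way to define resale value, not a definition taken from the dataset.

```python
from collections import Counter

# Made-up, already-cleaned rows for illustration only.
vehicles = [
    {"title": "Vehicle A", "vehicle_class": "Sports",
     "price": 100_000, "upgrade_cost": 50_000, "resale_upgraded": 90_000},
    {"title": "Vehicle B", "vehicle_class": "Sports",
     "price": 200_000, "upgrade_cost": 100_000, "resale_upgraded": 150_000},
    {"title": "Vehicle C", "vehicle_class": "Super",
     "price": 1_000_000, "upgrade_cost": 200_000, "resale_upgraded": 840_000},
]

# Counts: number of vehicles per vehicle class.
class_counts = Counter(v["vehicle_class"] for v in vehicles)


def recovery_ratio(vehicle):
    """Share of total spend (price + upgrades) recovered when reselling."""
    return vehicle["resale_upgraded"] / (vehicle["price"] + vehicle["upgrade_cost"])


# Resale value: best recovery ratio first.
ranked = sorted(vehicles, key=recovery_ratio, reverse=True)
ranked_titles = [v["title"] for v in ranked]
```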