GTA V Vehicle Dataset
Datapoints on vehicles currently available in GTA V
@kaggle.lucasokwudishu_gta_v_vehicle_dataset
This dataset contains information about the vehicles currently available in GTA V. Once the files are merged, you should have about 567 rows with complete data. Although GTA V currently has 807 vehicles in total, the web-scraping script failed on some of the vehicle URLs.
The data was retrieved by web scraping gtabase.com and is publicly available to everyone.
You'll notice there are 6 CSV files. When fully merged, there should be 567 rows and 36 columns. The files are separate because of how the web-scraping script worked: I'm still new to web scraping, so I scraped the data in batches. I also forgot to scrape upgrade_cost in the original script, so I had to collect that separately as well. Here are the contents:
vehicle_links: URL for each vehicle in GTA V. May contain duplicates.

v1: Row ID. This can be dropped.
title: Name of the vehicle.
vehicle_class: Vehicle category (Planes, Utility, SUVs, Sports, Super, etc.).
manufacturer: Vehicle manufacturer.
features: Vehicle features.
acquisition: Method of obtaining the vehicle in game.
price: Vehicle price.
storage_location: Where the vehicle can be stored in game.
delivery_method: How the vehicle is delivered in game.
modifications: Where the vehicle can be modified in game.
resale_flag: Whether the vehicle can be resold in game.
resale_price: Resale price of the vehicle. Contains 2 values: the resale price and the resale price when fully upgraded.
race_availability: Whether the vehicle can be used in races.
top_speed_in_game: Vehicle top speed in game. Contains values in both mph and km/h.
based_on: The real-life vehicle that this vehicle is based on.
seats: Number of seats in the vehicle.
weight_in_kg: Vehicle weight in kg.
drive_train: Vehicle drivetrain.
gears: Number of gears in the vehicle.
release_date: Date the vehicle was released in game.
release_dlc: Name of the DLC the vehicle was released in.
top_speed_real: I believe this is the top speed of the real-life vehicle, not the GTA V version.
lap_time: In-game lap time of the vehicle, in minutes and seconds.
bulletproof: Whether the vehicle is bulletproof.
weapon1_resistance: Resistance to HOMING LAUNCHER / OPPRESSOR MISSILES / JET MISSILES.
weapon2_resistance: Resistance to RPG / GRENADES / STICKY BOMB / MOC CANNON.
weapon3_resistance: Resistance to EXPLOSIVE ROUNDS (HEAVY SNIPER MK II).
weapon4_resistance: Resistance to TANK CANNON (RHINO / APC).
weapon5_resistance: Resistance to ANTI-AIRCRAFT TRAILER DUAL 20MM FLAK.
speed: Vehicle speed score.
acceleration: Vehicle acceleration score.
braking: Vehicle braking score.
handling: Vehicle handling score.
overall: Vehicle overall score.
vehicle_url: Vehicle URL.

...1: Row ID. This can be dropped.
upgrade_cost: Upgrade cost for the vehicle.
vehicle_url: Vehicle URL.

The common key in all datasets is vehicle_url. It may also be called vehicle_link.
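Because vehicle_url is the shared key, combining the files comes down to stacking the batch files by rows and joining the upgrade costs on that column. The sketch below uses only Python's standard library (csv) rather than any particular dataframe library. It assumes every file has a header row, that the upgrade-cost files contain upgrade_cost and vehicle_url columns as listed above, and that "duplicates" means rows identical in every column. The function names are illustrative, not part of the dataset.

```python
import csv

# Row-ID columns that the column list above says can be dropped.
ROW_ID_COLUMNS = {"v1", "...1"}


def read_rows(path):
    """Read one CSV file into a list of dicts, dropping the row-ID columns."""
    with open(path, newline="", encoding="utf-8") as f:
        return [
            {k: v for k, v in row.items() if k not in ROW_ID_COLUMNS}
            for row in csv.DictReader(f)
        ]


def stack(*tables):
    """Combine several tables (lists of row dicts) by rows."""
    return [row for table in tables for row in table]


def merge_on_url(batch_rows, cost_rows, key="vehicle_url"):
    """Attach upgrade_cost to each batch row with a matching vehicle_url.

    Batch rows without a matching upgrade-cost row are dropped (an inner join).
    """
    cost_by_url = {row[key]: row["upgrade_cost"] for row in cost_rows}
    return [
        {**row, "upgrade_cost": cost_by_url[row[key]]}
        for row in batch_rows
        if row[key] in cost_by_url
    ]


def drop_duplicates(rows):
    """Remove rows identical in every column, keeping the first occurrence."""
    seen = set()
    unique = []
    for row in rows:
        fingerprint = tuple(sorted(row.items()))
        if fingerprint not in seen:
            seen.add(fingerprint)
            unique.append(row)
    return unique


def failed_urls(all_urls, merged_rows, key="vehicle_url"):
    """URLs from vehicle_links that never made it into the merged table."""
    scraped = {row[key] for row in merged_rows}
    return sorted(set(all_urls) - scraped)
```

In practice you would call read_rows on each gta_data_batch file and pass the results to stack, do the same for the gta_data_upgrade_cost files, then run merge_on_url followed by drop_duplicates. Adjust the paths to the file names you actually downloaded. The inner join keeps only vehicles that have both batch data and an upgrade cost, which should roughly correspond to the "complete data" rows described above.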
To merge the files:
1. Combine the gta_data_batch CSVs by rows.
2. Combine the gta_data_upgrade_cost CSVs by rows.
3. Merge gta_data_upgrade_cost into gta_data_batch using vehicle_url as the common key.
4. Any URL in the vehicle_links CSV that has no data in the gta_data_batch or gta_data_upgrade_cost files is one of the URLs that failed in the script.

The dataset will require some data cleaning. I decided NOT to clean the data before posting, to add an extra challenge. Some files may contain duplicates, so be sure to remove duplicates after merging. Some additional tips on data cleaning:
title: Remove the string pattern "GTA 5:".
acquisition: Remove the string pattern "/ found".
resale_price: Split into 2 columns to get the normal resale price and the resale price when fully upgraded.
top_speed: Drop the km/h value; you only need the mph value.
upgrade_cost: Remove all non-numeric characters.
numeric values: Remove all non-numeric characters from columns that should be numeric, then convert them to numbers.

This dataset can be used for exploratory data analysis on GTA V vehicles. Some ideas:
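Before any exploration, the cleaning tips listed above need to be applied. Below is a minimal standard-library sketch of them. The exact text formats of resale_price and top_speed are not documented here, so it assumes resale_price holds two amounts in the order normal then fully upgraded, and that the mph figure is written directly before the word "mph". The helper names are hypothetical.

```python
import re

# Numbers such as "1,250,000" or "120.5"; commas are treated as thousands separators.
NUMBER = re.compile(r"\d[\d,]*(?:\.\d+)?")


def remove_pattern(text, pattern):
    """Delete every occurrence of a literal substring (e.g. 'GTA 5:') and trim spaces."""
    return text.replace(pattern, "").strip()


def to_number(text):
    """Drop every non-numeric character and convert to float (None if impossible)."""
    cleaned = re.sub(r"[^0-9.]", "", text or "")
    try:
        return float(cleaned)
    except ValueError:  # nothing left, or something like '1.2.3'
        return None


def split_resale_price(text):
    """Return (resale_price, resale_price_upgraded), assuming two amounts in that order."""
    amounts = [float(m.replace(",", "")) for m in NUMBER.findall(text or "")]
    amounts += [None, None]  # pad so a missing value becomes None
    return amounts[0], amounts[1]


def top_speed_mph(text):
    """Keep only the number written directly before 'mph' (case-insensitive)."""
    match = re.search(r"(\d+(?:\.\d+)?)\s*mph", text or "", re.IGNORECASE)
    return float(match.group(1)) if match else None
```

For example, split_resale_price("$60,000 / $90,500") returns (60000.0, 90500.0) and top_speed_mph("120.5 mph (193.9 km/h)") returns 120.5; both inputs are made up for illustration. Searching for the number before "mph" works regardless of whether the km/h value comes first or second.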