Baselight

Automotive Price Prediction Dataset

Synthetic dataset of 1,000,000 used-vehicle records for training price prediction models.

@kaggle.metawave_vehicle_price_prediction

About this Dataset

Overview

This dataset contains 1,000,000 synthetic records for used vehicles, intended for training price prediction models. The data was generated by a Python script that builds in realistic correlations between a vehicle's attributes and its market price. It covers 25 of the most common car brands across a wide range of models and specifications.

How the Data Was Generated

The generation script is designed to produce realistic distributions and relationships, such as:

  • Depreciation: Vehicle age is the primary factor in price calculation, following an exponential decay curve.
  • Wear and Tear: Mileage is correlated with age and negatively impacts the final price.
  • Performance: Higher engine horsepower contributes positively to the vehicle's value.
  • Brand Value: The base price for each brand is different, reflecting real-world market positioning.
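
The relationships listed above can be illustrated with a minimal sketch. This is not the script that produced the dataset: the function name, the base prices, and every coefficient below are assumptions chosen only to show how an exponential age decay, a mileage penalty, a horsepower bonus, and a per-brand base price might combine. The correlation between mileage and age is not modeled here; mileage is simply passed in.

```python
import math

# Hypothetical base prices per brand (illustrative values, not taken from the dataset).
BASE_PRICE = {"Toyota": 28000.0, "BMW": 45000.0}


def synthetic_price(make, vehicle_age, mileage, engine_hp,
                    decay_rate=0.12, mileage_penalty=0.03, hp_bonus=40.0):
    """Return an illustrative price; all default coefficients are assumptions."""
    base = BASE_PRICE[make]                                    # brand value
    depreciated = base * math.exp(-decay_rate * vehicle_age)   # exponential decay with age
    wear = mileage_penalty * mileage                           # mileage lowers the price
    performance = hp_bonus * engine_hp                         # horsepower raises the price
    # A floor keeps very old, high-mileage vehicles from getting a negative price.
    return max(depreciated - wear + performance, 500.0)


if __name__ == "__main__":
    newer = synthetic_price("Toyota", vehicle_age=5, mileage=60000, engine_hp=200)
    older = synthetic_price("Toyota", vehicle_age=10, mileage=60000, engine_hp=200)
    print(f"5 years old: {newer:.2f}, 10 years old: {older:.2f}")
```

With everything else held fixed, a larger vehicle_age or mileage lowers the result and a larger engine_hp raises it, which is the qualitative behavior the list describes.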

Potential Use Cases

This dataset is ideal for a variety of machine learning tasks, including:

  • Regression: Training a model to predict the price column.
  • Feature Engineering: Exploring new ways to combine features to improve model performance.
  • Exploratory Data Analysis (EDA): Practicing data visualization and uncovering patterns in automotive data.
  • Educational Purposes: A great resource for students and data scientists looking to work with a large, clean, and realistic dataset.

Tables

Vehicle Price Prediction

@kaggle.metawave_vehicle_price_prediction.vehicle_price_prediction
  • 26.71 MB
  • 1,000,000 rows
  • 20 columns

CREATE TABLE vehicle_price_prediction (
  "make" VARCHAR,
  "model" VARCHAR,
  "year" BIGINT,
  "mileage" BIGINT,
  "engine_hp" BIGINT,
  "transmission" VARCHAR,
  "fuel_type" VARCHAR,
  "drivetrain" VARCHAR,
  "body_type" VARCHAR,
  "exterior_color" VARCHAR,
  "interior_color" VARCHAR,
  "owner_count" BIGINT,
  "accident_history" VARCHAR,
  "seller_type" VARCHAR,
  "condition" VARCHAR,
  "trim" VARCHAR,
  "vehicle_age" BIGINT,
  "mileage_per_year" DOUBLE,
  "brand_popularity" DOUBLE,
  "price" DOUBLE
);
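
For readers working with an export of this table in Python, the schema above might be mirrored as a typed record. The sketch assumes a CSV export whose header row uses the same column names as the DDL; the class and function names are hypothetical, and VARCHAR, BIGINT, and DOUBLE are mapped to str, int, and float respectively.

```python
import csv
import io
from dataclasses import dataclass, fields


@dataclass
class Vehicle:
    # Field names and order follow the CREATE TABLE statement above.
    make: str
    model: str
    year: int
    mileage: int
    engine_hp: int
    transmission: str
    fuel_type: str
    drivetrain: str
    body_type: str
    exterior_color: str
    interior_color: str
    owner_count: int
    accident_history: str
    seller_type: str
    condition: str
    trim: str
    vehicle_age: int
    mileage_per_year: float
    brand_popularity: float
    price: float


def read_vehicles(text_stream):
    """Yield Vehicle records from a CSV stream whose header matches the schema."""
    for row in csv.DictReader(text_stream):
        # Each annotation (str, int, float) doubles as the converter for its column.
        yield Vehicle(**{f.name: f.type(row[f.name]) for f in fields(Vehicle)})


if __name__ == "__main__":
    # One made-up row, for illustration only; the values are not from the dataset.
    header = ",".join(f.name for f in fields(Vehicle))
    sample = ("Toyota,Camry,2018,62000,203,Automatic,Gasoline,FWD,Sedan,"
              "White,Black,2,None,Dealer,Good,SE,7,8857.14,0.9,17500.0")
    for vehicle in read_vehicles(io.StringIO(header + "\n" + sample + "\n")):
        print(vehicle.make, vehicle.year, vehicle.price)
```

The converters raise ValueError on malformed numeric cells, which is usually preferable to carrying bad values silently into a model.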
