
BDs Behavior Selection In Self-driving Cars

Kaggle

@kaggle.autonomousvehicle_dataset_for_comparing_mdps_cart_and_mlps


Datasets for comparing MDPs, CART and MLPs

Dataset Description

Discrete Driving Behavior Datasets for Behavior Selection in Self-Driving Cars

Overview

This repository contains three discrete driving behavior datasets derived from full telemetry recordings of a simulated self-driving car system. These files represent driving decisions in a compact, abstracted format: each row is a state-action pair in which the environment state is fully described by seven boolean variables (six zone-occupancy flags and a lane indicator) and the outcome is the driving action chosen in that state.

This discrete representation was specifically designed for training and evaluating behavior selection models — classifiers that map perceived environment states to driving actions. It is directly compatible with Probabilistic Logic Factored Markov Decision Processes (PL-fMDPs).
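To make the idea of a behavior selection model concrete, the sketch below fits the simplest possible state-to-action classifier: a lookup table that assigns each observed state its most frequent action. It is a majority-vote baseline for illustration only, not the PL-fMDP, CART or MLP models this dataset is meant to compare. It assumes rows are dicts keyed by the column names described further down, and the fallback action `"keep"` for unseen states is a hypothetical choice.

```python
from collections import Counter, defaultdict

# The seven state variables, in the order listed in the column table.
STATE_COLUMNS = ["curr_lane", "free_E", "free_NE", "free_NW", "free_SE", "free_SW", "free_W"]


def fit_lookup_policy(rows):
    """Map each observed state to its most frequent action (majority-vote baseline).

    `rows` is an iterable of dicts holding the seven state columns and `action`.
    """
    counts = defaultdict(Counter)
    for row in rows:
        state = tuple(row[c] for c in STATE_COLUMNS)
        counts[state][row["action"]] += 1
    # most_common(1) returns [(action, count)]; ties keep first-seen order.
    return {state: c.most_common(1)[0][0] for state, c in counts.items()}


def select_action(policy, state, default="keep"):
    """Return the learned action for a state; `default` (hypothetical) covers unseen states."""
    return policy.get(state, default)
```

Because the table only covers states that appear in the data, the fallback matters: the per-file coverage figures listed below show that many state-action pairs are never observed.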


Simulation Environment and Data Origin

All data originates from a custom self-driving car system built on Webots 2022a and the Robot Operating System (ROS) Noetic, running on Ubuntu 20.04. The testbed is a two-lane road with straight segments and curves. The vehicle's main task is to safely overtake obstacle vehicles while preferring the right lane for travel and using the left lane exclusively for overtaking.

At each timestep, the vehicle perceives six predefined spatial zones — East (E), North-East (NE), North-West (NW), South-East (SE), South-West (SW), and West (W) — each encoded as a boolean variable indicating whether that zone is free of obstacles. Together with the current lane indicator (curr_lane), these seven variables form the complete discrete state space:

  • 2⁷ = 128 possible states
  • 4 possible actions: cruise, keep, change_to_left, change_to_right
  • 512 total state-action points in the full space
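The size of the state-action space follows directly from the seven boolean variables and four actions. A short enumeration, using the column and action names given in this description, confirms the arithmetic:

```python
from itertools import product

STATE_VARS = ["curr_lane", "free_E", "free_NE", "free_NW", "free_SE", "free_SW", "free_W"]
ACTIONS = ["cruise", "keep", "change_to_left", "change_to_right"]

# Every assignment of False/True to the seven variables is one discrete state.
states = list(product([False, True], repeat=len(STATE_VARS)))
# Each state can be paired with each of the four actions.
state_action_space = [(s, a) for s in states for a in ACTIONS]

print(len(states))              # 128
print(len(state_action_space))  # 512
```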

All data collection scenarios were constructed by permuting the presence or absence of up to four obstacle vehicles at predefined detection positions, yielding 32 distinct driving scenarios used consistently across autonomous and human sessions.


Datasets in this Repository

File                      Controller                      Rows       Columns  Unique state-action pairs
D.csv                     Autonomous (PL-fMDP)            962,107    8        201 / 512
D_humans.csv              Human (4 drivers)               329,798    9        313 / 512
complete_DB_discrete.csv  Autonomous + Human + Synthetic  1,958,889  9        not listed
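The "Unique state-action pairs" column counts how much of the 512-point space each file actually covers. A minimal sketch of that count using only the standard `csv` module follows; it assumes each file has a header row with the column names described in this document, and it keeps cell values as raw strings so it does not depend on how the booleans are spelled.

```python
import csv

STATE_VARS = ["curr_lane", "free_E", "free_NE", "free_NW", "free_SE", "free_SW", "free_W"]
ACTIONS = ["cruise", "keep", "change_to_left", "change_to_right"]
# Size of the full four-action space: 2**7 states x 4 actions = 512.
FULL_SPACE = 2 ** len(STATE_VARS) * len(ACTIONS)


def unique_state_action_pairs(lines):
    """Collect distinct (state, action) pairs from CSV text.

    `lines` may be an open file or any iterable of CSV lines with a header row.
    """
    pairs = set()
    for row in csv.DictReader(lines):
        state = tuple(row[name] for name in STATE_VARS)
        pairs.add((state, row["action"]))
    return pairs
```

Calling it on `D.csv` opened with `newline=""` should give a count comparable to the 201 listed in the table, provided the file layout matches this description.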

D.csv — Autonomous Driving
Derived from the autonomous driving telemetry by selecting the 7 boolean state variables and the action column, and removing rows with missing values and collision-flagged timesteps (24 crashes out of 893 runs). Covers 893 simulation runs and 604.7 km of simulated travel. The autonomous vehicle's (AV's) behavior selector is a hierarchy of action policies derived from PL-fMDPs.
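The derivation step (keep the seven state variables plus `action`, drop incomplete rows, drop collision-flagged timesteps) can be sketched as a row filter. The raw telemetry schema is not documented here, so the collision flag column name `collision` and its `"True"`/`"1"` spelling are hypothetical assumptions.

```python
STATE_VARS = ["curr_lane", "free_E", "free_NE", "free_NW", "free_SE", "free_SW", "free_W"]
KEEP_COLUMNS = STATE_VARS + ["action"]


def derive_discrete_rows(raw_rows, collision_column="collision"):
    """Yield telemetry rows reduced to the 7 state variables plus `action`.

    `collision_column` is a hypothetical name for the raw collision flag.
    """
    for row in raw_rows:
        # Skip timesteps flagged as collisions (flag spelling is an assumption).
        if str(row.get(collision_column, "")).strip().lower() in ("true", "1"):
            continue
        reduced = {c: row.get(c) for c in KEEP_COLUMNS}
        # Skip rows with a missing or empty value in any kept column.
        if any(v is None or str(v).strip() == "" for v in reduced.values()):
            continue
        yield reduced
```

Working on plain dicts means the same filter can consume `csv.DictReader` output directly, without loading the full telemetry into memory.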

D_humans.csv — Human Driver Control
Derived from the human driving telemetry using the same procedure, with the addition of the driver column identifying which of the four human drivers made each decision. Covers 3,923 short runs (max. 200 m each) and 597.58 km of simulated travel, with 105 collisions removed. The four drivers were undergraduate students aged 20–23: Driver 1 (female, Mechatronics, 32.8%), Driver 2 (male, IT, 28.2%), Driver 3 (male, IT, 20.3%), and Driver 4 (female, Mechatronics, 18.7%).
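The per-driver percentages can be recomputed from the `driver` column. The description does not say what they are shares of, so the sketch below assumes they are shares of rows; if they are shares of runs or distance, the results will differ.

```python
from collections import Counter


def driver_shares(drivers):
    """Return each driver's share of rows as a percentage rounded to one decimal.

    Assumes the published percentages are row shares (not runs or distance).
    """
    counts = Counter(drivers)
    total = sum(counts.values())
    return {driver: round(100 * n / total, 1) for driver, n in sorted(counts.items())}
```

For example, `driver_shares(row["driver"] for row in csv.DictReader(f))` over `D_humans.csv` returns a dict keyed by the driver values as read from the file (strings, unless converted).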

complete_DB_discrete.csv — Complete and Augmented
The most comprehensive dataset in this repository. It merges the real driving data from both D.csv and D_humans.csv and augments it with 128,000 synthetically generated swerve examples to address the near-total absence of emergency lateral maneuvers in real data. It also adds a latent_collision boolean column indicating whether each observation is associated with a collision-risk context. This dataset is intended as the primary source for training and benchmarking behavior selection models that include swerve actions.
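Since `swerve_left` and `swerve_right` appear only in the synthetic examples, a comparison between four-action and six-action models may need to separate those rows. A minimal sketch, assuming rows are dicts with an `action` key:

```python
# Actions that occur in the real (autonomous and human) data.
BASE_ACTIONS = {"cruise", "keep", "change_to_left", "change_to_right"}
# Actions that occur only in the synthetic swerve examples.
SWERVE_ACTIONS = {"swerve_left", "swerve_right"}


def split_by_action_set(rows):
    """Split rows into (four-action rows, swerve rows); reject unknown actions."""
    base, swerve = [], []
    for row in rows:
        action = row["action"]
        if action in BASE_ACTIONS:
            base.append(row)
        elif action in SWERVE_ACTIONS:
            swerve.append(row)
        else:
            raise ValueError(f"unexpected action: {action!r}")
    return base, swerve
```

Raising on unknown labels, rather than silently binning them, surfaces any spelling differences between the file and the action names assumed here.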


Shared Column Structure

All three datasets share the following core columns:

Column     Type    Description
action     string  Driving action selected at this timestep
curr_lane  bool    True if in the right (preferred) lane; False if in the left (overtaking) lane
free_E     bool    East zone (directly right) free of obstacle vehicles
free_NE    bool    North-East zone (front-right) free of obstacle vehicles
free_NW    bool    North-West zone (front-left) free of obstacle vehicles
free_SE    bool    South-East zone (rear-right) free of obstacle vehicles
free_SW    bool    South-West zone (rear-left) free of obstacle vehicles
free_W     bool    West zone (directly left) free of obstacle vehicles
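Because the state consists of exactly seven booleans, each state can be packed into an integer index from 0 to 127, which is convenient for tabular methods. The sketch below assumes the CSV spells booleans as `True`/`False` or `1`/`0`, and uses the column order of the table above with `curr_lane` as the least significant bit; that bit order is an arbitrary choice.

```python
STATE_VARS = ["curr_lane", "free_E", "free_NE", "free_NW", "free_SE", "free_SW", "free_W"]


def parse_bool(value):
    """Parse a CSV cell into a bool; assumes 'True'/'False' or '1'/'0' spellings."""
    text = str(value).strip().lower()
    if text in ("true", "1"):
        return True
    if text in ("false", "0"):
        return False
    raise ValueError(f"not a boolean: {value!r}")


def state_index(row):
    """Pack the seven boolean columns into an integer in [0, 127].

    Bit i is set when STATE_VARS[i] is True (curr_lane is bit 0).
    """
    index = 0
    for bit, name in enumerate(STATE_VARS):
        if parse_bool(row[name]):
            index |= 1 << bit
    return index
```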

Additional columns by dataset:

Column            Present in                                                  Type  Description
driver            D_humans.csv, complete_DB_discrete.csv (via merged source)  int   Human driver identifier (1–4)
latent_collision  complete_DB_discrete.csv                                    bool  True if the observation is associated with collision risk

Action Distribution Across Datasets

Action           D.csv    D_humans.csv  complete_DB_discrete.csv
cruise           429,888  237,996       included
keep             438,582  62,594        included
change_to_left   44,929   15,314        included
change_to_right  48,708   13,894        included
swerve_left      -        -             64,000 (synthetic)
swerve_right     -        -             64,000 (synthetic)
Total            962,107  329,798       1,958,889
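The class counts are heavily imbalanced, which matters when benchmarking classifiers: a model that always predicts the majority action already reaches that action's share as accuracy. The sketch below checks that the per-action counts for D.csv and D_humans.csv sum to the listed totals and reports each file's majority class.

```python
# Per-action counts copied from the table above.
D_COUNTS = {"cruise": 429_888, "keep": 438_582, "change_to_left": 44_929, "change_to_right": 48_708}
D_HUMANS_COUNTS = {"cruise": 237_996, "keep": 62_594, "change_to_left": 15_314, "change_to_right": 13_894}


def proportions(counts):
    """Convert raw action counts into fractions of the total."""
    total = sum(counts.values())
    return {action: n / total for action, n in counts.items()}


# The per-action counts add up to the listed row totals.
assert sum(D_COUNTS.values()) == 962_107
assert sum(D_HUMANS_COUNTS.values()) == 329_798

for name, counts in (("D.csv", D_COUNTS), ("D_humans.csv", D_HUMANS_COUNTS)):
    shares = proportions(counts)
    majority = max(shares, key=shares.get)
    print(f"{name}: majority action {majority!r} ({shares[majority]:.1%})")
# D.csv: majority action 'keep' (45.6%)
# D_humans.csv: majority action 'cruise' (72.2%)
```

The difference in majority class between the autonomous and human data is one reason to report per-class metrics rather than accuracy alone.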
