BDs Behavior Selection In Self-driving Cars
@kaggle.autonomousvehicle_dataset_for_comparing_mdps_cart_and_mlps
This repository contains three discrete driving behavior datasets derived from full telemetry recordings of a simulated self-driving car system. These files represent driving decisions in a compact, abstracted format: each row is a state-action pair. The environment state is fully described by seven binary variables (six zone-occupancy flags and a lane indicator), and the outcome is the driving action chosen in that state.
This discrete representation was specifically designed for training and evaluating behavior selection models — classifiers that map perceived environment states to driving actions. It is directly compatible with Probabilistic Logic Factored Markov Decision Processes (PL-fMDPs).
All data originates from a custom self-driving car system built on Webots 2022a and the Robot Operating System (ROS) Noetic, running on Ubuntu 20.04. The testbed is a two-lane road with straight segments and curves. The vehicle's main task is to safely overtake obstacle vehicles while preferring the right lane for travel and using the left lane exclusively for overtaking.
At each timestep, the vehicle perceives six predefined spatial zones: East (E), North-East (NE), North-West (NW), South-East (SE), South-West (SW), and West (W). Each zone is encoded as a boolean variable indicating whether that zone is free of obstacles. Together with the current lane indicator (curr_lane), these seven binary variables form the complete discrete state space of 2^7 = 128 states. In the real driving data, the action is one of cruise, keep, change_to_left, or change_to_right, giving 128 × 4 = 512 possible state-action pairs.

All data collection scenarios were constructed by permuting the presence or absence of up to four obstacle vehicles at predefined detection positions. This yielded 32 distinct driving scenarios, used consistently across autonomous and human sessions.
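Each state is just seven booleans, so it can be packed into a single integer. That makes the 128 states and 512 state-action pairs easy to enumerate. The Python sketch below is illustrative only. The bit order (curr_lane as the most significant bit) and the helper names `encode_state` and `decode_state` are hypothetical conventions, not part of the dataset.

```python
from itertools import product

# Hypothetical bit order: curr_lane is the most significant bit.
STATE_VARS = ["curr_lane", "free_E", "free_NE", "free_NW",
              "free_SE", "free_SW", "free_W"]
ACTIONS = ["cruise", "keep", "change_to_left", "change_to_right"]


def encode_state(state):
    """Pack the seven boolean state variables into an integer in [0, 127]."""
    index = 0
    for name in STATE_VARS:
        index = (index << 1) | int(bool(state[name]))
    return index


def decode_state(index):
    """Inverse of encode_state: unpack an integer into the seven booleans."""
    bits = [(index >> shift) & 1 for shift in range(len(STATE_VARS) - 1, -1, -1)]
    return {name: bool(bit) for name, bit in zip(STATE_VARS, bits)}


all_states = list(product([False, True], repeat=len(STATE_VARS)))
n_pairs = len(all_states) * len(ACTIONS)
print(len(all_states), n_pairs)  # 128 512
```

A compact integer index like this is convenient as a dictionary key or array offset when tabulating policies over the full state space.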
| File | Controller | Rows | Columns | Unique state-action pairs |
|---|---|---|---|---|
| D.csv | Autonomous (PL-fMDP) | 962,107 | 8 | 201 / 512 |
| D_humans.csv | Human (4 drivers) | 329,798 | 9 | 313 / 512 |
| complete_DB_discrete.csv | Autonomous + Human + Synthetic | 1,958,889 | 9 | — |
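The "Unique state-action pairs" column can, in principle, be reproduced by counting distinct (state, action) tuples. The minimal standard-library sketch below makes three assumptions:

- the CSV has a header row with the column names listed in this README;
- the function name `unique_state_action_pairs` is hypothetical;
- the inline sample is made-up data, not rows from the dataset.

```python
import csv
import io

STATE_VARS = ["curr_lane", "free_E", "free_NE", "free_NW",
              "free_SE", "free_SW", "free_W"]


def unique_state_action_pairs(rows):
    """Count distinct (state, action) combinations in an iterable of CSV rows."""
    seen = set()
    for row in rows:
        state = tuple(row[name] for name in STATE_VARS)
        seen.add((state, row["action"]))
    return len(seen)


# Tiny inline sample standing in for D.csv (values are illustrative only).
sample = io.StringIO(
    "action,curr_lane,free_E,free_NE,free_NW,free_SE,free_SW,free_W\n"
    "cruise,True,True,True,True,True,True,True\n"
    "cruise,True,True,True,True,True,True,True\n"
    "keep,True,True,False,True,True,True,True\n"
)
n = unique_state_action_pairs(csv.DictReader(sample))
print(n)  # 2
```

To run it on a downloaded file, open it with `open("D.csv", newline="")` and pass `csv.DictReader(f)` in place of the sample. Comparing raw strings is sufficient as long as booleans are serialized consistently within a file.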
D.csv — Autonomous Driving
Derived from the autonomous driving telemetry by selecting the 7 boolean state variables and the action column, and removing rows with missing values and collision-flagged timesteps (24 crashes out of 893 runs). Covers 893 simulation runs and 604.7 km of simulated travel. The AV's behavior selector is a hierarchy of action policies derived from PL-fMDPs.
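The derivation described above might look like the following sketch: keep the seven state columns plus action, then drop rows with missing values and collision-flagged timesteps. The raw telemetry schema is not documented here. The `collision` flag column and the `time`/`speed` columns in the sample are therefore hypothetical placeholders, and `to_discrete` is a hypothetical helper.

```python
import csv
import io

STATE_VARS = ["curr_lane", "free_E", "free_NE", "free_NW",
              "free_SE", "free_SW", "free_W"]
KEEP = STATE_VARS + ["action"]
COLLISION_COL = "collision"  # hypothetical name of the per-timestep collision flag


def to_discrete(telemetry_rows):
    """Yield state-action rows, skipping collision-flagged or incomplete timesteps."""
    for row in telemetry_rows:
        if (row.get(COLLISION_COL) or "").strip().lower() in {"true", "1"}:
            continue  # drop timesteps flagged as collisions
        subset = {name: row.get(name) for name in KEEP}
        if any(not (value or "").strip() for value in subset.values()):
            continue  # drop rows with a missing state variable or action
        yield subset


# Hypothetical telemetry excerpt; real telemetry columns may differ.
telemetry = io.StringIO(
    "time,speed,collision,action,curr_lane,free_E,free_NE,free_NW,free_SE,free_SW,free_W\n"
    "0.0,10.0,False,cruise,True,True,True,True,True,True,True\n"
    "0.1,10.0,True,keep,True,True,True,True,True,True,True\n"
    "0.2,10.0,False,keep,True,True,False,True,True,True,\n"
)
rows = list(to_discrete(csv.DictReader(telemetry)))
print(len(rows))  # 1
```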
D_humans.csv — Human Driver Control
Derived from the human driving telemetry using the same process, with the addition of the driver column identifying which of the four human drivers made each decision. Covers 3,923 short runs (max. 200 m each) and 597.58 km of simulated travel, with 105 collisions removed. The four drivers were undergraduate students aged 20–23: Driver 1 (female, Mechatronics, 32.8%), Driver 2 (male, IT, 28.2%), Driver 3 (male, IT, 20.3%), and Driver 4 (female, Mechatronics, 18.7%).
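The driver percentages can be recomputed from the driver column with a `Counter`, assuming they are each driver's share of rows in D_humans.csv. The helper name `driver_shares` and the input list below are hypothetical.

```python
from collections import Counter


def driver_shares(driver_ids):
    """Return each driver's share of rows as a percentage rounded to one decimal."""
    counts = Counter(driver_ids)
    total = sum(counts.values())
    return {driver: round(100 * n / total, 1) for driver, n in sorted(counts.items())}


# Illustrative input; in practice, pass the driver column of D_humans.csv.
shares = driver_shares([1, 1, 2, 3, 4, 1, 2, 2, 3, 1])
print(shares)  # {1: 40.0, 2: 30.0, 3: 20.0, 4: 10.0}
```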
complete_DB_discrete.csv — Complete and Augmented
The most comprehensive dataset in this repository. It merges the real driving data from both D.csv and D_humans.csv and augments it with 128,000 synthetically generated swerve examples to address the near-total absence of emergency lateral maneuvers in real data. It also adds a latent_collision boolean column indicating whether each observation is associated with a collision-risk context. This dataset is intended as the primary source for training and benchmarking behavior selection models that include swerve actions.
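The simplest possible behavior selector, useful as a point of reference, is a lookup table that maps each observed state to its most frequent action. This is a swapped-in empirical baseline for illustration, not the PL-fMDP policy hierarchy used by the AV. The function names and the `keep` fallback for unseen states are assumptions, and the training rows are made up.

```python
from collections import Counter, defaultdict

STATE_VARS = ["curr_lane", "free_E", "free_NE", "free_NW",
              "free_SE", "free_SW", "free_W"]


def fit_majority_policy(rows):
    """Map each observed state to its most frequent action (lookup-table baseline)."""
    votes = defaultdict(Counter)
    for row in rows:
        state = tuple(row[name] for name in STATE_VARS)
        votes[state][row["action"]] += 1
    # Counter.most_common orders tied counts by first occurrence
    return {state: c.most_common(1)[0][0] for state, c in votes.items()}


def select_action(policy, state_row, fallback="keep"):
    """Return the learned action for a state, or `fallback` if it was never observed."""
    state = tuple(state_row[name] for name in STATE_VARS)
    return policy.get(state, fallback)


open_road = dict.fromkeys(STATE_VARS, True)
front_right_blocked = {**open_road, "free_NE": False}
training_rows = [
    {**open_road, "action": "cruise"},
    {**open_road, "action": "cruise"},
    {**open_road, "action": "keep"},
    {**front_right_blocked, "action": "change_to_left"},
]
policy = fit_majority_policy(training_rows)
print(select_action(policy, open_road))            # cruise
print(select_action(policy, front_right_blocked))  # change_to_left
```

A lookup table only covers states seen in training, which is one reason learned classifiers are benchmarked on this data instead.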
All three datasets share the following core columns:
| Column | Type | Description |
|---|---|---|
| action | string | Driving action selected at this timestep |
| curr_lane | bool | True if in the right (preferred) lane; False if in the left (overtaking) lane |
| free_E | bool | East zone (directly right) free of obstacle vehicles |
| free_NE | bool | North-East zone (front-right) free of obstacle vehicles |
| free_NW | bool | North-West zone (front-left) free of obstacle vehicles |
| free_SE | bool | South-East zone (rear-right) free of obstacle vehicles |
| free_SW | bool | South-West zone (rear-left) free of obstacle vehicles |
| free_W | bool | West zone (directly left) free of obstacle vehicles |
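When the CSVs are read with the standard library, the boolean columns arrive as text and must be parsed. This README does not state how booleans are serialized. The sketch below therefore assumes `True`/`False` (as pandas writes them) and also accepts `1`/`0`. The accepted spellings and the helper names `parse_bool` and `parse_row` are assumptions.

```python
STATE_VARS = ["curr_lane", "free_E", "free_NE", "free_NW",
              "free_SE", "free_SW", "free_W"]
TRUE_SPELLINGS = {"true", "1"}    # assumed serializations of True
FALSE_SPELLINGS = {"false", "0"}  # assumed serializations of False


def parse_bool(text):
    """Parse a serialized boolean, rejecting anything unexpected."""
    value = text.strip().lower()
    if value in TRUE_SPELLINGS:
        return True
    if value in FALSE_SPELLINGS:
        return False
    raise ValueError(f"unrecognized boolean value: {text!r}")


def parse_row(row):
    """Convert one CSV row of the core columns into typed Python values."""
    parsed = {name: parse_bool(row[name]) for name in STATE_VARS}
    parsed["action"] = row["action"].strip()
    return parsed


raw = {"action": "keep", "curr_lane": "True", "free_E": "False", "free_NE": "True",
       "free_NW": "1", "free_SE": "0", "free_SW": "True", "free_W": "False"}
print(parse_row(raw))
```

Raising on unknown spellings, rather than defaulting to False, surfaces serialization mismatches early.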
Additional columns by dataset:
| Column | Present in | Type | Description |
|---|---|---|---|
| driver | D_humans.csv, complete_DB_discrete.csv (via merged source) | int | Human driver identifier (1–4) |
| latent_collision | complete_DB_discrete.csv | bool | True if the observation is associated with collision risk |
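Because latent_collision exists only in complete_DB_discrete.csv, one natural step is to compare the action mix inside and outside collision-risk contexts. This minimal sketch assumes latent_collision is serialized as `True`/`False` (or `1`/`0`). The helper name `action_counts` is hypothetical, and the rows are made up for illustration.

```python
from collections import Counter


def action_counts(rows, latent_collision=None):
    """Count actions, optionally keeping only rows with the given latent_collision value."""
    counts = Counter()
    for row in rows:
        if latent_collision is not None:
            flag = row["latent_collision"].strip().lower() in {"true", "1"}
            if flag != latent_collision:
                continue  # row belongs to the other subset
        counts[row["action"]] += 1
    return counts


# Made-up rows; not drawn from complete_DB_discrete.csv.
rows = [
    {"action": "cruise", "latent_collision": "False"},
    {"action": "keep", "latent_collision": "True"},
    {"action": "swerve_left", "latent_collision": "True"},
    {"action": "change_to_right", "latent_collision": "False"},
    {"action": "cruise", "latent_collision": "False"},
]
print(action_counts(rows, latent_collision=True))
```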
Action counts by dataset:

| Action | D.csv | D_humans.csv | complete_DB_discrete.csv |
|---|---|---|---|
| cruise | 429,888 | 237,996 | included |
| keep | 438,582 | 62,594 | included |
| change_to_left | 44,929 | 15,314 | included |
| change_to_right | 48,708 | 13,894 | included |
| swerve_left | — | — | 64,000 (synthetic) |
| swerve_right | — | — | 64,000 (synthetic) |
| Total | 962,107 | 329,798 | 1,958,889 |
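The per-action counts for D.csv and D_humans.csv add up exactly to their stated totals, a quick sanity check worth repeating whenever the files are re-derived. The sketch below copies those counts from the action table and also computes the share of lane-change decisions in each file. The `lane_change_share` helper is hypothetical. complete_DB_discrete.csv is omitted because per-action counts for its real-data actions are not listed.

```python
# Per-action counts copied from the action table in this README.
D = {"cruise": 429_888, "keep": 438_582,
     "change_to_left": 44_929, "change_to_right": 48_708}
D_HUMANS = {"cruise": 237_996, "keep": 62_594,
            "change_to_left": 15_314, "change_to_right": 13_894}

# The counts must add up to the stated row totals.
assert sum(D.values()) == 962_107
assert sum(D_HUMANS.values()) == 329_798


def lane_change_share(counts):
    """Percentage of rows whose action is a lane change (left or right)."""
    changes = counts["change_to_left"] + counts["change_to_right"]
    return 100 * changes / sum(counts.values())


for name, counts in [("D.csv", D), ("D_humans.csv", D_HUMANS)]:
    print(name, f"{lane_change_share(counts):.1f}% lane changes")
# D.csv 9.7% lane changes
# D_humans.csv 8.9% lane changes
```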