Datasets for comparing MDPs, CART, and MLPs

Discrete Driving Behavior Datasets for Behavior Selection in Self-Driving Cars

Overview

This repository contains three discrete driving behavior datasets derived from full telemetry recordings of a simulated self-driving car system. These files represent driving decisions in a compact, abstracted format: each row is a state-action pair where the environment state is fully described by seven binary occupancy variables and the outcome is the driving action chosen in that state.

This discrete representation was specifically designed for training and evaluating behavior selection models — classifiers that map perceived environment states to driving actions. It is directly compatible with Probabilistic Logic Factored Markov Decision Processes (PL-fMDPs).
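
As a rough illustration of what a behavior selection model over this representation looks like, the sketch below fits a majority-vote lookup table from state to action. This is a minimal tabular baseline chosen for illustration, not the PL-fMDP, CART, or MLP models the datasets were built for. The function names, the `fallback` parameter, and the toy rows are assumptions introduced here.

```python
from collections import Counter, defaultdict

# Column names follow the shared column structure of the datasets.
STATE_COLUMNS = ["curr_lane", "free_E", "free_NE", "free_NW",
                 "free_SE", "free_SW", "free_W"]


def fit_majority_table(rows):
    """Map each observed state to the action most often chosen in it."""
    counts = defaultdict(Counter)
    for row in rows:
        state = tuple(row[column] for column in STATE_COLUMNS)
        counts[state][row["action"]] += 1
    return {state: actions.most_common(1)[0][0]
            for state, actions in counts.items()}


def predict(table, row, fallback="cruise"):
    """Look up the action for a state; `fallback` is an assumed default."""
    state = tuple(row[column] for column in STATE_COLUMNS)
    return table.get(state, fallback)


# Toy rows invented for illustration; they are not taken from the datasets.
all_free = {column: True for column in STATE_COLUMNS}
front_right_blocked = dict(all_free, free_NE=False)
toy_rows = [
    dict(all_free, action="cruise"),
    dict(all_free, action="cruise"),
    dict(front_right_blocked, action="change_to_left"),
    dict(front_right_blocked, action="change_to_left"),
    dict(front_right_blocked, action="keep"),
]

table = fit_majority_table(toy_rows)
print(predict(table, all_free))
print(predict(table, front_right_blocked))
```

Because the state space is small, a table like this can serve as a sanity-check baseline before training a learned classifier on the same rows.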

Simulation Environment and Data Origin

All data originates from a custom self-driving car system built on Webots 2022a and the Robot Operating System (ROS) Noetic, running on Ubuntu 20.04. The testbed is a two-lane road with straight segments and curves. The vehicle's main task is to safely overtake obstacle vehicles while preferring the right lane for travel and using the left lane exclusively for overtaking.

At each timestep, the vehicle perceives six predefined spatial zones — East (E), North-East (NE), North-West (NW), South-East (SE), South-West (SW), and West (W) — each encoded as a boolean variable indicating whether that zone is free of obstacles. Together with the current lane indicator (curr_lane), these seven variables form the complete discrete state space:

2⁷ = 128 possible states
4 possible actions: cruise, keep, change_to_left, change_to_right
512 total state-action points in the full space
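
The counts above follow directly from enumerating the seven boolean variables and the four actions. The sketch below reproduces the arithmetic; the variable ordering is an arbitrary choice made here and carries no meaning.

```python
from itertools import product

STATE_VARIABLES = ["curr_lane", "free_E", "free_NE", "free_NW",
                   "free_SE", "free_SW", "free_W"]
ACTIONS = ["cruise", "keep", "change_to_left", "change_to_right"]

# Every assignment of True/False to the seven state variables.
states = list(product([False, True], repeat=len(STATE_VARIABLES)))

# Every (state, action) combination in the full space.
state_action_pairs = list(product(states, ACTIONS))

print(len(states))              # 128
print(len(state_action_pairs))  # 512
```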

All data collection scenarios were constructed by permuting the presence or absence of up to four obstacle vehicles at predefined detection positions, yielding 32 distinct driving scenarios used consistently across autonomous and human sessions.

Datasets in this Repository

File	Controller	Rows	Columns	Unique state-action pairs
`D.csv`	Autonomous (PL-fMDP)	962,107	8	201 / 512
`D_humans.csv`	Human (4 drivers)	329,798	9	313 / 512
`complete_DB_discrete.csv`	Autonomous + Human + Synthetic	1,958,889	9	—
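
A figure such as "201 / 512" in the last column can be recomputed by collecting the distinct (state, action) tuples in a file. The sketch below shows one way to do this, assuming the files are comma-separated with a header row and the column names listed under Shared Column Structure. The inline sample is invented for illustration and is not taken from the datasets.

```python
import csv
import io

STATE_COLUMNS = ["curr_lane", "free_E", "free_NE", "free_NW",
                 "free_SE", "free_SW", "free_W"]
# 2^7 states times the 4 actions of the base action set.
TOTAL_PAIRS = 2 ** len(STATE_COLUMNS) * 4


def unique_state_action_pairs(rows):
    """Return the set of distinct (state..., action) tuples in `rows`."""
    return {
        tuple(row[column] for column in STATE_COLUMNS) + (row["action"],)
        for row in rows
    }


# Invented sample; to use a real file, pass csv.DictReader(open(path)).
sample_csv = """action,curr_lane,free_E,free_NE,free_NW,free_SE,free_SW,free_W
cruise,True,True,True,True,True,True,True
cruise,True,True,True,True,True,True,True
keep,True,True,False,True,True,True,True
"""

pairs = unique_state_action_pairs(csv.DictReader(io.StringIO(sample_csv)))
print(f"{len(pairs)} / {TOTAL_PAIRS}")  # 2 / 512
```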

D.csv — Autonomous Driving
Derived from the autonomous driving telemetry by selecting the 7 boolean state variables and the action column, and removing rows with missing values and collision-flagged timesteps (24 crashes out of 893 runs). Covers 893 simulation runs and 604.7 km of simulated travel. The AV's behavior selector is a hierarchy of action policies derived from PL-fMDPs.
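
The derivation described above (keep the state and action columns, drop incomplete rows, drop collision-flagged timesteps) could be sketched as follows. The raw telemetry layout is not documented here, so the `collision` and `speed` column names, the `"True"` flag encoding, and the sample rows are all assumptions made for illustration.

```python
import csv
import io

STATE_COLUMNS = ["curr_lane", "free_E", "free_NE", "free_NW",
                 "free_SE", "free_SW", "free_W"]
KEPT_COLUMNS = ["action"] + STATE_COLUMNS


def to_discrete_rows(telemetry_rows, collision_column="collision"):
    """Project telemetry onto the 8 kept columns and filter unusable rows.

    `collision_column` is a hypothetical name for the collision flag.
    """
    discrete = []
    for row in telemetry_rows:
        # Drop timesteps flagged as belonging to a collision.
        if row.get(collision_column) == "True":
            continue
        values = [row.get(column) for column in KEPT_COLUMNS]
        # Drop rows where any kept column is missing or empty.
        if any(value is None or value == "" for value in values):
            continue
        discrete.append(dict(zip(KEPT_COLUMNS, values)))
    return discrete


# Invented telemetry sample: row 3 has a missing value, row 4 is a collision.
telemetry_csv = """action,curr_lane,free_E,free_NE,free_NW,free_SE,free_SW,free_W,speed,collision
cruise,True,True,True,True,True,True,True,12.5,False
keep,True,True,False,True,True,True,True,9.0,False
keep,True,True,False,,True,True,True,9.1,False
change_to_left,True,True,False,True,True,True,True,8.7,True
"""

discrete_rows = to_discrete_rows(csv.DictReader(io.StringIO(telemetry_csv)))
print(len(discrete_rows))  # 2
```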

D_humans.csv — Human Driver Control
Derived from the human driving telemetry using the same process, with an additional driver column identifying which of the four human drivers made each decision. Covers 3,923 short runs (max. 200 m each) and 597.58 km of simulated travel, with 105 collisions removed. The four drivers were undergraduate students aged 20–23: Driver 1 (female, Mechatronics, 32.8%), Driver 2 (male, IT, 28.2%), Driver 3 (male, IT, 20.3%), and Driver 4 (female, Mechatronics, 18.7%).

complete_DB_discrete.csv — Complete and Augmented
The most comprehensive dataset in this repository. It merges the real driving data from both D.csv and D_humans.csv and augments it with 128,000 synthetically generated swerve examples to address the near-total absence of emergency lateral maneuvers in real data. It also adds a latent_collision boolean column indicating whether each observation is associated with a collision-risk context. This dataset is intended as the primary source for training and benchmarking behavior selection models that include swerve actions.

Shared Column Structure

All three datasets share the following core columns:

Column	Type	Description
`action`	string	Driving action selected at this timestep
`curr_lane`	bool	`True` if in the right (preferred) lane; `False` if in the left (overtaking) lane
`free_E`	bool	East zone (directly right) free of obstacle vehicles
`free_NE`	bool	North-East zone (front-right) free of obstacle vehicles
`free_NW`	bool	North-West zone (front-left) free of obstacle vehicles
`free_SE`	bool	South-East zone (rear-right) free of obstacle vehicles
`free_SW`	bool	South-West zone (rear-left) free of obstacle vehicles
`free_W`	bool	West zone (directly left) free of obstacle vehicles

Additional columns by dataset:

Column	Present in	Type	Description
`driver`	`D_humans.csv`, `complete_DB_discrete.csv` (via merged source)	int	Human driver identifier (1–4)
`latent_collision`	`complete_DB_discrete.csv`	bool	`True` if the observation is associated with collision risk
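
When feeding these columns to a model, it can be convenient to collapse the seven booleans into a single state index in the range 0–127. The sketch below shows one possible encoding. It assumes booleans are stored as the strings `True` and `False`, and the bit order (with `curr_lane` as the most significant bit) is a convention chosen here, not one defined by the datasets.

```python
STATE_COLUMNS = ["curr_lane", "free_E", "free_NE", "free_NW",
                 "free_SE", "free_SW", "free_W"]


def parse_bool(text):
    """Convert a CSV cell to bool; accepted spellings are an assumption."""
    normalized = text.strip().lower()
    if normalized in ("true", "1"):
        return True
    if normalized in ("false", "0"):
        return False
    raise ValueError(f"not a boolean value: {text!r}")


def state_index(row):
    """Pack the seven state booleans into an integer from 0 to 127."""
    index = 0
    for column in STATE_COLUMNS:
        # Shift earlier bits left and append this column's bit.
        index = (index << 1) | int(parse_bool(row[column]))
    return index


all_free_right_lane = {column: "True" for column in STATE_COLUMNS}
print(state_index(all_free_right_lane))  # 127
```

Any fixed bit order works as long as it is applied consistently at training and inference time.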

Action Distribution Across Datasets

Action	D.csv	D_humans.csv	complete_DB_discrete.csv
`cruise`	429,888	237,996	included
`keep`	438,582	62,594	included
`change_to_left`	44,929	15,314	included
`change_to_right`	48,708	13,894	included
`swerve_left`	—	—	64,000 (synthetic)
`swerve_right`	—	—	64,000 (synthetic)
Total	962,107	329,798	1,958,889
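
The per-action counts for D.csv and D_humans.csv can be checked against their totals and converted to class shares, which is useful when deciding whether to reweight or resample before training. The sketch below uses only the numbers from the table above; the per-action breakdown of complete_DB_discrete.csv is not listed, so it is left out.

```python
# Counts copied from the action distribution table.
counts = {
    "D.csv": {
        "cruise": 429_888,
        "keep": 438_582,
        "change_to_left": 44_929,
        "change_to_right": 48_708,
    },
    "D_humans.csv": {
        "cruise": 237_996,
        "keep": 62_594,
        "change_to_left": 15_314,
        "change_to_right": 13_894,
    },
}
totals = {"D.csv": 962_107, "D_humans.csv": 329_798}

shares = {}
for name, by_action in counts.items():
    # The per-action counts should add up to the reported row total.
    assert sum(by_action.values()) == totals[name]
    shares[name] = {a: n / totals[name] for a, n in by_action.items()}
    for action, share in shares[name].items():
        print(f"{name:14s} {action:16s} {share:6.1%}")
```

The shares show the imbalance clearly: cruise accounts for roughly 45% of D.csv but roughly 72% of D_humans.csv, while lane changes are a small minority in both.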
