Name: RoboCupSimData Top 5 Subset
Creator: Kaggle
License: http://opendatacommons.org/licenses/odbl/1.0/

Data for Machine Learning from RoboCup Soccer Simulation League

Context

This dataset is a subset of a larger dataset of repeated games between some of the top teams (from 2016 and 2017) in the RoboCup Soccer Simulation League (2D), where teams of 11 simulated robots ("agents") compete against each other. Overall, we had 10 different teams play each other, resulting in 45 unique pairings. For each pairing, we ran 25 matches (of 10 minutes each), leading to 1125 matches or more than 180 hours of game play. The generated CSV files amount to 17GB of data zipped, or 229GB unzipped.
The dataset is unique in the sense that it contains both the "ground truth" data (global, complete, noise-free information on all objects on the field) and the noisy, local, and incomplete percepts of each robot. These data are made available as CSV files.
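
The tournament figures quoted above follow from simple counting, and the short sketch below reproduces them. The function name and its parameters are illustrative only and are not part of the dataset or its tooling.

```python
from math import comb


def tournament_size(n_teams, matches_per_pairing, minutes_per_match):
    """Return (pairings, matches, hours) for a round-robin style tournament.

    Illustrative helper: every pair of distinct teams forms one pairing,
    and each pairing is played `matches_per_pairing` times.
    """
    pairings = comb(n_teams, 2)               # unordered pairs of teams
    matches = pairings * matches_per_pairing  # total games played
    hours = matches * minutes_per_match / 60  # total simulated game time
    return pairings, matches, hours


# Full dataset: 10 teams, 25 matches per pairing, 10 minutes per match.
print(tournament_size(10, 25, 10))  # (45, 1125, 187.5)
```

The same count with 5 teams gives 10 pairings, which is consistent with the 10 games in the top 5 subset if each pairing appears once.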

The data provided here is a subset of 10 games from this larger dataset, played among the top 5 teams. It contains positions, velocities, body direction, and view direction, both from the (noisy, limited) view of each agent and as ground truth that is not available to agents during a match.

The full dataset of 1125 games is too large for this platform. It additionally contains logs of all actions of each robot as received from the simulator, as well as agent and game status information as received by each agent from the simulator (these logs, together with the original simulator logs, are an extra 25GB of gzipped data). The full dataset is also suitable for simple imitation learning, or as a starting point for inverse reinforcement learning. More information and links can be found in our Bitbucket repository: https://bitbucket.org/oliverobst/robocupsimdata/src/master/

Content

Here on Kaggle, we provide two sets of different sizes:

a one game set, to explore the data (Z-example-csv.zip)
a ten game set from the top 5 teams (X-top5-csv.zip)

Each game consists of a set of CSV files. For each game in the set, there is one file with ground truth data for all players, one file with the parameters of the game, and forty-four files (two per player, 11 players per team) with the local, incomplete, and noisy "visual" percepts of each player: one file for sensed moving objects, and one file for sensed landmarks.
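
As a starting point for exploring a game, the sketch below tallies the files described above and reads the first few rows of any one CSV file. It uses only the Python standard library. The function names are hypothetical, no particular file or column names are assumed, and treating the first row as a header is an assumption that should be checked against the actual files.

```python
import csv
from itertools import islice

PLAYERS_PER_TEAM = 11
TEAMS_PER_GAME = 2
PERCEPT_FILES_PER_PLAYER = 2  # one for moving objects, one for landmarks


def expected_files_per_game():
    """Count the CSV files that make up one game, per the description above."""
    percepts = PLAYERS_PER_TEAM * TEAMS_PER_GAME * PERCEPT_FILES_PER_PLAYER
    return {
        "ground_truth": 1,
        "parameters": 1,
        "percepts": percepts,
        "total": percepts + 2,
    }


def peek_csv(path, n_rows=5):
    """Return (first_row, next_n_rows) of a CSV file without loading all of it.

    Assumption: the first row is a header. Inspect the result to confirm.
    """
    with open(path, newline="") as f:
        reader = csv.reader(f)
        first_row = next(reader, [])         # empty list if the file is empty
        rows = list(islice(reader, n_rows))  # at most n_rows further rows
    return first_row, rows
```

Reading only a handful of rows keeps memory use low, which matters because the unzipped files can be large.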

Finally, the file landmarks.csv contains the positions of landmarks on the field (flags, lines, goals) in global coordinates. Every game uses this set of landmarks (i.e., there is only one version of this file), and the "global" positions of all landmarks are known to all players. Landmarks have IDs, which players can see if they are close enough. The IDs of landmarks that are farther away may not be perceived by a player.

Attributions and Acknowledgements

The relevant paper for the software and dataset is at http://arxiv.org/abs/1711.01703
O. Michael, O. Obst, F. Schmidsberger, F. Stolzenburg, RoboCupSimData: Software and Data for machine learning from RoboCup Simulation League, 2017.

Team description papers (of the teams that we used to play the tournament) can be found at https://archive.robocup.info/Soccer/Simulation/2D/TDPs/RoboCup/

Inspiration

What can you do with the data? There are a number of options; here is some inspiration:

Predict future positions of players or the ball
"Ghosting" - predict where another team would be in the same situation
Predict successful vs unsuccessful passes
...
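
For the position prediction task listed above, a simple baseline is useful to compare learned models against. The sketch below extrapolates a position from its current velocity and scores predictions by mean Euclidean error. The motion model, the `decay` parameter, and the function names are assumptions made for illustration; they are not taken from the dataset, and any decay value should be chosen to match the simulator settings of the game being analysed.

```python
import math


def predict_position(pos, vel, steps, decay=1.0):
    """Extrapolate an (x, y) position `steps` cycles ahead.

    Assumed model: the object moves by its current velocity each cycle,
    and the velocity is multiplied by `decay` after every cycle.
    decay=1.0 gives plain constant-velocity extrapolation.
    """
    if steps < 0:
        raise ValueError("steps must be non-negative")
    # Accumulated displacement factor: 1 + decay + decay^2 + ... (steps terms).
    factor = sum(decay ** k for k in range(steps))
    return (pos[0] + vel[0] * factor, pos[1] + vel[1] * factor)


def mean_error(predicted, actual):
    """Mean Euclidean distance between paired (x, y) points."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("need two non-empty sequences of equal length")
    total = sum(math.dist(p, a) for p, a in zip(predicted, actual))
    return total / len(predicted)


# Example: an object at the origin moving at (1, 2) per cycle, 3 cycles ahead.
print(predict_position((0.0, 0.0), (1.0, 2.0), 3))  # (3.0, 6.0)
```

Because the ground truth file is noise-free, it can serve as the `actual` values when scoring predictions made from an agent's noisy percepts.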
