Additional data for Big Data Derby 2022.
The competition dataset identifies horses in each race only by "program number", which can differ from race to race, so it provides no way to tell when the same horse ran in multiple races. It also lacks information on finishing position.
This data was extracted from the NYRA website and supplies both missing pieces.
File horse_ids.csv:
track_id, race_date, race: as in the Big Data Derby 2022 data files.
program_number: the number of the horse on the program, as in the provided data files.
horse_id: a unique ID number for the horse across all races.
finishing_place: the position the horse finished in. 0 is an invalid value, 1 is first.
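
As a rough sketch of how these columns might be used, the snippet below builds a lookup from the race key (track_id, race_date, race, program_number) to (horse_id, finishing_place) and then groups races by horse. The sample rows are made up for illustration, and the header row, the date format, the helper names load_horse_ids and races_by_horse, and the choice to turn a finishing_place of 0 into None are all assumptions rather than anything the files guarantee.

```python
import csv
import io
from collections import defaultdict

# Made-up rows in the horse_ids.csv layout described above (not real data).
# Assumes the file has a header row with exactly these column names.
SAMPLE_HORSE_IDS = """track_id,race_date,race,program_number,horse_id,finishing_place
AQU,2019-01-01,1,1,101,2
AQU,2019-01-01,1,2,102,1
AQU,2019-01-01,2,1,103,0
AQU,2019-01-05,3,4,101,1
"""


def load_horse_ids(fileobj):
    """Map (track_id, race_date, race, program_number) -> (horse_id, place)."""
    lookup = {}
    for row in csv.DictReader(fileobj):
        key = (
            row["track_id"].strip(),
            row["race_date"].strip(),
            int(row["race"]),
            # Kept as text and stripped defensively, in case of padding.
            row["program_number"].strip(),
        )
        place = int(row["finishing_place"])
        # 0 is documented as an invalid value, so store it as None.
        lookup[key] = (int(row["horse_id"]), place if place != 0 else None)
    return lookup


def races_by_horse(lookup):
    """Group race keys by horse_id, so repeat runners can be followed."""
    by_horse = defaultdict(list)
    for key, (horse_id, _place) in lookup.items():
        by_horse[horse_id].append(key)
    return dict(by_horse)


lookup = load_horse_ids(io.StringIO(SAMPLE_HORSE_IDS))
by_horse = races_by_horse(lookup)
```

With the real file, `open("horse_ids.csv", newline="")` would take the place of the `io.StringIO` wrapper. The same four-column key can then be used to attach horse_id and finishing_place to rows of the competition files.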
File horse_names.csv:
horse_id (PK): a horse identifier.
horse_name: the name of the horse. This probably isn't useful unless you like reading racehorse names (like I do :) ). However, horses were matched across races by name, so it may be worth checking the names for transcription glitches.
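
One possible way to look for such glitches is to flag pairs of names that are nearly identical but carry different horse_id values. The sketch below uses the standard library's difflib for this. The sample names are invented, and the header row, the 0.85 similarity threshold, and the helper names normalize and suspicious_pairs are assumptions for illustration only. A flagged pair is a candidate for manual review, not proof of an error, since two different horses can have similar names.

```python
import csv
import io
from difflib import SequenceMatcher
from itertools import combinations

# Invented rows in the horse_names.csv layout described above (not real data).
SAMPLE_HORSE_NAMES = """horse_id,horse_name
101,Lucky Strike
102,Lucky  Strike
103,Midnight Runner
104,Midnite Runner
105,Sea Breeze
"""


def normalize(name):
    """Lowercase the name and collapse runs of whitespace."""
    return " ".join(name.casefold().split())


def suspicious_pairs(fileobj, threshold=0.85):
    """Return (id_a, id_b, name_a, name_b) for near-identical names."""
    rows = [(int(r["horse_id"]), r["horse_name"]) for r in csv.DictReader(fileobj)]
    pairs = []
    # Compares every pair of names, so the cost grows quadratically.
    for (id_a, name_a), (id_b, name_b) in combinations(rows, 2):
        ratio = SequenceMatcher(None, normalize(name_a), normalize(name_b)).ratio()
        if ratio >= threshold:
            pairs.append((id_a, id_b, name_a, name_b))
    return pairs


pairs = suspicious_pairs(io.StringIO(SAMPLE_HORSE_NAMES))
```

Because every name is compared with every other, this may be slow on the full file. Grouping names by first letter before comparing is one way to cut the work down, at the cost of missing glitches in the first character.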