
Big Data Derby 2022: Global Horse IDs And Places

Extra data for BDD competition

@kaggle.themarkgreen_big_data_derby_2022_global_horse_ids_and_places

About this Dataset


Additional data for Big Data Derby 2022.

The competition dataset identifies each horse in a race only by its "program number", which can differ from race to race, so there is no way to tell when the same horse ran in multiple races. It also lacks information on finishing position.

This dataset was extracted from the NYRA website and supplies both missing pieces: a horse ID that is consistent across races, and each horse's finishing position.

File horse_ids.csv:

  • track_id, race_date, race: as in the Big Data Derby 2022 data files.
  • program_number: the number of the horse on the program, as in the provided data files.
  • horse_id: a unique ID number for the horse across all races.
  • finishing_place: the position in which the horse finished. 1 is first place; 0 is an invalid value.
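As a minimal sketch of how these columns fit together, the snippet below groups rows by horse_id to find horses that ran in more than one race, skipping rows where finishing_place is 0. The sample rows and the helper name races_by_horse are made up for illustration; only the column names come from the description above.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample rows in the horse_ids.csv layout (all values are made up).
SAMPLE = """track_id,race_date,race,program_number,horse_id,finishing_place
AQU,2019-01-01,1,1,10,1
AQU,2019-01-01,1,2,11,0
AQU,2019-01-12,3,5,10,2
BEL,2019-05-04,2,1A,12,1
"""


def races_by_horse(csv_file):
    """Map horse_id -> list of (track_id, race_date, race, finishing_place)."""
    races = defaultdict(list)
    for row in csv.DictReader(csv_file):
        place = int(row["finishing_place"])
        if place == 0:  # 0 is documented as an invalid value, so skip the row
            continue
        entry = (row["track_id"], row["race_date"], int(row["race"]), place)
        races[int(row["horse_id"])].append(entry)
    return dict(races)


history = races_by_horse(io.StringIO(SAMPLE))

# Horses with more than one valid result, which program_number alone cannot reveal.
repeat_runners = {h: r for h, r in history.items() if len(r) > 1}
print(repeat_runners)
```

To run this against the real file, pass an open handle to horse_ids.csv instead of the in-memory sample. Note that program_number is kept as text here, since values such as "1A" are not numeric.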

File horse_names.csv:

  • horse_id (PK): a horse identifier.
  • horse_name: the name of the horse. This probably isn't useful unless you like reading racehorse names (like I do :) ). However, the data was matched by horse name, so it may be worth checking for transcription glitches.
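One possible way to look for transcription glitches is to flag pairs of different horse IDs whose names are nearly identical. The sketch below uses Python's difflib for this; the names, the near_duplicates helper, and the 0.9 similarity cutoff are all assumptions for illustration, not part of the dataset or of how it was built.

```python
import difflib

# Hypothetical (horse_id, horse_name) pairs; the names are invented.
NAMES = [
    (1, "Midnight Runner"),
    (2, "Midnight Runer"),
    (3, "Golden Harbor"),
    (4, "Silver Thistle"),
]


def near_duplicates(names, cutoff=0.9):
    """Return pairs of distinct IDs whose normalized names are very similar."""
    normalized = [(hid, name.strip().lower()) for hid, name in names]
    pairs = []
    for i, (id_a, name_a) in enumerate(normalized):
        for id_b, name_b in normalized[i + 1:]:
            ratio = difflib.SequenceMatcher(None, name_a, name_b).ratio()
            if ratio >= cutoff:
                pairs.append((id_a, id_b))
    return pairs


suspects = near_duplicates(NAMES)
print(suspects)
```

A flagged pair is only a candidate for manual review: two different horses can legitimately have similar names. The pairwise comparison is quadratic in the number of names, so it may take a while on the full horse_names.csv.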

Tables

Horse Ids

@kaggle.themarkgreen_big_data_derby_2022_global_horse_ids_and_places.horse_ids
  • 148.66 KB
  • 14916 rows
  • 7 columns

CREATE TABLE horse_ids (
  "unnamed_0" BIGINT,
  "track_id" VARCHAR,
  "race_date" TIMESTAMP,
  "race" BIGINT,
  "program_number" VARCHAR,
  "horse_id" BIGINT,
  "finishing_place" BIGINT
);

Horse Names

@kaggle.themarkgreen_big_data_derby_2022_global_horse_ids_and_places.horse_names
  • 115.69 KB
  • 4638 rows
  • 3 columns

CREATE TABLE horse_names (
  "unnamed_0" BIGINT,
  "horse_id" BIGINT,
  "horse_name" VARCHAR
);
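The two tables join on horse_id. As a rough sketch, the snippet below loads a few made-up rows into an in-memory SQLite database and counts starts and wins per horse, ignoring rows with finishing_place 0. The table layout is simplified from the schemas above (the unnamed_0 column is dropped and SQLite types are used), and all inserted values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE horse_ids (
  track_id TEXT, race_date TEXT, race INTEGER,
  program_number TEXT, horse_id INTEGER, finishing_place INTEGER
);
CREATE TABLE horse_names (horse_id INTEGER PRIMARY KEY, horse_name TEXT);
""")

# Hypothetical rows; real data would be loaded from the two CSV files.
conn.executemany(
    "INSERT INTO horse_ids VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("AQU", "2019-01-01", 1, "1", 10, 1),
        ("AQU", "2019-01-01", 1, "2", 11, 0),
        ("AQU", "2019-01-12", 3, "5", 10, 2),
        ("BEL", "2019-05-04", 2, "1A", 12, 1),
    ],
)
conn.executemany(
    "INSERT INTO horse_names VALUES (?, ?)",
    [(10, "Example Alpha"), (11, "Example Beta"), (12, "Example Gamma")],
)

# Starts and wins per horse, skipping the invalid finishing_place value 0.
QUERY = """
SELECT n.horse_name,
       COUNT(*) AS starts,
       SUM(CASE WHEN i.finishing_place = 1 THEN 1 ELSE 0 END) AS wins
FROM horse_ids AS i
JOIN horse_names AS n ON n.horse_id = i.horse_id
WHERE i.finishing_place > 0
GROUP BY n.horse_id, n.horse_name
ORDER BY starts DESC, n.horse_name
"""
results = conn.execute(QUERY).fetchall()
print(results)
```

The SELECT statement itself should carry over to other SQL engines with little change, since it relies only on a plain join, a filter, and a grouped aggregate.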