Baselight

Olympic Historical Dataset From Olympedia.org

Event to Athlete level Olympic Games Results from Athens 1896 to Beijing 2022

@kaggle.josephcheng123456_olympic_historical_dataset_from_olympediaorg

Loading...
Loading...

About this Dataset

Olympic Historical Dataset From Olympedia.org

Context

This dataset is as an attempt to create up to date Olympic Event datasets (from 1986 to 2022 Olympics Game) for any sports/data enthusiast to use to visualise and create some insights on the Olympic Event dataset.

Unique Feature of the Dataset

  • This dataset contains ranking of each sporting event linked to speicfic country / athlete, which could be used to for any performance related analytics
  • This dataset contains string information about the athlete's Bio, which could be useful in understanding more about the athlete

Basic Dataset Information

This dataset contains:

  • 154,902 unique athletes and their biological information i.e. height, weight, date of birth
  • All Winter / Summer Olympic games from 1896 to 2022
  • 7326 unique results (result for a specific event played at an Olympic game)
  • 314,726 rows of athlete to result data which includes both team sports and individual sports
    • each row includes position - which is how well the athlete performed for the specific event (Ranked)
      • Note: not all position is integer - contains strings which contains information on which round / heat they achieved
  • 235 distinct countries (some existing from the past)

How to Use the Dataset for a complete athlete-event dataset

  • Olympic_Athlete_Event_Results.csv - Contains basic athlete to result data (including athlete_id and result_id)
  • LEFT JOIN Olympic_Athlete_Event_Results.csv file with Olympic_Athlete_Bio.csv (on athlete_id) for complete information of the athlete (i.e. height, weight, date of birth when participating in the event)
  • LEFT JOIN Olympic_Athlete_Event_Results.csv with Olympic_Results.csv (on result_id) to obtain detailed information about the result (i.e. start date, sport, result_participants, format, time
  • Aggregate information to obtain features such as:
    • BMI - from height and weight
    • Age of participation - start date of the result subtract the date of birth of the athlete
  • TODO: Create a Kernal to demonstrate how above step works

Where the Data is Obtained

The data is scrapped from www.olympedia.org which has the latest up to date olympic data set from 1896 Athene Summer Olympics to Beijing 2022 Winter Olympics. Web Scrapping Project is provided via the source code in github using Python's BeautifulSoup.

Entity Relationship Table - Schema


Note: Credit to @Richnou for providing the data schema for the dataset!

Data Validation

Athlete to event data is validated by providing some data aggregation to medal tally data and then compared the counts of medals won by each country in all Winter / Summer Olympic games. Details are provided via source code in github

Acknowledgements and Kudos

  • OlyMADMen - The Incredible Community of dedicated Olympic historians and statisticians who worked with Olympians and gathered mass amount of data for olympedia.org.
  • Randi H Griffin - Author of 120 years of Olympic history: athletes and results which this dataset is inspired from.
  • Richard Carlier - Providing Constructive feedback and data schema diagram to help make this dataset better
  • Gwanhee Lee - Provided an iniatial web scrapping for the olympedia.org website
  • David Mo - Mentorship for software engineering practices

Tables

Olympics Country

@kaggle.josephcheng123456_olympic_historical_dataset_from_olympediaorg.olympics_country
  • 5.91 KB
  • 235 rows
  • 2 columns
Loading...

CREATE TABLE olympics_country (
  "noc" VARCHAR,
  "country" VARCHAR
);

Olympics Games

@kaggle.josephcheng123456_olympic_historical_dataset_from_olympediaorg.olympics_games
  • 11.59 KB
  • 64 rows
  • 11 columns
Loading...

CREATE TABLE olympics_games (
  "edition" VARCHAR,
  "edition_id" BIGINT,
  "edition_url" VARCHAR,
  "year" BIGINT,
  "city" VARCHAR,
  "country_flag_url" VARCHAR,
  "country_noc" VARCHAR,
  "start_date" VARCHAR,
  "end_date" VARCHAR,
  "competition_date" VARCHAR,
  "isheld" VARCHAR
);

Olympic Athlete Bio

@kaggle.josephcheng123456_olympic_historical_dataset_from_olympediaorg.olympic_athlete_bio
  • 26.73 MB
  • 155861 rows
  • 10 columns
Loading...

CREATE TABLE olympic_athlete_bio (
  "athlete_id" BIGINT,
  "name" VARCHAR,
  "sex" VARCHAR,
  "born" VARCHAR,
  "height" DOUBLE,
  "weight" VARCHAR,
  "country" VARCHAR,
  "country_noc" VARCHAR,
  "description" VARCHAR,
  "special_notes" VARCHAR
);

Olympic Athlete Event Results

@kaggle.josephcheng123456_olympic_historical_dataset_from_olympediaorg.olympic_athlete_event_results
  • 4.93 MB
  • 316834 rows
  • 11 columns
Loading...

CREATE TABLE olympic_athlete_event_results (
  "edition" VARCHAR,
  "edition_id" BIGINT,
  "country_noc" VARCHAR,
  "sport" VARCHAR,
  "event" VARCHAR,
  "result_id" BIGINT,
  "athlete" VARCHAR,
  "athlete_id" BIGINT,
  "pos" VARCHAR,
  "medal" VARCHAR,
  "isteamsport" BOOLEAN
);

Olympic Games Medal Tally

@kaggle.josephcheng123456_olympic_historical_dataset_from_olympediaorg.olympic_games_medal_tally
  • 19.08 KB
  • 1807 rows
  • 9 columns
Loading...

CREATE TABLE olympic_games_medal_tally (
  "edition" VARCHAR,
  "edition_id" BIGINT,
  "year" BIGINT,
  "country" VARCHAR,
  "country_noc" VARCHAR,
  "gold" BIGINT,
  "silver" BIGINT,
  "bronze" BIGINT,
  "total" BIGINT
);

Olympic Results

@kaggle.josephcheng123456_olympic_historical_dataset_from_olympediaorg.olympic_results
  • 4.92 MB
  • 7394 rows
  • 12 columns
Loading...

CREATE TABLE olympic_results (
  "result_id" BIGINT,
  "event_title" VARCHAR,
  "edition" VARCHAR,
  "edition_id" BIGINT,
  "sport" VARCHAR,
  "sport_url" VARCHAR,
  "result_date" VARCHAR,
  "result_location" VARCHAR,
  "result_participants" VARCHAR,
  "result_format" VARCHAR,
  "result_detail" VARCHAR,
  "result_description" VARCHAR
);

Share link

Anyone who has the link will be able to view this.