Marathon Time Predictions

Predict Marathon Results from Athletes Open Data Sources

@kaggle.girardi69_marathon_time_predictions

About this Dataset

Context

Every marathoner has a time goal in mind, and the result is the product of months of training. Long runs, strides, weekly kilometers and physical conditioning all contribute to the final time. Marathon time prediction is an art, usually guided by expert physiologists who prescribe the weekly workouts and the milestones leading up to the marathon.
Unfortunately, runners face many distractions while preparing for a marathon (work, family, illness), so each of us arrives at the start line with our own story.
The "simple" approach is to look at the data after the race: the leaderboard.

But what if we could link the marathon result to the athlete's training history? Could we find that "non-orthodox" training plans give good results?

The Athlete Training History

As a start, I'll take just two data points from the athlete's history, both meaningful and easy to extract: the average number of kilometers run per week during the 4 weeks before the marathon, and the average speed at which the athlete ran those kilometers.

Meaningful, because the last month of training sums up all the previous months that led to the marathon.

Easy to extract, because on Strava I can get a "side-by-side" comparison between myself and the reference athlete. Well, "easy" is an overstatement: I have to search for every athlete and write down those numbers as of the exact day of the marathon; otherwise the average will include the rest days after the marathon.
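The extraction rule above (a 4-week window that ends on marathon day, so post-race rest days are left out) can be sketched in Python. This is a minimal sketch, not the author's actual workflow: the `Activity` record, its fields and the function names are hypothetical, and treating sp4week as total distance divided by total time (in km/h) is my assumption.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Activity:
    """One run copied from a training log (hypothetical record layout)."""
    day: date
    km: float
    hours: float


def four_week_window(activities, marathon_day):
    """Keep the runs from the 28 days ending on marathon day, inclusive.

    Runs after the marathon (recovery days) fall outside the window, so they
    cannot dilute the averages.
    """
    start = marathon_day - timedelta(days=27)
    return [a for a in activities if start <= a.day <= marathon_day]


def km4week_sp4week(activities, marathon_day):
    """Return (km4week, sp4week) for the 4 weeks ending on marathon day."""
    window = four_week_window(activities, marathon_day)
    total_km = sum(a.km for a in window)
    total_hours = sum(a.hours for a in window)
    km4week = total_km / 4  # average kilometers per week
    # Assumed definition: overall speed = total distance / total time (km/h).
    sp4week = total_km / total_hours if total_hours else 0.0
    return km4week, sp4week
```

Fixing the window end at marathon day, rather than at the date the numbers are copied, is what keeps the rest days out of both averages.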

My future work is to extract more data and build better algorithms. Thank you for any help in understanding the data, or for any suggestions.

Content

id:
simple counter

Marathon:
The name of the marathon from which the data were extracted. The data come from Strava's "Side by side comparison" and from the official final marathon results.

Name:
The athlete's name. There are still some UTF-8 encoding problems; I'll fix them soon.

Category:
The sex and age group of the runner:

  • MAM Male athletes under 40 years
  • WAM Female athletes under 40 years
  • M40 Male athletes between 40 and 45 years

km4week
This is the average number of kilometers run per week over the last 4 weeks before the marathon, marathon included. If, for example, km4week is 100, the athlete has run 400 km in the four weeks before the marathon.
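Because km4week is a weekly average, converting between it and the 4-week total is a single multiplication. A minimal sketch; both helper names are hypothetical:

```python
def total_km_from_km4week(km4week: float, weeks: int = 4) -> float:
    """km4week is a weekly average, so the 4-week total is km4week * weeks."""
    return km4week * weeks


def km4week_from_weekly_totals(weekly_km: list[float]) -> float:
    """Average the four weekly totals (the marathon week included)."""
    if len(weekly_km) != 4:
        raise ValueError("expected exactly 4 weekly totals")
    return sum(weekly_km) / 4


print(total_km_from_km4week(100))  # 400
```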

sp4week
This is the average speed of the athlete over the last 4 training weeks. The average counts all the kilometers run, including the slow kilometers before and after each workout. A typical running session might be 2 km of slow running, then 12-14 km of fast running, and finally another 2 km of slow running. sp4week averages over all of these, and over time it is one of the numbers that needs to be refined.
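The example session shows why the warm-up and cool-down matter. Average speed over a session is total distance divided by total time, so the slow kilometers pull it below the fast pace. A minimal sketch; the segment speeds of 10 km/h and 14 km/h and the 13 km middle section are hypothetical values chosen only for illustration:

```python
def average_speed(segments):
    """Average speed in km/h over (km, km_per_hour) segments.

    Average speed is total distance / total time, not the mean of the
    segment speeds, so each segment is weighted by the time spent in it.
    """
    total_km = sum(km for km, _ in segments)
    total_hours = sum(km / speed for km, speed in segments)
    return total_km / total_hours


# Hypothetical session: 2 km slow, 13 km fast, 2 km slow.
session = [(2, 10.0), (13, 14.0), (2, 10.0)]
print(round(average_speed(session), 2))  # 12.8
```

With those assumed speeds the session averages about 12.8 km/h, although most of it was run at 14 km/h.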

cross training:
If the runner is also a cyclist or a triathlete, does that count? Use this parameter to see whether the athlete also cross-trains in other disciplines.

Wall21:
In decimal hours. This is the tricky field. As a marathoner, a good performance means running the second half marathon in the same time as the first half (an even split). If, for example, I run the first half in 1h30m, I must finish the marathon in 3h to have done a good job. If I finish in 3h20m, I started too fast and hit "the wall". In that case my training history is a less reliable predictor, since I misjudged my own result.
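The even-split rule can be expressed as the ratio of second-half time to first-half time. A minimal sketch, assuming Wall21 holds the first-half (21 km) split in decimal hours, which the field name suggests but the description does not state outright; the 5% tolerance for "hit the wall" is an arbitrary, hypothetical cutoff:

```python
def second_half_ratio(wall21: float, marathon_time: float) -> float:
    """Second-half time divided by first-half time; 1.0 is an even split.

    Assumes wall21 is the first-half split and marathon_time the final time,
    both in decimal hours.
    """
    return (marathon_time - wall21) / wall21


def hit_the_wall(wall21: float, marathon_time: float, tolerance: float = 0.05) -> bool:
    """True if the second half was more than `tolerance` slower (hypothetical cutoff)."""
    return second_half_ratio(wall21, marathon_time) > 1 + tolerance


print(hit_the_wall(1.5, 3.0))          # False: 1h30m + 1h30m, even split
print(hit_the_wall(1.5, 3 + 20 / 60))  # True: second half took 1h50m
```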

Marathon time:
In decimal hours. This is the final result: based on my training history, I must predict my expected marathon time.
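Since times are stored in decimal hours, a pair of conversion helpers keeps results readable (3h20m is 3.333..., not 3.20). A minimal sketch; the function names are hypothetical:

```python
def to_decimal_hours(hours: int, minutes: int, seconds: int = 0) -> float:
    """Convert h:m:s to decimal hours, e.g. 3h20m -> 3.333..."""
    return hours + minutes / 60 + seconds / 3600


def to_hms(decimal_hours: float) -> str:
    """Convert decimal hours back to an h:mm:ss string, rounded to the second."""
    total_seconds = round(decimal_hours * 3600)
    h, rem = divmod(total_seconds, 3600)
    m, s = divmod(rem, 60)
    return f"{h}:{m:02d}:{s:02d}"


print(to_hms(to_decimal_hours(3, 20)))  # 3:20:00
```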

Category:
This is an ancillary field that groups the athletes by final time. It gives some direction, so feel free to use or discard it. The groups are:

  • A results under 3h
  • B results between 3h and 3h20m
  • C results between 3h20m and 3h40m
  • D results between 3h40m and 4h
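The grouping can be reproduced from the final time. A minimal sketch: the list above does not say which group an exact boundary time (for example 3h00m) belongs to, so putting it in the slower group is my assumption, as is returning None for 4h and over, which the list does not cover. Comparing whole seconds avoids floating-point surprises at the boundaries.

```python
def time_category(marathon_time: float):
    """Map a final time in decimal hours to group A-D (None if 4h or slower)."""
    seconds = round(marathon_time * 3600)  # whole seconds: exact boundary checks
    if seconds < 3 * 3600:
        return "A"  # under 3h
    if seconds < 3 * 3600 + 20 * 60:
        return "B"  # 3h to 3h20m
    if seconds < 3 * 3600 + 40 * 60:
        return "C"  # 3h20m to 3h40m
    if seconds < 4 * 3600:
        return "D"  # 3h40m to 4h
    return None     # not covered by the listed groups


print(time_category(3.0))  # B
```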

Acknowledgements

Thank you to the main sources of athlete data, GARMIN and STRAVA.

The Goal of this Competition:

Based on my training history, I must predict my expected marathon time. What other relevant data could help me be more precise? Heart rate, cadence, speed training, anything else? And how could I obtain that data?
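Before adding heart rate or cadence, it helps to have a baseline that uses only km4week and sp4week, so any new feature can be judged by whether it reduces the prediction error. A minimal sketch, assuming an ordinary least-squares linear model (my choice for illustration, not the author's method), written in plain Python without external libraries; `fit_linear` and `predict` are hypothetical names:

```python
def fit_linear(rows, targets):
    """Ordinary least squares via the normal equations (X^T X) b = X^T y.

    rows: feature tuples, e.g. (km4week, sp4week); targets: marathon times.
    Returns [intercept, weight_1, weight_2, ...].
    """
    X = [[1.0, *map(float, r)] for r in rows]  # prepend a constant column
    n = len(X[0])
    # Augmented matrix [X^T X | X^T y].
    A = []
    for i in range(n):
        row = [sum(x[i] * x[j] for x in X) for j in range(n)]
        row.append(sum(x[i] * y for x, y in zip(X, targets)))
        A.append(row)
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        if abs(A[col][col]) < 1e-12:
            raise ValueError("features are collinear; cannot fit")
        for r in range(n):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * c for a, c in zip(A[r], A[col])]
    return [A[i][n] / A[i][i] for i in range(n)]


def predict(coef, features):
    """Apply fitted coefficients to one athlete's features."""
    return coef[0] + sum(w * f for w, f in zip(coef[1:], features))
```

A candidate feature such as average heart rate or cadence would simply become one more value in each row; comparing the error on held-out athletes with and without it shows whether it helps.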
