A data-driven approach to analyzing athletics peak performance

My passion for running and track and field has accompanied me through the different phases of my life. I used to run for fun as a child and competed in school races during my high-school years. Only during my college years did running take a pause, while my curiosity about data science progressively took its place.

Five years later, after completing my bachelor’s and master’s degrees, at almost 24 years old, something changed in me again. My desire to run suddenly woke up from its long sleep and told me to give it another chance: I felt that I could become the runner I used to be.

The problem was that it is not easy to start a very competitive sport where training progression is incredibly slow and the field is dominated by young athletes with many years of training on their shoulders (and legs). The question I asked myself was then:

“Given my age, at which running distance can I perform best?”

The pressure of age and competition

During my high-school years, the athletics event that scared me but also fascinated me the most was the 5000m. At that time, my coach told me that, given my young age, I was not ready to run that distance yet. However, my eagerness to challenge myself and to compete against senior athletes still led me to run it, even at the cost of struggling to finish and achieving only a mediocre performance.

At first sight, 5000m may seem short, especially if you think of it as running 5km. However, it takes on a different meaning when seen as covering 12.5 laps on a track while running close to the aerobic threshold: a single lap may feel endless, and you can trust me on this.

Athletes on the finish line of a 5000m during a World Championship

Why age matters when running a 5000m

Mastering this distance requires a mix of endurance and speed that is not easy to train and maintain. Thus, one of the most obvious questions is: “at which age is my body best prepared to take on this distance?”.

This question could also be formulated as “which age corresponds to the peak performance for 5000m?”. Junior athletes, usually younger than 17 years old, prefer to focus on shorter distances, up to 1500m, since speed is their key strength and long distances may even be dangerous for a body that is still growing. Conversely, master athletes prefer longer distances, since it is natural to lose some speed with age while endurance can still be trained and takes more years to decay.

The answer is in the data: a data scientist is all we need

Now that my age is similar to that of the athletes I once considered senior, I can finally tackle the challenge more seriously and test my limits on the 5000m. This time, besides my willpower and a good coach, I will also have a new ally by my side during this long adventure. I didn’t do any sport for many years while attending college, but all the time I spent accumulating data science knowledge should find a way to pay me back.

The goal of this post is thus to study peak performance in the 5000m event and how it varies with age. I’m going to do it with a data-driven approach, using visualization tools on a dataset containing historical data of the best 5000m performances of athletes affiliated with the Italian athletics federation. More details about the dataset are given in the next section.

Track and field dataset overview and where to find it

The data I’m going to use for this research contains over 900k entries of the best performances of athletes who are, or used to be, affiliated with an Italian team (including myself) recognized by FIDAL (Federazione Italiana di Atletica Leggera). It covers more than 20 track and field running events from 2005 to 2021, ranging from very short distances such as the 60m indoors and 100m outdoors to extreme endurance events like the marathon or the 100km road race.

The dataset, built specifically for this post, is publicly available on Kaggle. The data has been scraped from the official online ranking page of FIDAL. The script used to scrape the data is open-source and can be consulted on my GitHub page.

Peak performance analysis of the 5000m event in 5 charts

1. Number of athletes PB by age

As a quick warmup, we can open this data analysis section by visualizing the age at which athletes ran their 5000m PB (Personal Best). At first glance, it is immediately clear that high-school athletes are still far from their true potential at this distance.

Number of athletes PB by age

Athletes who are less than 18 years old still have to develop their endurance for the 5000m. They usually focus on shorter distances, such as 800m and 1500m, where speed can play to their advantage. The largest number of athletes ran their PB at 18 or 19 years old. These precocious best performances are probably explained by the fact that most junior athletes leave athletics before or during college (me included).

After peaking before 20 years old, the number of 5000m participants that set their PB has another small peak after 40 years old due to the participation of master athletes that decide to resume or start their athletics career later in their life.

However, before drawing any conclusion on peak performance, we should keep in mind that this data includes the PBs of all athletes, both amateurs and professionals. Given the huge proportion of amateurs, the data could be biased by athletes ending their career too early or beginning it too late, in both cases outside the window of what could be their peak age.
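The counting behind a chart like this one can be sketched in a few lines of plain Python. This is only an illustration: the row layout (athlete id, age at the race, time in seconds), the sample values, and the function names are assumptions of mine, not the actual columns of the dataset.

```python
from collections import Counter

# Hypothetical rows: (athlete_id, age_at_race, time_in_seconds).
# The real dataset may use different columns and units.
results = [
    ("a1", 17, 1080.0),
    ("a1", 19, 1010.0),
    ("a2", 18, 995.0),
    ("a2", 19, 1002.0),
    ("a3", 42, 1190.0),
]

def personal_bests(rows):
    """Return {athlete_id: (age, seconds)} for each athlete's fastest time."""
    best = {}
    for athlete, age, seconds in rows:
        if athlete not in best or seconds < best[athlete][1]:
            best[athlete] = (age, seconds)
    return best

def pb_count_by_age(rows):
    """Count how many athletes set their PB at each age."""
    return Counter(age for age, _ in personal_bests(rows).values())

counts = pb_count_by_age(results)
for age in sorted(counts):
    print(age, counts[age])
```

Each athlete contributes exactly one PB, so an athlete who quits at 19 is counted at 19 even if they might have run faster later. This is the bias discussed above.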

2. Average time by age

The next chart completes the previous one by representing the average athletes’ PB by age. While the previous chart only gave us a statistical count on the number of PBs at each age, this visualization focuses on the quality of the PB, thus on the time.

Average time by age

The lowest average PB for the 5000m is found at around 23 years old for both genders. At that age, males have an average time of 16’26 and females of 18’56. Between 20 and 40 years old, performance declines more slowly after the peak age for females than for males, while beyond 40 it accelerates almost equally for both genders. Finally, the decline gets sharper for athletes who are more than 60 years old.
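As a rough sketch of the aggregation behind this chart, the snippet below averages PBs per gender and age and prints them in the minutes’seconds notation used in this post. The row layout and the sample values are made up for illustration (I picked them so that the averages at 23 reproduce the two figures quoted above), and the helper names are hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical PB rows: (gender, age_at_pb, pb_seconds). Values are invented.
pbs = [
    ("M", 23, 980.0),
    ("M", 23, 992.0),
    ("F", 23, 1130.0),
    ("F", 23, 1142.0),
    ("M", 45, 1100.0),
]

def average_pb_by_age(rows):
    """Return {(gender, age): mean PB in seconds}."""
    groups = defaultdict(list)
    for gender, age, seconds in rows:
        groups[(gender, age)].append(seconds)
    return {key: mean(times) for key, times in groups.items()}

def format_time(seconds):
    """Render seconds as minutes'seconds, rounded to the nearest second."""
    total = round(seconds)
    return f"{total // 60}'{total % 60:02d}"

averages = average_pb_by_age(pbs)
for (gender, age), avg in sorted(averages.items()):
    print(gender, age, format_time(avg))
```

Working in seconds and formatting only at the end keeps the averaging free of string handling.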

3. Number of top 300 athletes PB by age

For the next chart, we are going to filter the data so as to retain only the top 300 athletes with the fastest PBs. This step removes most amateur athletes and narrows our scope mostly to professional athletes.

Number of top 300 athletes PB by age

Most male elite athletes ran their PB when they were between 22 and 26 years old, while most female elite athletes ran it between 20 and 22. Hence, we can place the 5000m peak performance for males and females at around 25 and 21 years old respectively. However, a significant number of females set their PB even when they were over 40.

However, we should still keep in mind that this analysis only considers a small sample of 300 elite athletes and a bigger population should be analyzed to draw more precise conclusions.
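The filtering step itself is simple: sort the PBs from fastest to slowest and keep the first n. Here is a minimal sketch, assuming one row per athlete with a PB in seconds. The row layout, the data, and the `top_n` helper are hypothetical, and the sketch ignores gender because the text does not state whether the cut is applied per gender.

```python
from collections import Counter

# Hypothetical PB rows: (athlete_id, age_at_pb, pb_seconds).
pbs = [
    ("a1", 24, 805.0),
    ("a2", 19, 1010.0),
    ("a3", 25, 812.0),
    ("a4", 41, 1195.0),
    ("a5", 22, 820.0),
]

def top_n(rows, n):
    """Keep the n rows with the fastest (smallest) PB."""
    return sorted(rows, key=lambda row: row[2])[:n]

elite = top_n(pbs, 3)  # n=300 would match the filter described in this post
elite_ages = Counter(age for _, age, _ in elite)
print(sorted(elite_ages.items()))
```

With such a small cut, a handful of athletes can shift the histogram noticeably, which is why the sample-size caveat above matters.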

4. Age of top 100 athletes PB

Next, we narrow the selection further, keeping only the top 100 athletes, and zoom in on their individual PBs.

Age of top 100 athletes PB

Most of the male athletes that have a sub-13’30 PB ran it when they were 24 to 28 years old. Similarly, most of the female athletes that have a sub-15’30 PB ran it when they were 26 to 31 years old.

Even though female athletes seem to peak earlier, they can maintain a high level of performance at more advanced ages. Males in the 30-35 age group can’t match the PBs of males in the 20-25 age group. Conversely, some women in the 30-35 age group ran even faster than the women in the 20-25 group. Another fact supporting the conclusion that women have a longer peak is that, while only 2 males ran their PB at over 35 years old, 5 females did.
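A comparison between age groups like the one above boils down to taking the fastest PB within each band. The sketch below shows one way to do it. The bands are treated as inclusive on both ends, and both the data and the `fastest_by_band` helper are invented for illustration.

```python
# Hypothetical PB rows: (gender, age_at_pb, pb_seconds). Values are invented.
pbs = [
    ("M", 22, 800.0),
    ("M", 24, 795.0),
    ("M", 33, 815.0),
    ("F", 21, 925.0),
    ("F", 32, 918.0),
    ("F", 36, 929.0),
]

BANDS = [(20, 25), (30, 35)]  # inclusive age bands, an assumption of this sketch

def fastest_by_band(rows, bands=BANDS):
    """Return {(gender, band): fastest PB in seconds} for the given bands."""
    fastest = {}
    for gender, age, seconds in rows:
        for low, high in bands:
            if low <= age <= high:
                key = (gender, (low, high))
                fastest[key] = min(seconds, fastest.get(key, seconds))
    return fastest

for key, seconds in sorted(fastest_by_band(pbs).items()):
    print(key, seconds)
```

Rows outside every band, such as the 36-year-old in the sample, are simply ignored.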

We can conclude this peak performance analysis by visualizing some useful insights into age-related performance thresholds with the following cumulative distribution chart.

5. Number of athletes PB under different time thresholds

Number of athletes PB under different time thresholds

From 2005 to 2021, only 76 unique male athletes ran the 5000m under 14’00, and 496 ran under 15’00. Among females, only 29 athletes ran under 16’00, 154 ran under 17’00, and 446 ran under 18’00. For both genders, the curve starts to flatten into its plateau for times slower than 22’00.
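Counts like these are points on a cumulative distribution: for each threshold, how many athletes have a PB strictly faster than it. With the PBs sorted, a binary search gives each count directly. This is a minimal sketch on invented data, and `count_under` is a hypothetical helper name.

```python
from bisect import bisect_left

# Hypothetical PBs in seconds, one per unique athlete.
pbs = sorted([899.0, 825.0, 930.0, 850.0, 838.0, 900.0, 880.0])

def count_under(sorted_pbs, threshold_seconds):
    """Number of PBs strictly faster than the threshold."""
    return bisect_left(sorted_pbs, threshold_seconds)

for minutes in (14, 15, 16):
    print(f"under {minutes}'00:", count_under(pbs, minutes * 60))
```

Using `bisect_left` means a PB of exactly 15’00 is not counted as "under 15’00".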

Trust the data but listen to your body

It’s never too late to start running: I started twice, first at 12 and then again at 24 years old. During my first cycle, my PB was a 17’33 run at 15 years old. When I started running again, even my first new sub-20′ was a huge milestone for me. At the time of writing this article, I have first matched my previous best shape and now, after plenty of failed attempts, I’m on my way to breaking the 16′ barrier.

As we have seen in the above charts, even athletes close to the 40-year-old threshold can compete with younger elite 5000m athletes. The available data showed us that 21 to 26 years old is the age when elite athletes tend to run their PB. However, the peak age is not defined by a mathematical model, but it is influenced by genetics and environmental factors.

In most cases, athletes’ biological age doesn’t correspond to their legal age, and their biological age may not necessarily correspond to the peak performance age. Thus, peak performance age should only be taken as a reference: it shouldn’t discourage you from running your dream time and climbing the rankings, no matter your age.

Myself running: when I was 15 (left) and 25 (right)
