There is nothing new
about using biometric data to estimate an athlete's level of
"fitness." Researchers have been perfecting various measurements of
fitness for over forty years. But it wasn't until the advent of the fitness
tracker or GPS watch that the underlying equations behind these measurements
could be applied to the average weekend warrior. You won't find this data in
Samsung Health, and likely not in Apple Health either, because those companies
are more interested in producing consumer products for people who like to count
steps and receive phone calls through their wrists. The companies that are
serious about competitive training, though, do provide us with this data.
Every company that
is interested in providing this kind of data to users has its own preferred
approach. It is all based more or less upon the same underlying principles,
but it's served up slightly differently. The numbers mean slightly different
things, depending on how
they've been calculated. For our purposes here, I'll compare Garmin's
"Training Status" and Strava's "Fitness & Freshness"
measurements.
Both Training Status
and Fitness & Freshness are calculated primarily from moving averages
of "Training Load," and both approaches are pretty interesting in light
of what the teams who designed them wanted to accomplish.
To be brief,
"training load" is a measurement of how much exercise you've done
recently, and how vigorous that exercise has been. "How much" is easy
to determine simply by adding up how many hours, minutes, and seconds you've
spent exercising over a given calendar period. "How vigorous" is a
question that ultimately comes down to a physiologist's preferred measurement
of workout intensity. Strava's team prefers to analyze heart rate data during
exercise. The higher your heart rate, the harder you're exercising. Garmin's
team, by contrast, prefers to measure exercise intensity with a slightly more
technical analysis: excess post-exercise oxygen consumption (EPOC). My guess
is that Garmin estimates EPOC by analyzing how long it takes the athlete to
recover from a given exercise session.
Both of these
measurements have pros and cons. One mark in Strava's favor is the relative
simplicity of the calculation. Time + heart rate = load. (That's not the exact
calculation, but you get the idea.) But the drawback to a calculation this
simple is that anaerobic activity can increase a person's heart rate without
doing much in the way of training load for, say, cycling. A competitive cyclist
can get her heart rate up during a 30-minute arms workout without impacting her
overall "training load" for cycling. In fact, she might go out for a
30-mile ride immediately following the arms workout without feeling too much
different than she would have otherwise. By contrast, Garmin's EPOC calculation
will capture that nuance. That same cyclist's post-weight-lifting
EPOC will be quite short compared to her 30-mile ride, and her
Garmin-calculated Training Load will adjust accordingly.
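To make the contrast concrete, here's a minimal sketch in Python of a heart-rate-only load calculation next to an EPOC-keyed one. The function names, zone weights, and numbers are all my own illustrative choices, not Strava's or Garmin's actual formulas:

```python
# Illustrative only -- neither Strava's nor Garmin's real calculation.

def hr_based_load(minutes_in_zone: dict) -> float:
    """'Time + heart rate = load': minutes spent in each heart-rate zone,
    weighted more heavily the higher the zone."""
    zone_weights = {1: 1, 2: 2, 3: 3, 4: 4, 5: 5}  # made-up weights
    return sum(zone_weights[zone] * mins for zone, mins in minutes_in_zone.items())

def epoc_based_load(estimated_epoc: float) -> float:
    """Load keyed to estimated post-exercise oxygen consumption: a session that
    leaves little oxygen debt contributes little load, however high the heart rate."""
    return estimated_epoc  # a real system would scale and normalize this

# A hard 30-minute arms workout, mostly in zones 3-4:
arms_workout = {3: 15, 4: 15}
# A steady ~2-hour, 30-mile ride, mostly in zones 2-3:
long_ride = {2: 60, 3: 60}

print(hr_based_load(arms_workout))  # 105 -> still counts toward "cycling" load
print(hr_based_load(long_ride))     # 300
```

A heart-rate-only model has no way of knowing that the arms session did little for cycling fitness; an EPOC-keyed measure would score it much lower.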
But a mark against
Garmin's concept of Training Load is that it fails to account for real-world
factors. What I mean is, Garmin can't measure EPOC directly through biometric
testing, so they estimate it through heart rate measurements. If I go for a
ten-mile run, and then get caught in bad traffic on the drive home, Garmin's
calculations will erroneously assume that I'm having a hard time recovering
from my run, and my Training Load number will rise. If I take a nap or sit in a
hot tub immediately following my run, I'll have a much better EPOC profile, and
my Training Load number will fall. So different non-exercise circumstances can
impact Garmin's estimate of Training Load even when they probably shouldn't.
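If my guess above is right, and EPOC is inferred from how quickly heart rate settles after a session, the traffic-jam confound might look something like this. This is a purely hypothetical sketch; I have no insight into Garmin's actual method:

```python
def estimated_epoc(post_exercise_hr: list, resting_hr: int) -> float:
    """Crude proxy: the longer heart rate stays elevated above resting after
    a workout, the larger the assumed oxygen debt (and the training load)."""
    return sum(max(hr - resting_hr, 0) for hr in post_exercise_hr)

resting = 50
nap_after_run     = [120, 95, 75, 60, 55, 52]    # heart rate settles quickly
traffic_after_run = [120, 110, 105, 100, 95, 90] # stress keeps heart rate elevated

print(estimated_epoc(nap_after_run, resting))      # 157
print(estimated_epoc(traffic_after_run, resting))  # 320 -> load rises "erroneously"
```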
To round things out,
Garmin outputs a "Training Status," based on a 7-day moving average
of Training Load combined with the athlete's VO2 max data. That's not a bad
estimate, but there's a problem: VO2 max is a measurement that doesn't tend to
move much, and when it does, it moves steadily over time. It's unlikely to
change meaningfully within a single week. Some of the underlying data used to estimate
VO2 max, however, can change: namely, if you have a birthday this week, your
age will change; if your weight tends to fluctuate because of water retention,
diet, menstruation, or any of the other things that make small differences in a
person's weight, the number you see on the scale will change. These things can
have a statistically relevant impact on the output of the VO2 max estimation
equation. But remember: it's
just an equation. It aims to estimate VO2 max. If your estimate changes
by a point here and there, it's unlikely that your VO2 max actually changed.
It's far more likely that you had some slight weight fluctuation or something.
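One concrete way to see the weight effect: VO2 max is conventionally reported relative to body mass, in mL/kg/min. Garmin's actual estimation equation is proprietary, but just holding your absolute oxygen uptake constant, a small swing in the weight fed into the calculation moves the reported number. The figures below are made up for illustration:

```python
def relative_vo2max(absolute_vo2_ml_per_min: float, weight_kg: float) -> float:
    """VO2 max expressed per kilogram of body mass (mL/kg/min)."""
    return absolute_vo2_ml_per_min / weight_kg

absolute_vo2 = 3_750  # mL/min, held constant -- true aerobic capacity hasn't changed

print(round(relative_vo2max(absolute_vo2, 75.0), 1))  # 50.0 mL/kg/min
print(round(relative_vo2max(absolute_vo2, 73.5), 1))  # 51.0 -- up a "point" from water weight alone
```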
The result of all
this is a "Training Load" and "Training Status" output that
is roughly on point, but somewhat confusing. Take a look at mine:
Over this period, I
inexplicably vacillated between "productive" and
"maintaining" before finally ending up at "unproductive."
Then I went back to vacillating during my recovery week. It wasn't until the
last three days that Garmin recognized I was actually recovering. And, I hasten to add, I am training under a training
plan supplied by Garmin through the Garmin Connect app itself.
That said, Garmin
did get things right in general. At the
end of my third week of training, I had run nine consecutive days and was
feeling tired, so "unproductive" might not be linguistically
accurate, but it was certainly true that I needed some rest. And Garmin did
recognize the recovery week eventually.
Strava's
"Fitness & Freshness" curves are based on what they call an
"impulse response model." That sounds fancy, but all it really means
is that Strava uses a weighted moving average of training load based on
activity duration and heart rate. Precisely how they choose to weight the
moving average is a mystery to me, but when compared to Garmin's data, Strava's
seems to place slightly more weight on the past. While Garmin states with
certainty that their output is based on a 7-day moving average, Strava does not
state how long their time window is. I would venture to guess, though, that
their time window is three weeks.
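For readers who want to see the idea behind an impulse-response model, here's a rough sketch: fitness and fatigue as exponentially weighted moving averages of daily training load, with a long time constant for fitness and a short one for fatigue. The 42- and 7-day constants below are commonly cited defaults for this style of model, not Strava's actual (unpublished) parameters, and the daily loads are invented:

```python
def impulse_response(daily_load: list, tau_days: float) -> float:
    """Exponentially weighted moving average of daily load: each day the
    running value decays toward that day's load with time constant tau_days."""
    value = 0.0
    for load in daily_load:
        value += (load - value) / tau_days
    return value

loads = [60, 0, 80, 100, 0, 70, 90, 0, 0, 120]   # hypothetical daily training loads

fitness = impulse_response(loads, tau_days=42)   # long memory -> moves slowly
fatigue = impulse_response(loads, tau_days=7)    # short memory -> moves fast
form = fitness - fatigue                         # Strava's "Form" is this difference

print(round(fitness), round(fatigue), round(form))  # 11 42 -31: fatigued relative to fitness
```

Because the fatigue curve reacts much faster than the fitness curve, a hard block of training drives "Form" negative, and a recovery week lets fatigue fall while fitness stays roughly flat.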
Why three weeks?
Because when you access Strava's "weekly effort" graphs from their
mobile app (these graphs are strangely unavailable in the browser portal), the
area denoting "consistent training" on the graph adjusts based on the
previous three weeks. I can see this by watching how it moves with my
week-to-week effort.
This longer time window provides what I believe to be a better overall measure
of a person's fitness level. Here's a piece
of my Fitness and Fatigue curves, covering my recent training regimen:
As you can see,
Strava tracked my fitness level as increasing over the first three weeks of
training; then, during my recovery week, my fitness curve stayed relatively
flat, while my fatigue curve fell. This is, at the least, an accurate
representation of what my training schedule was supposed
to achieve.
On the other hand,
take a look at the local maximum in that graph. On March 10, I went for a long
run and in doing so achieved a fitness level of 81, and a fatigue level of 114.
How should an athlete interpret that
kind of information? Strava supplies a third number, called "Form,"
which is nothing more than the arithmetic difference between Fitness and
Fatigue: in my case, 81 - 114 = -33. This should correspond to how
"fresh" I was feeling that day. Using this data, I can say that I was
fit, but fatigued. Strava seems to
have accurately assessed my feelings. What they didn't do was give me a direct
recommendation, as Garmin did. Garmin told me right then and there that my
training was getting unproductive and I needed rest.
There is no
"right answer" here. I find both sets of data useful in their own
way. But I am a very atypical athlete. Most people who use GPS watches aren't
used to calculating various weighted averages and applying statistical models
to time series. It just so happens that I do this for a living, and my great
familiarity with data science puts me at an advantage for interpreting
calculations like these.
The average athlete
-- i.e., the average person who does not work in data science -- needs a little
more help interpreting this information. To that end, I can tell you this:
Garmin's Training Load and Training Status numbers jump around a bit, because they
only look at your most recent training week; but they tend to get close to a
good recommendation if you're seeing the same output two or three days in a
row. Meanwhile, Strava's Fitness & Freshness gives you good perspective on
your overall response to training, but you should probably not take the data
too seriously if you are not actively engaged in an actual training plan of
some kind.
Always take this
data with a grain of salt. But if you can manage to think like a
biostatistician, you can get some good information out of these numbers.