2019-03-19

Comparing Garmin's Training Status And Strava's Fitness & Freshness


There is nothing new about using biometric data to estimate an athlete's level of "fitness." Researchers have been perfecting various measurements of fitness for over forty years. But it wasn't until the advent of the fitness tracker or GPS watch that the underlying equations behind these measurements could be applied to the average weekend warrior. You won't find this data in Samsung Health, and likely not in Apple Health either, because those companies are more interested in producing consumer products for people who like to count steps and receive phone calls through their wrists. The companies that are serious about competitive training, though, do provide us with this data.

Every company that is interested in providing this kind of data to users has its own preferred approach. It is all more or less based upon the same underlying principles, but it's served up slightly differently. The numbers mean slightly different things, depending on how they've been calculated. For our purposes here, I'll compare Garmin's "Training Status" and Strava's "Fitness & Freshness" measurements.

Both Training Status and Fitness & Freshness are calculated based primarily on moving averages of "Training Load," and both approaches are pretty interesting based on what the teams who designed them wanted to accomplish.

To be brief, "training load" is a measurement of how much exercise you've done recently, and how vigorous that exercise has been. "How much" is easy to determine simply by adding up how many hours, minutes, and seconds you've spent exercising over a given calendar period. "How vigorous" is a question that ultimately comes down to a physiologist's preferred measurement of workout intensity. Strava's team prefers to analyze heart rate data during exercise. The higher your heart rate, the harder you're exercising. Garmin's team, by contrast, prefers to measure exercise intensity with a slightly more technical analysis: excess post-exercise oxygen consumption (EPOC). My guess is that Garmin estimates EPOC by analyzing how long it takes the athlete to recover from a given exercise session.

Both of these measurements have pros and cons. One mark in Strava's favor is the relative simplicity of the calculation. Time + heart rate = load. (That's not the exact calculation, but you get the idea.) But the drawback to a calculation this simple is that anaerobic activity can increase a person's heart rate without doing much in the way of training load for, say, cycling. A competitive cyclist can get her heart rate up during a 30-minute arms workout without impacting her overall "training load" for cycling. In fact, she might go out for a 30-mile ride immediately following the arms workout without feeling too much different than she would have otherwise. By contrast, Garmin's EPOC calculation will capture that level of nuance here. That same cyclist's post-weight-lifting EPOC will be quite short compared to her 30-mile ride, and her Garmin-calculated Training Load will adjust accordingly.
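To make the "time + heart rate" idea concrete, here is a minimal sketch in Python. It is not Strava's formula, which I don't know. The function name, the heart rate numbers, and the choice to scale minutes by the fraction of heart rate reserve are all my own assumptions, made only to show the shape of the calculation.

```python
def heart_rate_load(duration_min, avg_hr, resting_hr, max_hr):
    """Illustrative load score: minutes scaled by fraction of heart rate reserve.

    This is an assumed, simplified formula, not Strava's actual calculation.
    """
    if max_hr <= resting_hr:
        raise ValueError("max_hr must exceed resting_hr")
    # Fraction of the way from resting to maximum heart rate, clamped to [0, 1].
    intensity = (avg_hr - resting_hr) / (max_hr - resting_hr)
    intensity = min(max(intensity, 0.0), 1.0)
    return duration_min * intensity


# Two hypothetical one-hour sessions for the same athlete.
easy = heart_rate_load(60, avg_hr=130, resting_hr=50, max_hr=190)
hard = heart_rate_load(60, avg_hr=170, resting_hr=50, max_hr=190)
print(round(easy, 1), round(hard, 1))
```

Notice that nothing in this sketch knows what kind of exercise raised the heart rate, which is exactly the weakness described above: an arms workout and a bike ride at the same average heart rate would score the same.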

But a mark against Garmin's concept of Training Load is that it fails to account for real-world factors. What I mean is, Garmin can't measure EPOC directly through biometric testing, so they estimate it through heart rate measurements. If I go for a ten-mile run, and then get caught in bad traffic on the drive home, Garmin's calculations will erroneously assume that I'm having a hard time recovering from my run, and my Training Load number will rise. If I take a nap or sit in a hot tub immediately following my run, I'll have a much better EPOC profile, and my Training Load number will fall. So different non-exercise circumstances can impact Garmin's estimate of Training Load even when they probably shouldn't.

To complete things, Garmin outputs a "Training Status," based on a 7-day moving average of Training Load combined with the athlete's VO2 max data. That's not a bad estimate, but there is a problem in that VO2 max is a measurement that doesn't tend to move much. When it does, it moves steadily over time; it doesn't tend to fluctuate a lot over a 7-day period. It likely doesn't change much at all in a week. Some of the underlying data used to estimate VO2 max, however, can change: namely, if you have a birthday this week, your age will change; if your weight tends to fluctuate based on water weight, or diet, or menstruation, or any of the other things that make small impacts to a person's weight, the number you see on the scale will change. These things can have a statistically relevant impact on the output of the VO2 max estimation equation. But remember: it's just an equation. It aims to estimate VO2 max. If your estimate changes by a point here and there, it's unlikely that your VO2 max actually changed. It's far more likely that you had some slight weight fluctuation or something.
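For readers who want to see why a 7-day window makes the numbers jumpy, here is a small Python sketch of a trailing moving average. The daily load values are made up, and the exact way Garmin windows or weights its data is an assumption on my part; this only shows the general technique.

```python
def trailing_average(daily_loads, window=7):
    """Average of the most recent `window` days, computed for each day.

    Early days use however many days are available so far.
    """
    averages = []
    for i in range(len(daily_loads)):
        start = max(0, i - window + 1)
        recent = daily_loads[start:i + 1]
        averages.append(sum(recent) / len(recent))
    return averages


# Hypothetical loads: one steady week at 70 per day, then a week of full rest.
loads = [70] * 7 + [0] * 7
weekly = trailing_average(loads)
print(weekly[6], weekly[13])
```

With a window this short, a single week of rest is enough to wipe out everything that came before it, so the output swings quickly in both directions.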

The result of all this is a "Training Load" and "Training Status" output that is roughly on point, but somewhat confusing. Take a look at mine:

Over this period, I inexplicably vacillated between "productive" and "maintaining" before finally ending up at "unproductive." Then I went back to vacillating during my recovery week. It wasn't until the last three days that Garmin recognized I was actually recovering. And, I hasten to add, I am training under a training plan supplied by Garmin through the Garmin Connect app itself.

That said, Garmin did get things right in general. At the end of my third week of training, I had run nine consecutive days and was feeling tired, so "unproductive" might not be linguistically accurate, but it was certainly true that I needed some rest. And Garmin did recognize the recovery week eventually.

Strava's "Fitness & Freshness" curves are based on what they call an "impulse response model." That sounds fancy, but all it really means is that Strava uses a weighted moving average of training load based on activity duration and heart rate. Precisely how they choose to weight the moving average is a mystery to me, but when compared to Garmin's data, Strava's seems to place slightly more weight on the past. While Garmin states with certainty that their output is based on a 7-day moving average, Strava does not state how long their time window is. I would venture to guess, though, that their time window is three weeks.
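Here is a minimal Python sketch of the general idea behind an impulse response model: two exponentially weighted averages of daily training load, one slow ("fitness") and one fast ("fatigue"). The 42-day and 7-day time constants, the function names, and the load values are illustrative assumptions, not numbers Strava has confirmed, and the real weighting may well differ.

```python
import math


def exp_weighted_average(daily_loads, time_constant_days):
    """Exponentially weighted average in which older days count for less."""
    decay = math.exp(-1.0 / time_constant_days)
    value = 0.0
    series = []
    for load in daily_loads:
        value = value * decay + load * (1.0 - decay)
        series.append(value)
    return series


def fitness_fatigue_form(daily_loads, fitness_days=42, fatigue_days=7):
    """Return fitness, fatigue, and form series (time constants are assumed)."""
    fitness = exp_weighted_average(daily_loads, fitness_days)
    fatigue = exp_weighted_average(daily_loads, fatigue_days)
    form = [fit - fat for fit, fat in zip(fitness, fatigue)]
    return fitness, fatigue, form


# Hypothetical plan: three hard weeks followed by one recovery week.
loads = [100] * 21 + [20] * 7
fitness, fatigue, form = fitness_fatigue_form(loads)
print(round(form[20], 1), round(form[27], 1))
```

The design point is that the slower average places more weight on the past, so it barely moves during a single easy week, while the faster one drops quickly.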

Why three weeks? Because when you access Strava's "weekly effort" graphs from their mobile app (these graphs are strangely unavailable in the browser portal), the area denoting "consistent training" on the graph adjusts based on the previous three weeks. I can see this by watching how it moves with my week-to-week effort.

This longer time window provides what I believe to be a better overall measure of a person's fitness level. Here's a piece of my Fitness and Fatigue curves, covering my recent training regimen:

As you can see, Strava tracked my fitness level as increasing over the first three weeks of training; then, during my recovery week, my fitness curve stayed relatively flat, while my fatigue curve fell. This is, at the least, an accurate representation of what my training schedule was supposed to achieve.

On the other hand, take a look at the local maximum in that graph. On March 10, I went for a long run and in doing so achieved a fitness level of 81, and a fatigue level of 114. How should an athlete interpret that kind of information? Strava supplies a third number, called "Form," which is nothing more than the arithmetic difference between Fitness and Fatigue. This should correspond to how "fresh" I was feeling that day. Using this data, I can say that I was fit, but fatigued. Strava seems to have accurately assessed my feelings. What they didn't do was give me a direct recommendation, as Garmin did. Garmin told me right then and there that my training was getting unproductive and I needed rest.
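The Form arithmetic for March 10 is simple enough to check by hand, but here it is as a tiny Python sketch. The function name is my own; the two inputs are the Fitness and Fatigue values quoted above.

```python
def form(fitness, fatigue):
    """Form is the arithmetic difference between Fitness and Fatigue."""
    return fitness - fatigue


march_10_form = form(fitness=81, fatigue=114)
print(march_10_form)  # -33
```

A negative Form means fatigue is currently outrunning fitness, which matches "fit, but fatigued."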

There is no "right answer" here. I find both sets of data useful in their own way. But I am a very atypical athlete. Most people who use GPS watches aren't used to calculating various weighted averages and applying statistical models to time series. It just so happens that I do this for a living, and my great familiarity with data science puts me at an advantage for interpreting calculations like these.

The average athlete -- i.e., the average person who does not work in data science -- needs a little more help interpreting this information. To that end, I can tell you this: Garmin's Training Load and Training Status numbers jump around a bit, because they only look at your most recent training week; but they tend to get close to a good recommendation if you're seeing the same output two or three days in a row. Meanwhile, Strava's Fitness & Freshness gives you good perspective on your overall response to training, but you should probably not take the data too seriously if you are not actively following an actual training plan of some kind.

Always take this data with a grain of salt. But if you can manage to think like a biostatistician, you can get some good information out of these numbers.
