Your watch says 54. The lab says 61. You trained harder than ever this block, and the number went down.
That’s the Garmin VO2max experience for a lot of competitive athletes. The number isn’t useless. But it isn’t what Garmin’s marketing implies either. Here’s what independent research actually shows, and how to use wearable VO2max without fooling yourself.
What the 6.85% MAPE Number Actually Means
MAPE stands for Mean Absolute Percentage Error. It’s the average miss, across a group of athletes, between what the watch estimated and what a metabolic cart measured in a lab.
A 2023 study by Carrier et al. tested the Garmin fēnix 6 against spirometry-based lab data in 21 athletes. The result: 6.85% MAPE, with a concordance correlation coefficient of 0.70. For a mid-fitness recreational runner with a true VO2max of 50 ml/kg/min, that’s an average miss of about 3.4 points. Not bad.
But MAPE is a group average. It hides the individual spread.
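To make that concrete, here is a minimal sketch of how MAPE is computed from paired watch and lab readings. The numbers are invented for illustration and are not from any of the studies cited here.

```python
def mape(estimates, references):
    """Mean Absolute Percentage Error, in percent, over paired readings."""
    if len(estimates) != len(references) or not references:
        raise ValueError("need two non-empty sequences of equal length")
    errors = [abs(e - r) / r * 100 for e, r in zip(estimates, references)]
    return sum(errors) / len(errors)


# Hypothetical watch vs. lab readings in ml/kg/min (illustrative only).
watch = [48.0, 52.0, 55.0, 58.0]
lab = [50.0, 50.0, 60.0, 56.0]

group_mape = mape(watch, lab)

# Per-athlete percentage misses: the spread the group average hides.
individual = [round(abs(w - l) / l * 100, 1) for w, l in zip(watch, lab)]
```

In this made-up group the average miss lands near 5%, yet one athlete is off by more than 8%. The single MAPE figure says nothing about who that athlete is.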
The INTERLIVE meta-analysis pooled 14 studies and 403 participants. Group-level bias was near zero (-0.09 ml/kg/min). That sounds precise. The individual-level limits of agreement, though, spanned -9.92 to +9.74 ml/kg/min. Translation: your specific reading, on a specific day, could be nearly 10 points too high or 10 points too low.
A lab cardiopulmonary exercise test (CPET) has a test-retest coefficient of variation of just 1.98%, an ICC of 0.984, and a minimum detectable difference of 2.14 ml/kg/min (Succi et al., 2023). Garmin’s individual limits of agreement span 19.66 ml/kg/min (-9.92 to +9.74). Set against the lab’s 2.14 ml/kg/min minimum detectable difference, the watch’s noise band is roughly 9 times wider. Think of the lab as a precise thermometer and the watch as a mood ring that usually points in the right direction.
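For readers who want to see where limits of agreement come from, the sketch below uses the standard Bland-Altman recipe: the mean difference plus or minus 1.96 standard deviations of the differences. The sample readings are hypothetical, and a pooled meta-analysis uses more involved statistics than this single-sample version.

```python
from statistics import mean, stdev


def limits_of_agreement(estimates, references):
    """Return (bias, lower, upper) using bias +/- 1.96 * SD of the differences."""
    diffs = [e - r for e, r in zip(estimates, references)]
    bias = mean(diffs)
    spread = 1.96 * stdev(diffs)
    return bias, bias - spread, bias + spread


# Hypothetical paired readings in ml/kg/min (illustrative only).
watch = [50.0, 55.0, 45.0, 60.0]
lab = [52.0, 50.0, 49.0, 57.0]
bias, lower, upper = limits_of_agreement(watch, lab)

# The comparison made in the text, using the published figures.
watch_band = 9.74 - (-9.92)   # width of the watch's limits of agreement
lab_mdd = 2.14                # lab minimum detectable difference
noise_ratio = watch_band / lab_mdd
```

A small bias with wide limits is exactly the pattern described above: right on average, unreliable for any one person.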
The Fitness-Level Trap in Garmin VO2max Accuracy
Here’s what most watch reviews miss. Garmin’s accuracy isn’t a single number. It’s a spectrum that depends on who you are.
Engel et al., 2025 split 35 runners into two groups at a VO2max cut-off of 59.8 ml/kg/min. The results were striking:
- Moderately trained athletes (VO2max at or below 59.8): MAPE of 4.1% on the first test and 2.8% on the second. The watch was genuinely useful here.
- Highly trained athletes (VO2max above 59.8): MAPE of 10.4% and 9.4%. The watch underestimated by a mean of 6.3 ml/kg/min.
Three independent studies tell the same story across different Garmin models and fitness-tier cut-offs:
| Study | Low VO2max (<45 ml/kg/min) | Moderate (45–55 ml/kg/min) | High (above 55–60 ml/kg/min, cut-off varies by study) |
|---|---|---|---|
| Engel 2025 (Forerunner 245) | — | 2.8–4.1% MAPE | 9.4–10.4% MAPE |
| Düking 2022 (Forerunner 245) | 7.1% (overestimate) | 4.1% MAPE | 6.2% MAPE (underestimate) |
| Passler 2019 (Forerunner 920XT) | — | 7.3% overall | Higher in fit subgroup |
Sources: Engel et al. 2025, Düking et al. 2022, Passler et al. 2019.
Six points is not noise. A runner at a true 65 ml/kg/min might consistently see 58-59 on their watch. That’s a meaningful gap if you’re using the number to guide training intensity, set zones, or compare yourself to competitors.
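The tier split is easy to reproduce on your own data if you ever have paired lab and watch readings for a group. The sketch below assumes the 59.8 ml/kg/min cut-off from Engel et al. and uses invented readings; the function name and data layout are hypothetical.

```python
def mape_by_tier(pairs, cutoff=59.8):
    """Split (watch, lab) pairs at a lab VO2max cut-off and return MAPE per tier.

    A tier with no athletes maps to None.
    """
    tiers = {"moderate": [], "high": []}
    for watch, lab in pairs:
        tier = "high" if lab > cutoff else "moderate"
        tiers[tier].append(abs(watch - lab) / lab * 100)
    return {name: (sum(errs) / len(errs) if errs else None)
            for name, errs in tiers.items()}


# Hypothetical (watch, lab) readings in ml/kg/min (illustrative only).
pairs = [(50.0, 51.0), (56.0, 55.0), (58.0, 64.0), (60.0, 67.0)]
result = mape_by_tier(pairs)
```

With these made-up numbers the high tier comes out near 10% and the moderate tier near 2%, the same shape as the published results.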
The pattern makes sense once you understand the algorithm.
How Garmin’s Algorithm Actually Works
Garmin’s VO2max engine comes from Firstbeat Technologies. Here’s the core idea, stripped of marketing language.
The watch takes your GPS speed and your heart rate from each run segment. It uses a linear relationship between pace and oxygen cost to estimate how much oxygen each segment requires. Then it asks: “Given this pace and this heart rate, what VO2max does this athlete need to have?”
Each segment gets a reliability score. Segments where heart rate and speed are poorly correlated (downhill running, traffic stops, cardiovascular drift on long runs) get discarded. The final VO2max estimate is a reliability-weighted average of the segments that passed.
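The steps above can be sketched in a few lines. This is not Firstbeat’s code: the oxygen-cost formula is the ACSM running equation swapped in as a stand-in, the proportional scaling by fraction of max HR is a simplifying assumption, and the reliability scores and the 0.5 threshold are made up for illustration.

```python
def segment_vo2(speed_m_per_min, grade=0.0):
    """Oxygen cost of running (ml/kg/min), ACSM running equation (linear in speed)."""
    return 3.5 + 0.2 * speed_m_per_min + 0.9 * speed_m_per_min * grade


def segment_vo2max(speed_m_per_min, hr, hr_max):
    """Scale segment cost up by the fraction of max HR used (assumed proportional)."""
    return segment_vo2(speed_m_per_min) / (hr / hr_max)


def estimate_vo2max(segments, hr_max, min_reliability=0.5):
    """segments: list of (speed_m_per_min, hr_bpm, reliability between 0 and 1)."""
    kept = [(segment_vo2max(speed, hr, hr_max), weight)
            for speed, hr, weight in segments
            if weight >= min_reliability]      # discard unreliable segments
    if not kept:
        return None
    total_weight = sum(weight for _, weight in kept)
    return sum(value * weight for value, weight in kept) / total_weight


# Hypothetical run: two steady segments and one downhill-like outlier.
segments = [(200.0, 150, 0.9), (220.0, 160, 0.8), (260.0, 150, 0.2)]
estimate = estimate_vo2max(segments, hr_max=185)
```

The third segment pairs a fast speed with a low heart rate, the kind of mismatch a downhill produces, so its low reliability score keeps it out of the weighted average.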
The problem for elite runners: their heart rate barely moves across a wide range of paces. The algorithm extrapolates from a narrow HR-speed relationship to a theoretical maximum, and that extrapolation breaks down at the top end. The watch guesses at a number it can’t actually observe, and it guesses low.
That explains the 6.3 ml/kg/min underestimate. The watch isn’t broken. It’s doing math with inputs that don’t have enough range to resolve the question accurately.
The Firstbeat white paper also notes that a 15 bpm error in your max HR setting inflates VO2max error by 7-9%. If your watch thinks your max HR is 185 and it’s actually 172, that 13 bpm gap is nearly as large, and it compounds everything downstream. Age-predicted formulas are unreliable, so set your max HR from a real max-effort workout.
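A rough way to see why the max HR setting matters so much: if the estimate scales with max HR divided by current HR (a simplifying assumption, not Firstbeat’s published model), then any percentage error in max HR passes straight through to VO2max. The segment values below are hypothetical.

```python
def vo2max_from_segment(vo2_segment, hr, hr_max):
    """Extrapolate a segment's oxygen cost to max HR (assumed proportional)."""
    return vo2_segment * hr_max / hr


# Same hypothetical segment: 45 ml/kg/min of oxygen cost at 150 bpm.
with_true_max = vo2max_from_segment(45.0, 150, hr_max=172)
with_wrong_max = vo2max_from_segment(45.0, 150, hr_max=185)

# Percentage by which the wrong max HR setting inflates the estimate.
pct_error = (with_wrong_max - with_true_max) / with_true_max * 100
```

Under this simple model the 185 versus 172 mismatch inflates the estimate by about 7.6%, which sits inside the 7-9% range the white paper quotes.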
What Garmin Gets Right: Device Comparison
Against other consumer options, Garmin holds up well. Passler et al., 2019 put the Garmin Forerunner 920XT at 7.3% MAPE and the Polar V800 at 13.2% in the same study. The difference comes from the algorithm type: Garmin uses an exercise-based approach that requires you to actually run. Polar’s older resting-based method guesses from HR at rest, which is far less informative.
Apple Watch Series 7 fares worst in the published data. Caserman et al., 2024 found 15.79% MAPE overall. In the excellent-fitness subgroup, it hit 21.47% with a -12.00 ml/kg/min underestimation. That’s not a wearable fitness metric. That’s a rough directional indicator at best.
| Device | Algorithm Type | Overall MAPE | Fit Athlete MAPE | Bias Direction |
|---|---|---|---|---|
| Garmin fēnix 6 (1-min avg) | Exercise-based | 6.85% | ~10% (estimated) | Underestimates fit athletes |
| Garmin Forerunner 245, mod. trained | Exercise-based | ~4.1% | — | Near zero |
| Garmin Forerunner 245, highly trained | Exercise-based | ~10.4% | 10.4% | -6.3 ml/kg/min |
| Garmin Forerunner 920XT | Exercise-based | 7.3% | — | -2.1 ml/kg/min |
| Polar V800 | Resting-based | 13.2% | — | +3.0 ml/kg/min |
| Apple Watch Series 7 | Exercise-based (outdoor walk, run, or hike) | 15.79% | 21.47% | -12.0 ml/kg/min |
Sources: Carrier 2023, Engel 2025, Passler 2019, Caserman 2024.
Garmin wins this comparison. That’s not the same as saying Garmin is accurate enough to replace a lab.
Case Study: Marcus, 41, Ironman Athlete
Marcus had a true lab VO2max of 63 ml/kg/min, confirmed at a sports medicine clinic before his Ironman build. His Garmin Forerunner 945 read 57, then 56, then 58 across three months of solid training. He was confused. His fitness score (CTL) in AthleteOS was climbing from 72 to 94. His pace at threshold was improving by 12 seconds per mile. But the VO2max number barely moved.
He wasn’t getting less fit. He was getting more fit. Garmin was underestimating him by 5 to 7 points, in line with the roughly 6-point gap Engel’s research reports for his fitness tier.
His mistake was anchoring on the watch number. When he switched to tracking pace-at-threshold and his fitness score trend, the picture became clear. He raced to a 4:52 Ironman bike split on a goal of 5:00. His VO2max number that day? The watch said 57.
The watch wasn’t tracking his fitness. His fitness was outrunning the watch’s ability to measure it.
The Noise Floor: What Change Is Real?
Given individual limits of agreement (LoA) of roughly ±9.83 ml/kg/min, any threshold for a meaningful change has to sit above typical week-to-week fluctuation.
Conservative rule: a real change requires movement of at least 4-5 ml/kg/min sustained across multiple sessions over 6-8 weeks. Not a 1-point Tuesday-to-Sunday shift.
Meaningful Garmin VO2max change = sustained movement of 4-5 points over 6-8 weeks
Lab CPET meaningful change = 2.14 ml/kg/min (minimum detectable difference)
A lab can detect a 2-point improvement. Your watch needs a 4-5 point shift before you can trust it’s real. That’s a useful filter, not a reason to throw the watch away.
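Here is one way that filter could look in code. It is a sketch under stated assumptions: the function name, the 3-reading averaging window, and the data layout are hypothetical, and the 4-point and 6-week defaults come from the conservative rule above.

```python
from statistics import mean


def is_meaningful_change(readings, min_delta=4.0, min_weeks=6, window=3):
    """readings: list of (week_number, vo2max) tuples sorted by week.

    Compares the average of the first `window` readings with the average of
    the last `window`, so a single noisy value cannot trigger the rule.
    """
    if len(readings) < 2 * window:
        return False
    weeks_spanned = readings[-1][0] - readings[0][0]
    if weeks_spanned < min_weeks:
        return False
    start = mean(value for _, value in readings[:window])
    end = mean(value for _, value in readings[-window:])
    return abs(end - start) >= min_delta


# Hypothetical logs: (week, watch VO2max).
sustained = [(0, 50), (1, 51), (2, 50), (6, 54), (7, 55), (8, 55)]
flat = [(0, 50), (1, 51), (2, 50), (6, 51), (7, 50), (8, 52)]
real_gain = is_meaningful_change(sustained)
just_noise = is_meaningful_change(flat)
```

Averaging a few readings at each end is a design choice: it trades responsiveness for fewer false alarms, which suits a noisy estimator.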
What your watch can do well: track a 12-month trajectory. An athlete who read 43 a year ago and reads 51 now has almost certainly made real aerobic gains. The trend over months is more reliable than any single value.
Understanding the aerobic base building that drives VO2max improvements helps you interpret these trends correctly. So does tracking training load with CTL and ATL, which gives you parallel confirmation that real fitness is accumulating.
How AthleteOS Handles Noisy VO2max Data
AthleteOS treats your Garmin VO2max as a noisy estimator. It doesn’t report the raw number as ground truth.
Instead, AthleteOS’s session analysis cross-references three signals:
- Garmin VO2max trend over a rolling 6-week window
- Pace-at-threshold change across comparable workouts
- HRV baseline trend (rising HRV with rising load = positive adaptation)
When Garmin VO2max rises but pace-at-threshold isn’t moving, AthleteOS flags it as a likely estimator artifact. When Garmin VO2max is flat but pace-at-threshold is improving alongside a rising fitness score, AthleteOS surfaces the threshold trend as the real signal and notes that the watch is probably underestimating a fit athlete.
This is exactly the scenario that affects athletes above roughly 60 ml/kg/min, where Garmin’s MAPE runs around 9 to 10%.
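The cross-referencing logic described above can be illustrated with a small rule-based sketch. This is not AthleteOS’s actual implementation: the function name, labels, and noise thresholds are hypothetical, and the HRV signal is left out to keep the example short.

```python
def classify_trend(vo2max_change, pace_change_s_per_mile, fitness_score_change,
                   vo2max_noise=4.0, pace_noise=3.0):
    """Label a 6-week trend from three deltas.

    pace_change_s_per_mile is negative when threshold pace gets faster.
    """
    vo2max_rising = vo2max_change >= vo2max_noise
    vo2max_flat = abs(vo2max_change) < vo2max_noise
    pace_improving = pace_change_s_per_mile <= -pace_noise

    if vo2max_rising and not pace_improving:
        return "estimator_artifact"        # watch moved, performance did not
    if vo2max_flat and pace_improving and fitness_score_change > 0:
        return "watch_underestimating"     # performance moved, watch did not
    if vo2max_rising and pace_improving:
        return "confirmed_gain"
    return "inconclusive"


# Marcus from the case study: watch 57 to 58, threshold pace 12 s/mile
# faster, fitness score up from 72 to 94.
marcus = classify_trend(vo2max_change=1.0, pace_change_s_per_mile=-12.0,
                        fitness_score_change=22.0)
```

Run on Marcus’s numbers, the sketch returns the "watch_underestimating" label, matching the conclusion of the case study.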
The chest strap vs optical HR accuracy question matters here too. Garmin’s algorithm uses heart rate as its primary input. Optical HR errors from wrist sensors compound the VO2max error. Using a chest strap during the runs that update your VO2max estimate improves accuracy meaningfully, particularly at high intensities where optical sensors lose reliability.
Your VO2max estimate is only as good as your HR data. And your HR data is only as good as how you measured it.
The watch gives you a direction. The lab gives you coordinates. Know which one you’re holding before you navigate.
Start tracking the metrics that actually move with training at AthleteOS and see how your Garmin data compares to your real fitness trend.