Dose-response relationship between evening exercise and sleep

Josh Leota, David M. Presby, Flora Le, Mark É. Czeisler, Luis Mascaro, Emily R. Capodilupo, Joshua F. Wiley, Sean P. A. Drummond, Shantha M. W. Rajaratnam, Elise R. Facer-Childs
Nature Communications
School of Psychological Sciences, Faculty of Medicine, Nursing and Health Sciences, Monash University, 18 Innovation Walk, Clayton 3800, Australia.

Overall Summary

Study Background and Main Findings

This study investigated the relationship between the timing and strain (a measure combining intensity and duration) of evening exercise and subsequent sleep quality and physiological recovery. It aimed to address the conflict between general recommendations promoting exercise for sleep and concerns that strenuous evening activity might be disruptive. The research leveraged a large dataset from 14,689 physically active adults who used a wrist-worn biometric device (WHOOP) over a one-year period, totaling over 4 million nights of data. This observational approach allowed researchers to examine real-world exercise and sleep patterns.

The core finding was a dose-response relationship: exercising later in the evening (closer to bedtime) and engaging in higher-strain activities were associated with negative sleep outcomes. Specifically, these patterns were linked to taking longer to fall asleep (delayed sleep onset), shorter total sleep duration, and lower objective sleep quality. For example, compared to light exercise, maximal-strain exercise ending two hours after habitual sleep onset was associated with falling asleep over an hour later (the consecutive contrasts in Table 1 sum to a delay of roughly 74 min relative to light exercise) and a 15% higher nocturnal resting heart rate.

Furthermore, later and higher-strain evening exercise was associated with physiological signs of poorer recovery during sleep, indicated by a higher nocturnal resting heart rate (RHR) and lower nocturnal heart rate variability (HRV). HRV reflects the variation in time between heartbeats and is a marker of autonomic nervous system balance; lower HRV often indicates greater physiological stress or incomplete recovery. These effects were most pronounced when exercise concluded within four hours before, or up to two hours after, an individual's typical bedtime.

The study concludes that while exercise is generally beneficial, high-strain exercise performed close to bedtime can indeed disrupt sleep and delay physiological recovery processes. Based on the finding that exercise ending four or more hours before sleep onset showed minimal negative associations, the authors recommend individuals aim to finish workouts, particularly strenuous ones, at least four hours before sleep. If exercising within this window is necessary, choosing lighter strain activities may help mitigate potential sleep disruption.

Research Impact and Future Directions

This study provides compelling, large-scale observational evidence suggesting that evening exercise, particularly when performed at high intensity and duration (high strain) and close to bedtime, is associated with measurable negative impacts on subsequent sleep and autonomic nervous system recovery in physically active adults. The identification of a potential four-hour pre-sleep window where exercise appears less disruptive offers valuable, practical guidance.

It is crucial, however, to interpret these findings as associations rather than definitive proof of causation. The observational nature means that unmeasured factors (e.g., evening light exposure, dietary choices near bedtime, psychological stress levels, specific types of exercise not fully captured by the strain metric) could influence both exercise habits and sleep outcomes. For instance, individuals choosing high-strain evening workouts might also engage in other behaviors that affect sleep. Furthermore, the study population consists of physically active individuals using a specific biometric device; the findings may not directly generalize to sedentary populations or those with different health profiles. The method for quantifying exercise strain (SHRZS), while innovative, might underestimate the load of activities like strength training, potentially muting the observed effects for individuals engaging heavily in such exercises.

Despite these limitations, the study's strengths – its large sample size, real-world setting, objective measurements, and sophisticated analysis – lend considerable weight to the conclusions. The practical takeaway is that individuals concerned about sleep should consider finishing strenuous workouts at least four hours before their typical bedtime or opting for lighter activities if exercising later. Future research, including intervention studies, could help establish causality and explore these relationships in diverse populations and with different exercise modalities, potentially refining these recommendations further.

Critical Analysis and Recommendations

Large-Scale Objective Data Enhances Robustness (written-content)
Observation: The abstract highlights the study's use of objective data from nearly 15,000 individuals over 4 million person-nights. Impact: This large-scale, real-world dataset provides high statistical power and ecological validity, strengthening confidence in the observed associations between evening exercise patterns and sleep outcomes within a physically active population.
Section: Abstract
Actionable Conclusion and Recommendation (written-content)
Observation: The abstract clearly presents the main practical recommendation: finish exercise ≥4 hours before sleep or choose lighter strain if exercising closer. Impact: This provides clear, actionable guidance directly derived from the study's findings, making the research immediately relevant for individuals seeking to optimize sleep.
Section: Abstract
Explicitly State Dose-Response Nature (written-content)
Observation: The abstract summarizes findings as associations (e.g., 'later timing... are associated with...') but doesn't explicitly use 'dose-response'. Impact: Adding 'dose-response relationship' would more precisely convey the proportional nature of the findings (more strain/later timing linked to greater effects), enhancing immediate understanding for readers familiar with the term.
Section: Abstract
Identifies Key Research Gap (Exercise Strain) (written-content)
Observation: The introduction identifies a gap in prior research which often examined exercise intensity or duration separately, failing to consider their combined effect ('exercise strain'). Impact: This clearly justifies the study's focus on 'strain', positioning the research as a novel contribution to understanding the complex interplay between exercise characteristics and sleep.
Section: Introduction
Provides Physiological Rationale (ANS) (written-content)
Observation: The introduction grounds the research question in autonomic nervous system (ANS) physiology, explaining how high strain exercise might delay the shift towards parasympathetic dominance needed for sleep. Impact: This provides a plausible biological mechanism for the hypothesized effects, strengthening the study's theoretical foundation.
Section: Introduction
Precise Quantitative Findings (written-content)
Observation: The results provide specific quantitative effects, such as maximal vs. light exercise ending 2 hours after habitual sleep onset being associated with a 15.0% (9.4 beats/min) higher nocturnal RHR. Impact: These precise figures quantify the magnitude of the observed associations, adding concrete meaning beyond general trends and aiding in the assessment of practical significance.
Section: Results
Robustness and Subgroup Analyses Support Findings (written-content)
Observation: Secondary analyses using actual sleep onset (vs. habitual) and subgroup analyses (gender, age, BMI) showed consistent results. Impact: This consistency increases confidence in the robustness of the primary findings and suggests the observed associations hold across different analytical approaches and demographic groups within this active population.
Section: Results
Effective Visualization of Dose-Response Relationships (graphical-figure)
Observation: Figures 1 and 2 use GAMM curves to effectively visualize the non-linear, dose-dependent relationships between exercise timing/strain and outcomes like sleep duration and RHR. Impact: These visualizations clearly communicate the core findings, showing how effects intensify closer to bedtime and with higher strain, making complex statistical results accessible.
Section: Results
Clear Synthesis of Main Findings (written-content)
Observation: The discussion clearly synthesizes the core finding: a dose-dependent negative association between later, higher-strain evening exercise and objective sleep/autonomic measures. Impact: This provides a concise takeaway message summarizing the study's main contribution.
Section: Discussion
Contextualization with Prior Research (written-content)
Observation: The discussion contextualizes findings with prior meta-analyses, suggesting null effects previously reported might be due to those studies predominantly including lower-strain exercise. Impact: This reconciles the study's results with existing literature and highlights the specific contribution regarding high-strain exercise.
Section: Discussion
Explicitly Link SHRZS Limitation to Potential Impact (written-content)
Observation: The discussion acknowledges limitations like the SHRZS method potentially underestimating strength training strain but doesn't explicitly state the potential consequence for the results. Impact: Explicitly linking this limitation to a potential underestimation of the negative impact of high-strain evening exercise for strength-focused individuals would enhance the critical evaluation of the findings' scope.
Section: Discussion
Large-Scale Data with Validated Objective Measures (written-content)
Observation: The study used a validated biometric device (WHOOP) for objective, continuous data collection in a large, real-world cohort. Impact: This enhances the reliability and ecological validity of the sleep and heart rate data compared to lab-based studies or self-report measures, increasing confidence in the findings.
Section: Methods
Sophisticated Operationalization of Exercise Timing (written-content)
Observation: Exercise timing was calculated relative to each individual's habitual sleep onset, adjusted for month and day type. Impact: This sophisticated approach accounts for individual sleep patterns, seasonality, and social jetlag, providing a more biologically relevant measure of exposure timing than simple clock time.
Section: Methods
Nuanced Quantification of Exercise Strain (SHRZS) (written-content)
Observation: Exercise strain was quantified using SHRZS, combining intensity (time in HR zones relative to individual HRmax) and duration. Impact: This provides a more nuanced measure of physiological load than intensity or duration alone, allowing for a better understanding of how the overall exercise challenge relates to sleep.
Section: Methods
Provide Rationale for SHRZS Strain Category Cutoffs (written-content)
Observation: The rationale for the specific numerical cutoffs used to define Light, Moderate, High, and Maximal SHRZS categories is not provided. Impact: Explaining whether these cutoffs are based on statistical distributions, physiological thresholds, or established guidelines would improve transparency and aid interpretation and comparison with other studies.
Section: Methods
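
The strain metric discussed above is described only as combining time in heart rate zones (relative to individual HRmax) with duration; the weights and the Light/Moderate/High/Maximal cutoffs are not given here. As a minimal sketch, assuming Edwards-style weighting (50-60% of HRmax weighted 1, up to 90% and above weighted 5) and one heart rate sample per minute, a summated heart rate zone score could be computed as follows. The function names, zone boundaries, and weights are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch of a summated heart rate zone score (SHRZS).
# ASSUMPTION: Edwards-style zone floors and weights; the study's exact
# definition is not stated in this review.
from typing import Sequence

# (lower bound as a fraction of HRmax, weight), checked from the top zone down
ZONE_FLOORS = [(0.9, 5), (0.8, 4), (0.7, 3), (0.6, 2), (0.5, 1)]


def zone_weight(hr_bpm: float, hr_max: float) -> int:
    """Return the assumed zone weight (0-5) for one heart rate sample."""
    fraction = hr_bpm / hr_max
    for floor, weight in ZONE_FLOORS:
        if fraction >= floor:
            return weight
    return 0  # below 50% of HRmax contributes nothing


def shrzs(samples_bpm: Sequence[float], hr_max: float,
          minutes_per_sample: float = 1.0) -> float:
    """Sum of (minutes in zone x zone weight) over all samples."""
    return sum(zone_weight(hr, hr_max) * minutes_per_sample
               for hr in samples_bpm)


# 30 min at 65% of HRmax plus 10 min at 95% of HRmax, with HRmax = 200
session = [130.0] * 30 + [190.0] * 10
print(shrzs(session, hr_max=200.0))
```

Because the score multiplies time by a zone weight, a longer session at the same intensity and a harder session of the same length both raise it, which is the sense in which strain combines intensity and duration. A heart-rate-based score of this kind would also explain why strength training, which can be demanding without sustained high heart rate, might be underestimated.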

Section Analysis

Results

Non-Text Elements

Fig. 1 | Relative exercise timing and strain associations with sleep and...
Full Caption

Fig. 1 | Relative exercise timing and strain associations with sleep and nocturnal autonomic activity. GAMMs demonstrating the relationship between exercise timing relative to habitual sleep onset and A sleep onset, B sleep duration (in hours), C sleep quality, D nocturnal RHR, and E nocturnal HRV at different levels of exercise strain.

Figure/Table Image (Page 3)
First Reference in Text
Engaging in maximal exercise instead of light exercise is associated with a 6.8% (3.9 beats/min) increase in RHR if the exercise ended 2-h before habitual sleep onset, and a 15.0% (9.4 beats/min) increase in RHR if the exercise ended 2-h after habitual sleep onset (Fig. 1D; Table 4).
Description
  • Figure Content and Axes: This graph (Panel D of Figure 1) shows how the timing and strain of evening exercise relate to a person's average heart rate during sleep that night, known as nocturnal resting heart rate (RHR), measured in beats per minute (BPM). The horizontal axis shows when exercise ended relative to the person's usual bedtime (habitual sleep onset), from 10 hours before (-10) to 2 hours after (+2). The vertical axis shows the nocturnal RHR. The different colored lines represent different levels of exercise strain (Light, Moderate, High, and Maximal), a measure that combines time in heart rate zones with exercise duration. The lines themselves are generated using Generalized Additive Mixed Models (GAMMs), a statistical technique suited to showing potentially complex, non-linear relationships between variables while accounting for variation between individuals and repeated measurements over time.
  • Key Finding: Exercise Timing and Strain Association: The graph indicates that exercising closer to bedtime, particularly at higher strain, is associated with a higher heart rate during sleep. For example, the reference text highlights that, compared to light exercise, maximal-strain exercise that finishes 2 hours before the usual bedtime is linked to a 6.8% higher nocturnal RHR (an increase of 3.9 beats per minute).
  • Key Finding: Post-Sleep Onset Exercise Association: The association with RHR is even more pronounced if maximal exercise ends after the usual bedtime. The reference text states that maximal exercise ending 2 hours after habitual sleep onset is associated with a 15.0% higher nocturnal RHR (an increase of 9.4 beats per minute) compared to light exercise ending at the same time relative to sleep onset.
  • Statistical Representation: The shaded areas around each colored line represent the 95% confidence intervals, giving an indication of the statistical uncertainty around the estimated average RHR for each condition. A horizontal dashed grey line shows the average nocturnal RHR on days when no exercise was performed, serving as a baseline.
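
Each reported RHR difference is given both in beats per minute and as a percentage, so the light-exercise reference value it implies can be backed out by simple division. The sketch below does only that arithmetic; the function name is illustrative, and because the published figures are rounded model estimates, the results should be read as approximate.

```python
# Back out the reference (light-exercise) RHR implied by a difference that is
# reported both in absolute terms and as a percentage.
# ASSUMPTION: the percentage is expressed relative to the light-exercise value.

def implied_reference(abs_diff: float, pct_diff: float) -> float:
    """Reference value such that abs_diff equals pct_diff percent of it."""
    return abs_diff / (pct_diff / 100.0)


# Maximal vs. light exercise ending 2 h before habitual sleep onset
before = implied_reference(abs_diff=3.9, pct_diff=6.8)
# Maximal vs. light exercise ending 2 h after habitual sleep onset
after = implied_reference(abs_diff=9.4, pct_diff=15.0)

print(round(before, 1), round(after, 1))  # prints: 57.4 62.7
```

Under this reading, the implied light-exercise RHR is itself a few beats per minute higher when exercise ends after habitual sleep onset, consistent with the timing association described above.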
Scientific Validity
  • Statistical Methodology Appropriateness: The use of Generalized Additive Mixed Models (GAMMs) is appropriate for modeling the expected non-linear relationship between exercise timing relative to sleep onset and nocturnal RHR, while accounting for the nested structure of the data (multiple nights per participant) and covariates (e.g., age, gender, fitness).
  • Data Robustness and Sample Size: The findings are based on a very large dataset (derived from 14,689 participants and over 4 million person-nights), lending considerable statistical power and robustness to the observed associations, particularly regarding the dose-response relationship.
  • Objective Measurement: Nocturnal RHR is an objective physiological measure derived from wearable biometric device data, reducing reliance on subjective reports. The validation of the device against ECG (as cited) supports the reliability of the heart rate measurements.
  • Control for Individual Sleep Patterns: The analysis controls for habitual sleep onset timing adjusted for seasonality and social jetlag, strengthening the interpretation that the observed effects are linked to the exercise timing relative to an individual's typical sleep schedule.
  • Potential Confounding Factors: While the model includes covariates, residual confounding from unmeasured variables (e.g., specific dietary intake pre/post exercise, ambient light exposure during evening exercise, specific type of exercise beyond strain category) cannot be entirely ruled out in this observational study design.
  • Exercise Strain Quantification: The quantification of 'exercise strain' using the SHRZS method, while systematic, might differentially capture the physiological load of various exercise types (e.g., endurance vs. strength training), although the large sample likely encompasses a wide variety.
Communication
  • Visual Representation Clarity: Figure 1D effectively visualizes the dose-response relationship between exercise timing/strain and nocturnal RHR using smoothed GAMM curves. The use of distinct colors for different strain levels and shaded confidence intervals enhances clarity. The x-axis clearly represents exercise ending time relative to habitual sleep onset, and the y-axis represents nocturnal RHR in BPM. The vertical dotted line indicating habitual sleep onset provides a crucial reference point.
  • Textual Summary Effectiveness: The reference text clearly summarizes a key contrast shown in Figure 1D (maximal vs. light exercise at specific time points relative to sleep onset), providing specific quantitative effects (6.8% and 15.0% increases in RHR) which aids interpretation of the graph's magnitude of effect.
  • Magnitude Interpretation: While visually clear, interpreting the magnitude of difference between strain levels requires careful reading of the y-axis or reference to the text/tables, as the absolute differences might appear small visually depending on the scale.
  • Baseline Comparison: The inclusion of the horizontal dashed line representing the mean RHR on non-exercise days provides a useful baseline for comparison across all conditions.
Fig. 2 | Actual exercise timing and strain associations with sleep and...
Full Caption

Fig. 2 | Actual exercise timing and strain associations with sleep and nocturnal autonomic activity. GAMMs demonstrating the relationship between exercise timing relative to actual sleep onset and A sleep duration (in hours), B sleep quality, C nocturnal RHR, and D nocturnal HRV at different levels of exercise strain.

Figure/Table Image (Page 5)
First Reference in Text
Secondary analyses examining exercise timing to actual sleep onset were consistent with the primary analyses above. Specifically, the combination of higher exercise strain and later exercise timing relative to actual sleep onset that night was dose-dependently associated with shorter sleep duration, lower sleep quality, higher nocturnal RHR, and lower nocturnal HRV (Fig. 2A-D).
Description
  • Figure Content and Axes: This graph (Panel A of Figure 2) illustrates the association between the timing and strain of exercise, relative to the actual time an individual fell asleep on a given night, and the total duration of sleep they achieved, measured in hours. The horizontal axis represents when exercise concluded relative to the actual sleep onset time (0 = sleep onset), ranging from 10 hours before (-10) to 2 hours after (+2). The vertical axis shows the total sleep duration in hours.
  • Exercise Strain and Statistical Method: Different colored lines represent different levels of exercise strain (Light, Moderate, High, Maximal), calculated from time in heart rate zones and exercise duration. The smooth curves are generated using Generalized Additive Mixed Models (GAMMs), a statistical method suited to showing potentially non-linear trends in data collected repeatedly from the same individuals over time, while accounting for other factors.
  • Key Trend Observed: The graph demonstrates that exercising later, especially closer to or after the actual time of falling asleep, is associated with shorter sleep duration. This effect appears more pronounced with higher levels of exercise strain (e.g., the red 'Maximal' line shows a steeper decline in sleep duration as exercise timing approaches and passes actual sleep onset compared to the blue 'Light' strain line).
  • Statistical Representation: The shaded areas depict the 95% confidence intervals, indicating the range of statistical uncertainty around the estimated average sleep duration for each condition. A horizontal dashed grey line shows the average sleep duration on nights following days with no exercise.
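
The horizontal axis described above is simply the signed gap, in hours, between the end of exercise and a reference sleep onset (actual sleep onset in this figure, habitual sleep onset in Fig. 1). A minimal sketch of that calculation follows; the function names and example timestamps are hypothetical, and full datetimes are used so that sessions ending after midnight are handled without special cases.

```python
# Signed time (hours) from a reference sleep onset to the end of exercise.
# Negative values mean exercise ended before sleep onset.
# ASSUMPTION: both timestamps are in the same time zone.
from datetime import datetime


def hours_relative_to_sleep_onset(exercise_end: datetime,
                                  sleep_onset: datetime) -> float:
    """Return exercise end time minus sleep onset, in hours."""
    return (exercise_end - sleep_onset).total_seconds() / 3600.0


def in_plotted_window(rel_hours: float, lo: float = -10.0,
                      hi: float = 2.0) -> bool:
    """True if the value falls on the -10 h to +2 h axis described above."""
    return lo <= rel_hours <= hi


onset = datetime(2024, 1, 1, 23, 0)            # hypothetical sleep onset
evening = datetime(2024, 1, 1, 21, 0)          # exercise ends 21:00
after_midnight = datetime(2024, 1, 2, 1, 0)    # exercise ends 01:00 next day

print(hours_relative_to_sleep_onset(evening, onset))         # -2.0
print(hours_relative_to_sleep_onset(after_midnight, onset))  # 2.0
```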
Scientific Validity
  • Analysis Type and Complementarity: This panel presents a secondary analysis using actual sleep onset as the reference point, which complements the primary analysis based on habitual sleep onset (Figure 1). This approach shows how exercise timing relates to the sleep that directly follows it, strengthening the overall conclusions by demonstrating consistency across different temporal anchors.
  • Statistical Methodology: The use of GAMMs remains appropriate for modeling the non-linear associations in this secondary analysis, controlling for relevant covariates and the data's structure.
  • Data Robustness: Findings are derived from the same large, robust dataset as the primary analysis, ensuring statistical power.
  • Objective Measurement: Sleep duration is an objectively derived metric from the biometric device, enhancing reliability.
  • Potential Confounding: As with the primary analysis, the observational nature means potential residual confounding cannot be completely excluded, although consistency with Figure 1 findings increases confidence.
Communication
  • Clarity and Consistency: Panel A effectively illustrates the trend of decreasing sleep duration as exercise concludes closer to, or after, the actual onset of sleep, particularly for higher strain levels. The visualization is consistent with Panel B in Figure 1, reinforcing the findings using a different temporal anchor.
  • Visualization of Dose-Response: The use of GAMM smoothing curves clearly depicts the dose-response relationship between exercise timing/strain and sleep duration relative to actual sleep onset.
  • Visual Elements: Distinct colors for strain levels and shaded confidence intervals aid differentiation and interpretation of estimate uncertainty.
  • Baseline Representation: The horizontal dashed line indicating mean sleep duration on non-exercise days serves as a helpful baseline for comparison.
Table 1 | Sleep onset dose-response contrasts at different levels of exercise...
Full Caption

Table 1 | Sleep onset dose-response contrasts at different levels of exercise timing (habitual) and strain

Figure/Table Image (Page 4)
First Reference in Text
Engaging in maximal exercise instead of light exercise is associated with a 6.8% (3.9 beats/min) increase in RHR if the exercise ended 2-h before habitual sleep onset, and a 15.0% (9.4 beats/min) increase in RHR if the exercise ended 2-h after habitual sleep onset (Fig. 1D; Table 4).
Description
  • Table Purpose: Dose-Response Contrasts: This table presents specific numerical comparisons, called 'contrasts', related to the time people fall asleep (sleep onset). It examines how sleep onset changes as exercise strain increases, comparing each level to the one below it (e.g., Light exercise vs. No exercise, Moderate exercise vs. Light exercise, etc.). This step-by-step comparison helps characterize the 'dose-response' relationship: how increasing the 'dose' (exercise strain) relates to the 'response' (sleep onset time).
  • Exercise Timing Categories: The comparisons are shown at different time points when exercise ended relative to the person's usual bedtime ('habitual sleep onset'). These time points range from 10 hours before (-10 h) to 2 hours after (+2 h) the usual bedtime.
  • Interpretation of Values (Minutes): The numbers in the table represent the difference in sleep onset time, measured in minutes, between two consecutive exercise strain levels. For example, the value 11.15 in the 'Mod. - Light' row under the '-2 h' column means that, on average, moderate exercise ending 2 hours before usual bedtime was associated with falling asleep 11.15 minutes later compared to light exercise ending at the same time.
  • Standard Errors: The values in parentheses are 'standard errors', which indicate the amount of statistical uncertainty or variability associated with each estimated difference. Smaller standard errors suggest a more precise estimate.
  • Statistical Significance (Bold Values): Values shown in bold indicate that the difference between the two exercise strain levels at that specific time point is statistically significant, meaning it's unlikely to be due to random chance (using a threshold of p < 0.005). For instance, the 11.15-minute difference mentioned earlier is bolded, indicating significance.
  • Data Source (GAMM Contrasts): These contrasts are derived from statistical models called 'Generalized Additive Mixed Models' (GAMMs), which were used to analyze the relationship shown visually in Figure 1A, accounting for individual differences and the non-linear nature of the data.
  • Key Trend Example: The table shows that as exercise gets closer to bedtime (e.g., -4h, -2h, 0, +2h), the difference in sleep onset delay between successive strain levels generally becomes larger and statistically significant. For example, comparing maximal to high strain exercise ending 2 hours after habitual sleep onset (+2 h) shows a significant additional delay of 46.51 minutes.
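
Because each contrast compares one strain level with the level directly below it, the steps add up: (Maximal - Light) = (Maximal - High) + (High - Moderate) + (Moderate - Light). The sketch below applies this to the two +2 h figures quoted in this review, the roughly 74 min total from the summary and the 46.51 min Maximal vs. High step, to show how much of the total must come from the two lower steps. The function names are illustrative, and the 74 min figure is approximate.

```python
# Consecutive contrasts on an additive scale (minutes) sum to the contrast
# between the end points, so a missing portion can be found by subtraction.
from typing import Sequence


def cumulative_contrast(steps: Sequence[float]) -> float:
    """Total contrast implied by a chain of consecutive contrasts."""
    return sum(steps)


def remaining_contrast(total: float, known_steps: Sequence[float]) -> float:
    """Portion of the total not explained by the known steps."""
    return total - cumulative_contrast(known_steps)


total_max_vs_light = 74.0  # approximate value quoted in the summary (+2 h)
max_vs_high = 46.51        # Table 1 contrast quoted above (+2 h)

lower_steps = remaining_contrast(total_max_vs_light, [max_vs_high])
print(round(lower_steps, 2))  # prints: 27.49
```

On these figures, the step from High to Maximal strain accounts for well over half of the total delay at +2 h, with the Moderate vs. Light and High vs. Moderate steps together contributing roughly 27 min.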
Scientific Validity
  • Contrast Analysis Appropriateness: Presenting contrasts between consecutive levels of the ordered factor (exercise strain) is a valid approach to examining a dose-response relationship derived from the GAMM analysis.
  • Statistical Significance Threshold: The use of a stringent p-value threshold (p < 0.005) for statistical significance is appropriate given the very large sample size, reducing the likelihood of declaring trivial effects as significant.
  • Reporting of Uncertainty: Reporting standard errors alongside the point estimates (mean differences) provides crucial information about the precision of the contrasts.
  • Multiple Comparisons Adjustment: The table explicitly states that a multivariate t-distribution adjustment was used for multiple comparisons, which is important for controlling the overall error rate when performing multiple tests.
  • Consistency with Visual Data: The results presented in the table directly quantify the stepwise effects visually suggested in Figure 1A, providing robust statistical support for the observed dose-dependent delay in sleep onset with increasing strain, particularly close to habitual sleep onset.
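
To make the p < 0.005 threshold concrete, the sketch below converts a contrast and its standard error into a two-sided p-value using a plain normal approximation, and reports the largest standard error at which a given estimate would still clear the threshold. This is an unadjusted simplification: the table's own tests use a multivariate t adjustment for multiple comparisons, which would be at least as strict. The function names are illustrative, and no standard errors from the table are assumed.

```python
# Unadjusted normal-approximation check of a contrast against p < 0.005.
# ASSUMPTION: estimate / SE is treated as a standard normal statistic; the
# study's multivariate t adjustment is NOT reproduced here.
from statistics import NormalDist

ALPHA = 0.005  # significance threshold reported for the tables


def two_sided_p(estimate: float, std_error: float) -> float:
    """Two-sided p-value for estimate / std_error under a normal model."""
    z = abs(estimate / std_error)
    return 2.0 * (1.0 - NormalDist().cdf(z))


def critical_z(alpha: float = ALPHA) -> float:
    """Absolute z value that corresponds to a two-sided p of alpha."""
    return NormalDist().inv_cdf(1.0 - alpha / 2.0)


def largest_significant_se(estimate: float, alpha: float = ALPHA) -> float:
    """Largest standard error at which the estimate still has p < alpha."""
    return abs(estimate) / critical_z(alpha)


print(round(critical_z(), 2))
print(round(largest_significant_se(11.15), 2))
```

Under this approximation the critical value is about 2.81, so the 11.15 min Moderate vs. Light contrast at -2 h would need a standard error below roughly 4 min to be flagged as significant before any multiplicity adjustment.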
Communication
  • Quantification of Visual Trends: The table effectively quantifies the pairwise differences between consecutive exercise strain levels at specific time points, complementing the visual representation in Figure 1A.
  • Table Structure and Clarity: The structure is clear, with rows representing the specific contrast being made (e.g., Moderate strain vs. Light strain) and columns representing the time point relative to habitual sleep onset.
  • Indication of Statistical Significance: Using bold font to indicate statistically significant differences (p < 0.005) aids rapid interpretation of key findings.
  • Caption Clarity: The caption clearly states the table's purpose, specifying the outcome (sleep onset), the nature of the values (dose-response contrasts), the timing reference (habitual), and the factors (exercise timing and strain).
  • Footnote Informativeness: Footnotes clearly explain what the values represent (difference in minutes), the meaning of bolded values, the source of standard errors, the statistical test used (GAMM contrasts), and the adjustment method for multiple comparisons.
Table 2 | Sleep duration dose-response contrasts at different levels of...
Full Caption

Table 2 | Sleep duration dose-response contrasts at different levels of exercise timing (habitual) and strain

Figure/Table Image (Page 4)
First Reference in Text
Table S2-6 present estimated marginal means (EMMs) and Tables S7-11 present strain vs. no-exercise contrasts for all analyses.
Description
  • Table Purpose: Dose-Response Contrasts for Sleep Duration: This table shows numerical comparisons ('contrasts') focusing on the total amount of sleep obtained ('sleep duration'). It specifically looks at how sleep duration changes as exercise strain increases from one level to the next (e.g., comparing Moderate exercise to Light exercise). This helps reveal the 'dose-response' pattern: how changing the exercise 'dose' (strain) relates to the outcome (sleep duration).
  • Exercise Timing Categories: These comparisons are presented for different exercise ending times relative to the person's typical bedtime ('habitual sleep onset'), ranging from 10 hours before (-10 h) to 2 hours after (+2 h).
  • Interpretation of Values (Minutes, Negative Values): The numbers in the table indicate the difference in sleep duration, measured in minutes, between two adjacent exercise strain levels. A negative value means that the higher strain level is associated with shorter sleep duration compared to the lower strain level it's being compared against. For example, the value -6.68 in the 'Mod. - Light' row under the '-2 h' column indicates that moderate exercise ending 2 hours before usual bedtime was associated with 6.68 minutes less sleep compared to light exercise ending at the same time.
  • Standard Errors: The numbers in parentheses are 'standard errors', which quantify the statistical uncertainty around each estimated difference in sleep duration.
  • Statistical Significance (Bold Values): Values highlighted in bold signify that the observed difference in sleep duration between the two strain levels is statistically significant (unlikely to be due to chance, based on a p < 0.005 threshold). For instance, the -6.68 minute difference mentioned above is bolded, indicating significance.
  • Key Trend Example: The table reveals that exercising closer to habitual bedtime (e.g., -4h, -2h, 0, +2h) is often associated with statistically significant reductions in sleep duration as exercise strain increases from one level to the next. For example, comparing maximal to high strain exercise ending at habitual sleep onset (0) shows a significant further reduction in sleep duration of 12.02 minutes.
  • Data Source (GAMM Contrasts): These numerical results are derived from the same statistical analysis (Generalized Additive Mixed Models or GAMMs) used to create the visual graph in Figure 1B.
Scientific Validity
  • Contrast Analysis Appropriateness: The presentation of contrasts between consecutive ordered levels of exercise strain is a suitable method for analyzing the dose-response relationship concerning sleep duration, based on the underlying GAMM.
  • Statistical Significance Threshold: Employing a stringent p-value threshold (p < 0.005) for significance is scientifically sound given the large dataset, minimizing false positives.
  • Reporting of Uncertainty: The inclusion of standard errors for each contrast estimate allows for an assessment of the precision of the findings.
  • Multiple Comparisons Adjustment: Explicit mention of the multivariate t-distribution adjustment for multiple comparisons confirms appropriate statistical control for the numerous tests performed.
  • Consistency with Visual Data: The quantitative results in this table align with and provide statistical backing for the visual trends depicted in Figure 1B, showing reduced sleep duration with higher strain exercise closer to habitual sleep onset.
Communication
  • Quantification of Visual Trends: The table effectively quantifies the pairwise differences in sleep duration between consecutive exercise strain levels at specific time points, providing numerical detail for the trends visualized in Figure 1B.
  • Table Structure and Clarity: The layout is clear, with rows detailing the specific contrast (e.g., Moderate vs. Light strain) and columns indicating the timing relative to habitual sleep onset.
  • Indication of Statistical Significance: Using bold font to highlight statistically significant differences (p < 0.005) facilitates quick identification of key results where increasing strain level significantly impacts sleep duration.
  • Caption Clarity: The caption accurately describes the table's content: the outcome (sleep duration), the type of data (dose-response contrasts), the reference point (habitual sleep onset), and the influencing factors (exercise timing and strain).
  • Footnote Informativeness: The footnotes provide essential context, explaining the units (minutes), the meaning of bold values, the source of standard errors, the statistical test used (GAMM contrasts), and the method for adjusting for multiple comparisons.
Table 3 | Sleep quality dose-response contrasts at different levels of exercise...
Full Caption

Table 3 | Sleep quality dose-response contrasts at different levels of exercise timing (habitual) and strain

Figure/Table Image (Page 4)
First Reference in Text
Table S2-6 present estimated marginal means (EMMs) and Tables S7-11 present strain vs. no-exercise contrasts for all analyses.
Description
  • Table Purpose: Dose-Response Contrasts for Sleep Quality: This table focuses on 'sleep quality', which is measured here as 'sleep percentage' (the proportion of the sleep period actually spent asleep, i.e., excluding wake after sleep onset). It shows numerical comparisons ('contrasts') to understand how sleep quality changes when exercise 'strain' (a measure combining intensity and duration) increases from one level to the next (e.g., comparing High strain to Moderate strain). This examines the 'dose-response' effect: how changing the exercise 'dose' (strain) influences sleep quality.
  • Exercise Timing Categories: The comparisons are made for exercises ending at different times relative to the individual's usual bedtime ('habitual sleep onset'), spanning from 10 hours before (-10 h) to 2 hours after (+2 h).
  • Interpretation of Values (Percentage Points, Negative Values): The values in the table represent the difference in sleep quality, in percentage points, between two adjacent exercise strain levels. A negative value signifies that the higher strain level is associated with lower sleep quality compared to the level below it. For example, the value -0.18 in the 'High - Mod.' row under the '-2 h' column means that high-strain exercise ending 2 hours before usual bedtime was associated with a 0.18 percentage point decrease in sleep quality compared to moderate-strain exercise ending at the same time.
  • Standard Errors: The numbers in parentheses are 'standard errors', indicating the statistical uncertainty associated with each estimated difference in sleep quality percentage.
  • Statistical Significance (Bold Values): Bolded values indicate that the difference in sleep quality between the two strain levels is statistically significant (unlikely due to chance, using p < 0.005). For example, the -0.18 percentage point difference mentioned above is bolded, signifying statistical significance.
  • Key Trend Example: The table generally shows that within about 8 hours of habitual sleep onset, increasing exercise strain tends to be associated with statistically significant decreases in sleep quality, with the magnitude of the decrease becoming larger closer to bedtime. For example, comparing maximal to high strain exercise ending 2 hours after habitual sleep onset (+2 h) shows a significant further decrease in sleep quality of 2.74 percentage points.
  • Data Source (GAMM Contrasts): These numerical results are derived from the statistical analysis (Generalized Additive Mixed Models or GAMMs) that generated the visual representation in Figure 1C.
Scientific Validity
  • Contrast Analysis Appropriateness: Using pairwise contrasts between ordered strain levels is a valid statistical approach to assess the dose-response relationship for sleep quality, derived from the GAMM results.
  • Objective Sleep Quality Metric: The sleep quality metric used (Sleep Percentage = [Sleep Period - wake after sleep onset]/Sleep Period) is an objective measure derived from the biometric device, reflecting sleep continuity/fragmentation.
  • Statistical Significance Threshold: Applying a stringent significance threshold (p < 0.005) is appropriate for this large dataset to minimize the risk of type I errors.
  • Reporting of Uncertainty: Reporting standard errors provides essential information on the precision of the estimated differences in sleep quality.
  • Multiple Comparisons Adjustment: The explicit mention of adjusting for multiple comparisons (multivariate t-distribution) ensures appropriate statistical rigor.
  • Consistency with Visual Data: The quantitative findings in Table 3 align with and provide statistical support for the trends visualized in Figure 1C, demonstrating decreased sleep quality with increasing strain, particularly near habitual sleep onset.
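The sleep percentage definition quoted above can be written as a small function. This is a minimal sketch: the function name and the example night (a 450-minute sleep period with 36 minutes of wake after sleep onset) are hypothetical and not taken from the study.

```python
def sleep_percentage(sleep_period_min: float, waso_min: float) -> float:
    """Sleep percentage = (sleep period - WASO) / sleep period, as a percentage."""
    if sleep_period_min <= 0:
        raise ValueError("sleep period must be positive")
    if not 0 <= waso_min <= sleep_period_min:
        raise ValueError("WASO must lie between 0 and the sleep period")
    return 100.0 * (sleep_period_min - waso_min) / sleep_period_min

# Hypothetical night: 450 min from sleep onset to offset, 36 min awake in between.
example = sleep_percentage(450, 36)
print(f"{example:.1f}%")
```

On this scale, a contrast of a few percentage points corresponds to several additional minutes awake within the same sleep period, which helps put the table's values in practical terms.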
Communication
  • Quantification of Visual Trends: The table effectively presents the quantitative differences in sleep quality percentage between successive exercise strain levels at various time points, complementing the visual trends shown in Figure 1C.
  • Table Structure and Clarity: The structure is logical, with rows representing the pairwise comparison (e.g., Moderate vs. Light strain) and columns representing exercise timing relative to habitual sleep onset.
  • Indication of Statistical Significance: Highlighting statistically significant differences (p < 0.005) using bold font allows for efficient identification of key findings where increasing strain significantly impacts sleep quality.
  • Caption Clarity: The caption clearly defines the table's scope, specifying the outcome (sleep quality), the data type (dose-response contrasts), the reference point (habitual sleep onset), and the variables (exercise timing and strain).
  • Footnote Informativeness: Footnotes provide necessary details, including the units (percentage point difference), the definition of significance, the source of standard errors, the statistical method (GAMM contrasts), and the multiple comparisons adjustment.
Table 4 | Nocturnal RHR dose-response contrasts at different levels of exercise...
Full Caption

Table 4 | Nocturnal RHR dose-response contrasts at different levels of exercise timing (habitual) and strain

Figure/Table Image (Page 4)
First Reference in Text
Engaging in maximal exercise instead of light exercise is associated with a 6.8% (3.9 beats/min) increase in RHR if the exercise ended 2-h before habitual sleep onset, and a 15.0% (9.4 beats/min) increase in RHR if the exercise ended 2-h after habitual sleep onset (Fig. 1D; Table 4).
Description
  • Table Purpose: Dose-Response Contrasts for Nocturnal RHR: This table examines 'Nocturnal RHR', which is the average heart rate in beats per minute (beats/min) during sleep. It presents numerical comparisons ('contrasts') showing how much Nocturnal RHR increases when exercise intensity ('strain') is stepped up from one level to the next (e.g., comparing Moderate strain to Light strain). This helps understand the 'dose-response' effect – how increasing the exercise 'dose' (strain) affects the heart rate response during sleep.
  • Exercise Timing Categories: These comparisons are shown for exercises ending at different times relative to the person's usual bedtime ('habitual sleep onset'), from 10 hours before (-10 h) to 2 hours after (+2 h).
  • Interpretation of Values (beats/min): The numbers in the table represent the difference in nocturnal RHR (in beats/min) between two consecutive exercise strain levels. A positive value indicates that the higher strain level is associated with a higher heart rate during sleep compared to the level below it. For example, the value 1.32 in the 'Mod. - Light' row under the '-2 h' column means that moderate exercise ending 2 hours before usual bedtime was associated with a nocturnal RHR that was 1.32 beats/min higher, on average, compared to light exercise ending at the same time.
  • Standard Errors: The values in parentheses are 'standard errors', indicating the statistical uncertainty associated with each estimated difference in RHR.
  • Statistical Significance (Bold Values): Values shown in bold indicate that the difference in nocturnal RHR between the two strain levels is statistically significant (unlikely due to chance, based on p < 0.005). For instance, the 1.32 beats/min difference mentioned above is bolded, indicating significance.
  • Key Trend Example: The table consistently shows statistically significant increases in nocturnal RHR with each step up in exercise strain, particularly when exercise occurs closer to habitual sleep onset. For example, moving from high to maximal strain exercise ending 2 hours after habitual sleep onset (+2 h) is associated with a further significant increase of 4.25 beats/min in nocturnal RHR.
  • Data Source (GAMM Contrasts): These numerical contrasts are derived from the statistical analysis (Generalized Additive Mixed Models or GAMMs) used to generate the visual graph in Figure 1D.
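Because the rows are contrasts between adjacent strain levels, they add up (telescope) to the maximal-versus-light difference quoted in the first reference to this table. The sketch below uses only figures quoted in this section to back out the part of each total that is not itemized here, and the light-exercise RHR implied by the reported percentages. It assumes the percentages are expressed relative to the light-exercise estimate; the derived values come from rounded inputs and are not numbers reported by the authors.

```python
# Figures quoted in this section (beats/min and percent).
TOTAL_MAX_VS_LIGHT = {"-2 h": 3.9, "+2 h": 9.4}   # maximal minus light
PERCENT_INCREASE = {"-2 h": 6.8, "+2 h": 15.0}    # same contrast, in percent
QUOTED_STEP = {
    "-2 h": ("moderate minus light", 1.32),
    "+2 h": ("maximal minus high", 4.25),
}

def remaining_steps(total: float, known_step: float) -> float:
    """Sum of the adjacent contrasts not quoted here (telescoping sum)."""
    return total - known_step

def implied_light_baseline(abs_diff: float, pct_diff: float) -> float:
    """Light-exercise RHR implied by an absolute and a percentage difference.

    ASSUMPTION: the percentage is relative to the light-exercise estimate.
    """
    return abs_diff / (pct_diff / 100.0)

for timing, total in TOTAL_MAX_VS_LIGHT.items():
    label, step = QUOTED_STEP[timing]
    rest = remaining_steps(total, step)
    base = implied_light_baseline(total, PERCENT_INCREASE[timing])
    print(f"{timing}: {label} = {step}; other two steps sum to about {rest:.2f}; "
          f"implied light baseline is about {base:.1f} beats/min")
```

The same bookkeeping applies to the other contrast tables, since each reports adjacent-level differences on the outcome's own scale.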
Scientific Validity
  • Contrast Analysis Appropriateness: Presenting dose-response contrasts between consecutive strain levels is a valid method to analyze the impact of increasing exercise intensity on nocturnal RHR, based on the GAMM results.
  • Objective Measurement: Nocturnal RHR is an objective physiological measure derived from the validated biometric device, providing a reliable indicator of autonomic state during sleep.
  • Statistical Significance Threshold: The use of a stringent significance threshold (p < 0.005) is appropriate for the large sample size, reducing the risk of false positives.
  • Reporting of Uncertainty: Reporting standard errors allows assessment of the precision of the estimated differences in RHR.
  • Multiple Comparisons Adjustment: Explicitly stating the adjustment for multiple comparisons (multivariate t-distribution) confirms appropriate statistical control.
  • Consistency with Visual Data: The quantitative data strongly support the visual trends in Figure 1D, demonstrating a clear, statistically significant dose-dependent increase in nocturnal RHR with higher exercise strain performed closer to sleep.
Communication
  • Quantification of Visual Trends: The table effectively quantifies the stepwise increase in nocturnal RHR associated with increasing exercise strain at various times relative to habitual sleep onset, providing numerical support for Figure 1D.
  • Table Structure and Clarity: The layout is clear, with rows indicating the specific comparison between adjacent strain levels and columns representing the time points.
  • Indication of Statistical Significance: Using bold text to denote statistical significance (p < 0.005) aids in quickly identifying where the step-up in exercise strain has a statistically robust impact on RHR.
  • Caption Clarity: The caption clearly states the table's focus: nocturnal RHR, dose-response contrasts, habitual timing reference, and the factors involved (timing and strain).
  • Footnote Informativeness: Footnotes provide essential context regarding the units (beats/min), significance level, standard errors, statistical tests (GAMM contrasts), and multiple comparison adjustments.
Table 5 | Nocturnal HRV dose-response contrasts at different levels of exercise...
Full Caption

Table 5 | Nocturnal HRV dose-response contrasts at different levels of exercise timing (habitual) and strain

Figure/Table Image (Page 5)
First Reference in Text
Engaging in maximal exercise instead of light exercise is associated with a 14.1% (8.3 unit) decrease in HRV if the exercise ended 2-h before habitual sleep onset, and a 32.6% (14.6 unit) decrease in HRV if the exercise ended 2-h after habitual sleep onset (Fig. 1E; Table 5).
Description
  • Table Purpose: Nocturnal HRV (RMSSD): This table focuses on 'Nocturnal HRV' (Heart Rate Variability during sleep), which measures the variation in time between consecutive heartbeats. It's often used as an indicator of the body's stress and recovery state, specifically reflecting the activity of the autonomic nervous system. Higher HRV generally suggests better recovery and a more relaxed (parasympathetic) state. The specific HRV measure used here is RMSSD (Root Mean Square of Successive Differences), a standard way to quantify short-term beat-to-beat variations.
  • Dose-Response Contrasts: The table presents numerical comparisons ('contrasts') showing how much nocturnal HRV changes when exercise intensity ('strain') increases from one level to the next (e.g., comparing Moderate strain to Light strain). This reveals the 'dose-response' pattern: how increasing the exercise 'dose' affects this measure of nervous system activity during sleep.
  • Exercise Timing Categories: These comparisons are shown for exercises ending at different times relative to the person's usual bedtime ('habitual sleep onset'), ranging from 10 hours before (-10 h) to 2 hours after (+2 h).
  • Interpretation of Values (RMSSD Units, Negative Values): The main numbers in the table represent the difference in nocturnal HRV (in RMSSD units) between two consecutive exercise strain levels. Negative values indicate that the higher strain level is associated with lower HRV (less beat-to-beat variation, suggesting reduced parasympathetic activity or higher stress/sympathetic activity) compared to the level below it. For example, the value -2.63 in the 'Mod. - Light' row under the '-2 h' column means that moderate exercise ending 2 hours before usual bedtime was associated with an average nocturnal HRV that was 2.63 RMSSD units lower than that following light exercise ending at the same time.
  • Standard Errors: The values in parentheses are 'standard errors', which provide a measure of the statistical uncertainty around each estimated difference in HRV.
  • Statistical Significance (Bold Values): Values shown in bold indicate that the difference in nocturnal HRV between the two strain levels is statistically significant (unlikely to be due to chance, based on p < 0.005). For instance, the -2.63 RMSSD unit difference mentioned above is bolded, indicating significance.
  • Key Trend Example: The table generally shows statistically significant decreases in nocturnal HRV with each step up in exercise strain, especially when exercise ends within roughly 6-8 hours before, or up to 2 hours after, habitual sleep onset. The magnitude of the decrease tends to be larger closer to bedtime. For example, moving from high to maximal strain exercise ending 2 hours after habitual sleep onset (+2 h) is associated with a further significant decrease of 6.56 RMSSD units in nocturnal HRV.
  • Data Source (GAMM Contrasts): These numerical contrasts are derived from the statistical analysis (Generalized Additive Mixed Models or GAMMs) used to generate the visual graph in Figure 1E.
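RMSSD has a simple definition that a short example makes concrete. The sketch below computes it from a list of inter-beat (RR) intervals, conventionally expressed in milliseconds. The sample intervals are made up for illustration, and this is the textbook formula rather than the device's own processing, which is not described here.

```python
import math
from typing import Sequence

def rmssd(rr_intervals_ms: Sequence[float]) -> float:
    """Root mean square of successive differences between RR intervals (ms)."""
    if len(rr_intervals_ms) < 2:
        raise ValueError("need at least two RR intervals")
    # Differences between each interval and the one that follows it.
    diffs = [b - a for a, b in zip(rr_intervals_ms, rr_intervals_ms[1:])]
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))

# Hypothetical RR intervals (ms); not data from the study.
sample_rr = [1000, 1040, 980, 1020, 1000]
print(f"RMSSD = {rmssd(sample_rr):.1f} ms")
```

A perfectly regular heartbeat gives an RMSSD of zero, so lower values after strenuous late exercise reflect less beat-to-beat variation during sleep.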
Scientific Validity
  • Contrast Analysis Appropriateness: Presenting dose-response contrasts between consecutive strain levels is a valid method for analyzing the impact of increasing exercise intensity on nocturnal HRV, based on the GAMM results.
  • Objective and Relevant Outcome Measure: Nocturnal HRV (specifically RMSSD) is an objective, widely accepted measure reflecting cardiac autonomic modulation, particularly parasympathetic influence, making it a relevant outcome for assessing recovery status during sleep.
  • Statistical Significance Threshold: The use of a stringent p-value threshold (p < 0.005) is statistically appropriate for this large dataset.
  • Reporting of Uncertainty: Reporting standard errors allows for the assessment of the precision of the estimated HRV differences.
  • Multiple Comparisons Adjustment: The explicit mention of adjustment for multiple comparisons (multivariate t-distribution) indicates appropriate statistical control.
  • Consistency with Visual Data and Interpretation: The quantitative data strongly support the visual trends in Figure 1E, demonstrating a clear, statistically significant dose-dependent decrease in nocturnal HRV with higher exercise strain performed closer to sleep, indicative of impaired autonomic recovery.
Communication
  • Quantification of Visual Trends: The table effectively quantifies the stepwise decrease in nocturnal HRV associated with increasing exercise strain levels at different times relative to habitual sleep onset, providing numerical detail for Figure 1E.
  • Table Structure and Clarity: The structure is clear, with rows representing the specific contrast between adjacent strain levels and columns indicating the time points.
  • Indication of Statistical Significance: Bold text clearly marks statistically significant differences (p < 0.005), facilitating the identification of conditions where increasing strain significantly impacts HRV.
  • Caption Clarity: The caption accurately describes the table's content: the outcome (Nocturnal HRV), the analysis type (dose-response contrasts), the timing reference (habitual), and the factors (exercise timing and strain).
  • Footnote Informativeness: Footnotes provide crucial information about the units (RMSSD, although defining RMSSD here would be helpful for broader context), significance level, standard errors, statistical methodology (GAMM contrasts), and multiple comparison adjustments.

Discussion

Key Aspects

Strengths

Suggestions for Improvement

Methods

Key Aspects

Strengths

Suggestions for Improvement

Non-Text Elements

Fig. 3 | CONSORT flow diagram illustrating the criteria for inclusion in the...
Full Caption

Fig. 3 | CONSORT flow diagram illustrating the criteria for inclusion in the primary analysis. Procedure stage is represented in blue.

Figure/Table Image (Page 7)
First Reference in Text
The final dataset included 4,084,354 person-nights of data (Fig. 3 presents CONSORT flow diagram).
Description
  • Element Type and Purpose: This figure is a flow diagram, a standard way in research to show how the final group of participants or data included in a study was selected from a larger initial pool. It follows the structure often recommended by reporting guidelines like CONSORT (Consolidated Standards of Reporting Trials) or STROBE (Strengthening the Reporting of Observational Studies in Epidemiology) to ensure transparency.
  • Initial Sample Size: The diagram starts with an initial sample of 20,000 participants, corresponding to 7,106,256 potential nights of data ('person-nights').
  • First Exclusion Stage: It then details several stages of data cleaning and participant exclusion. First, 1,261 participants (446,033 nights) were excluded primarily due to having fewer than 50 logged exercise days or missing/invalid data, resulting in 18,739 participants (6,660,223 nights).
  • Second Exclusion Stage (Data Points): Next, data corresponding to 1,645,114 nights were excluded because the exercise occurred outside the specified analysis window (10 hours before to 2 hours after habitual sleep onset) or was an invalid exercise type.
  • Third Exclusion Stage (Participants): A final exclusion step removed 4,050 participants (930,755 nights) because they had fewer than 50 exercise activities remaining within the analysis window, ensuring sufficient data per participant for the primary analysis.
  • Final Sample Size: The process culminates in the final sample used for the analysis: 14,689 participants, providing a total of 4,084,354 person-nights of data.
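The counts described above can be checked with simple bookkeeping. The sketch below subtracts each exclusion from the initial pool and compares the result with the reported final sample. It assumes the second stage removed person-nights only and no participants, which is how the description reads but is not stated explicitly.

```python
# Counts quoted from the flow diagram description: (participants, person-nights).
initial = (20_000, 7_106_256)
exclusions = [
    ("fewer than 50 logged exercise days or missing/invalid data", 1_261, 446_033),
    # ASSUMPTION: this stage dropped nights only, not participants.
    ("exercise outside analysis window or invalid exercise type", 0, 1_645_114),
    ("fewer than 50 activities within the analysis window", 4_050, 930_755),
]
reported_final = (14_689, 4_084_354)

participants, nights = initial
for reason, dropped_people, dropped_nights in exclusions:
    participants -= dropped_people
    nights -= dropped_nights
    print(f"after excluding {reason}: {participants:,} participants, {nights:,} nights")

retained_share = nights / initial[1]
print(f"matches reported final sample: {(participants, nights) == reported_final}")
print(f"share of person-nights retained: {retained_share:.1%}")
```

The stage-by-stage totals reconcile with the reported final sample, and a little over half of the initial person-nights are retained.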
Scientific Validity
  • Transparency and Reproducibility: The inclusion of this flow diagram significantly enhances the study's transparency and reproducibility by clearly documenting the participant and data selection process.
  • Adherence to Reporting Guidelines: The diagram adheres to the principles of reporting guidelines (STROBE, despite being labelled CONSORT) by showing the flow of participants/data through different study phases and quantifying exclusions.
  • Clear Justification for Exclusions: The specific reasons provided for exclusions at each stage (e.g., < 50 exercise days, exercise outside analysis window, invalid type, < 50 activities within window) are clear and allow assessment of potential selection biases.
  • Comprehensive Attrition Data: Quantifying both the number of participants and the number of person-nights excluded provides a comprehensive picture of data attrition.
Communication
  • Clarity and Layout: The flow diagram provides a clear visual representation of participant and data attrition from the initial sample to the final analyzed dataset. The layout is logical and easy to follow.
  • Numerical Transparency: Explicitly stating the number of participants (n) and person-nights at each stage significantly enhances the transparency of the selection process.
  • Clarity of Exclusion Criteria: The reasons for exclusion at each step are clearly stated, allowing readers to understand the characteristics of the final sample relative to the initial pool.
  • Nomenclature Precision: While labeled a 'CONSORT' diagram (typically for randomized trials), its structure effectively serves the purpose outlined by STROBE guidelines (mentioned in Methods) for reporting participant flow in observational studies. This minor nomenclature inconsistency does not detract significantly from its clarity but could be corrected for precision (e.g., 'STROBE flow diagram' or simply 'Flow diagram').
Table 6 | Exercise strain categories with examples of exercises and...
Full Caption

Table 6 | Exercise strain categories with examples of exercises and quantifications

Figure/Table Image (Page 7)
First Reference in Text
Nocturnal RHR and HRV were calculated using a weighted average over the sleep period (i.e., from sleep onset to sleep offset), giving more weight to periods of slow wave sleep, as slow wave sleep is typically characterized by parasympathetic dominance [22].
Description
  • Purpose: Defining Exercise Strain Categories: This table defines the categories used to classify the intensity and duration load, or 'strain', of exercise sessions throughout the study. It divides exercise strain into four levels: Light, Moderate, High, and Maximal.
  • Strain Categories and SHRZS Ranges: Each category corresponds to a specific range of scores calculated using a method called SHRZS (Summated-Heart-Rate-Zones score). SHRZS quantifies exercise load by multiplying the time spent in different heart rate zones (expressed as a percentage of maximum heart rate, HRmax) by a weighting factor for each zone, and then summing these values. The ranges are: Light (<116), Moderate (116 to 214), High (214 to 461), and Maximal (>461).
  • Distribution of Exercises by Strain: The table shows the number ('No.') and percentage ('%') of all exercise sessions logged by participants that fell into each category. For example, Light strain accounted for 834,363 sessions (46.5% of the total), while Maximal strain accounted for 43,437 sessions (2.4%). This shows that lower strain activities were much more common in this dataset.
  • Example Exercises: For each strain category, the table provides examples of typical exercises, such as 'Brisk walk; slow jog' for Light strain, 'Gym class; 5-mile run' for Moderate strain, 'Hockey training; 10-mile run' for High strain, and 'Hockey match; half marathon' for Maximal strain.
  • Example Quantifications (Intensity & Duration): Crucially, it also gives an example of the intensity and duration combination that could result in that strain level. Intensity is shown as a percentage of the individual's maximum heart rate (HRmax – the highest rate their heart can beat, often estimated based on age). For example, Moderate strain could be achieved by exercising for 45 minutes where the heart rate is maintained at 85% of HRmax (which falls into heart rate zone 4). This highlights that strain depends on both how hard (intensity) and how long (duration) someone exercises.
  • Footnote Clarification: A footnote clarifies that the SHRZS score used for categorization in this study is calculated differently from the 'Strain Score' shown in the commercial WHOOP app.
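The scoring and categorization described above can be sketched in a few lines. The category thresholds are the ones quoted from the table. The zone weights are an assumption: the description only says each zone has "a weighting factor", so the sketch uses Edwards-style weights in which zone k has weight k. How the shared boundary values (214 and 461) are assigned is also an assumption.

```python
from typing import Mapping

# ASSUMPTION: Edwards-style weights, where heart rate zone k (1-5) has weight k.
ZONE_WEIGHTS = {1: 1, 2: 2, 3: 3, 4: 4, 5: 5}

def shrzs(minutes_in_zone: Mapping[int, float]) -> float:
    """Sum over zones of (minutes spent in zone) * (zone weight)."""
    return sum(ZONE_WEIGHTS[zone] * minutes for zone, minutes in minutes_in_zone.items())

def strain_category(score: float) -> str:
    """Map a score to the table's categories (boundary handling is an assumption)."""
    if score < 116:
        return "Light"
    if score <= 214:
        return "Moderate"
    if score <= 461:
        return "High"
    return "Maximal"

# Intensity/duration examples quoted in this review.
sessions = [("45 min in zone 4", {4: 45}), ("30 min in zone 2", {2: 30})]
for label, session in sessions:
    score = shrzs(session)
    print(f"{label}: score = {score}, category = {strain_category(score)}")
```

Under these assumed weights, both quoted examples fall in the categories the table assigns them (Moderate and Light respectively), which suggests the assumption is at least consistent with the table. A session that mixes zones is scored by passing several entries, for example `shrzs({2: 20, 4: 10})`.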
Scientific Validity
  • Objective Quantification Method (SHRZS): Using the SHRZS method provides a standardized, objective way to quantify exercise load based on heart rate data, integrating both intensity (%HRmax) and duration. This is a valid approach for comparing diverse activities.
  • Clear Categorization Criteria: Defining clear numerical ranges for each strain category ensures consistent classification of exercise sessions across the large dataset.
  • Physiological Relevance: The categorization reflects a physiological basis, as higher SHRZS scores correspond to greater cardiovascular load and metabolic demand.
  • Transparency of Quantification: Providing specific examples of intensity (%HRmax) and duration combinations adds transparency to how strain levels were achieved and allows for comparison with other exercise physiology literature.
  • Acknowledged Limitations (Implicit): The acknowledgement that SHRZS might underestimate the strain of certain activities (like strength training, mentioned in the Discussion) is appropriate, though the method remains a reasonable approach for predominantly aerobic activities captured in large datasets.
  • Dependence on HR Data Accuracy: The method relies on accurate HRmax estimation (either formula-based or user-inputted) and continuous heart rate monitoring during exercise, the validity of which underpins the SHRZS calculation.
Communication
  • Clarity of Strain Categories: The table clearly defines the four exercise strain categories (Light, Moderate, High, Maximal) used throughout the study, linking them to specific ranges of the SHRZS score.
  • Contextual Data Distribution: Providing the number and percentage of total exercises falling into each category gives valuable context about the distribution of exercise intensities within the study sample.
  • Relatability through Exercise Examples: Including concrete examples of exercises (e.g., 'Brisk walk', 'Gym class', 'Hockey training', 'Hockey match') for each category makes the abstract strain levels more tangible and relatable.
  • Clarity of Quantification Examples: The 'Intensity and duration example' column provides specific, quantifiable examples (e.g., '30 min at 65% [zone 2] of HRmax') that illustrate how a given strain level could be achieved. This greatly aids understanding of the interplay between intensity and duration that defines strain.
  • Clarification Footnote: The footnote explaining that the SHRZS calculation used here differs from the consumer-facing WHOOP Strain Score is an important clarification for readers familiar with the device.
  • Reference Utility: The table effectively serves as a reference key for interpreting the strain levels presented in the figures and other tables.