This study developed a predictive equation for Total Energy Expenditure (TEE) using the International Atomic Energy Agency Doubly Labeled Water Database, encompassing 6,497 measures from individuals aged 4 to 96. The equation, derived via general linear regression, showed strong predictive power, with 94.6% of independent TEE measurements falling within the 95% predictive limits. Application to the NDNS and NHANES datasets revealed substantial misreporting, with over 50% of dietary reports falling outside the predicted TEE limits. Notably, under-reporting was strongly and positively correlated with BMI and with reported protein intake, and negatively correlated with reported fat intake.
The study provides a robust predictive equation for TEE and offers valuable insights into the pervasive issue of misreporting in dietary surveys. The strong correlation between under-reporting and factors like BMI and reported macronutrient intake highlights the limitations of relying solely on self-reported dietary data. The study also distinguishes clearly between correlation and causation, acknowledging that the observed relationships do not prove a causal link between dietary composition and misreporting.
The practical utility of the predictive equation is significant, as it offers a more accurate tool for identifying potential misreporting in large-scale dietary surveys. This can help improve the accuracy of nutritional epidemiology research and inform the development of more effective public health interventions. The findings are placed within the context of existing literature, building upon previous work on misreporting and offering a novel approach based on a large and diverse dataset.
While the study provides valuable guidance for researchers and practitioners, it also acknowledges key uncertainties. The reliance on self-reported data, even with the use of a predictive equation, remains a limitation. The study also highlights the need for caution when interpreting self-reported dietary data, particularly in relation to macronutrient composition and its association with BMI. The authors suggest that future improvements may involve integrating objective measures of physical activity, such as accelerometry.
Critical unanswered questions remain, particularly regarding the generalizability of the findings to populations not represented in the study sample. Additionally, while the study identifies patterns of misreporting, the underlying reasons for these patterns are not fully explored. Further research is needed to investigate the psychological and social factors that contribute to misreporting. The methodological limitations, such as the exclusion of children under 4 and the handling of 'other' and mixed-race ethnicities, do not fundamentally affect the main conclusions but highlight areas for future research. Overall, the study makes a significant contribution to the field of nutritional epidemiology by providing a valuable tool for identifying misreporting and highlighting the need for more accurate methods of dietary assessment.
The study leverages a large and diverse dataset from the International Atomic Energy Agency Doubly Labeled Water Database, enhancing the robustness and generalizability of the predictive equation.
The abstract clearly outlines the methodological approach, including the development of a regression equation and its application to large datasets, providing a concise overview of the study's design.
The study reveals a high level of misreporting in dietary studies and identifies systematic biases in reported macronutrient composition, highlighting critical issues in nutritional epidemiology.
Specifying the variables used in the regression equation is a medium-impact improvement that would help readers understand the predictive equation's inputs and their relevance. The Abstract particularly needs this detail because it sets the stage for the study's methodology and findings. Naming the predictors would give a clearer picture of how TEE is predicted and how the variables were chosen, make the novelty and robustness of the approach easier to judge against previous methods, and keep the methodology transparent.
Implementation: Specifically mention the key variables used in the regression equation, such as body weight, age, sex, and any other significant predictors. For example, "The resultant regression equation predicts expected TEE from easily acquired variables, such as body weight, age, sex, height, and ethnicity, with 95% predictive limits..."
Explaining the implications of misreporting is a high-impact improvement that would help readers grasp the study's broader significance. The Abstract needs this context to communicate why the findings matter to nutritional epidemiology. A brief statement of the consequences of misreporting for research and public health would underline the importance of accurate dietary data, the practical value of the new predictive equation, and the study's contribution to the reliability of nutritional research.
Implementation: Include a sentence or phrase that briefly explains the implications of misreporting for nutritional epidemiology. For example, "This misreporting can lead to inaccurate conclusions about diet-disease relationships and hinder the development of effective public health interventions."
Clarifying how the approach differs from previous methods is a medium-impact improvement that would help readers see the study's unique contribution. The Abstract needs this clarification to position the research within the existing literature. A brief explanation of how the new predictive equation differs from and improves upon previous methods would highlight its novelty and show how it addresses the limitations of existing research.
Implementation: Add a sentence or phrase that explicitly states how the new predictive equation differs from previous approaches, such as the Goldberg cut-off. For example, "Unlike previous methods that rely on basal metabolic rate estimations and arbitrary multipliers, this equation uses a data-driven approach to predict TEE and identify misreporting."
The Introduction effectively outlines the pervasive issue of misreporting in dietary studies, providing a clear rationale for the study's focus on developing a more accurate method for identifying such errors.
The section clearly explains previous methods, such as the Goldberg cut-off and its modifications, and highlights their limitations, setting the stage for the need for a new approach.
The Introduction provides a strong justification for the current study by highlighting the limitations of existing methods and the need for a more robust approach based on a large dataset of doubly labeled water measurements.
Expanding on the novelty of the DLW database is a medium-impact improvement that would help readers see the study's unique contribution. The Introduction needs this clarification to position the research within the existing literature and to highlight the innovative use of the extensive DLW database. Describing the database's specific advantages, such as its size, its diversity, and its coverage of various age groups and ethnicities, would emphasize its potential to overcome the limitations of previous studies and to provide more accurate and generalizable insights into dietary misreporting.
Implementation: Include a paragraph that details the unique features of the DLW database, such as its size, diversity, and the range of ages and ethnicities included. For example, "This study leverages an unprecedentedly large and diverse database of doubly labeled water measurements, encompassing over 7,500 individuals aged 8 days to 96 years from various ethnic backgrounds. This extensive dataset allows for the development of more robust and generalizable prediction equations for TEE, addressing limitations of previous studies that relied on smaller, less diverse samples."
Clarifying the implications of misreporting for public health is a high-impact improvement. The Introduction needs this context to communicate the significance of accurate dietary assessment beyond the research setting. A brief account of how misreporting can lead to flawed public health policies, inaccurate dietary guidelines, and ineffective interventions would highlight the real-world consequences of the problem. It would also underscore the study's contribution to the evidence base for public health nutrition and the practical value of the new predictive equation for promoting population health and preventing chronic diseases.
Implementation: Add a few sentences that explain the potential consequences of misreporting for public health policy and interventions. For example, "Accurate dietary assessment is crucial not only for research but also for informing public health policies, developing dietary guidelines, and designing effective interventions to prevent chronic diseases. Misreporting can lead to erroneous conclusions about diet-disease relationships, resulting in misguided policies and interventions that fail to address the true nutritional needs of the population."
Providing more context on the study population is a medium-impact improvement that would help readers understand the study's scope and generalizability. The Introduction needs this context to frame the research within the broader population and to flag potential limitations. A brief description of the population's characteristics, such as age range, sex distribution, and ethnicity, would let readers assess the representativeness of the sample, identify potential biases in generalizing the results to other populations, and interpret the findings with appropriate nuance.
Implementation: Include a brief description of the study population, mentioning the age range, sex distribution, and ethnicity of the individuals included in the DLW database. For example, "The database includes measurements from over 7,500 individuals aged 8 days to 96 years, with a diverse representation of ethnicities, including White, African, Asian, and Hispanic populations. The sample includes both males and females, providing a comprehensive dataset for developing prediction equations across different demographic groups."
The Results section clearly presents the derived predictive equation for TEE, including all significant terms and their coefficients, which enhances the transparency and reproducibility of the study.
The researchers conducted a thorough validation of the predictive equation using an independent dataset, demonstrating its robustness and confirming that a high percentage of measurements fell within the predicted limits.
The application of the predictive equation to two large dietary surveys (NDNS and NHANES) provides a detailed and insightful analysis of misreporting, including stratification by age, sex, and BMI.
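The validation check noted above, the share of independent measurements that fall inside the predicted limits, can be illustrated with a minimal sketch. The function name `coverage` and all numbers below are hypothetical and chosen only for illustration; they are not taken from the study. For a well-calibrated 95% predictive interval, the returned fraction should sit close to 0.95 on independent data.

```python
def coverage(observed, lower, upper):
    """Return the fraction of observations lying within [lower, upper].

    All three arguments are equal-length sequences; element i of `lower`
    and `upper` gives the predictive limits for observation i.
    """
    inside = sum(1 for o, lo, hi in zip(observed, lower, upper) if lo <= o <= hi)
    return inside / len(observed)


# Synthetic example values (MJ/day), invented for illustration only.
observed_tee = [9.5, 10.2, 12.8, 7.9, 11.0]
lower_limits = [8.0, 8.5, 9.0, 8.2, 9.5]
upper_limits = [12.0, 12.5, 13.0, 12.2, 13.5]

print(coverage(observed_tee, lower_limits, upper_limits))
```

In this toy sample, four of the five observations fall inside their limits; the fourth lies below its lower limit.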
Expanding on the machine learning model comparisons is a medium-impact improvement that would give readers a fuller understanding of the study's methodological choices. The Results section needs this detail to justify selecting the classical general linear regression model over the machine learning alternatives. Explaining why the machine learning models (Random Forest, XGBoost, and Support Vector Regression) did not outperform the classical regression would provide a clearer rationale for the chosen approach, show the limitations of each method for predicting TEE, and make the model selection process transparent.
Implementation: Include a paragraph that details the performance metrics of each machine learning model compared to the classical regression, such as R-squared values, mean absolute error, and any other relevant statistics. For example, "While the machine learning models showed comparable performance to the classical regression, with R-squared values of X for Random Forest, Y for XGBoost, and Z for Support Vector Regression, they did not offer significant improvements in predictive accuracy. This is likely due to the linear relationships between the predictors and TEE, which are adequately captured by the classical regression model."
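As a minimal sketch of the metrics named above, R-squared and mean absolute error can be computed for each model's predictions on the same held-out observations. The model names and all values below are placeholders invented for illustration; they do not represent results from the study or from any of the models it compared.

```python
from statistics import mean


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    y_bar = mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between observed and predicted values."""
    return mean(abs(t - p) for t, p in zip(y_true, y_pred))


# Synthetic held-out TEE values and predictions from two hypothetical models.
y_true = [8.0, 10.0, 12.0, 14.0]
predictions = {
    "model_a": [8.5, 9.5, 12.5, 13.5],
    "model_b": [9.0, 10.0, 11.0, 14.0],
}

for name, y_pred in predictions.items():
    print(name, r_squared(y_true, y_pred), mean_absolute_error(y_true, y_pred))
```

Reporting both metrics side by side, as here, shows that two models can share the same mean absolute error while differing in R-squared, which is why a single statistic is rarely enough for model comparison.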
Clarifying the rationale for using 95% predictive intervals (PI) is a medium-impact improvement that would help readers understand the study's statistical methods. The Results section needs this clarification to justify choosing 95% PI over other potential metrics for assessing misreporting. A more detailed explanation of why 95% PI were selected, and how they improve on previous methods such as the Goldberg cut-off, would give the analysis a stronger statistical foundation and make the novelty and robustness of the approach easier to appreciate.
Implementation: Add a few sentences that explain the advantages of using 95% PI, such as their ability to account for individual variability and provide a more accurate assessment of misreporting compared to fixed cut-off values. For example, "The 95% PI were chosen over traditional cut-off methods because they provide a statistically sound range that accounts for individual variability in TEE. Unlike fixed cut-offs, which can lead to misclassification, the 95% PI offer a more nuanced and accurate approach to identifying potential misreporting."
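To make the contrast with fixed cut-offs concrete, the sketch below fits a one-predictor least-squares line and derives a 95% predictive interval for a new individual, then classifies a reported intake against it. This is a simplified illustration under stated assumptions, not the study's equation: it uses body weight as the only predictor, synthetic data, hypothetical function names, and a normal quantile in place of the t quantile (reasonable only for large samples).

```python
import math
from statistics import NormalDist, mean


def fit_simple_ols(x, y):
    """Fit y = a + b*x by least squares and keep what the interval needs."""
    n = len(x)
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = y_bar - b * x_bar
    sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    residual_sd = math.sqrt(sse / (n - 2))
    return a, b, residual_sd, x_bar, sxx, n


def prediction_interval(model, x_new, level=0.95):
    """Return (lower, predicted, upper) for a single new observation."""
    a, b, residual_sd, x_bar, sxx, n = model
    # Assumption: normal quantile used as a large-sample stand-in for t.
    z = NormalDist().inv_cdf(0.5 + level / 2)
    y_hat = a + b * x_new
    # The interval widens as x_new moves away from the mean of the data.
    half_width = z * residual_sd * math.sqrt(1 + 1 / n + (x_new - x_bar) ** 2 / sxx)
    return y_hat - half_width, y_hat, y_hat + half_width


def classify_report(reported_ei, lower, upper):
    """Label a reported energy intake relative to the predictive limits."""
    if reported_ei < lower:
        return "under-reported"
    if reported_ei > upper:
        return "over-reported"
    return "plausible"


# Synthetic body weights (kg) and TEE values (MJ/day), for illustration only.
weights = [50.0, 60.0, 70.0, 80.0, 90.0, 100.0]
tee = [8.1, 9.0, 9.8, 11.1, 11.9, 13.0]

model = fit_simple_ols(weights, tee)
lower, predicted, upper = prediction_interval(model, 75.0)
print(lower, predicted, upper, classify_report(6.0, lower, upper))
```

Because the limits are computed per individual from the fitted model, they adapt to each person's characteristics, whereas a fixed cut-off applies the same threshold to everyone.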
Providing context on the implications of the findings for specific populations is a high-impact improvement. The Results section needs this context to communicate the significance of the findings beyond the overall levels of misreporting. A brief account of how the observed patterns of misreporting vary across populations, such as children, adults, and individuals with different BMIs, and of the consequences for nutritional research and public health in those groups, would highlight the practical relevance of the findings. It would also emphasize the need for tailored approaches to addressing misreporting and collecting accurate dietary data in different demographic groups.
Implementation: Include a few sentences or a paragraph that discusses the implications of the findings for specific populations, such as the higher prevalence of under-reporting among individuals with higher BMI and the potential impact on obesity research. For example, "The finding that under-reporting is more prevalent among individuals with higher BMI has important implications for studies investigating the relationship between diet and obesity. This highlights the need for targeted strategies to address misreporting in this population and improve the accuracy of dietary data in obesity research."
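One way to support such a discussion is to tabulate the share of under-reporters within each BMI band. The sketch below assumes each record already carries a reported energy intake and the lower predictive limit for that individual; the record layout, function names, and values are hypothetical, and the bands are the standard WHO adult BMI categories, so they would not apply to children.

```python
from collections import defaultdict


def bmi_band(bmi):
    """Map an adult BMI (kg/m^2) to the standard WHO category."""
    if bmi < 18.5:
        return "underweight"
    if bmi < 25:
        return "normal"
    if bmi < 30:
        return "overweight"
    return "obese"


def under_reporting_rate_by_band(records):
    """Return {band: fraction of records with intake below the lower limit}."""
    counts = defaultdict(lambda: [0, 0])  # band -> [under-reporters, total]
    for record in records:
        band = bmi_band(record["bmi"])
        counts[band][1] += 1
        if record["reported_ei"] < record["lower_limit"]:
            counts[band][0] += 1
    return {band: under / total for band, (under, total) in counts.items()}


# Synthetic records (energy in MJ/day), invented for illustration only.
records = [
    {"bmi": 22.0, "reported_ei": 9.5, "lower_limit": 8.0},
    {"bmi": 23.0, "reported_ei": 7.0, "lower_limit": 8.2},
    {"bmi": 27.0, "reported_ei": 8.0, "lower_limit": 8.5},
    {"bmi": 28.0, "reported_ei": 10.0, "lower_limit": 8.8},
    {"bmi": 33.0, "reported_ei": 7.5, "lower_limit": 9.5},
    {"bmi": 35.0, "reported_ei": 8.0, "lower_limit": 10.0},
]

print(under_reporting_rate_by_band(records))
```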
Table 1 | Significant terms in the general linear model analysis (10 decimal places) predicting TEE
Table 2 | Summary of observations inside and outside the tolerance limits in the NDNS and NHANES datasets
Fig. 1 | Misreporting in relation to age, BMI and sex. a, Comparison of the difference between predicted TEE and self-reported energy intake (EI) in the NDNS (n = 12,694) and NHANES (n = 5,873) datasets in relation to age for children (≤16 yr) and adults (>16 yr). b, Comparison of the difference between predicted TEE and self-reported energy intake in the same datasets in relation to BMI for children (≤16 yr) and adults (>16 yr). Negative values show observations lower than prediction and positive values show observations higher than prediction.
Table 3 | Relationships between the discrepancy of intake to expenditure and self-reported dietary macronutrient composition
Fig. 2 | Misreporting and macronutrient intake. a-c, The discrepancy between the predicted TEE and the reported energy intake in the NHANES and NDNS surveys plotted against the self-reported intakes of fat (a), protein (b) and carbohydrates (c) as a percentage of the total energy. For each macronutrient, the top two plots show data from the whole sample (full data) and the bottom two plots show the data from the sample screened to include only those individuals within the predictive interval of the equation (screened). Significant effects in the whole sample were severely attenuated in the screened sample (see Table 3 for regression details).