Predictive equation derived from 6,497 doubly labelled water measurements enables the detection of erroneous self-reported energy intake

Overall Summary

Study Background and Main Findings

This study developed a predictive equation for Total Energy Expenditure (TEE) using the International Atomic Energy Agency Doubly Labeled Water Database, encompassing 6,497 measures from individuals aged 4 to 96 years. The equation, derived via general linear regression, showed strong predictive power, with 94.6% of independent TEE measurements falling within the 95% predictive limits. Application to the NDNS and NHANES datasets revealed significant misreporting, with over 50% of dietary reports falling outside the predicted TEE limits. Notably, under-reporting was strongly correlated with higher BMI and reported protein intake, while being negatively correlated with reported fat intake.

Research Impact and Future Directions

The study provides a robust predictive equation for TEE and offers valuable insights into the pervasive issue of misreporting in dietary surveys. The strong correlation between under-reporting and factors like BMI and reported macronutrient intake highlights the limitations of relying solely on self-reported dietary data. Importantly, the study distinguishes between correlation and causation, acknowledging that the observed relationships do not prove a causal link between dietary composition and misreporting.

The practical utility of the predictive equation is significant, as it offers a more accurate tool for identifying potential misreporting in large-scale dietary surveys. This can help improve the accuracy of nutritional epidemiology research and inform the development of more effective public health interventions. The findings are placed within the context of existing literature, building upon previous work on misreporting and offering a novel approach based on a large and diverse dataset.

While the study provides valuable guidance for researchers and practitioners, it also acknowledges key uncertainties. The reliance on self-reported data, even with the use of a predictive equation, remains a limitation. The study also highlights the need for caution when interpreting self-reported dietary data, particularly in relation to macronutrient composition and its association with BMI. The authors suggest that future improvements may involve integrating objective measures of physical activity, such as accelerometry.

Critical unanswered questions remain, particularly regarding the generalizability of the findings to populations not represented in the study sample. Additionally, while the study identifies patterns of misreporting, the underlying reasons for these patterns are not fully explored. Further research is needed to investigate the psychological and social factors that contribute to misreporting. The methodological limitations, such as the exclusion of children under 4 and the handling of 'other' and mixed-race ethnicities, do not fundamentally affect the main conclusions but highlight areas for future research. Overall, the study makes a significant contribution to the field of nutritional epidemiology by providing a valuable tool for identifying misreporting and highlighting the need for more accurate methods of dietary assessment.

Critical Analysis and Recommendations

Comprehensive Data Utilization (written-content)
The study leverages a large and diverse dataset from the International Atomic Energy Agency Doubly Labeled Water Database, enhancing the robustness and generalizability of the predictive equation. This is crucial because it allows for a more accurate representation of the population and increases the applicability of the findings.
Section: Abstract
Detailed Analysis of Survey Data (written-content)
The application of the predictive equation to two large dietary surveys (NDNS and NHANES) provides a detailed and insightful analysis of misreporting, including stratification by age, sex, and BMI. This is important because it allows for a nuanced understanding of how misreporting varies across different demographic groups and highlights the need for targeted interventions.
Section: Results
Comprehensive Discussion of Findings (written-content)
The Discussion section provides a thorough and comprehensive discussion of the study's findings, effectively placing them within the context of existing literature and addressing potential implications. This is important because it helps to establish the study's contribution to the field and highlights the significance of the findings for future research and practice.
Section: Discussion
Provide Context on Misreporting Implications (written-content)
The Abstract lacks sufficient context on the implications of misreporting for research and public health. Adding this context would significantly improve the study's impact by emphasizing the practical value of the findings and the importance of accurate dietary data.
Section: Abstract
Expand on Machine Learning Model Comparisons (written-content)
The Results section lacks sufficient detail on the comparison between classical regression and machine learning models. Providing a more comprehensive comparison would improve the study's methodological rigor by ensuring transparency in the model selection process and providing a more complete picture of the analytical approach.
Section: Results
Expand on the Implications of Findings for Dietary Guidelines (written-content)
The Discussion section lacks a thorough exploration of the implications of the findings for dietary guidelines. Adding this discussion would significantly improve the study's impact by emphasizing the need for evidence-based recommendations that account for the limitations of self-reported data.
Section: Discussion
Lack of Legend in Figure 1 (graphical-figure)
Figure 1 lacks a legend to differentiate between datasets and age groups. Adding a legend would significantly improve the clarity and interpretability of the graphs, making it easier for readers to understand the presented data.
Section: Results
Lack of Trend Line Description in Figure 2 (graphical-figure)
Figure 2's caption does not mention that the red lines are trend lines, nor does it specify the method used to generate them. Including this information would improve the clarity and scientific validity of the figure, as the type of trend line used can impact the visual interpretation of the data.
Section: Results
Clarity of Axes Labels in Figure 3 (graphical-figure)
While the axes labels in Figure 3 are generally clear, the x-axis label could be improved by specifying that it represents the 'Percentage of total energy from macronutrient'. This would enhance consistency with previous tables and improve the overall clarity of the figure.
Section: Discussion

Section Analysis

Non-Text Elements

Table 1 | Significant terms in the general linear model analysis (10 decimal...
Full Caption

Table 1 | Significant terms in the general linear model analysis (10 decimal places) predicting TEE

First Reference in Text
The derived significant predictors and their regression coefficients are reported in Table 1.
Description
  • Purpose of the table: This table shows the results of a statistical analysis called 'general linear model analysis'. You can think of this like drawing a line through a cloud of points on a graph to see if there's a relationship between different things. Here, they're looking at what factors might be related to something called 'Total Energy Expenditure' (TEE), which is the amount of energy (like calories) a person uses in a day.
  • Content of the table: The table lists different factors that the analysis found to be important in predicting TEE. For each factor, it gives a 'coefficient,' which is like the slope of the line in our graph analogy – it tells us how much TEE is expected to change if that factor changes. It also shows the 'standard error' of the coefficient, which is a measure of how precise that estimate is. Additionally, it includes a 'T-value,' which helps determine if the factor is statistically significant, and a 'P-value,' which is the probability of seeing the observed relationship (or a stronger one) if there was actually no real relationship between the factor and TEE. A low P-value (typically less than 0.05) suggests that the relationship is statistically significant, meaning it's unlikely to have occurred by chance.
  • Specific factors listed: The factors listed in the table are things like body weight, height, age, sex, elevation, and ethnicity. 'ln[BW (kg)]' refers to the natural logarithm of body weight in kilograms. The natural logarithm is a mathematical function that helps to transform the data in a way that makes it more suitable for this type of analysis. Similarly, 'ln[Elevation (m)]' is the natural logarithm of the elevation where the measurement was taken, in meters. 'Sex' is a categorical variable, likely coded as male or female. 'Ethnicity' is also a categorical variable, with categories like 'A' for African, 'AA' for African living outside Africa, 'AS' for Asian, 'W' for White, 'H' for Hispanic, and 'NA' for not available.
  • Precision of the coefficients: The coefficients in the table are reported to 10 decimal places. This level of precision is unusual in many scientific fields, but the authors state later in the paper that they did this for reproducibility, so that others can get the exact same results if they use their model.
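Table 1 describes several kinds of terms: log-transformed predictors, an interaction, and dummy-coded categories. These are combined into a single linear prediction. The minimal Python sketch below shows that structure under stated assumptions. The term list, the units, the reference categories (female, 'W') and the requirement that elevation be positive are all illustrative guesses. The function names are hypothetical. The coefficients in the usage line are placeholders, not the published values. Whether the linear predictor is TEE itself or a transformed TEE depends on how the paper specified the model.

```python
import math

# Ethnicity codes as listed in the Table 1 footnote.
ETHNICITY_LEVELS = ("A", "AA", "AS", "W", "H", "NA")
# ASSUMPTION: reference (omitted) level chosen only for illustration.
REFERENCE_ETHNICITY = "W"


def design_row(bw_kg, height_cm, age_yr, sex, elevation_m, ethnicity):
    """Build one row of model terms of the kinds Table 1 describes.

    Hypothetical term set and units; the paper's exact terms may differ.
    """
    if bw_kg <= 0 or elevation_m <= 0:
        raise ValueError("log terms need positive body weight and elevation")
    ln_elev = math.log(elevation_m)
    row = {
        "intercept": 1.0,
        "ln_bw": math.log(bw_kg),                      # ln[BW (kg)]
        "height": float(height_cm),
        "age": float(age_yr),
        "ln_elevation": ln_elev,                       # ln[Elevation (m)]
        "height_x_ln_elevation": height_cm * ln_elev,  # interaction term
        # ASSUMPTION: female is the reference level for sex.
        "sex_male": 1.0 if sex == "male" else 0.0,
    }
    # Dummy (0/1) coding: one indicator per non-reference ethnicity level.
    for level in ETHNICITY_LEVELS:
        if level != REFERENCE_ETHNICITY:
            row["eth_" + level] = 1.0 if ethnicity == level else 0.0
    return row


def predict_linear(row, coefs):
    """Sum of coefficient x term value; terms absent from coefs contribute 0."""
    return sum(coefs.get(name, 0.0) * value for name, value in row.items())


# Usage with PLACEHOLDER coefficients (not the values in Table 1).
row = design_row(bw_kg=70, height_cm=175, age_yr=40, sex="male",
                 elevation_m=100, ethnicity="AS")
print(round(predict_linear(row, {"intercept": 1.0, "ln_bw": 2.0}), 4))  # prints 9.497
```

Dummy coding drops one level per categorical variable. Each remaining coefficient is therefore read as a shift relative to that reference group.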
Scientific Validity
  • Model Specification: The authors have included a comprehensive set of predictors in their model, including demographic, anthropometric, and environmental variables. The inclusion of interaction terms (e.g., Height × ln[Elevation (m)]) is appropriate and suggests a thorough exploration of potential relationships. However, the rationale for including specific interaction terms could be more explicitly stated.
  • Statistical Significance: The reported T-values and P-values allow for a clear assessment of the statistical significance of each predictor. The use of a general linear model is appropriate given the nature of the dependent variable (TEE) and the mix of continuous and categorical predictors.
  • Precision of Coefficients: While reporting coefficients to 10 decimal places is unconventional, the authors justify this decision based on the need for precise replication of their predictive model. This level of precision, although unusual, does not inherently invalidate the scientific rigor of the analysis. It is crucial, however, that the authors provide a sensitivity analysis demonstrating the impact of this precision on the model's predictions.
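A sensitivity analysis of the kind suggested above could be as simple as this: round the coefficients, then measure how far the predictions move. The sketch below assumes a linear predictor built from term values, as in Table 1. The function name, coefficients and row are hypothetical placeholders. It illustrates one arithmetic point. The rounding error in each coefficient is multiplied by that term's value, so small coefficients on large-valued terms, such as an interaction with ln[Elevation (m)], are the most sensitive.

```python
def max_rounding_shift(coefs, rows, decimals):
    """Largest absolute change in the linear predictor across `rows` when
    every coefficient is rounded to `decimals` decimal places."""
    rounded = {name: round(value, decimals) for name, value in coefs.items()}
    worst = 0.0
    for row in rows:
        exact = sum(coefs.get(name, 0.0) * x for name, x in row.items())
        short = sum(rounded.get(name, 0.0) * x for name, x in row.items())
        worst = max(worst, abs(exact - short))
    return worst


# PLACEHOLDER coefficients and one hypothetical row of term values.
coefs = {"intercept": 0.123456789, "height_x_ln_elevation": 0.000123456789}
rows = [{"intercept": 1.0, "height_x_ln_elevation": 805.9}]
for d in (3, 6, 10):
    print(d, max_rounding_shift(coefs, rows, d))
```

In this toy case, 3 decimals zero out the interaction coefficient entirely and shift the prediction by about 0.1. At 10 decimals the shift is negligible.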
Communication
  • Clarity of Column Headers: The column headers are generally clear and provide sufficient information to understand the table's contents. However, 'SE coefficient' could be more explicitly labeled as 'Standard Error of Coefficient' for better clarity.
  • Footnotes: The footnote defining the ethnicity abbreviations is helpful. However, providing a brief explanation of the coding for 'Sex' within the table or in a footnote would further enhance clarity.
  • Caption Clarity: The caption is concise but could be slightly more descriptive. For example, it could be revised to: 'Table 1 | Significant terms and their coefficients from the general linear model analysis predicting Total Energy Expenditure (TEE), presented with 10 decimal places for reproducibility.'
Table 2 | Summary of observations inside and outside the tolerance limits in...
Full Caption

Table 2 | Summary of observations inside and outside the tolerance limits in the NDNS and NHANES datasets

First Reference in Text
Using the predictive equations developed above, the number and percentage of individuals that fell outside the predicted limits (both over and under) and within the predicted limits are shown in Table 2, stratified by data source, age (adults versus children) and sex.
Description
  • Purpose of the table: This table summarizes how well the predictions from their equation match up with real-world data from two large dietary surveys, the NDNS (National Diet and Nutrition Survey) and NHANES (National Health and Nutrition Examination Survey). It's like checking if a weatherman's forecast (the equation's prediction) accurately reflects the actual weather (the survey data).
  • Tolerance Limits: The 'tolerance limits' refer to the range of values within which the researchers expect the real-world data to fall, based on their equation. It's like setting a margin of error around the prediction. If the weatherman predicts a temperature of 20 degrees Celsius, he might say the actual temperature will likely be between 18 and 22 degrees. That range is the tolerance limit.
  • Organization of the table: The table is organized by dividing the survey participants into groups based on which survey they were part of (NDNS or NHANES), whether they were adults or children, and their sex (male or female). For each group, the table shows how many people had reported dietary intakes that fell within the predicted range (inside the tolerance limits), below the predicted range (underestimated), and above the predicted range (overestimated). These counts are also shown as percentages of the total number of people in each group.
  • Interpretation of the data: If the equation predicts energy needs well and people report accurately, most reported intakes should fall within the tolerance limits. A large percentage outside the limits therefore means either that the equation predicts energy needs poorly for these populations, or that there is substantial misreporting in the surveys. Because 94.6% of independent doubly labeled water measurements fell within the equation's 95% predictive limits, the second explanation is the more plausible one. For example, a high percentage 'underestimated' would suggest that many people report eating less than the equation predicts they need, based on factors like their weight, height, and age.
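The counting behind Table 2 can be sketched as below. The sketch assumes that each individual's lower and upper limits (the 95% predictive limits around their predicted TEE) have already been computed. It treats a report lying exactly on a limit as within range, which is also an assumption. The group labels and numbers are made up, and the function names are hypothetical.

```python
from collections import Counter

LABELS = ("under", "within", "over")


def classify(reported_ei, lower, upper):
    """Place one report relative to its tolerance limits (limits count as within)."""
    if reported_ei < lower:
        return "under"
    if reported_ei > upper:
        return "over"
    return "within"


def summarize(records):
    """records: iterable of (group, reported_ei, lower, upper), all in MJ/day.
    Returns {group: {label: (count, percent_of_group)}}."""
    tallies = {}
    for group, ei, lower, upper in records:
        tallies.setdefault(group, Counter())[classify(ei, lower, upper)] += 1
    summary = {}
    for group, tally in tallies.items():
        total = sum(tally.values())
        summary[group] = {lab: (tally[lab], 100.0 * tally[lab] / total) for lab in LABELS}
    return summary


# Made-up records for illustration only (not survey data).
records = [
    ("adult female", 6.0, 7.5, 11.0),
    ("adult female", 9.0, 7.5, 11.0),
    ("adult female", 12.0, 7.5, 11.0),
    ("adult female", 8.0, 7.5, 11.0),
    ("child male", 5.0, 6.0, 9.0),
]
print(summarize(records))
```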
Scientific Validity
  • Appropriateness of Tolerance Limits: The scientific validity of this table hinges on the appropriateness of the tolerance limits used. The authors have previously described how these limits were derived (95% prediction intervals), which is a statistically sound approach. However, the validity of applying these limits to assess misreporting relies on the assumption that deviations from the predicted TEE primarily reflect misreporting rather than individual variability or other factors not captured by the model.
  • Stratification by Data Source, Age, and Sex: Stratifying the results by data source, age, and sex is crucial for identifying potential biases or differences in the performance of the predictive equation across different populations. This allows for a more nuanced interpretation of the results and helps to pinpoint specific groups where misreporting may be more prevalent. The choice of these stratification variables is justified given their known associations with dietary intake and reporting behaviors.
  • Use of Number and Percentage: Presenting both the number and percentage of individuals within each category is helpful for interpretation. Percentages provide a standardized way to compare across groups of different sizes, while raw numbers give a sense of the actual sample sizes involved.
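For reference, suppose the limits are the standard ordinary-least-squares 95% prediction interval. This is an assumption: the paper may, for example, have built them on a transformed scale and back-transformed. Under that assumption, the interval for a new individual with predictor vector x₀ is:

```latex
\hat{y}_0 \pm t_{0.975,\,n-p}\; s\,\sqrt{1 + \mathbf{x}_0^{\top}\left(\mathbf{X}^{\top}\mathbf{X}\right)^{-1}\mathbf{x}_0},
\qquad s^2 = \frac{1}{n-p}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2
```

Here X is the n × p design matrix used to fit the model. The leading 1 under the square root carries the residual, between-individual variance. It is what makes this a prediction interval rather than a confidence interval for the mean. It also implies that, if the model is correct, about 5% of accurate reporters would still fall outside the limits by chance.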
Communication
  • Clarity of Column Headers: The column headers are relatively clear but could be improved. 'Number underestimated' and 'Number overestimated' could be more explicitly defined as 'Number below the lower tolerance limit' and 'Number above the upper tolerance limit,' respectively. Similarly, 'Number within range' could be clarified as 'Number within tolerance limits.'
  • Caption Clarity: The caption is generally clear and informative. It could be slightly improved by explicitly stating that the tolerance limits are based on the 95% prediction intervals of the predictive equation.
  • Footnote: The footnote is helpful in explaining what the table shows. However, it could be made more informative by briefly mentioning the years the datasets cover, which is relevant context for interpreting the results.
  • Overall Readability: The table is well-organized and relatively easy to read. The use of bold font for the main categories (e.g., Male children, Female children) enhances readability.
Fig. 1 | Misreporting in relation to age, BMI and sex. a, Comparison of the...
Full Caption

Fig. 1 | Misreporting in relation to age, BMI and sex. a, Comparison of the difference between predicted TEE and self-reported energy intake (EI) in the NDNS (n = 12,694) and NHANES (n = 5,873) datasets in relation to age for children (≤16 yr) and adults (>16 yr). b, Comparison of the difference between predicted TEE and self-reported energy intake in the same datasets in relation to BMI for children (≤16 yr) and adults (>16 yr). Negative values show observations lower than prediction and positive values show prediction higher than observation.

First Reference in Text
We plotted the difference between the survey estimate of daily energy intake and the predicted TEE as a function of age and body mass index (BMI) for both the NDNS and NHANES datasets (Fig. 1).
Description
  • Overall Purpose: This figure shows how well people's self-reported food intake matches what a scientific equation predicts they should be eating. The equation predicts Total Energy Expenditure (TEE), which is the number of calories a person burns in a day. The researchers compare this prediction with self-reported energy intake (EI), which is what people say they eat in dietary surveys. The difference between the two (self-reported EI minus predicted TEE) indicates potential 'misreporting'. This is either under-reporting (reporting less food than they actually eat) or over-reporting (reporting more than they actually eat).
  • Structure of the Figure: The figure is divided into two parts, labeled 'a' and 'b'. Each part contains four graphs. Part 'a' looks at the relationship between misreporting and age. Part 'b' looks at the relationship between misreporting and Body Mass Index (BMI), which is weight in kilograms divided by the square of height in meters and is commonly used as a proxy for body fat. Each graph is a scatter plot, with each dot representing a person in the study. The graphs are further split by the dataset used (NDNS or NHANES) and whether the participants were children or adults.
  • X and Y Axes: In part 'a', the x-axis (horizontal) represents age in years, while in part 'b', it represents BMI. In both parts, the y-axis (vertical) represents the difference between self-reported EI and predicted TEE (EI minus TEE), as in the sentence that first cites the figure. A value of 0 on the y-axis means that the predicted TEE and self-reported EI are the same. Negative values mean that people are reporting eating less than the equation predicts (under-reporting), while positive values mean they are reporting eating more (over-reporting).
  • Interpretation of the Data Points: Each dot on the scatter plots shows an individual's data. For example, in part 'a', a dot on the NDNS-Adults graph with an x-axis value of 40 and a y-axis value of -5 would represent a 40-year-old adult in the NDNS study who reported eating 5 megajoules (a unit of energy) less per day than the equation predicted. The red line on each graph is a trend line, which is like drawing a line through the middle of the dots to see the general pattern. If the trend line slopes downwards, it means that as age or BMI increases, the difference between predicted TEE and self-reported EI tends to become more negative (more under-reporting).
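The quantities plotted in Fig. 1 can be sketched as follows. The sketch assumes the sign convention EI minus predicted TEE. It also assumes a straight-line least-squares trend, because the paper does not state how the red trend lines were fitted. The people listed are made up, and the function names are hypothetical.

```python
def bmi(weight_kg, height_m):
    """Body mass index: weight (kg) divided by height (m) squared."""
    return weight_kg / height_m ** 2


def discrepancy(reported_ei, predicted_tee):
    """EI minus predicted TEE (MJ/day): negative values suggest under-reporting."""
    return reported_ei - predicted_tee


def least_squares_line(xs, ys):
    """Intercept and slope of the ordinary least-squares straight line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Made-up adults: (age in years, reported EI, predicted TEE), energies in MJ/day.
people = [(20, 9.0, 10.0), (30, 8.5, 10.0), (40, 8.0, 10.0), (50, 7.5, 10.0)]
ages = [age for age, _, _ in people]
diffs = [discrepancy(ei, tee) for _, ei, tee in people]
print(least_squares_line(ages, diffs))
```

Here the fitted slope is -0.05 MJ/day per year of age. A downward-sloping trend line of this kind is what the text reads as more under-reporting with age.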
Scientific Validity
  • Appropriateness of Visualization: Using scatter plots with trend lines is an appropriate way to visualize the relationship between continuous variables like age, BMI, and the difference between predicted and reported energy intake. This allows for a visual assessment of the magnitude and direction of misreporting across different age and BMI groups.
  • Statistical Analysis: The reference text indicates that the authors plotted the difference between predicted and reported energy intake, but it doesn't specify the method used to generate the trend lines. It's crucial to know whether these are simple linear regressions or if a more sophisticated smoothing technique was employed. The choice of method can influence the interpretation of the trends.
  • Sample Size: The large sample sizes from the NDNS and NHANES datasets provide robust data for this analysis. However, it's important to consider potential biases or limitations inherent in these datasets, such as the reliance on self-reported dietary intake.
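To see why the fitting method matters, compare a straight line with even a very simple smoother. The sketch below uses binned means, chosen here for brevity; LOESS or splines are common alternatives. The data are made up, the function name is hypothetical, and nothing in it reflects the paper's actual method.

```python
def binned_means(xs, ys, edges):
    """Mean of ys for xs in each half-open bin [edges[i], edges[i+1]).

    Returns None for an empty bin. A crude stand-in for a smoother.
    """
    sums = [0.0] * (len(edges) - 1)
    counts = [0] * (len(edges) - 1)
    for x, y in zip(xs, ys):
        for i in range(len(edges) - 1):
            if edges[i] <= x < edges[i + 1]:
                sums[i] += y
                counts[i] += 1
                break
    return [s / c if c else None for s, c in zip(sums, counts)]


# Made-up U-shaped pattern.
xs = [1, 2, 3, 4, 5]
ys = [4, 1, 0, 1, 4]
print(binned_means(xs, ys, [0.5, 2.5, 3.5, 5.5]))
```

A least-squares straight line through these five points has slope 0 and would suggest no relationship. The binned means (2.5, 0.0, 2.5) reveal clear curvature.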
Communication
  • Clarity of Axes Labels: The axes labels are generally clear and informative. The y-axis label could be slightly improved by specifying the units (MJ d⁻¹) for the difference between predicted TEE and self-reported EI.
  • Legend: The figure lacks a legend to differentiate between the NDNS and NHANES datasets and between children and adults. Adding a legend would significantly improve the clarity and interpretability of the graphs.
  • Caption: The caption is relatively clear but could be more concise.
  • Visual Clutter: The use of different colors and symbols for each dataset and age group, combined with the trend lines, creates some visual clutter. Using a more minimalist color scheme or separating the graphs for children and adults could improve readability.
  • Trend Line Description: The caption should mention that the red lines are trend lines, and the method used to generate them should be specified in the methods section.
Table 3 | Relationships between the discrepancy of intake to expenditure and...
Full Caption

Table 3 | Relationships between the discrepancy of intake to expenditure and self-reported dietary macronutrient composition

First Reference in Text
Next, we explored the relationship between the discrepancy in energy intake and the proportional macronutrient composition (percentage energy) of the reported diet (Table 3).
Description
  • Purpose of the Table: This table explores whether the tendency for people to over- or under-report their food intake is related to the types of food they eat. Specifically, it looks at whether the difference between what people report eating and what an equation predicts they need (the 'discrepancy') is linked to the proportion of their diet that comes from carbohydrates, protein, and fat. These three are called 'macronutrients' and are the main components of food that provide energy.
  • Structure of the Table: The table is divided into four sections, each representing a different set of data: the full NDNS dataset, the screened NDNS dataset, the full NHANES dataset, and the screened NHANES dataset. 'Screened' refers to the subset of individuals whose reported energy intake fell within the predictive interval of the equation, which is how the Fig. 2 caption defines it. In other words, screening removes reports that fall outside the tolerance limits used in Table 2. Each section shows the results of a statistical analysis called 'multiple regression analysis'. This is a method used to examine the relationship between a dependent variable (in this case, the discrepancy between reported and predicted energy intake) and several independent variables (the percentage of energy from carbohydrates, protein, and fat).
  • Key Terms Explained: 'Coefficient' here is the estimated change in the discrepancy (in kilojoules per day) for a one-percentage-point change in the share of energy from a given macronutrient. For example, a coefficient of -207.3 for 'Percentage protein' in the full NDNS dataset means that for every 1-percentage-point increase in the share of energy from protein, the discrepancy between reported intake and predicted expenditure is estimated to decrease by 207.3 kJ/day (meaning more under-reporting). 'SE coefficient' stands for the standard error of the coefficient, which is a measure of the precision of the estimate. 'P-value' is the probability of observing the relationship (or a stronger one) if there was actually no real relationship between the macronutrient and the discrepancy. A low P-value (typically less than 0.05) suggests that the relationship is statistically significant. 'R²' is a measure of how well the model fits the data, with higher values indicating a better fit.
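The quantities in Table 3 (coefficient, standard error, P-value, R²) can be illustrated with a one-predictor version of the analysis. This is a simplification. The paper fits a multiple regression, whereas the sketch below regresses the discrepancy on percentage energy from protein alone, using made-up values and a hypothetical function name. The P-value uses a normal approximation to the t distribution. That is reasonable for samples of thousands, like NDNS and NHANES, but is only illustrative for the tiny example here.

```python
import math
from statistics import NormalDist


def simple_regression(xs, ys):
    """OLS with one predictor: slope, intercept, SE(slope), t, two-sided p, R^2.

    p uses a normal approximation to the t distribution (large-sample only).
    """
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    syy = sum((y - mean_y) ** 2 for y in ys)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    rss = sum((y - intercept - slope * x) ** 2 for x, y in zip(xs, ys))
    se_slope = math.sqrt(rss / (n - 2) / sxx)
    t = slope / se_slope
    p = 2 * (1 - NormalDist().cdf(abs(t)))
    return {"slope": slope, "intercept": intercept, "se": se_slope,
            "t": t, "p": p, "r2": 1 - rss / syy}


# Made-up data: % energy from protein vs. discrepancy (EI minus TEE, kJ/day).
pct_protein = [10, 12, 14, 16, 18]
discrepancy_kj = [-1950, -2450, -2800, -3150, -3650]
print(simple_regression(pct_protein, discrepancy_kj))
```

The fitted slope here is -205 kJ/day per percentage point, and it is read the same way as the -207.3 coefficient described above. Table 3's multiple regression additionally adjusts each macronutrient for the others, which this sketch does not.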
Scientific Validity
  • Appropriateness of Statistical Method: Multiple regression analysis is an appropriate method for examining the relationship between the discrepancy in energy intake and the proportional macronutrient composition of the diet. The use of both full and screened datasets allows for an assessment of the robustness of the findings.
  • Interpretation of Coefficients: The interpretation of the coefficients is crucial. The authors should provide a more detailed discussion of the implications of the positive and negative coefficients for different macronutrients. For instance, they should discuss potential reasons why a higher reported protein intake is associated with a greater discrepancy (more under-reporting).
  • Consideration of Confounding Factors: While the table presents the results of multiple regression, which adjusts for the other included macronutrients, there may be other confounding factors that influence both macronutrient composition and the discrepancy in energy intake. These potential confounders should be acknowledged and discussed.
  • Comparison of Full and Screened Datasets: Comparing the results from the full and screened datasets is valuable for assessing the impact of removing potentially unreliable data points. The authors should provide a more detailed comparison of these results and discuss any notable differences.
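The full-versus-screened comparison can be sketched as two fits of the same model. One uses every record; the other uses only the subset whose reported intake lies within its predictive limits, the screening definition given in the Fig. 2 caption. The records and limits below are made up, the function names are hypothetical, and a single predictor stands in for the paper's multiple regression.

```python
def ols_slope(pairs):
    """Least-squares slope for a list of (x, y) pairs."""
    n = len(pairs)
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    return sxy / sxx


def full_and_screened_slopes(records):
    """records: (pct_protein, reported_ei, predicted_tee, lower, upper), MJ/day.

    Fits the discrepancy (EI minus TEE) on % protein on all records, then on
    the screened subset whose EI lies within its predictive limits.
    """
    full = [(pct, ei - tee) for pct, ei, tee, lo, hi in records]
    screened = [(pct, ei - tee) for pct, ei, tee, lo, hi in records if lo <= ei <= hi]
    return ols_slope(full), ols_slope(screened)


# Made-up records; the last two fall below the lower limit.
records = [
    (12, 11.0, 10.0, 8.0, 12.0),
    (14, 10.0, 10.0, 8.0, 12.0),
    (16, 9.5, 10.0, 8.0, 12.0),
    (18, 6.0, 10.0, 8.0, 12.0),
    (20, 5.0, 10.0, 8.0, 12.0),
]
print(full_and_screened_slopes(records))
```

In this toy example, the slope falls in magnitude from -0.8 to -0.375 MJ/day per percentage point after screening. This mirrors the kind of attenuation the Fig. 2 caption describes.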
Communication
  • Clarity of Column Headers: The column headers are generally clear and informative. However, 'Term' could be more explicitly labeled as 'Predictor Variable' or 'Macronutrient.'
  • Caption Clarity: The caption is concise but could be more descriptive. It could be revised to: 'Table 3 | Results of multiple regression analyses examining the relationships between the discrepancy of reported energy intake to predicted expenditure and the self-reported dietary macronutrient composition (percentage of total energy) in the NDNS and NHANES datasets, using both full and screened data.'
  • Table Organization: The organization of the table into four sections is logical and facilitates comparison across datasets and data treatments (full vs. screened). However, the table is quite dense, and the use of bold font or shading could help to visually separate the different sections.
  • Explanation of Screening: The table would benefit from a brief explanation of the screening criteria used to define the 'screened' datasets. This information could be included in a footnote or in the methods section.
  • Units: It would be helpful to include the units for the coefficients (kJ/day per percentage-point change in energy from the macronutrient) in the column header or a footnote.
Fig. 2 | Misreporting and macronutrient intake. a-c, The discrepancy between...
Full Caption

Fig. 2 | Misreporting and macronutrient intake. a-c, The discrepancy between the predicted TEE and the reported energy intake in the NHANES and NDNS surveys plotted against the self-reported intakes of fat (a), protein (b) and carbohydrates (c) as a percentage of the total energy. For each macronutrient, the top two plots show data from the whole sample (full data) and the bottom two plots show the data from the sample screened to include only those individuals within the predictive interval of the equation (screened). Significant effects in the whole sample were severely attenuated in the screened sample (see Table 3 for regression details).

First Reference in Text
As the level of protein in the diet increased, the discrepancy became more negative. For each 1.0% increase in reported protein energy, the difference between reported energy intake and actual intake decreased by around 200 kJ d⁻¹ in both NDNS and NHANES (Table 3). Note that as most data fall below the line of equality, this negative relationship means that as the self-reported percentage of protein in the diet increased, the discrepancy between the self-reported total energy intake and the predicted total energy expenditure got larger (Fig. 2).
Description
  • Purpose of the Figure: This figure investigates the relationship between how much people misreport their food intake and the proportion of fat, protein, and carbohydrates in their diet. It's like asking: are people who say they eat a high-protein diet more or less likely to underreport their total calorie intake compared to people who say they eat a high-fat or high-carbohydrate diet?
  • Structure of the Figure: The figure is divided into three sections (a, b, and c), each representing a different macronutrient: fat, protein, and carbohydrates, respectively. Each section contains four graphs. The top two graphs in each section show data from the entire sample of two studies (NHANES and NDNS), while the bottom two graphs show data only from a 'screened' sample. This 'screened' sample includes only individuals whose reported energy intake fell within the predictive interval of the researchers' equation. This is like removing the data from people whose reports are implausible, to see whether the patterns hold up in more reliable data.
  • X and Y Axes: On the x-axis (horizontal) of each graph is the percentage of total energy that comes from the specific macronutrient (fat, protein, or carbohydrates), according to what people reported eating. The y-axis (vertical) shows the discrepancy between reported energy intake and predicted Total Energy Expenditure (TEE, the number of calories a person burns in a day), expressed as reported intake minus predicted TEE. A negative value on the y-axis means people are reporting eating less than the equation predicts (under-reporting), while a positive value means they are reporting eating more (over-reporting).
  • Interpretation of the Graphs: Each dot on the graphs represents a person in the study. The red line is a trend line, which is like drawing a line through the middle of the dots to see the general pattern. If the trend line slopes downwards, it suggests that as the percentage of a particular macronutrient in the diet increases, people tend to under-report their intake more. The caption mentions that the effects seen in the whole sample were 'severely attenuated' in the screened sample. This means that the relationship between macronutrient intake and misreporting became weaker or disappeared when looking only at the more reliable data. This suggests that the relationship observed in the whole sample might be driven by inaccurate reporting.
Scientific Validity
  • Rationale for Screening: The rationale for screening the data to include only individuals within the predictive interval of the equation is sound. This helps to reduce the influence of extreme misreporting and provides a more accurate picture of the relationship between macronutrient intake and the discrepancy between predicted and reported energy intake in individuals with more plausible reports.
  • Comparison of Full and Screened Samples: Comparing the results from the full and screened samples is crucial for assessing the robustness of the findings. The observation that significant effects in the whole sample are attenuated in the screened sample highlights the importance of addressing misreporting in dietary studies.
  • Statistical Analysis: The caption refers to Table 3 for regression details, indicating that the relationships depicted in the figure are based on statistical modeling. The validity of the conclusions drawn from the figure depends on the appropriateness and rigor of the statistical analysis performed in Table 3.
  • Causality: It is important to note that the figure and the associated analyses demonstrate correlations but do not prove causation. While the results suggest that macronutrient composition may be related to misreporting, these data cannot determine whether dietary composition influences misreporting or vice versa. Other factors may contribute to both.
Communication
  • Clarity of Axes Labels: The axes labels are generally clear and informative. The y-axis label could be slightly improved by specifying the units (MJ d⁻¹) for the discrepancy between predicted TEE and reported energy intake.
  • Legend: The figure lacks a legend to differentiate between the NHANES and NDNS datasets. Adding a legend would improve the clarity of the graphs.
  • Caption Clarity: The caption is relatively clear and provides a good overview of the figure's content. It could be improved by briefly explaining the rationale for screening the data.
  • Visual Clutter: The use of multiple graphs with different datasets and data treatments (full vs. screened) creates some visual clutter. Using a more minimalist color scheme or further separating the graphs could improve readability.
  • Trend Line Description: The caption should state that the red lines are trend lines, and the method used to generate them should be specified in the methods section. The type of trend line used (e.g., linear or LOESS) could affect the visual interpretation of the data.
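Because the y-axis is in MJ d⁻¹ while the plain-language description of this figure speaks of calories, a small conversion helper may make the discrepancies easier to read. This sketch assumes the thermochemical kilocalorie (1 kcal = 4.184 kJ); the conversion factor used in the paper is not stated in this section, and the function names are hypothetical.

```python
KJ_PER_KCAL = 4.184  # thermochemical kilocalorie (assumed; the paper's factor is not stated here)


def mj_to_kcal(mj_per_day: float) -> float:
    """Convert an energy rate from MJ/day to kcal/day."""
    return mj_per_day * 1000.0 / KJ_PER_KCAL


def kcal_to_mj(kcal_per_day: float) -> float:
    """Convert an energy rate from kcal/day to MJ/day."""
    return kcal_per_day * KJ_PER_KCAL / 1000.0


# A hypothetical discrepancy of -2 MJ/day expressed in kcal/day
print(round(mj_to_kcal(-2.0)))  # -478
```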

Discussion

Key Aspects

Strengths

Suggestions for Improvement

Non-Text Elements

Table 4 | Relationships between macronutrient intake and BMI in both datasets
First Reference in Text
The gradient and R² values of the relationship between BMI and protein were both strongly reduced (Fig. 3 and Table 4), while the negative gradient for the relationship between BMI and carbohydrates became more negative and the R² value approximately doubled.
Description
  • Purpose of the Table: This table examines how the amount of fat, protein, and carbohydrates people eat relates to their Body Mass Index (BMI), which is a measure of body weight relative to height. It's like trying to see if there's a connection between diet composition and body weight.
  • Structure of the Table: The table is divided into two main sections: 'Whole data' and 'Within 95% PI'. 'Whole data' refers to the analysis using all the data collected in the NHANES and NDNS surveys. 'Within 95% PI' refers to the analysis using only the data from individuals whose reported energy intake fell within the 95% predictive interval of the researchers' equation, meaning their reported intake was close to what the equation predicted they should be eating. This is a way of focusing on more reliable data. Each section shows the results of a statistical analysis, likely a regression analysis, that examines the relationship between BMI and the percentage of each macronutrient in the diet.
  • Key Terms Explained: 'Gradient' here likely refers to the slope of the regression line, which shows how much BMI is expected to change for a one-percentage-point increase in the share of energy from the macronutrient. A positive gradient means that BMI tends to rise as the percentage of the macronutrient increases; a negative gradient means the opposite. 'R²' (R-squared) is the proportion of the variation in BMI that is explained by the macronutrient intake. It measures how well the model fits the data, with values closer to 1 indicating a better fit. 'P' is the p-value: the probability of observing a relationship at least as strong as the one found if there were in fact no relationship between BMI and macronutrient intake. A low p-value (typically less than 0.05) suggests that the relationship is statistically significant.
  • Macronutrient Focus: The table focuses on three macronutrients: fat, carbohydrates, and protein. These are the main components of food that provide energy. The table shows how the relationship between each of these macronutrients and BMI changes when looking at all the data versus only the more reliable data (within 95% PI).
Scientific Validity
  • Appropriateness of Statistical Analysis: The use of regression analysis (presumably linear regression, although this is not explicitly stated in the table) is appropriate for examining the relationship between macronutrient intake and BMI. The inclusion of both the whole dataset and the data within the 95% PI allows for a comparison of the results with and without potentially unreliable data points.
  • Interpretation of Gradient and R²: The reference text highlights the changes in gradient and R² values for protein and carbohydrates between the whole and screened datasets. This comparison is crucial for understanding the impact of misreporting on the observed relationships. However, the authors should provide a more comprehensive discussion of the results for all three macronutrients in both datasets, including the direction and magnitude of the gradients and the goodness-of-fit (R²) values.
  • Consideration of Confounding Factors: It is not clear from the table whether the regressions adjust for other variables; either way, there may be confounding factors that influence both macronutrient intake and BMI. These potential confounders should be acknowledged and discussed in the main text. For example, socioeconomic status, physical activity levels, and genetic factors could all play a role.
Communication
  • Clarity of Column Headers: The column headers are generally clear. However, 'Gradient' could be more explicitly labeled as 'Regression Coefficient' or 'Slope' for better clarity. Also, adding a column for the standard error of the gradient would be informative.
  • Caption Clarity: The caption is concise but could be more informative. It could be revised to: 'Table 4 | Results of regression analyses examining the relationships between macronutrient intake (percentage of total energy) and BMI in the NHANES and NDNS datasets, using both the whole data and data within the 95% predictive interval (95% PI).'
  • Table Organization: The organization of the table into two sections (whole data and within 95% PI) is logical and facilitates comparison. However, the table could be made more visually appealing and easier to read by using bold font or shading to separate the different sections and macronutrients.
  • Units: The table would benefit from including the units for the gradient (e.g., change in BMI, in kg m⁻², per percentage-point change in the share of energy from the macronutrient).
  • Explanation of 95% PI: While the concept of the 95% PI has been introduced earlier in the paper, it would be helpful to briefly reiterate its meaning in the context of this table, either in the caption or in a footnote.
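The x variable in Table 4 and Fig. 3 is the share of energy from a macronutrient, so a gradient against it is naturally expressed per percentage point, which is easy to confuse with a relative percent change. The sketch below computes energy shares from gram intakes and contrasts the two kinds of change. It assumes Atwater general factors (4, 4, and 9 kcal per gram for protein, carbohydrate, and fat) and ignores alcohol and fibre; the NHANES and NDNS nutrient databases may use different factors, and the intake values are invented.

```python
# Atwater general factors in kcal per gram. Assumption: the surveys' own
# nutrient databases may use food-specific factors; alcohol is ignored here.
ATWATER_KCAL_PER_G = {"protein": 4.0, "carbohydrate": 4.0, "fat": 9.0}


def percent_energy(grams: dict[str, float]) -> dict[str, float]:
    """Share of (non-alcohol) energy from each macronutrient, in percent."""
    kcal = {name: g * ATWATER_KCAL_PER_G[name] for name, g in grams.items()}
    total = sum(kcal.values())
    return {name: 100.0 * e / total for name, e in kcal.items()}


def point_change(old_pct: float, new_pct: float) -> float:
    """Change in percentage points (the unit assumed for a gradient against % energy)."""
    return new_pct - old_pct


def relative_change(old_pct: float, new_pct: float) -> float:
    """Relative change in percent, which is easily confused with percentage points."""
    return 100.0 * (new_pct - old_pct) / old_pct


shares = percent_energy({"protein": 90, "carbohydrate": 225, "fat": 60})
print(shares)                       # {'protein': 20.0, 'carbohydrate': 50.0, 'fat': 30.0}
print(point_change(20.0, 21.0))     # 1.0 percentage point
print(relative_change(20.0, 21.0))  # 5.0 percent
```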
Fig. 3 | Relationships between the reported dietary intakes of macronutrients...
Full Caption

Fig. 3 | Relationships between the reported dietary intakes of macronutrients and BMI. a-f, Relationships between BMI and the intakes of fat (a,b), protein (c,d) and carbohydrate (e,f) for the NHANES and NDNS surveys. Panels a, c and e show the data for the whole sample and panels b, d and f show the data for those individuals whose total energy intake was within the predictive interval (that is, excluding under- and over-reporters).

First Reference in Text
As there is a systematic trend between macronutrient intake and the extent of under-reporting and because under-reporting is related to BMI, there was a strong positive relationship between the reported dietary intakes of protein and BMI in both surveys (Fig. 3 and Table 4).
Description
  • Purpose of the Figure: This figure shows the relationship between what people say they eat (specifically fat, protein, and carbohydrates) and their Body Mass Index (BMI), which is a measure of body weight relative to height. It's like looking for a connection between diet and weight, but focusing on what people *report* eating, which might not always be accurate.
  • Structure of the Figure: The figure is divided into six panels (a-f), each of which is a scatter plot. Each dot on a plot represents one person in the study. The plots are grouped in pairs for each of the three macronutrients: fat (a, b), protein (c, d), and carbohydrates (e, f). Within each pair, one plot (a, c, or e) shows the data for the whole sample of the NHANES and NDNS surveys, and the other (b, d, or f) shows the data only for individuals whose reported total energy intake fell within the predictive interval of the researchers' equation, that is, excluding under- and over-reporters. These individuals are considered more likely to be reporting their diet accurately.
  • X and Y Axes: In all the plots, the x-axis (horizontal) represents the percentage of total energy that comes from a specific macronutrient (fat, protein, or carbohydrates) in a person's reported diet. The y-axis (vertical) represents the person's BMI. The red line on each plot is a trend line, which is a line drawn through the middle of the dots to show the general pattern of the relationship between the two variables. If the line slopes upwards, it suggests that people with a higher percentage of that macronutrient in their diet tend to have a higher BMI. If it slopes downwards, it suggests the opposite.
  • Interpretation of the Plots: By comparing the plots for the whole sample and the screened sample (those within the predictive interval), we can see how the relationship between reported macronutrient intake and BMI changes when potentially inaccurate reports are removed. For example, if the trend line is steeper in the whole sample than in the screened sample, the relationship may be exaggerated by misreporting in the whole sample. The reference text highlights a strong positive relationship between reported protein intake and BMI, meaning that people who reported a higher proportion of energy from protein tended to have a higher BMI. However, when only the screened data were analysed, both the gradient and the R² of this relationship were strongly reduced (Table 4).
Scientific Validity
  • Visualization of Relationship: Using scatter plots with trend lines is an appropriate way to visualize the relationship between reported macronutrient intake and BMI. This allows for a visual assessment of the direction and strength of the relationship in both the whole sample and the screened sample.
  • Comparison of Whole and Screened Samples: Comparing the plots for the whole sample and the screened sample is crucial for understanding the potential impact of misreporting on the observed relationships. The differences between these plots highlight the importance of considering data quality when interpreting dietary data.
  • Statistical Analysis: The reference text and the caption imply that the relationships depicted in the figure are based on statistical analyses, likely regression analyses as presented in Table 4. The validity of the conclusions drawn from the figure depends on the appropriateness and rigor of these underlying analyses. It's important that the authors have adequately controlled for potential confounding factors in their analyses.
  • Causality: It is important to remember that these plots show correlations and do not prove causation. Even if a strong relationship is observed between reported macronutrient intake and BMI, it does not necessarily mean that one causes the other. There could be other factors that influence both dietary reporting and BMI.
Communication
  • Clarity of Axes Labels: The axes labels are generally clear and informative. The x-axis label could be slightly improved by specifying that it represents the 'Percentage of total energy from macronutrient' to be consistent with previous tables.
  • Legend: The figure lacks a legend to differentiate between the NHANES and NDNS datasets. Adding a legend with different colors or symbols for each dataset would significantly improve the clarity and interpretability of the graphs.
  • Caption Clarity: The caption is relatively clear and provides a good overview of the figure's content. It could be improved by briefly explaining the rationale for screening the data and by explicitly stating that the red lines represent trend lines.
  • Visual Clutter: The use of six panels with different datasets and data treatments (whole vs. screened) creates some visual clutter. Using a more minimalist color scheme or further separating the graphs could improve readability.
  • Trend Line Description: The caption should mention that the red lines are trend lines, and the method used to generate them (e.g., linear regression, LOESS smoothing) should be specified in the methods section. The type of trendline used could impact the interpretation of the relationships depicted.
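Because a straight line and a LOESS curve can give different impressions of the same scatter, stating the method matters. The sketch below is a simplified, hypothetical local-linear smoother in the spirit of LOESS: tricube weights over the nearest `frac` fraction of points, with an assumed default span of 0.5 and no robustness iterations. It is not a full implementation of Cleveland's LOWESS or of any library's `loess` function.

```python
import math


def loess(x: list[float], y: list[float], frac: float = 0.5) -> list[float]:
    """Local linear smoother with tricube weights (no robustness iterations).

    For each x0, fits a weighted straight line to roughly the nearest
    ceil(frac * n) points and returns its value at x0.
    """
    n = len(x)
    k = min(n, max(3, math.ceil(frac * n)))
    fitted = []
    for x0 in x:
        # Bandwidth: distance to the k-th nearest point (x0 itself counts)
        h = sorted(abs(xi - x0) for xi in x)[k - 1]
        h = h if h > 0 else 1e-12
        w = [(1 - (abs(xi - x0) / h) ** 3) ** 3 if abs(xi - x0) < h else 0.0 for xi in x]
        sw = sum(w)  # > 0, because x0 itself always gets weight 1
        mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
        my = sum(wi * yi for wi, yi in zip(w, y)) / sw
        sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
        sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
        slope = sxy / sxx if sxx > 0 else 0.0
        fitted.append(my + slope * (x0 - mx))
    return fitted


# Invented U-shaped data: a least-squares straight line through it is flat,
# while the local fit follows the curve.
xs = [float(i) for i in range(10)]
ys = [(xi - 4.5) ** 2 for xi in xs]
print([round(v, 2) for v in loess(xs, ys)])
```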

Methods

Key Aspects

Strengths

Suggestions for Improvement
