Leaving Science: Attrition of Biologists in 38 OECD Countries

Overall Summary

Study Background and Main Findings

This study examines biologist attrition in 38 OECD countries using Scopus publication data from 2000-2022, focusing on two cohorts (scientists who began publishing in 2000 and in 2010). Kaplan-Meier survival analysis revealed consistently higher attrition rates for women, particularly in BIO (biochemistry, genetics, and molecular biology) and NEURO (neuroscience), with the gender gap widening over time in the 2000 cohort. The 2010 cohort showed higher overall attrition but a smaller gender gap. Supplementary analyses, including survival regression and hazard rate curves, provided further insights into attrition dynamics. A key limitation is that "leaving science" is defined as ceasing publication, which potentially overestimates attrition, especially for women.

Research Impact and Future Directions

The study's reliance on publication data to define "leaving science" presents a significant limitation. While ceasing publication could indicate leaving academia, it doesn't account for career transitions within academia (e.g., teaching-focused roles), movement to industry, or temporary career breaks. This operational definition likely overestimates attrition, especially for women who might be more prone to transitioning to non-publishing roles. Therefore, while the observed gender disparities in publication cessation are important, they don't definitively prove higher attrition rates for women from science itself.

Despite this limitation, the study offers valuable insights into publication patterns and potential indicators of attrition. The large-scale, multi-national dataset provides a broad overview of trends across disciplines and over time. The finding that women in gender-balanced biological fields have lower publication persistence than men, while men and women in male-dominated fields show similar patterns, raises important questions about the influence of disciplinary culture and structural factors on career trajectories. The interactive map, while not fully integrated into the paper, offers a valuable tool for exploring country-specific variations.

Future research should explore alternative metrics for defining and measuring "leaving science" that go beyond publication records. Qualitative studies investigating the reasons behind publication cessation, including career transitions, family responsibilities, and workplace experiences, would provide a more nuanced understanding of the observed patterns. Analyzing career paths beyond academia, such as industry roles or government positions, would offer a more complete picture of scientists' career trajectories. Furthermore, investigating the specific factors contributing to the higher retention rates in AGRI, particularly for women, could reveal valuable insights for promoting career longevity in other scientific fields.

The study's methodological limitations, particularly the narrow definition of "leaving science," restrict the conclusions that can be drawn about actual attrition rates. While the findings suggest a concerning gender disparity in publication persistence, further research using more comprehensive measures of career trajectories is needed to confirm whether women are truly leaving science at higher rates than men. The lack of rigorous statistical comparisons between survival curves and the limited integration of supplementary analyses further weaken the study's conclusions. Addressing these limitations would significantly strengthen the study's contribution to understanding gender disparities in STEMM.

Critical Analysis and Recommendations

Comprehensive Abstract (written-content)
The abstract provides a comprehensive overview of the study's components, including the focus on biologist attrition, data source, methodology, and key findings. This clarity allows readers to quickly grasp the study's essence and significance.
Section: Abstract
State Research Question (written-content)
The Introduction clearly defines the scope and conceptual approach, but lacks an explicit research question. Adding a research question would enhance clarity and focus the reader's attention on the study's central aim.
Section: Introduction
Detail Data Processing and Analysis Methods (written-content)
The Data and Methods section transparently describes the data source and cohort selection, but lacks details on data cleaning and specific survival analysis models. This omission hinders reproducibility and in-depth methodological evaluation.
Section: Data and methods
Report Statistical Tests for Gender Differences (graphical-figure)
The Kaplan-Meier curves effectively visualize gender differences in attrition, but lack statistical tests for significance. Adding p-values or confidence intervals would strengthen the evidence supporting the observed disparities.
Section: Results
Integrate Supplementary Analyses (written-content)
The Results section presents detailed data in tables, but relegates important analyses (survival regression, hazard rate, kernel density) to supplementary materials. Integrating these analyses into the main Results would provide a more complete picture of attrition patterns.
Section: Results
Discuss Broader Implications (written-content)
The Conclusions section synthesizes key findings and contextualizes them within existing literature, but lacks broader implications and recommendations. Discussing the consequences for scientific workforce development and gender equity, along with actionable recommendations, would enhance the paper's impact.
Section: Conclusions
Improve Visualization of Geographical Data (graphical-figure)
The interactive map provides valuable geographical data, but the static snapshot in Figure 5 lacks context and clarity. Improving the figure or providing a more detailed map within the paper would enhance communication.
Section: Conclusions
Refine Definition of Leaving Science (written-content)
The study's definition of "leaving science" as ceasing publication is a major limitation. This might overestimate attrition, especially for women, and necessitates further research using broader measures of scientific careers.
Section: All Sections

Section Analysis

Results

Non-Text Elements

Figure 1. 2000 cohort of scientists, Kaplan-Meier survival curve, AGRI,...
Full Caption

Figure 1. 2000 cohort of scientists, Kaplan-Meier survival curve, AGRI, agricultural and biological sciences (N=7,970), BIO, biochemistry, genetics, and molecular biology (N=22,692), IMMU, immunology and microbiology (N=1,775), and NEURO, neuroscience (N=2,533). The four disciplines combined (N=34,970).

Figure/Table Image (Page 7)
First Reference in Text
For the 2000 cohort, we can observe, year after year, how the probability of staying in science is always higher for men and always lower for women.
Description
  • Kaplan-Meier Survival Curves: The figure shows Kaplan-Meier survival curves, which estimate the probability that an event (in this case, a scientist ceasing to publish) has not yet occurred by a given time. Imagine you're tracking how long a group of lightbulbs lasts. The Kaplan-Meier curve shows, for each point in time, the percentage of lightbulbs that are still working. Here, the "lightbulbs" are scientists, and "working" means publishing. The x-axis represents time (years since 2000), and the y-axis represents the proportion of scientists still publishing. Each line on the graph represents a different group (men or women in a specific scientific discipline). The lines start at 100% (all scientists are initially publishing) and step down whenever a scientist stops publishing. The larger the drop, the greater the share of still-active scientists who leave at that time. The small crosses represent "censored" data points, meaning the scientists were still publishing at the study's end, so we don't know when, or whether, they will eventually stop.
  • Figure Organization: The figure is organized into four panels, one for each scientific discipline: AGRI, BIO, IMMU, and NEURO. Within each panel, there are two lines: one for men and one for women. This allows us to compare attrition rates between men and women within each discipline. By looking across the four panels, we can also compare attrition rates across different scientific fields.
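To make the construction of such a curve concrete, here is a minimal Python sketch of the Kaplan-Meier product-limit estimator. It is not the authors' code: the function name `kaplan_meier` and the toy data are hypothetical, and the sketch assumes each scientist has been reduced to a duration in years plus a flag saying whether publishing stopped (event) or was still ongoing at the end of the data (censored).

```python
from collections import Counter

def kaplan_meier(durations, observed):
    """Return [(t, S(t))] for every time t at which at least one event occurred.

    durations: years from first publication until exit or end of data
    observed:  True if the scientist stopped publishing (event),
               False if still publishing at the end of the data (censored)
    """
    events = Counter(t for t, e in zip(durations, observed) if e)
    exits = Counter(durations)           # events plus censored cases at each time
    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(exits):
        d = events[t]                    # Counter returns 0 for missing keys
        if d:
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        at_risk -= exits[t]              # everyone exiting at t leaves the risk set
    return curve

# Hypothetical toy cohort of ten scientists (not data from the study).
durations = [1, 2, 2, 3, 5, 5, 8, 10, 10, 10]
observed = [True, True, True, False, True, True, True, False, False, False]

for year, share in kaplan_meier(durations, observed):
    print(f"year {year}: {share:.3f} still publishing")
```

Censored scientists leave the risk set without lowering the curve, which is why the toy estimate does not step down at year 3 or year 10.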
Scientific Validity
  • Methodology: Kaplan-Meier analysis is a standard method for survival analysis and is appropriate for this research question. However, the validity depends heavily on the assumption that ceasing publication is equivalent to "leaving science." This might not always be true, as scientists could shift to industry roles, non-publishing academic positions, or other fields without entirely abandoning science. The authors should address this limitation more thoroughly, perhaps by attempting to validate the publication metric against other measures of scientific activity or career status where available.
  • Statistical Significance: The reference text claims a consistent difference between men and women, but the figure itself doesn't present any statistical measures of significance. The authors should perform statistical tests (e.g., log-rank test) to determine if the observed differences are statistically significant and report p-values or confidence intervals. This is essential to support the claim of consistent difference.
  • Confounding Variables: The analysis lacks control for potential confounding variables. Factors like career stage, family responsibilities, funding opportunities, and institutional support can influence attrition rates and may differ between men and women. Without controlling for these factors, it's difficult to isolate the effect of gender on attrition. The authors should consider incorporating these variables into their analysis, perhaps using a Cox proportional hazards model, to strengthen the validity of their conclusions.
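The log-rank test suggested above can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not the authors' analysis: the function name `logrank_test` and the toy groups are hypothetical, and the p-value uses the chi-square distribution with one degree of freedom, which is the usual large-sample approximation for two groups.

```python
import math

def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank test. Returns (chi-square statistic, p-value), 1 df."""
    data = [(t, e, 0) for t, e in zip(times_a, events_a)]
    data += [(t, e, 1) for t, e in zip(times_b, events_b)]
    event_times = sorted({t for t, e, _ in data if e})
    observed_minus_expected = 0.0
    variance = 0.0
    for t in event_times:
        n_a = sum(1 for u, _, g in data if u >= t and g == 0)   # at risk, group A
        n_b = sum(1 for u, _, g in data if u >= t and g == 1)   # at risk, group B
        d_a = sum(1 for u, e, g in data if u == t and e and g == 0)
        d_b = sum(1 for u, e, g in data if u == t and e and g == 1)
        n, d = n_a + n_b, d_a + d_b
        observed_minus_expected += d_a - d * n_a / n
        if n > 1:
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)
    if variance == 0.0:
        raise ValueError("no variation in the risk sets; the test is undefined")
    chi2 = observed_minus_expected ** 2 / variance
    p_value = math.erfc(math.sqrt(chi2 / 2.0))   # chi-square survival function, 1 df
    return chi2, p_value

# Hypothetical toy groups: (durations in years, event flags).
women = ([1, 1, 2, 3, 3, 4, 6, 9], [True, True, True, True, True, True, True, False])
men = ([2, 4, 5, 7, 9, 9, 9, 9], [True, True, True, True, False, False, False, False])

chi2, p = logrank_test(*women, *men)
print(f"chi2 = {chi2:.2f}, p = {p:.4f}")
```

The test compares the two curves over the whole follow-up period rather than at a single year, so it addresses the claim that one curve is "always" above the other more directly than a comparison at one time point would.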
Communication
  • Clarity and Conciseness: The figure clearly communicates the core message of differing attrition rates between men and women in science. The visual presentation of survival curves allows for easy comparison across disciplines and genders. The labeling is clear and appropriate. However, the caption could be streamlined by removing the sample sizes, which could be included in a separate table. A brief statement summarizing the main finding directly in the caption would enhance understanding.
  • Data Presentation: While visually effective in showing the trend, the figure lacks specific data points or annotations to highlight the magnitude of the differences. Adding numerical labels at key time points (e.g., 5, 10, and 15 years) or showing percentage differences directly on the graph would enhance the communication of the findings.
Figure 2. 2010 cohort of scientists, Kaplan–Meier survival curve, AGRI,...
Full Caption

Figure 2. 2010 cohort of scientists, Kaplan–Meier survival curve, AGRI, agricultural and biological sciences (N=12,792), BIO, biochemistry, genetics, and molecular biology (N=31,542), IMMU, immunology and microbiology (N=2,361), and NEURO, neuroscience (N=4,513). The four disciplines combined (N=51,208).

Figure/Table Image (Page 11)
First Reference in Text
Comparing the four disciplines for the younger cohort after 9 years, the highest probability of staying in science was observed for AGRI (47.3% for women, 51.8% for men), with women in the other disciplines having probabilities below 40%.
Description
  • Kaplan-Meier Survival Curves: Figure 2 displays Kaplan-Meier survival curves for scientists who began publishing in 2010. These curves show the probability of a scientist continuing to publish over time, within each of four scientific disciplines (AGRI, BIO, IMMU, NEURO). Imagine tracking how many cars from a 2010 model year are still on the road each subsequent year. The Kaplan-Meier curve acts like a graph showing the percentage of cars still running over time. Here, publishing a paper is like the car still running; the x-axis is the years since 2010, and the y-axis is the proportion of scientists from that 2010 "model year" still publishing. Each panel represents a different scientific discipline, with separate lines for men and women. The lines step down each time a scientist ceases to publish. The crosses indicate censored observations, meaning those scientists were still publishing at the end of data collection (2022), so we don't know when they eventually stopped.
  • Visual Comparisons: The figure uses a panel for each discipline, allowing for comparison of attrition rates between genders within each discipline and across the four fields. Each panel shows two lines: one for men and one for women. The lines begin at 100% at time zero (when they first published in 2010) and step down over time as scientists cease publishing. Steeper drops indicate higher attrition rates. By visually comparing the height of the lines at different time points, we can observe the relative retention rates for men and women within and across the different disciplines.
Scientific Validity
  • Methodology and Data Interpretation: The application of Kaplan-Meier survival analysis is appropriate for this research question. The cohort design allows for tracking changes within a specific group over time. However, equating cessation of publication with "leaving science" is a significant limitation. Scientists may transition to industry, take career breaks, or focus on other academic activities without leaving science altogether. This should be explicitly addressed, possibly by exploring alternate metrics for "leaving science" or by acknowledging that this definition likely overestimates attrition.
  • Statistical Validation of Findings: While visually apparent, the differences in retention rates at the 9-year mark need to be statistically validated. The authors should report confidence intervals for the survival probabilities at this time point and test the differences between disciplines and genders. Log-rank tests or Cox proportional hazards models would additionally compare the curves over the full follow-up period rather than at a single year. Reporting p-values or confidence intervals is essential to support the claims made in the reference text.
  • Comparison with 2000 Cohort: The study's focus on temporal changes necessitates a robust statistical comparison between the 2000 and 2010 cohorts. The authors should conduct formal hypothesis tests to determine the statistical significance of the changes in attrition rates over time. This would strengthen the study's core argument and provide more compelling evidence for the observed trends. Consider a table or figure that specifically compares the two cohorts.
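As an illustration of the kind of fixed-time-point validation recommended above, the sketch below computes a Kaplan-Meier survival probability at a chosen horizon (for example, year 9) with its Greenwood standard error, and compares two groups with a normal-approximation z-test. The function names `km_at` and `compare_at` and the toy groups are hypothetical, and the plain-scale normal approximation is only one of several possible choices.

```python
import math

def km_at(durations, observed, horizon):
    """Kaplan-Meier S(horizon) and its Greenwood standard error."""
    at_risk = len(durations)
    survival, greenwood_sum = 1.0, 0.0
    for t in sorted(set(durations)):
        if t > horizon:
            break
        d = sum(1 for u, e in zip(durations, observed) if u == t and e)
        if d and at_risk > d:
            greenwood_sum += d / (at_risk * (at_risk - d))
        if d:
            survival *= 1.0 - d / at_risk
        at_risk -= sum(1 for u in durations if u == t)
    return survival, survival * math.sqrt(greenwood_sum)

def compare_at(group_a, group_b, horizon):
    """Difference in S(horizon) between two groups with a two-sided z-test."""
    s_a, se_a = km_at(*group_a, horizon)
    s_b, se_b = km_at(*group_b, horizon)
    se_diff = math.sqrt(se_a ** 2 + se_b ** 2)
    if se_diff == 0.0:
        raise ValueError("standard error is zero; the z-test is undefined")
    z = (s_a - s_b) / se_diff
    p_value = math.erfc(abs(z) / math.sqrt(2.0))   # two-sided normal p-value
    return s_a - s_b, z, p_value

# Hypothetical toy groups: (durations in years, event flags).
women = ([1, 2, 2, 3, 4, 6, 12, 12, 12, 12], [True] * 6 + [False] * 4)
men = ([2, 5, 8, 12, 12, 12, 12, 12, 12, 12], [True] * 3 + [False] * 7)

s, se = km_at(*women, 9)
print(f"women, year 9: {s:.3f} (95% CI {s - 1.96 * se:.3f} to {s + 1.96 * se:.3f})")
diff, z, p = compare_at(women, men, 9)
print(f"difference = {diff:.3f}, z = {z:.2f}, p = {p:.4f}")
```

With cohort sizes in the thousands, as reported in the captions, such intervals would be far narrower than in this ten-person toy example.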
Communication
  • Clarity and Visual Organization: Presenting four disciplines in separate panels allows for clear visual comparison of attrition rates. The figure is well-labeled, and the lines for men and women are easily distinguishable. Highlighting the 9-year mark with a vertical line or shaded area could further improve readability and emphasize the point made in the reference text about retention at this specific time point. Directly labeling the lines with the percentage values at the 9-year mark could also enhance clarity. Finally, consider if a small inset or added panel summarizing the overall trends across all disciplines might be helpful for quicker comprehension.
  • Connection to the Research Question: The figure supports the reference text by visually demonstrating the higher retention rate in AGRI compared to the other disciplines. However, the figure's impact could be improved by visually connecting it to the broader narrative of temporal changes. This could involve a direct visual comparison with Figure 1 (e.g., a combined figure or small multiples showing both cohorts side-by-side) or adding annotations that explicitly reference the 2000 cohort data. This would enhance the communication of the study's main point about decreasing retention over time.
Figure 3. 2000 (top panel, N=7,970) and 2010 (bottom panel, N=12,792) cohorts...
Full Caption

Figure 3. 2000 (top panel, N=7,970) and 2010 (bottom panel, N=12,792) cohorts of scientists, AGRI, agricultural and biological sciences. Survival regression curve (exponential distribution fitting of Kaplan-Meier curve), hazard rate curve (B-splines smoothing method, smoothing parameter 10k), and kernel density curve (B-splines smoothing method, bandwidth 2, component per point based on Gaussian curve).

Figure/Table Image (Page 13)
First Reference in Text
The survival regression curves for the four disciplines for both cohorts (Figure 3, Figure 4 Panels B; as well as Supplementary Figures 1-2) indicate a steeper decline for men and women in the early years of their publishing careers and a smoother decline in later years, with increasing divergence between the curves for men and women from the 2000 cohort.
Description
  • Types of Curves: The figure presents three different types of curves related to scientist attrition in the AGRI discipline, for two cohorts (those starting in 2000 and 2010). First, it shows *survival regression curves*, which are smoothed versions of Kaplan-Meier curves, depicting the probability of remaining in science over time. Imagine tracking 100 runners in a marathon; the survival curve shows the percentage still running at each point in time. Second, *hazard rate curves* show the instantaneous risk of "leaving science" (stopping publishing) at any given time among those who have not yet left. Think of it as the probability that a runner who is still in the race drops out at a given mile marker. Finally, *kernel density curves* illustrate the distribution of when scientists stopped publishing, showing where the dropouts clustered along the marathon route. The x-axis for all curves represents years since the start year of the cohort, and the y-axis represents either the proportion remaining (survival curves), the hazard rate, or the density.
  • Panel Organization and Labeling: The figure is organized into two panels: one for the 2000 cohort (top) and one for the 2010 cohort (bottom). Each panel shows the three curve types (survival regression, hazard rate, and kernel density) with separate lines for men and women. This allows for comparison of attrition patterns between genders and across the two cohorts. The curves are labeled "Mw" and "Mm" for women and men, respectively. The caption provides the sample size (N) for each cohort.
  • Curve Fitting Methods: Technical details about the curve fitting methods are included in the caption. Survival regression curves are fitted using exponential distribution. Hazard rate and kernel density curves use B-splines smoothing, a method for creating smooth curves from data points, with specified smoothing parameters (10k for hazard rate) and bandwidth (2 for kernel density). The kernel density estimation uses a Gaussian curve as the basis for its smoothing.
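The caption does not say how the exponential curve was fitted. One common approach, sketched below as an assumption rather than as the authors' method, is the maximum-likelihood estimate for right-censored data: the rate equals the number of observed exits divided by the total years of exposure, and the fitted survival curve is S(t) = exp(-rate * t). The function names and toy data are hypothetical.

```python
import math

def fit_exponential(durations, observed):
    """Maximum-likelihood exponential rate for right-censored durations."""
    events = sum(1 for e in observed if e)
    exposure = sum(durations)            # total person-years under observation
    if exposure <= 0:
        raise ValueError("total exposure must be positive")
    return events / exposure

def exponential_survival(rate, t):
    """Fitted probability of still publishing t years after the first paper."""
    return math.exp(-rate * t)

# Hypothetical toy cohort (not data from the study).
durations = [1, 2, 2, 3, 5, 5, 8, 10, 10, 10]
observed = [True, True, True, False, True, True, True, False, False, False]

rate = fit_exponential(durations, observed)
print(f"fitted rate: {rate:.4f} exits per person-year")
for year in (1, 5, 10):
    print(f"year {year}: fitted survival {exponential_survival(rate, year):.3f}")
```

Under this model the hazard is the same constant `rate` in every year, which is a strong assumption for early-career attrition.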
Scientific Validity
  • Curve Fitting Choices and Justification: The inclusion of survival regression, hazard rate, and kernel density curves provides a comprehensive view of attrition dynamics. However, the justification for fitting survival regression curves with exponential distribution is lacking. The authors should explain why this distribution was chosen and whether it adequately fits the data. Alternative distributions or non-parametric methods might be more appropriate. Goodness-of-fit tests should be performed and reported. Additionally, the choice of smoothing parameters (10k and 2) for B-splines should be justified, and the sensitivity of the results to these parameters should be assessed.
  • Statistical Analysis of Divergence: The reference text highlights the increasing divergence between men and women in the 2000 cohort. While visually discernible, this observation needs statistical support. The authors should quantify this divergence (e.g., by calculating the difference in retention rates at specific time points) and perform statistical tests to determine if this increasing divergence is statistically significant. A simple visual observation is not enough to draw robust conclusions.
  • Consistency and Cross-Disciplinary Comparison: The figure focuses on the AGRI discipline, but the reference text mentions analysis for all four disciplines (referencing Figure 4 and Supplementary Figures 1-2). The authors should ensure consistency between the figures and text. If the trends are similar across disciplines, a concise summary figure showing the key patterns across all fields might be more informative than presenting detailed graphs for each discipline individually.
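An exponential model implies a constant hazard, so one informal check of the fit questioned above is whether the empirical yearly hazard is roughly flat. The sketch below computes that yearly hazard as exits divided by scientists still at risk. It is a simple illustration with hypothetical names and toy data, not a formal goodness-of-fit test.

```python
def yearly_hazard(durations, observed):
    """Return {year: exits / at_risk} for each year present in the data."""
    hazards = {}
    for t in sorted(set(durations)):
        at_risk = sum(1 for u in durations if u >= t)    # still publishing at t
        exits = sum(1 for u, e in zip(durations, observed) if u == t and e)
        hazards[t] = exits / at_risk
    return hazards

# Hypothetical toy cohort (not data from the study).
durations = [1, 2, 2, 3, 5, 5, 8, 10, 10, 10]
observed = [True, True, True, False, True, True, True, False, False, False]

# Constant hazard implied by an exponential fit: exits per person-year.
constant_rate = sum(observed) / sum(durations)

for year, hazard in yearly_hazard(durations, observed).items():
    print(f"year {year}: empirical {hazard:.3f} vs constant {constant_rate:.3f}")
```

Large, systematic departures of the yearly values from the constant rate would suggest that a more flexible distribution or a non-parametric method describes the data better.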
Communication
  • Clarity and Visual Overload: The figure presents a lot of information in a relatively compact space, but the density of information makes it challenging to grasp the key messages quickly. While the different curve types provide complementary perspectives on attrition, the visual presentation could be improved. Specifically, the hazard rate curves are difficult to interpret without clear explanation or visual cues to highlight key differences. Consider separating the three curve types into distinct subpanels or even separate figures for improved clarity. Also, consider adding a brief explanatory note in the caption or figure legend to guide the reader through the different panels and curve types.
  • Emphasis and Quantification: The reference text mentions increasing divergence in survival curves for the 2000 cohort but doesn't visually emphasize this point in the figure. Adding visual cues like arrows or annotations to highlight this divergence would strengthen the connection between the text and the figure. Quantifying the divergence at specific time points would further enhance the message. For instance, stating the percentage difference in retention between men and women at year 10 and year 20 would be more impactful than just mentioning "increasing divergence."
  • Accessibility for a Broader Audience: The use of technical terms like "B-splines smoothing" and "kernel density" without further explanation in the caption or figure legend assumes a high level of statistical knowledge from the reader. While appropriate for a specialized audience, consider adding a brief, intuitive explanation of these methods for broader accessibility. For example, explain that these methods are used to create smooth curves that represent the underlying data distribution.
Figure 4: 2000 (top panel, N=22,692) and 2010 (bottom panel, N=31,542) cohorts...
Full Caption

Figure 4: 2000 (top panel, N=22,692) and 2010 (bottom panel, N=31,542) cohorts of scientists, BIO, biochemistry, genetics, and molecular biology. Survival regression curve (exponential distribution fitting of Kaplan–Meier curve), hazard rate curve (B-splines smoothing method, smoothing parameter 10k), and kernel density curve (B-splines smoothing method, bandwidth 2, component per point based on Gaussian curve)

Figure/Table Image (Page 14)
First Reference in Text
Kernel density curves (Figure 4, Panels D) use kernel density estimation to create a smoothed, continuous curve that approximates the underlying data distribution.
Description
  • Types of Curves: Figure 4 presents three types of curves: survival regression, hazard rate, and kernel density, for two cohorts of scientists in the BIO discipline (those starting their publishing careers in 2000 and 2010). The *survival regression curves* are smoothed versions of Kaplan-Meier curves, showing the probability of a scientist continuing to publish over time. Imagine tracking 100 trees planted in 2000 and charting the percentage still alive each year. That's a survival curve. The *hazard rate curves* show the instantaneous risk of ceasing publication at any given time among those still publishing. Think of this as the probability that a tree still alive at the start of a given year will die during that year. The *kernel density curves*, the focus of the reference text, show the distribution of when scientists stopped publishing. Imagine plotting the years when each of those trees died: the kernel density curve would show which years had the most tree deaths.
  • Panel Organization and Kernel Density Curves: The figure has two panels, one for each cohort (2000 on top, 2010 on bottom). Each panel shows all three curve types, with separate lines for men and women (labeled Mw and Mm). The kernel density curves (Panel D) are smooth, continuous curves showing the distribution of when scientists ceased publishing. The x-axis represents the years since the cohort's start year, and the y-axis represents the density, showing the relative concentration of dropouts at different time points. The higher the curve, the more scientists stopped publishing in that year. The caption also provides the sample size (N) for each cohort.
  • Curve-Fitting Details: The caption includes technical details: survival regression uses exponential distribution fitting, while hazard rate and kernel density curves are generated using B-splines smoothing. B-splines are a way to create smooth curves from data points, and the "smoothing parameter" (10k for hazard rate) and "bandwidth" (2 for kernel density) control how smooth the curves are. The kernel density uses a Gaussian (bell-shaped) curve as a component for smoothing.
Scientific Validity
  • Kernel Density Estimation: Bandwidth and Interpretation: Using kernel density estimation is appropriate for visualizing the distribution of dropout times. However, the choice of bandwidth (2) should be justified and its impact on the results discussed. Different bandwidths can significantly alter the shape of the density curve. Sensitivity analysis using different bandwidths would demonstrate the robustness of the observed patterns. Also, clarify the specific kernel function used (presumably Gaussian, as mentioned in the caption). While the reference text notes the creation of a smoothed curve, it doesn't elaborate on the interpretation of this curve in relation to the research questions. What insights do the peaks and shape of the curves provide regarding attrition patterns and temporal changes?
  • Gender Comparison and Statistical Analysis: While the figure shows overall dropout distributions, it doesn't directly address the gender differences, a central aspect of the study. Separating the kernel density curves by gender would enable visual comparison of dropout patterns between men and women, enhancing the figure's scientific value. Furthermore, statistical tests could be applied to compare the distributions and assess the significance of any observed differences.
  • Rationale for Combined Curve Types: The rationale for including the survival regression and hazard rate curves in the same figure as kernel density curves needs further clarification. While related, these curves represent different aspects of attrition. If their purpose is to support the kernel density analysis, explain how they contribute to the overall interpretation. If they serve a separate analytical purpose, consider presenting them in separate figures to avoid confusion and enhance clarity.
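The bandwidth sensitivity analysis suggested above can be illustrated with a small Gaussian kernel density estimator. The sketch assumes a plain Gaussian kernel, which matches the caption's "component per point based on Gaussian curve" only loosely. The function name `gaussian_kde`, the toy exit years, and the alternative bandwidths 0.5 and 4.0 are hypothetical; only the bandwidth of 2 comes from the caption.

```python
import math

def gaussian_kde(samples, bandwidth):
    """Return a function x -> density estimate, one Gaussian bump per sample."""
    norm = 1.0 / (len(samples) * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        return norm * sum(
            math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples
        )

    return density

# Hypothetical years (since cohort start) in which scientists stopped publishing.
exit_years = [1, 1, 2, 2, 2, 3, 5, 8, 9, 9]
grid = [i * 0.5 for i in range(25)]      # evaluation points from year 0 to year 12

for bandwidth in (0.5, 2.0, 4.0):
    density = gaussian_kde(exit_years, bandwidth)
    peak_year = max(grid, key=density)
    print(f"bandwidth {bandwidth}: peak near year {peak_year}, "
          f"peak density {density(peak_year):.3f}")
```

Comparing where the peak falls and how high it is across bandwidths gives a quick sense of whether features such as an early-career concentration of dropouts are robust or an artifact of the smoothing choice.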
Communication
  • Clarity and Quantification of Temporal Changes: The figure successfully uses kernel density estimation to visualize the distribution of when scientists ceased publishing. The smoothed curves provide a clear picture of the dropout patterns for both cohorts, revealing a higher concentration of dropouts in the early career stages. However, the figure would benefit from additional annotations or insets that quantify the differences between the 2000 and 2010 cohorts. For example, noting the shift in peak dropout year or the relative proportions of early-career dropouts would enhance the communication of temporal changes.
  • Integration with Gender Analysis: While the kernel density plots effectively show the distribution of dropouts, they lack direct connection to the study's central theme of gender differences. The figure would be more informative if it included separate density curves for men and women, allowing for visual comparison of dropout patterns between genders. This would enhance the figure's contribution to the overall narrative and provide a more nuanced understanding of attrition dynamics.
  • Accessibility and Conciseness: As in Figure 3, the caption's technical details about smoothing methods might overwhelm readers unfamiliar with these concepts. Consider moving these details to a separate methods section or providing a concise, intuitive explanation in the figure legend. Focus the caption on the key message: the distribution of dropouts over time and its implications for the study.
Supplementary Figure 1: 2000 (N=1,775) and 2010 (N=2,361) cohorts of...
Full Caption

Supplementary Figure 1: 2000 (N=1,775) and 2010 (N=2,361) cohorts of scientists, IMMU, immunology and microbiology. Survival regression curve (exponential distribution fitting of Kaplan-Meier curve), hazard rate curve (B-splines smoothing method, smoothing parameter 10k), and kernel density curve (B-splines smoothing method, bandwidth 2, component per point based on Gaussian curve)

Figure/Table Image (Page 23)
First Reference in Text
No explicit numbered reference found
Description
  • Types of Curves: This supplementary figure presents three types of curves related to scientist attrition in the IMMU (immunology and microbiology) discipline, for two cohorts (2000 and 2010). The first type, *survival regression curves*, consists of smoothed Kaplan-Meier curves. They show the probability of a scientist continuing to publish over time. Imagine you're tracking how many of 100 planted flowers are still blooming each day. The survival curve shows the percentage still blooming. The second type, *hazard rate curves*, represents the instantaneous risk of ceasing publication at any given time among those still publishing. Think of this as the chance that a flower still in bloom will wilt on any given day. The third type, *kernel density curves*, illustrates the distribution of when scientists stopped publishing. If you plotted the day each flower wilted, the kernel density curve would show which days had the most wilting flowers. The x-axis represents years since the start of the cohort, and the y-axis represents either the proportion remaining in science (survival curves), the hazard rate, or the density.
  • Panel Organization: The figure has two panels: top for the 2000 cohort and bottom for the 2010 cohort. Each panel shows all three curve types, with separate lines for men and women (labeled Mm and Mw). This allows comparison of attrition patterns between genders and across the two cohorts. The sample size (N) for each cohort is given in the caption.
  • Curve-Fitting Methods: The caption includes technical details about curve-fitting methods. Survival regression curves use exponential distribution fitting. Hazard rate and kernel density curves use B-splines smoothing, which creates smooth curves from data. The "smoothing parameter" (10k for hazard rate) and "bandwidth" (2 for kernel density) control the smoothness. Kernel density uses a Gaussian (bell curve) component for its smoothing.
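To make the three curve types concrete, here is a minimal Python sketch on invented toy data. It is not the authors' pipeline: the hazard is the raw per-year proportion rather than a B-spline-smoothed curve, the survival curve is the plain Kaplan-Meier step function rather than an exponential fit, and the density is an ordinary Gaussian kernel estimate with bandwidth 2. The names `durations`, `observed`, `life_table`, and `gaussian_kde` are illustrative assumptions.

```python
import math

# Hypothetical toy data (NOT from the paper): years until each scientist's
# last publication, and whether leaving was observed (True) or censored (False).
durations = [1, 2, 2, 3, 5, 5, 8, 10, 10, 10]
observed = [True, True, True, True, True, False, True, False, False, False]

def life_table(durations, observed):
    """Per-year rows of (year, at_risk, leavers, hazard, survival)."""
    rows, survival = [], 1.0
    for year in sorted(set(durations)):
        at_risk = sum(1 for d in durations if d >= year)
        leavers = sum(1 for d, o in zip(durations, observed) if d == year and o)
        hazard = leavers / at_risk      # risk among those still publishing
        survival *= 1.0 - hazard        # Kaplan-Meier product-limit step
        rows.append((year, at_risk, leavers, hazard, survival))
    return rows

def gaussian_kde(points, x, bandwidth=2.0):
    """Density at x: one Gaussian bump per observed exit, averaged."""
    norm = len(points) * bandwidth * math.sqrt(2.0 * math.pi)
    return sum(math.exp(-0.5 * ((x - p) / bandwidth) ** 2) for p in points) / norm

for year, at_risk, leavers, hazard, survival in life_table(durations, observed):
    print(f"year {year:2d}: at risk {at_risk:2d}, hazard {hazard:.3f}, survival {survival:.3f}")

# The density is estimated from observed exits only (censored cases are excluded).
exit_years = [d for d, o in zip(durations, observed) if o]
print([round(gaussian_kde(exit_years, x), 3) for x in range(0, 11, 2)])
```

The survival column only ever decreases, while the hazard column can rise and fall from year to year. That is why the two curves answer different questions about the same cohort.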
Scientific Validity
  • Justification and Interpretation of Curve Types: While using multiple curve types provides a comprehensive view, ensure their inclusion is justified and their interpretation contributes meaningfully to the research questions. Specifically, discuss the insights gained from each curve type (survival regression, hazard rate, and kernel density) and how they complement each other in understanding attrition dynamics. For example, how does the shape of the kernel density curve inform the interpretation of the hazard rate or survival curves?
  • Curve-Fitting Choices and Sensitivity Analysis: The caption mentions the use of exponential distribution for survival regression. Justify this choice and assess the goodness-of-fit to the data. Explore alternative distributions or non-parametric methods if the exponential distribution is not a good fit. Similarly, justify the choice of smoothing parameters for the B-splines and assess the sensitivity of the results to these choices.
  • Comparison with Other Disciplines and Statistical Analysis: As a supplementary figure, it's crucial to link this figure explicitly to the main paper's findings. Compare the attrition patterns in the IMMU discipline to those observed in other disciplines (shown in other figures). Discuss whether these patterns are consistent with the overall trends and how they contribute to the study's broader conclusions. Quantify key differences and perform statistical tests to assess their significance.
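One way to approach the goodness-of-fit question raised above is to compare a fitted exponential survival curve with the non-parametric Kaplan-Meier estimate, year by year. The sketch below assumes a maximum-likelihood fit of a constant hazard (events divided by total exposure time). The caption does not say how the authors fitted their curves, so this is an assumption, and the data are invented.

```python
import math

# Hypothetical toy data, not the paper's.
durations = [1, 2, 2, 3, 5, 5, 8, 10, 10, 10]
observed = [True, True, True, True, True, False, True, False, False, False]

def kaplan_meier(durations, observed):
    """Return {year: KM survival probability} at each distinct event year."""
    curve, survival = {}, 1.0
    for year in sorted({d for d, o in zip(durations, observed) if o}):
        at_risk = sum(1 for d in durations if d >= year)
        leavers = sum(1 for d, o in zip(durations, observed) if d == year and o)
        survival *= 1.0 - leavers / at_risk
        curve[year] = survival
    return curve

def exponential_rate(durations, observed):
    """MLE of a constant hazard under right-censoring: events / total exposure."""
    return sum(observed) / sum(durations)

rate = exponential_rate(durations, observed)
km = kaplan_meier(durations, observed)

# Absolute gap between the exponential curve exp(-rate * t) and the KM estimate.
gaps = {year: abs(math.exp(-rate * year) - s) for year, s in km.items()}
for year, gap in gaps.items():
    print(f"year {year}: KM {km[year]:.3f}, exponential {math.exp(-rate * year):.3f}, gap {gap:.3f}")
```

Large gaps concentrated in the early years would suggest that the hazard is not constant. In that case a more flexible distribution (for example Weibull) or the raw Kaplan-Meier curve may describe the data better.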
Communication
  • Clarity and Accessibility: The figure is well-organized, with clear labels and distinct lines for each cohort and gender. The inclusion of three different curve types (survival regression, hazard rate, and kernel density) provides a comprehensive view of the attrition patterns. However, the caption is quite dense with technical terminology. While appropriate for supplementary materials, consider adding a brief explanation of the curve types in the figure legend itself to improve readability for a broader audience. Specifically, explain what each curve type represents and how it should be interpreted.
  • Connection to Main Findings: Since this figure is supplementary, it's crucial to explicitly link it back to the main findings of the paper. Explain how the attrition patterns observed in the IMMU discipline compare to those in other disciplines and whether they support or contradict the overall trends discussed in the main text. This will provide context and justify the inclusion of this supplementary figure.
  • Data Highlighting: The visual presentation could be enhanced by adding specific data points or annotations to highlight key findings. For instance, mark the retention rates at 5, 10, and 15 years on the survival curves, or indicate the peak hazard rate on the corresponding curves. This would make the figure more informative and easier to interpret.
Supplementary Figure 2: 2000 (N=2,533) and 2010 (N=4,513) cohorts of...
Full Caption

Supplementary Figure 2: 2000 (N=2,533) and 2010 (N=4,513) cohorts of scientists, NEURO, neuroscience. Survival regression curve (exponential distribution fitting of Kaplan–Meier curve), hazard rate curve (B-splines smoothing method, smoothing parameter 10k), and kernel density curve (B-splines smoothing method, bandwidth 2, component per point based on Gaussian curve)

Figure/Table Image (Page 23)
First Reference in Text
No explicit numbered reference found
Description
  • Types of Curves: This supplementary figure presents three types of curves to analyze scientist attrition in the NEURO (neuroscience) discipline, for the 2000 and 2010 cohorts. First, *survival regression curves* (exponential-distribution fits to the Kaplan-Meier curves) show the probability of remaining in science (continuing to publish) over time. Imagine tracking 100 newly planted saplings and charting the percentage that survive each year. That's a survival curve. Second, *hazard rate curves* show the instantaneous risk of ceasing publication at a given time, among scientists who are still publishing at that time. Think of it as the probability that a sapling still alive at the start of a year dies during that year. Third, *kernel density curves* show the distribution of when scientists stopped publishing. If you plotted the year each sapling died, the kernel density curve would show which years had the most sapling deaths. The x-axis represents years since the cohort's start year, while the y-axis represents the proportion remaining (survival curves), hazard rate, or density.
  • Figure Organization: The figure is organized into two panels: the top for the 2000 cohort and the bottom for the 2010 cohort. Each panel displays all three curve types, with separate lines for men and women (labeled Mm and Mw). This allows for comparison between genders and across the two cohorts. The caption also notes the sample size (N) for each cohort.
  • Curve-Fitting Methods: The caption provides technical details about the curve-fitting methods. Survival regression curves use exponential distribution fitting. Hazard rate and kernel density curves are generated using B-splines smoothing, a method for creating smooth curves from data. The smoothing parameter (10k for hazard rate) and bandwidth (2 for kernel density) control how smooth the curves appear. The kernel density estimation uses a Gaussian (bell-shaped) curve as a component for smoothing.
Scientific Validity
  • Research Question and Interpretation: While the figure presents a comprehensive analysis using three curve types, the scientific validity would be enhanced by explicitly stating the research question or hypothesis being addressed by this supplementary figure. How do the observed patterns in the NEURO discipline contribute to the overall research goals? What specific insights are gained from each curve type, and how do they inform the interpretation of the main findings?
  • Curve-Fitting Choices and Justification: The use of exponential distribution for survival regression should be justified, and goodness-of-fit tests should be reported. Explore alternative distributions or non-parametric approaches if the exponential distribution doesn't adequately fit the data. Similarly, justify the chosen smoothing parameters for the B-splines and assess the sensitivity of the results to these parameters. This will strengthen the robustness of the analysis.
  • Interdisciplinary Comparisons and Statistical Analysis: Compare the attrition patterns in NEURO with those observed in other disciplines (presented in other figures). This will contextualize the findings and contribute to a more holistic understanding of attrition across different scientific fields. Quantify key differences between disciplines and conduct statistical tests to determine the significance of these differences. This comparative analysis is essential for drawing robust conclusions about the factors influencing attrition.
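The sensitivity check on smoothing parameters suggested above can be sketched by recomputing a kernel density estimate under several bandwidths and watching how the peak moves. This uses a plain Gaussian kernel on invented exit years; the authors' B-spline implementation and their parameter scale may differ, and `peak` and `grid` are illustrative names.

```python
import math

# Hypothetical exit years (years from first to last publication); not the paper's data.
exit_years = [1, 1, 2, 2, 2, 3, 4, 6, 9, 12]

def gaussian_kde(points, x, bandwidth):
    """Gaussian kernel density estimate at x."""
    norm = len(points) * bandwidth * math.sqrt(2.0 * math.pi)
    return sum(math.exp(-0.5 * ((x - p) / bandwidth) ** 2) for p in points) / norm

def peak(points, bandwidth, grid):
    """Grid point where the estimated density is largest, and its height."""
    return max(((x, gaussian_kde(points, x, bandwidth)) for x in grid),
               key=lambda pair: pair[1])

grid = [i / 10 for i in range(0, 151)]  # years 0.0 to 15.0 in steps of 0.1
for bandwidth in (0.5, 1.0, 2.0, 4.0):
    x_peak, height = peak(exit_years, bandwidth, grid)
    print(f"bandwidth {bandwidth}: peak at year {x_peak:.1f}, height {height:.3f}")
```

If the location of the peak shifts substantially across reasonable bandwidths, conclusions about when attrition is most concentrated depend on the smoothing choice and should be reported with that caveat.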
Communication
  • Context and Connection to Main Text: The figure provides a detailed view of attrition patterns in the NEURO discipline using three different curve types, allowing for a nuanced understanding of the trends. However, as a supplementary figure, its connection to the main paper's narrative needs to be strengthened. Explain how the trends observed in NEURO relate to the broader patterns discussed in the main text and whether they support the overall conclusions. Consider adding a concise summary of the key takeaways from this figure to the main text to guide the reader.
  • Visual Enhancements: While the visual presentation is generally clear, consider enhancing it by highlighting specific data points or features. For instance, mark the retention rates at key time points on the survival curves or indicate the peak hazard rate. Adding such visual cues can improve readability and draw attention to the most relevant information.
  • Technical Terminology: The caption contains technical details about smoothing parameters and curve-fitting methods. While appropriate for supplementary materials, consider briefly explaining these terms in the figure legend or providing a reference to a more detailed explanation in the methods section. This will make the figure more accessible to readers unfamiliar with these specific techniques.
Table 1: The sample, four disciplines, two cohorts of scientists (2000 and...
Full Caption

Table 1: The sample, four disciplines, two cohorts of scientists (2000 and 2010) by gender.

Figure/Table Image (Page 6)
First Reference in Text
In the four disciplines, for the two cohorts examined, gender balance has been achieved.
Description
  • Sample Size and Gender Distribution: Table 1 shows the number and percentage of male and female scientists in the study sample, categorized by scientific discipline (AGRI, BIO, IMMU, and NEURO) and cohort year (2000 and 2010). Imagine you're sorting a box of marbles by color (discipline) and size (cohort). The table tells you how many marbles of each color and size you have, and also what percentage of each group is red (women) and blue (men). The table is organized into two panels, one for each cohort. Within each panel, there are rows for each discipline and a final row for the total across all disciplines. For each discipline, the table presents the total number (N) of scientists, the number and percentage of men, and the number and percentage of women.
Scientific Validity
  • Sampling Methodology: Presenting the sample size and gender distribution is essential for transparency and allows readers to assess the representativeness of the sample. However, the table lacks information about the sampling method used to select the scientists within each discipline and cohort. Describe the sampling strategy and any inclusion/exclusion criteria. This is crucial for evaluating the generalizability of the findings.
  • Statistical Test for Gender Balance: The reference text mentions "gender balance," but the table doesn't provide any statistical measures to support this claim. Calculate and include appropriate statistical tests (e.g., chi-squared test) to determine whether the observed gender distributions are significantly different from a 50/50 split. This will add rigor to the analysis and provide a more objective assessment of gender balance.
  • Temporal Changes in Gender Balance: The table presents data for two cohorts separately. While this shows the sample composition at two different time points, it's important to analyze and discuss the temporal changes in gender balance. Calculate and report the change in gender proportions between the two cohorts for each discipline. This will provide insights into the evolution of gender representation over time and connect the table to the study's focus on temporal trends.
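The two notions of balance discussed above (a statistical test against a 50/50 split, and the 40-60% band) can give different answers. The sketch below computes both. The counts are hypothetical and are not Table 1's values; `balance_check` is an illustrative name.

```python
import math

def balance_check(n_men, n_women, low=0.40, high=0.60):
    """Chi-squared goodness-of-fit against a 50/50 split, plus a 40-60% band check."""
    total = n_men + n_women
    expected = total / 2.0
    chi2 = (n_men - expected) ** 2 / expected + (n_women - expected) ** 2 / expected
    p_value = math.erfc(math.sqrt(chi2 / 2.0))  # chi-squared upper tail, 1 degree of freedom
    share_women = n_women / total
    return chi2, p_value, share_women, low <= share_women <= high

# Hypothetical counts for illustration only.
for n_men, n_women in [(900, 875), (1100, 900)]:
    chi2, p_value, share_women, within_band = balance_check(n_men, n_women)
    print(f"men {n_men}, women {n_women}: chi2 {chi2:.2f}, p {p_value:.4f}, "
          f"share of women {share_women:.3f}, within 40-60% band: {within_band}")
```

In the second example the share of women (45%) lies inside the band, yet the test rejects an exact 50/50 split. With samples in the thousands, a significance test and a band criterion measure different things, so the paper should state which one "gender balance" refers to.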
Communication
  • Clarity and Conciseness: The table effectively presents the sample size and gender distribution for each discipline and cohort. The structure is clear, with rows and columns clearly labeled. Using percentages in addition to raw numbers helps in understanding the gender balance within each group. However, the table could be made more concise by combining the two cohorts into a single panel with sub-columns for each year. This would reduce redundancy and improve readability.
  • Definition of Gender Balance: The reference text mentions "gender balance," but the table itself doesn't provide a clear definition or criterion for what constitutes "balance." Adding a footnote or a brief explanation of the 40-60% criterion mentioned elsewhere in the paper would enhance the table's clarity and connect it to the authors' definition of gender balance.
Table 2: 2000 and 2010 cohorts of scientists, BIO, biochemistry, genetics, and...
Full Caption

Table 2: 2000 and 2010 cohorts of scientists, BIO, biochemistry, genetics, and molecular biology. Kaplan-Meier estimate by gender with total counts for men and women, time (in years), number of observations of scientists leaving science. Kaplan-Meier probability of staying in science with a 95% confidence interval. Note: (1) standard error of 0.01. (Total i.e., men and women combined in Supplementary Table 1).

Figure/Table Image (Page 9)
First Reference in Text
Thus, women in BIO were 10.19% more likely to drop out of science than men after 5 years, with markedly increasing chances of dropping out in later years.
Description
  • Kaplan-Meier Survival Data: Table 2 presents Kaplan-Meier survival analysis data for biologists in the BIO discipline, broken down by gender and cohort (2000 and 2010). Imagine you have two groups of runners (men and women) starting a marathon in two different years (2000 and 2010). This table tracks what percentage of each group is still running at each mile marker (each year). For each year, it shows the number of runners who dropped out that year, the percentage still running (Kaplan-Meier probability), and a range within which we are 95% confident the true percentage lies (confidence interval). It also shows how much higher the percentage of men still running is compared to women. The table is divided into two sections, one for each cohort. Within each section, data for men and women are presented side-by-side for each year, up to 19 years after their starting year.
  • Table Columns: The "n leaving science" column indicates the number of scientists who ceased publishing in a given year. The "KM probability (staying)" column shows the estimated probability of a scientist still publishing at that point in time. The 95% CI provides a range of values likely to contain the true probability. The final column reports the gap in retention rates between men and women, apparently in percentage points; the caption does not state how it is computed.
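The columns described above can be reproduced in outline with a Kaplan-Meier estimator and a confidence interval. The sketch below assumes Greenwood's variance formula and a plain normal-approximation interval. The paper does not state which interval method it uses, and the data are invented.

```python
import math

# Hypothetical toy data: years until last publication; False means still publishing (censored).
durations = [1, 2, 2, 3, 5, 5, 8, 10, 10, 10]
observed = [True, True, True, True, True, False, True, False, False, False]

def km_with_ci(durations, observed, z=1.96):
    """Rows of (year, at_risk, leavers, survival, std_err, ci_low, ci_high)."""
    rows, survival, greenwood = [], 1.0, 0.0
    for year in sorted({d for d, o in zip(durations, observed) if o}):
        at_risk = sum(1 for d in durations if d >= year)
        leavers = sum(1 for d, o in zip(durations, observed) if d == year and o)
        survival *= 1.0 - leavers / at_risk
        if at_risk > leavers:  # Greenwood term is undefined once everyone has left
            greenwood += leavers / (at_risk * (at_risk - leavers))
        std_err = survival * math.sqrt(greenwood)
        ci_low = max(0.0, survival - z * std_err)
        ci_high = min(1.0, survival + z * std_err)
        rows.append((year, at_risk, leavers, survival, std_err, ci_low, ci_high))
    return rows

for year, at_risk, leavers, survival, std_err, ci_low, ci_high in km_with_ci(durations, observed):
    print(f"year {year}: n {at_risk}, leaving {leavers}, "
          f"KM {survival:.3f} (SE {std_err:.3f}, 95% CI {ci_low:.3f}-{ci_high:.3f})")
```

Before any censoring occurs, Greenwood's formula coincides with the binomial standard error sqrt(S(1 - S)/n). This is consistent with cohorts of a few thousand scientists producing standard errors near 0.01, as the caption's note reports.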
Scientific Validity
  • Statistical Analysis: Providing Kaplan-Meier estimates with confidence intervals is a standard practice in survival analysis. However, the table doesn't mention the method used to calculate the difference in attrition rates between men and women (last column). Clarify whether this is a simple subtraction of probabilities or a more sophisticated measure like a hazard ratio. Furthermore, conduct and report statistical tests (e.g., log-rank test) to compare the survival curves of men and women and determine if the observed differences are statistically significant.
  • Comparison with Other Disciplines: While the table focuses on the BIO discipline, it's important to discuss how these findings relate to the broader trends across other disciplines. Do women in other fields experience similar or different attrition patterns? A comparative analysis across disciplines is essential for drawing general conclusions about gender disparities in scientific careers.
  • Quantifying the Increasing Attrition Gap: The reference text mentions "markedly increasing chances" of dropping out for women, but the table itself doesn't provide any statistical measure of this trend. Quantify this increase by calculating the difference in attrition rates at different time points (e.g., 5, 10, and 19 years) and assess the statistical significance of this change over time. This would provide more robust evidence for the claim of increasing attrition disparities.
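The log-rank test recommended above compares, at each event time, the number of exits observed in one group with the number expected if both groups shared the same hazard. Here is a minimal two-group sketch on invented data, with the p-value taken from the chi-squared distribution with one degree of freedom. The function and variable names are illustrative.

```python
import math

def logrank(durations_a, observed_a, durations_b, observed_b):
    """Two-group log-rank test; returns (chi2, p_value) with 1 degree of freedom."""
    all_durations = durations_a + durations_b
    all_observed = observed_a + observed_b
    event_times = sorted({d for d, o in zip(all_durations, all_observed) if o})
    obs_minus_exp, variance = 0.0, 0.0
    for t in event_times:
        n_a = sum(1 for d in durations_a if d >= t)  # at risk in group A
        n_b = sum(1 for d in durations_b if d >= t)  # at risk in group B
        d_a = sum(1 for d, o in zip(durations_a, observed_a) if d == t and o)
        d_b = sum(1 for d, o in zip(durations_b, observed_b) if d == t and o)
        n, d = n_a + n_b, d_a + d_b
        obs_minus_exp += d_a - d * n_a / n           # observed minus expected exits in A
        if n > 1:
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)
    chi2 = obs_minus_exp ** 2 / variance
    return chi2, math.erfc(math.sqrt(chi2 / 2.0))

# Hypothetical toy cohorts, not the paper's data.
men_durations = [3, 5, 8, 10, 10, 10, 10, 10]
men_observed = [True, True, True, False, False, False, False, False]
women_durations = [1, 2, 2, 4, 5, 7, 10, 10]
women_observed = [True, True, True, True, True, True, False, False]

chi2, p_value = logrank(men_durations, men_observed, women_durations, women_observed)
print(f"log-rank chi2 = {chi2:.3f}, p = {p_value:.4f}")
```

A single log-rank test summarizes the whole follow-up period. To support a claim that the gap widens over time, it would need to be complemented by comparisons at fixed horizons such as 5, 10, and 19 years.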
Communication
  • Clarity and Visual Presentation: The table clearly presents the Kaplan-Meier probabilities and confidence intervals, allowing for a detailed, year-by-year comparison of retention rates between men and women. However, the table is dense and difficult to read. Consider highlighting key data points, such as the 5, 10, and 19-year marks, to draw attention to the trends discussed in the reference text. Visualizations, like line graphs based on this data, might be more effective in conveying the increasing divergence in retention rates over time. Additionally, explaining the "Probability of staying is higher for men by (in %)" column more clearly in the caption or a table footnote would enhance understanding.
  • Context and Interpretation: While the reference text highlights the 10.19% difference at 5 years, it would be beneficial to include the corresponding values for 10 and 19 years directly in the text as well. This would provide a more complete picture of the increasing attrition gap and strengthen the argument. Furthermore, consider adding a sentence or two interpreting the practical significance of these percentages. What do these differences mean in terms of the number of women leaving the field compared to men?
Table 3: 2000 and 2010 cohorts of scientists, AGRI, agricultural and biological...
Full Caption

Table 3: 2000 and 2010 cohorts of scientists, AGRI, agricultural and biological sciences. Kaplan-Meier estimate by gender with total counts for men and women, time (in years), number of observations of scientists leaving science. Kaplan-Meier probability of staying in science with a 95% confidence interval. Note: (1) standard error of 0.01. (Total i.e., men and women combined in Supplementary Table 2).

Figure/Table Image (Page 10)
First Reference in Text
Comparing the four disciplines, for the older cohort, gender differences in retention were highest for NEURO after 5 years, after 10 years, and at end of the period examined, reaching 10% in each case.
Description
  • Kaplan-Meier Survival Data: Table 3 shows Kaplan-Meier survival analysis data for scientists in the AGRI (agricultural and biological sciences) discipline, separated by gender and cohort (2000 and 2010). Imagine two races (2000 and 2010) with male and female runners. This table tracks, year by year, what percentage of each group is still in the race. For each year, the table shows how many runners dropped out in that year ("n leaving science"), the estimated percentage still running at that point ("KM probability"), and a range within which we're 95% sure the true percentage lies ("95% CI"). It also calculates how much higher the percentage of men still running is compared to women. The table is split into two sections, one for each cohort (2000 and 2010). Each section presents data for men and women side-by-side for each year, up to 19 years after their starting year.
  • Table Columns: The columns in the table represent: 'n': the number of scientists at the start of the year, 'n leaving science': the number who stopped publishing that year, 'KM probability (staying)': the estimated percentage still publishing at the end of that year, '95% CI': the confidence interval for the probability, and 'Probability of staying is higher for men by (in %)': the difference in retention rates between men and women.
Scientific Validity
  • Statistical Analysis and Reporting: Presenting Kaplan-Meier estimates with confidence intervals is standard practice. However, the table lacks explicit statistical tests comparing the survival curves of men and women. Perform and report log-rank tests or other appropriate statistical tests to determine if the observed differences in retention are statistically significant. Additionally, clarify the method used to calculate the percentage difference in retention (last column). Is it a simple subtraction or a more nuanced measure like a hazard ratio?
  • Interdisciplinary Comparison: The reference text highlights differences across disciplines, but this table isolates AGRI. Provide a separate table or a combined visualization comparing gender differences across all disciplines to support the claims made in the text. Simply stating that NEURO has the highest difference isn't sufficient; show the data to substantiate this claim.
  • Absolute Differences in Attrition: Consider adding a column showing the absolute difference in the number of men and women leaving science each year. This would complement the percentage difference and provide a more concrete understanding of the attrition gap. This is particularly relevant given the differing sample sizes between men and women in some cohorts and disciplines.
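The question above about how the last column is computed matters because the same pair of retention probabilities yields quite different "percent" figures depending on the definition. A small sketch follows, using hypothetical probabilities rather than values from the table; `gap_measures` and its keys are illustrative names.

```python
def gap_measures(stay_men, stay_women):
    """Three common ways to express a gender gap in retention, all in percent."""
    leave_men, leave_women = 1.0 - stay_men, 1.0 - stay_women
    return {
        # Simple subtraction of the two retention probabilities.
        "percentage_point_gap": 100.0 * (stay_men - stay_women),
        # How much higher men's probability of staying is, relative to women's.
        "relative_gap_in_staying_pct": 100.0 * (stay_men / stay_women - 1.0),
        # How much more likely women are to leave, relative to men.
        "relative_gap_in_leaving_pct": 100.0 * (leave_women / leave_men - 1.0),
    }

# Hypothetical retention probabilities: 60% of men and 50% of women still publishing.
for name, value in gap_measures(0.60, 0.50).items():
    print(f"{name}: {value:.1f}")
```

With these inputs the gap is 10 percentage points, 20% in relative retention, or 25% in relative risk of leaving. A statement such as "X% more likely to drop out" therefore needs to say which of these it reports.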
Communication
  • Clarity and Accessibility: The table is structured similarly to Table 2, presenting detailed Kaplan-Meier probabilities and confidence intervals for the AGRI discipline. However, its effectiveness is hampered by its density and complexity. Highlighting key data points or summarizing the trends in a more visually accessible format (e.g., a line graph) would improve communication. The gender gap is already reported in the "Probability of staying is higher for men by (in %)" column; making that column more prominent and stating in the caption how it is computed would spare readers from cross-checking the probabilities themselves.
  • Interdisciplinary Comparison: The reference text compares retention differences across disciplines, highlighting NEURO as having the highest difference. However, this information isn't directly evident in Table 3, which focuses only on AGRI. Provide a separate table or figure comparing the gender differences in retention across all four disciplines. The current presentation makes it difficult for the reader to connect the table to the broader interdisciplinary comparison.
Supplementary Table 3: 2000 and 2010 cohorts of scientists, IMMU, immunology...
Full Caption

Supplementary Table 3: 2000 and 2010 cohorts of scientists, IMMU, immunology and microbiology. Kaplan-Meier estimate by gender with total counts for men and women, time (in years), number of observations of scientists leaving science. Kaplan-Meier probability of staying in science with a 95% confidence interval. Note: (1) standard error of 0.01, (2) standard error of 0.05.

Figure/Table Image (Page 21)
First Reference in Text
No explicit numbered reference found
Description
  • Kaplan-Meier Survival Data: Supplementary Table 3 presents Kaplan-Meier survival analysis data for scientists in the IMMU (immunology and microbiology) discipline, separated by gender and cohort (2000 and 2010). Imagine you're observing two groups of hikers (men and women) starting a trail in two different years (2000 and 2010). This table tracks the percentage of hikers still on the trail at each checkpoint (each year). For each checkpoint, it shows how many hikers stopped at that point ("n leaving science"), the percentage still hiking ("KM probability"), and a range where we are 95% sure the real percentage lies (confidence interval). It also calculates how much higher the percentage of men still hiking is compared to women. The table is divided into sections for each cohort, with data for men and women presented side-by-side for each year, up to 19 years after their starting year.
  • Table Columns: The columns represent: 'n': the number of scientists at the start of each year, 'n leaving science': the number who stopped publishing that year, 'KM probability (staying)': the estimated percentage still publishing at the end of that year, '95% CI': a range of values likely containing the true probability, and 'Probability of staying is higher for men by (%)': the gap in retention between men and women, apparently in percentage points (the caption does not state how it is computed).
  • Standard Error: The table notes two standard errors: 0.01 and 0.05. The standard error is a measure of the uncertainty in the estimated KM probability. A smaller standard error indicates greater precision in the estimate. Two different standard errors are reported, likely reflecting different sample sizes or time points in the analysis.
Scientific Validity
  • Statistical Analysis and Reporting: Presenting Kaplan-Meier estimates with confidence intervals is essential. However, the table lacks explicit statistical tests comparing male and female survival curves. Conduct and report log-rank tests to determine if the observed retention differences are statistically significant. Clarify the method for calculating the percentage difference in retention. Is it a simple subtraction, or a hazard ratio, or another measure?
  • Interdisciplinary Comparisons: While this table focuses on IMMU, comparing these findings to the other disciplines is crucial. A separate table or visualization directly comparing gender differences in attrition across all four disciplines would strengthen the paper and support broader conclusions.
  • Clarification of Standard Errors: The caption mentions two different standard errors (0.01 and 0.05). Clarify which standard error corresponds to which part of the table (e.g., specific cohort or time points). Explain why two different values are reported and what they signify. Consistent use of a single standard error, if appropriate, would simplify the presentation.
Communication
  • Clarity and Visual Presentation: The table presents comprehensive Kaplan-Meier survival data for the IMMU discipline, but its dense format makes it challenging to extract key insights quickly. Consider visually highlighting key data points, such as the retention rates at specific time intervals (e.g., 5, 10, and 19 years), or presenting some of this data graphically. A clearer presentation of the gender differences in retention rates, for example by emphasizing the existing "Probability of staying is higher for men by (%)" column and explaining how it is computed, would enhance readability and improve the communication of the main findings.
  • Context and Relevance to Main Findings: As a supplementary table, it's crucial to explicitly connect the data presented here to the main findings of the paper. Discuss how the attrition patterns in IMMU compare to those in other disciplines and whether they support or contradict the overall trends discussed in the main text. Provide a concise summary of the key takeaways from this table in the main results section to guide the reader and justify the inclusion of this supplementary information.
  • Conciseness and Focus: The caption includes technical details about standard errors (0.01 and 0.05). While statistically relevant, these details might not be necessary in the caption. Consider moving them to a table footnote or a methods section. Focus the caption on the key message of the table: the presentation of Kaplan-Meier survival data for the IMMU discipline, broken down by gender and cohort.
Supplementary Table 4: 2010 and 2010 cohorts of scientists, NEURO,...
Full Caption

Supplementary Table 4: 2010 and 2010 cohorts of scientists, NEURO, neuroscience. Kaplan-Meier estimate by gender with total counts for men and women, time (in years), number of observations of scientists leaving science. Kaplan-Meier probability of staying in science with a 95% confidence interval. Note: (1) standard error of 0.01, (2) standard error of 0.05.

Figure/Table Image (Page 22)
First Reference in Text
No explicit numbered reference found
Description
  • Kaplan-Meier Survival Data: Supplementary Table 4 presents Kaplan-Meier survival analysis data for scientists in the NEURO (neuroscience) discipline, split by gender and cohort (2000 and 2010). Imagine two groups of climbers (men and women) attempting a mountain ascent in two different years (2000 and 2010). This table tracks the percentage of each group still climbing at each base camp (each year). It shows how many climbers turned back at each camp ("n leaving science"), the percentage still climbing ("KM probability"), and a range where we are 95% confident the true percentage lies (confidence interval). It also shows how much higher the percentage of men still climbing is compared to women. The table is divided by cohort, with data for men and women presented side-by-side for each year, up to 19 years after their starting year.
  • Table Columns: The columns represent: 'n': the number of scientists at the beginning of each year, 'n leaving science': the number who stopped publishing that year, 'KM probability (staying)': the estimated percentage still publishing at the end of that year, '95% CI': a range likely containing the true percentage, and 'Probability of staying is higher for men by (%)': the gap in retention between men and women, apparently in percentage points (the caption does not state how it is computed).
  • Standard Error: The table includes two standard error values (0.01 and 0.05). The standard error measures uncertainty in the estimated KM probability. A smaller standard error means a more precise estimate. The two values likely reflect different sample sizes or time points.
Scientific Validity
  • Statistical Analysis and Clarity of Methods: While Kaplan-Meier estimates and confidence intervals are provided, the table lacks explicit statistical tests comparing male and female survival curves. Perform and report log-rank tests or similar appropriate statistical tests. Clarify the calculation of the percentage difference in retention rates (last column) - is it a simple subtraction, a hazard ratio, or another method?
  • Interdisciplinary Comparison: This table focuses on NEURO. Compare these results with those from other disciplines (presented in other tables/figures) to understand broader attrition trends and support generalizable conclusions.
  • Standard Error Reporting: Two different standard errors (0.01 and 0.05) are reported. Specify which error corresponds to which part of the table (e.g., specific cohorts or years). Explain the reason for two values. If appropriate, use a single standard error for consistency.
  • Typographical Error in Caption: The caption mentions "2010 and 2010 cohorts," which seems to be a typo. Verify and correct this to accurately reflect the cohorts included in the table. Likely, this should be 2000 and 2010, consistent with the other supplementary tables.
Communication
  • Clarity and Visual Presentation: While the table provides detailed survival probabilities, it's difficult to grasp the key trends without careful examination. The table already reports the difference in retention between men and women in its last column; consider giving that column more visual prominence. Visualizing the data as a line graph would make the trends far easier to see and would allow easier comparison with other disciplines. Highlight key time points (e.g., 5, 10, 19 years) visually within the table to emphasize trends discussed elsewhere.
  • Connection to Main Findings: As a supplementary table, its purpose and relevance to the main findings aren't immediately clear. Explicitly state how the observed patterns in NEURO relate to the broader trends discussed in the main text. Summarize key findings from the table within the main results section to provide context and guide the reader.
  • Caption Conciseness: The caption is overly technical and could be simplified. Move details about standard errors to a footnote. Focus on clearly conveying the table's core message: Kaplan-Meier survival data for NEURO, broken down by gender and cohort.

Conclusions

Key Aspects

Strengths

Suggestions for Improvement

Non-Text Elements

Figure 5. BIO, biochemistry, genetics, and molecular biology. Women scientists...
Full Caption

Figure 5. BIO, biochemistry, genetics, and molecular biology. Women scientists from cohort 2000, probability of staying in science after 10 years. A snapshot of an interactive map (available at https://public.tableau.com/app/profile/marek.kwiek/viz/Attrition-in-science-OECD/Dashboard) in which Kaplan-Meier probabilities of remaining in science (i.e., continuing publishing) are provided by country (38 OECD countries), discipline, and gender and for all 11 cohorts from 2000–2010 (N=2,127,803 scientists in 16 STEMM disciplines, of which 1,289,756 are identified as men and 838,047 as women).

Figure/Table Image (Page 15)
First Reference in Text
Our online interactive tool (Figure 5) reports data about retention in BIO for the USA (the largest OECD science system): 60% vs. 70% after 5 years, 40% vs. 51% after 10 years, and 20% vs. 30% at the end of the study period for women vs. men, respectively.
Description
  • Interactive Map of Retention Rates: Figure 5 is a snapshot of an interactive map showing the probability of women scientists (from the 2000 cohort) remaining in the field of BIO (biochemistry, genetics, and molecular biology) after 10 years. The map displays data for 38 OECD countries. Each country is colored according to its retention rate, with darker shades indicating higher retention. Think of it like a weather map, but instead of temperature, it shows the "climate" for women scientists in different countries. The map is a static image taken from a larger, interactive online tool. This tool allows users to explore the data in more detail, by selecting different cohorts (2000-2010), disciplines (16 STEMM fields), and gender. The underlying data used to generate the map are Kaplan-Meier probabilities, which estimate the likelihood of "survival" (continued publishing) over time.
  • Underlying Dataset and Tool Functionality: The interactive tool behind the map contains a much larger dataset than just the 2000 cohort in BIO shown in the figure. It includes data for 11 cohorts of scientists (starting from 2000 to 2010), across 16 different STEMM (Science, Technology, Engineering, Mathematics, and Medicine) disciplines, and for both men and women. The snapshot in Figure 5 is just one slice of this larger dataset, focusing on women in BIO from the 2000 cohort after 10 years.
Scientific Validity
  • Interactive Data Exploration: Presenting a static snapshot of an interactive tool limits the reader's ability to engage with the data. While the map visually conveys some information, it doesn't allow for exploration of other cohorts, disciplines, or gender comparisons, which are crucial aspects of the study. The caption already links to the online tool; consider embedding the tool directly in the paper (if feasible) or explaining in the main text how readers can use that link to explore the full dataset. This would significantly enhance the paper's scientific value and transparency.
  • Cross-National Analysis: The reference text focuses on retention rates in the USA, but the larger dataset offers valuable insights into cross-national variations. A more thorough analysis of these variations, including statistical comparisons between countries and investigation of potential explanatory factors (e.g., national science policies, cultural context, funding structures), would strengthen the paper's scientific contribution. The current focus on a single country underutilizes the available data and limits the scope of the conclusions.
  • Statistical Methodology and Significance Testing: While the caption mentions the use of Kaplan-Meier probabilities, it doesn't specify how these probabilities were calculated or whether any statistical tests were performed to assess the significance of observed differences between countries. Providing more details about the underlying statistical methodology would improve the transparency and rigor of the analysis.
Communication
  • Showcasing the Interactive Tool: Presenting the data on a map effectively highlights the geographical variation in retention rates. However, the static snapshot shown in the paper doesn't convey the interactive nature of the online tool. A brief description of the tool's functionalities (e.g., ability to select different cohorts, disciplines, and genders) would be beneficial. Also, consider including a more visually compelling visualization from the interactive tool, perhaps showing the change in retention rates over time or the difference between men and women. The current snapshot is somewhat underwhelming and doesn't fully exploit the potential of the interactive resource.
  • Discussing Cross-National Patterns: The reference text focuses on retention rates in the USA but doesn't discuss the broader patterns observed across OECD countries. Highlighting some key findings from the interactive tool, such as countries with particularly high or low retention rates, or interesting regional patterns, would enrich the discussion and better utilize the available data. Connecting these observations to potential explanatory factors (e.g., national policies, cultural differences, funding structures) would further enhance the paper's contribution.
  • Caption Clarity and Conciseness: The caption is overloaded with information, making it difficult to parse. Streamline the caption by focusing on the key message (geographical variation in retention rates) and moving details like sample sizes and the link to the interactive tool to the main text or a separate table. A clear and concise caption will improve the figure's accessibility and impact.