Leaving Science: Attrition of Biologists in 38 OECD Countries

Overall Summary

Study Background and Main Findings

This study examines biologist attrition in 38 OECD countries using Scopus publication data from 2000-2022, focusing on two cohorts (scientists who began publishing in 2000 and in 2010). Kaplan-Meier survival analysis showed consistently higher attrition rates for women, particularly in BIO (biochemistry, genetics, and molecular biology) and NEURO (neuroscience), with the gender gap widening over time in the 2000 cohort. The 2010 cohort showed higher overall attrition but a smaller gender gap. Supplementary analyses (survival regression, hazard rate curves, and kernel density curves) describe how attrition unfolds over a publishing career. A key limitation is that "leaving science" is defined as ceasing to publish, which potentially overestimates attrition, especially for women.

Research Impact and Future Directions

The study's reliance on publication data to define "leaving science" presents a significant limitation. While ceasing publication could indicate leaving academia, it doesn't account for career transitions within academia (e.g., teaching-focused roles), movement to industry, or temporary career breaks. This operational definition likely overestimates attrition, especially for women who might be more prone to transitioning to non-publishing roles. Therefore, while the observed gender disparities in publication cessation are important, they don't definitively prove higher attrition rates for women from science itself.

Despite this limitation, the study offers valuable insights into publication patterns and potential indicators of attrition. The large-scale, multi-national dataset provides a broad overview of trends across disciplines and over time. The finding that women in gender-balanced biological fields have lower publication persistence than men, while men and women in male-dominated fields show similar patterns, raises important questions about the influence of disciplinary culture and structural factors on career trajectories. The interactive map, while not fully integrated into the paper, offers a valuable tool for exploring country-specific variations.

Future research should explore alternative metrics for defining and measuring "leaving science" that go beyond publication records. Qualitative studies investigating the reasons behind publication cessation, including career transitions, family responsibilities, and workplace experiences, would provide a more nuanced understanding of the observed patterns. Analyzing career paths beyond academia, such as industry roles or government positions, would offer a more complete picture of scientists' career trajectories. Furthermore, investigating the specific factors contributing to the higher retention rates in AGRI, particularly for women, could reveal valuable insights for promoting career longevity in other scientific fields.

The study's methodological limitations, particularly the narrow definition of "leaving science," restrict the conclusions that can be drawn about actual attrition rates. While the findings suggest a concerning gender disparity in publication persistence, further research using more comprehensive measures of career trajectories is needed to confirm whether women are truly leaving science at higher rates than men. The lack of rigorous statistical comparisons between survival curves and the limited integration of supplementary analyses further weaken the study's conclusions. Addressing these limitations would significantly strengthen the study's contribution to understanding gender disparities in STEMM.

Critical Analysis and Recommendations

Comprehensive Abstract (written-content)
The abstract provides a comprehensive overview of the study's components, including the focus on biologist attrition, data source, methodology, and key findings. This clarity allows readers to quickly grasp the study's essence and significance.
Section: Abstract
State Research Question (written-content)
The Introduction clearly defines the scope and conceptual approach, but lacks an explicit research question. Adding a research question would enhance clarity and focus the reader's attention on the study's central aim.
Section: Introduction
Detail Data Processing and Analysis Methods (written-content)
The Data and Methods section transparently describes the data source and cohort selection, but lacks details on data cleaning and specific survival analysis models. This omission hinders reproducibility and in-depth methodological evaluation.
Section: Data and methods
Report Statistical Tests for Gender Differences (graphical-figure)
The Kaplan-Meier curves effectively visualize gender differences in attrition, but lack statistical tests for significance. Adding p-values or confidence intervals would strengthen the evidence supporting the observed disparities.
Section: Results
Integrate Supplementary Analyses (written-content)
The Results section presents detailed data in tables, but relegates important analyses (survival regression, hazard rate, kernel density) to supplementary materials. Integrating these analyses into the main Results would provide a more complete picture of attrition patterns.
Section: Results
Discuss Broader Implications (written-content)
The Conclusions section synthesizes key findings and contextualizes them within existing literature, but lacks broader implications and recommendations. Discussing the consequences for scientific workforce development and gender equity, along with actionable recommendations, would enhance the paper's impact.
Section: Conclusions
Improve Visualization of Geographical Data (graphical-figure)
The interactive map provides valuable geographical data, but the static snapshot in Figure 5 lacks context and clarity. Improving the figure or providing a more detailed map within the paper would enhance communication.
Section: Conclusions
Refine Definition of Leaving Science (written-content)
The study's definition of "leaving science" as ceasing publication is a major limitation. This might overestimate attrition, especially for women, and necessitates further research using broader measures of scientific careers.
Section: All Sections

Section Analysis

Results

Non-Text Elements

Figure 1. 2000 cohort of scientists, Kaplan-Meier survival curve, AGRI,...
Full Caption

Figure 1. 2000 cohort of scientists, Kaplan-Meier survival curve, AGRI, agricultural and biological sciences (N=7,970), BIO, biochemistry, genetics, and molecular biology (N=22,692), IMMU, immunology and microbiology (N=1,775), and NEURO, neuroscience (N=2,533). The four disciplines combined (N=34,970).

Figure/Table Image (Page 7)
First Reference in Text
For the 2000 cohort, we can observe, year after year, how the probability of staying in science is always higher for men and always lower for women.
Description
  • Kaplan-Meier Survival Curves: The figure shows Kaplan-Meier survival curves, which estimate the probability that an event (in this case, a scientist ceasing to publish) has not yet occurred by a given time. Imagine you're tracking how long a group of lightbulbs lasts. The Kaplan-Meier curve shows, for each point in time, the percentage of lightbulbs that are still working. Here, the "lightbulbs" are scientists, and "working" means publishing. The x-axis represents time (years since 2000), and the y-axis represents the proportion of scientists still publishing. Each line on the graph represents a different group (men or women in a specific scientific discipline). The lines start at 100% (all scientists are initially publishing) and step down whenever a scientist stops publishing. The steeper the drop, the more scientists are leaving at that time. The small crosses represent "censored" data points, meaning the scientists were still publishing at the study's end, so we don't know when, or whether, they will eventually stop.
  • Figure Organization: The figure is organized into four panels, one for each scientific discipline: AGRI, BIO, IMMU, and NEURO. Within each panel, there are two lines: one for men and one for women. This allows us to compare attrition rates between men and women within each discipline. By looking across the four panels, we can also compare attrition rates across different scientific fields.
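To make the step-down and censoring logic concrete, the following is a minimal sketch of the Kaplan-Meier product-limit estimator in plain Python. The function name `kaplan_meier` and the toy cohort are hypothetical illustrations, not the authors' code or data.

```python
from collections import Counter

def kaplan_meier(durations, observed):
    """Return [(t, S(t))] at each time with at least one exit event.

    durations: years from first publication to last publication (or study end)
    observed:  True if the scientist stopped publishing, False if censored
    """
    events = Counter(t for t, e in zip(durations, observed) if e)
    exits = Counter(durations)      # events and censorings both leave the risk set
    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(exits):
        d = events.get(t, 0)
        if d:                       # the curve only steps down at event times
            survival *= 1 - d / at_risk
            curve.append((t, survival))
        at_risk -= exits[t]
    return curve

# Hypothetical toy cohort of eight scientists (not data from the paper).
durations = [1, 2, 2, 3, 5, 5, 8, 8]
observed = [True, True, False, True, True, False, False, False]
curve = kaplan_meier(durations, observed)
```

Censored scientists never trigger a step, but they shrink the risk set, which is why later steps are larger per event.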
Scientific Validity
  • Methodology: Kaplan-Meier analysis is a standard method for survival analysis and is appropriate for this research question. However, the validity depends heavily on the assumption that ceasing publication is equivalent to "leaving science." This might not always be true, as scientists could shift to industry roles, non-publishing academic positions, or other fields without entirely abandoning science. The authors should address this limitation more thoroughly, perhaps by attempting to validate the publication metric against other measures of scientific activity or career status where available.
  • Statistical Significance: The reference text claims a consistent difference between men and women, but the figure itself doesn't present any statistical measures of significance. The authors should perform statistical tests (e.g., log-rank test) to determine if the observed differences are statistically significant and report p-values or confidence intervals. This is essential to support the claim of consistent difference.
  • Confounding Variables: The analysis lacks control for potential confounding variables. Factors like career stage, family responsibilities, funding opportunities, and institutional support can influence attrition rates and may differ between men and women. Without controlling for these factors, it's difficult to isolate the effect of gender on attrition. The authors should consider incorporating these variables into their analysis, perhaps using a Cox proportional hazards model, to strengthen the validity of their conclusions.
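As an illustration of the kind of test recommended above, here is a hedged sketch of a two-group log-rank test in plain Python. The function name and the toy groups are assumptions for illustration; in practice an established survival-analysis package would be preferable.

```python
import math

def logrank_test(dur_a, obs_a, dur_b, obs_b):
    """Two-group log-rank test; returns (chi2, p) with 1 degree of freedom."""
    all_dur, all_obs = dur_a + dur_b, obs_a + obs_b
    event_times = sorted({t for t, e in zip(all_dur, all_obs) if e})
    o_minus_e, var = 0.0, 0.0
    for t in event_times:
        n_a = sum(1 for d in dur_a if d >= t)          # at risk in group A
        n_b = sum(1 for d in dur_b if d >= t)          # at risk in group B
        d_a = sum(1 for d, e in zip(dur_a, obs_a) if e and d == t)
        d_b = sum(1 for d, e in zip(dur_b, obs_b) if e and d == t)
        n, d = n_a + n_b, d_a + d_b
        o_minus_e += d_a - d * n_a / n                 # observed minus expected in A
        if n > 1:
            var += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)
    chi2 = o_minus_e ** 2 / var
    p = math.erfc(math.sqrt(chi2 / 2))   # chi-square upper tail, 1 df
    return chi2, p

# Hypothetical toy groups (all exits observed), not data from the paper.
early = [1, 1, 2, 2, 3]
late = [6, 7, 8, 9, 10]
chi2, p = logrank_test(early, [True] * 5, late, [True] * 5)
```

The test compares whole curves rather than a single time point, so it addresses the "always higher for men" claim more directly than a comparison at one year would.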
Communication
  • Clarity and Conciseness: The figure clearly communicates the core message of differing attrition rates between men and women in science. The visual presentation of survival curves allows for easy comparison across disciplines and genders. The labeling is clear and appropriate. However, the caption could be streamlined by removing the sample sizes, which could be included in a separate table. A brief statement summarizing the main finding directly in the caption would enhance understanding.
  • Data Presentation: While visually effective in showing the trend, the figure lacks specific data points or annotations to highlight the magnitude of the differences. Adding numerical labels at key time points (e.g., 5, 10, and 15 years) or showing percentage differences directly on the graph would enhance the communication of the findings.
Figure 2. 2010 cohort of scientists, Kaplan–Meier survival curve, AGRI,...
Full Caption

Figure 2. 2010 cohort of scientists, Kaplan–Meier survival curve, AGRI, agricultural and biological sciences (N=12,792), BIO, biochemistry, genetics, and molecular biology (N=31,542), IMMU, immunology and microbiology (N=2,361), and NEURO, neuroscience (N=4,513). The four disciplines combined (N=51,208).

Figure/Table Image (Page 11)
First Reference in Text
Comparing the four disciplines for the younger cohort after 9 years, the highest probability of staying in science was observed for AGRI (47.3% for women, 51.8% for men), with women in the other disciplines having probabilities below 40%.
Description
  • Kaplan-Meier Survival Curves: Figure 2 displays Kaplan-Meier survival curves for scientists who began publishing in 2010. These curves show the probability of a scientist continuing to publish over time, within each of four scientific disciplines (AGRI, BIO, IMMU, NEURO). Imagine tracking how many cars from a 2010 model year are still on the road each subsequent year. The Kaplan-Meier curve acts like a graph showing the percentage of cars still running over time. Here, publishing a paper is like the car still running; the x-axis is the years since 2010, and the y-axis is the proportion of scientists from that 2010 "model year" still publishing. Each panel represents a different scientific discipline, with separate lines for men and women. The lines step down each time a scientist ceases to publish. The crosses indicate censored observations, meaning those scientists were still publishing at the end of data collection (2022), so we don't know when, or whether, they will eventually stop.
  • Visual Comparisons: The figure uses a panel for each discipline, allowing for comparison of attrition rates between genders within each discipline and across the four fields. Each panel shows two lines: one for men and one for women. The lines begin at 100% at time zero (when they first published in 2010) and step down over time as scientists cease publishing. Steeper drops indicate higher attrition rates. By visually comparing the height of the lines at different time points, we can observe the relative retention rates for men and women within and across the different disciplines.
Scientific Validity
  • Methodology and Data Interpretation: The application of Kaplan-Meier survival analysis is appropriate for this research question. The cohort design allows for tracking changes within a specific group over time. However, equating cessation of publication with "leaving science" is a significant limitation. Scientists may transition to industry, take career breaks, or focus on other academic activities without leaving science altogether. This should be explicitly addressed, possibly by exploring alternate metrics for "leaving science" or acknowledging the potential for overestimating attrition.
  • Statistical Validation of Findings: While visually apparent, the differences in retention rates at the 9-year mark need to be statistically validated. The authors must conduct and report statistical comparisons between disciplines and genders, both across the full curves (e.g., log-rank tests or Cox proportional hazards models) and at this specific time point (e.g., confidence intervals for the survival estimates). Reporting p-values or confidence intervals is essential to support the claims made in the reference text.
  • Comparison with 2000 Cohort: The study's focus on temporal changes necessitates a robust statistical comparison between the 2000 and 2010 cohorts. The authors should conduct formal hypothesis tests to determine the statistical significance of the changes in attrition rates over time. This would strengthen the study's core argument and provide more compelling evidence for the observed trends. Consider a table or figure that specifically compares the two cohorts.
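One way to attach uncertainty to a retention figure at a fixed time point, such as the 9-year mark, is Greenwood's formula for the standard error of the Kaplan-Meier estimate. The sketch below is an assumption-laden illustration: the function names and the toy groups are hypothetical, and the normal-approximation comparison of two independent groups is only one of several reasonable choices.

```python
import math
from collections import Counter

def km_at(durations, observed, t_star):
    """Kaplan-Meier survival at t_star and its Greenwood standard error."""
    events = Counter(t for t, e in zip(durations, observed) if e)
    exits = Counter(durations)
    at_risk = len(durations)
    s, gw = 1.0, 0.0
    for t in sorted(exits):
        if t > t_star:
            break
        d = events.get(t, 0)
        if d and at_risk > d:
            s *= 1 - d / at_risk
            gw += d / (at_risk * (at_risk - d))   # Greenwood variance term
        elif d:
            s = 0.0                               # everyone at risk exited
        at_risk -= exits[t]
    return s, s * math.sqrt(gw)

def compare_at(group_a, group_b, t_star):
    """Difference in survival at t_star, approximate 95% CI, two-sided p-value."""
    s_a, se_a = km_at(*group_a, t_star)
    s_b, se_b = km_at(*group_b, t_star)
    diff = s_a - s_b
    se = math.hypot(se_a, se_b)                   # assumes independent groups, se > 0
    p = math.erfc(abs(diff / se) / math.sqrt(2))
    return diff, (diff - 1.96 * se, diff + 1.96 * se), p

# Hypothetical groups of 100 scientists each, no censoring (not data from the paper).
group_a = ([1] * 50 + [10] * 50, [True] * 100)
group_b = ([1] * 60 + [10] * 40, [True] * 100)
diff, ci, p = compare_at(group_a, group_b, t_star=9)
```

The same function could be applied to one discipline in the 2000 and 2010 cohorts to compare them at a common follow-up time, which the shorter follow-up of the 2010 cohort makes necessary.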
Communication
  • Clarity and Visual Organization: Presenting four disciplines in separate panels allows for clear visual comparison of attrition rates. The figure is well-labeled, and the lines for men and women are easily distinguishable. Highlighting the 9-year mark with a vertical line or shaded area could further improve readability and emphasize the point made in the reference text about retention at this specific time point. Directly labeling the lines with the percentage values at the 9-year mark could also enhance clarity. Finally, consider if a small inset or added panel summarizing the overall trends across all disciplines might be helpful for quicker comprehension.
  • Connection to the Research Question: The figure supports the reference text by visually demonstrating the higher retention rate in AGRI compared to the other disciplines. However, the figure's impact could be improved by visually connecting it to the broader narrative of temporal changes. This could involve a direct visual comparison with Figure 1 (e.g., a combined figure or small multiples showing both cohorts side-by-side) or adding annotations that explicitly reference the 2000 cohort data. This would enhance the communication of the study's main point about decreasing retention over time.
Figure 3. 2000 (top panel, N=7,970) and 2010 (bottom panel, N=12,792) cohorts...
Full Caption

Figure 3. 2000 (top panel, N=7,970) and 2010 (bottom panel, N=12,792) cohorts of scientists, AGRI, agricultural and biological sciences. Survival regression curve (exponential distribution fitting of Kaplan-Meier curve), hazard rate curve (B-splines smoothing method, smoothing parameter 10k), and kernel density curve (B-splines smoothing method, bandwidth 2, component per point based on Gaussian curve).

Figure/Table Image (Page 13)
First Reference in Text
The survival regression curves for the four disciplines for both cohorts (Figure 3, Figure 4 Panels B; as well as Supplementary Figures 1-2) indicate a steeper decline for men and women in the early years of their publishing careers and a smoother decline in later years, with increasing divergence between the curves for men and women from the 2000 cohort.
Description
  • Types of Curves: The figure presents three different types of curves related to scientist attrition in the AGRI discipline, for two cohorts (those starting in 2000 and 2010). First, it shows *survival regression curves*, which are smoothed versions of Kaplan-Meier curves, depicting the probability of remaining in science over time. Imagine tracking 100 runners in a marathon; the survival curve shows the percentage still running at each point in time. Second, *hazard rate curves* show the instantaneous risk of "leaving science" (stopping publishing) at any given time. Think of it as the probability of a runner dropping out of the marathon at each mile marker. Finally, *kernel density curves* illustrate the distribution of when scientists stopped publishing, showing where the dropouts clustered along the marathon route. The x-axis for all curves represents years since the start year of the cohort, and the y-axis represents either the proportion remaining (survival curves), the hazard rate, or the density.
  • Panel Organization and Labeling: The figure is organized into two panels: one for the 2000 cohort (top) and one for the 2010 cohort (bottom). Each panel shows the three curve types (survival regression, hazard rate, and kernel density) with separate lines for men and women. This allows for comparison of attrition patterns between genders and across the two cohorts. The curves are labeled "Mw" and "Mm" for women and men, respectively. The caption provides the sample size (N) for each cohort.
  • Curve Fitting Methods: Technical details about the curve fitting methods are included in the caption. Survival regression curves are fitted using an exponential distribution. Hazard rate and kernel density curves use B-splines smoothing, a method for creating smooth curves from data points, with a specified smoothing parameter (10k for hazard rate) and bandwidth (2 for kernel density). The kernel density estimation uses a Gaussian curve as the basis for its smoothing.
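For readers unfamiliar with exponential survival models, the sketch below shows one common way to fit one: the maximum-likelihood rate is the number of observed exits divided by the total time at risk. The paper does not describe its fitting procedure in detail, so this is an assumption for illustration only, and the function names and toy data are hypothetical.

```python
import math

def fit_exponential(durations, observed):
    """Maximum-likelihood constant hazard: observed exits / total years at risk."""
    events = sum(1 for e in observed if e)
    exposure = sum(durations)
    return events / exposure

def exp_survival(rate, t):
    """Survival function S(t) = exp(-rate * t) of the exponential model."""
    return math.exp(-rate * t)

# Hypothetical toy cohort (not data from the paper).
durations = [1, 2, 2, 3, 5, 5, 8, 8]
observed = [True, True, False, True, True, False, False, False]
rate = fit_exponential(durations, observed)       # 4 exits over 34 years at risk
fitted = [(t, exp_survival(rate, t)) for t in range(0, 11)]
```

An exponential curve falls faster in absolute terms early on and flattens later, but it does so with a constant proportional exit rate, which is the assumption a goodness-of-fit check would need to examine.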
Scientific Validity
  • Curve Fitting Choices and Justification: The inclusion of survival regression, hazard rate, and kernel density curves provides a comprehensive view of attrition dynamics. However, the justification for fitting survival regression curves with an exponential distribution is lacking. The authors should explain why this distribution was chosen and whether it adequately fits the data. Alternative distributions or non-parametric methods might be more appropriate. Goodness-of-fit tests should be performed and reported. Additionally, the choice of smoothing parameters (10k and 2) for B-splines should be justified, and the sensitivity of the results to these parameters should be assessed.
  • Statistical Analysis of Divergence: The reference text highlights the increasing divergence between men and women in the 2000 cohort. While visually discernible, this observation needs statistical support. The authors should quantify this divergence (e.g., by calculating the difference in retention rates at specific time points) and perform statistical tests to determine if this increasing divergence is statistically significant. A simple visual observation is not enough to draw robust conclusions.
  • Consistency and Cross-Disciplinary Comparison: The figure focuses on the AGRI discipline, but the reference text mentions analysis for all four disciplines (referencing Figure 4 and Supplementary Figures 1-2). The authors should ensure consistency between the figures and text. If the trends are similar across disciplines, a concise summary figure showing the key patterns across all fields might be more informative than presenting detailed graphs for each discipline individually.
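A simple informal diagnostic for the exponential assumption, short of a formal goodness-of-fit test, uses the fact that an exponential model implies -ln S(t) / t is the same at every t. The sketch below computes that implied rate at several points of a Kaplan-Meier curve; the curve values and the function name are hypothetical.

```python
import math

def implied_rates(km_curve):
    """For each (t, S) on a survival curve, the constant hazard that would yield S at t."""
    return [(t, -math.log(s) / t) for t, s in km_curve if t > 0 and 0 < s < 1]

# Hypothetical curve with a steep early drop and a slower later decline.
km_curve = [(1, 0.80), (3, 0.65), (5, 0.58), (10, 0.50), (20, 0.42)]
rates = implied_rates(km_curve)
spread = max(r for _, r in rates) / min(r for _, r in rates)
```

A spread close to 1 would be consistent with a constant hazard; a large spread, as in this toy curve, would suggest a distribution with a time-varying hazard (for example Weibull) or a non-parametric description instead.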
Communication
  • Clarity and Visual Overload: The figure presents a lot of information in a relatively compact space, but the density of information makes it challenging to grasp the key messages quickly. While the different curve types provide complementary perspectives on attrition, the visual presentation could be improved. Specifically, the hazard rate curves are difficult to interpret without clear explanation or visual cues to highlight key differences. Consider separating the three curve types into distinct subpanels or even separate figures for improved clarity. Also, consider adding a brief explanatory note in the caption or figure legend to guide the reader through the different panels and curve types.
  • Emphasis and Quantification: The reference text mentions increasing divergence in survival curves for the 2000 cohort, but the figure doesn't visually emphasize this point. Adding visual cues like arrows or annotations to highlight this divergence would strengthen the connection between the text and the figure. Quantifying the divergence at specific time points would further enhance the message. For instance, stating the percentage difference in retention between men and women at year 10 and year 20 would be more impactful than just mentioning "increasing divergence."
  • Accessibility for a Broader Audience: The use of technical terms like "B-splines smoothing" and "kernel density" without further explanation in the caption or figure legend assumes a high level of statistical knowledge from the reader. While appropriate for a specialized audience, consider adding a brief, intuitive explanation of these methods for broader accessibility. For example, explain that these methods are used to create smooth curves that represent the underlying data distribution.
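Quantifying the divergence at chosen years is straightforward once the step curves are available. The following sketch reads two survival step curves at given years and reports the gap; the curves, names, and values are hypothetical and are not taken from the paper.

```python
from bisect import bisect_right

def survival_at(curve, t):
    """Value of a step curve [(time, S)] at time t; 1.0 before the first drop."""
    times = [time for time, _ in curve]
    i = bisect_right(times, t)
    return curve[i - 1][1] if i else 1.0

def gap(curve_m, curve_w, years):
    """Men's minus women's survival probability at each requested year."""
    return {y: survival_at(curve_m, y) - survival_at(curve_w, y) for y in years}

# Hypothetical step curves (not data from the paper).
men = [(1, 0.90), (5, 0.70), (10, 0.55), (20, 0.45)]
women = [(1, 0.88), (5, 0.66), (10, 0.48), (20, 0.35)]
gaps = gap(men, women, years=(10, 20))
```

Reporting such gaps alongside confidence intervals would turn "increasing divergence" into a checkable statement.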
Figure 4: 2000 (top panel, N=22,692) and 2010 (bottom panel, N=31,542) cohorts...
Full Caption

Figure 4: 2000 (top panel, N=22,692) and 2010 (bottom panel, N=31,542) cohorts of scientists, BIO, biochemistry, genetics, and molecular biology. Survival regression curve (exponential distribution fitting of Kaplan–Meier curve), hazard rate curve (B-splines smoothing method, smoothing parameter 10k), and kernel density curve (B-splines smoothing method, bandwidth 2, component per point based on Gaussian curve)

Figure/Table Image (Page 14)
First Reference in Text
Kernel density curves (Figure 4, Panels D) use kernel density estimation to create a smoothed, continuous curve that approximates the underlying data distribution.
Description
  • Types of Curves: Figure 4 presents three types of curves: survival regression, hazard rate, and kernel density, for two cohorts of scientists in the BIO discipline (those starting their publishing careers in 2000 and 2010). The *survival regression curves* are smoothed versions of Kaplan-Meier curves, showing the probability of a scientist continuing to publish over time. Imagine tracking 100 trees planted in 2000 and charting the percentage still alive each year. That's a survival curve. The *hazard rate curves* show the instantaneous risk of ceasing publication at any given time. Think of this as the probability a tree will die in a given year. The *kernel density curves*, the focus of the reference text, show the distribution of when scientists stopped publishing. Imagine plotting the years when each of those trees died – the kernel density curve would show which years had the most tree deaths.
  • Panel Organization and Kernel Density Curves: The figure has two panels, one for each cohort (2000 on top, 2010 on bottom). Each panel shows all three curve types, with separate lines for men and women (labeled Mm and Mw). The kernel density curves (Panel D) are smooth, continuous curves showing the distribution of when scientists ceased publishing. The x-axis represents the years since the cohort's start year, and the y-axis represents the density, showing the relative concentration of dropouts at different time points. The higher the curve, the more scientists stopped publishing around that year. The caption also provides the sample size (N) for each cohort.
  • Curve-Fitting Details: The caption includes technical details: survival regression uses exponential distribution fitting, while hazard rate and kernel density curves are generated using B-splines smoothing. B-splines are a way to create smooth curves from data points, and the "smoothing parameter" (10k for hazard rate) and "bandwidth" (2 for kernel density) control how smooth the curves are. The kernel density uses a Gaussian (bell-shaped) curve as a component for smoothing.
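To illustrate what a Gaussian "component per point" with a bandwidth means, here is a minimal kernel density sketch in plain Python. It is a plain Gaussian kernel estimator, not the B-splines procedure named in the caption, and the exit years are hypothetical.

```python
import math

def gaussian_kde(samples, bandwidth):
    """Return a density function: the average of Gaussian bumps centred on each sample."""
    norm = 1 / (len(samples) * bandwidth * math.sqrt(2 * math.pi))

    def density(x):
        return norm * sum(
            math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples
        )

    return density

# Hypothetical years-until-last-publication for ten scientists (not data from the paper).
exit_years = [1, 1, 1, 2, 2, 3, 4, 6, 9, 12]
density = gaussian_kde(exit_years, bandwidth=2)
early, late = density(1.5), density(10)
```

Each scientist contributes one bell-shaped bump of width set by the bandwidth, so a larger bandwidth spreads each exit over more neighbouring years.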
Scientific Validity
  • Kernel Density Estimation: Bandwidth and Interpretation: Using kernel density estimation is appropriate for visualizing the distribution of dropout times. However, the choice of bandwidth (2) should be justified and its impact on the results discussed. Different bandwidths can significantly alter the shape of the density curve. Sensitivity analysis using different bandwidths would demonstrate the robustness of the observed patterns. Also, clarify the specific kernel function used (presumably Gaussian, as mentioned in the caption). While the reference text notes the creation of a smoothed curve, it doesn't elaborate on the interpretation of this curve in relation to the research questions. What insights do the peaks and shape of the curves provide regarding attrition patterns and temporal changes?
  • Gender Comparison and Statistical Analysis: While the figure shows overall dropout distributions, it doesn't directly address the gender differences, a central aspect of the study. Separating the kernel density curves by gender would enable visual comparison of dropout patterns between men and women, enhancing the figure's scientific value. Furthermore, statistical tests could be applied to compare the distributions and assess the significance of any observed differences.
  • Rationale for Combined Curve Types: The rationale for including the survival regression and hazard rate curves in the same figure as kernel density curves needs further clarification. While related, these curves represent different aspects of attrition. If their purpose is to support the kernel density analysis, explain how they contribute to the overall interpretation. If they serve a separate analytical purpose, consider presenting them in separate figures to avoid confusion and enhance clarity.
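The bandwidth sensitivity analysis suggested above can be sketched by re-estimating the density under several bandwidths and counting the peaks that survive. The data, the grid, and the candidate bandwidths below are hypothetical choices for illustration.

```python
import math

def kde_on_grid(samples, bandwidth, grid):
    """Gaussian kernel density estimate evaluated at each grid point."""
    norm = 1 / (len(samples) * bandwidth * math.sqrt(2 * math.pi))
    return [
        norm * sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples)
        for x in grid
    ]

def count_peaks(values):
    """Number of strict local maxima in a sequence."""
    return sum(
        1 for i in range(1, len(values) - 1)
        if values[i - 1] < values[i] > values[i + 1]
    )

# Hypothetical exit years with an early and a later cluster (not data from the paper).
exit_years = [1, 1, 2, 2, 2, 3, 9, 10, 10, 11]
grid = [i * 0.1 for i in range(-50, 201)]
peaks = {h: count_peaks(kde_on_grid(exit_years, h, grid)) for h in (0.3, 2, 6)}
```

If the number and location of peaks change materially across plausible bandwidths, conclusions drawn from the shape of the density curve should be stated with corresponding caution.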
Communication
  • Clarity and Quantification of Temporal Changes: The figure successfully uses kernel density estimation to visualize the distribution of when scientists ceased publishing. The smoothed curves provide a clear picture of the dropout patterns for both cohorts, revealing a higher concentration of dropouts in the early career stages. However, the figure would benefit from additional annotations or insets that quantify the differences between the 2000 and 2010 cohorts. For example, noting the shift in peak dropout year or the relative proportions of early-career dropouts would enhance the communication of temporal changes.
  • Integration with Gender Analysis: While the kernel density plots effectively show the distribution of dropouts, they lack direct connection to the study's central theme of gender differences. The figure would be more informative if it included separate density curves for men and women, allowing for visual comparison of dropout patterns between genders. This would enhance the figure's contribution to the overall narrative and provide a more nuanced understanding of attrition dynamics.
  • Accessibility and Conciseness: As in Figure 3, the caption's technical details about smoothing methods might overwhelm readers unfamiliar with these concepts. Consider moving these details to a separate methods section or providing a concise, intuitive explanation in the figure legend. Focus the caption on the key message: the distribution of dropouts over time and its implications for the study.
Supplementary Figure 1: 2000 (N=1,775) and 2010 (N=2,361) cohorts of...
Full Caption

Supplementary Figure 1: 2000 (N=1,775) and 2010 (N=2,361) cohorts of scientists, IMMU, immunology and microbiology. Survival regression curve (exponential distribution fitting of Kaplan-Meier curve), hazard rate curve (B-splines smoothing method, smoothing parameter 10k), and kernel density curve (B-splines smoothing method, bandwidth 2, component per point based on Gaussian curve)

Figure/Table Image (Page 23)
First Reference in Text
No explicit numbered reference found
Description
  • Types of Curves: This supplementary figure presents three types of curves related to scientist attrition in the IMMU (immunology and microbiology) discipline, for two cohorts (2000 and 2010). The first type, *survival regression curves*, are smoothed Kaplan-Meier curves. They show the probability of a scientist continuing to publish over time. Imagine you're tracking how many of 100 planted flowers are still blooming each year. The survival curve shows the percentage still blooming. The second type, *hazard rate curves*, represents the instantaneous risk of ceasing publication at any given time. Think of this as the chance a flower will wilt on any given day. The third type, *kernel density curves*, illustrate the distribution of when scientists stopped publishing. If you plotted the day each flower wilted, the kernel density curve would show which days had the most wilting flowers. The x-axis represents years since the start of the cohort, and the y-axis represents either the proportion remaining in science (survival curves), the hazard rate, or the density.
  • Panel Organization: The figure has two panels: top for the 2000 cohort and bottom for the 2010 cohort. Each panel shows all three curve types, with separate lines for men and women (labeled Mm and Mw). This allows comparison of attrition patterns between genders and across the two cohorts. The sample size (N) for each cohort is given in the caption.
  • Curve-Fitting Methods: The caption includes technical details about curve-fitting methods. Survival regression curves use exponential distribution fitting. Hazard rate and kernel density curves use B-splines smoothing, which creates smooth curves from data. The "smoothing parameter" (10k for hazard rate) and "bandwidth" (2 for kernel density) control the smoothness. Kernel density uses a Gaussian (bell curve) component for its smoothing.
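To make the relationship between a Kaplan-Meier curve and its exponential fit concrete, here is a minimal sketch in plain Python. The data are hypothetical toy values, not numbers from the study, and the function names are illustrative. The caption does not say how the exponential curve was fitted, so the maximum-likelihood rate used here (events divided by total time at risk) is an assumption rather than the authors' procedure.

```python
import math

def kaplan_meier(durations, observed):
    """Return [(t, S(t))] at each distinct event time.

    durations: years from cohort start to the last publication, or to the
               end of the observation window for those still publishing
    observed:  True if the scientist stopped publishing (event),
               False if still publishing at the end of the window (censored)
    """
    event_times = sorted({t for t, e in zip(durations, observed) if e})
    surv, curve = 1.0, []
    for t in event_times:
        at_risk = sum(1 for d in durations if d >= t)
        events = sum(1 for d, e in zip(durations, observed) if e and d == t)
        surv *= 1.0 - events / at_risk
        curve.append((t, surv))
    return curve

def exponential_rate(durations, observed):
    """Maximum-likelihood constant hazard: events / total time at risk."""
    return sum(observed) / sum(durations)

# Hypothetical toy cohort of 10 scientists (assumption, not study data).
durations = [2, 3, 3, 5, 8, 8, 10, 12, 12, 12]
observed = [True, True, False, True, True, False, True, False, False, False]

km = kaplan_meier(durations, observed)
rate = exponential_rate(durations, observed)
for t, s in km:
    # Under the exponential model, S(t) = exp(-rate * t).
    print(f"year {t:>2}: KM={s:.3f}  exponential={math.exp(-rate * t):.3f}")
```

In this toy cohort the fitted rate is 5 events over 75 person-years. Because an exponential model has a single constant hazard, any rise or fall in the smoothed hazard curve is information that the exponential survival curve cannot represent.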
Scientific Validity
  • Justification and Interpretation of Curve Types: While using multiple curve types provides a comprehensive view, ensure their inclusion is justified and their interpretation contributes meaningfully to the research questions. Specifically, discuss the insights gained from each curve type (survival regression, hazard rate, and kernel density) and how they complement each other in understanding attrition dynamics. For example, how does the shape of the kernel density curve inform the interpretation of the hazard rate or survival curves?
  • Curve-Fitting Choices and Sensitivity Analysis: The caption mentions the use of exponential distribution for survival regression. Justify this choice and assess the goodness-of-fit to the data. Explore alternative distributions or non-parametric methods if the exponential distribution is not a good fit. Similarly, justify the choice of smoothing parameters for the B-splines and assess the sensitivity of the results to these choices.
  • Comparison with Other Disciplines and Statistical Analysis: As a supplementary figure, it's crucial to link this figure explicitly to the main paper's findings. Compare the attrition patterns in the IMMU discipline to those observed in other disciplines (shown in other figures). Discuss whether these patterns are consistent with the overall trends and how they contribute to the study's broader conclusions. Quantify key differences and perform statistical tests to assess their significance.
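One simple way to probe the exponential assumption raised above is to estimate the hazard separately in a few time intervals: an exponential model implies roughly the same events-per-person-year in each. The sketch below uses hypothetical toy data and an illustrative helper name. It is an informal check under those assumptions, not a formal goodness-of-fit test and not the authors' method.

```python
def interval_hazards(durations, observed, edges):
    """Events per person-year within each (start, end] interval."""
    rates = []
    for start, end in zip(edges[:-1], edges[1:]):
        # Time each scientist spent at risk inside this interval.
        exposure = sum(max(0.0, min(d, end) - start) for d in durations)
        # Exits (not censorings) that fall inside this interval.
        events = sum(
            1 for d, e in zip(durations, observed) if e and start < d <= end
        )
        rates.append(events / exposure if exposure > 0 else float("nan"))
    return rates

# Hypothetical toy cohort (assumption, not study data).
durations = [1, 1, 2, 2, 3, 4, 6, 9, 12, 12]
observed = [True, True, True, True, True, False, True, False, False, False]

rates = interval_hazards(durations, observed, edges=[0, 4, 8, 12])
overall = sum(observed) / sum(durations)
print("interval hazards:", [round(r, 3) for r in rates])
print("single exponential rate:", round(overall, 3))
```

If the interval rates differ markedly from the single overall rate, as they do in this toy example where exits are concentrated in the early years, a more flexible distribution or a non-parametric description is likely to fit better.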
Communication
  • Clarity and Accessibility: The figure is well-organized, with clear labels and distinct lines for each cohort and gender. The inclusion of three different curve types (survival regression, hazard rate, and kernel density) provides a comprehensive view of the attrition patterns. However, the caption is quite dense with technical terminology. While appropriate for supplementary materials, consider adding a brief explanation of the curve types in the figure legend itself to improve readability for a broader audience. Specifically, explain what each curve type represents and how it should be interpreted.
  • Connection to Main Findings: Since this figure is supplementary, it's crucial to explicitly link it back to the main findings of the paper. Explain how the attrition patterns observed in the IMMU discipline compare to those in other disciplines and whether they support or contradict the overall trends discussed in the main text. This will provide context and justify the inclusion of this supplementary figure.
  • Data Highlighting: The visual presentation could be enhanced by adding specific data points or annotations to highlight key findings. For instance, mark the retention rates at 5, 10, and 15 years on the survival curves, or indicate the peak hazard rate on the corresponding curves. This would make the figure more informative and easier to interpret.
Supplementary Figure 2: 2000 (N=2,533) and 2010 (N=4,513) cohorts of...
Full Caption

Supplementary Figure 2: 2000 (N=2,533) and 2010 (N=4,513) cohorts of scientists, NEURO, neuroscience. Survival regression curve (exponential distribution fitting of Kaplan–Meier curve), hazard rate curve (B-splines smoothing method, smoothing parameter 10k), and kernel density curve (B-splines smoothing method, bandwidth 2, component per point based on Gaussian curve)

Figure/Table Image (Page 23)
First Reference in Text
No explicit numbered reference found
Description
  • Types of Curves: This supplementary figure presents three types of curves to analyze scientist attrition in the NEURO (neuroscience) discipline for the 2000 and 2010 cohorts. First, *survival regression curves* (exponential-distribution fits to the Kaplan-Meier curves) show the probability of remaining in science, defined as continuing to publish, over time. Imagine tracking 100 newly planted saplings and charting the percentage that survive each year: that is a survival curve. Second, *hazard rate curves* show the instantaneous risk of ceasing publication at a given time among scientists who are still publishing. Think of it as the probability that a sapling still alive will die in a given year. Third, *kernel density curves* show the distribution of when scientists stopped publishing. If you plotted the year each sapling died, the kernel density curve would show which years had the most sapling deaths. The x-axis represents years since the cohort's start year, while the y-axis represents the proportion remaining (survival curves), the hazard rate, or the density.
  • Figure Organization: The figure is organized into two panels: the top for the 2000 cohort and the bottom for the 2010 cohort. Each panel displays all three curve types, with separate lines for men and women (labeled Mm and Mw). This allows for comparison between genders and across the two cohorts. The caption also notes the sample size (N) for each cohort.
  • Curve-Fitting Methods: The caption provides technical details about the curve-fitting methods. Survival regression curves use exponential distribution fitting. Hazard rate and kernel density curves are generated using B-splines smoothing, a method for creating smooth curves from data. The smoothing parameter (10k for hazard rate) and bandwidth (2 for kernel density) control how smooth the curves appear. The kernel density estimation uses a Gaussian (bell-shaped) curve as a component for smoothing.
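The phrase "component per point based on Gaussian curve" can be illustrated with a plain Gaussian kernel density estimate: each observed exit year contributes one bell-shaped bump, and the bandwidth sets how wide each bump is. The sketch below is a simplified illustration on hypothetical exit years. It omits the B-splines step named in the caption, whose details are not given, and the function name is an assumption.

```python
import math

def gaussian_kde(points, bandwidth):
    """Return f(x): the average of one Gaussian bump per observed point."""
    norm = len(points) * bandwidth * math.sqrt(2 * math.pi)

    def density(x):
        return sum(
            math.exp(-0.5 * ((x - p) / bandwidth) ** 2) for p in points
        ) / norm

    return density

# Hypothetical years (since cohort start) at which scientists stopped
# publishing. These are toy values, not study data.
exit_years = [1, 1, 2, 2, 3, 6]

density = gaussian_kde(exit_years, bandwidth=2.0)
for year in range(0, 13, 2):
    print(f"year {year:>2}: density={density(year):.4f}")
```

A larger bandwidth spreads each bump further and yields a smoother, flatter curve, while a smaller one follows individual exit years more closely. This is why the choice of bandwidth (2 in the caption) affects where the curve appears to peak.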
Scientific Validity
  • Research Question and Interpretation: While the figure presents a comprehensive analysis using three curve types, the scientific validity would be enhanced by explicitly stating the research question or hypothesis being addressed by this supplementary figure. How do the observed patterns in the NEURO discipline contribute to the overall research goals? What specific insights are gained from each curve type, and how do they inform the interpretation of the main findings?
  • Curve-Fitting Choices and Justification: The use of exponential distribution for survival regression should be justified, and goodness-of-fit tests should be reported. Explore alternative distributions or non-parametric approaches if the exponential distribution doesn't adequately fit the data. Similarly, justify the chosen smoothing parameters for the B-splines and assess the sensitivity of the results to these parameters. This will strengthen the robustness of the analysis.
  • Interdisciplinary Comparisons and Statistical Analysis: Compare the attrition patterns in NEURO with those observed in other disciplines (presented in other figures). This will contextualize the findings and contribute to a more holistic understanding of attrition across different scientific fields. Quantify key differences between disciplines and conduct statistical tests to determine the significance of these differences. This comparative analysis is essential for drawing robust conclusions about the factors influencing attrition.
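One standard option for the statistical tests suggested above is the log-rank test, which compares the observed exits in one group with the number expected if both groups shared the same survival curve. The sketch below is a minimal two-group version in plain Python. The group labels and data are hypothetical, and nothing here implies that the authors used this test.

```python
import math

def logrank_test(dur_a, obs_a, dur_b, obs_b):
    """Two-group log-rank test. Returns (chi-square statistic, p-value)."""
    all_dur, all_obs = dur_a + dur_b, obs_a + obs_b
    event_times = sorted({t for t, e in zip(all_dur, all_obs) if e})
    o_minus_e, variance = 0.0, 0.0
    for t in event_times:
        n_a = sum(1 for d in dur_a if d >= t)  # at risk in group A
        n_b = sum(1 for d in dur_b if d >= t)  # at risk in group B
        d_a = sum(1 for d, e in zip(dur_a, obs_a) if e and d == t)
        d_b = sum(1 for d, e in zip(dur_b, obs_b) if e and d == t)
        n, d = n_a + n_b, d_a + d_b
        o_minus_e += d_a - d * n_a / n  # observed minus expected exits in A
        if n > 1:
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)
    if variance == 0:
        return 0.0, 1.0
    chi2 = o_minus_e ** 2 / variance
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

# Hypothetical toy groups (assumption, not study data).
group_a_dur = [2, 3, 4, 5, 6, 12]
group_a_obs = [True, True, True, True, True, False]
group_b_dur = [6, 8, 10, 12, 12, 12]
group_b_obs = [True, False, True, False, False, False]

chi2, p = logrank_test(group_a_dur, group_a_obs, group_b_dur, group_b_obs)
print(f"chi-square={chi2:.3f}, p={p:.4f}")
```

The same function could compare two disciplines, two cohorts, or men and women within one discipline. With many pairwise comparisons, a multiple-testing correction would also be needed.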
Communication
  • Context and Connection to Main Text: The figure provides a detailed view of attrition patterns in the NEURO discipline using three different curve types, allowing for a nuanced understanding of the trends. However, as a supplementary figure, its connection to the main paper's narrative needs to be strengthened. Explain how the trends observed in NEURO relate to the broader patterns discussed in the main text and whether they support the overall conclusions. Consider adding a concise summary of the key takeaways from this figure to the main text to guide the reader.
  • Visual Enhancements: While the visual presentation is generally clear, consider enhancing it by highlighting specific data points or features. For instance, mark the retention rates at key time points on the survival curves or indicate the peak hazard rate. Adding such visual cues can improve readability and draw attention to the most relevant information.
  • Technical Terminology: The caption contains technical details about smoothing parameters and curve-fitting methods. While appropriate for supplementary materials, consider briefly explaining these terms in the figure legend or providing a reference to a more detailed explanation in the methods section. This will make the figure more accessible to readers unfamiliar with these specific techniques.
Table 1: The sample, four disciplines, two cohorts of scientists (2000 and...
Full Caption

Table 1: The sample, four disciplines, two cohorts of scientists (2000 and 2010) by gender.

Figure/Table Image (Page 6)