This study examines biologist attrition in 38 OECD countries using Scopus publication data from 2000-2022, focusing on two cohorts (starting in 2000 and 2010). Kaplan-Meier survival analysis revealed consistently higher attrition rates for women, particularly in BIO and NEURO, with the gender gap widening over time in the 2000 cohort. The 2010 cohort showed higher overall attrition but a smaller gender gap. Supplementary analyses, including survival regression and hazard rate curves, provided further insights into attrition dynamics. A key limitation is the definition of "leaving science" as ceasing publication, potentially overestimating attrition, especially for women.
The study's reliance on publication data to define "leaving science" is a significant limitation. Ceasing to publish may indicate leaving academia, but it cannot be distinguished from career transitions within academia (e.g., teaching-focused roles), movement to industry, or temporary career breaks. This operational definition therefore likely overestimates attrition. It may do so disproportionately for women if they move into non-publishing roles more often than men. The observed gender disparities in publication cessation are important, but they do not by themselves establish that women leave science at higher rates.
Despite this limitation, the study offers valuable insights into publication patterns and potential indicators of attrition. The large-scale, multi-national dataset provides a broad overview of trends across disciplines and over time. The finding that women in gender-balanced biological fields have lower publication persistence than men, while men and women in male-dominated fields show similar patterns, raises important questions about the influence of disciplinary culture and structural factors on career trajectories. The interactive map, while not fully integrated into the paper, offers a valuable tool for exploring country-specific variations.
Future research should explore alternative metrics for defining and measuring "leaving science" that go beyond publication records. Qualitative studies investigating the reasons behind publication cessation, including career transitions, family responsibilities, and workplace experiences, would provide a more nuanced understanding of the observed patterns. Analyzing career paths beyond academia, such as industry roles or government positions, would offer a more complete picture of scientists' career trajectories. Furthermore, investigating the specific factors contributing to the higher retention rates in AGRI, particularly for women, could reveal valuable insights for promoting career longevity in other scientific fields.
The study's methodological limitations, particularly the narrow definition of "leaving science," restrict the conclusions that can be drawn about actual attrition rates. While the findings suggest a concerning gender disparity in publication persistence, further research using more comprehensive measures of career trajectories is needed to confirm whether women are truly leaving science at higher rates than men. The lack of rigorous statistical comparisons between survival curves and the limited integration of supplementary analyses further weaken the study's conclusions. Addressing these limitations would significantly strengthen the study's contribution to understanding gender disparities in STEMM.
The abstract effectively summarizes the study's core components: the focus on biologist attrition in OECD countries, the use of Scopus publication data, the longitudinal cohort-based methodology, and the key finding of significant gender disparities in leaving science.
This is a high-impact improvement: stating the study period explicitly would strengthen the abstract's clarity and context. The abstract refers only to "the past two decades." Specifying the years (2000-2022) and the two cohorts (2000 and 2010) would show readers the temporal scope at once. It would also make it easier for other researchers to identify the relevant data when reproducing the analysis.
Implementation: Add "(2000-2022)" after "past two decades" in the first sentence. Also, explicitly mention the 2000 and 2010 cohorts.
This is a medium-impact suggestion. The abstract names only Kaplan-Meier curves, but the study also uses survival regression curves and hazard rate curves. Mentioning them briefly would give a more complete picture of the methodological approach. It would also signal the depth of the analysis to readers familiar with survival analysis.
Implementation: In the abstract, change "Kaplan-Meier survival analysis" to "Kaplan-Meier survival analysis, survival regression, and hazard rate curves."
Clearly defines the scope of the study, focusing on biologist attrition in OECD countries within the past two decades. This focus provides a manageable and relevant scope for the research.
Explains the straightforward conceptual approach of defining career start and end based on first and last publications. This clear definition provides a concrete and measurable basis for the analysis.
Specifies the data source as publication metadata from Scopus, indicating the use of a large-scale, established database. This strengthens the study's credibility and potential for generalizability.
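The sketch below makes the operational definition concrete by turning one author's publication years into a survival record. It is a hypothetical illustration, not the authors' procedure. The helper name `career_spell`, the 2022 window end, and especially the `grace_years` rule are assumptions. Under that rule, authors whose last publication is recent are treated as censored rather than as leavers. The paper should state explicitly how it handles authors who publish near the end of the window.

```python
from dataclasses import dataclass

WINDOW_END = 2022  # last year covered by the publication data (2000-2022)


@dataclass
class CareerSpell:
    duration: int  # years of follow-up since the first publication
    left: bool     # True = counted as leaving (event); False = censored


def career_spell(pub_years, window_end=WINDOW_END, grace_years=3):
    """Hypothetical helper turning one author's publication years into a survival record.

    Career start = first publication; career end = last publication.
    Assumed rule: if the last publication falls within the final
    `grace_years` of the window, the author may still be active, so the
    record is censored at the window end instead of counted as leaving.
    """
    if not pub_years:
        raise ValueError("author has no publications")
    start, end = min(pub_years), max(pub_years)
    if end <= window_end - grace_years:
        return CareerSpell(duration=end - start, left=True)
    return CareerSpell(duration=window_end - start, left=False)
```

For example, `career_spell([2000, 2003, 2008])` yields an 8-year career ending in an event. `career_spell([2000, 2021])` is censored after 22 years of follow-up. The choice of `grace_years` directly shifts estimated attrition near the end of the window, which is one reason the definition merits a sensitivity analysis.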
This is a high-impact improvement. The Introduction defines the scope and approach but never states the research question or hypothesis. Stating it explicitly would focus the reader on the study's central aim and frame the sections that follow. For example, the research question could be: "What are the gender disparities in attrition rates among biologists in OECD countries, and how have these disparities changed over the past two decades?"
Implementation: Add a clear statement of the research question or hypothesis at the end of the introductory paragraph. For example, "This study investigates the following research question: [insert research question here]."
This is a medium-impact improvement to the Introduction's flow and coherence. The section covers the essential information. Dividing it into subsections with descriptive headings would make its logical sequence easier to follow and its key concepts and arguments easier to locate.
Implementation: Divide the Introduction into subsections with descriptive headings, such as "Background," "Research Question/Hypothesis," "Data and Methods," and "Significance of the Study." Briefly describe the content of each subsection under the corresponding heading.
This is a medium-impact improvement to the Introduction's rigor and context. The Introduction acknowledges that bibliometric data have limitations. It does not, however, discuss the specific biases and alternative interpretations that arise when ceasing to publish is equated with leaving science. A more critical treatment would show that the authors recognize these complexities and would support a more nuanced reading of the findings. It could also point to future research that addresses these limitations.
Implementation: Add a paragraph discussing the limitations of using publication data to define "leaving science." Address potential biases, such as excluding non-publishing scientists or those who switch fields but continue publishing. Discuss alternative interpretations of ceasing publication, such as career breaks or shifts in research focus. Suggest future research avenues to address these limitations.
The section clearly outlines the data source (Scopus), cohort selection criteria (publication years, disciplines, minimum publications, OECD affiliation, and defined gender), and the definition of leaving science (ceasing publication). This transparency allows for scrutiny and potential replication of the study.
The rationale for using Scopus is well-articulated, emphasizing its suitability for global, micro-level analysis due to unique author IDs and comprehensive publication metadata. This justification strengthens the study's methodological foundation.
This is a high-impact improvement for rigor and reproducibility. The Data and Methods section should detail how the Scopus data were cleaned and preprocessed. This includes how missing data were handled, how ambiguous affiliations were resolved, and how potential biases in gender detection were addressed. Without these steps, other researchers cannot replicate the analysis, assess the robustness of the findings, or run sensitivity analyses.
Implementation: Add a subsection titled "Data Cleaning and Preprocessing" that details the specific steps taken to clean and prepare the Scopus data for analysis, including:
1. Handling missing data (e.g., imputation methods or exclusion criteria).
2. Resolving ambiguous affiliations (e.g., using institutional databases or manual verification).
3. Addressing potential biases in gender detection (e.g., manual verification of ambiguous cases or sensitivity analyses using different gender detection tools).
4. Any other data cleaning or transformation steps (e.g., data normalization, outlier removal).
This is a medium-impact improvement to methodological rigor. The figure captions indicate that the survival regression curves are exponential fits to the Kaplan-Meier curves. The Data and Methods section, however, does not name the model type (e.g., Cox proportional hazards or a parametric model such as the exponential). Nor does it justify that choice or state the model's assumptions. Making these explicit would let readers evaluate the analytical approach and judge how far the findings depend on it.
Implementation: Expand the explanation of the survival analysis techniques. Specify the type of survival regression model used (e.g., Cox proportional hazards or a parametric model such as the exponential) and the rationale for choosing it. State the assumptions made about the data and the model; an exponential model, for instance, assumes a constant hazard of leaving throughout the career. Describe the estimation procedures and the software used.
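The figure captions describe the survival regression curves as exponential fits, which illustrates the kind of detail requested. Consider an exponential (constant-hazard) model with gender as the only covariate. The maximum-likelihood hazard for each group is its number of leaving events divided by its total years at risk, and the hazard ratio has a closed-form Wald confidence interval. The sketch below assumes this maximum-likelihood fit. The authors may instead have fitted the exponential curve to the Kaplan-Meier estimates directly, which would give different numbers. The function name and inputs are hypothetical.

```python
import math


def exponential_hazard_ratio(durations, events, group):
    """Exponential survival model with one binary covariate (e.g. gender).

    durations: years at risk per author; events: 1 = left, 0 = censored;
    group: 1 for the comparison group (e.g. women), 0 for the reference group.
    Returns (reference hazard, comparison hazard, hazard ratio, 95% CI).
    """
    d = [0, 0]      # leaving events per group
    t = [0.0, 0.0]  # total years at risk per group
    for dur, ev, g in zip(durations, events, group):
        d[g] += ev
        t[g] += dur
    if min(d) == 0 or min(t) <= 0:
        raise ValueError("each group needs at least one event and positive time at risk")
    rate0, rate1 = d[0] / t[0], d[1] / t[1]  # MLE constant hazards
    log_hr = math.log(rate1 / rate0)
    se = math.sqrt(1 / d[0] + 1 / d[1])      # large-sample SE of log(HR)
    ci = (math.exp(log_hr - 1.96 * se), math.exp(log_hr + 1.96 * se))
    return rate0, rate1, math.exp(log_hr), ci
```

Reporting the hazard ratio for women versus men with its confidence interval, for each discipline and cohort, would also meet the request for statistical tests of the gender differences.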
The Results section effectively uses Kaplan-Meier survival curves (Figures 1 and 2) to show gender differences in attrition across the four biological disciplines and two cohorts. These visualizations give a clear, accessible overview of the main finding: higher attrition rates for women than for men.
The Results section provides detailed data in tables (Table 1, Table 2, Table 3) to support the findings presented in the Kaplan-Meier curves. This tabular data allows for a more granular examination of attrition rates by gender, discipline, and year, enhancing the transparency and rigor of the analysis.
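The Kaplan-Meier curves in Figures 1 and 2 rest on the product-limit estimator. At each time a leaving event occurs, survival is multiplied by the share of authors still at risk who did not leave. The minimal sketch below also computes Greenwood's standard error, which would allow confidence bands to be drawn around each curve. It assumes durations in years since first publication and counts censored authors as at risk at their censoring time. It is an illustration, not the authors' code.

```python
import math
from collections import Counter


def kaplan_meier(durations, events):
    """Product-limit estimate of S(t) with Greenwood standard errors.

    durations: years since first publication (to leaving or censoring)
    events: 1 if the author left at that time, 0 if censored
    Returns a list of (t, S(t), se) at each time with at least one event.
    """
    deaths = Counter(t for t, e in zip(durations, events) if e)
    removed = Counter(durations)  # authors leaving the risk set at each time
    n_at_risk = len(durations)
    surv, greenwood_sum, curve = 1.0, 0.0, []
    for t in sorted(removed):
        d = deaths.get(t, 0)
        if d:
            surv *= 1 - d / n_at_risk
            if n_at_risk > d:  # Greenwood term is undefined once S(t) hits 0
                greenwood_sum += d / (n_at_risk * (n_at_risk - d))
            curve.append((t, surv, surv * math.sqrt(greenwood_sum)))
        # censored authors count as at risk at t, then drop out with the leavers
        n_at_risk -= removed[t]
    return curve
```

Running this separately for women and men in each discipline and cohort yields the two curves per panel, and ±1.96 standard errors give approximate pointwise 95% bands.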
This is a high-impact improvement to statistical rigor and interpretability. The Results section presents Kaplan-Meier curves and mentions survival regression but reports no statistical tests of the observed gender differences in attrition. Without p-values or confidence intervals, readers cannot judge whether the gaps between the curves exceed what sampling variation alone would produce. This limits how firmly conclusions about gender disparities can be drawn.
Implementation: Perform appropriate statistical tests (e.g., the log-rank test) to compare survival curves between genders for each discipline and cohort. Report the test statistics and p-values in the figure captions or the main text. Add confidence bands to the Kaplan-Meier curves (e.g., based on Greenwood's variance formula). If survival regression models were used, report the coefficient (or hazard ratio), standard error, confidence interval, and p-value for the gender variable in each model.
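The sketch below implements the two-sample log-rank test suggested above without external libraries, so each step is visible. At every leaving time, it compares the observed number of leavers in one group with the number expected if both groups shared the same hazard. It then sums the hypergeometric variances. The function and argument names are illustrative. For the full samples (up to 51,208 authors per cohort), an established implementation would be preferable, such as `logrank_test` in the Python lifelines package or `survdiff` in R's survival package.

```python
import math


def log_rank_test(dur_a, ev_a, dur_b, ev_b):
    """Two-sample log-rank test. Returns (chi-square statistic, p-value), 1 df.

    dur_*: follow-up times; ev_*: 1 = left (event), 0 = censored.
    """
    dur_a, dur_b = list(dur_a), list(dur_b)
    ev_a, ev_b = list(ev_a), list(ev_b)
    event_times = sorted({t for t, e in zip(dur_a + dur_b, ev_a + ev_b) if e})
    o_minus_e, variance = 0.0, 0.0
    for t in event_times:
        # risk sets: authors still under observation just before time t
        n_a = sum(1 for d in dur_a if d >= t)
        n_b = sum(1 for d in dur_b if d >= t)
        d_a = sum(1 for d, e in zip(dur_a, ev_a) if d == t and e)
        d_b = sum(1 for d, e in zip(dur_b, ev_b) if d == t and e)
        n, d = n_a + n_b, d_a + d_b
        o_minus_e += d_a - d * n_a / n  # observed minus expected in group A
        if n > 1:
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)
    if variance == 0:
        raise ValueError("no information to compare the groups")
    chi2 = o_minus_e ** 2 / variance
    p_value = math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square with 1 df
    return chi2, p_value
```

In this review's terms, `dur_a`/`ev_a` would hold women's records and `dur_b`/`ev_b` men's. The test would be run once per discipline and cohort, ideally with a correction for the resulting multiple comparisons.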
This is a medium-impact improvement to the clarity and completeness of the Results section. The survival regression, hazard rate, and kernel density curves appear in Figures 3 and 4 (AGRI and BIO) and in Supplementary Figures 1 and 2 (IMMU and NEURO). The text, however, treats them as supplementary and offers little interpretation. Discussing them in the main Results would give a more complete picture of the attrition patterns, particularly the timing of attrition and the distribution of leaving times.
Implementation: Incorporate the survival regression, hazard rate, and kernel density analyses into the main Results text. Refer to the corresponding figures (Figures 3 and 4, and Supplementary Figures 1 and 2) and interpret them explicitly. Explain how they complement the Kaplan-Meier survival curves and what they add about the timing of attrition.
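The sketch below clarifies what the hazard rate and kernel density panels of Figures 3 and 4 summarize by computing two simpler stand-ins. The first is an unsmoothed yearly hazard: the share of authors still at risk in career year k who leave in that year. The second is a Gaussian kernel density of leaving times with bandwidth 2, the value stated in the captions. The captions describe B-spline smoothing, which this sketch does not reproduce. The function names are hypothetical, and the output is meant only to show what the curves represent.

```python
import math


def yearly_hazard(durations, events, max_year):
    """Unsmoothed discrete-time hazard for career years 0..max_year:
    leavers in year k divided by authors still at risk at year k."""
    hazards = []
    for k in range(max_year + 1):
        at_risk = sum(1 for d in durations if d >= k)
        left = sum(1 for d, e in zip(durations, events) if d == k and e)
        hazards.append(left / at_risk if at_risk else float("nan"))
    return hazards


def gaussian_kde(leaving_times, grid, bandwidth=2.0):
    """Gaussian kernel density of leaving times, evaluated at each point of grid."""
    n = len(leaving_times)
    norm = 1.0 / (n * bandwidth * math.sqrt(2 * math.pi))
    return [
        norm * sum(math.exp(-0.5 * ((x - t) / bandwidth) ** 2) for t in leaving_times)
        for x in grid
    ]
```

The hazard answers "given that an author is still publishing in year k, how likely is it that year k is the last?" The density answers "when, across the cohort, do leavers tend to leave?" Reporting both in the text would make clear why the two curves can peak at different career ages.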
Figure 1. 2000 cohort of scientists, Kaplan-Meier survival curve, AGRI, agricultural and biological sciences (N=7,970), BIO, biochemistry, genetics, and molecular biology (N=22,692), IMMU, immunology and microbiology (N=1,775), and NEURO, neuroscience (N=2,533). The four disciplines combined (N=34,970).
Figure 2. 2010 cohort of scientists, Kaplan-Meier survival curve, AGRI, agricultural and biological sciences (N=12,792), BIO, biochemistry, genetics, and molecular biology (N=31,542), IMMU, immunology and microbiology (N=2,361), and NEURO, neuroscience (N=4,513). The four disciplines combined (N=51,208).
Figure 3. 2000 (top panel, N=7,970) and 2010 (bottom panel, N=12,792) cohorts of scientists, AGRI, agricultural and biological sciences. Survival regression curve (exponential distribution fitting of Kaplan-Meier curve), hazard rate curve (B-splines smoothing method, smoothing parameter 10k), and kernel density curve (B-splines smoothing method, bandwidth 2, component per point based on Gaussian curve).
Figure 4. 2000 (top panel, N=22,692) and 2010 (bottom panel, N=31,542) cohorts of scientists, BIO, biochemistry, genetics, and molecular biology. Survival regression curve (exponential distribution fitting of Kaplan-Meier curve), hazard rate curve (B-splines smoothing method, smoothing parameter 10k), and kernel density curve (B-splines smoothing method, bandwidth 2, component per point based on Gaussian curve).
Supplementary Figure 1. 2000 (N=1,775) and 2010 (N=2,361) cohorts of scientists, IMMU, immunology and microbiology. Survival regression curve (exponential distribution fitting of Kaplan-Meier curve), hazard rate curve (B-splines smoothing method, smoothing parameter 10k), and kernel density curve (B-splines smoothing method, bandwidth 2, component per point based on Gaussian curve).
Supplementary Figure 2. 2000 (N=2,533) and 2010 (N=4,513) cohorts of scientists, NEURO, neuroscience. Survival regression curve (exponential distribution fitting of Kaplan-Meier curve), hazard rate curve (B-splines smoothing method, smoothing parameter 10k), and kernel density curve (B-splines smoothing method, bandwidth 2, component per point based on Gaussian curve).
Table 1. The sample by gender across the four disciplines and two cohorts of scientists (2000 and 2010).