This paper presents a literature review focused on the application of machine learning (ML) algorithms for predicting hospital length of stay (LoS), a critical factor for efficient hospital management and patient care. The primary objective is to identify the most effective ML algorithms for building LoS predictive models by analyzing existing research concerning model types, performance metrics, dataset characteristics, and ethical considerations.
Methodologically, the review adheres to Kitchenham's systematic protocol. The authors conducted a bibliographic search primarily using Google Scholar, which initially yielded 604 articles. Through a multi-stage filtering process—based on language (English), publication date (post-2020), relevance of title and abstract to ML for LoS prediction, and study availability—this pool was narrowed down to 12 core research articles for in-depth analysis. These selected papers were then scrutinized for the ML algorithms employed, datasets utilized (e.g., MIMIC-II/III, institutional EHRs), data preprocessing techniques, reported performance metrics (like Mean Absolute Error (MAE), R-squared (R2), Accuracy), and discussions on challenges and ethical implications.
Key findings from the reviewed literature indicate that several ML algorithms show promise for LoS prediction. Notably, Neural Networks (NNs) demonstrated high accuracy in some studies (e.g., up to 94.74% accuracy reported in study [29] after optimization) and XGBoost also showed strong performance (e.g., an R2 score of 0.89 in study [38]). However, the review emphasizes that the performance of these algorithms is highly contingent on factors such as the quality and characteristics of the dataset, the extent of data preprocessing (including feature selection and handling missing values), and meticulous hyperparameter tuning. The importance of robust data management frameworks and adherence to ethical principles—particularly patient privacy (e.g., HIPAA, GDPR compliance), data security, and the mitigation of algorithmic bias—was also a recurrent theme.
The paper concludes that while Neural Networks are a popular and often high-performing choice, no single ML algorithm is universally optimal for all LoS prediction scenarios. The selection of the most suitable algorithm ultimately depends on the specific context, including the nature of the available data, computational resources, the need for model interpretability, and the specific goals of the healthcare application. The review highlights the ongoing need for careful validation and consideration of practical implementation challenges when deploying ML models in clinical settings.
This literature review provides a valuable and comprehensive synthesis of the application of machine learning (ML) algorithms for predicting hospital length of stay (LoS). Its systematic approach, following Kitchenham's methodology, lends credibility to its overview of common algorithms, performance metrics, and prevailing challenges in the field. The paper effectively highlights the potential of advanced models like Neural Networks and XGBoost, while judiciously emphasizing that their performance is highly context-dependent and that practical implementation faces hurdles such as data quality, model interpretability, computational costs, and ethical considerations.
The review's main strength lies in its balanced perspective, discussing not only the technical aspects of ML models but also the critical importance of ethical frameworks, patient privacy, and bias mitigation. It correctly concludes that there is no universally superior algorithm for LoS prediction; rather, the optimal choice depends on specific dataset characteristics, available resources, and the clinical context. This nuanced conclusion is crucial for guiding healthcare practitioners and researchers in selecting and deploying ML solutions responsibly and effectively.
While the review successfully maps the current landscape, it also implicitly underscores the limitations within the primary research itself, such as the heterogeneity in datasets, methodologies, and reporting standards across studies. This makes definitive cross-study comparisons of algorithm effectiveness challenging. The review navigates this by focusing on trends and general capabilities rather than making absolute claims of algorithmic supremacy. The study design of this paper—a systematic literature review—is appropriate for its objective of summarizing existing knowledge and identifying research gaps. It reliably contributes an understanding of the current state-of-the-art, common practices, and challenges in ML for LoS prediction.
Ultimately, the paper serves as a useful guide for understanding the complexities involved in leveraging ML for healthcare efficiency. It underscores that future progress requires not only algorithmic advancements but also a focus on robust data governance, ethical oversight, and strategies for seamless clinical integration. The critical unanswered questions revolve around how to best bridge the gap from predictive accuracy in research settings to tangible, equitable improvements in real-world patient care and hospital management, particularly concerning the interpretability and trustworthiness of complex models.
The abstract effectively communicates the paper's main objective and the importance of the research problem. It clearly states the goal of identifying effective ML algorithms for LoS prediction and links this to tangible benefits in hospital management and patient care, setting a clear context for the review.
The abstract outlines a broad scope for the literature review, indicating that it will cover not only the identification and analysis of ML algorithms and metrics but also a discussion of challenges, limitations, data quality, and ethical considerations. This holistic approach suggests a thorough and well-rounded treatment of the topic.
The abstract clearly states the methodological approach: a bibliographic search and subsequent analysis of the existing literature based on model types and metrics. This transparency provides readers with an immediate understanding of how the research was conducted and the basis for its findings.
While the abstract clearly outlines the paper's scope, explicitly mentioning the primary intended audience (e.g., healthcare administrators, data scientists in healthcare, clinical researchers) or specific contexts (e.g., general hospitals vs. specialized units, specific patient populations) could enhance its focus and immediate applicability. This addition would have a medium impact by helping readers more quickly ascertain the paper's direct relevance to their work or field of interest. Including this in the abstract is appropriate as it aids in framing the paper's contribution effectively from the outset.
Implementation: Incorporate a brief phrase identifying the target audience or context. For example, after "...contributing to healthcare efficiency and patient care," consider adding a clause like, "offering crucial insights for hospital managers and data analysts seeking to implement predictive solutions." If the review has a more specific focus, such as LoS in intensive care units, this could also be briefly mentioned.
Although abstracts for literature reviews primarily summarize scope and methodology, offering a very brief, high-level hint at any overarching trends or particularly promising categories of algorithms identified (if feasible without overstating or preempting the main text) could significantly increase reader engagement and the perceived value of the review. This would be a high-impact addition, as it provides a tantalizing preview of the review's core takeaways, encouraging further reading. Such a hint is suitable for an abstract as it concisely conveys the essence of the findings.
Implementation: If the review reveals a dominant trend (e.g., "revealing a growing prominence of deep learning approaches" or "underscoring the consistent efficacy of ensemble methods"), a concise mention could be integrated. For instance, after "...impact on healthcare decision making," a phrase such as, "uncovering key trends in algorithmic performance and application focus" or a more specific note on a class of algorithms could be added, if a clear pattern emerged from the literature.
The introduction clearly establishes the paper's central theme—applying machine learning to predict hospital length of stay—and explicitly states its primary goal: to identify the most effective ML algorithms for this purpose. This directness provides immediate clarity for the reader.
The introduction effectively outlines the structure and content flow of the paper. It informs the reader about the upcoming exploration of ML algorithm types, relevant features, their role in improving healthcare outcomes, and significantly, the discussion of ethical considerations, setting clear expectations.
The section successfully contextualizes the research by highlighting the practical importance and inherent difficulties of LoS prediction. It emphasizes the potential benefits for healthcare systems and patient outcomes, thereby justifying the study's relevance and addressing the multifaceted nature of the problem.
The introduction commendably acknowledges not only the technical challenges in LoS prediction, such as data complexity and resource constraints, but also the crucial ethical dimensions like patient privacy and bias. This balanced perspective indicates a thorough and responsible approach to the topic.
While the main objective is clearly stated, explicitly formulating or at least thematically outlining the specific research questions (RQs) that guide the literature review within the Introduction section itself would sharpen the paper's focus from the outset. This is a high-impact suggestion because RQs provide a clear framework for the review and are conventionally introduced early. Although the RQs are detailed on page 2, integrating their essence into the Introduction would enhance its role as a complete preparatory section.
Implementation: In the final paragraph of the Introduction, consider adding a sentence that previews the guiding research questions. For instance: "This review delves into the intricate world of hospital LoS prediction by addressing key questions concerning the types of ML algorithms utilized, their comparative efficacy, critical predictive features, and the associated challenges and ethical ramifications."
The Introduction mentions it offers a "comprehensive overview," and the paper is a literature review. Specifying the type of literature review (e.g., systematic literature review, scoping review) within the Introduction would immediately inform readers about the methodology's rigor and scope. This is a medium-impact suggestion as it sets precise expectations. Given that Section 4.1 later details a systematic approach (Kitchenham's methodology), signaling this earlier enhances coherence.
Implementation: In the first paragraph, when describing the paper's nature, incorporate the specific type of review. For example: "This paper offers a comprehensive systematic literature review of machine learning (ML) algorithms and their applications..."
While "machine learning" is a common term in applied sciences, providing a very brief, contextual definition or explanation of ML within the Introduction could enhance accessibility for a potentially broader segment of the journal's readership or those less specialized in computational methods. This is a low-to-medium impact suggestion aimed at ensuring a common understanding from the start. It's a minor addition that can improve clarity without significantly altering the text.
Implementation: After the first mention of "machine learning (ML) algorithms," consider adding a concise explanatory phrase. For example: "...overview of machine learning (ML) algorithms—computational techniques that enable systems to learn from and make predictions based on data—and their applications..."
The section effectively introduces Machine Learning (ML) by categorizing algorithms (e.g., supervised, unsupervised) and outlining a structured five-step application workflow (data gathering, preparation, algorithm choice, training, model improvement). This provides readers with a solid conceptual foundation and a clear understanding of the typical ML process before discussing specific healthcare applications.
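To make this five-step workflow concrete for readers less familiar with ML tooling, a minimal sketch is shown below using scikit-learn on synthetic data; the dataset, feature count, and choice of model are illustrative assumptions rather than details drawn from the reviewed paper.

```python
# Minimal sketch of the five-step ML workflow described above, using
# scikit-learn and a synthetic dataset (illustrative assumptions only).
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error

# 1. Data gathering: here, a synthetic stand-in for tabular patient data.
rng = np.random.default_rng(42)
X = rng.normal(size=(500, 8))                 # e.g., 8 admission/clinical features
y = 5 + 2 * X[:, 0] + rng.normal(size=500)    # e.g., length of stay in days

# 2. Data preparation: splitting and standardizing (null handling omitted
#    here because the toy data contains no missing values).
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
scaler = StandardScaler().fit(X_train)
X_train, X_test = scaler.transform(X_train), scaler.transform(X_test)

# 3. Algorithm choice and 4. training.
model = RandomForestRegressor(n_estimators=200, random_state=0)
model.fit(X_train, y_train)

# 5. Improve the model: evaluate, then iterate on features and hyperparameters.
print("MAE (days):", mean_absolute_error(y_test, model.predict(X_test)))
```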
The paper effectively illustrates the breadth of ML applications in healthcare, covering diagnosis, personalized medicine and treatment, and outbreak prediction. It substantiates the transformative potential of ML with a compelling clinical trial example on sepsis, which demonstrated significant reductions in hospital length of stay and in-hospital mortality, thereby highlighting tangible benefits to patient outcomes.
The section provides a comprehensive and realistic acknowledgment of the challenges associated with applying ML in healthcare. It enumerates key difficulties including inherent bias in ML, the lack of quality data, the need for accurate data annotation, the necessity of hyperparameter tuning, the requirement for large datasets, and the imperative for ethical and responsible validation of developed algorithms.
The text appropriately highlights specific algorithms, Random Forest and Support Vector Machine, detailing their particular strengths and suitability for complex tasks like predicting hospital Length of Stay (LoS), especially with multivariate time-series data. The mention of a comparative study supporting their efficacy further grounds these claims.
The statement that Random Forest's effectiveness is due to its ability to 'capture complex interactions that require intensive preprocessing due to its invariance' is potentially ambiguous and could be misconstrued by readers. Typically, invariance to certain data characteristics (e.g., feature scaling) reduces some preprocessing burdens. Clarifying this relationship—whether invariance allows RF to handle data that would otherwise need extensive preprocessing for other algorithms, or if the complex interactions themselves necessitate preprocessing regardless of RF's invariance—would significantly improve clarity. This is a medium-impact suggestion as it pertains to a nuanced characteristic of a key algorithm discussed for LoS prediction, and clarification belongs in this section where algorithm characteristics are detailed.
Implementation: Rephrase the sentence to explicitly state the nature of the interaction between invariance, complex data, and preprocessing. For example: 'Random Forest is particularly effective because its invariance to certain data transformations enables it to robustly capture complex interactions, even in datasets that might necessitate intensive preprocessing for other types of algorithms.' Alternatively, if the original intent was that the complexity itself drives preprocessing, state: 'Random Forest's ability to model complex interactions is valuable, though the inherent complexity of such data often requires intensive preprocessing, a step for which RF's invariance to certain aspects can be beneficial.'
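To illustrate the intended sense of "invariance," the following toy sketch (synthetic data, default scikit-learn settings, not drawn from the reviewed studies) shows that Random Forest predictions are unaffected by feature standardization, whereas a scale-sensitive model such as a support vector regressor is not, which is one common reading of how invariance reduces preprocessing needs.

```python
# Toy illustration: a Random Forest's predictions are unchanged when features
# are rescaled, while an SVM regressor's predictions change. Tree ensembles
# with a fixed random seed are expected to yield identical predictions here.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import RandomForestRegressor
from sklearn.svm import SVR

X, y = make_regression(n_samples=300, n_features=5, noise=10.0, random_state=0)
X[:, 0] *= 1000  # put one feature on a very different scale
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

scaler = StandardScaler().fit(X_tr)
X_tr_s, X_te_s = scaler.transform(X_tr), scaler.transform(X_te)

rf_raw = RandomForestRegressor(random_state=0).fit(X_tr, y_tr).predict(X_te)
rf_std = RandomForestRegressor(random_state=0).fit(X_tr_s, y_tr).predict(X_te_s)
print("RF predictions unchanged by scaling:", np.allclose(rf_raw, rf_std))

svr_raw = SVR().fit(X_tr, y_tr).predict(X_te)
svr_std = SVR().fit(X_tr_s, y_tr).predict(X_te_s)
print("SVR predictions unchanged by scaling:", np.allclose(svr_raw, svr_std))
```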
The term 'hyperparameter tuning' is listed as a challenge without further explanation. While this term is standard for ML specialists, readers from broader scientific disciplines within the journal's scope might not be familiar with it. Adding a brief, parenthetical definition would enhance the accessibility and educational value of this section for a wider audience. This is a low-impact suggestion that improves general comprehension of a listed challenge and fits naturally within the enumeration of challenges.
Implementation: Following the phrase 'Need for hyperparameter tuning,' insert a concise explanation. For instance: 'Need for hyperparameter tuning (i.e., the process of selecting the algorithm's configuration settings, which are fixed before training rather than learned from the data, to achieve the best performance).'
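For readers who want a concrete picture of what such tuning involves, the sketch below uses scikit-learn's GridSearchCV with cross-validation on synthetic data; the parameter grid and scoring choice are illustrative assumptions, not settings reported in the reviewed studies.

```python
# Illustrative sketch of hyperparameter tuning: searching over settings that
# are fixed before training (here, for a Random Forest) via cross-validation.
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

X, y = make_regression(n_samples=400, n_features=10, noise=5.0, random_state=0)

param_grid = {
    "n_estimators": [100, 200],       # number of trees
    "max_depth": [None, 10, 20],      # tree depth limit
    "min_samples_leaf": [1, 5],       # minimum samples per leaf
}
search = GridSearchCV(
    RandomForestRegressor(random_state=0),
    param_grid,
    scoring="neg_mean_absolute_error",
    cv=5,
)
search.fit(X, y)
print("Best hyperparameters:", search.best_params_)
print("Cross-validated MAE:", -search.best_score_)
```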
The outlined ML application steps include 'Data preparation,' which mentions cleaning, removing nulls, and standardizing data. Explicitly incorporating 'feature engineering'—the process of creating new, more informative input variables from existing data—into this step or the 'Improve the model' step would offer a more complete depiction of the ML pipeline. Feature engineering is often a critical determinant of model performance in healthcare applications. This is a medium-impact suggestion that would strengthen the comprehensiveness of the ML workflow description presented in this section.
Implementation: Amend the 'Data preparation' bullet point to include feature engineering. For example: 'Data preparation: The previously gathered data must be prepared for training the algorithms, and this includes cleaning, removing null values, standardizing the data [9], and often performing feature engineering to derive more predictive inputs.' Alternatively, it could be mentioned under 'Improve the model' as a strategy for enhancement.
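A brief, hypothetical illustration of feature engineering on EHR-style tabular data follows; the column names and derived features are assumptions chosen for illustration and are not features reported in the reviewed studies.

```python
# Hypothetical sketch of feature engineering on EHR-style data: deriving
# inputs that are often more informative than the raw fields.
import pandas as pd

df = pd.DataFrame({
    "admission_time": pd.to_datetime(["2023-01-02 08:30", "2023-01-05 22:10"]),
    "date_of_birth":  pd.to_datetime(["1950-06-01", "1988-03-15"]),
    "num_prior_admissions": [3, 0],
    "diagnosis_codes": [["I50", "E11"], ["J18"]],
})

df["age_at_admission"] = (
    (df["admission_time"] - df["date_of_birth"]).dt.days // 365
)
df["admitted_out_of_hours"] = (
    (df["admission_time"].dt.hour < 8) | (df["admission_time"].dt.hour >= 18)
)
df["num_diagnoses"] = df["diagnosis_codes"].str.len()
df["is_readmission"] = df["num_prior_admissions"] > 0

print(df[["age_at_admission", "admitted_out_of_hours",
          "num_diagnoses", "is_readmission"]])
```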
The section comprehensively covers the primary ethical domains pertinent to ML in healthcare, including patient privacy, data management frameworks, and fairness/bias mitigation. This breadth ensures that key ethical challenges are introduced and contextualized for the reader.
The discussion on patient privacy is particularly robust, clearly articulating its paramount importance and detailing specific, tangible risks such as data security breaches, re-identification through data aggregation, and the limitations of anonymization techniques. This level of detail effectively highlights the complexities involved in safeguarding sensitive patient data in ML applications.
The subsection on Fairness and Bias Mitigation offers a clear, structured, and actionable framework. By outlining the process as identifying bias, mitigating bias, and evaluating impact, it provides a practical approach for addressing this critical ethical concern in the development and deployment of ML models in healthcare.
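The identify/mitigate/evaluate framing can be made tangible with a short sketch of the "identify" and "evaluate" steps: comparing a model's error across demographic groups. The data, sensitive attribute, and error values below are synthetic assumptions for illustration only.

```python
# Minimal sketch of the "identify bias" and "evaluate impact" steps:
# per-group error comparison on a synthetic evaluation set.
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
eval_df = pd.DataFrame({
    "sex": rng.choice(["F", "M"], size=200),          # hypothetical attribute
    "true_los": rng.gamma(shape=2.0, scale=3.0, size=200),
})
eval_df["predicted_los"] = eval_df["true_los"] + rng.normal(0, 1.5, size=200)

# Identify bias: per-group mean absolute error and mean signed error.
eval_df["abs_error"] = (eval_df["predicted_los"] - eval_df["true_los"]).abs()
eval_df["signed_error"] = eval_df["predicted_los"] - eval_df["true_los"]
per_group = eval_df.groupby("sex")[["abs_error", "signed_error"]].mean()
print(per_group)

# Evaluate impact: flag a large gap between groups for further mitigation
# (e.g., reweighting, feature audits, or group-specific calibration).
gap = per_group["abs_error"].max() - per_group["abs_error"].min()
print("MAE gap between groups:", round(gap, 3))
```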
The section effectively grounds the discussion of data management in established regulatory standards by referencing key frameworks like HIPAA and GDPR. This connection to real-world legal and ethical guidelines enhances the practical relevance and authority of the considerations presented.
The introductory paragraph of Section 3 explicitly mentions 'informed consent, and data usage' as key ethical issues to be explored. However, these topics are not subsequently developed with the same depth as patient privacy, data management, or fairness/bias mitigation. Given that informed consent is a foundational ethical principle in medicine, and its application to complex ML predictions (e.g., regarding data reuse, model interpretability for patients, consent for evolving algorithms) is particularly nuanced, a more dedicated discussion is warranted. This is a high-impact suggestion as it addresses a critical, yet underdeveloped, ethical dimension within the section. Elaborating on these aspects here would significantly strengthen the paper's ethical analysis.
Implementation: Consider adding a distinct subsection (e.g., 3.4 Informed Consent and Ethical Data Usage, which would require renumbering the current 3.4) or significantly expanding the brief mention of informed consent within subsection 3.1. This expanded discussion should address challenges in obtaining meaningful consent for ML applications, the scope of consent for secondary data use, patients' rights to understand ML-driven decisions, and ethical guidelines for data usage beyond general privacy protections.
While the section adeptly discusses ethical principles and data management frameworks, it could be enhanced by addressing the practical mechanisms for ethical oversight and accountability in the context of ML in healthcare. Discussing the role of bodies like Institutional Review Boards (IRBs) or dedicated AI ethics committees in the review, approval, and ongoing monitoring of ML systems, as well as frameworks for accountability when ethical issues (e.g., biased outcomes, privacy breaches) arise, would add a crucial layer of governance. This is a medium-impact suggestion that would bridge the gap between ethical principles and their operational enforcement, fitting well within the scope of 'Ethical Considerations'.
Implementation: Incorporate a paragraph or a brief new subsection that discusses the necessity of robust ethical governance structures. This could include the adaptation of existing oversight mechanisms (like IRBs) for ML, the potential need for specialized AI ethics review boards, processes for continuous monitoring of deployed ML models for ethical drift or unintended consequences, and clear institutional accountability for the development, deployment, and impact of these technologies.
Subsection 3.4, 'Features to Consider in Hospital Admission and Hospital LoS Predictions,' lists various data features like patient demographics and admission details. While relevant to model development, its inclusion under 'Ethical Considerations' feels somewhat disconnected without an explicit bridge to the section's core ethical themes. Clarifying how the selection and utilization of these specific features can intersect with ethical principles such as fairness, bias, and privacy would improve the section's coherence. This is a medium-impact suggestion that would better integrate subsection 3.4 into the overarching ethical discussion.
Implementation: Add a concise introductory statement to subsection 3.4, or a concluding one to 3.3, that explicitly links feature selection to ethical responsibilities. For example, explain how choices regarding demographic features (e.g., age, gender) must be carefully evaluated to prevent algorithmic bias and discrimination, or how the collection of detailed admission and clinical data must be balanced against data minimization principles to protect patient privacy, ensuring that selected features do not inadvertently perpetuate societal inequities.
The explicit adoption and detailed description of Kitchenham's methodology for the literature review (Section 4.1) provide a strong foundation of rigor and transparency. Clearly outlining the steps from research question definition to data extraction allows readers to understand the systematic process undertaken to identify and select relevant studies, enhancing the review's credibility.
The paper clearly defines its search strategy, including the specific search string used in Google Scholar, the rationale for exclusions (e.g., "COVID-19," "review"), and the multi-stage filtering process (preconditions, inclusion/exclusion criteria). This clarity (Sections 4.1.2-4.1.5, Figure 1) allows for potential replication and demonstrates a focused approach to literature gathering.
Section 4.3 provides a thorough explanation of the key performance metrics (MAE, Accuracy, R2 Score, F1 Score, RMSE) used to evaluate LoS prediction models. Defining each metric and providing its formula ensures that readers understand how model performance is assessed and compared across different studies, which is crucial for interpreting the results of the review.
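For readers outside the ML community, the sketch below shows how these metrics are typically computed in practice with scikit-learn; the toy values and the 7-day "prolonged stay" threshold are assumptions for illustration and do not correspond to any reviewed study.

```python
# Sketch of the metrics discussed in Section 4.3: regression metrics for LoS
# in days, classification metrics for a binarized "prolonged stay" label.
import numpy as np
from sklearn.metrics import (mean_absolute_error, mean_squared_error,
                             r2_score, accuracy_score, f1_score)

y_true = np.array([3.0, 7.0, 2.0, 10.0, 5.0])   # observed LoS (days)
y_pred = np.array([4.0, 6.5, 2.5, 12.0, 4.0])   # predicted LoS (days)

mae = mean_absolute_error(y_true, y_pred)
rmse = mean_squared_error(y_true, y_pred) ** 0.5
r2 = r2_score(y_true, y_pred)

threshold = 7  # hypothetical cutoff: "prolonged stay" if LoS exceeds 7 days
cls_true = (y_true > threshold).astype(int)
cls_pred = (y_pred > threshold).astype(int)
acc = accuracy_score(cls_true, cls_pred)
f1 = f1_score(cls_true, cls_pred)

print(f"MAE={mae:.2f}  RMSE={rmse:.2f}  R2={r2:.2f}  Acc={acc:.2f}  F1={f1:.2f}")
```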
The "Analysis of Selected Papers" (Section 4.4) systematically summarizes the 12 chosen articles. For each paper, it generally outlines the dataset, methods, and key findings, often referencing Table 2 for dataset overviews. This structured approach helps in understanding the landscape of current research in ML for LoS prediction.
While Kitchenham's methodology is stated as "extensively used in computer science research," briefly justifying why it was specifically chosen for this particular review on ML in healthcare LoS prediction would strengthen the methodological rationale. This is a low-to-medium impact suggestion that adds a layer of deliberate choice to the methodology description. It belongs in Section 4.1 where the methodology is introduced, as it sets the stage for the entire review process.
Implementation: Add a sentence after introducing Kitchenham's methodology, explaining its suitability. For example: "Kitchenham's methodology [26]... was selected due to its structured approach to defining scope, systematically searching literature, and extracting data, which is well-suited for a comprehensive review of ML applications requiring clear process documentation and potential for reproducibility in a rapidly evolving field like healthcare AI."
Section 4.1 mentions "(f) data extraction" as the final step of Kitchenham's methodology. However, the "Materials and Methods" section does not explicitly detail what specific data items were systematically extracted from each of the 12 selected papers to answer the review's research questions (e.g., specific algorithms, dataset characteristics, reported metrics, challenges highlighted). Providing a brief overview of the data extraction form or key data points sought would enhance transparency and methodological rigor. This is a medium-impact suggestion crucial for understanding how the synthesis in the Discussion/Results is derived, and it belongs in Section 4.1 as part of the protocol description.
Implementation: Add a subsection (e.g., 4.1.6 Data Extraction Items) or a paragraph within 4.1 detailing the key information extracted from each paper. For example: "For each of the 12 selected articles, the following data points were extracted: (1) study objectives, (2) dataset source and characteristics (size, features), (3) ML algorithms employed, (4) data preprocessing techniques, (5) key performance metrics reported (e.g., MAE, R2, Accuracy), (6) main findings regarding LoS prediction, and (7) identified limitations or challenges."
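As an illustration only, such an extraction form could be represented as a simple structured record per study. The sketch below mirrors the suggested extraction items; the field names are hypothetical, and the example values are limited to details this review itself reports for study [38] (792 Total Hip Arthroplasty samples, XGBoost, R2 = 0.89), with the remaining fields left as placeholders.

```python
# Hypothetical structured record for systematic data extraction, mirroring the
# items suggested above. Field names are illustrative, not from the paper.
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class ExtractionRecord:
    study_id: str
    objectives: str
    dataset_source: str
    dataset_size: Optional[int]
    algorithms: List[str] = field(default_factory=list)
    preprocessing: List[str] = field(default_factory=list)
    metrics: dict = field(default_factory=dict)   # e.g., {"R2": 0.89}
    findings: str = ""
    limitations: str = ""

example = ExtractionRecord(
    study_id="[38]",
    objectives="Predict LoS for total hip arthroplasty patients",
    dataset_source="Total Hip Arthroplasty data (per Table 2)",
    dataset_size=792,
    algorithms=["XGBoost"],
    metrics={"R2": 0.89},
)
print(example)
```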
The paper states that "COVID-19", "malnutrition", "transfusion", and "review" were removed from the search string "to narrow the search, since the intention was to predict hospital LoS in general." While excluding "COVID-19" (due to its unique impact) and "review" (to focus on primary studies) is clear, the rationale for excluding "malnutrition" and "transfusion" could be briefly elaborated, as these conditions can be significant general factors influencing LoS and might not always indicate an overly narrow subpopulation. Clarification would prevent potential misinterpretation of the search scope. This is a low-impact suggestion for enhanced clarity in Section 4.1.2, where the search string is defined.
Implementation: Expand slightly on the reasoning for these specific exclusions. For example: "...Four words/expressions were removed...: “COVID-19” (to avoid focus on pandemic-specific LoS alterations), “malnutrition”, “transfusion” (to prevent an over-concentration of studies on highly specific patient cohorts or interventions rather than broader LoS prediction models), and “review” (to exclude secondary literature from this primary search phase)."
Section 4.2 presents "Most Commonly Used ML Algorithms in Healthcare" (Table 1), which provides a general overview. Section 4.4 then analyzes papers specifically on LoS prediction. The transition and direct relevance of Table 1 to the specific LoS context could be made more explicit within the methods section. It is currently unclear if Table 1 is derived from the 12 selected papers or broader healthcare literature, and how it directly informs or contrasts with the algorithms found to be prominent in the LoS-specific studies. This is a medium-impact suggestion for improving coherence between these subsections and clarifying the methodological flow.
Implementation: Add a sentence at the end of Section 4.2 or the beginning of Section 4.4 to bridge this. If Table 1 is general, state: "While Table 1 provides a broad overview of ML algorithms prevalent in healthcare, the subsequent analysis in Section 4.4 will focus on the specific algorithms and their performance as reported in the 12 studies selected for their direct relevance to hospital LoS prediction." If Table 1 is derived from the selected papers, this should be clearly stated in the introduction to Section 4.2.
The Results section begins by clearly outlining its objective to provide a comparative analysis of ML algorithms from the selected studies, focusing on datasets, methods, metrics, and performance. This establishes clear expectations for the reader regarding the content and purpose of this part of the paper.
The section effectively utilizes Table 3 to present a comprehensive comparison of algorithms and their reported performance metrics from the reviewed studies. This tabular format allows for a concise and accessible overview of diverse results, facilitating easier comparison across different methodologies and studies presented in the literature.
Subsections 5.1 and 5.2 provide succinct textual introductions to the diversity of datasets and the range of ML algorithms employed in the reviewed literature, respectively. These summaries effectively preface the detailed information presented in Table 2 (referenced for datasets) and Table 3 (presented for algorithms and metrics).
While Table 3 effectively presents the comparative metrics, the Results section (specifically 5.2) would be strengthened by a brief narrative summary that highlights key observations or trends directly from this table. For instance, pointing out algorithms that frequently appear with high accuracy or low error rates, or noting the range of performance for certain algorithm types, would provide immediate context and guide the reader's interpretation before they encounter the more in-depth analysis in the Discussion section. This is a medium-impact suggestion that enhances the Results section's role in presenting findings, not just raw data, and aligns with typical conventions for a Results section.
Implementation: After introducing Table 3 in subsection 5.2, add a short paragraph summarizing salient points. For example: "Table 3 reveals a spectrum of performance across the applied algorithms. Notably, neural network approaches, as seen in study [29], achieved high accuracy (94.74%), while XGBoost in study [38] demonstrated a strong R2 score of 0.89. Conversely, some studies, such as [39], did not specify quantitative metrics for certain algorithms, highlighting variability in reporting practices."
Subsection 5.1, "Overview of Datasets," is currently very concise, primarily directing readers to Table 2 (which is located in the Methods section on page 11). To make the Results section more self-contained in presenting its findings related to datasets, a brief textual summary of key dataset characteristics drawn from Table 2 could be included directly within subsection 5.1. This would involve highlighting the range of dataset sizes, common data sources (e.g., MIMIC, EHRs), and the typical number or types of features used in the reviewed studies. This is a medium-impact suggestion that improves the flow and comprehensiveness of the Results section itself, making it easier for readers to grasp dataset context without immediately flipping back to the Methods.
Implementation: Expand subsection 5.1 with a few sentences summarizing key aspects from Table 2. For example: "The studies analyzed utilized diverse datasets, as detailed in Table 2 (page 11). These ranged from datasets with a few hundred patient records (e.g., 792 samples in study [38] using Total Hip Arthroplasty data) to large-scale repositories containing over 350,000 records (study [34], for which the data source is listed as unspecified). Common data sources included Electronic Health Records and specialized clinical databases like MIMIC-II/III, with the number of features varying widely, from 6 features in neurosurgical cases [40] to 46 features in broader EHR data [34]."
Table 3 reports "Not Specified" for accuracy values for algorithms in study [39]. While this may accurately reflect the information available from the source paper, the Results section could explicitly state that these metrics were not provided in the original publication. This clarification, either in the narrative of subsection 5.2 or as a footnote to Table 3, would prevent reader ambiguity about whether the data is missing or if "Not Specified" implies a particular outcome (e.g., zero or not applicable). This is a low-impact suggestion for enhancing clarity and completeness of the presented results within this section.
Implementation: Add a sentence in subsection 5.2 after introducing Table 3, such as: "It should be noted that where 'Not Specified' appears in Table 3 for performance values, this indicates that the corresponding quantitative metric was not reported in the original cited study." Alternatively, add a footnote directly to Table 3: "*'Not Specified' indicates that the metric was not quantitatively reported in the source publication."
The Results section contains interpretative statements about study [39] on page 13, including a self-referential conclusion by the current paper's authors ("In the Results Section, we concluded...") and a claim ("In article [39], although the values are specified, the author concludes that the support vector machine was the most accurate model"). This latter claim directly contradicts Table 3, where performance metrics for study [39] are listed as "Not Specified." Such interpretations, authorial conclusions, and discussions of individual study findings typically belong in the Discussion section. More importantly, the factual inconsistency regarding whether performance values for study [39] were specified needs to be resolved for the paper's credibility. This is a high-impact suggestion as it addresses both the structural appropriateness of a Results section and a significant internal inconsistency.
Implementation: 1. Remove the two paragraphs from page 13: "In the Results Section, we concluded that the support vector machine model..." and "In article [39], although the values are specified...". 2. Verify the original study [39] (Byraboina & Yohan, 2022, ZKG Int.). If performance values are specified in that source, Table 3 of this review must be updated accordingly. If they are not specified in study [39], then the statement on page 13 of this review is erroneous and should be corrected or removed when discussing study [39] in the Discussion section (Section 6). 3. Relocate the substantive (and verified) discussion of study [39]'s findings concerning SVM performance to the Discussion section.
The discussion effectively categorizes and summarizes the performance of various ML algorithms (Regression, Neural Networks, Ensembles, etc.) based on the reviewed literature, providing a clear comparative landscape for understanding their relative effectiveness and efficiency in LoS prediction.
The section moves beyond simple performance reporting to critically analyze the practical limitations of top-performing algorithms like Neural Networks and XGBoost. It thoughtfully discusses context-dependent challenges such as data preprocessing demands, parameter tuning sensitivity, computational costs, overfitting risks on small datasets, and the interpretability issues posed by "black-box" models, which are crucial considerations for real-world healthcare applications.
The discussion systematically addresses each of the four research questions posed earlier in the paper (page 2). It synthesizes the findings from the literature to provide clear, itemized answers regarding key factors influencing LoS, types of ML algorithms used, their comparative performance, and the benefits/limitations of ML models in this context.
A significant strength is the consistent emphasis that the optimal algorithm choice is not universal. The discussion highlights that selection depends heavily on specific dataset characteristics, the quality of data, the extent of preprocessing and tuning, available computational resources, technical expertise, and the particular clinical application context, thereby promoting a nuanced understanding of ML model deployment.
The critical analysis effectively highlights challenges like model complexity and interpretability. Expanding this to discuss actionable strategies or specific research avenues needed to bridge the gap between high-performing ML models and their practical, routine adoption in clinical settings would be a high-impact addition. This involves not just technical solutions (e.g., XAI advancements) but also organizational, ethical, and workflow integration considerations, which are pertinent to a Discussion section aiming to translate research findings into real-world impact and guide future work.
Implementation: Add a paragraph discussing potential pathways to overcome implementation hurdles. This could include: (1) advocating for research into more inherently interpretable yet powerful models suitable for clinical decision support, (2) suggesting the development of standardized evaluation frameworks that include clinical utility and ease-of-integration metrics alongside technical performance, (3) emphasizing pilot studies focusing on integration into hospital IT systems and clinical workflows, and (4) highlighting the role of multidisciplinary teams (clinicians, data scientists, IT specialists, ethicists) in co-designing and deploying these systems responsibly.
The statement 'the analysis determined that the neural network algorithm is the best-performing algorithm in this study on the datasets tested' on page 17 could be slightly ambiguous. Clarifying whether 'this study' refers to the current literature review's overall synthesis of findings across multiple papers, or if it's emphasizing a conclusion drawn from a particularly strong primary study cited within the review, would enhance precision. This is a low-impact suggestion for improved clarity in the Discussion's concluding remarks.
Implementation: Rephrase for enhanced clarity. For example: 'As a result, our synthesis of the analyzed literature suggests that the neural network algorithm frequently emerged as a top-performing approach across the various datasets and contexts examined in the reviewed studies.' Or, if highlighting a specific study's conclusion that strongly supports this, ensure that study is explicitly tied to the assertion in that sentence.
The Discussion section introduces Table 4 ('Best-performing algorithms') when beginning to rank methods. While the table is useful, its relationship to the initial, more detailed breakdown of algorithm categories (Regression Models, Neural Networks, etc.) on page 15 could be signposted earlier. A brief mention of Table 4 as a summary of top performers when these categories are first introduced might help orient the reader more effectively. This is a low-impact suggestion focused on improving the structural flow and integration of tabular information within the narrative of the Discussion.
Implementation: Consider a brief introductory reference to Table 4 when starting the comparative overview. For example, the first paragraph on page 15 could state: 'This section presents a comprehensive comparison of various algorithms and metrics... with a summary of standout performers highlighted in Table 4. Performance analysis is a crucial aspect...'. This would prime the reader for the summarized data before they encounter the detailed ranking.