Abstract
Neonatal medical data holds critical information within the healthcare industry, and it is important to analyze this data effectively. Machine learning algorithms offer powerful tools for extracting meaningful insights from the medical data of neonates and improving treatment processes. Knowing the length of hospital stay in advance is very important for managing hospital resources, healthcare personnel, and costs. This study therefore aims to estimate the length of stay for infants treated in the Neonatal Intensive Care Unit (NICU) using machine learning algorithms. Using a unique dataset, we performed a two-class prediction of long-term versus short-term stays. The proposed hybrid approach, called Classifier Fusion-LoS, consists of two stages. In the first stage, various classifiers were employed, including classical models such as Logistic Regression, ExtraTrees, Random Forest, KNN, and Support Vector Classifier, as well as ensemble models like AdaBoost, GradientBoosting, XGBoost, and CatBoost. Random Forest yielded the highest validation accuracy at 0.94. In the second stage, the Voting Classifier, an ensemble method, was applied, raising accuracy to 0.96. Our method outperformed existing studies in terms of accuracy, including both neonatal-specific and general length of stay prediction research. Beyond guiding patient treatment protocols, length of stay estimation offers insight into whether incubators in NICUs, which are not available in every city, are likely to be free for new admissions. The research also provides hospital management with crucial information for planning beds, equipment, personnel, and costs.
1 Introduction
Neonatal medical data are used for various purposes such as evaluating the health status of neonates, making clinical decisions, diagnosing diseases, and supporting patient treatment and care processes. By analyzing the important information contained in this data with artificial intelligence algorithms, it may be possible to achieve important goals such as early disease diagnosis, creating appropriate treatment plans and conditions, and supporting healthy growth and development [1]. Machine learning algorithms can also contribute to scientific research on neonatal health, thanks to their ability to process large amounts of data and uncover complex relationships [2, 3]. By employing these algorithms, it becomes possible to monitor the health status of newborns, anticipate potential treatment-related challenges, and proactively take necessary precautions [4].
Length of stay (LoS) is a clinical measure that indicates the time between a patient's admission to the hospital and discharge. According to the World Health Organization (WHO), the length of hospital stay is a significant performance measurement and monitoring indicator [5]. In hospitals, it is crucial to anticipate patient discharge times to efficiently manage resources such as beds, intensive care units, and especially devices like ventilators, ensuring timely access to critical medical needs, controlling hospital costs, and enhancing service efficiency. Furthermore, predicting LoS is essential not only for healthcare institutions but also for patients, their families, and insurance companies, as it aids in requirements and cost planning [6, 7].
Similarly, predicting LoS for neonatal intensive care patients serves as an indicator of health status and holds paramount significance in bed allocation, ensuring a consistent healthcare staff roster, and evaluating the resources within healthcare facilities [6].
Recently, there has been an increasing trend toward medical applications utilizing artificial intelligence techniques. Machine learning, big data analysis, and deep learning (DL) are among the artificial intelligence techniques employed in predicting LoS. However, the fact that the model often functions as a black box is not desirable for medical professionals. Consequently, Explainable Artificial Intelligence (XAI) studies have become widespread [8]. Explainable systems can be categorized into two main groups: post hoc and ante hoc systems. Post hoc systems explain the relationships between features obtained through the model after feature extraction, while ante hoc systems produce inherently understandable relationships. Examples of ante hoc systems include methods like Fuzzy Logic and decision trees (DT) [9, 10]. Since machine learning methods are primarily ante hoc systems, they are generally preferred in studies aiming to predict LoS.
The main motivations and contributions of this study are as follows:
- In the literature, there is a noticeable scarcity of studies related to artificial intelligence and machine learning in the analysis of neonatal medical data. Addressing the deficiency in this area is one of our primary motivations.
- The study introduces a hybrid approach called Classifier Fusion-LoS (CF-LoS), which comprises two stages for predicting LoS.
- CF-LoS involves testing both established and novel classifiers. Classification accuracy is then significantly enhanced by employing the Voting Classifier technique to combine the high-performing results of these diverse classifiers, providing a new way to improve prediction performance.
This study aims to develop a machine learning-based prediction model that classifies the length of stay (LoS) of patients hospitalized in the Neonatal Intensive Care Unit (NICU) as either long-term or short-term hospitalization using an original dataset. For this purpose, the epicrisis reports of 512 babies hospitalized in the NICU of the Private Eastern Anatolia Hospital in Elazig, Turkey were retrospectively examined with the necessary ethical permissions. Patients referred to another hospital were excluded from the study. Consequently, 12 features from 453 patients were included in the dataset used for predicting LoS. The dataset underwent preprocessing, including digitizing categorical data. Subsequently, the testing and validation processes were conducted in two stages within the CF-LoS method. In the first stage, various classifiers, including Logistic Regression (LR), ExtraTrees (ET), Random Forest (RF), K-Nearest Neighbor (KNN), SVC, AdaBoost, GradientBoosting (GB), XGBoost (XGB), and CatBoost, were employed. In the second stage, Voting Classifiers, namely Soft Voting and Hard Voting, were applied to enhance the overall classification accuracy. With CF-LoS, LoS was predicted by binary classification as either long-term or short-term hospitalization, using the categorical and numerical data of patients hospitalized in the NICU. Classification performance was evaluated using metrics such as Accuracy, Precision, Recall, F1-score, and Area Under the Curve (AUC) and then compared with similar studies in the literature.
The article is organized as follows: First, in Sect. 2, LoS prediction studies for NICU patients are extensively covered. Additionally, other artificial intelligence studies for LoS prediction are mentioned in this section. Section 3 provides detailed explanations of the dataset and classifiers used. Section 4 encompasses the application and results. Section 5 includes comparisons with other studies in the literature and discusses limitations. The article concludes with the Conclusion section.
2 Literature review
2.1 LoS prediction studies for NICU patients
Research in the field of LoS prediction for NICU patients is relatively scarce. One of the earliest studies in this area was conducted by Zernikow et al. (1999), utilizing Multiple Linear Regression (MLR) and Artificial Neural Network (ANN) techniques. They retrospectively collected data from 2144 prematurely born patients over 7 years, excluding those who were transferred to another hospital without discharge or who passed away during their stay. The performance of the study was assessed using the linear correlation coefficient (r). For the MLR model, the calculated r ranged from 0.85 to 0.90, and a range of 0.87–0.92 was observed for ANN [11].
In their study, Feldman and Chawla (2014) developed a machine learning model called ADMIT to estimate LoS, which they consider important for appropriately adjusting the number of nurses caring for babies treated in the Neonatal Intensive Care Unit. For this purpose, they used a dataset containing the characteristics of 28,061 babies. They reduced the dataset by selecting 7 patient features. They achieved an average AUC of 0.97 with the ADMIT model, an improved version of the AdaBoost Algorithm [12]. In another study, Thompson et al. (2018) developed a long-term LoS prediction application for newborns using machine learning methods. They aimed to predict costs for healthcare services. They utilized the 2014 Florida HCUP SID dataset, which includes administrative data and simple clinical data. They experimented with methods such as Naive Bayes (NB), SVM, C4.5, and Random Tree, achieving the best result with the RF method with an AUC of 0.877 [6]. Mansouri et al. (2022) applied machine learning-based methods to estimate LoS for patients hospitalized in the NICU. They utilized demographic information and post-initial presentation diagnostic data sourced from the MIMIC III database. Their regression analysis study achieved an R2 value of 0.78, indicating its performance [13]. Lin et al. (2022) estimated LoS for very-low-birth-weight babies in a hospital in Taiwan using machine learning. They pursued two distinct approaches, seeking continuous and categorical results. For this, they used a dataset consisting of 21 features and 6 different algorithms. Patients who died before 7 days were not included in the dataset. Among the models with binary classification as late discharge/early discharge, LR achieved an AUC of 0.724, while the RF method achieved an AUC of 0.712 [14]. Juraev et al. (2022) implemented a machine learning application that predicts neonatal mortality and LoS using a multilayer dynamic ensemble model. They employed classification for mortality prediction and regression for LoS prediction, utilizing the MIMIC III dataset, which includes time-series data from 3133 patients. The predictive performance resulted in an RMSE of 12.509 and an R2 of 78.345 [15].
2.2 Other artificial intelligence studies for LoS prediction
In recent years, there has been a notable rise in studies focused on LoS prediction. The majority of these studies utilize machine learning methods to predict LoS based on patient information collected at admission. A smaller subset explores LoS prediction through big data analysis and DL. Age, gender, and diagnostic criteria are commonly used features across many of these studies, while other features vary according to the studies [7].
The pioneering LoS estimation study was conducted by Tu and Guerriere (1993). They employed the Neural Network (NN) using a dataset of 1409 patients to predict the LoS in the Intensive Care Unit (ICU) following cardiac surgery. They classified LoS into three groups: low, medium, and high. Their model achieved an AUC of 0.69 in the test [16]. Lowell and Davis (1994) conducted a similar study to estimate LoS with NN for psychiatric patients admitted to the hospital. The dataset included demographic information, place of residence, and disease severity, in addition to factors influencing LoS. The accuracy of the proposed method was compared with the predictions of the clinical treatment team. In all cases, the proposed method provided the same or better predictions than those of the treatment team [17].
Hachesu et al. (2013) applied DT, SVM, and ANN to a dataset containing 36 different features of 4948 patients for LoS prediction. Among these three algorithms, the SVM method achieved the best result with an accuracy of 96.4%. On the other hand, the advantage of DT lies in creating easily understandable model rules, a quality highly valued by medical experts. Notably, the analysis revealed longer LoS in patients with lung diseases, respiratory disorders, and high blood pressure [18]. Tsai et al. (2016) estimated LoS using two different datasets. While the first uses the data collected at the pre-admission stage, the second uses all data from hospitalized patients. Thus, the LoS estimate at the discharge stage was used to evaluate the LoS estimate at the pre-admission stage. The dataset includes 2377 cardiac patients diagnosed with Heart Failure (HF), Acute Myocardial Infarction (AMI), and Coronary Atherosclerosis (CAS). The highest accuracy achieved for CAS before discharge is 89.64%, whereas during the pre-admission phase, it reaches 89.65%. Similarly, for HF, the highest accuracy before discharge is 64.12%, compared to 63.69% during the pre-admission stage. In the case of AMI, the highest accuracy before discharge stands at 66.07%, while during the pre-admission phase, it reaches 65.72% [19].
Tang et al. (2018) performed classification for four different tasks using data from the MIMIC III database, which included intensive care patients. These tasks encompass mortality prediction, LoS prediction, differential diagnosis, and readmission prediction. In LoS estimation, a multiple classification model was created for the periods (1–2), (3–5), (5–8), (8–14), (14–21), (21–30), (30+). Among the proposed models, the Collapse models consist of standard machine learning models such as SVM, RF, GB, LR, and Multilayer Perceptron (MLP), while the Sequential models include two RNN models, LSTM, and CNN-LSTM. In terms of LoS prediction, the Collapse models achieved AUC values ranging from 0.64 to 0.88 for different classes, while the Sequential models obtained AUC values ranging from 0.65 to 0.93 [20]. Jalali et al. (2020) used five different methods to determine postoperative death and long-term hospital stay. These methods included Ridge Regression (RR), DT, RF, GB, and Deep Neural Network. The Deep Neural Network model outperformed the others, achieving an accuracy of 89 ± 4% and an AUC of 0.95 ± 0.02 [21].
Castineira et al. (2020) retrospectively collected vital continuous time data and clinical history data of 284 patients hospitalized in the Pediatric Intensive Care Unit of Boston Children’s Hospital. They estimated LoS using a machine learning method based on the Gradient Boosting Decision Tree (GBDT), similar to our study, classifying it into short or long duration. They achieved an AUC of 83% using only vital data, whereas the combination of vital and clinical history data yielded an AUC of 90% [22]. Bacchi et al. (2021) used ANN and LR models to predict hospital discharge probabilities within 2 and 7 days based on daily epicrisis reports of patients in General Medicine at Royal Adelaide Hospital. The models achieved an AUC of 0.80 for discharge within 2 days for ANN and an AUC of 0.78 for LR. For discharge within 7 days, the AUC was 0.68 for ANN and 0.61 for LR [23].
Unlike many previous studies, Abd-Elrazek et al. (2021) utilized general medical characteristics to estimate LoS, in addition to the disease diagnosis collected at the patient admission stage. They employed various machine learning methods for LoS prediction, including Classification Trees (CT), ANN, Tree Bagger (TB), RF, FL, SVM, KNN, Regression Tree (RT), and NB. Among these methods, FL achieved the highest accuracy at 92% [5]. Wu et al. (2021) employed four different machine learning models to predict prolonged LoS in the General Intensive Care Unit. These models included RF, SVM, DL, and GBDT. They utilized eICU and MIMIC III databases to train and validate the models. As a result, the GBDT model achieved the highest performance with an AUC value of 0.747. Comparisons with the Simplified Acute Physiology Score (SAPS-II) showed that all machine learning methods achieved more successful results [24].
Sinha et al. (2021) developed models for initially diagnosing COVID-19 and subsequently predicting LoS. They employed the Catboost model, a multilayer data processing approach, for LoS prediction. The average prediction accuracy for the two models was calculated as 92.33% [25]. Kalgotra and Sharda (2021) proposed a big data analysis and deep learning-based model for predicting LoS. They investigated hidden comorbidities using the records of 10 million patients hospitalized for different diseases in hospitals in the USA. As a result, their model achieved an MAE of 29.8% in LoS estimation [26].
Chrusciel et al. (2021) conducted two different LoS predictions employing the RF model. One data model utilized clinical reports, while the other incorporated age, gender, and primary diagnosis. They retrospectively collected records of patients admitted from the Emergency Department with hospital stays exceeding 2 days. The unstructured model, which received data from plain text, achieved an accuracy of 75.0%, while the structured data model achieved an accuracy of 74.1%. A similar evaluation focused solely on intensive care patients yielded results of 76.3% and 75.0% for the unstructured and structured data models, respectively [27].
In another study, Bacchi et al. (2022) prospectively obtained data for 4 months from the Royal Adelaide Hospital and 12 months from the Queen Elizabeth Hospital for LoS estimation. They estimated LoS using ANN with the Natural Language Processing technique applied to patient admission reports. The models achieved an AUC of 0.74 for discharge within 2 days and 0.67 for discharge within 7 days [28].
Kim et al. (2022) used data from 27,747 patients to estimate the LoS in Emergency Departments. They employed machine learning models including LR, XGBoost, NGBoost, SVM, and DT. The SVM and XGBoost methods achieved the highest accuracy of 0.896 [29]. Zeng (2022) predicted LoS using five machine learning algorithms named Lasso Regression (LSR), RR, RF, Lightweight Gradient Boosting Machine (LightGBM), and XGBoost and six feature-coding methods. The dataset, sourced from hospitalization records provided by the New York State Department of Health, consisted of 34 features encompassing patient characteristics, diagnosis, treatment, services, and costs. Performance was assessed using MSE and R2, yielding values of 2.231 and 96.0%, respectively [30]. Alabbad et al. (2022) used four machine learning models, namely RF, Gradient Boosting (GB), XGBoost, and Ensemble, to predict intensive care needs and LoS for COVID-19 patients during the epidemic. For this purpose, they utilized the dataset of 895 COVID-19 patients admitted to King Fahad University Hospital in Saudi Arabia. The RF model reached the highest accuracy at 94.16%. Their findings highlighted the significance of age, CRP, and days on nasal oxygen support as key factors influencing LoS in the Intensive Care Unit [31].
Saravi et al. (2022) employed machine learning and DL algorithms to predict LoS for patients undergoing lumbar spine decompression surgery. The study involved the use of K-means and Unsupervised Learning for data clustering. Then, for LoS classification, LR, RF, Stochastic Gradient Descent (SGD), KNN, DT, Gaussian Naive Bayes (GaussianNB), SVM, CNN, MLP, and Radial Basis Function Neural Network (RBNN) models were used. A decision support system based on the CHAID algorithm was also added to the models. In the dataset containing 236 patients, 14 features were used to train the models. The proposed models achieved AUC values ranging from 67.5 to 87.3%. As a result, it has been observed that the most decisive feature in the LoS prediction of the proposed models is the operation time [32].
Kadri et al. (2023) applied a generative adversarial network (GAN) model to estimate the LoS for patients admitted to a Pediatric Emergency Department in France. They compared the results with different deep learning (Deep Belief Networks, CNN, Stacked Auto-Encoder) and machine learning (SVM, RF, AdaBoost, DT) models. It has been observed that the GAN model gives better results than other models. They achieved an R2 and EV score of 0.87, along with an MAE score of 61.72 [33].
The prominent limitations of LoS prediction studies for NICU can be listed as follows:
1. It is challenging to predict extremely long LoS effectively using regression analysis. Zernikow et al. [11] emphasized this limitation in their study: they were generally unable to accurately predict hospital stays exceeding 140 days. To address this constraint, we adopted a classification approach to enhance the precision of neonatal LoS predictions.
2. In machine learning studies, using a very large dataset can sometimes cause overfitting rather than demonstrating the success of the model. Feldman and Chawla [12] emphasize the difficulties they experienced in applying the same method to other medical units, despite having 28,061 records. To overcome this limitation, we used a more reasonably sized dataset.
3. One of the most important limitations of LoS studies is the lack of data unity. Thompson et al. [6] emphasized the importance of ensuring data unity to apply studies to different states. This also limits the realistic comparison of proposed methods in the literature. To support data unity in future studies, we obtained the dataset from routinely collected epicrisis reports.
4. Lin et al. [14] noted, as another limitation, differences in discharge criteria among the hospitals that comprised their dataset. To overcome this, we obtained our data from the NICU of a single hospital. Additionally, Lin et al. [14] state that they faced an overfitting problem because they used the entire dataset for training. In our study, we mitigated this risk by holding out 30% of the dataset as test data.
5. In certain studies, the utilization of unnecessary features can increase the cost, particularly when paid datasets are involved [15]. Our study had no such cost limitation, and we chose to include a reasonable number of features (12). Juraev et al. [15] also noted the use of classical machine learning methods as a limitation of their study; in ours, both classical and ensemble machine learning methods such as CatBoost were used, and classification performance was further increased with techniques such as the Voting Classifier. Juraev et al. [15] additionally listed among their limitations that they focused on presenting an explainable model rather than on accuracy and robustness. Although our study did not focus on explainability, it achieved high scores on all criteria, such as accuracy, precision, and recall.
3 Materials and methods
A detailed flowchart describing the CF-LoS method developed for NICU LoS Prediction and the entire application is presented in Fig. 1.
3.1 Dataset
The epicrisis reports of 512 babies hospitalized in the Neonatal Intensive Care Unit at the Private Eastern Anatolia Hospital in Elazig, Turkey, between 2021 and 2022 were retrospectively examined with the necessary ethical permissions. Babies transferred from the unit to another facility were excluded because their exact LoS could not be known. Consequently, 12 features of the baby and the mother, drawn from the reports of the remaining 453 babies, were added to the dataset. These features encompass both categorical and continuous data types. Table 1 contains the features, types, and definitions found in the dataset.
The LoS of the patients in the dataset ranges from 1 to 83 days. More than half of the patients are discharged within the first 25 days. Details about the dataset are represented in Fig. 2 by boxplots.
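As a minimal sketch of the preprocessing applied to such a dataset (digitizing categorical columns and deriving a binary short/long-stay label), the following snippet uses hypothetical column names and toy values, not the actual dataset:

```python
import pandas as pd

# Toy records mimicking epicrisis-derived features; the real dataset's
# column names and values are not listed here, so these are illustrative.
df = pd.DataFrame({
    "gender": ["M", "F", "F", "M"],
    "delivery_type": ["cesarean", "vaginal", "cesarean", "cesarean"],
    "birth_weight_g": [1800, 3100, 2450, 2900],
    "los_days": [30, 5, 12, 40],
})

# Digitize categorical columns into integer codes (categories are
# sorted alphabetically, so e.g. "F" -> 0, "M" -> 1).
for col in ["gender", "delivery_type"]:
    df[col] = df[col].astype("category").cat.codes

# Binary target: short-term (1-25 days) vs long-term (26+ days) stay.
df["long_stay"] = (df["los_days"] > 25).astype(int)
```

After this step, every feature is numeric and the two-class target is ready for the classifiers described below.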
3.2 CatBoost classifier
CatBoost Classifier is a gradient boosting-based machine learning algorithm developed specifically for datasets where categorical variables are effectively utilized. CatBoost offers a unique approach that enables the direct use of categorical features, significantly reducing preprocessing steps. Consequently, it delivers higher performance and easier usability compared to other traditional gradient boosting algorithms.
CatBoost builds symmetric trees, thereby achieving a high prediction rate without the need to construct excessively deep trees, thus mitigating the risk of overfitting. Moreover, CatBoost employs a proprietary regularization technique to prevent overfitting, considering the internal structure of categorical features. This approach enables the algorithm to achieve high accuracy even with a low learning rate. Additionally, CatBoost enhances sample weight management through the utilization of category-specific statistical information, resulting in a more balanced modeling approach [34].
CatBoost Classifier is particularly effective on datasets abundant in categorical variables and characterized by complex relationships. Its proficiency in handling such data, along with its high performance, makes it a favored choice for applications demanding precise classification accuracy [35].
3.3 Voting classifier
Voting Classifier is a robust ensemble learning technique designed to achieve an optimal solution by combining various models. It integrates the insights and predictions from these individual models into a unified approach. This unified model makes decisions by favoring the most commonly agreed-upon choice among all the models for each output class [36].
This algorithm aggregates the results of each classifier within the Voting Classifier and considers the majority vote when determining the final output class. By combining predictions from multiple models, the voting classifier can incorporate diverse perspectives and benefit from the collective knowledge of the ensemble.
A notable advantage of employing a voting classifier lies in its resilience against errors or misclassifications from individual models. Should one model underperform in specific cases, it can benefit from the strong performance of other models. This ensemble approach helps reduce the influence of weaknesses in individual models, thereby enhancing the overall predictive accuracy and stability of the classifier.
Essentially, the Voting Classifier capitalizes on the principle that the combined decision-making of multiple models can surpass the performance of any single model. This leads to more precise and dependable predictions by leveraging the strengths of diverse models, offering a comprehensive and well-balanced predictive mechanism for addressing intricate classification problems [37].
Hard Voting classifier creates an ensemble containing many different classifiers. Each classifier makes its prediction based on the data. The hard voting classifier determines the final prediction by taking the majority of these predictions [38]. For example, if three different classifiers choose class A, one chooses class B and the other chooses class C, the hard voting classifier will choose class A because the number of classifiers choosing class A is the majority.
Hard Voting is based on an absolute majority among classes. This means that each classifier has equal weight and each has a single vote. This ensemble technique aims to create a stronger and more reliable classifier by blending diverse insights from various models. In situations where one model may make errors, accurate predictions from other models can offset these mistakes. Figure 3 illustrates the Hard Voting method [39].
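The majority-vote example above can be sketched in a few lines of Python (a toy illustration, not the study's implementation):

```python
from collections import Counter

def hard_vote(predictions):
    """Return the class chosen by the majority of individual classifiers."""
    return Counter(predictions).most_common(1)[0][0]

# Five classifiers vote on one sample: three pick "A", one "B", one "C",
# so the hard voting result is "A".
print(hard_vote(["A", "A", "B", "C", "A"]))  # -> A
```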
The Soft Voting classifier is another ensemble voting method. It utilizes the probability values provided by the different classifiers during classification [40].
Let’s imagine we have three distinct models, each providing class predictions for a given input data. Employing such models proves more effective when we not only want to determine which class this input data belongs to but also assess the reliability of this prediction.
Each model generates a probability distribution over the classes. For instance, the probabilities assigned to Class A are as follows: Model 1 (0.30), Model 2 (0.47), Model 3 (0.53). Similarly, for Class B, the probabilities are Model 1 (0.20), Model 2 (0.32), Model 3 (0.40); probabilities for any remaining classes are omitted for brevity.
Soft voting aggregates these predictions by considering the probability values. For instance, for Class A, the average probability is calculated as (0.30 + 0.47 + 0.53)/3 = 0.4333, and for Class B, it is calculated as (0.20 + 0.32 + 0.40)/3 = 0.3067.
Finally, according to the soft voting approach, the class with the highest average probability is selected as the final prediction. In this scenario, Class A is chosen as the final prediction class because it has a higher average probability [41].
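The probability-averaging computation from the worked example can be reproduced as a short NumPy sketch:

```python
import numpy as np

# Per-model class probabilities from the worked example above
# (rows: models 1-3, columns: [Class A, Class B]).
proba = np.array([
    [0.30, 0.20],
    [0.47, 0.32],
    [0.53, 0.40],
])

avg = proba.mean(axis=0)  # column-wise average: ~[0.4333, 0.3067]
winner = ["Class A", "Class B"][int(np.argmax(avg))]
print(avg.round(4), winner)
```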
This approach takes into account not only the prediction result of the classifiers but also the confidence levels of these predictions. Thus, it aims to obtain a more precise and reliable estimate. Figure 4 shows the Soft voting method.
4 Application and results
Python was used as the software development environment of the study. The application was developed and the results obtained on a workstation equipped with an i7 processor and a GeForce 1070 GPU with 8 GB of memory. In this study, we conducted LoS estimation through binary classification. Specifically, a LoS ranging from 1 to 25 days was categorized as 'short-term', while a LoS of 26 days or more was designated as 'long-term'. The workflow of the study is illustrated in Fig. 5.
The original dataset, provided by NICU experts with the necessary ethical permissions, underwent initial data preprocessing, including data cleaning and digitization. Following this stage, a training data ratio of 0.7 and a testing data ratio of 0.3 were established for each classifier. As a result of the training and testing processes using these ratios, the performance of each classifier was evaluated.
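The 0.7/0.3 split can be sketched with scikit-learn as follows; the random synthetic matrix below merely stands in for the real 453 x 12 feature matrix:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the preprocessed features (453 samples, 12 features);
# the values are random and purely illustrative.
rng = np.random.default_rng(42)
X = rng.normal(size=(453, 12))
y = rng.integers(0, 2, size=453)

# 0.7 / 0.3 train-test split as established for every classifier.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y)
print(len(X_train), len(X_test))  # -> 317 136
```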
To verify the consistency of the results, 5-fold cross-validation was applied. For each classifier, the data were divided into 5 parts, and the training and testing stages were repeated so that each part served once as the test set. In this way, the performance of each classifier was evaluated in more detail.
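This 5-fold evaluation can be sketched with scikit-learn's `cross_val_score`; the synthetic data below stands in for the NICU features, so the resulting scores are illustrative only:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Synthetic classification data with the same shape as the study's dataset.
X, y = make_classification(n_samples=453, n_features=12, random_state=0)

clf = RandomForestClassifier(n_estimators=100, random_state=0)
# cv=5 performs the 5-fold cross-validation described above.
scores = cross_val_score(clf, X, y, cv=5, scoring="accuracy")
print(scores.mean())
```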
In the final stage, a Voting Classifier was used to optimally combine the obtained results. This approach aimed to achieve superior outcomes by consolidating predictions from various classifiers. Through this method, we strived to bolster our results by harnessing the collective performance of classifier combinations.
In our study, we used Logistic Regression (LR), ExtraTrees (ET), Random Forest (RF), K-Nearest Neighbor (KNN), SVC, AdaBoost, GradientBoosting (GB), XGBoost (XGB), and CatBoost algorithms based on machine learning as well as Voting Classifier technique. Table 2 shows the parameter settings for all algorithms used in this study.
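As an illustration of the two-stage idea, a few first-stage models can be combined with scikit-learn's `VotingClassifier`; the estimators, hyperparameters, and data below are illustrative sketches (the study's actual settings are listed in Table 2):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (RandomForestClassifier, ExtraTreesClassifier,
                              VotingClassifier)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the preprocessed NICU data.
X, y = make_classification(n_samples=453, n_features=12, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

vote = VotingClassifier(
    estimators=[
        ("lr", LogisticRegression(max_iter=1000)),
        ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
        ("et", ExtraTreesClassifier(n_estimators=100, random_state=0)),
    ],
    voting="soft",  # average class probabilities; use "hard" for majority vote
)
vote.fit(X_tr, y_tr)
print(vote.score(X_te, y_te))
```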
In our study, the results obtained by initially setting the test size to 0.3 are presented in Table 3. These results encompass a range of metrics, including Accuracy, Precision, Recall, F1-Score, Kappa [42,43,44], MSE, RMSE, and MAE [45,46,47], which were obtained for both classical and ensemble classifiers.
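The metrics reported in Table 3 can be computed with scikit-learn as sketched below on toy predictions (the numbers are not from the study):

```python
import numpy as np
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, cohen_kappa_score,
                             mean_squared_error, mean_absolute_error)

# Toy true labels and predictions (two mismatches out of eight).
y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])
y_pred = np.array([1, 0, 1, 0, 0, 0, 1, 1])

acc = accuracy_score(y_true, y_pred)         # 6/8 = 0.75
prec = precision_score(y_true, y_pred)       # 3/4 = 0.75
rec = recall_score(y_true, y_pred)           # 3/4 = 0.75
f1 = f1_score(y_true, y_pred)                # 0.75
kappa = cohen_kappa_score(y_true, y_pred)    # (0.75 - 0.5) / (1 - 0.5) = 0.5
mse = mean_squared_error(y_true, y_pred)     # 2/8 = 0.25
rmse = np.sqrt(mse)                          # 0.5
mae = mean_absolute_error(y_true, y_pred)    # 0.25
```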
As observed in Table 3, the Random Forest classifier consistently yields the best results, boasting an accuracy of 0.9779, precision of 0.9784, and recall of 0.9779. Conversely, the Logistic Regression classifier demonstrates the lowest performance among the classifiers. It is worth noting that the relatively new CatBoost classifier also exhibits promising results for this dataset, with an accuracy of 0.9632. In summary, Random Forest exhibits the highest overall performance, while CatBoost shows strong potential for this specific dataset.
In the subsequent stage, a cross-validation was conducted using a k-fold value of 5. The outcomes of this cross-validation are presented in Table 4.
When we examine the results obtained when the k-fold value is set to 5, it is seen that the Random Forest classifier exhibits the best performance. While the accuracy achieved with a test data ratio of 0.3 stands at 0.9779, it registers as 0.9425 with cross-validation using k-fold. Similarly, it is observed that the best results in other metrics are again obtained from the RF classifier. In contrast, the CatBoost classifier demonstrates a respectable accuracy of 0.9403, indicating its effectiveness as a valid classifier for this dataset. In summary, the Random Forest classifier shines prominently in terms of Accuracy, Precision, Recall, and F1-Score metrics, while CatBoost closely follows suit in performance.
In the final phase of the study, the Voting Classifier, an ensemble method discussed in Sect. 3.3, was employed. Voting Classifier utilizes two distinct methods: Hard Voting and Soft Voting. The outcomes derived from these methods are presented in Table 5.
Upon reviewing Table 5, it becomes apparent that the accuracy achieved with the Soft Voting method is 0.9632, whereas the Hard Voting method yields an accuracy of 0.9558. These findings underscore the influence of various voting methods on performance. Notably, the Soft Voting method exhibits a slightly higher accuracy rate.
In the evaluation of precision metrics, it was observed that the highest value was achieved by the Hard Voting method at 0.9166, whereas in the Soft Voting method, this value was recorded as 0.8888. Precision values represent the proportion of correct positive predictions to the total positive predictions. In this instance, the Hard Voting method appears to exhibit a higher precision rate.
When we examine the other metrics, it becomes evident that the Soft Voting method generally produces superior results. Hard Voting selects the most frequently predicted class by a simple majority vote on class labels, without considering confidence scores, whereas Soft Voting averages the predicted class probabilities and thus takes each classifier's confidence into account. As a result, Soft Voting tends to deliver better overall performance.
Furthermore, Table 5 also includes the value of Average CV Recall [48] calculated with a k-fold value of 5. Recall measures the ratio of true positives to all actual positives and thus indicates how many positive cases the classifier missed (false negatives). Upon reviewing these values, it becomes evident that the Soft Voting method consistently yields the highest results.
In summary, Table 5 illustrates that the Soft Voting method generally yields superior results, while Hard Voting tends to offer higher precision. Moreover, Soft Voting exhibits greater success in terms of average recall value. In addition to the metrics presented in the table, the AUC value for Soft Voting was calculated to be 0.947.
The ROC and Precision–Recall curves for Soft Voting are given below. Figure 6 displays the ROC curve for Soft Voting.
Figure 7 displays the Precision–Recall (PR) curve for Soft Voting; as Fig. 7 shows, the average precision for Soft Voting is 0.95.
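The AUC and average-precision summaries of these curves can be computed from the predicted class probabilities, as sketched below; the score vector is an illustrative example, not the study's actual model output.

```python
# Computing ROC-AUC (summary of Fig. 6) and average precision (summary of
# Fig. 7) from illustrative predicted probabilities of the positive class.
import numpy as np
from sklearn.metrics import average_precision_score, roc_auc_score

y_true  = np.array([0, 0, 1, 1, 0, 1, 1, 0])
y_score = np.array([0.1, 0.4, 0.35, 0.8, 0.2, 0.9, 0.7, 0.3])  # P(long stay)

auc = roc_auc_score(y_true, y_score)            # area under the ROC curve
ap  = average_precision_score(y_true, y_score)  # summarizes the PR curve
print(f"AUC: {auc:.3f}, Average precision: {ap:.3f}")
```

Unlike accuracy, both summaries are threshold-free: they evaluate the ranking induced by the probabilities rather than a single 0.5 cut-off.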
5 Discussion
For our LoS prediction study in the NICU, which is a classification task, we selected Accuracy, Precision, Recall, F1-Score, Avg CV Recall, Standard Deviation of CV Recall, and AUC as performance metrics. Our study achieved a high Accuracy of 0.96, while the Recall of 0.92, F1-Score of 0.90, and Standard Deviation of CV Recall of 0.11 reflect its strong performance. Moreover, the AUC value of 0.95 confirms the success of CF-LoS. Table 6 comparatively presents both recent LoS estimation applications in different fields and LoS estimation studies conducted for the NICU. These studies are compared in terms of method, performance criteria, data types, dataset location, and size.
Considering Table 6, it can be seen that a standard has not yet been developed in LoS studies. Regarding this, Stone et al. [7] suggested using routinely collected hospital records in future LoS studies. They emphasized that this would be more suitable in terms of data unity and comparison opportunities. In our study, instead of non-routine applications such as surveys, the information in routinely kept epicrisis reports was used to estimate LoS. This emerges as the strength of our study.
When Table 6 is examined, it is seen that the diversity of metrics used in LoS estimation studies makes performance comparison difficult. Some LoS studies took the form of regression analysis and used performance measures such as R2, EV, RMSE, and MAE. In studies that treat LoS estimation as a classification problem, as ours does, metrics such as Accuracy, AUC, Precision, Recall, and F1-Score are generally preferred. This diversity is a limitation that reduces the possibility of comparison [49]. Nevertheless, Bacchi et al. [28], Kim et al. [29], Alabbad et al. [31], Saravi et al. [32], Feldman and Chawla [12], Thompson et al. [6], and Lin et al. [14] reported their results with Accuracy and AUC metrics. Among them, the best results in terms of both Accuracy and AUC were obtained by CF-LoS. Feldman and Chawla [12] reported only an average AUC, calculated separately for each class, so it is not fully suitable for comparing classification performance. The accuracy closest to ours, 94.16%, was obtained with the method suggested by Alabbad et al. [31], and the closest AUC, 0.93, was achieved with the method proposed by Kim et al. [29].
The limitation of the study of Zernikow et al. [11] includes its low success rate in predicting LoS of 140 days and above. Feldman and Chawla’s [12] study, which utilized a large dataset, may not accurately reflect the model’s success, making its application in other locations challenging. Thompson et al. [6] noted a limitation in their study regarding the potential difficulty in applying the model uniformly across different states due to the lack of a standardized dataset. Lin et al. [14] identified several limitations in their study, including variations in discharge criteria among hospitals in the dataset and the presence of an overfitting problem due to using all patients for model training. Additionally, they mentioned the inability to obtain some features as another limitation. Juraev et al. [15] outlined several limitations in their study, such as the high cost associated with using numerous features, reliance on classical machine learning models instead of more effective deep learning models, and less focus on robustness and fairness while prioritizing explainability.
One limitation that emerged in interdisciplinary studies was the challenge of collecting professional data. To address this, we collaborated closely with healthcare professionals to accurately transfer data from epicrisis reports to dataset tables. Another limitation of our study is the presence of missing data in the epicrisis, resulting in a reduction in the available dataset for analysis and thereby affecting model performance. However, to address this challenge, we enhanced model accuracy by employing various classifiers. Furthermore, we identified the classifier with the highest accuracy among the candidates by utilizing the Voting Classifier.
The Voting Classifier is an ensemble learning technique in machine learning. It combines the predictions of multiple different algorithms, enabling more robust and stable predictions, and offers several advantages. First, using different algorithms increases model diversity and improves generalization ability. Second, combining multiple models reduces the risk of overfitting, because the errors of the individual models tend to offset one another. Finally, combining different algorithms allows better adaptation to the dataset, providing a broader perspective. The Voting Classifier is therefore a powerful tool for solving complex problems, often yielding better performance and more reliable results than any single algorithm alone. Soft and hard voting are its two variants, each with its own advantages.
Hard voting takes the class predictions of each base model and selects the class with the most votes, so the final result is determined by a simple majority. Its advantage is that only the class labels are used to reach the final decision, which requires less computation and makes the method faster to apply.
Soft voting, on the other hand, takes the base models' class-probability estimates and forms the final prediction by weighting these probabilities. By taking each base model's confidence into account, it provides a more balanced decision-making process and generally better performance; in particular, when one base model is more reliable than the others, soft voting can exploit this information. Its computational cost is slightly higher than hard voting's, however, because class probabilities must be computed.
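The contrast between the two voting schemes can be sketched with scikit-learn's `VotingClassifier`. A synthetic dataset again stands in for the NICU data, and only three of the study's base learners are used to keep the example short.

```python
# Minimal sketch contrasting hard and soft voting with VotingClassifier.
# Synthetic data and a reduced set of base learners stand in for the study's
# actual setup.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=453, n_features=15, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=42)

base = [("lr", LogisticRegression(max_iter=1000)),
        ("rf", RandomForestClassifier(random_state=42)),
        ("knn", KNeighborsClassifier())]

# Hard voting: majority vote over the predicted class labels
hard = VotingClassifier(estimators=base, voting="hard").fit(X_train, y_train)
# Soft voting: argmax of the averaged class probabilities
soft = VotingClassifier(estimators=base, voting="soft").fit(X_train, y_train)

print(f"hard-voting accuracy: {hard.score(X_test, y_test):.3f}")
print(f"soft-voting accuracy: {soft.score(X_test, y_test):.3f}")
```

Note that `voting="soft"` requires every base estimator to expose `predict_proba`, which is why `SVC` needs `probability=True` when included in such an ensemble.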
As a result, the use of the Voting Classifier, in addition to the advantages listed above, also increased the model accuracy in the study.
6 Conclusion
The objective of this study is to predict the LoS of neonates in the NICU by using machine learning algorithms to analyze neonatal medical data. In this way, the study aims to provide valuable insights about the length of hospital stay to the patients' relatives, NICU staff, and hospital management.
In our study, we retrospectively examined the epicrisis reports of 512 patients admitted to the NICU, following ethical guidelines. Patients transferred from the NICU to another hospital were excluded, and the prenatal and clinical data of the remaining 453 neonates formed our original dataset. CF-LoS performs binary classification for predicting long- and short-term LoS using machine learning algorithms such as LR, ET, RF, KNN, SVC, AdaBoost, GB, XGB, and CatBoost. To enhance classifier performance, we introduced a hybrid approach incorporating the Voting Classifier technique. Our findings revealed the RF classifier's superior accuracy, precision, and recall, confirming its proficiency in classifying neonatal medical data. The CatBoost classifier also demonstrated strong performance, with a high accuracy rate underscoring its effectiveness on this dataset. Comparing hard and soft voting methods, soft voting exhibited superior accuracy, while hard voting achieved higher precision; that is, soft voting excels in overall accuracy, whereas hard voting can provide more precise classifications. As a result, CF-LoS, leveraging machine learning for neonatal medical data analysis, achieved noteworthy accuracy, precision, and recall.
The obtained results enable prediction of incubator availability in the NICU, a facility not uniformly present in every city in Turkey and constrained by limited incubator capacity. This information is crucial for the unit manager when deciding whether to accept new patients. The LoS estimate indicates when incubators will be occupied by patients expected to stay long term, and therefore when new admissions will not be possible for an extended period; conversely, a new patient can be scheduled in place of an infant whose stay is expected to be short. Thus, effective utilization of hospital resources is facilitated. Moreover, LoS estimation contributes significantly to healthcare coordination. Hospital administrators can use these estimates to determine the number of doctors, nurses, and allied health personnel required for a given month, facilitating task allocation; to plan medical supplies based on anticipated patient numbers; and to make informed budgetary decisions aligned with the projected patient load. Lastly, responsible doctors can give parents more concrete answers about the expected LoS of their infants in the NICU, fostering clearer communication between healthcare providers and families. Consequently, CF-LoS can provide reliable guidance for important decisions regarding neonatal health and for optimizing treatment processes.
In our study, we experienced difficulties converting the categorical and numerical data from the epicrisis notes into regular dataset tables, owing to missing data and occasional inconsistent naming of the same categorical values. These difficulties were overcome with the assistance of expert personnel during data organization. Another limitation was that, because our original dataset contained data from a limited number of babies, we had to choose classification rather than regression analysis. Furthermore, the diversity of metrics used in the related literature limited our ability to compare performance. The use of age and gender criteria to ensure the generalizability of the study is significant for establishing standards in future research. However, although we aimed to include disease-diagnosis features, doing so reduced accuracy because the data were limited relative to the large number of disease types; the exclusion of diagnostic criteria from our study therefore constitutes another limitation. In summary, the primary limitations of this study are the dependence of the algorithms' inferences on the training data and the need for a larger dataset with accurate, complete, and representative records.
In the future, conducting a deep-learning study to predict mortality in newborns would be of interest. Additionally, future studies could focus on improving data quality to enhance algorithm performance and achieve accurate results. This can be achieved through meticulous management of the model selection and training process. At this stage, to ensure valuable inferences with high accuracy, it is advisable for hospital managers to ensure the professional maintenance of patient records, making them more accessible for analysis by artificial intelligence experts. It is also recommended to provide healthcare professionals with the opportunity to assess the clinical impact of artificial intelligence studies, thereby facilitating the refinement and development of these studies as needed.
Data availability
Data sharing is not applicable to this article, as the dataset used in this study belongs to Private Eastern Anatolia Hospital and is subject to patient privacy restrictions. Use of the dataset is possible only with ethical and administrative permissions.
References
Kwok TC, Henry C, Saffaran S, Meeus M, Bates D, Laere DV, Boylan G, Boardman JP, Sharkey D (2022) Application and potential of artificial intelligence in neonatal medicine. Semin Fetal Neonatal Med 27(5):101346. https://doi.org/10.1016/j.siny.2022.101346
Saqib M, Iftikhar M, Neha F, Karishma F, Mumtaz H (2023) Artificial intelligence in critical illness and its impact on patient care: a comprehensive review. Front Med 10:1176192. https://doi.org/10.3389/fmed.2023.1176192
Javaid M, Haleem A, Singh RP, Suman R, Rab S (2022) Significance of machine learning in healthcare: features, pillars and applications. Int J Intell Netw 3:58–73. https://doi.org/10.1016/j.ijin.2022.05.002
Mangold C, Zoretic S, Thallapureddy K, Moreira A, Chorath K, Moreira A (2021) Machine learning models for predicting neonatal mortality: a systematic review. Neonatology 118(4):394–405. https://doi.org/10.1159/000516891
Abd-Elrazek MA, Eltahawi AA, Abd Elaziz MH, Abd-Elwhab MN (2021) Predicting length of stay in hospitals intensive care unit using general admission features. Ain Shams Eng J 12(4):3691–3702. https://doi.org/10.1016/j.asej.2021.02.018
Thompson B, Elish KO, Steele R (2018) Machine learning-based prediction of prolonged length of stay in newborns. In: 17th IEEE international conference on machine learning and applications (ICMLA), Orlando, pp 1454–1459. https://doi.org/10.1109/ICMLA.2018.00236
Stone K, Zwiggelaar R, Jones P, Mac Parthalain N (2022) A systematic review of the prediction of hospital length of stay: towards a unified framework. PLOS Digit Health 1(4):e0000017. https://doi.org/10.1371/journal.pdig.0000017
Holzinger A, Biemann C, Pattichis CS, Kell DB (2017) What do we need to build explainable AI systems for the medical domain? arXiv preprint arXiv:1712.09923. https://doi.org/10.48550/arXiv.1712.09923
Lakkaraju H, Kamar E, Caruana R, Leskovec J (2017) Interpretable & explorable approximations of black box models. arXiv preprint arXiv:1707.01154. https://doi.org/10.48550/arXiv.1707.01154
Holzinger A, Plass M, Holzinger K, Crisan GC, Pintea CM, Palade V (2019) A glassbox interactive machine learning approach for solving NP-hard problems with the human-in-the loop. Creat Math Inform 28(2):121–134. https://doi.org/10.37193/CMI.2019.02.04
Zernikow B, Holtmannspötter K, Michel E, Hornschuh F, Groote K, Hennecke KH (1999) Predicting length-of-stay in preterm neonates. Eur J Pediatr 158(1):59–62. https://doi.org/10.1007/s004310051010
Feldman K, Chawla NV (2014) Admission duration model for infant treatment (ADMIT). In: 2014 IEEE international conference on bioinformatics and biomedicine (BIBM), Belfast, pp 583–587. https://doi.org/10.1109/BIBM.2014.6999225
Mansouri A, Noei M, Abadeh MS (2020) Predicting hospital length of stay of neonates admitted to the NICU using data mining techniques. In: 10th International conference on computer and knowledge engineering (ICCKE), Mashhad, pp 629–635. https://doi.org/10.1109/ICCKE50421.2020.9303666
Lin WT, Wu TY, Chen YJ, Chang YS, Lin CH, Lin YJ (2022) Predicting in-hospital length of stay for very-low-birth-weight preterm infants using machine learning techniques. J Formos Med Assoc 121(6):1141–1148. https://doi.org/10.1016/j.jfma.2021.09.018
Juraev F, El-Sappagh S, Abdukhamidov E, Ali F, Abuhmed T (2022) Multilayer dynamic ensemble model for intensive care unit mortality prediction of neonate patients. J Biomed Inform 135:104216. https://doi.org/10.1016/j.jbi.2022.104216
Tu JV, Guerriere MR (1993) Use of a neural network as a predictive instrument for length of stay in the intensive care unit following cardiac surgery. Comput Biomed Res 26(3):220–229. https://doi.org/10.1006/cbmr.1993.1015
Lowell WE, Davis GE (1994) Predicting length of stay for psychiatric diagnosis-related groups using neural networks. J Am Med Inform Assoc 1(6):459–466. https://doi.org/10.1136/jamia.1994.95153435
Hachesu PR, Ahmadi M, Alizadeh S, Sadoughi F (2013) Use of data mining techniques to determine and predict length of stay of cardiac patients. Healthc Inform Res 19(2):121–129. https://doi.org/10.4258/hir.2013.19.2.121
Tsai PFJ, Chen PC, Chen YY, Song HY, Lin HM, Lin FM, Huang QP (2016) Length of hospital stay prediction at the admission stage for cardiology patients using artificial neural network. J Healthc Eng 2016:7035463. https://doi.org/10.1155/2016/7035463
Tang F, Xiao C, Wang F, Zhou J (2018) Predictive modeling in urgent care: a comparative study of machine learning approaches. JAMIA Open 1(1):87–98. https://doi.org/10.1093/jamiaopen/ooy011
Jalali A, Lonsdale H, Do N et al (2020) Deep learning for improved risk prediction in surgical outcomes. Sci Rep 10:9289. https://doi.org/10.1038/s41598-020-62971-3
Castineira D, Schlosser KR, Geva A, Rahmani AR, Fiore G, Walsh BK, Smallwood CD, Arnold JH, Santillana M (2020) Adding continuous vital sign information to static clinical data improves the prediction of length of stay after intubation: a data-driven machine learning approach. Respir Care 65(9):1367–1377. https://doi.org/10.4187/respcare.07561
Bacchi S, Gluck S, Tan Y et al (2021) Mixed-data deep learning in repeated predictions of general medicine length of stay: a derivation study. Intern Emerg Med 16:1613–1617. https://doi.org/10.1007/s11739-021-02697-w
Wu J, Lin Y, Li P, Hu Y, Zhang L, Kong G (2021) Predicting prolonged length of ICU stay through machine learning. Diagnostics 11:2242. https://doi.org/10.3390/diagnostics11122242
Sinha S, Tushar, Goel S (2021) Research on data science ensembles for Covid-19 detection and length of stay prediction. In: 2021 International conference on computing, communication, and intelligent systems (ICCCIS), Greater Noida, pp 499–503. https://doi.org/10.1109/ICCCIS51004.2021.9397218
Kalgotra P, Sharda R (2021) When will I get out of the hospital? Modeling length of stay using comorbidity networks. J Manag Inf Syst 38(4):1150–1184. https://doi.org/10.1080/07421222.2021.1990618
Chrusciel J, Girardon F, Roquette L, Laplanche D, Duclos A, Sanchez S (2021) The prediction of hospital length of stay using unstructured data. BMC Med Inform Decis Mak 21(1):351. https://doi.org/10.1186/s12911-021-01722-4
Bacchi S, Gilbert T, Gluck S, Cheng J, Tan Y, Chim I, Jannes J, Kleinig T, Koblar S (2022) Daily estimates of individual discharge likelihood with deep learning natural language processing in general medicine: a prospective and external validation study. Intern Emerg Med 17(2):411–415. https://doi.org/10.1007/s11739-021-02816-7
Kim E, Han KS, Cheong T, Lee SW, Eun J, Kim SJ (2022) Analysis on benefits and costs of machine learning-based early hospitalization prediction. IEEE Access 10:32479–32493. https://doi.org/10.1109/ACCESS.2022.3160742
Zeng X (2022) Length of stay prediction model of indoor patients based on light gradient boosting machine. Comput Intell Neurosci 2022:9517029. https://doi.org/10.1155/2022/9517029
Alabbad DA, Almuhaideb AM, Alsunaidi SJ, Alqudaihi KS, Alamoudi FA, Alhobaishi MK, Alaqeel NA, Alshahrani MS (2022) Machine learning model for predicting the length of stay in the intensive care unit for Covid-19 patients in the eastern province of Saudi Arabia. Inform Med Unlocked 30(2022):100937. https://doi.org/10.1016/j.imu.2022.100937
Saravi B, Zink A, Ülkümen S, Couillard-Despres S, Hassel F, Lang G (2022) Performance of artificial intelligence-based algorithms to predict prolonged length of stay after lumbar decompression surgery. J Clin Med 11:4050. https://doi.org/10.3390/jcm11144050
Kadri F, Dairi A, Harrou F, Sun Y (2023) Towards accurate prediction of patient length of stay at emergency department: a GAN-driven deep learning framework. J Ambient Intell Humaniz Comput 14:11481–11495. https://doi.org/10.1007/s12652-022-03717-z
Prokhorenkova L, Gusev G, Vorobev A, Dorogush AV, Gulin A (2018) CatBoost: unbiased boosting with categorical features. In: NIPS’18: proceedings of the 32nd international conference on neural information processing systems, Montréal, pp 6639–6649
Kus I, Keser SB, Yolacan EN (2021) Comparison of ensemble learning methods in intrusion detection systems. Eur J Sci Technol 31:725–734. https://doi.org/10.31590/ejosat.971875
Srinivas A, Mosiganti JP (2023) A brain stroke detection model using soft voting based ensemble machine learning classifier. Meas Sens 29:100871. https://doi.org/10.1016/j.measen.2023.100871
Khan MA, Khan MA, Latif S et al (2022) Voting classifier-based intrusion detection for IoT networks. In: Saeed F, Al-Hadhrami T, Mohammed E, Al-Sarem M (eds) Advances on smart and soft computing, vol 1399. Advances in Intelligent Systems and Computing. Springer, Singapore, pp 313–328. https://doi.org/10.1007/978-981-16-5559-3_26
Jindal S, Sachdeva M, Kushwaha AKS (2022) Performance evaluation of machine learning based voting classifier system for human activity recognition. Kuwait J Sci. https://doi.org/10.48129/kjs.splml.19189
Kaggle (2023) Voting classifier for better results. https://www.kaggle.com/code/marcinrutecki/voting-classifier-for-better-results. Accessed 26 July 2023.
Manconi A, Armano G, Gnocchi M, Milanesi L (2022) A soft-voting ensemble classifier for detecting patients affected by COVID-19. Appl Sci 12:7554. https://doi.org/10.3390/app12157554
Sherazi SWA, Bae J-W, Lee JY (2021) A soft voting ensemble classifier for early prediction and diagnosis of occurrences of major adverse cardiovascular events for STEMI and NSTEMI during 2-year follow-up in patients with acute coronary syndrome. PLoS ONE 16(6):e0249338. https://doi.org/10.1371/journal.pone.0249338
Dhiman G, Juneja S, Viriyasitavat W, Mohafez H, Hadizadeh M, Islam MA et al (2022) A novel machine-learning-based hybrid CNN model for tumor identification in medical image processing. Sustainability 14(3):1–13. https://doi.org/10.3390/su14031447
Trivedi M, Gupta A (2022) A lightweight deep learning architecture for the automatic detection of pneumonia using chest X-ray images. Multimed Tools Appl 81:5515–5536. https://doi.org/10.1007/s11042-021-11807-x
Fayemiwo MA, Olowookere TA, Arekete SA, Ogunde AO, Odim MO, Oguntunde BO et al (2021) Modeling a deep transfer learning framework for the classification of COVID-19 radiology dataset. Peer J Comput Sci 7:e614. https://doi.org/10.7717/peerj-cs.614
Haq EU, Jianjun H, Huarong X, Li K, Weng L (2022) A hybrid approach based on deep CNN and machine learning classifiers for the tumor segmentation and classification in brain MRI. Comput Math Methods Med 2022:6446680. https://doi.org/10.1155/2022/6446680
Alluhaidan AS (2022) DNA sequence analysis for brain disorder using deep learning and secure storage. Comput Mater Contin 71(3):5949–5962. https://doi.org/10.32604/cmc.2022.022028
Selvaraj S, Gangadharan SS (2021) Privacy preserving hybrid recommender system based on deep learning. Turk J Electr Eng Comput Sci 29(5):9. https://doi.org/10.3906/elk-2010-40
Murphy KP (2022) Probabilistic machine learning: an introduction. MIT Press, Cambridge
McAdams RM, Kaur R, Sun Y et al (2022) Predicting clinical outcomes using artificial intelligence and machine learning in neonatal intensive care units: a systematic review. J Perinatol 42:1561–1575. https://doi.org/10.1038/s41372-022-01392-8
Funding
Open access funding provided by the Scientific and Technological Research Council of Türkiye (TÜBİTAK).
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
Cite this article
Erdogan Yildirim, A., Canayaz, M. Machine learning-based prediction of length of stay (LoS) in the neonatal intensive care unit using ensemble methods. Neural Comput & Applic 36, 14433–14448 (2024). https://doi.org/10.1007/s00521-024-09831-7