Introduction

Acute hypoxemic respiratory failure is a major cause of intensive care unit (ICU) admission, associated with high mortality, long-term morbidity, and a considerable healthcare burden [1]. Acute kidney injury (AKI) is a frequent and serious complication in ICU patients, particularly among those with respiratory failure [2]. Globally, AKI-related mortality exceeds that of major diseases such as heart failure and diabetes [3], with persistently high mortality over the past five decades. In high-income countries, AKI is most prevalent in ICU settings—especially among elderly patients with multiple organ failure—where its severity correlates with higher mortality, longer ICU stays, and greater healthcare costs [4,5,6,7]. The co-occurrence of respiratory failure and AKI is common and is thought to be mediated by complex lung–kidney interactions, leading to accelerated disease progression and worse outcomes [8,9,10,11].

Early identification and timely intervention are critical for improving AKI prognosis [12]. However, widely used ICU risk assessment tools such as SOFA, APACHE II, and SAPS II were not specifically designed for AKI prediction. These scores rely on limited physiological parameters captured at a single time point, overlook dynamic trends, and lack AKI specificity in heterogeneous ICU populations [13]. Their performance may be further reduced in ICU patients with respiratory failure due to overlapping organ dysfunction and complex lung–kidney pathophysiology [14].

Developing accurate AKI prediction models in ICU patients with respiratory failure presents additional challenges: (1) marked heterogeneity in etiology and presentation—such as acute respiratory distress syndrome (ARDS), acute exacerbation of COPD, and hypoxemic vs. hypercapnic respiratory failure—each with distinct AKI risk profiles [1]; (2) frequent missing or incomplete clinical variables in ICU datasets [15]; and (3) multifactorial AKI pathophysiology in critically ill patients, involving hemodynamic instability, systemic inflammation, nephrotoxic exposures, and mechanical ventilation effects [16]. These factors demand predictive tools capable of integrating diverse, high-dimensional data while remaining robust across patient subgroups.

Machine learning (ML) offers a promising solution, with the ability to process complex datasets, identify subtle patterns, and generate more accurate, individualized predictions [17, 18]. The MIMIC-IV database contains comprehensive ICU patient data and provides an ideal platform for developing and validating ML models. Prior retrospective studies have demonstrated the potential of ML algorithms to predict AKI up to 48 h before onset [19] and shown their applicability across diverse patient populations [20,21,22,23,24]. This study aims to develop and validate an ML-based AKI prediction model for ICU patients with respiratory failure using demographic, clinical, and laboratory data from the MIMIC-IV database, with the goal of enabling timely, informed clinical decisions and improving patient outcomes.

Materials and methods

Data source

This is a retrospective cohort study based on the Medical Information Mart for Intensive Care IV (MIMIC-IV, version 3.0) database. MIMIC-IV contains over 250,000 electronic health records from Beth Israel Deaconess Medical Center, spanning 2008 to 2019 [25]. All patient information in the database has been de-identified, so individual informed consent was not required. The MIMIC-IV project has been approved by the Institutional Review Boards of both the Massachusetts Institute of Technology and Beth Israel Deaconess Medical Center. The authors of this study completed the required online training program through the collaborating institutions and are certified, authorized users of the MIMIC-IV database (Certification number: 42304428).

Inclusion and exclusion criteria

Patients diagnosed with respiratory failure (ICD-10-CM J96, Respiratory failure, not elsewhere classified) were eligible for inclusion. Within this population, patients were classified according to whether or not they developed AKI. AKI is characterized by a sudden loss of kidney function, indicated by elevated serum creatinine (sCr), a marker of renal excretory function, and/or reduced urine output (oliguria). AKI was diagnosed based on an increase in sCr of ≥ 50% within 7 days, an increase of ≥ 0.3 mg/dl (26.5 µmol/l) within 2 days, or oliguria (urine output < 0.5 ml/kg/h) lasting ≥ 6 h [26]. The following patients were excluded: (1) patients under 18 years of age; (2) patients with multiple ICU admissions; (3) patients with an ICU stay of less than 48 h; and (4) patients with incomplete AKI data, missing partial pressure of oxygen (PaO₂) or carbon dioxide (PaCO₂) data, or insufficient follow-up data within the first 48 h of ICU admission.
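The AKI definition above can be expressed as a simple rule check. The following Python sketch is illustrative only and is not the authors’ extraction code: it assumes creatinine samples are timestamped in hours since ICU admission, compares every later sample against every earlier one (a common way to approximate the "baseline" comparison), and uses the KDIGO urine threshold of < 0.5 ml/kg/h. The names `CreatinineSample`, `meets_creatinine_criteria`, `meets_urine_criterion`, and `has_aki` are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass

EPS = 1e-9  # tolerance for floating-point creatinine differences


@dataclass
class CreatinineSample:
    hours: float  # time since ICU admission (h)
    scr: float    # serum creatinine (mg/dl)


def meets_creatinine_criteria(samples: list[CreatinineSample]) -> bool:
    """True if any later sCr rises >= 0.3 mg/dl within 48 h, or reaches
    >= 1.5x (a >= 50% increase) within 7 days (168 h), relative to any earlier sCr."""
    ordered = sorted(samples, key=lambda s: s.hours)
    for i, earlier in enumerate(ordered):
        for later in ordered[i + 1:]:
            gap_h = later.hours - earlier.hours
            if gap_h <= 48 and later.scr - earlier.scr >= 0.3 - EPS:
                return True
            if gap_h <= 168 and later.scr >= 1.5 * earlier.scr - EPS:
                return True
    return False


def meets_urine_criterion(hourly_urine_ml: list[float], weight_kg: float,
                          threshold_ml_kg_h: float = 0.5, min_hours: int = 6) -> bool:
    """True if hourly urine output stays below threshold_ml_kg_h for at least
    min_hours consecutive hours (assumed KDIGO-style oliguria threshold)."""
    run = 0
    for volume_ml in hourly_urine_ml:
        # Extend the run of oliguric hours, or reset it on an adequate hour.
        run = run + 1 if volume_ml / weight_kg < threshold_ml_kg_h else 0
        if run >= min_hours:
            return True
    return False


def has_aki(samples: list[CreatinineSample], hourly_urine_ml: list[float],
            weight_kg: float) -> bool:
    """AKI if either the creatinine or the urine-output criterion is met."""
    return meets_creatinine_criteria(samples) or meets_urine_criterion(hourly_urine_ml, weight_kg)
```

For example, a rise from 1.0 to 1.4 mg/dl within 24 h satisfies the 0.3 mg/dl rule, while a rise to 1.4 mg/dl after 100 h satisfies neither creatinine rule.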

Data extraction and processing

Baseline characteristics of the patients were extracted using structured query language (SQL) from the PostgreSQL (version 14.2) database. These features included demographic data such as age, gender, body mass index (BMI), and race; clinical indicators including length of hospital stay (LOS), length of ICU stay (ICU-LOS), height, and weight; comorbidities such as myocardial infarction, congestive heart failure (CHF), peripheral vascular disease, cerebrovascular disease, dementia, chronic pulmonary disease, rheumatic disease, peptic ulcer disease, mild liver disease, diabetes without complications, diabetes with chronic complications, paraplegia, renal disease, malignant cancer, severe liver disease, metastatic solid tumors, and acquired immunodeficiency syndrome (AIDS); as well as the Charlson comorbidity index (CCI).

Vital signs and physiological measurements included minimum, maximum, and mean values of heart rate, systolic blood pressure (SBP), diastolic blood pressure (DBP), mean arterial pressure (MAP), respiratory rate, body temperature, and oxygen saturation (SpO₂). Neurological and organ failure assessments included the Glasgow Coma Scale (GCS) and the Sequential Organ Failure Assessment (SOFA) score.

Laboratory parameters extracted from the database included lactate, blood pH, PaO₂, partial pressure of carbon dioxide (PaCO₂), base excess, bicarbonate, alveolar–arterial oxygen gradient (A-aDO₂), PaO₂/FiO₂ ratio, serum glucose, creatinine, blood urea nitrogen (BUN), serum electrolytes (sodium, potassium, chloride, calcium), hemoglobin, hematocrit, platelet count, white blood cell count (WBC), international normalized ratio (INR), prothrombin time (PT), partial thromboplastin time (PTT), alanine aminotransferase (ALT), aspartate aminotransferase (AST), alkaline phosphatase (ALP), and total bilirubin. Differential leukocyte parameters included absolute basophil, eosinophil, lymphocyte, monocyte, and neutrophil counts.

Medication and intervention-related variables included the use of angiotensin-converting enzyme inhibitors (ACEI), angiotensin receptor blockers (ARB), angiotensin receptor–neprilysin inhibitors (ARNI), beta-blockers, aspirin, statins, fibrates, proton pump inhibitors (PPIs), oral hypoglycemic agents (e.g., metformin), nonsteroidal anti-inflammatory drugs (NSAIDs), anticoagulants, antibiotics, antacathartics, glucocorticoids, vasoactive drugs (e.g., dobutamine, dopamine, epinephrine, norepinephrine, isoprenaline), sedatives (e.g., midazolam, dexmedetomidine, propofol), oxygen inhalation, and mechanical ventilation.

The primary outcome of this study was the development of AKI during the ICU stay, as defined by the diagnostic criteria recorded in the MIMIC-IV database. A secondary outcome was 28-day ICU mortality, assessed by survival analysis. Survival time was calculated as the number of days from ICU admission to death; patients who died within 28 days were counted as events, and all other patients were treated as censored observations in the Kaplan–Meier analysis and log-rank test.
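To make the survival setup concrete, the following Python sketch (not the authors’ R code) censors follow-up at 28 days and computes the Kaplan–Meier product-limit estimate. It assumes survival times are in days from ICU admission and that patients still alive, or dying after day 28, are censored at day 28; the function names are hypothetical.

```python
from collections import Counter


def censor_at_horizon(times, died, horizon=28.0):
    """Administratively censor follow-up at `horizon` days after ICU admission."""
    out_times, out_events = [], []
    for t, d in zip(times, died):
        if t > horizon:
            out_times.append(horizon)
            out_events.append(False)  # no death observed by day 28: censored
        else:
            out_times.append(t)
            out_events.append(bool(d))
    return out_times, out_events


def kaplan_meier(times, events):
    """Product-limit estimate; returns [(t, S(t))] at each distinct event time."""
    deaths = Counter(t for t, e in zip(times, events) if e)
    leaving = Counter(times)  # everyone leaving the risk set at time t
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(leaving):
        if deaths[t]:
            survival *= 1.0 - deaths[t] / at_risk
            curve.append((t, survival))
        # Deaths and censorings at t are still at risk at t; remove them afterwards.
        at_risk -= leaving[t]
    return curve
```

Separate curves would be computed for the AKI and non-AKI groups; the log-rank test then compares observed and expected deaths at each event time.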

To address missing data, variables with less than 20% missingness were imputed using multiple imputation by chained equations (MICE). Variables with a missing rate exceeding 20% were excluded from the final analysis [26].
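The 20% missingness rule can be sketched as a simple filter applied before imputation. This Python version is illustrative: it treats `None` and NaN as missing and, as an assumption because the text does not address the boundary case, retains a variable with exactly 20% missingness. The imputation itself would then be run with a MICE implementation such as the R `mice` package or scikit-learn’s `IterativeImputer`. The function and variable names are hypothetical.

```python
import math


def missing_fraction(values):
    """Share of entries that are None or NaN."""
    missing = sum(1 for v in values if v is None or (isinstance(v, float) and math.isnan(v)))
    return missing / len(values)


def split_by_missingness(columns, max_missing=0.20):
    """Return (variables kept for MICE imputation, variables excluded)."""
    keep, drop = [], []
    for name, values in columns.items():
        target = keep if missing_fraction(values) <= max_missing else drop
        target.append(name)
    return keep, drop


cols = {
    "lactate_max": [1.2, None, 2.5, None, float("nan")],  # 60% missing
    "creatinine_max": [1.0, 1.1, None, 0.9, 1.2],          # 20% missing
    "bun_max": [10, 12, 14, 16, 18],                       # complete
}
keep, drop = split_by_missingness(cols)
```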

Statistical analysis

Continuous variables were expressed as medians with interquartile ranges (IQRs), and intergroup differences were assessed using the Mann–Whitney U test. Categorical variables were presented as counts (n) and percentages (%), with between-group comparisons conducted using the Chi-squared test or Fisher’s exact test as appropriate. Kaplan–Meier survival curves were constructed to evaluate 28-day ICU mortality, and differences between groups were assessed using the log-rank test.
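For readers unfamiliar with the Mann–Whitney U test, the following Python sketch shows the large-sample version: tied values receive average ranks, U is derived from the rank sum of the first group, and a two-sided P value comes from the normal approximation with a tie-corrected variance. It is not the R implementation used in the study; for example, R’s `wilcox.test` may use an exact test or a continuity correction depending on the data.

```python
import math


def rank_with_ties(values):
    """Average 1-based ranks, with tied values sharing their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of positions i..j, converted to 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test, normal approximation with tie correction
    (no continuity correction). Returns (U for x, P value)."""
    n1, n2 = len(x), len(y)
    pooled = list(x) + list(y)
    ranks = rank_with_ties(pooled)
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    n = n1 + n2
    counts = {}
    for v in pooled:
        counts[v] = counts.get(v, 0) + 1
    tie_term = sum(t ** 3 - t for t in counts.values())
    var = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    z = (u1 - n1 * n2 / 2) / math.sqrt(var)
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return u1, p
```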

The MIMIC-IV dataset was randomly divided into a training set and an internal validation set in a 7:3 ratio. To identify the most informative predictors, recursive feature elimination (RFE) was employed for feature selection [27]. This technique recursively removes the least important features during model training to obtain an optimal subset that maximizes predictive performance. Based on the selected features, we developed ten supervised ML models: (1) extreme gradient boosting (XGBoost); (2) support vector machine (SVM); (3) AdaBoost; (4) multilayer perceptron (MLP); (5) neural network (NNET); (6) naive Bayes; (7) logistic regression; (8) light gradient boosting machine (LightGBM); (9) random forest (RF); and (10) ElasticNet regression. For each model, we applied RFE with tenfold cross-validation independently to identify its own optimal set of 15 predictors, rather than using a single unified feature set across all algorithms. This approach was chosen deliberately to allow each model architecture—given its unique learning mechanism, bias–variance trade-off, and sensitivity to feature interactions—to leverage the predictors best suited to its strengths. While a unified feature set would facilitate direct comparison under identical input conditions, it could constrain the performance of certain models. Allowing independent feature optimization ensures that each algorithm is evaluated under its optimal configuration, aligning with the primary objective of identifying the best-performing AKI prediction model rather than directly comparing model structures.
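The split and feature-selection steps can be sketched as follows. This Python version is a simplified illustration, not the authors’ pipeline: it fixes the target at 15 features instead of choosing the subset by tenfold cross-validation, and it accepts any `importance_fn` that refits a model on the current subset and returns per-feature importances. The toy importance function, its feature names, and its scores are hypothetical; it mimics how correlated predictors (here two creatinine summaries) can split credit after refitting.

```python
from __future__ import annotations

import random
from collections import Counter
from typing import Callable, Sequence


def train_validation_split(n_rows: int, train_frac: float = 0.7, seed: int = 2024):
    """Randomly assign row indices to training (70%) and validation (30%) sets."""
    idx = list(range(n_rows))
    random.Random(seed).shuffle(idx)
    cut = round(train_frac * n_rows)
    return idx[:cut], idx[cut:]


def recursive_feature_elimination(
    features: Sequence[str],
    importance_fn: Callable[[list[str]], dict[str, float]],
    n_keep: int = 15,
    step: int = 1,
) -> list[str]:
    """Refit on the current subset, drop the `step` least important features,
    and repeat until `n_keep` features remain."""
    current = list(features)
    while len(current) > n_keep:
        scores = importance_fn(current)  # refit the model on the current subset
        n_drop = min(step, len(current) - n_keep)
        weakest = sorted(current, key=lambda f: scores[f])[:n_drop]
        current = [f for f in current if f not in weakest]
    return current


# Hypothetical stand-in for a refitted model: correlated features share credit.
BASE_IMPORTANCE = {"creatinine_max": 5.0, "creatinine_min": 4.0, "bun_max": 3.0,
                   "urineoutput": 2.5, "heart_rate_mean": 0.5}
GROUP = {"creatinine_max": "scr", "creatinine_min": "scr"}


def toy_importance(subset: list[str]) -> dict[str, float]:
    sizes = Counter(GROUP.get(f, f) for f in subset)
    return {f: BASE_IMPORTANCE[f] / sizes[GROUP.get(f, f)] for f in subset}
```

With `n_keep=3`, the toy run first removes `heart_rate_mean` and then `creatinine_min`, whose importance is diluted by its correlated partner.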

Model performance was evaluated using multiple metrics, including the area under the receiver operating characteristic curve (AUC), accuracy, sensitivity, specificity, precision, negative predictive value (NPV), F1-score, Cohen’s kappa coefficient, and Brier score.

The AUC reflects the model’s overall ability to distinguish between outcome classes, with values closer to 1 indicating stronger discriminative power. Accuracy measures the proportion of correct predictions across all instances but may be misleading in the presence of class imbalance. Sensitivity (recall) refers to the model’s ability to correctly identify true positives, while specificity indicates its ability to correctly recognize true negatives. Precision (positive predictive value) measures the proportion of true positives among those predicted as positive; high precision indicates few false alarms among flagged patients. NPV reflects the proportion of true negatives among those predicted as negative, making it useful for assessing the model’s ability to rule out low-risk individuals. The F1-score is the harmonic mean of precision and sensitivity, offering a balanced measure of performance, particularly in imbalanced datasets. Cohen’s kappa quantifies the agreement between predicted and actual labels, adjusting for agreement expected by chance. Finally, the Brier score is the mean squared difference between predicted probabilities and observed outcomes; it reflects overall probabilistic accuracy, including calibration, with lower scores indicating better agreement between predicted and observed outcomes.
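The metric definitions above can be computed directly from labels and predicted probabilities. The following Python sketch is illustrative (the study used R): it assumes a 0.5 classification threshold, returns NaN when a denominator is zero, and computes the AUC as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half.

```python
def _div(a, b):
    """Safe division that returns NaN when the denominator is zero."""
    return a / b if b else float("nan")


def classification_metrics(y_true, y_prob, threshold=0.5):
    """Threshold-based metrics plus Brier score and rank-based AUC."""
    tp = fp = tn = fn = 0
    for y, p in zip(y_true, y_prob):
        predicted = p >= threshold
        if predicted and y == 1:
            tp += 1
        elif predicted:
            fp += 1
        elif y == 1:
            fn += 1
        else:
            tn += 1
    n = tp + fp + tn + fn
    sensitivity = _div(tp, tp + fn)
    specificity = _div(tn, tn + fp)
    precision = _div(tp, tp + fp)
    npv = _div(tn, tn + fn)
    accuracy = (tp + tn) / n
    f1 = _div(2 * precision * sensitivity, precision + sensitivity)
    # Cohen's kappa: observed agreement versus agreement expected by chance
    p_chance = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / n ** 2
    kappa = _div(accuracy - p_chance, 1 - p_chance)
    # Brier score: mean squared error of the predicted probabilities
    brier = sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / n
    # AUC: P(score of a random positive > score of a random negative)
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    auc = _div(wins, len(pos) * len(neg))
    return {"auc": auc, "accuracy": accuracy, "sensitivity": sensitivity,
            "specificity": specificity, "precision": precision, "npv": npv,
            "f1": f1, "kappa": kappa, "brier": brier}


m = classification_metrics([1, 1, 1, 0, 0, 0], [0.9, 0.8, 0.4, 0.3, 0.6, 0.1])
```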

Statistical analyses were performed using R version 4.4.3 (Vienna, Austria), with P values < 0.05 considered statistically significant. The ML code and original patient data are available on GitHub (https://github.com/DRQ66/AKI).

Ethics approval and consent to participate

The MIMIC-IV database was approved by the Institutional Review Boards of Beth Israel Deaconess Medical Center and the Massachusetts Institute of Technology. All protected health information in the database has been de-identified, eliminating the need for individual patient consent. All methods were conducted in accordance with relevant guidelines and regulations.

Results

Participants and baseline characteristics

From an initial cohort of 17,900 ICU patients with respiratory failure in the MIMIC-IV database, 10,780 patients remained after applying the exclusion criteria (Fig. 1). Baseline characteristics, including continuous, categorical, and binary variables, are summarized in Table 1. Of the included patients, 5819 (53.98%) developed AKI and 4961 (46.02%) did not.

Fig. 1

Research flowchart

Table 1 Comparison of baseline characteristics between AKI and Non-AKI patients

Kaplan–Meier survival curves (Fig. 2) demonstrated a significantly higher 28-day ICU mortality rate in the AKI group compared to the non-AKI group (log-rank test, P < 0.01). Moreover, the ICU length of stay was notably prolonged in patients with AKI (Mann–Whitney U test, P < 0.01; Fig. 3).

Fig. 2

Kaplan–Meier survival curves for 28-day ICU mortality by AKI status. Note: The numbers of patients in the survival analysis (AKI = 5819, non-AKI = 4961) reflect those with complete follow-up data. A small number of patients with missing or invalid survival time or event status were excluded from the analysis

Fig. 3

ICU length of stay in patients with and without AKI

Further analysis of baseline characteristics demonstrated that patients in the AKI group were older (median: 70 vs. 65 years), had higher body weight and BMI, and had longer hospital and ICU stays (all P < 0.001). The AKI group also showed a higher CCI (median: 6 vs. 5), with a significantly greater prevalence of cardiovascular diseases (e.g., myocardial infarction, CHF), diabetes, chronic pulmonary disease, and malignancies (all P < 0.001). Patients with AKI also had more abnormal vital signs and laboratory findings suggestive of hemodynamic instability and metabolic disturbances, including elevated creatinine and BUN, electrolyte imbalances, and acid–base disorders. Moreover, critical care interventions such as vasopressors, sedatives, antibiotics, and invasive mechanical ventilation were used more frequently in the AKI group, indicating a more severe clinical condition.

Feature selection and model development

A total of 156 candidate variables were considered in this study (Table 1). To identify the predictors contributing most to model performance, RFE with cross-validation was applied to select the 15 most important features for each ML algorithm. Ten supervised ML models were then developed: SVM, XGBoost, RF, logistic regression, AdaBoost, naive Bayes, MLP, NNET, LightGBM, and ElasticNet regression. Each model was trained on its own RFE-selected feature set, so that every algorithm was evaluated under its own optimal configuration rather than on identical inputs. This feature selection strategy was intended to improve modeling efficiency, reduce redundancy, and focus each model on the clinical variables most closely associated with AKI.

In Table 1, continuous variables are presented as medians with IQRs and were compared between the AKI and non-AKI groups using the Mann–Whitney U test. Categorical and binary variables are presented as frequencies (n) with percentages (%) and were compared using the Chi-squared test or Fisher’s exact test, as appropriate. For clarity, variables are grouped into demographics, comorbidities, vital signs, laboratory values, and therapeutic interventions.

Model performance

To comprehensively compare the predictive performance of all ten ML models, multiple evaluation metrics were considered, including AUC, accuracy, sensitivity, specificity, precision, NPV, F1-score, kappa, and Brier score (Table 2). Among these models, XGBoost achieved the highest AUC (0.9023), indicating superior overall discrimination. It also demonstrated high accuracy (0.8247), specificity (0.8386), and NPV (0.8419), while maintaining balanced sensitivity (0.8077) and F1-score (0.8058). These results suggest strong predictive performance on the internal validation set.

Table 2 Model performance on the validation set

The RF and SVM models also achieved competitive AUCs but were slightly inferior to XGBoost on most key metrics. Models such as AdaBoost and ElasticNet yielded moderate performance, with trade-offs between sensitivity and specificity.

The ROC curves (Fig. 4) further illustrated that XGBoost outperformed the other models in distinguishing between AKI and non-AKI cases. Decision curve analysis (DCA) (Fig. 5) showed that XGBoost provided the highest net benefit over a wide range of threshold probabilities, exceeding all other models. Calibration curve analysis (Fig. 6) indicated that XGBoost had the best agreement between predicted and observed probabilities across the probability range. Finally, the clinical impact curve (Fig. 7) showed that, at clinically relevant thresholds, the number of patients classified as high risk closely tracked the number of true positive cases, suggesting that the model could be a valuable tool for clinical decision-making.
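Decision curves and clinical impact curves are both derived from counts at a chosen threshold probability p_t. The following Python sketch uses the standard net benefit formula NB = TP/n − (FP/n) × p_t/(1 − p_t), compares it with a treat-all strategy, and scales the clinical impact counts to 1,000 patients. The per-1,000 scaling is an assumed presentation choice, and the function names are hypothetical.

```python
def _counts(y_true, y_prob, pt):
    """Return (number flagged high risk, true positives) at threshold pt."""
    flagged = [(y, p) for y, p in zip(y_true, y_prob) if p >= pt]
    tp = sum(1 for y, _ in flagged if y == 1)
    return len(flagged), tp


def net_benefit(y_true, y_prob, pt):
    """Net benefit of treating patients whose predicted risk is >= pt."""
    if not 0 < pt < 1:
        raise ValueError("threshold probability must lie in (0, 1)")
    n = len(y_true)
    flagged, tp = _counts(y_true, y_prob, pt)
    fp = flagged - tp
    return tp / n - (fp / n) * pt / (1 - pt)


def net_benefit_treat_all(y_true, pt):
    """Reference strategy: treat every patient regardless of predicted risk."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1 - prevalence) * pt / (1 - pt)


def clinical_impact(y_true, y_prob, pt, population=1000):
    """Number classified high risk and number of true positives per `population`."""
    flagged, tp = _counts(y_true, y_prob, pt)
    scale = population / len(y_true)
    return flagged * scale, tp * scale


y = [1, 1, 1, 0, 0, 0]
p = [0.9, 0.8, 0.4, 0.3, 0.6, 0.1]
```

A decision curve repeats `net_benefit` over a grid of thresholds for each model, together with the treat-all and treat-none (net benefit 0) references.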

Fig. 4

Receiver Operating Characteristic (ROC) Curves. ROC curves comparing ten ML models: Logistic regression, RF, XGBoost, SVM, ElasticNet, LightGBM, AdaBoost, naive Bayes, MLP, and NNET. The XGBoost model achieved the highest AUC (0.9023), indicating the strongest overall discriminative capability for predicting AKI among ICU patients

Fig. 5

Decision Curve Analysis. The XGBoost model provided the highest net clinical benefit across a wide range of threshold probabilities, confirming its utility in guiding AKI-related decision-making in ICU settings

Fig. 6

Calibration Curves. The calibration curve for the XGBoost model demonstrated the best agreement between predicted probabilities and observed outcomes, closely following the ideal 45-degree diagonal reference line and thus reflecting excellent model calibration

Fig. 7

Clinical Impact Curve (CIC) for XGBoost. The red curve represents the number of patients predicted to be at high risk for AKI at each probability threshold, while the dark curve indicates the number of true positive cases. The close alignment of these curves across a range of thresholds suggests strong clinical utility and low false positive rates

Given its superior performance in discrimination, calibration, and clinical net benefit, XGBoost was selected as the optimal model for predicting AKI among ICU patients in this study.

Model interpretations

To interpret the contribution of individual predictors within the XGBoost model, we used SHapley Additive exPlanations (SHAP) for both global and individual-level interpretation. As illustrated in the SHAP summary plot (Fig. 8), creatinine_max emerged as the most influential variable, with a mean absolute SHAP value of 1.168, substantially exceeding those of all other features. This suggests that elevated serum creatinine is a major driver of the model’s AKI predictions.
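The SHAP value of a feature is its Shapley value: the weighted average change in model output when that feature is added to every possible coalition of the other features. The following Python sketch computes exact Shapley values by brute-force enumeration, assuming that features outside a coalition are replaced by a single baseline value. This is a simplification. The SHAP library’s tree explainer computes these values efficiently for tree ensembles such as XGBoost and handles absent features differently. The global importance used in a summary plot is the mean absolute SHAP value per feature across patients. All function names are hypothetical.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley values for one instance by enumerating all coalitions.
    Features outside a coalition are set to their baseline value (an assumption)."""
    n = len(x)

    def value(coalition):
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return model(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def mean_abs_shap(model, rows, baseline):
    """Global importance: mean |SHAP| of each feature across all rows."""
    per_row = [shapley_values(model, x, baseline) for x in rows]
    return [sum(abs(r[i]) for r in per_row) / len(per_row) for i in range(len(baseline))]


def linear(z):
    """Hypothetical linear model used only to check the enumeration."""
    return 2 * z[0] + 0.5 * z[1] + 0 * z[2]
```

For a linear model each Shapley value reduces to the weight times the deviation from baseline, so `shapley_values(linear, [3, 4, 5], [1, 1, 1])` gives approximately [4.0, 1.5, 0.0]. The values sum to f(x) − f(baseline).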

Fig. 8

SHAP summary plot of feature importance in the XGBoost model. Each dot represents the SHAP value of a feature for an individual patient. Features are ranked by their average absolute SHAP value. The color gradient indicates the feature value, where yellow represents higher values and purple represents lower values. Creatinine_max was the most important predictor contributing to AKI risk

Following creatinine_max, other important contributors included length of hospital stay (los_hospital), bun_max, presence of renal disease (rel_disease), and urine output. These features are clinically interpretable, as prolonged hospitalization, azotemia, and oliguria are well-established risk indicators for AKI.

In addition, several physiological and biochemical variables, such as mean SpO₂, minimum bicarbonate, minimum INR, A-aDO₂, and both minimum and maximum BUN, were retained in the model, reflecting more subtle yet meaningful roles in risk estimation. Notably, dexmedetomidine administration and higher CCI scores were associated with increased predicted risk. This may reflect sedation as a marker of illness severity and the contribution of comorbidity burden to renal vulnerability.

For individualized interpretation, SHAP-based visualization techniques were applied. In the waterfall plot of a representative high-risk case (Sample 2) (Fig. 9), the model classified the patient as high risk for AKI, primarily because of markedly elevated creatinine_max, increased bun_max, and reduced urine output. These features had large positive SHAP values, reflecting their substantial contribution to the predicted outcome. Conversely, the force plot of a low-risk case (Sample 7) (Fig. 10) showed a prediction driven by normal creatinine and BUN levels together with relatively high hemoglobin and SpO₂, which contributed negative SHAP values.
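Waterfall and force plots rely on SHAP additivity. The model output for a patient equals the base value plus the sum of that patient’s SHAP contributions. The following Python sketch assumes contributions are on the log-odds scale, which is typical for XGBoost with a logistic objective. It accumulates the contributions in order of magnitude and converts the final margin to a probability. The base value and contributions in the example are invented for illustration and are not taken from Sample 2.

```python
import math


def logistic(margin):
    """Convert a log-odds margin to a probability."""
    return 1.0 / (1.0 + math.exp(-margin))


def waterfall(base_value, contributions):
    """Accumulate SHAP contributions (log-odds) from the base value, largest
    magnitude first; return the running steps and the final probability."""
    steps = []
    running = base_value
    for name, phi in sorted(contributions.items(), key=lambda kv: -abs(kv[1])):
        running += phi
        steps.append((name, phi, running))
    return steps, logistic(running)


# Hypothetical contributions for an illustrative high-risk patient
steps, prob = waterfall(0.0, {"creatinine_max": 1.5, "bun_max": 0.5,
                              "urineoutput": 0.2, "hemoglobin_min": -0.2})
```

In this example, the margin after all steps is 2.0, which corresponds to a predicted probability of about 0.88.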

Fig. 9

SHAP-based visualization of individual-level predictions in the XGBoost model. Waterfall plot of a representative high-risk case (Sample 2) selected from the internal validation set. The patient’s risk was primarily driven by elevated creatinine_max, elevated bun_max, and reduced urineoutput

Fig. 10

Force plot of a low-risk case (Sample 7) showing protective effects from higher hemoglobin and SpO₂ levels. Red and blue bars indicate positive and negative contributions to AKI risk, respectively

These individualized SHAP visualizations help elucidate the model’s internal decision-making process and support the clinical relevance of its predictions. Collectively, the results underscore the XGBoost model’s strength in integrating physiologically meaningful variables to deliver accurate and explainable AKI risk stratification for ICU patients.

Application of the model

In the MIMIC-IV database, a total of 4961 patients were not diagnosed with AKI at initial assessment (Table 1). To evaluate their potential risk of future AKI development, we applied the XGBoost model constructed with features selected by RFE. Using a predicted probability threshold of ≥ 0.6 to define high risk, the model identified 796 patients (16.0%) as individuals at elevated risk for developing AKI. Early identification of these high-risk patients may support timely clinical interventions, enhanced monitoring, and individualized management strategies to reduce the incidence and adverse outcomes associated with AKI.
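Applying the risk threshold amounts to a simple filter on predicted probabilities. The following Python sketch assumes that each patient already has a model-predicted AKI probability and treats the ≥ 0.6 cut-off as inclusive, as stated. The identifiers and probabilities in the example are hypothetical.

```python
def flag_high_risk(patient_ids, probs, threshold=0.6):
    """Return IDs with predicted AKI probability >= threshold and their share."""
    flagged = [pid for pid, p in zip(patient_ids, probs) if p >= threshold]
    share = len(flagged) / len(patient_ids) if patient_ids else 0.0
    return flagged, share


flagged, share = flag_high_risk(["a", "b", "c", "d", "e"], [0.1, 0.6, 0.59, 0.9, 0.3])
```

Applied to the 4961 non-AKI patients, 796 flagged patients correspond to 796/4961 ≈ 16.0%.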

Discussion

In this study, approximately 54.0% of ICU patients with respiratory failure developed AKI, which was associated with higher 28-day mortality and prolonged ICU stay. To address this challenge, we developed an XGBoost-based prediction model that showed strong predictive performance on the internal validation set, enabling risk identification within the first 48 h of ICU admission. The AKI outcome variable was defined as the final discharge diagnosis in the electronic medical record, whereas predictor variables were limited to data from the first 48 h after ICU admission. This temporal separation supports a predictive rather than purely diagnostic interpretation. However, we acknowledge that in some patients, AKI may have already been present but unrecognized at admission, meaning that variables such as creatinine_max may partially reflect early, undetected AKI. Thus, our model may serve both to predict future AKI and to identify early or subclinical cases. Future prospective studies with time-stamped AKI onset data are warranted to clarify its true predictive capacity.

Unlike traditional scoring systems, our model used RFE driven by XGBoost’s embedded feature importance to select 15 routinely available variables, enhancing interpretability and practicality. Notably, each machine learning model was trained on its own optimally selected 15-feature subset via independent RFE rather than on a single unified set. This was a deliberate trade-off that prioritized performance maximization over strict comparability.

To our knowledge, this is the first model specifically developed for predicting AKI in ICU patients with respiratory failure. Prior studies have focused mainly on septic or cardiovascular populations [23, 28,29,30,31,32,33,34]. Our model distinguishes itself by integrating comorbidities and laboratory values via data-driven feature selection, achieving robust performance in discrimination, calibration, and decision curve analysis.

Beyond enhanced monitoring, several actionable interventions can be implemented immediately when a patient is flagged as high risk: (1) individualized fluid management to avoid both hypovolemia and fluid overload [35]; (2) adjustment of nephrotoxic medications, minimizing agents such as aminoglycosides, high-dose loop diuretics, and certain contrast agents [35, 36]; (3) early adoption of kidney-protective hemodynamic targets [37]; (4) strict glycemic control to prevent hyperglycemia, which has been linked to increased AKI risk in ICU patients [38]; (5) early nephrology consultation for patients with rising creatinine, declining urine output, or complex fluid/drug management needs [36]; and (6) lung-protective ventilation strategies (such as low tidal volume ventilation and avoidance of excessive airway pressures or PEEP) to reduce hemodynamic compromise and protect renal perfusion [35]. These measures can be rapidly integrated into ICU workflows, in line with the model’s goal of timely, precise intervention.

The top-ranked predictive features identified by the XGBoost model included creatinine_max, los_hospital, bun_max, rel_disease, urineoutput, spo2_mean_vitalsign, bicarbonate_min, bun_min, aado2_calc_min_bg, inr_min, dexmedetomidine, charlson_comorbidity_index, creatinine_min, hemoglobin_min, and congestive_heart_failure. These variables are clinically meaningful and play a critical role in evaluating the risk of AKI in patients with respiratory failure.

Our study identified serum creatinine as the most influential predictor, followed by BUN and urine output, consistent with prior models [31, 39] and reflecting their central role in renal function assessment [40]. Elevated creatinine and BUN with reduced urine output signify declining kidney function, often accompanied by metabolic imbalances that drive uremic symptoms. Patients with preexisting renal disease face substantially higher AKI risk [41, 42].

AKI also significantly affects acid–base homeostasis. In AKI, impaired renal excretion of fixed acids leads to metabolic acidosis, which is partially compensated by increased ventilation [43]. Clinically, this may manifest as changes in respiratory rate, elevated heart rate, fluctuations in blood pressure, and decreased serum bicarbonate. Arterial blood gas analysis also often reveals decreased base excess, an elevated anion gap, and electrolyte disturbances, including abnormalities in sodium, potassium, calcium, and phosphate levels [41]. These biochemical changes are consistent with important predictive variables identified in our model, such as minimum bicarbonate.

The CCI, widely recognized as a predictor of long-term mortality, also modulates the relationship between AKI severity (particularly stages 2–3) and 1-year mortality [44]. Notably, a lower CCI score has been associated with a higher risk of developing AKI, although this association diminishes in patients with CCI ≥ 8, suggesting a ceiling effect in risk stratification. In patients with CHF, complex hemodynamic changes—including hypotension driven by elevated cardiac filling pressure and increased right ventricular pressure—can impair glomerular filtration and tubular reabsorption. Paradoxically, this may result in a reduced glomerular filtration rate (GFR) and decreased urine output. This pathophysiological mechanism aligns with the observed predictive strength of the congestive_heart_failure feature in our SHAP-based risk model [45].

While dexmedetomidine has shown renoprotective effects in some contexts [46, 47], our findings suggest that its predictive value likely reflects illness severity and ICU-level sedation use, rather than direct causality. Thus, its positive SHAP contribution may serve as a surrogate marker for critical illness.

Conversely, when patients with respiratory failure develop AKI, pulmonary function is further compromised. Contributing factors include the increased ventilatory demand imposed by respiratory compensation for metabolic acidosis, hypervolemia, cytokine-mediated cytotoxicity, oxidative stress, and accumulated necrotic cellular debris. These factors contribute to pulmonary microvascular injury and can ultimately lead to ARDS [9, 48], further deteriorating lung function.

In addition, several other variables, including ICU length of stay, SpO₂, alveolar–arterial oxygen gradient, INR, and hemoglobin, have also been validated as important predictors in prior studies and AKI prediction models [22, 28, 29, 31, 49,50,51].

Despite these promising results, several limitations should be noted. First, this was a retrospective, single-center analysis using data from one healthcare system, which may introduce selection bias (e.g., exclusions for < 48-h ICU stay, multiple ICU admissions, and missing key variables) and limit generalizability. Features were primarily collected at ICU admission, without incorporating longitudinal changes or certain clinically relevant predictors such as nephrotoxin exposure, fluid overload, or inflammatory biomarkers. The absence of precise AKI onset timing precluded assessment of performance across different prediction windows (e.g., within 24 vs. 48 h) and made it difficult to distinguish true prediction from early detection of existing but uncoded AKI. Moreover, external validation in independent, multicenter cohorts is currently lacking and will be essential to confirm transportability and mitigate the risk of overfitting. In addition, reliance on broad, non-specific respiratory failure codes may introduce clinical heterogeneity; more refined subtyping could improve predictive specificity. From an implementation standpoint, translating the model into routine practice and scaling it across ICUs pose several challenges: (1) interoperability with diverse electronic health record systems and real-time data availability/latency; (2) variability in case-mix, measurement frequency, laboratory assays, and clinical practices across sites that can cause dataset shift and calibration drift; (3) clinician adoption and workflow integration, including alert burden and the need for clear, actionable thresholds; and (4) ongoing model governance—local threshold tuning, performance monitoring, and periodic recalibration/retraining to maintain safety and reliability.

Future studies should focus on external validation, dynamic updating using real-time data with recorded AKI onset times, and integration into clinical workflows. With further refinement, this model has the potential to support timely AKI prevention and intervention strategies in critically ill patients.

Conclusion

In this retrospective cohort study using the MIMIC-IV database, we developed and validated an XGBoost-based machine learning model to predict acute kidney injury (AKI) in ICU patients with respiratory failure, based solely on clinical data from the first 48 h of ICU admission. The model outperformed the other supervised learning algorithms evaluated, highlighting the importance of early laboratory indicators (e.g., creatinine, blood urea nitrogen, urine output) and comorbidity indices in AKI risk assessment. By simulating a real-world clinical scenario, our approach may facilitate timely risk stratification and early interventions to improve patient outcomes. Nevertheless, the model may partly capture undiagnosed early-stage AKI, and its generalizability is limited by the single-center, retrospective design. Future prospective, multicenter studies incorporating real-time AKI onset data are warranted to further validate and refine its predictive utility.