Application of Machine Learning for Clinical Subphenotype Identification in Sepsis

Hu, Chang; Li, Yiming; Wang, Fengyun; Peng, Zhiyong

doi:10.1007/s40121-022-00684-y

Published: 25 August 2022

Volume 11, pages 1949–1964 (2022)

Infectious Diseases and Therapy

Abstract

Introduction

Sepsis is a heterogeneous clinical syndrome. Identification of sepsis subphenotypes could allow more precise therapy. However, there is a lack of models to identify the subphenotypes in such patients. Thus, we aimed to identify possible subphenotypes and compare clinical outcomes across subphenotypes in a large sepsis cohort.

Methods

This machine learning-based cluster analysis was performed using the Medical Information Mart for Intensive Care (MIMIC)-IV database. We enrolled all adult (> 18 years old) patients diagnosed with sepsis in the first 24 h after intensive care unit (ICU) admission. K-means cluster analysis was performed to determine the optimal number of classes. Multivariable logistic regression models were used to estimate the association between sepsis subphenotypes and in-hospital mortality.

Results

A total of 8817 participants with sepsis were enrolled. The median age was 66.8 (IQR, 55.9–77.1) years, and 38.1% (3361/8817) were female. A two-subphenotype solution based on 11 routinely available clinical variables obtained during the first 24 h after ICU admission provided the optimal separation. Participants in subphenotype B showed higher lactate, glucose, creatinine, white blood cell count, sodium and heart rate, and lower body temperature, platelet count, systolic blood pressure, hemoglobin and PaO₂/FiO₂ ratio. In addition, in-hospital mortality in subphenotype B was significantly higher than in subphenotype A (29.4% vs. 8.5%, P < 0.001). The difference remained significant after adjustment for potential covariates (adjusted OR 2.214; 95% CI 1.780–2.754, P < 0.001).

Conclusions

Two sepsis subphenotypes with different clinical outcomes could be rapidly identified using K-means cluster analysis based on routinely available clinical data. This finding may help clinicians identify the subphenotype rapidly at the bedside.

Key Summary Points

*Why carry out this study?*
Sepsis is a heterogeneous clinical syndrome characterized by a dysregulated host response to infection
Identifying the sepsis subphenotypes could lead to a better understanding of the pathophysiology and discovery of new treatment targets. However, there is a lack of models to identify the subphenotypes in such patients
The aim of this study was to propose a machine-learning method to identify the sepsis subphenotype, using only routinely available clinical data collected within the first 24 h of ICU admission
*What was learned from the study?*
Machine learning-based algorithms for subphenotype identification in sepsis are feasible
Two sepsis subphenotypes could be identified rapidly based on routinely available clinical data
Subphenotype B was independently associated with increased in-hospital mortality among patients with sepsis

Digital Features

This article is published with digital features, including a graphical abstract, to facilitate understanding of the article. To view digital features for this article go to https://doi.org/10.6084/m9.figshare.20418600.

Introduction

Sepsis is a common and frequently fatal clinical condition characterized by a dysregulated host response to infection [1]. Recent epidemiologic studies estimate the worldwide incidence of sepsis at 48.9 million cases per year, with mortality rates of 20–25% [2, 3]. Although the Surviving Sepsis Campaign Guidelines for Management of Sepsis and Septic Shock have been issued in five editions since their introduction in 2004, mortality among patients with sepsis remains unacceptably high [4]. A major potential barrier to progress is the heterogeneity of sepsis.

Not all sepsis is the same. However, a one-size-fits-all approach is still used in clinical practice, ignoring the heterogeneity across patients with sepsis. Several recent studies have identified sepsis subphenotypes that differ in demographics, laboratory values and clinical outcomes [5,6,7,8]. Notably, these methods for subphenotype identification rely largely on measuring specific protein biomarkers, such as vascular adhesion protein 1 (VAP1), matrix metalloproteinase 8 (MMP8) and proteinase 3 (PRTN3) [7]. However, these biomarkers are not routinely measured in clinical practice, and the high cost of measuring them limits rapid identification of sepsis subphenotypes. Thus, there is a need to derive sepsis subphenotypes from routinely available clinical data early after intensive care unit (ICU) admission.

Machine-learning methods applied to clinical data can be used to identify disease subphenotypes. Among them, K-means cluster analysis is a well-established clustering method that has gained wide acceptance in medicine [9, 10]. K-means-based methods have been applied to subphenotype identification in pediatric acute respiratory distress syndrome (ARDS) [11] and sepsis-associated acute kidney injury [12]. However, to the best of our knowledge, models to identify the clinical subphenotypes in patients with sepsis are lacking.

Accordingly, the objective of this study was to propose a K-means clustering method to identify sepsis subphenotypes using only routinely available clinical data collected within the first 24 h of ICU admission.

Methods

Ethical Approval

The data for this study were obtained from the Medical Information Mart for Intensive Care (MIMIC-IV) database. The establishment of the MIMIC-IV database was approved by the Institutional Review Boards of Beth Israel Deaconess Medical Center (Boston, MA) and the Massachusetts Institute of Technology (Cambridge, MA). The requirement for individual patient consent was waived because this project did not impact clinical care and all protected health information was deidentified [13]. This study was conducted according to the Declaration of Helsinki (2013) [14] and is reported in accordance with the Strengthening the Reporting of Observational studies in Epidemiology (STROBE) statement [15].

Data Source

As an update to MIMIC-III, the current MIMIC-IV (version 1.0) is a large, freely available database comprising a variety of clinical data associated with 76,540 distinct admissions of patients who stayed in the ICU of the Beth Israel Deaconess Medical Center between 2008 and 2019 [13]. After completing a recognized course in protecting human research participants and signing a data use agreement, one author (Chang Hu) was approved to access the database (certification no. 47460147).

Study Population and Outcome

In this study, sepsis was defined as confirmed or suspected infection combined with a Sequential Organ Failure Assessment (SOFA) score ≥ 2 in the first 24 h after ICU admission, in accordance with the 2016 Third International Consensus Definitions for Sepsis (Sepsis-3) [1]. The timeline of the sepsis definition is presented in Supplementary Fig. S1. We enrolled all adult (> 18 years) patients with sepsis. Exclusion criteria were multiple ICU admissions (only data from each patient's first ICU admission were used in this study) and ICU length of stay (LOS) < 24 h. The primary outcome was in-hospital mortality; the secondary outcomes were ICU mortality, ICU LOS and hospital LOS.
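
The inclusion and exclusion rules above can be sketched in pandas. This is a minimal illustration, not the authors' extraction code: the table layout and column names (`suspected_infection`, `sofa_24h`, `los_days` and so on) are hypothetical, and "first ICU admission" is read here as each patient's earliest qualifying ICU stay.

```python
import pandas as pd

def select_cohort(stays: pd.DataFrame) -> pd.DataFrame:
    """Apply the Sepsis-3, age, first-stay and LOS rules to a table of ICU stays.

    Hypothetical columns (one row per ICU stay): subject_id, stay_id, intime,
    age (years), los_days, suspected_infection (bool), sofa_24h (SOFA in first 24 h).
    """
    # Sepsis-3: confirmed or suspected infection plus SOFA >= 2 in the first 24 h
    septic = stays[stays["suspected_infection"] & (stays["sofa_24h"] >= 2)]
    # Adults only (> 18 years, as stated in the text)
    adults = septic[septic["age"] > 18]
    # Keep each patient's earliest qualifying ICU stay; later stays are dropped
    first_stay = (adults.sort_values(["subject_id", "intime"])
                        .drop_duplicates(subset="subject_id", keep="first"))
    # Exclude ICU stays shorter than 24 h (LOS assumed to be stored in days)
    return first_stay[first_stay["los_days"] >= 1.0].reset_index(drop=True)
```

Under this reading, a patient whose first septic stay lasted less than 24 h is excluded entirely rather than falling back to a later stay.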

Data Extraction and Preprocessing

From the MIMIC-IV database, we extracted a set of clinical variables. As previously described [16], these were routinely measured parameters, including demographic variables (e.g., age, gender, ethnicity, body weight, height, admission time period and admission type), medical history (e.g., hypertension, diabetes, congestive heart failure, cerebrovascular disease, chronic pulmonary disease, liver disease, renal disease, tumor and acquired immune deficiency syndrome), vital signs (e.g., heart rate, systolic blood pressure, diastolic blood pressure, mean arterial pressure, respiratory rate, body temperature and SpO₂), laboratory findings (e.g., blood glucose, lactate, pH, pCO₂, pO₂, base excess, white blood cell count, anion gap, bicarbonate, blood urea nitrogen, serum calcium, serum chloride, serum creatinine, serum sodium, serum potassium, serum fibrinogen, international normalized ratio, prothrombin time, partial thromboplastin time, alanine aminotransferase, alkaline phosphatase, aspartate aminotransferase, total bilirubin, amylase, creatine phosphokinase, creatine kinase MB, lactate dehydrogenase, PaO₂/FiO₂ ratio, hematocrit, hemoglobin, platelets and albumin), medical treatments (e.g., mechanical ventilation, time to first dose of antibiotic agents and vasopressor use), urine output and Glasgow Coma Scale score. For each variable, we extracted the most abnormal value recorded within the first 24 h of ICU admission. Missing data were assumed to be missing at random, and variables with > 30% missing values were excluded from the analysis. The remaining missing data were handled by multiple imputation by chained equations (MICE; mice package in R).
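
To illustrate the "most abnormal value within the first 24 h" step, the sketch below collapses hypothetical long-format measurements into one row per ICU stay. Which extreme counts as "most abnormal" is an assumption: the `WORST_DIRECTION` mapping is hypothetical, and two-sided variables such as sodium, glucose and temperature are simplified to a single direction.

```python
import pandas as pd

# Hypothetical rule for "most abnormal": which extreme to keep for each variable.
# Two-sided variables (sodium, glucose, temperature) are simplified to one direction.
WORST_DIRECTION = {
    "lactate": "max", "glucose": "max", "creatinine": "max", "wbc": "max",
    "heart_rate": "max", "sodium": "max",
    "temperature": "min", "platelets": "min", "sbp": "min",
    "hemoglobin": "min", "pf_ratio": "min",
}

def worst_in_first_24h(meas: pd.DataFrame, stays: pd.DataFrame) -> pd.DataFrame:
    """One row per stay holding the worst value of each variable in the first 24 h.

    meas:  long format with hypothetical columns stay_id, charttime, variable, value
    stays: hypothetical columns stay_id, intime (ICU admission time)
    """
    m = meas.merge(stays[["stay_id", "intime"]], on="stay_id")
    elapsed = m["charttime"] - m["intime"]
    # Keep measurements taken from ICU admission up to 24 h afterwards
    m = m[(elapsed >= pd.Timedelta(0)) & (elapsed <= pd.Timedelta(hours=24))]
    wide = {}
    for var, direction in WORST_DIRECTION.items():
        values = m.loc[m["variable"] == var].groupby("stay_id")["value"]
        wide[var] = values.max() if direction == "max" else values.min()
    # Stays with no measurement of a variable get NaN, left for imputation
    return pd.DataFrame(wide).reindex(stays["stay_id"].tolist())
```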

Variable Selection

For feature selection, we followed the procedures of Soussi et al. [6] and Zhang et al. [8]. In brief, we first removed variables with a missing fraction > 30%. We then preselected candidate variables based on previously published literature and their potential association with sepsis onset and progression. The final set of 11 variables included in the K-means clustering algorithm was chosen by consensus between two critical care medicine experts, YL and ZP (Supplementary Table S1): the PaO₂/FiO₂ ratio for respiratory function; serum creatinine for renal function; platelet count and hemoglobin for the hematologic system; lactate, heart rate, systolic blood pressure and body temperature for the circulatory system; white blood cell count for inflammatory status; sodium for electrolytes; and glucose for metabolic function.
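
The missingness filter and imputation can be approximated in Python as follows. The study itself used the mice package in R; scikit-learn's `IterativeImputer` is swapped in here as a chained-equations analogue, and drawing several completed datasets with `sample_posterior=True` is one way to mimic multiple imputation. The 30% threshold comes from the text; the function name and defaults are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.experimental import enable_iterative_imputer  # noqa: F401 (enables IterativeImputer)
from sklearn.impute import IterativeImputer

def drop_sparse_and_impute(df: pd.DataFrame, max_missing=0.30, n_imputations=5, seed=0):
    """Drop variables with > max_missing missing values, then return several imputed copies."""
    keep = df.columns[df.isna().mean() <= max_missing]
    kept = df[keep]
    completed = []
    for m in range(n_imputations):
        # Each run draws imputations from the posterior with a different seed
        imputer = IterativeImputer(sample_posterior=True, max_iter=10,
                                   random_state=seed + m)
        completed.append(pd.DataFrame(imputer.fit_transform(kept),
                                      columns=keep, index=df.index))
    return completed
```

Estimates from each completed dataset would then be pooled (for example with Rubin's rules); that step is not shown.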

Subphenotype Classification

In the present study, we used a K-means clustering algorithm to determine clusters. We first assessed the correlations between candidate variables. Then, all continuous variables were transformed into z-scores (mean 0, standard deviation 1) before entering the algorithm. Solutions with two to eight clusters were compared in this unsupervised approach. We determined the optimal number of clusters based on the total within-cluster sum of squares (WSS), the Silhouette score (range −1 to 1; values closer to 1 are better), the Davies-Bouldin score (range 0 upward; values closer to 0 are better) and the Calinski-Harabasz score (range 0 upward; higher values are better). Finally, we used principal component analysis (PCA) to visualize the clustering results.
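
The model-selection procedure described above (z-scoring, K = 2 to 8, four internal validity indices, then PCA) can be sketched with scikit-learn. This assumes scikit-learn as a stand-in for whatever implementation the authors used; `n_init` and `random_state` are illustrative choices not reported in the text.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.metrics import (calinski_harabasz_score, davies_bouldin_score,
                             silhouette_score)
from sklearn.preprocessing import StandardScaler

def compare_k(X, k_values=range(2, 9), seed=0):
    """Z-score the features, fit K-means for each K and collect validity indices."""
    Z = StandardScaler().fit_transform(X)  # each column: mean 0, SD 1
    summary = []
    for k in k_values:
        km = KMeans(n_clusters=k, n_init=10, random_state=seed).fit(Z)
        labels = km.labels_
        summary.append({
            "k": k,
            "wss": km.inertia_,                                 # total within-cluster SS
            "silhouette": silhouette_score(Z, labels),          # -1..1, higher is better
            "davies_bouldin": davies_bouldin_score(Z, labels),  # >= 0, lower is better
            "calinski_harabasz": calinski_harabasz_score(Z, labels),  # higher is better
        })
    return Z, summary

def project_2d(Z):
    """First two principal components, e.g. for a scatter plot coloured by cluster."""
    return PCA(n_components=2).fit_transform(Z)
```

WSS generally decreases as K grows, so it is read as an elbow plot; the other three indices can be compared across K directly.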

Statistical Analysis

Categorical variables were expressed as number and percentage and tested for baseline comparability with the chi-square test or Fisher’s exact test, as appropriate. Continuous variables were expressed as mean and standard deviation or median and interquartile range (IQR) and compared between the two groups using the Student’s t-test or the Mann-Whitney test, as appropriate.
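
The baseline comparisons can be sketched with SciPy. The text says only "as appropriate"; the rule used here (Fisher's exact test for a 2 × 2 table when any expected count is below 5, and a caller-supplied flag choosing between Student's t-test and the Mann-Whitney U test) is a common convention assumed for illustration.

```python
import numpy as np
from scipy import stats

def compare_categorical(table):
    """P value for a contingency table (rows = subphenotypes, columns = categories).

    Assumed rule: Fisher's exact test for a 2 x 2 table with any expected count < 5,
    otherwise Pearson's chi-square test (SciPy applies Yates' correction to 2 x 2 tables).
    """
    table = np.asarray(table)
    _, p, _, expected = stats.chi2_contingency(table)
    if table.shape == (2, 2) and (expected < 5).any():
        _, p = stats.fisher_exact(table)
    return float(p)

def compare_continuous(a, b, normal):
    """Student's t-test if the caller judges both samples normal, else Mann-Whitney U."""
    if normal:
        return float(stats.ttest_ind(a, b).pvalue)
    return float(stats.mannwhitneyu(a, b, alternative="two-sided").pvalue)
```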

Multivariable logistic regression models were used to estimate the association between sepsis subphenotypes and in-hospital mortality, expressed as unadjusted and adjusted odds ratios (ORs) with 95% confidence intervals (CIs). The unadjusted model tested the direct effect of sepsis subphenotype on in-hospital mortality. Model 1 was adjusted for demographic variables (age, gender, body weight, height, race, admission time period and admission type). Model 2 was adjusted for the covariates in model 1 plus comorbidities (Charlson comorbidity index). Model 3 was adjusted for the covariates in model 2 plus neurologic function (Glasgow Coma Scale [GCS] score). Model 4 was adjusted for the covariates in model 3 plus severity of illness (SOFA score). Model 5 incorporated the covariates in model 4 plus medical treatments (antibiotic therapy on day 1, mechanical ventilation on day 1 and vasopressor use on day 1).
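
The nested adjustment strategy can be sketched with the statsmodels formula interface. Column names such as `died_in_hospital`, `subphenotype_b`, `charlson` and `abx_day1` are hypothetical, and wrapping categorical covariates in `C(...)` is an assumption about how they were coded. Each model's OR and 95% CI for subphenotype B come from exponentiating the coefficient and its Wald interval.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

def nested_covariate_sets():
    """Covariate lists for the unadjusted model and models 1-5 (hypothetical column names)."""
    steps = [
        # Model 1: demographics
        ["age", "C(gender)", "weight", "height", "C(race)", "C(admit_period)", "C(admit_type)"],
        ["charlson"],                          # Model 2: + comorbidity
        ["gcs"],                               # Model 3: + neurologic function
        ["sofa"],                              # Model 4: + severity of illness
        ["abx_day1", "mv_day1", "vaso_day1"],  # Model 5: + day-1 treatments
    ]
    models, covariates = {"unadjusted": []}, []
    for i, extra in enumerate(steps, start=1):
        covariates = covariates + extra        # each model extends the previous one
        models[f"model_{i}"] = covariates
    return models

def subphenotype_odds_ratios(df: pd.DataFrame, models,
                             outcome="died_in_hospital", exposure="subphenotype_b"):
    """Fit one logistic model per covariate set; return OR, 95% CI and P for the exposure."""
    results = {}
    for name, covariates in models.items():
        formula = f"{outcome} ~ " + " + ".join([exposure] + covariates)
        fit = smf.logit(formula, data=df).fit(disp=0)
        low, high = fit.conf_int().loc[exposure]  # Wald CI on the log-odds scale
        results[name] = {
            "OR": float(np.exp(fit.params[exposure])),
            "CI": (float(np.exp(low)), float(np.exp(high))),
            "P": float(fit.pvalues[exposure]),
        }
    return results
```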

A two-sided P value < 0.05 was considered statistically significant. All analyses were carried out using SPSS statistical software version 24.0 (IBM), R statistical software version 3.6.1 (R Foundation) and Python software version 3.6 (Fig. 1).

Results

Characteristics of Cohorts

In the current analysis, 12,292 patients with sepsis in the MIMIC-IV database were screened. After exclusions (2657 patients with multiple ICU admissions and 818 patients with an ICU LOS < 24 h), 8817 participants were enrolled in this study (Supplementary Fig. S2). The median age was 66.8 years (IQR 55.9–77.1 years), and 38.1% (3361/8817) were female. The most frequently documented race and ethnicity category was White (5887/8817, 66.8%), followed by Black/African American (467/8817, 5.3%), Hispanic/Latinx (241/8817, 2.7%) and Asian (205/8817, 2.3%). The three most frequently reported comorbidities were hypertension (4256/8817, 48.3%), diabetes (2528/8817, 28.7%) and congestive heart failure (2365/8817, 26.8%). Baseline demographics are provided in Table 1.

Table 1 Demographics at baseline

Among the 8817 patients, 71.5% (6307/8817) received their first antibiotic dose within the first 24 h after ICU admission, 59.5% (5242/8817) received mechanical ventilation on day 1, and 39.8% (3510/8817) received their first vasopressor dose within the first 24 h after ICU admission. The median SOFA score across the whole cohort was 5 (IQR 5–8), indicating a considerable burden of organ dysfunction. The overall all-cause in-hospital mortality rate was 12.6% (1107/8817) (Table 2).

Table 2 Clinical characteristics of the cohort

Derivation of Sepsis Subphenotypes

Supplementary Table S1 presents the 11 clinical variables, which cover the function of several organ systems. Significant but modest correlations were noted between the candidate variables (Fig. 2); the highest were between glucose and lactate (r = 0.3) and between heart rate and body temperature (r = 0.3). Supplementary Table S2 summarizes K-means clustering with K ranging from 2 to 7 for this cohort. Clustering with K = 2 had a higher Silhouette score (0.24) and a higher Calinski-Harabasz score (879) than the other solutions. The within-cluster variance graph is presented in Supplementary Fig. S3. Therefore, a two-class model provided the optimal fit in this study. For simplicity, we henceforth refer to the two classes as subphenotypes A (N = 7094) and B (N = 1723). To explore and visualize the two subphenotypes, we also created two-dimensional projections using PCA, showing the data before and after clustering (Fig. 3).

Characteristics of Each Subphenotype

Figure 4 shows the selected variables for the two subphenotypes. Compared with subphenotype A, subphenotype B was characterized by considerably higher lactate, glucose, creatinine, white blood cell count and sodium; higher heart rate; and lower body temperature, platelet count, systolic blood pressure, hemoglobin and PaO₂/FiO₂ ratio. Detailed comparisons of these variables between the two subphenotypes are presented in Fig. 5.

Baseline characteristics of study participants according to subphenotype are also shown in Tables 1 and 2. There were no significant differences in age, gender or height between subphenotypes A and B. Patients in subphenotype A were more likely than those in subphenotype B to be White (68.3% vs. 60.3%). In addition, compared with subphenotype B, patients in subphenotype A had a lower Charlson Comorbidity Index (5 [3–7] vs. 6 [4–8], P < 0.001) and less frequently required antibiotic therapy (70.9% vs. 74.2%, P < 0.001), mechanical ventilation (57.0% vs. 69.6%, P < 0.001) and vasopressors (34.9% vs. 60.0%, P < 0.001).

Clinical Outcomes of Each Subphenotype

Results for all outcomes are shown in Table 2. In-hospital and ICU mortality in subphenotype B were significantly higher than in subphenotype A (29.4% vs. 8.5%, P < 0.001; and 25.4% vs. 6.0%, P < 0.001, respectively). Furthermore, hospital and ICU stays were significantly longer in patients with subphenotype B than in those with subphenotype A (10.6 [5.5–18.9] vs. 7.9 [5.1–13.1] days, P < 0.001; and 4.6 [2.4–9.7] vs. 2.8 [1.5–5.3] days, P < 0.001, respectively).

Results of the univariable and multivariable logistic regression analyses for the primary outcome are presented in Table 3. In the univariable analysis, subphenotype B was associated with increased in-hospital mortality (OR 4.492; 95% CI 3.932–5.132; P < 0.001). After adjustment for multiple potential confounders in several multivariable logistic regression models, subphenotype B remained independently associated with an increased risk of in-hospital mortality compared with subphenotype A (adjusted OR 2.214; 95% CI 1.780–2.754, P < 0.001).
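
As a rough consistency check (not an analysis reported by the authors), the crude odds ratio can be recovered approximately from the rounded in-hospital mortality rates of the two subphenotypes, with p_B = 0.294 and p_A = 0.085:

```latex
\mathrm{OR}_{\text{crude}}
  = \frac{p_B/(1-p_B)}{p_A/(1-p_A)}
  = \frac{0.294/0.706}{0.085/0.915}
  \approx \frac{0.4164}{0.0929}
  \approx 4.48
```

This agrees with the reported univariable OR of 4.492 up to rounding of the percentages; the adjusted OR of 2.214 cannot be reproduced this way because it depends on the covariates in the multivariable models.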

Table 3 Univariate and multivariate analysis of sepsis subphenotypes associated with in-hospital mortality for all included patients

Discussion

In this study, we demonstrated that K-means cluster analysis, using only routinely available clinical data, could identify sepsis subphenotypes. We selected 11 representative and easily accessible variables related to different organ systems and observed important differences between the two identified subphenotypes. Additionally, patients in subphenotype B had significantly higher mortality than those in subphenotype A, even after adjustment for potential confounders. Taken together, this approach may be a valuable tool for prognostic stratification of sepsis in clinical practice.

Data-driven cluster analysis is widely used for disease classification [17]. Among cluster analysis methods, the K-means algorithm is one of the most popular machine-learning clustering algorithms [18]. Researchers have successfully applied K-means cluster analysis to identify asthma phenotypes [17], Parkinson's disease subtypes [19] and complex regional pain syndrome phenotypes [20]. Thus, K-means cluster analysis seemed an appropriate choice for this study, because it minimizes within-cluster variance, thereby maximizing separation between clusters, and offers considerable scope for identifying distinct groups of patients [16].
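
For reference, the standard K-means formulation makes the separation argument concrete. With z-scored observations z_i, clusters C_k of size n_k and mean μ_k, and overall mean z̄, K-means minimizes WSS, and the total sum of squares decomposes as follows:

```latex
\mathrm{WSS} = \sum_{k=1}^{K} \sum_{i \in C_k} \lVert \mathbf{z}_i - \boldsymbol{\mu}_k \rVert^2,
\qquad
\underbrace{\sum_{i=1}^{n} \lVert \mathbf{z}_i - \bar{\mathbf{z}} \rVert^2}_{\mathrm{TSS}}
  = \mathrm{WSS}
  + \underbrace{\sum_{k=1}^{K} n_k \lVert \boldsymbol{\mu}_k - \bar{\mathbf{z}} \rVert^2}_{\mathrm{BSS}}
```

Because TSS is fixed for a given dataset, any partition that lowers WSS raises the between-cluster sum of squares (BSS) by the same amount; this is the sense in which minimizing within-cluster variance maximizes separation between clusters.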

Identification of distinct subphenotypes in sepsis is a key component of personalized medicine, which may help improve risk stratification and treatment decisions. Notably, translating research into clinical practice remains one of the biggest challenges in subphenotype identification [10]. For example, heparin-binding protein (HBP), neutrophil elastase 2 (Ela), PRTN3 and MMP8 have been reported to be key factors in phenotype identification for septic acute kidney injury [7]. However, these biomarkers are not routinely measured in clinical practice, which has hindered their translation into clinically useful applications. In the current study, we used 11 routinely available clinical features to derive clinical phenotypes of sepsis in the first 24 h after ICU admission. These parameters reflect the state of different target organs. These findings may help bring personalized physiologic medicine to the bedside for critically ill patients with sepsis.

Understanding the pathophysiologic mechanisms underlying the subphenotypes could help clinicians recognize them in patients with sepsis. Of the two subphenotypes identified, subphenotype B was associated with higher levels of lactate, glucose and creatinine and lower levels of hemoglobin and PaO₂/FiO₂ ratio. Lactate is a marker of abnormal microcirculation [21], reflects tissue hypoperfusion and cellular hypoxia [22], and has been reported to be a sensitive prognostic indicator in sepsis [23]. Disordered glucose metabolism is common in critically ill patients, especially those with sepsis [24]. A mild elevation of glucose may be acceptable because it helps the host survive severe stress. However, excessively high glucose levels may cause immunosuppression and oxidative stress, which are associated with worse outcomes [25, 26]. Serum creatinine is the most widely used measure of renal function in clinical practice. The significantly elevated creatinine in subphenotype B indicates that this group had a higher proportion of renal dysfunction than subphenotype A [27]. This result is consistent with a recent study in which serum creatinine was a predictor of mortality in sepsis [6]. Additionally, individuals in subphenotype B had a lower PaO₂/FiO₂ ratio than those in subphenotype A. A previous study demonstrated that the PaO₂/FiO₂ ratio is an important indicator of respiratory function and is widely used to identify patients at high risk of adverse clinical outcomes [28]. Taken together, these differences help explain why patients in subphenotype B had worse clinical outcomes than those in subphenotype A.

In the current study, we used a classification approach based on routinely available clinical data to yield insights into sepsis subphenotypes. Embedding this classification model into the electronic health record (EHR) system could allow rapid bedside screening. Moreover, automating the capture and processing of the abundant data stream in the ICU could help evaluate prognostic or therapeutic differences among septic patients. However, this approach would require external validation before clinical implementation.
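
One way such bedside screening could work is sketched below: a new patient's values are standardized with the derivation cohort's means and SDs and assigned to the nearest centroid. This is a hypothetical deployment sketch, not something the authors describe; the function name and arguments are illustrative. In scikit-learn terms it corresponds to `StandardScaler.transform` followed by `KMeans.predict`.

```python
import numpy as np

def assign_subphenotype(x_new, mean, sd, centroids, labels=("A", "B")):
    """Assign one patient to the nearest K-means centroid (hypothetical deployment sketch).

    x_new:     raw values of the clustering variables, in derivation order
    mean, sd:  per-variable mean and SD from the derivation cohort
    centroids: K x p centroids in z-score space; labels[i] names centroid i
    """
    z = (np.asarray(x_new, dtype=float) - np.asarray(mean, dtype=float)) / np.asarray(sd, dtype=float)
    # Euclidean distance from the standardized patient to every centroid
    distances = np.linalg.norm(np.asarray(centroids, dtype=float) - z, axis=1)
    return labels[int(np.argmin(distances))]
```

Cluster indices from K-means are arbitrary, so the `labels` tuple would have to be fixed once from the derivation cohort; reusing the derivation means and SDs keeps new patients in the same coordinate system as the centroids.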

This study has several limitations. First, because the etiology of sepsis is complex, known rules and a limited set of important variables may not be sufficient to discover all subphenotypes. Additionally, the MIMIC-IV database covers the period from 2008 to 2019 without the exact year of patient admission, and the management of sepsis may have changed over this interval, which may have introduced bias. We attempted to partly mitigate this effect by applying a unified definition of sepsis (Sepsis-3) and by dividing patients into four admission time periods (2008–2010, 2011–2013, 2014–2016 and 2017–2019), which were entered into the model as a covariate. The results held even after adjustment for admission time period. Second, this study focused only on baseline data, using the most abnormal value in the first 24 h of ICU admission, which precluded dynamic classification of sepsis subphenotypes. Moreover, the natural progression of sepsis over time might lead to changes in subphenotype. Nonetheless, our static classification model could provide useful guidance for care providers and may help avoid alert fatigue, which has been blamed for high override rates in dynamic systems. Third, additional indicators, such as type of infection, infection site, microbiology data, and smoking and drinking history, might provide further insights; however, these were not available in this clinical database. Fourth, 22.18% of lactate values and 27.39% of PaO₂/FiO₂ values were missing. Although we used multiple imputation to handle these missing data in the clustering model, this may still produce biased estimates of relative risk. Fifth, sepsis was defined as confirmed or suspected infection combined with a SOFA score ≥ 2. However, for chronic conditions the electronic database provides only ICD-9 (or ICD-10) codes rather than exact values, so we could not calculate the baseline SOFA score; this may have led to overestimation of sepsis, and the results should be interpreted with caution. Sixth, the clinical subphenotypes in this study were derived from a single-center retrospective database in the USA. Thus, it remains unknown whether these subphenotypes exist in more diverse populations of critically ill patients with sepsis. Additional studies are needed to verify and validate the two distinct subphenotypes of sepsis.

Conclusion

Two sepsis subphenotypes with different clinical outcomes can be rapidly identified using K-means cluster analysis based on routinely available clinical data. This finding may help clinicians rapidly and easily identify the sepsis subphenotype at the bedside.

References

1. Singer M, Deutschman CS, Seymour CW, et al. The third international consensus definitions for sepsis and septic shock (sepsis-3). JAMA. 2016;315(8):801–10.
2. Rudd KE, Johnson SC, Agesa KM, et al. Global, regional, and national sepsis incidence and mortality, 1990–2017: analysis for the Global Burden of Disease Study. Lancet. 2020;395(10219):200–11.
3. Rhee C, Dantes R, Epstein L, et al. Incidence and trends of sepsis in US hospitals using clinical vs claims data, 2009–2014. JAMA. 2017;318(13):1241–9.
4. Evans L, Rhodes A, Alhazzani W, et al. Surviving sepsis campaign: international guidelines for management of sepsis and septic shock 2021. Intensive Care Med. 2021;47(11):1181–247.
5. Seymour CW, Kennedy JN, Wang S, et al. Derivation, validation, and potential treatment implications of novel clinical phenotypes for sepsis. JAMA. 2019;321(20):2003–17.
6. Soussi S, Sharma D, Juni P, et al. Identifying clinical subtypes in sepsis-survivors with different one-year outcomes: a secondary latent class analysis of the FROG-ICU cohort. Crit Care. 2022;26(1):114.
7. Wiersema R, Jukarainen S, Vaara ST, et al. Two subphenotypes of septic acute kidney injury are associated with different 90-day mortality and renal recovery. Crit Care. 2020;24(1):150.
8. Zhang Z, Zhang G, Goyal H, Mo L, Hong Y. Identification of subclasses of sepsis that showed different clinical outcomes and responses to amount of fluid resuscitation: a latent profile analysis. Crit Care. 2018;22(1):347.
9. Sanchez-Pinto LN, Luo Y, Churpek MM. Big data and data science in critical care. Chest. 2018;154(5):1239–48.
10. Reddy K, Sinha P, O’Kane CM, Gordon AC, Calfee CS, McAuley DF. Subphenotypes in critical care: translation into clinical practice. Lancet Respir Med. 2020;8(6):631–43.
11. Yehya N, Varisco BM, Thomas NJ, Wong HR, Christie JD, Feng R. Peripheral blood transcriptomic sub-phenotypes of pediatric acute respiratory distress syndrome. Crit Care. 2020;24(1):681.
12. Chaudhary K, Vaid A, Duffy A, et al. Utilization of deep learning for subphenotype identification in sepsis-associated acute kidney injury. Clin J Am Soc Nephrol. 2020;15(11):1557–65.
13. Johnson AE, Pollard TJ, Shen L, et al. MIMIC-III, a freely accessible critical care database. Sci Data. 2016;3:160035.
14. World Medical Association. World Medical Association Declaration of Helsinki: ethical principles for medical research involving human subjects. JAMA. 2013;310(20):2191–4.
15. von Elm E, Altman DG, Egger M, et al. The Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) statement: guidelines for reporting observational studies. Lancet. 2007;370(9596):1453–7.
16. Hu C, Li L, Huang W, et al. Interpretable machine learning for early prediction of prognosis in sepsis: a discovery and validation study. Infect Dis Ther. 2022;11(3):1117–32.
17. Haldar P, Pavord ID, Shaw DE, et al. Cluster analysis and clinical asthma phenotypes. Am J Respir Crit Care Med. 2008;178(3):218–24.
18. Selim SZ, Ismail MA. K-means-type algorithms: a generalized convergence theorem and characterization of local optimality. IEEE Trans Pattern Anal Mach Intell. 1984;6(1):81–7.
19. Pourzinal D, Yang JHJ, Byrne GJ, et al. Identifying subtypes of mild cognitive impairment in Parkinson’s disease using cluster analysis. J Neurol. 2020;267(11):3213–22.
20. Dimova V, Herrnberger MS, Escolano-Lozano F, et al. Clinical phenotypes and classification algorithm for complex regional pain syndrome. Neurology. 2020;94(4):e357–67.
21. Ryoo SM, Lee J, Lee YS, et al. Lactate level versus lactate clearance for predicting mortality in patients with septic shock defined by sepsis-3. Crit Care Med. 2018;46(6):e489–95.
22. Zhai X, Yang Z, Zheng G, et al. Lactate as a potential biomarker of sepsis in a rat cecal ligation and puncture model. Mediators Inflamm. 2018;2018:8352727.
23. Scott HF, Brou L, Deakyne SJ, Kempe A, Fairclough DL, Bajaj L. Association between early lactate levels and 30-day mortality in clinically suspected sepsis in children. JAMA Pediatr. 2017;171(3):249–55.
24. Van den Berghe G, Wilmer A, Hermans G, et al. Intensive insulin therapy in the medical ICU. N Engl J Med. 2006;354(5):449–61.
25. Wang W, Chen W, Liu Y, et al. Blood glucose levels and mortality in patients with sepsis: dose-response analysis of observational studies. J Intensive Care Med. 2021;36(2):182–90.
26. Lu Z, Tao G, Sun X, et al. Association of blood glucose level and glycemic variability with mortality in sepsis patients during ICU hospitalization. Front Public Health. 2022;10:857368.
27. De Rosa S, Samoni S, Ronco C. Creatinine-based definitions: from baseline creatinine to serum creatinine adjustment in intensive care. Crit Care. 2016;20:69.
28. Villar J, Perez-Mendez L, Blanco J, et al. A universal definition of ARDS: the PaO2/FiO2 ratio under a standard ventilatory setting—a prospective, multicenter validation study. Intensive Care Med. 2013;39(4):583–92.

Acknowledgements

We thank the participants of the study and the MIMIC-IV program for access to the database.

Funding

This work and the journal’s rapid service fee were funded by the National Natural Science Foundation of China (grants 81772046 and 81971816 to Dr. Peng) and the Subject Cultivation Project of Zhongnan Hospital of Wuhan University (Zhiyong Peng, no. ZNXKPY2021001).

Authorship

All named authors meet the International Committee of Medical Journal Editors (ICMJE) criteria for authorship for this article, take responsibility for the integrity of the work as a whole, and have given their approval for this version to be published.

Author Contributions

CH and ZP designed this study; CH and YL were responsible for the data collection; YL and FW were responsible for data analysis; CH and ZP conducted the manuscript writing; ZP critically revised the manuscript. All authors read and approved the final manuscript.

Disclosures

Chang Hu, Yiming Li, Fengyun Wang and Zhiyong Peng have nothing to disclose.

Compliance with Ethics Guidelines

The data for this study were obtained from the Medical Information Mart for Intensive Care (MIMIC-IV) database. The establishment of the MIMIC-IV database was approved by the Institutional Review Boards of Beth Israel Deaconess Medical Center (Boston, MA) and the Massachusetts Institute of Technology (Cambridge, MA). The requirement for individual patient consent was waived because this project did not affect clinical care and all protected health information was deidentified. This study was performed in accordance with the 2013 revision of the Declaration of Helsinki and is reported in accordance with the Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) statement.

Data Availability

The datasets generated during and/or analyzed during the current study are available from the corresponding author on reasonable request.

Author information

Chang Hu, Yiming Li and Fengyun Wang contributed equally and share the first authorship.

Authors and Affiliations

Department of Critical Care Medicine, Zhongnan Hospital of Wuhan University, Wuhan, 430071, Hubei, China
Chang Hu, Yiming Li, Fengyun Wang & Zhiyong Peng
Clinical Research Center of Hubei Critical Care Medicine, Wuhan, 430071, Hubei, China
Chang Hu, Yiming Li, Fengyun Wang & Zhiyong Peng


Corresponding author

Correspondence to Zhiyong Peng.

Supplementary Information

Below is the link to the electronic supplementary material.

Supplementary file1 (DOCX 239 KB)

Supplementary file2 (DOCX 32 KB)

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License, which permits any non-commercial use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc/4.0/.


About this article

Cite this article

Hu, C., Li, Y., Wang, F. et al. Application of Machine Learning for Clinical Subphenotype Identification in Sepsis. Infect Dis Ther 11, 1949–1964 (2022). https://doi.org/10.1007/s40121-022-00684-y


Received: 29 June 2022
Accepted: 02 August 2022
Published: 25 August 2022
Issue Date: October 2022
DOI: https://doi.org/10.1007/s40121-022-00684-y
