Abstract
Traumatic brain injury (TBI) is a complex condition where heterogeneity impedes the advancement of care. Understanding the diverse presentations of TBI is crucial for personalized medicine. Our study aimed to identify clinically relevant patient endotypes in TBI using latent class analysis based on comorbidity data. We used the Medical Information Mart for Intensive Care III database, which includes 2,629 adult TBI patients. We identified five stable endotypes characterized by specific comorbidity profiles: Heart Failure and Arrhythmia, Healthy, Renal Failure with Hypertension, Alcohol Abuse, and Hypertension. Each endotype had distinct clinical characteristics and outcomes: The Heart Failure and Arrhythmia endotype had lower survival rates than the Renal Failure with Hypertension despite featuring fewer comorbidities overall. Patients in the Hypertension endotype had higher rates of neurosurgical intervention but shorter stays in contrast to the Alcohol Abuse endotype which had lower rates of neurosurgical intervention but significantly longer hospital stays. Both endotypes had high overall survival rates comparable to the Healthy endotype. Logistic regression models showed that endotypes improved the predictability of survival compared to individual comorbidities alone. This study validates clinical endotypes as an approach to addressing heterogeneity in TBI and demonstrates the potential of this methodology in other complex conditions.
Similar content being viewed by others
Introduction
Traumatic brain injury (TBI) is a leading cause of death and disability worldwide1 with incidence of 0.2–0.5% per year2. It is a complex, heterogeneous condition affecting a broad population, with a wide range of clinical presentations and outcomes. Despite advances toward understanding the pathophysiology of TBI3,4, outcome-improving treatments remain rare. A potential explanation for this lack of progress is the heterogeneity of TBI, which may underlie the lack or heterogeneity of benefit observed in multiple large-scale TBI trials (CRASH, DECRA, RESCUE-ICP)5,6,7. One of the important frontiers of TBI research is to deconvolve the condition into a spectrum of disorders that manifest differently between patients8. Precision medicine aims to address the shortcomings of our current reductionist approach to the treatment of TBI, instead considering clinical characteristics in individual patients. This strategy expects to identify relatively homogenous patient subgroups that share treatment responses. Precision approaches appear well adapted to increasingly data-rich healthcare systems9.
The evolution of high-dimensional longitudinal databases and adaptation of data analytics have improved insight into disease heterogeneity10,11. Specifically, the analysis of completed clinical trials or molecular datasets on acute respiratory distress syndrome12, asthma13,14 and sepsis15 have identified homogenous subgroups with distinct clinical features, transforming clinical practice. These analyses promoted the concept of patient “endotypes”, defined as patient subgroups that share clinical and/or molecular characteristics. Endotype research on TBI patient subgroups aims to identify homogeneous populations that share diagnostic criteria, outcomes, and treatment response16,17. The identification of such relevant endotypes in TBI may enhance our understanding of disease mechanisms, enable targeted therapeutic approaches, and facilitate stratification for randomized controlled trials.
Comorbidity, the coexistence of two or more chronic conditions, is increasingly common secondary to aging populations18 with prevalence between 15 and 54%19. Within TBI heterogeneity, patient comorbidity is a commonly reported prognostic factor20,21,22. However, comorbidities do not typically arise in patients randomly. Often, they follow distinct patterns of co-occurrence due to underlying pathophysiological mechanisms. Latent class analysis (LCA) is a well-validated unsupervised machine learning algorithm that can be used as an objective way to discover co-occurrence patterns of comorbidities. LCA assumes unobserved (or latent) clusters underlying the data and uses a probabilistic approach to determine the set of clusters that best explains the observed data23. Given comorbidity data inputs, these latent clusters represent patient subgroups with similar comorbidity profiles. LCA has been successfully implemented to describe endotypes in ARDS12, sepsis15, diabetes24, and obesity25.
In this study, we hypothesize that TBI patient outcomes vary between comorbidity endotypes. Our primary outcome of interest was survival to discharge, while our secondary outcomes included surgical intervention and duration of hospital stay. We leveraged the prospective Medical Information Mart for Intensive Care III (MIMIC-III) dataset to create a study cohort of adult patients with TBI and used LCA to identify comorbidity endotypes with distinct characteristics. These endotypes may help formalize clinical intuition about prognostication of groups of patients, inform recruitment to clinical trials, and contribute towards the foundation for improved treatment of TBI.
Methods
MIMIC III database
Patient information on clinical characteristics and outcomes were extracted from MIMIC-III, a large, single-centre database including 38,597 adult (≥ 16 years old) patients admitted to critical care units at, Beth Israel Deaconess Medical Center, a large tertiary care hospital in Boston, MA26. TBI patients were identified from MIMIC-III based on International Classification of Diseases, Ninth Edition (ICD-9) codes. Our study cohort was selected based on established ICD-9 diagnostic codes previously shown to capture the diagnosis of TBI and can be found in Supplementary Table 127,28,29. We extracted data on patient sex, date of birth, admission date, discharge date, discharge status, ICD-9 diagnosis codes (for patient comorbidities), ICD-9 procedure codes, and initial Glasgow Coma Score (GCS). In patients with re-admission, only data from their first admission was used as this represents the acute TBI treatment phase of interest.
Data variable construction
The initial total GCS score for each patient was used in defining injury severity as mild (GCS 14–15), moderate (GCS 9–13), or severe (GCS 3–8), as per Brain Trauma Foundation Guidelines30. Age was stratified into three bins: young (16–39), middle-aged (40–69), and old (70 +). We used previously published procedure codes that capture intracranial pressure monitor placement, craniotomy/craniectomy, and ventriculostomy, to derive a logistical variable for neurosurgical intervention28.
Definition of comorbidities
We used the well-established Elixhauser comorbidity index31 to define chronic conditions by implementing an algorithm provided by the authors of the MIMIC-III database32. This algorithm groups ICD-9 codes into 30 chronic conditions to define comorbidity categories. The output is a binary matrix reflecting the presence or absence of a morbidity category by patient ID which was used in subsequent analyses.
Network discovery
We implemented network discovery to identify co-occurrences in morbidities as described previously11. Relative risk (RR) of observing a pair of comorbidities affecting the same patient was calculated using the formula described by Hidalgo et al.33. The RR is calculated from the number of patients affected by the two comorbidities, study population, and prevalence of each comorbidity (Supplementary Fig. 1). We then constructed a network of associated comorbidities by only including associations over the significance threshold of p < 0.05 to visualize the relationships between various conditions. Within this network graph, each node represents an Elixhauser comorbidity with node size corresponding to its prevalence and each edge between nodes representing an association between comorbidities with the color corresponding to the RR of the pair and the width to the number of co-occurrences. The colormap for RRs was capped at 5.0 to prevent outliers from skewing visualization.
Latent class analysis and cluster stability
Latent class analysis (LCA) uses structural equation modeling to identify different subgroups (endotypes) within study populations that share certain characteristics. The R package “poLCA”34 was used to identify patient endotypes based on the derived 30 Elixhauser comorbidities. The LCA algorithm fits mixture models to the input data and, through optimization steps, determines the latent classes (endotypes) based on the best-fitting model. For optimization, we used previously published guidance35,36, including the elbow method and Akaike information criteria (AIC) to find the optimal number of endotypes. This method involves finding the inflection point of the graph and plotting the number of clusters against an evaluation metric (AIC in this case) which can be seen in Supplementary Fig. 2. We used the R package “inflection”37 to objectively determine the inflection point to be at five clusters which represents the optimal number of endotypes.
Due to the stochastic nature of implementing LCA, we took the following additional measures to ensure model validity and cluster stability. We repeated the LCA clustering algorithm on the same data to generate 30 sets of five endotypes. The comorbidity distributions of all 150 total endotypes were then analyzed using latent profile analysis (LPA) which, like LCA, looks for clusters, but in continuous variables rather than categorical. This was done via the R package “mClust”38 and compressed the 150 endotypes into a smaller number of recurring endotypes which are visualized in Supplementary Fig. 3. We found that the most well-fit models (models with the lowest AIC) had statistically identical endotype comorbidity profiles. Therefore, we decided to use a representative model within these low-AIC models for our subsequent analysis as this model was inferred to contain endotypes with the greatest stability. Resultant endotypes were characterized clinically through consensus agreement among study authors.
Quantitative analysis/statistics and visualization
Absolute counts with percentages were reported for categorical data and mean with standard deviation (SD) for continuous data. Statistics were completed using R packages “stats”39 and “multcomp”40 and Python libraries “statsmodels”41 and “SciPy”42. The primary outcome was defined as survival to discharge and the secondary outcomes as hospital length of stay (LOS), and neurosurgical intervention.
The Pearson Correlation analysis was conducted to assess the collinearity between age and the number of comorbidities as these were the most likely variables to demonstrate correlation. Only a weak correlation between age and number of comorbidities was found so we used multivariable logistic regression to analyze the relationship between our primary outcome of survival to discharge and number of comorbidities while accounting for key prognostic factors: age, sex, and GCS.
Pairwise chi-squared (survival and intervention rates) and t-tests (LOS) were used for the initial comparisons of clinical outcomes across endotypes and P-values were adjusted using the Holm-Bonferroni method. Multivariable logistic regression and Analysis of Covariance (ANCOVA) was used for more in-depth comparisons of clinical outcomes across endotypes: Multivariable logistic regression analysis was used to compare survival to discharge (yes/no) and neurological intervention (yes/no) between comorbidity endotypes (using the “Healthy” (HE) endotype as the reference category) adjusting for age, sex, and GCS. The odds ratios (OR) with 95% confidence intervals (CI) were reported where appropriate. ANCOVA was used to compare hospital LOS between comorbidity endotypes while adjusting for age, sex, and GCS with a Tukey post-hoc for comparisons between each pair combination of endotypes.
To obtain additional analysis resolution for our primary outcome of survival, we stratified patients within endotypes into groups based on age and GCS bins and conducted paired t-tests between the strata of the endotypes.
Additionally, we created the following multivariable logistic regression models to isolate and assess the performance of endotypes in predicting our primary outcome of survival: (1) model 1 using only the 30 Elixhauser comorbidity categories, (2) model 2 using only the 5 derived comorbidity endotypes and, (3) model 3 using both the 30 Elixhauser comorbidity categories and 5 derived comorbidity endotypes. These models were compared using ANOVA with F scores and P-values reported. Area under the curve (AUC) was also calculated for each model using the R package “pROC”43. The above analysis was repeated after adding age, GCS, and sex as covariates to models to assess the effect of Elixhauser comorbidities and endotypes on overall survival prediction performance. Logistic regression models and AUC calculations were also completed for a base model that included only age, GCS, and sex as well as models using only these individual variables. Cases with missing data were excluded from statistical analyses. Figures were generated via the R library “ggplot2” and the Python libraries “matplotlib” and “networkX”.
Results
Comorbidity characterization
We included 2,629 patients in our analysis. The mean age was 59.0, 61.9% were male, average GCS was 10.4, the mean number of comorbidities was 2.0, and the rate of survival to discharge was 36.7%. The demographics of our study cohort are summarized in Table 1.
The relationship between age, number of comorbidities, and survival can be visualized in Fig. 1. Age and comorbidity count was only weakly associated (r = 0.20, p < 0.001). We then tested the association of the comorbidity count with the survival to discharge after accounting for age and GCS score in a multivariable logistic regression model and found that the number of comorbidities was negatively associated with survival (OR for survival per one increase in comorbidity count = 0.984, CI: 0.975–0.994).
We next investigated the pattern of comorbidities, specifically their associations in the entire cohort to test our hypothesis that many comorbidities co-occur. These relationships between the Elixhauser comorbidity groups are visualized by the network graph in Fig. 2. This network discovery of the data revealed intuitive subgroups: congestive heart failure (CHF) with arrhythmia (RR = 2.76, CI: 2.73–2.79), drug abuse with alcohol abuse (RR = 3.02, CI: 2.95–3.11), and renal failure with complicated hypertension (RR = 16.27, CI: 15.96–16.58).
Clinical characterization of comorbidity endotypes
We next analyzed the comorbidity data for the presence of endotypes using LCA. After 30 iterations, five stable clusters of patient endotypes emerged, detailed in Fig. 3 by their comorbidity probability distribution. Endotype 1 was labeled Heart Failure and Arrhythmia (HFA), characterized by cardiac comorbidities. Endotype 2 featured few comorbidities and was labelled Healthy (HE) and used as the reference endotype for subsequent analyses. Endotype 3 is characterized mainly by high rates of renal failure with complicated hypertension and was labelled Renal Failure with Hypertension (RFH). Endotype 4 was labelled Alcohol Abuse (AA), featuring high rates of alcohol abuse and relatively low rates of other comorbidities. Endotype 5 had high rates of hypertension and relatively low rates of other comorbidities, labelled Hypertension (HTN).
The distribution of TBI injury severity categories was relatively similar although HE and AA endotypes contained slightly larger proportions of severe TBI at 46.9% and 39.4%, respectively (Table 2). The HFA and RFH endotypes had nearly identical age distributions of mostly older individuals in addition to a small subpopulation of middle-aged individuals (17.9% middle-aged, 82.4% old vs 19.7% middle-aged, 80.9% old, respectively). The HTN endotype was composed mainly of older adults (68.6%) with a moderate proportion of middle-aged individuals (28.3%) and a small subpopulation of young individuals (3.7%). The AA group was primarily composed of middle-aged individuals (73.7%). HE endotype was composed of mainly young (47.6%) and middle-aged (40.0%) individuals. There were minor sex differences among a few endotypes, contributed to mainly by the young age segment which consistently had a higher proportion of male patients; middle-aged and old age groups across endotypes had comparable sex distributions (Supplementary Table 2). For comorbidity count, HE had the least number of comorbidities (0.8 ± 0.8) followed by HTN (2.5 ± 1.2) while the other comorbidities had higher averages within a similar range (AA: 4.0 ± 1.2, HFA: 4.5 ± 1.6, RFH: 5.5 ± 2.1).
Each endotype had distinct rates of survival to discharge (Table 3). HE had the highest survival to discharge, followed by AA, HTN, RFH, and HFA. Survival differences between all endotypes were statistically significant (HE 78.5%, AA 70.1%, HTN 51.4%, RFH 40.4%, HFA 27.8%, p < 0.05 for all comparisons). RFH had the lowest rate of neurosurgical intervention (13.2%) which was statistically lower than HFA (27.3%) and HTN (25.7%) which had the highest intervention rates (RFH vs HFA p = 0.033, RFH vs HTN p = 0.022). All other comparisons of intervention rates were non-significant between endotypes). For LOS, endotypes were split into long (HFA: 10.2 ± 7.2 days, RFH: 8.7 ± 6.0 days, AA: 10.6 ± 7.5 days,) and short (HE: 7.3 ± 6.9 days, HTN: 7.7 ± 6.4 days) LOS groups in where p < 0.05 for comparisons between endotypes across groups and p > 0.05 for comparisons of endotypes within a group.
Survival to discharge
Multivariable logistic regression models accounting for age, sex, and GCS showed that the endotypes HE, AA, and HTN had the highest survival rates (Table 4). Using the HE endotype as reference, AA (OR = 0.999, CI: 0.935–1.066) and HTN (OR = 0.962, CI: 0.919–1.007) had statistically similar survival rates compared to HE. HFA had the lowest survival rate (OR with respect to HE = 0.803, CI: 0.747–0.862) which was statistically different from all other endotypes. RFH had lower survival than HE (OR with respect to HE = 0.889, CI: 0.821–0.964) but higher than HFA (OR with respect to HFA = 1.108, CI: 1.011–1.214). Survival rates when accounting for age, sex, and GCS are as such: HE, AA, HTN > RFH > HFA. In contrast to the initial analyses, comparison of survival rates between HE, AA, and HTN becomes statistically insignificant when including age (OR per increase in age = 0.991, CI: 0.991–0.992) and GCS (OR per increase in GCS score = 1.028, CI: 1.024–1.031).
Additional investigation of these three endotypes based on stratification by age and GCS revealed that HE had higher survival in the middle-aged group compared to AA (80.3% vs 69.8%, p = 0.010) and was similar in all age groups in comparison to HTN (Table 5). However, when further sub-stratifying age groups by GCS, some substrata comparisons become significant: HE has a higher survival rate than AA for the mild GCS group in both young (97.5% vs 83.3%, p = 0.043) and middle-aged groups (94.1% vs 70.0%, p < 0.001). Although survival was not different across age strata between HE and HTN, sub-stratification by GCS revealed that HE had better survival for the mild (94.1% vs 87.2%, p = 0.043) and severe (66.7% vs 52.6%, p = 0.043) GCS groups within the middle-aged group.
Neurosurgical intervention
In comparison to HE, HFA (OR = 1.146, CI: 1.067–1.231) and HTN (OR = 1.123, CI: 1.074–1.175) had the highest rate of neurosurgical intervention while RFH (OR = 1.005, CI: 0.928–1.089) and AA (OR = 1.037, CI: 0.972–1.107) had statistically similar rates of intervention after adjusting for age, sex, and GCS (Table 6). Overall, results from the multivariable logistic regression divide the endotypes into the low (HE, RFH, AA) and high (HFA, HTN) intervention groups where rates are statistically different compared to endotypes of the other group for comparisons) while being statistically similar to endotypes of the same group.
Length of stay
Using ANCOVA to adjust for the covariates of age, sex, and GCS, HE had shorter LOS than HFA (difference = 4.6 ± 0.9 days, p < 0.001), RFH (difference = 3.3 ± 1.1 days, p = 0.015), and AA (different = 4.8 ± 0.9 days, p < 0.001) but had similar LOS compared to HTN (difference = 1.2 ± 0.6 days, p = 0.214) (Table 7). HFA and AA had almost identical LOS (difference = 0.2 ± 1.2 days, p = 1.000). LOS for the RFH endotype was statistically comparable to every other endotype (p > 0.05 for all comparisons) except HE. Overall, HE and HTN had the shortest LOS while HFA and AA had the longest LOS of the endotypes after accounting for age, sex, and GCS.
Predictive value of comorbidities and endotypes for survival
Logistic regression using all 30 Elixhauser comorbidity categories to predict survival to discharge accounts for ~ 15.3% of the variation in outcome (F(30, 2598) = 15.59, R2 = 0.153, p < 0.001) and had an AUC of 0.73. A simpler logistic regression using the five derived comorbidity endotypes accounted for 11.9% of the variation in survival (F(4, 2624) = 88.61, R2 = 0.119, p < 0.001) with an AUC of 0.69. Combining the base 30 Elixhauser categories with the endotypes in a full logistic regression model explains 17.0% of the variation in survival (F(34, 2594) = 15.57, R2 = 0.170, p < 0.001) and has an AUC of 0.74. The decrease in model error for predicting survival to discharge seen between the base logistic regression (including only the comorbidities) and the full model (including both comorbidities and endotypes) is statistically significant (F(4, 2594) = 13.23, p < 0.001).
With the inclusion of age, GCS, and sex into the logistic model using the 30 Elixhauser comorbidities, the model explains ~ 30.5% of the variation in survival (F(33, 2548) = 35.28, R2 = 0.305, p < 0.001) and had an AUC of 0.83. The addition of the five derived endotypes into this model results in explanation of ~ 31.7% of the variation in survival (F(37, 2544) = 31.91, R2 = 0.317, p < 0.001) with an AUC of 0.84. The decrease in model error in predicting survival after adding endotypes is statistically significant (F(4, 2544) = 3.13, p = 0.014). A model with age, GCS, sex, and comorbidity endotypes explains ~ 28.7% of survival variation (F(7, 2574) = 142.9, R2 = 0.287, p < 0.001) with an AUC of 0.82.For reference, a logistic model with only age, GCS, and sex accounts for 26.8% of the variation in survival with an AUC of 0.80. A model with only age accounts for 19.7% of survival variation with an AUC of 0.77; a model with only GCS accounts for 2.7% of survival variation with an AUC of 0.61; a model with only sex accounts for 0.9% of survival variation with an AUC of 0.55.
Discussion
Endotypes in TBI and other conditions
Our initial analyses are consistent with previous literature and clinical experience—that comorbidities are associated with clinical outcomes and that comorbidities tend to occur in patterns44,45.
In characterizing the relationship between comorbidities and clinical outcomes, we have identified five patient endotypes using LCA, based solely on the presence of medical conditions. These represent the most observed clinical patterns within the TBI population of the MIMIC III dataset. The notion of patient endotypes is a validated approach to precision medicine and has been used to elucidate heterogeneity in many other conditions such as sepsis and asthma14,16,46. In TBI, Åkerlund et al. suggested six endotypes in TBI patients based on GCS, body temperature, and lab values that added predictive value for in-hospital mortality47. This is the first study to consider endotypes in TBI patients utilizing comorbidities.
Our previous work11 investigated comorbidity endotypes within all of MIMIC III, identifying six overall endotypes while this current study focuses on the TBI population and produces similar but distinct endotypes. Both studies’ endotypes included hypertension, cardiac comorbidities, renal failure, and the lack of comorbidities (i.e., healthy). However, there were also major differences. While the AA endotype contained only alcohol use disorder as a comorbidity, the corresponding endotype in our previous analysis was “Hepatic addiction”, which, in addition to alcohol abuse, had significant rates of liver failure and coagulopathy. As well, the current analysis did not find any endotype that corresponded to the “Cardiopulmonary” endotype in our previous study, characterized by high rates of chronic pulmonary and cardiac comorbidities. Although similarities exist, these results suggest that distinct TBI comorbidity endotypes exist that differ from those of the general ICU population.
Viewing TBI clinical outcomes by endotypes
Analyzing comorbidities via endotypes rather than individually shifts emphasis onto the fact that individual comorbidities don't exist in isolation and allows for a more comprehensive understanding of how a group of conditions affects the outcomes of acute injuries such as TBI. Additionally, it allows us to group patient presentations into clusters based on common pathophysiological patterns and clinical considerations which may facilitate decision-making.
Outcomes differences between medically complex endotypes
Heart failure comorbid with arrhythmia (HFA) and renal failure comorbid with hypertension (RFH) appear to be two common patterns of comorbidity in medically complex TBI patients. Overall, HFA has higher rates of neurosurgical intervention and mortality but similar LOS.
The association between renal failure and cardiac comorbidities is supported by many studies48,49,50,51 and may explain notable rates of congestive heart failure (CHF) and arrhythmia in the RFH comorbidity distribution. However, it is interesting that despite the notable presence of cardiac comorbidities in RFH, the pure cardiac endotype HFA, which has fewer comorbidities overall, still exhibited poorer outcomes. A large multicenter study by Shibahashi et al.52 found that both CHF and chronic kidney disease (CKD) increased the odds of in-hospital mortality for TBI patients. The odds ratio estimate for CKD (2.76, CI: 1.96–3.89) was notably higher than CHF (1.82, CI: 1.31–2.51), contrasting our results. However, Shibahashi et al. compared individual comorbidities while our analysis compares endotypes that contain multiple comorbidities at different rates. A possible explanation for the difference in our findings could be the notable prevalence of arrhythmias in the HFA and RFH endotypes. Patients with arrhythmias are frequently administered anticoagulant and/or antiplatelet medications, which may increase the need for intervention, risk of intervention, and mortality rates53,54. Hence, HFA having a higher rate of intervention and mortality may be contributed to by its higher rate of arrhythmia compared to the RFH endotype. Overall, our results suggest that the TBI patients of the cardiac-centric HFA endotype carry a poorer prognosis than the renal-centric RFH endotype. Both endotypes represent cohorts of older, clinically complex patients that fare significantly worse than the other endotypes even after controlling for age or injury severity. Future trials of TBI interventions may benefit from treating patients that fit into these endotypes as their own subgroups.
Alcohol abuse and length of stay
As alcohol abuse and hypertension are common in TBI patients22, we were interested in how the AA and HTN endotypes differed from HE. There is a robust relationship between alcohol use and TBI55 that is consistent with our finding that AA is a major TBI endotype. Overall, HE and AA had similar rates of survival and neurosurgical intervention but notably higher LOS. There have been many factors linking alcohol use disorder and increased LOS such as alcohol withdrawal including seizure56,57,58, increased risk of hospital complications and infection, particularly of pneumonia59, and pre-injury cerebral atrophy secondary to neurotoxicity60,61. Taken together, a slow and potentially turbulent recovery may explain the increased LOS in AA. Overall, we see that the AA endotype identified via LCA is consistent with previous literature and aligns with clinical observations.
Hypertension and outcomes of intervention and survival
We found that the HTN endotype has a higher rate of intervention compared to HE and a lower rate of survival, isolated to the middle-aged strata, particularly among mild and severe injury severities. Given that chronic hypertension is known to cause and accelerate cerebrovascular damage62, higher rates of adverse outcomes after TBI in the HTN endotype could be explained by a less compliant, and hence less resilient, remodelled cerebrovascular system63,64. The same mild injuries to HTN patients may result in greater damage and hence warrant a more aggressive treatment (e.g., ICP monitoring, craniotomy). The observation that differences in survival were only significant for the middle-aged group and not for the old age group may be due to the effect of age overshadowing the weaker effect of hypertension in this age group. While pre-hospital hypertension has been linked to increased TBI mortality65,66, a systematic review did not find chronic hypertension to increase in-hospital mortality in TBI patients22. These differences may result from other studies using the presence/absence of hypertension for their effect analysis in patients that potentially have other pre-existing comorbidities, contrasting our analysis that directly compares healthy patients with patients primarily with hypertension (without other comorbidities).
Endotypes for precision medicine
LCA endotyping assesses the patient more holistically, accounting for both the presence and absence of diseases, to make comparisons between groups of patients that are similar. Linear regression attempts to describe the net effect of a single condition (e.g., hypertension) across a potentially heterogeneous population and struggles when input variables are correlated with each other, as is the case with comorbidities67,68. It also struggles to describe outcomes that cannot be explained by a linear combination of the model inputs whereas LCA endotypes have an aspect of non-linearity inherent to the complex relationships between input variables that they capture69,70. Our observed improvement in a logistic regression model's ability to predict survival by incorporating endotypes, in addition to individual comorbidities, supports the capturing of such additional non-linear effects through LCA. A large proportion of the endotypes’ predictive power for survival likely comes from their capture of patient age (e.g., patients with hypertension are likely to be older than those with alcohol abuse). Such is suggested by the large amount of variation in survival explained by age alone (19.7%) combined with the minor improvement in performance seen by adding endotypes as a variable to the base model of only age, GCS, and sex (R2: 26.8% to 28.0%; AUC: 0.80 to 0.82). Interestingly, our comorbidity endotypes have more predictive value as an isolated variable for survival than GCS (R2: 11.9%vs 2.7%, AUC: 0.69 vs 0.61) suggesting that patient-inherent factors may be more important to TBI outcomes than initial injury severity as measured by GCS.
As our endotypes involved only broad comorbidity variables, the overall improved predictability offered by endotypes in our analysis was relatively modest (~ 2% in model fit by R2, ~ 2% in model AUC). This improvement may be larger for complex endotypes that integrate continuous variables, especially those exhibiting a parabolic association with outcomes such as blood pressure, hemoglobin, and temperature. Therefore, endotype analysis offers a different perspective on the data that may be better suited to precision medicine and prognostication than regression with individual variables.
Limitations
LCA is a method that attempts to deduce a possible explanation given the observed data points. A common concern with LCA is that results invite interpretation despite the lack of a true reasonable interpretation in reality23,71. Therefore, it relies on the researcher’s judgement to determine if the latent classes represent meaningful entities which introduce a degree of interpretive bias into the analysis. Through our cluster selection process, we have attempted to ensure that the final set of clusters represented the simplest clinical explanation, but we cannot exclude the existence of a potentially more complex set of endotypes that better explain the variance in TBI patient comorbidities.
While the LCA endotypes help explain the variation of comorbidities across patients, there also exists more layers of variation within each identified endotype, as not all patients within endotypes are identical. We try to account for this in the analysis of our primary outcome of survival with the use of stratification by GCS and age to capture important variations common to all TBI patients. However, it could be argued that GCS and age have differential effects on outcomes depending on endotype and that there may be additional condition-specific factors to consider for each endotype. For example, creatinine measures may be relevant in RFH to account for the extent of renal failure while total cumulative alcohol intake could be used to stratify AA patients into different exposure levels. Unfortunately, due to limited granularity, and potentially limited sample size in substrata, we did not investigate this further. This approach of granular analysis could be a strategy for future research toward increasingly personalized medicine using big data.
Another limitation of this study is the size of the MIMIC III dataset. Although it is a large and comprehensive dataset, there is limited granularity that may not capture the full complexity of patient health or outcomes. We do not have clinical justifications behind decisions, such as neurosurgical intervention, which requires the context of individual patient risk assessment. The data is limited to a single center, which may not be generalizable to the overall population and may have policies and personnel preferences that introduce artifacts in variable coding. In particular, there may be inaccuracies and staff biases in the ICD9 codes that we have used for the implementation of Elixhauser comorbidities. The same analysis using another database may yield a set of comorbidity endotypes that differ in significant ways. Finally, limitations inherent to retrospective designs apply as well. Therefore, the next steps for our research will involve the replication of our methods on different TBI databases, and ultimately prospective studies, to assess the generalizability of our findings.
Conclusion
TBI, like many other conditions, is complex with varied clinical and biological presentations that contribute to their heterogeneity. In this study, we demonstrate the discovery of TBI comorbidity endotypes with distinct clinical outcomes. Our results suggest that viewing clinical variables, such as the presence or absence of a disease, in combinations is a promising way to address and better understand the heterogeneity observed in TBI and other complex conditions. This study serves to expand the applicability of the growing concept of clinical endotypes by validating the approach in a highly prevalent condition such as TBI. By utilizing approaches such as LCA that can extract meaningful combinations from large volumes of data, we can move towards precision medicine in an era that is becoming increasingly data-rich.
Data availability
All code used for analysis and visualization of our data is available on GitHub: https://github.com/SteveHQiu/TBIClustering. Data used in our analysis were extracted from the Medical Information Mart for Intensive Care III (MIMIC-III) dataset which is a freely available database: https://physionet.org/content/mimiciii/1.4/.
References
Faul, M. & Coronado, V. Chapter 1 - Epidemiology of traumatic brain injury. in Handbook of Clinical Neurology (eds. Grafman, J. & Salazar, A. M.) vol. 127 3–13 (Elsevier, 2015).
Roozenbeek, B., Maas, A. I. R. & Menon, D. K. Changing patterns in the epidemiology of traumatic brain injury. Nat. Rev. Neurol. 9, 231–236 (2013).
de Noriega, F. R. et al. A swine model of intracellular cerebral edema: Cerebral physiology and intracranial compliance. J. Clin. Neurosci. 58, 192–199 (2018).
Rosenthal, G. et al. Incorporating a parenchymal thermal diffusion cerebral blood flow probe in bedside assessment of cerebral autoregulation and vasoreactivity in patients with severe traumatic brain injury: Clinical article. J. Neurosurg. 114, 62–70 (2011).
Cooper, D. J. et al. Decompressive craniectomy in diffuse traumatic brain injury. New Engl. J. Med. 364, 1493–1502 (2011).
Hutchinson, P. J. et al. Trial of decompressive craniectomy for traumatic intracranial hypertension. New Engl. J. Med. 375, 1119–1130 (2016).
The CRASH trial protocol (Corticosteroid randomisation after significant head injury) [ISRCTN74459797]. BMC Emerg Med 1, 1 (2001).
Agoston, D. V. & Langford, D. Big data in traumatic brain injury; promise and challenges. Concussion 2, CNC44 (2017).
Wu, P.-Y. et al. Omic and electronic health record big data analytics for precision medicine. IEEE Trans. Biomed. Eng. 64, 263–273 (2017).
Zador, Z., Landry, A., Balas, M., Marshall, J. C. & Cusimano, M. D. Data driven analysis reveals shared transcriptome response, immune cell composition, and distinct mortality rates across differing etiologies of critical illness. Crit. Care Med. 48, 338–343 (2020).
Zador, Z., Landry, A., Cusimano, M. D. & Geifman, N. Multimorbidity states associated with higher mortality rates in organ dysfunction and sepsis: A data-driven analysis in critical care. Crit. Care 23, 247 (2019).
Calfee, C. S. et al. Latent class analysis of ARDS subphenotypes: Analysis of data from two randomized controlled trials. Lancet Respir. Med. 2, 611–620 (2014).
Anderson, G. P. Endotyping asthma: New insights into key pathogenic mechanisms in a complex, heterogeneous disease. Lancet 372, 1107–1119 (2008).
Svenningsen, S. & Nair, P. asthma endotypes and an overview of targeted therapy for asthma. Front. Med. 4, 158 (2017).
Scicluna, B. P. et al. Classification of patients with sepsis according to blood genomic endotype: A prospective cohort study. Lancet Respir. Med 5, 816–826 (2017).
Azad, T. D., Shah, P. P., Kim, H. B. & Stevens, R. D. Endotypes and the path to precision in moderate and severe traumatic brain injury. Neurocrit. Care 37, 259–266 (2022).
Maslove, D. M., Lamontagne, F., Marshall, J. C. & Heyland, D. K. A path to precision in the ICU. Crit. Care 21, 79 (2017).
Roberts, K. C., Rao, D. P., Bennett, T. L., Loukine, L. & Jayaraman, G. C. Prevalence and patterns of chronic disease multimorbidity and associated determinants in Canada. Health Promot. Chronic Dis. Prev. Can. 35, 87–94 (2015).
Wolff, J. L., Starfield, B. & Anderson, G. Prevalence, expenditures, and complications of multiple chronic conditions in the elderly. Arch. Intern. Med. 162, 2269–2276 (2002).
Fu, T. S., Jing, R., McFaull, S. R. & Cusimano, M. D. Recent trends in hospitalization and in-hospital mortality associated with traumatic brain injury in Canada: A nationwide, population-based study. J. Trauma Acute Care Surg. 79, 449 (2015).
Chan, V., Mollayeva, T., Ottenbacher, K. J. & Colantonio, A. Clinical profile and comorbidity of traumatic brain injury among younger and older men and women: A brief research notes. BMC Res. Notes 10, 371 (2017).
Xiong, C. et al. Comorbidity in adults with traumatic brain injury and all-cause mortality: A systematic review. BMJ Open 9, e029072 (2019).
McCutcheon, A. L. Latent Class Analysis (SAGE, 1987).
Wu, Y. et al. Applying latent class analysis to risk stratification of incident diabetes among Chinese adults. Diabetes Res. Clin. Pract. 174, 108742 (2021).
Andreacchi, A. T. et al. Clustering of obesity-related characteristics: A latent class analysis from the Canadian Longitudinal Study on Aging. Prev. Med. 153, 106739 (2021).
Johnson, A. E. W. et al. MIMIC-III, a freely accessible critical care database. Sci. Data 3, 160035 (2016).
De Crescenzo, L. A., Gabella, B. A. & Johnson, J. Interrupted time series design to evaluate ICD-9-CM to ICD-10-CM coding changes on trends in Colorado emergency department visits related to traumatic brain injury. Injury Epidemiol. 8, 15 (2021).
Carroll, C. P., Cochran, J. A., Guse, C. E. & Wang, M. C. Are we underestimating the burden of traumatic brain injury? Surveillance of severe traumatic brain injury using centers for disease control International classification of disease, ninth revision, clinical modification, traumatic brain injury codes. Neurosurgery 71, 1064–1070 (2012).
WISH Injury-Related Traumatic Brain Injury ICD-9-CM Codes. Wisconsin Department of Health Services https://www.dhs.wisconsin.gov/wish/injury/tbi-codes.htm (2019).
Carney, N. et al. Guidelines for the Management of Severe Traumatic Brain Injury, Fourth Edition. Neurosurgery 80, 6–15 (2017).
Elixhauser, A., Steiner, C., Harris, D. R. & Coffey, R. M. Comorbidity measures for use with administrative data. Med. Care 36, 8 (1998).
MIMIC Code Repository. (2023).
Hidalgo, C. A., Blumm, N., Barabási, A.-L. & Christakis, N. A. A dynamic network approach for the study of human phenotypes. PLOS Comput. Biol. 5, e1000353 (2009).
Linzer, D. A. & Lewis, J. B. poLCA: An R package for polytomous variable latent class analysis. J. Stat. Softw. 42, 1–29 (2011).
Salvador, S. & Chan, P. Determining the number of clusters/segments in hierarchical clustering/segmentation algorithms. in 16th IEEE International Conference on Tools with Artificial Intelligence 576–584 (2004). doi:https://doi.org/10.1109/ICTAI.2004.50.
Hu, X. & Xu, L. Investigation on several model selection criteria for determining the number of cluster. Neural Inf. Process. 4, 1–10 (2004).
Christopoulos, D. T. inflection: Finds the Inflection Point of a Curve. (2022).
Scrucca, L., Fop, M., Murphy, T. B. & Raftery, A. E. mclust 5: Clustering, classification and density estimation using gaussian finite mixture models. R J. 8, 289–317 (2016).
Ihaka, R. & Gentleman, R. R: A language for data analysis and graphics. J. Comput. Graph. Stat. 5, 299–314 (1996).
Hothorn, T. et al. Multcomp: Simultaneous Inference in General Parametric Models. (2023).
Perktold, J. et al. statsmodels/statsmodels: Release 0.14.0. (2023) https://doi.org/10.5281/ZENODO.593847.
Virtanen, P. et al. SciPy 1.0: Fundamental algorithms for scientific computing in Python. Nat Methods 17, 261–272 (2020).
Robin, X. et al. pROC: Display and Analyze ROC Curves. (2023).
Cramer, A. O. J., Waldorp, L. J., van der Maas, H. L. J. & Borsboom, D. Comorbidity: A network perspective. Behav. Brain Sci. 33, 137–150 (2010).
Folino, F., Pizzuti, C. & Ventura, M. A. Comorbidity network approach to predict disease risk. In Information Technology in Bio- and Medical Informatics 102–109 (Springer, 2010). https://doi.org/10.1007/978-3-642-15020-3_10.
Leligdowicz, A. & Matthay, M. A. Heterogeneity in sepsis: New biological evidence with clinical applications. Crit Care 23, 80 (2019).
Åkerlund, C. A. I. et al. Clustering identifies endotypes of traumatic brain injury in an intensive care cohort: A center-TBI study. Crit. Care 26, 228 (2022).
Akhtar, Z. et al. Arrhythmias in chronic kidney disease. Eur. Cardiol. 17, e05 (2022).
Schefold, J. C., Filippatos, G., Hasenfuss, G., Anker, S. D. & von Haehling, S. Heart failure and kidney dysfunction: Epidemiology, mechanisms and management. Nat. Rev. Nephrol. 12, 610–623 (2016).
Muntner, P., He, J., Hamm, L., Loria, C. & Whelton, P. K. Renal insufficiency and subsequent death resulting from cardiovascular disease in the United States. J. Am. Soc. Nephrol. 13, 745–753 (2002).
Silverberg, D., Wexler, D., Blum, M., Schwartz, D. & Iaina, A. The association between congestive heart failure and chronic renal disease. Curr. Opin. Nephrol. Hypertens. 13, 163–170 (2004).
Shibahashi, K., Sugiyama, K., Okura, Y., Hoda, H. & Hamabe, Y. Multicenter retrospective cohort study of “talk and die” after traumatic brain injury. World Neurosurg. 107, 82–86 (2017).
Tykocki, T. & Guzek, K. Anticoagulation therapy in traumatic brain injury. World Neurosurg. 89, 497–504 (2016).
Peck, K. A. et al. The impact of preinjury anticoagulants and prescription antiplatelet agents on outcomes in older patients with traumatic brain injury. J. Trauma Acute Care Surg. 76, 431 (2014).
Weil, Z. M., Corrigan, J. D. & Karelina, K. Alcohol use disorder and traumatic brain injury. Alcohol Res. 39, 171–180 (2018).
Grady, E., Hayward, R. D. & Edhayan, E. Associations of alcohol and drug misuse with hospital outcomes in traumatic injury patients. Substance Use Misuse 55, 622–627 (2020).
De Guise, E. et al. Early outcome in patients with traumatic brain injury, pre-injury alcohol abuse and intoxication at time of injury. Brain Injury 23, 853–865 (2009).
Sabhesan, S. & Natarajan, M. Alcohol abuse and recovery after head injury. Indian J. Psychiatry 29, 143–148 (1987).
Trevejo-Nunez, G., Kolls, J. K. & de Wit, M. Alcohol use as a risk factor in infections and healing. Alcohol Res. 37, 177–184 (2015).
Monnig, M. A., Tonigan, J. S., Yeo, R. A., Thoma, R. J. & McCrady, B. S. White matter volume in alcohol use disorders: A meta-analysis. Addict. Biol. 18, 581–592 (2013).
Büttner, A. & Weis, S. Central nervous system alterations in alcohol abuse. In Forensic Pathology Reviews (ed. Tsokos, M.) 69–89 (Humana Press, 2008). https://doi.org/10.1007/978-1-59745-110-9_5.
Gąsecki, D., Kwarciany, M., Nyka, W. & Narkiewicz, K. Hypertension, brain damage and cognitive decline. Curr. Hypertens. Rep. 15, 547–558 (2013).
Agabiti-Rosei, E., Rizzoni, D. & Cunha, P. Pathophysiology of brain damage in hypertension: Small vessel disease. In Hypertension and Brain Damage (ed. Coca, A.) 47–60 (Springer International Publishing, 2016). https://doi.org/10.1007/978-3-319-32074-8_5.
Laurent, S., Gasecki, D., Boutouyrie, P. & Narkiewicz, K. Pathophysiology of subclinical brain damage in hypertension: Large artery disease. In Hypertension and Brain Damage (ed. Coca, A.) 1–74 (Springer International Publishing, 2016). https://doi.org/10.1007/978-3-319-32074-8_6.
Barmparas, G. et al. Prehospital hypertension is predictive of traumatic brain injury and is associated with higher mortality. J. Trauma Acute Care Surg. 77, 592 (2014).
Sellmann, T. et al. The Impact of Arterial Hypertension on Polytrauma and Traumatic Brain Injury. Dtsch. Arztebl. Int. 109, 849–856 (2012).
Lesaffre, E. & Marx, B. D. Collinearity in generalized linear regression. Commun. Stat. Theory Methods 22, 1933–1952 (1993).
Midi, H., Sarkar, S. K. & Rana, S. Collinearity diagnostics of binary logistic regression model. J. Interdiscipl. Math. 13, 253–267 (2010).
Hagenaars, J. A. & Halman, L. C. Searching for ideal types: The potentialities of latent class analysis. Eur. Sociol. Rev. 5, 81–96 (1989).
Hickendorff, M., Edelsbrunner, P. A., McMullen, J., Schneider, M. & Trezise, K. Informative tools for characterizing individual differences in learning: Latent class, latent profile, and latent transition analysis. Learn. Individ. Differ. 66, 4–15 (2018).
Weller, B. E., Bowen, N. K. & Faubert, S. J. Latent class analysis: A guide to best practice. J. Black Psychol. 46, 287–311 (2020).
Acknowledgements
We would like to thank Dr. Frank Rudzicz (University of Toronto) for his feedback and comments on the manuscript.
Author information
Authors and Affiliations
Contributions
Study conception and design: H. Q., Z. Z.; data collection: H. Q.; data analysis: H. Q.; interpretation of results: H. Q., Z. Z., M. L., F. F., T. D., S. S.; draft manuscript preparation: H. Q., Z. Z., M. L., F. F., T. D., S. S.; all authors reviewed and approved the final version of the manuscript.
Corresponding author
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher's note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary Information
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Qiu, H., Zador, Z., Lannon, M. et al. Identification of clinically relevant patient endotypes in traumatic brain injury using latent class analysis. Sci Rep 14, 1294 (2024). https://doi.org/10.1038/s41598-024-51474-0
Received:
Accepted:
Published:
DOI: https://doi.org/10.1038/s41598-024-51474-0
- Springer Nature Limited