Abstract
Background
Sepsis from infection is a global health priority, and clinical trials have failed to deliver effective therapeutic interventions. To address the complicating heterogeneity in sepsis pathobiology and improve outcomes, promising precision medicine approaches are helping identify disease endotypes; however, they require a more complete definition of sepsis subgroups.
Methods
Here, we use RNA sequencing from peripheral blood to interrogate the host response to sepsis from participants in a global observational study carried out in West Africa, Southeast Asia, and North America (N = 494).
Results
We identify four sepsis subtypes differentiated by 28-day mortality. A low mortality immunocompetent group is specified by features that describe the adaptive immune system. In contrast, the three high mortality groups show elevated clinical severity consistent with multiple organ dysfunction. The immunosuppressed group members show signs of a dysfunctional immune response, the acute-inflammation group is set apart by molecular features of the innate immune response, while the immunometabolic group is characterized by metabolic pathways such as heme biosynthesis.
Conclusions
Our analysis reveals details of molecular endotypes in sepsis that support immunotherapeutic interventions and identifies biomarkers that predict outcomes in these groups.
Plain language summary
Sepsis is a life-threatening multi-organ failure caused by the body’s immune response to infection. Clinical symptoms of sepsis vary from one person to another likely due to differences in host factors, infecting pathogen, and comorbidities. This difference in clinical symptoms may contribute to the lack of effective interventions for sepsis. Therefore, approaches tailored to targeting groups of patients who present similarly are of great interest. This study analysed a large group of sepsis patients with diverse symptoms using laboratory markers and mathematical analysis. We report four patient groups that differ by risk of death and immune response profile. Targeting these defined groups with tailored interventions presents an exciting opportunity to improve the health outcomes of patients with sepsis.
Introduction
Sepsis is life-threatening organ dysfunction due to a dysregulated host response to infection1. The signs and symptoms of sepsis are highly variable and typically nonspecific, including aberrations in vital signs like tachycardia and tachypnea alongside indications of organ dysfunction, such as altered mental status, hypotension, and renal dysfunction, potentially leading to shock and death. The infecting pathogen, pathogen load, site of infection, comorbidities, and immunological response by the patient contribute to the clinical presentation and course of disease resulting in a complex heterogeneous syndrome. In addition, the model for host response to sepsis has evolved from a simplified linear hyperinflammatory phase followed by a compensatory anti-inflammatory phase to a more complex paradigm where pro- and anti-inflammatory mechanisms are acting in concert2. The failure to develop effective therapeutics and the limited success in developing diagnostic or prognostic tests are often attributed to this clinical and immunological heterogeneity3,4.
To overcome the problem of heterogeneity in sepsis there is a growing interest in leveraging biologically related host response phenotypes or endotypes to guide precision medicine and companion biomarker discovery for diagnosis and clinical management. Notably, the human host response to infection is highly sensitive and specific, enabling the use of protein and gene expression measurements in peripheral blood for characterization of patient groups based on related underlying sepsis pathophysiology5,6,7,8. Host gene expression analysis from peripheral blood leukocytes can also diagnose pathogen class or identify subjects that respond differentially to corticosteroid treatment9,10,11. In some cases, these findings have been translated to FDA-approved assays for infectious sepsis diagnosis and pathogenesis12,13.
Despite these advances, the performance of host-based measures can be population-specific and impacted by the epidemiology of illness14. Sepsis is a global health priority accounting for an estimated 11 million deaths in 2017. There is an urgent need to identify, characterize, and generalize sepsis endotypes in diverse populations to facilitate new and better prognostic and therapeutic solutions, especially in low- and middle-income countries that carry the greatest burden for sepsis15. Toward this goal, we analyze host gene expression in a large prospective multi-site international sepsis cohort in West Africa, Southeast Asia, and the United States as part of the Austere environments Consortium for Enhanced Sepsis Outcomes (ACESO)10,11. We employ soft-clustering decomposition that allows for the visualization and identification of both discrete and overlapping clusters within high-dimensional gene expression data, thereby accommodating the syndromic nature of sepsis16. As a result, we can report the identification of molecular endotypes from the combined global cohort that predict sepsis mortality. Interpretation of the new gene expression features described here from understudied populations suggests that immune dysregulation is a key component of the biology linked to sepsis heterogeneity and outcomes.
Methods
Study sites and subjects
Blair P.W. et al.17 provide a detailed description of the entire study cohort. Five hundred and six patients across three sites had specimens available for this study. Study protocols were approved by the Naval Medical Research Command (NMRC) Institutional Review Board (IRB) (Cambodia sepsis study # NMRC.2013.0019; Ghana sepsis study # NMRC.2016.0004-GHA; Duke sepsis study, Duke # PRO00054849) in compliance with all applicable Federal regulations governing the protection of human subjects as well as host country IRBs. The study protocol in Cambodia was approved by the Cambodian National Ethics Committee for Health Research (NECHR). The protocol in Ghana was approved by the Committee on Human Research, Publication, and Ethics (CHRPE) at Kwame Nkrumah University of Science & Technology. All procedures were in accordance with the ethical standards of the Helsinki Declaration of the World Medical Association. All patients, or their legally authorized representatives, provided written informed consent.
Demographic comparisons
To compare gender, size, age, and mortality in the study, we considered all the patients with available 28-day mortality information. Continuous variables (age) were compared using Welch’s two-sample t-test, while discrete variables (mortality) were compared using the Wilcoxon rank sum test18. To show the distribution of individual data, medians, and interquartile ranges, age was plotted and colored in relation to gender, mortality, and site using the ggplot219 and ggridges20 packages in the R statistical environment21.
RNA-seq library preparation from patient peripheral blood
Specific details for the preparation of RNA for sequencing are described in Rozo et al.22. Briefly, peripheral blood RNA was collected in PAXgene RNA tubes (PreAnalytiX), and total RNA was purified using the PAXgene Blood miRNA Kit (Qiagen) and depleted of human rRNA and globin using the Globin-Zero Gold rRNA Removal Kit (Illumina). The vendor (Azenta) prepared the paired-end RNA sequencing libraries according to standard procedures and sequenced them, generating ~50 million 150-bp paired-end reads per sample.
Genome alignment and sequencing data pre-processing
Sequencing data were aligned to the human genome (GRCh38). Briefly, all samples underwent quality control using FastQC23. Passing samples were aligned to the genome using Hisat224, and transcripts were assembled using Stringtie25. Low-expression features (counts per million <10), sex-linked features (located on chromosomes X and Y), and features not mapping to known genes were removed to decrease noise and avoid gender bias in feature selection and modeling. To normalize the data between different study sites, the raw read counts were normalized using the Median Ratio Normalization (MoR) method26. Normalized data were then transformed using Variance Stabilizing Transformation (VST). The processing yielded 3061 genes for 506 patients from across the three study sites.
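The median-of-ratios idea behind the normalization step can be illustrated with a minimal sketch. This is not the authors' pipeline: the function names (size_factors, normalize) and the toy count table are hypothetical, the data are synthetic, and the subsequent variance stabilizing transformation is not shown.

```python
import math
from statistics import median

def size_factors(counts):
    """Median-of-ratios size factors.

    counts: list of gene rows, each a list of raw counts (one per sample).
    Returns one scaling factor per sample.
    """
    n_samples = len(counts[0])
    # Keep genes with non-zero counts in every sample so all logs are finite.
    expressed = [row for row in counts if all(c > 0 for c in row)]
    log_ratios = [[] for _ in range(n_samples)]
    for row in expressed:
        # Per-gene reference: log of the geometric mean across samples.
        log_geo_mean = sum(math.log(c) for c in row) / n_samples
        for j, c in enumerate(row):
            log_ratios[j].append(math.log(c) - log_geo_mean)
    # Size factor: median over genes of each sample's ratio to the reference.
    return [math.exp(median(r)) for r in log_ratios]

def normalize(counts):
    """Divide every count by the size factor of its sample."""
    factors = size_factors(counts)
    return [[c / f for c, f in zip(row, factors)] for row in counts]

# Synthetic toy table: sample 2 is sample 1 sequenced twice as deeply,
# plus one gene with a zero count that is excluded from factor estimation.
toy_counts = [[10, 20], [100, 200], [50, 100], [0, 3]]
toy_factors = size_factors(toy_counts)
toy_normalized = normalize(toy_counts)
```

In this toy case the two samples differ only in depth, so after normalization the first three genes take equal values in both samples, which is the behavior the method is designed to produce.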
Dimensionality reduction
To aid in data interpretation we employed multiple dimensionality reduction methods including principal component analysis (PCA)21 and uniform manifold approximation and projection (UMAP)27. For PCA and UMAP, data were filtered to contain the 494 subjects with 28-day mortality information and analyzed in an R statistical environment21. Topological data analysis (TDA) was used to group participants with comparable gene expression profiles in an unbiased manner28,29 and identify trends and endotypes within the sepsis cohort. TDA was performed using the AyasdiAI machine intelligence platform (Symphony AyasdiAI, Palo Alto, CA, USA) on the normalized RNA sequencing data, employing a correlation metric combined with two proprietary neighborhood lenses. Groups of patients with similar gene expression profiles were defined within the TDA structure based on node density and connectivity. By its very essence, TDA captures the continuous nature of data30, and it therefore allows for a degree of overlap between adjacent phenotypes (i.e., a patient can be a member of more than one group at the same time). In the TDA network presented here, resolution and gain settings were selected that limited the degree of overlap between groups (less than 10% of participants were allowed to be shared), while at the same time maintaining the integrity of the TDA structure. PCA and UMAP results were plotted using the ggplot219 package in R21. All figures were finalized using Adobe Illustrator.
Differential gene analysis within and between TDA groups
The normalized expression table of 3061 genes for 506 patients was filtered to contain only the 494 patients with known mortality outcomes. For comparisons between TDA groups, patients that could be attributed to multiple TDA groups were removed, leaving only patients exclusive to individual TDA groups. Within TDA groups, patients were compared by 28-day mortality, contrasting those who died by day 28 with those who lived to day 28. The significance of the mean difference was evaluated using a Welch two-sample t-test and the final p.value was adjusted for multiple comparisons using the Benjamini & Hochberg31 method in R21. The complete results of both analyses can be found in the corresponding tables in Supplementary Data 2. Data were sorted by p.adjusted or p-value along with log2 fold change (L2FC) to highlight the most significantly changed genes. Terms for all comparisons were combined and respective L2FC values were plotted as a heatmap using the ggplot219 package in R21. To cluster genes, the distance matrix was calculated using the manhattan method32, row order was determined by unsupervised hierarchical clustering using the mcquitty method33, and the dendrogram was plotted using the ggdendro package34 in R. Resulting figures were finalized using Adobe Illustrator.
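Two steps of this procedure, restricting a between-group comparison to patients exclusive to one TDA group and adjusting p-values with the Benjamini & Hochberg procedure, can be sketched as follows. The analysis itself was run in R; this Python version is only an illustration, and the function names, patient identifiers, and p-values are hypothetical.

```python
from collections import Counter

def exclusive_members(groups):
    """Drop patients that belong to more than one group.

    groups: dict mapping a group name to a set of patient identifiers.
    """
    membership = Counter(pid for members in groups.values() for pid in members)
    return {
        name: {pid for pid in members if membership[pid] == 1}
        for name, members in groups.items()
    }

def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that adjusted
    # values never increase as the raw p-values decrease.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical overlapping groups: patient "p2" sits in both and is removed.
toy_groups = {"t4": {"p1", "p2"}, "t5": {"p2", "p3"}}
toy_exclusive = exclusive_members(toy_groups)

# Hypothetical raw p-values for four genes.
toy_adjusted = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
```

The adjustment follows the same step-up rule as R's p.adjust with method "BH": each sorted p-value is scaled by m divided by its rank, then a running minimum is taken from the largest rank downward.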
Gene set enrichment analyses within and between TDA groups
Gene set enrichment analysis was performed using the results of differential gene expression analyses within and between TDA groups. For each contrast, the log2 fold changes (L2FC) were calculated using group means and ranked in decreasing order. For the ‘between’ TDA comparison, only genes with a significant p.adjust value (≤ 0.05) were considered for the analysis. All genes were considered for the within TDA comparison due to the low count of statistically significant values. For gene set enrichment analysis of the entire cohort comparing 28-day mortality, we used genes with a significant p.value (≤ 0.05). The selected genes were compared against the Molecular Signature Database gene set representing Hallmark pathways accessed using the msigdbr35 package and analyzed using the clusterProfiler36 package. For the full cohort, we set the pvalueCutoff parameter to 0.05 to detect significant pathways only, whereas for other analyses we set this parameter to 1 and marked the statistically significantly enriched pathways in the figures with boxes for a clearer comparison. All the gene set enrichment results are shown in Supplementary Data 2. Final data were plotted as a stacked barplot or heatmap using ggplot219, with rows organized by unsupervised clustering using the manhattan distance matrix and the ward.d clustering method37, and a dendrogram plotted using the ggdendro package34 in R. Resulting figures were finalized using Adobe Illustrator.
Performance modeling
All modeling as well as plotting of ROC curves was performed in Python (v3.9.12). The LogisticRegression, RepeatedStratifiedKFold, and cross_val_score functions from the sklearn (v1.1.2)38 Python package were used to perform modeling. The logistic regression was run with max_iter set to 1000. The stratified k-folds were run with 10 repeats and 10 splits. The cross-validation score used roc_auc as the scoring metric. The mean and standard deviation of the resulting AUROCs were calculated using the mean and std functions from the numpy Python package. ROC curves were plotted using the RocCurveDisplay and pyplot functions from the sklearn and matplotlib (v3.5.1)39 Python packages, respectively. Resulting figures were finalized using Adobe Illustrator.
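The settings listed above map onto scikit-learn roughly as in the sketch below. It is an illustration under stated assumptions rather than the study code: the data are a synthetic stand-in generated with make_classification, the random seeds are arbitrary, and any preprocessing applied in the study is not reproduced.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RepeatedStratifiedKFold, cross_val_score

# Synthetic stand-in for an expression matrix: 200 "subjects" by 8 "transcripts",
# with roughly 20% positive outcomes (assumption for illustration only).
X, y = make_classification(
    n_samples=200,
    n_features=8,
    n_informative=4,
    n_clusters_per_class=1,
    weights=[0.8, 0.2],
    random_state=0,
)

# Settings taken from the text: max_iter=1000, 10 splits, 10 repeats, roc_auc.
model = LogisticRegression(max_iter=1000)
cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=10, random_state=0)
scores = cross_val_score(model, X, y, scoring="roc_auc", cv=cv)

# One AUROC per fold (10 x 10 = 100 values), summarized by mean and SD.
mean_auroc = np.mean(scores)
sd_auroc = np.std(scores)
print(f"AUROC = {mean_auroc:.2f} +/- {sd_auroc:.2f} over {len(scores)} folds")
```

Stratified folds keep the proportion of non-survivors similar in every split, which matters when the outcome is as imbalanced as 28-day mortality.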
Comparison of the clinical parameters
All clinical comparative analysis was conducted in the R statistical environment21. The data were filtered to include 480 patients used in the TDA analysis. The available clinical features were filtered to those collected at timepoint 0 (T00h) and those for which measurements were collected a single time and were unlikely to change throughout the study (i.e., medical history). The 28-day mortality time point was also included, due to its study relevance. The features were further filtered to those with data available for at least two patients in each TDA group being compared. To compare groups, patients were combined into a single group if needed (i.e., t1 + t2 + t3 = t123). Variance was evaluated using aov, and TDA group comparisons were evaluated using t.test, both built-in R functions21. The results of the analyses are provided in Supplementary Data 1.
Statistics and reproducibility
All comparisons were evaluated using standard statistical methods built into the R statistical computing environment. The different TDA comparison patient groups were clearly defined and are described in Supplementary Fig. 2d. All statistical significance tests were subject to multiple testing corrections and the adjusted p-value (p.adjusted) was reported first; where corrected values were not significant, the unadjusted p-value was reported second in order to reflect trends. This study was based on the results and materials collected in previously completed human clinical trials that did not involve sample replicate collection.
Reporting summary
Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.
Results
Gene expression predicts mortality in global sepsis cohorts
Total RNA from peripheral blood collected at enrollment (July 2016 – October 2017) was sequenced from 494 subjects in observational studies of sepsis in Ghana, Cambodia, and Durham, North Carolina (Fig. 1a, Supplementary Fig. 1a)17,22,40. Male and female subjects were similar in age and 28-day mortality in the combined cohort (Fig. 1b, Supplementary Fig. 1b). The overall 28-day mortality, defined as the percentage of patients that died within 28 days of enrollment into the study, was 17%, with individual cohorts having 28-day mortalities of 8% (Duke), 13% (Cambodia), and 32% (Ghana), respectively (Figs. 1c and 1d, Supplementary Fig. 1c, and 1d). Inspection of the data following preprocessing revealed no site bias (Fig. 1e, Supplementary Fig. 1e, 1f, 1g, 1h). Differentially expressed genes between 28-day survivors and non-survivors include transcripts previously linked to sepsis outcomes such as IL1R2, OLAH, and CX3CR1 (Fig. 1f and Supplementary Fig. 1i and 1j)6,41. Gene set enrichment analysis42,43 was used to identify biological pathways enriched in non-survivors. Hypoxia was the top enriched term while interferon response was significantly reduced (Fig. 1g). Feature selection using the Minimum Redundancy Maximum Relevance44,45 algorithm was performed using 391 differentially expressed genes to develop a prognostic classifier for 28-day mortality (Supplementary Data 1). The use of Minimum Redundancy Maximum Relevance resulted in a list of the top 50 ranked genes in decreasing relation to the risk of sepsis-related mortality (Supplementary Data 1). Average performance versus the number of features used in logistic regression was determined and resulted in eight transcripts (Fig. 1h). These top eight were used as inputs to logistic regression. Performance with qSOFA scores that combine mental status, respiratory rate, and blood pressure was also evaluated in these same subjects1. 
The eight-transcript model had improved performance (AUROC = 0.81) versus the model using qSOFA alone (AUROC = 0.72) in the 441 subjects where matched data were available (Fig. 1i). These results show that a molecular prognostic for sepsis mortality can be derived from diverse global cohorts.
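The performance-versus-feature-count evaluation described above can be sketched as a step-up loop over a ranked feature list. Several parts of this sketch are assumptions: the helper name step_up_auroc is hypothetical, the data are synthetic, the column order stands in for a Minimum Redundancy Maximum Relevance ranking (which is not computed here), and choosing the count with the highest mean AUROC is one plausible rule that the text does not specify.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RepeatedStratifiedKFold, cross_val_score

def step_up_auroc(X, y, ranked_columns, n_splits=5, n_repeats=2, seed=0):
    """Mean cross-validated AUROC for each top-k prefix of ranked_columns."""
    cv = RepeatedStratifiedKFold(
        n_splits=n_splits, n_repeats=n_repeats, random_state=seed
    )
    means = []
    for k in range(1, len(ranked_columns) + 1):
        top_k = ranked_columns[:k]
        scores = cross_val_score(
            LogisticRegression(max_iter=1000),
            X[:, top_k],
            y,
            scoring="roc_auc",
            cv=cv,
        )
        means.append(float(scores.mean()))
    return means

# Synthetic data; shuffle=False keeps the informative columns first, so the
# plain column order serves as a stand-in ranking (not an mRMR ranking).
X, y = make_classification(
    n_samples=200,
    n_features=6,
    n_informative=3,
    n_redundant=0,
    n_clusters_per_class=1,
    shuffle=False,
    random_state=0,
)
ranking = list(range(X.shape[1]))
mean_aurocs = step_up_auroc(X, y, ranking)

# Assumed selection rule: smallest k that reaches the best mean AUROC.
best_k = int(np.argmax(mean_aurocs)) + 1
```

Plotting mean_aurocs against k gives a curve of the kind summarized in Fig. 1h, where performance typically plateaus once the most informative features are included.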
Data dimension reduction identifies patient groups that improve prognostic performance
Genome-wide expression profiling has been used successfully to identify septic shock subclasses that describe biological heterogeneity5,7. We hypothesized that heterogeneity within, and across global cohorts could compromise the performance of the molecular sepsis prognostic. To identify relevant groups for improved prognostic models, we used topological data analysis (TDA) to cluster patients in an unsupervised, data-driven manner16 (Fig. 2a). In TDA, a 2-dimensional topological network is created based on similarities between data points, as well as the overall distribution of the data in n-dimensional space (Supplementary Fig. 2a). This provides an intuitive means of stratifying subjects with similar gene expression profiles into groups where relative position within the network can reveal shared biology or endotypes. Nodes in the TDA network correspond to groups of patients that are clustered together based on similarities in their gene expression, with the node size dependent on the number of patients in each node. Since TDA captures the continuous nature of data, it allows for a degree of overlap between adjacent nodes, meaning that patients can be included in more than one node at a time. This is represented by the edges in the network, which indicate that one or more patients are shared between two nodes46 (Supplementary Fig. 2a).
TDA analysis of the gene expression data resulted in the identification of 5 patient groups along a major left-to-right axis (Fig. 2b & Supplementary Fig. 2b). Further analysis showed that this axis corresponded to differences in 28-day mortality of the patients: TDA groups t1, t2, and t3 all exhibited elevated rates of 28-day mortality (22%, 22%, and 20% respectively) versus the low mortality groups t4 and t5 (10% and 11%, respectively) (Fig. 2b and Supplementary Fig. 2b). Feature selection was repeated within the TDA groups to ask if 28-day mortality prognostic performance could be improved using a stratified approach. Groups t4 and t5 contained too few non-survivors individually for this analysis, but shared a subset of subjects and were therefore combined into one low mortality group t4-5 (28-day mortality=11%) (Supplementary Fig. 2b and c). A total of 13 features across these four final groups were identified by step-up performance analysis of the previously identified 50 Minimum Redundancy Maximum Relevance transcripts (Fig. 3a and b, Supplementary Fig. 3, Supplementary Data 1). Models consisting of features for specific TDA groups were fitted and evaluated using only the subjects belonging to the respective TDA group. Performance increased in each of the 3 high-mortality groups (t1,2,3), and was decreased in the low-mortality t4-5 group versus what was achieved for the combined global cohort (Figs. 1i and 3c). All these group-specific features and models proved to be more effective than qSOFA (Fig. 3c).
Clinical and laboratory features of high-mortality patient groups describe multi-organ dysfunction
We have previously shown multiple clinical and physiological measures that are abnormal in this combined global cohort including elevated white blood cells, blood urea nitrogen, and lactate17. To define the clinical basis of the TDA-stratified patient groups we asked if physiologic and laboratory features varied across the entire network and within the two major axes of the network. Chemistry and hematology features including lactate, blood urea nitrogen, lymphocytes, and white blood cells varied significantly across the network (Fig. 4a and Supplementary Data 2). We looked more specifically at the left-right axis described by the low (t4-5) and high-mortality (t1,2,3) groups for significant differences. We noted a dramatic contrast in lymphocyte-monocyte percentage (up in low mortality) versus white blood cells (up in high mortality) (Fig. 4a and b and Supplementary Data 2). Lactate, blood urea nitrogen, and qSOFA are also significantly elevated in the high mortality groups while oxygen saturation is decreased (Fig. 4a, b, Supplementary Fig. 4, Supplementary Data 2). Additional symptoms including confusion in the combined high-mortality groups indicate multi-organ dysfunction in these subjects. Inspection of the vertical axis defined by groups t2 and t3 revealed a site-specific bias including pathogen etiologies and hemoglobin/hematocrit (Fig. 4a and Supplementary Data 2). This is consistent with our previous findings that show adjudicated bacterial, viral, fungal, and parasitic infections were significantly different across the Ghana, Cambodia, and U.S. populations11.
Molecular definition of sepsis endotypes
Gene set enrichment analysis using molecular features was used to explore the TDA-stratified groups. We first compared gene set enrichment analysis results across the major left-right mortality axis (Fig. 5a, b, and Supplementary Data 2). Type I and II interferon responses and allograft rejection were elevated in the low mortality t4-5 group, as were pathways linked to cell growth and proliferation including MYC and E2F targets. Key interferon-stimulated genes including IFI27 and ISG15 were significantly elevated in this group (Supplementary Fig. 5). These signatures are consistent with an active adaptive immune response in the low-mortality patient group t4-5. In contrast, the high-mortality groups were enriched for terms including hypoxia and coagulation, which are characteristic of severe sepsis, with group t1 being the most elevated. Individual gene-level inspection identified genes such as CYP1B1, S100A12, and IL1R2 that were most elevated in this high-mortality group t1 (Supplementary Fig. 5 and Supplementary Data 2). Notably, these genes characterize a CD14+ immunosuppressive monocytic myeloid-derived suppressor cell population MS1 recently described in bacterial sepsis and COVID-19 as negatively associated with survival47. Patient group t1 showed the weakest interferon response signatures by gene set enrichment analysis versus all other groups. Taken together these data are consistent with an immunosuppressed phenotype in group t1. Pathways linked to inflammation including TNFα and IL6-JAK-STAT3 signaling were also elevated in the high mortality groups compared to the low mortality group. Comparison across the three high mortality groups revealed that group t3 is most enriched for acute inflammatory signatures. We assessed differences between groups t2 and t3 and noted a significant difference in heme metabolism consistent with the clinical hemoglobin/hematocrit laboratory results (Fig. 4a and Supplementary Data 2).
We also asked if we could identify pathways linked to mortality within the potential endotypes. A subset of signatures including interferon response and allograft rejection describing the adaptive immune response was positively linked to survival in all groups (Fig. 5b). However, we did note pathways with TDA group-specific differences by outcome including the inflammatory response term that was significantly enriched in survivors in the immunocompetent low-mortality group t4-5.
Discussion
TDA decomposition of host gene expression from our combined global sepsis cohort suggests at least 4 endotypes (Fig. 6). The low mortality t4-5 group with less severe clinical correlates is enriched for immunocompetent molecular and hematological signatures most notably for the adaptive immune response. This is in direct contrast to high mortality group t1 which has reduced lymphocytes and interferon gene expression signatures and has molecular markers consistent with immunosuppressive cells. High-mortality group t3 shares key clinical features of severe illness with group t1, but gene set enrichment analysis shows that there is a robust innate and adaptive immune response in this group. Group t2 is notable for heme metabolism signatures and reduced mitochondrial gene expression linked to oxidative phosphorylation. Overall, results from our combined global cohort are similar to recent reports of biological endotypes using unsupervised analysis of clinical and multi-omics data from sepsis studies. Sweeney and colleagues identified three stable patient groups in a combined analysis of bacterial sepsis cohorts designated Inflammopathic, Adaptive, and Coagulopathic8. Another study found three stable sepsis subclasses coined immune-innate, immune-coagulant, and immune-adaptive9. Finally, Scicluna and colleagues identified four endotypes in European cohorts designated Molecular Diagnosis and Risk Stratification of Sepsis (Mars) 1-46. The Mars1 group was characterized by reduced expression of T-cell and adaptive immune genes versus a Mars3 group with robust innate immune activation. Consistent across all these studies is the association of reduced mortality and less severe clinical correlates with endotypes most strongly defined by signatures of a functional adaptive immune response similar to our immunocompetent group t4-5. 
In contrast, subclasses in these studies that have reduced adaptive immune signatures, coagulopathies, elevated mortality, and poor clinical scores align with our immunosuppressed group t1. The t3 group described here, with innate pathway signatures linked to acute inflammation and high mortality shares these features with the inflammopathic group reported by Sweeney and colleagues8. Finally, group t2 characterized by heme biosynthesis is consistent with the described immunometabolic endotypes and free heme has been shown to contribute to sepsis pathogenesis6,48.
This study reinforces the generality of common sepsis endotypes, introduces to the best of our knowledge new cohorts of relatively understudied populations from West Africa and Southeast Asia, and identifies potential new therapeutic approaches to improving sepsis outcomes. The endotype-specific gene expression features that we identify are similarly regulated in stratified patient groups in multiple sepsis studies published over the past decade (Supplementary Fig. 6)7,49,50,51. Notably, there is great interest in using endotype and biomarker-guided strategies for precision medicine interventions that incorporate immunotherapeutics in subjects predicted to benefit52. Lymphocyte growth factors including IL7 have shown promise in promoting the adaptive immune response in humans53. Accordingly, endotypes have been proposed to contribute to inconclusive results from immunomodulator trials in sepsis subjects10. A direct link between outcomes and endotypes has been shown following the analysis of data from patients treated with hydrocortisone9,10. Corticosteroid treatment was shown to have differential effects on survival in subjects based on immunophenotypic endotypes. Interestingly, our data offer some preliminary support for this finding when we consider gene expression signatures in survivors and non-survivors within TDA groups. We note that terms including TNFα and IL6 signaling, and inflammatory response are increased in survivors in the immunocompetent group t4-5 but decreased in survivors in the acute-inflammation group t3 (Fig. 5b). Future studies will validate our endotypes and prognostic models in additional international sites across the globe40.
Conducting complex observational studies in low- and middle-income countries presents unique challenges but is necessary to create a more complete definition of sepsis endotypes and can only be done through a global study using unified methods. Pathogen identification is especially limiting because the infecting pathogen is often not identified in the developing world. In the United States, common causative bacterial pathogens include E. coli, Staphylococcus aureus, and Streptococcus pneumoniae54. In Africa, common sepsis etiologies include Salmonella spp., malaria, and Mycobacterium tuberculosis, particularly in persons living with HIV infection55. We have previously shown that sepsis screening tools that are widely used during clinical care had a sub-optimal performance for risk stratification in diverse international cohorts17. Pathogen diversity and the syndromic nature of sepsis might contribute to this result and encourage future studies to reveal endotypes linked to pathogens endemic to low- and middle-income countries or parasitic co-infections.
Data availability
Raw sequence data and processed subject gene level data used in this study have been deidentified and deposited in dbGaP (Accession: phs003661.v1) under restricted access in compliance with study informed consent and National Institutes of Health Human Subjects Protection guidelines. Data can be obtained following local IRB approval and with a letter of collaboration with the primary study investigator(s). Source data underlying the data analysis used to generate the figures in this study can be found in Supplementary Data 1 and 2.
Acknowledgements
We thank the Naval Medical Research Command Genomics group at the Biological Defense Research Directorate as well as Carlo Colantuoni and Derren Barken for their support. This work was accomplished through the efforts of many in the Austere environments Consortium for Enhanced Sepsis Outcomes. We also thank the clinical research teams at Komfo Anokye Teaching Hospital in Ghana, Duke University Hospital in North Carolina, and Takeo Provincial Hospital in Cambodia. This work was supported by the Defense Health Agency through the Joint West Africa Research Group, Combating Antibiotic-Resistant Bacteria (FY1819 0130.1832), and Defense Threat Reduction Agency (JSTO-CBA) to Naval Medical Research Command (NMRC) (HDTRA1516108), Defense Health Bureau of Medicine & Surgery to NMRC for Naval Medical Logistics Command Cooperative Agreement (N626451920001). K.L.S. is an employee of the US government and A.G.L. is a military service member. This work was prepared as a part of official duties. Title 17 U.S.C. 105 provides that Copyright protection under this title is not available for any work of the United States Government. Title 17 U.S.C. 101 defines a U.S. Government work as a work prepared by a military service member or employee of the U.S. Government as part of a person’s official duties. The views expressed reflect the results of research conducted by the authors and do not necessarily reflect the official policy or position of the Henry M. Jackson Foundation for the Advancement of Military Medicine, Inc., Department of the Navy, Department of Defense, nor the United States Government.
Author information
Contributions
J.G.C., D.V.C., K.L.S., C.W.W., and E.L.T. conceived the study. Gene expression analysis and bioinformatics were performed by P.G., J.B., D.A.S., G.B.F., and J.G.C. Feature selection and prognostic modeling were performed by G.B.F. and E.C. Data interpretation was performed by J.G.C., D.V.C., P.G., J.B., D.A.S., E.D., I.B., and S.K. Clinical interpretation was contributed by P.W.B., C.O., A.O.O., C.G.B., G.O., T.V., and A.G.L. The manuscript was written by J.G.C., P.G., J.B., G.B.F., and D.A.S. with input from all authors.
Ethics declarations
Competing interests
The authors declare the following competing interests: D.V.C., J.B., J.G.C., and D.A.S. are listed as inventors on U.S. Provisional Application No. 63/578,492, entitled “Prediction of Mortality of Patients Diagnosed with Sepsis,” applied for by the Henry M. Jackson Foundation for the Advancement of Military Medicine, Inc., on the identification of molecular endotypes that predict sepsis mortality. E.L.T. is an employee of, and holds equity in, Danaher. E.L.T. and C.W.W. have received consulting fees from Biomeme. All other authors have no competing interests to declare.
Peer review
Peer review information
Communications Medicine thanks Richard Hotchkiss and Edy Kim for their contribution to the peer review of this work. A peer review file is available.
Additional information
Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Chenoweth, J.G., Brandsma, J., Striegel, D.A. et al. Sepsis endotypes identified by host gene expression across global cohorts. Commun Med 4, 120 (2024). https://doi.org/10.1038/s43856-024-00542-7
DOI: https://doi.org/10.1038/s43856-024-00542-7