Introduction

Idiopathic Pulmonary Fibrosis (IPF) is an interstitial lung disease of unknown origin characterized by progressive lung fibrosis [1]. The pathogenesis of IPF is complex and still unclear. Previous studies of whole genome transcriptomics have described alterations in different molecular pathways in end-stage IPF lungs, including aberrant activation of epithelial cells that promote fibroblast to myofibroblast differentiation [2, 3], excessive production of extracellular matrix proteins, such as matrix metalloproteases (MMPs), collagen and fibronectin [4, 5], aberrant activation of lung developmental pathways [6, 7], mitochondrial abnormalities [8, 9] and oxidative stress [9, 10, and type II epithelial cells and fibroblasts senescence [2, 11, 12]. The combination of all these pathogenic mechanisms leads to a highly heterogeneous disease, in which the identification of disease endotypes is an important unmet clinical need to move toward precision treatment [13].

In this setting, the role of the immune system is unclear. Some studies have proposed a role of immune pathways such as CD3 + and CD20 + lymphocytes in the development of fibrosis [14, 15] through the promotion of epithelial to mesenchymal transition (EMT) [4, 7, 4,16,17]. Further, the progression of IPF and the occurrence of exacerbations was associated with B cell responses [18, 19] through their capacity to modify the pro or anti-fibrotic lung micro-environment, thus influencing fibroblasts activity [20]. However, other findings challenge the role of the immune response in IPF [21]. First, clinical trials with immune-suppressive agents showed increased mortality and fibrosis in treated patients [22]. Second, the expression of markers of lung T lymphocytes exhaustion (such as PD-1, ICOS and CD28) is associated with enhanced TGF-β production and poor survival in IPF [23, 24]. Finally, the proportion of NK cells with impaired activity is reduced in IPF lungs [25] and their functionality is profoundly compromised by the lung microenvironment [26].

We therefore hypothesized that it is likely to be significant immune-related molecular heterogeneity in patients with IPF. To test this hypothesis, we used gene set variation analysis (GSVA) in lung tissue samples of patients with IPF, instead of previous studies using conventional analysis of single-gene expression. GSVA is a statistical technique that enables the discovery of inflammatory and leukocyte lineage gene signatures by comparing combined enrichment scores (ESs) of established and predefined gene sets, especially in heterogeneous samples [27, 28]. Specifically: (1) we first applied GSVA to lung transcriptomic data of 109 severe IPF patients (explanted lungs) available at the Lung Tissue Research Consortium (LTRC) to estimate the proportion of immune cells in their lungs; (2) we then used unbiased cluster analysis to identify distinct groups of IPF patients with overall distinct level of immune signatures; and, finally, (3) we explored differential gene expression between observed clusters, both for newly identified signatures as well as for previously stablished IPF related pathways.

Methods

Availability of data and materials

The datasets supporting the conclusions of this article are available in the NIH public repository Lung Tissue Research Consortium (LTRC), https://www.nhlbi.nih.gov/science/lung-tissue-research-consortium-ltrc. Tables with the full results of the analysis performed to support the conclusions are available in the online supplement.

Study design, patients and ethics

Transcriptomic data of IPF explanted lungs (n = 109) was obtained from the LTRC following established procedures. Experimental validation using cell-based (not mRNA) methods (flow cytometry) was performed in lung tissue samples of IPF patients undergoing bilateral lung transplant at the University of Pittsburgh (USA). The Institutional Review Board and the Committee for Oversight of Research and Clinical Training Involved Decedents of the University of Pittsburgh, approved the study and the sample transfer respectively. In all cases, a signed informed consent form was collected before organ procurement.

Clinical characterization of IPF patients

Available clinical data in LTRC include age, sex, body mass index, Forced Expiratory Volum (FEV1), Forced Volum Capacity (FVC), carbon monoxide diffusing capacity (DLCO), quantify Computed Tomography (CT) of the thorax by an adapted version of the CALIPER software and daily activity and health questionaries. All procedures were realized following LTRC protocols, the diagnosis of IPF was performed by a specialist evaluating the medical record, CT scan report and the post-transplant pathology report.

GSVA, immune-signatures enrichment and unbiased cluster analysis

We analyzed the transcriptomic data set GSE47460 from the LTRC [29]. This data set was split in two, GPL14550 was used as a discovery data set (D#1, n = 109) whereas GPL6480 was used for validation (D#2, n = 34). For the current analysis we used the normalized matrix downloaded from GEO, selecting only patients with a diagnosis of IPF. Gene set variation analysis (GSVA) was used to determine patient‐wise enrichment scores (ES) that indicate the relative collective expression of genes within the gene signatures for patients relative to the rest of the cohort of patients in a given transcriptomic dataset [30]. Sets of the immune signatures used were based on available gene expression publications (n = 31, Additional file 2: Table S1) [27, 31]. Unbiased clustering of the GSVA immune signatures were identified using the dendextend R package in R [32]. To maximize the differences in the GSVA scores, the number of clusters was set at 2, the distance metric was calculated with the minkowski method and the hierarchical clustering method was ward. D2 [32].

Differential gene expression between clusters was investigated using limma [33]. To build the correlation network with the clinical parameters and to further understand the relationship between the immune and epithelial cells in these patients, the gene sets included in our GSVA analysis were extended, while preserving the already obtained immune-based unbiased clustering, to include epithelial lineage cell signatures (skipping genes already included in the immune cell signatures) (Additional file 2: Table S2).

Experimental validation of LTRC results in fresh lung tissue samples by flow cytometry

To validate results from the GSVA immune enrichment in the LTRC, we used flow cytometry, a non-mRNA related method. Fresh lung tissue samples of IPF patients undergoing bilateral lung transplant at the University of Pittsburgh (USA) were washed with PBS and enzymatically digested as previously described [34]. Lung homogenates included multiple areas of the same lung lobe, ensuring the representability of the sample to address patient’s heterogeneity. Lung tissue homogenates (106 cells) were then stained 5 min with the viability staining (Fixable viability-Alexa600, BD, USA) and 30 min at 4ºC in the dark with the following conjugated monoclonal antibodies CD3-PECy5.5, CD45-Alexa700, CD16-BV412, CD56-FITC, CD8-V500, CD4-APC-Cy7, CD19-BV650 (BD, USA) and CD14-PE (BioLegend, USA). A minimum of 5 × 105 cells per sample were acquired in a FACS LSRII (BD Biosciences, USA), and data was analyzed using FlowJo v10 (FlowJo LLC, USA). Immune cell populations were determined using the gating strategy depicted in Additional file 1: Fig. S1.

Biologic pathway analysis

To evaluate the enrichment of biological signatures in the observed clusters, gene ontology (GO) enrichment and hypergeometric tests were used [35]. The gene signatures for the hypergeometric test were selected from previously published sc-RNAseq studies: epithelial cells signatures [36,37,38,39] and fibroblast related signatures [37,38,39,40,41]; or from the Gene Ontology (GO) extracellular matrix (GO:0031012), oxidative stress (GO:0000302), mitochondrial transport (GO:0006839), mitochondrial respiratory chain (GO:0005746) and response to stress (GO:0006950). Additional file 2: Table S3 shows the complete list of gene signatures investigated here.

Statistical analysis

Quantitative and qualitative data is presented as mean, or n and proportion, respectively. Results were compared using the ANOVA or Fisher tests, as appropriated. Differences in the distribution of the GSVA calculated signatures between clusters were assessed with the ANOVA test too. Correlations between immune cell signatures and clinical features were assessed using the Spearman correlation test, which was considered statistically significant if its r value was >|0.5| and the p value < 0.05. To explore correlations between biological and clinical features, we used network analysis, where each node was the variable of interest, its size was proportional to its mean value in each cluster, and links (edges) represent the Spearman Rho between linked variables, with results being plotted using Cytoscape [42]. All statistics were computed with R 4.2.2, using custom scripts.

Results

Cluster analysis of enriched immune-signatures in the LTRC

The main demographic and clinical characteristics of IPF patients included in D#1 and D#2 were similar (Additional file 2: Table S4). Briefly, the studied population presented the clinical characteristics of end-stage IPF disease, a severe impairment of the DLCO and FVC, and fibrotic features in the CT scan, presence of honeycombing, ground grass opacity, reticular densities and vessels. As shown in Fig. 1, in both data-sets (panels A and B) k-means unbiased clustering of GSVA enriched immune signatures identified two clusters of IPF patients (C#1 and C#2) with different levels of immune expression. Additional file 2: Table S5 shows the mean ES in each cluster and the p-value for the comparison of both clusters. C#1 had a higher ES than C#2 in all analyzed immune signatures except for three of them where no significant differences were observed between clusters. The biggest differences were found in cytotoxic cells (both adaptive CD8 + T cells and innate NK lymphocytes) and memory T cells.

Fig. 1
figure 1

Unbiased clustering of obtained GSVA-immune enrichment scores, in the IPF LTRC samples. A Data-set 1 and B Data-set 2. The density color keys at the top left of each figure define the scoring for each gene signature ranging from − 1 in blue to 1 in red

Table 1 compares the main clinical differences between IPF patients included in the two clusters (C#1 and C#2) identified in D#1. Briefly, C#1 (high immune expression) included slightly younger individuals, with more symptoms, and less low attenuation area by CT scan. These differences were reproduced in the two clusters determined in D#2 (Fig. 1B and Additional file 2: Table S6).

Table 1 Main clinical characteristics of the two IPF GSVA clusters in D#1. Statistically significant results are highlighted in bold

A sensitivity analysis, done using only the data of upper lobes or lower lobes showed that there were no differences in the immune signatures enrichment in upper vs. lower lung lobes (Additional file 1: Figure S2 and Additional file 2: Table S7, S8). Likewise, the direct comparison between different lobes did not identify differences in the immune-signatures ES, although we found the expected differences in CT scan parameters with increased extend of the fibrosis related parameters in the lower lobes (Additional file 2: Table S9).

Validation of results by flow cytometry in fresh lung tissue

To validate the above discussed results, we used flow cytometry in fresh lung tissue samples harvested from IPF explanted lungs (Additional file 2: Table S10). Further, to exclude the possibility that the two clusters identified above may actually correspond to pathology heterogeneity within the sampled lung lobe rather than differences between patients, for flow cytometry measurements we used lung homogenates from multiple areas to ensure proper representation of the whole pulmonary lobe. By doing so, unbiased clustering of flow cytometry data confirmed the existence of two clusters of patients that differed in the proportion of T-cells, CD4, CD8, B-cells, NK cells, NKT-like cells and macrophages (Fig. 2A).

Fig. 2
figure 2

Unbiased clustering of the flow cytometry data generated for the validation. Flow cytometry determination of the main lung immune populations followed by unbiased clustering showed the presence of two clusters of IPF patients based on their level of immune infiltrate in fresh lung samples. The density color key at the top left define the scoring for each gene signature ranging from (− 1) in blue to (1) in red

Biological pathways

To gain insight into the biological process altered in the two clusters of IPF patients identified from the immune signatures enrichment in the LTRC, we investigated differentially express genes (DEG) in C#1 and C#2. Using an adjusted p value < 0.05 and log Fold Change (LgFC) >|0.65| we found 777 DEG; 153 (19.7%) of them were upregulated in C#1 and 624 (80.3%) in C#2 (Additional file 2: Table S11).

Additional file 2: Table S12 lists the biological ontologies associated with these DEG. Of note, C#1 showed activation of immune response ontologies whereas C#2 included ontologies related to ciliary function. To contrast these results with pathways previously reported in IPF, we performed a hypergeometric test on DEG with specific IPF related signatures, including epithelial lineage, cell cycle, senescence, extracellular matrix, myofibroblast activation, response to pirfenidone treatment, oxidative stress, endoplasmic reticulum stress, mitochondrial related genes and immune lineage (Additional file 2: Table S3). We observed that C#1 showed increased viral response and immune infiltrate gene signature, thus supporting GSVA unbiased clustering results. By contrast, C#2 was characterized by altered epithelial cell lineage (Fig. 3 and Table 2), particularly upregulation of genes related to EMT, secretory and ciliated cells. Interestingly, there were no differences between C#1 and C#2 in fibrosis associated gene signatures (Fig. 3 and Table 2).

Fig. 3
figure 3

Hypergeometric test of the percentage of cluster differentially express genes in the studied biological gene signatures. A Immune signatures, B IPF related signatures and C epithelial signatures

Table 2 Hypergeometric test comparing the percentage of differentially express genes between clusters that belong to the following specific signatures. Statistically significant results are highlighted in bold

Network analysis

Finally, to further understand the relationship between the immune and epithelial cells in these IPF patients, the GSVA analysis was extended by adding the epithelial lineage cell signatures (Additional file 2: Table S2) and correlation networks were built considering the transcriptomic immune and epithelial signatures enrichment and the clinical characteristics of the patients. Of note, to analyze the relationship between CT findings and cell signatures we used only CT measures in the profiled pulmonary lobe. Additional file 2: Figure S3 shows a first neighbor correlation network of the clinical parameters and epithelial signatures in C#1 and C#2. In both clusters we observed a negative correlation between epithelial and immune cells (dashed edges (links)). Specifically, epithelial, ciliated, and secretory cell signatures were negatively correlated with central memory CD8 + T cells, Th2 T cells and immature B cells. Interestingly, only in C#2 we identified a correlation between the transcriptomic signatures and disease severity: a positive correlation with fibrosis associated CT parameters and a negative correlation with the FVC value.

Discussion

The main and novel observation of this study are that, by using unbiased cluster analysis of lung immune signatures in a large cohort of patients with IPF (n = 109), we identified two clusters (C#1 and C#2) of similar size with different immune-related characteristics and differentially expressed genes: C#1 (n = 55, 53%) was characterized by a higher expression of immune signatures, particularly cytotoxic and memory T cells, whereas C#2 (n = 49, 47%) was characterized by an upregulated expression of cilium associated genes, epithelial and secretory cells (structural cell cluster). Interestingly, though, the clinical presentation of these two clusters was remarkably similar, indicating that at the end-stage of the disease the identified molecular heterogeneity does not translate directly into a different clinical phenotype. However, further research is need to understand whether these clusters are already present in earlier phases of the disease and/or associated with the disease progression.

Previous studies

A few previous studies used transcriptomic data to identify clusters of IPF patients. Using lung transcriptomics, Yang et al. identified a cilium associated subtype and a fatty acid metabolism one [43], but the expression of immune related genes or the associated cell types was not reported. Using blood transcriptomics, Kraven et al. described three clusters of IPF patients, one of them enriched in immune response genes [44]. Additionally, Herazo-Maya JD et al. identified a 52 gene signature on PBMCs that stratified patients with different disease outcomes [45, 46], and an increase of peripheral blood monocytes has been associated with poor prognosis [47]. Finally, De Sadeleer et al. used transcriptomic results of bronchoalveolar lavage fluid analysis identified 6 clusters in IPF patients, one of them again enriched in immune signatures [48]. Collectively, these studies support our observation of immune heterogeneity in IPF. To our knowledge, however, no previous study has used unbiased cluster analysis of IPF lung immune signatures enrichment. Importantly, results were validated experimentally in independent lung tissue samples using non-mRNA related method (flow cytometry).

Interpretation of novel findings

The application of this cutting-edge methodology to IPF lung tissue allowed us to identify two clusters of IPF patients (C#1 and C#2) with marked biological differences: while C#1 was an "immune-cell" cluster, particularly enriched in cytotoxic and memory T cells, C#2 was a "structural cell" cluster, with marked upregulation of cilium, epithelial and secretory cells genes. Because in the study mentioned above Yang et al. also identified a cilium associated IPF subtype using lung transcriptomics [11], we explored the degree of overlap between their results and our identified clusters. The hypergeometric test showed that our C#2 shared a 99% and 72% of their described genes, indicating that our unbiased clustering of immune signature enrichment generates a similar grouping of IPF patients than the more traditional transcriptomic hierarchical clustering.

From the clinical viewpoint, it is of note that these two very different biologic clusters of patients with IPF show remarkably similar clinical characteristics (Table 1). We think that this may likely be due to the fact that lungs were harvested at transplantation, this is at an end-stage course of the disease. It is possible that at an earlier stage, clinical differences may have been more evident or that these two clusters represent different disease trajectories, varying in either rate of progression, frequency of infections or exacerbations and/or the response to treatment. All these possibilities require and deserve future research. This is the main limitation of the study, the lack of longitudinal information to understand the disease evolution, progression and a record of infections and exacerbations that could have a direct impact in the lung immunological state.

Conclusions

The use of unbiased clustering of the transcriptomic enrichment in immune signatures in lung tissue of patients with end-stage IPF identified two distinct clusters, an immune-cell one and a structural-cell one, with a negative correlation between the expression of immune and epithelial related signatures. These very different biological clusters are not related with clinical characteristics but whether they are present at an earlier stage and/or there is an association with disease phenotypes or progression should be further studied.