Introduction

Lung cancer is the most prevalent type of cancer threatening human health and quality of life, and it is the leading cause of cancer-related deaths worldwide1. Approximately 85% of lung cancer cases are non-small cell lung cancer (NSCLC), and lung adenocarcinoma (LUAD) is the most common histological subtype of NSCLC2. Although new treatments have prolonged the survival of LUAD patients, overall treatment outcomes remain unsatisfactory because of distant metastasis, recurrence, or advanced stage at diagnosis3,4. Therefore, continued efforts to identify novel and sensitive therapeutic targets for predicting prognosis and individualizing therapy for LUAD patients are urgently required.

Immune checkpoint inhibitors (ICIs), a class of therapeutic agents that activate the immune system and modulate T lymphocyte activity, have achieved unprecedented results in NSCLC5,6,7,8. However, ICIs are associated with a specific set of treatment-related adverse events9,10,11,12. A growing number of studies have reported that the tumor immune microenvironment (TME), which is shaped by tumor-infiltrating stromal and immune cells, is closely related to tumor progression, metastasis, and immune escape, and affects both sensitivity to immunotherapy and the survival of cancer patients13. For instance, during LUAD progression, cancer-associated fibroblasts (CAFs) regulate the synthesis and remodeling of the extracellular matrix, which can increase the stiffness of tumor tissue and promote tumor growth14,15. CAFs can also secrete transforming growth factor-β (TGF-β) in a paracrine manner, contributing to tumor heterogeneity in LUAD16. In addition, CAFs can secrete the chemokine CXCL12 to induce M2 macrophage polarization, thereby promoting LUAD progression both in vitro and in vivo17. Tumor-associated immune cells, primarily T cells, can produce cytokines that promote tumor cell migration and metastatic spread18.

A growing number of studies have reported that the pre-metastatic tumor microenvironment recruits various myeloid cells to facilitate cancer cell recruitment and colonization19,20. Stefanescu et al. found that myeloid cells can suppress the antitumor activity of CD8+ T cells during lung metastasis by activating the TGF-β signaling pathway21. Another study showed that Gr-1+CD11b+ myeloid cells can upregulate the matrix metalloproteinase MMP9 to promote lung metastasis and vascular remodeling22. Because these studies demonstrate the important roles of myeloid cells in driving lung cancer metastasis, we hypothesized that myeloid cells may serve as prognostic markers for the overall survival of LUAD patients.

Herein, we collected clinical and mRNA expression data of LUAD patients from the TCGA and GEO databases and used integrative bioinformatics to examine the association of myeloid cell differentiation-related gene expression with prognosis, immune cell infiltration, and therapeutic benefit.

Materials and methods

Data download and myeloid cell differentiation-related gene selection

The RNA-sequencing (RNA-seq) data and clinicopathological features of TCGA LUAD patients were downloaded from the TCGA website (https://portal.gdc.cancer.gov/). Among them, 58 normal samples and 501 LUAD cases with mRNA expression data, follow-up information, and clinical characteristics were used for further analysis. An independent cohort, GSE68465, which contained 442 LUAD cases, was retrieved from the Gene Expression Omnibus database for external validation (https://www.ncbi.nlm.nih.gov/geo/). The genes involved in the myeloid cell differentiation-related pathway were downloaded from the Molecular Signatures Database (https://www.gsea-msigdb.org/gsea/msigdb/human/search.jsp) (Supplementary Table 1).

Construction and validation of the myeloid cell differentiation-related prognostic signature

We applied the “edgeR” R package to screen for differentially expressed genes (DEGs) between normal tissues and tumor tissues. Genes meeting the criteria of a false discovery rate (FDR) < 0.05 and |logFC| > 1 were defined as DEGs. To explore the functions of the DEGs, Gene Ontology (GO) analysis and Kyoto Encyclopedia of Genes and Genomes (KEGG) annotation were performed using the DAVID online tools.
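As a minimal illustration of the DEG criteria above, the following Python sketch applies the two thresholds to a small table of results. The gene names, logFC values, and FDR values are invented placeholders, not outputs of edgeR or of this study, and the helper names are hypothetical.

```python
from typing import NamedTuple


class GeneResult(NamedTuple):
    gene: str
    log_fc: float  # log fold change, tumor vs. normal
    fdr: float     # false discovery rate


def is_deg(result, fdr_cutoff=0.05, log_fc_cutoff=1.0):
    """Return True when a gene meets both criteria: FDR < 0.05 and |logFC| > 1."""
    return result.fdr < fdr_cutoff and abs(result.log_fc) > log_fc_cutoff


def classify(result):
    """Label a gene as up-regulated, down-regulated, or not significant."""
    if not is_deg(result):
        return "not significant"
    return "up" if result.log_fc > 0 else "down"


# Hypothetical example rows (placeholders, not real data).
results = [
    GeneResult("GENE_A", 2.3, 0.001),   # passes both thresholds, up
    GeneResult("GENE_B", -1.8, 0.020),  # passes both thresholds, down
    GeneResult("GENE_C", 0.4, 0.0001),  # significant but small effect
    GeneResult("GENE_D", 3.1, 0.200),   # large effect but FDR too high
]

labels = {r.gene: classify(r) for r in results}
for gene, label in labels.items():
    print(gene, label)
```

Requiring both thresholds removes genes that are statistically significant but change little, as well as genes with large but unreliable changes.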

Univariate Cox regression analysis was used to assess the association between each DEG and overall survival. Only genes with a hazard ratio (HR) > 1 and logFC > 1 (P < 0.05), or HR < 1 and logFC < −1 (P < 0.05), were included in the subsequent analysis. Backward stepwise selection was then applied using the likelihood ratio test, with Akaike's information criterion (AIC) as the stopping rule, and the model with the lowest AIC value was retained as the risk model. Subsequently, six genes (F2RL1, PRKDC, TNFSF11, INHA, PLA2G3 and TUBB1) were selected. The risk score was calculated as follows: risk score = \(\sum_{i=1}^{n} \text{Coef}_{i} \times X_{i}\). In this formula, n, \(X_{i}\), and \(\text{Coef}_{i}\) represent the number of hub genes, the expression level of gene i, and the coefficient of gene i, respectively. In the TCGA and GEO cohorts, patients were divided into low- and high-risk groups according to the optimal cut-off value of the risk score, which was determined with the "surv_cutpoint" function of the “survminer” R package. Kaplan-Meier (K-M) survival curves were then used to compare survival between the low-risk and high-risk groups. Moreover, time-dependent ROC analysis was performed with the “survivalROC” R package to assess the accuracy of the established prognostic risk model of LUAD.
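The candidate-gene rule in this paragraph keeps a gene only when its survival effect and its expression change point in the same direction. The Python sketch below shows one way to express that rule. It assumes the P value in the text refers to the univariate Cox test; the function name and the example values are hypothetical and are not taken from the study.

```python
def keep_for_modeling(hr, log_fc, p_value, alpha=0.05):
    """Apply the concordance rule: risk genes must be up-regulated in tumors,
    protective genes must be down-regulated, and the Cox P value must be < alpha."""
    if p_value >= alpha:
        return False
    risk_and_up = hr > 1 and log_fc > 1
    protective_and_down = hr < 1 and log_fc < -1
    return risk_and_up or protective_and_down


# Hypothetical candidates: (gene, hazard ratio, logFC, Cox P value).
candidates = [
    ("GENE_A", 1.40, 2.1, 0.003),   # risk gene, up-regulated: kept
    ("GENE_B", 0.70, -1.6, 0.010),  # protective gene, down-regulated: kept
    ("GENE_C", 1.30, -1.5, 0.020),  # directions disagree: dropped
    ("GENE_D", 0.80, -2.0, 0.300),  # not significant: dropped
]

selected = [gene for gene, hr, log_fc, p in candidates
            if keep_for_modeling(hr, log_fc, p)]
print(selected)
```

Genes that pass this filter would then enter the backward stepwise selection described above.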

Assessment of immune microenvironment

The ESTIMATE algorithm was used to estimate the immune and stromal components of the tumor microenvironment from gene expression profiles9. Tumor purity and the TME scores, namely the ImmuneScore (indicating the level of immune cell infiltration), StromalScore (indicating the level of stromal cell infiltration), and ESTIMATEScore (reflecting the sum of both), were then calculated for each patient with the R package “estimate”23.

Tumor immune cell infiltration analysis

For a more comprehensive estimation of immune cell infiltration, TIMER, XCELL, CIBERSORT abs, QUANTISEQ, MCPCOUNTER, EPIC, CIBERSORT, and ssGSEA algorithms were applied to quantify the relative proportions of immune cells in the LUAD patients24. The relationship between risk score levels and immune infiltrating cells was explored using Spearman’s rank correlation analysis.
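To make the correlation step concrete, here is a small, dependency-free Python sketch of Spearman's rank correlation, computed as the Pearson correlation of average ranks. The risk scores and infiltration values are invented for illustration and do not come from either cohort.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def spearman(x, y):
    """Spearman's rho: Pearson correlation applied to the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical risk scores and immune cell infiltration estimates.
risk_scores = [0.2, 0.5, 0.9, 1.4, 2.0]
infiltration = [0.80, 0.75, 0.60, 0.62, 0.30]

rho = spearman(risk_scores, infiltration)
print(round(rho, 3))
```

A negative rho in this setting would mean that infiltration tends to fall as the risk score rises. Because the method uses ranks, it does not assume a linear relationship.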

Nomogram construction and verification

To provide clinicians with a quantitative method for predicting the prognosis of LUAD patients, the R packages “rms” and “survival” were used to establish a nomogram that combines the risk score with the clinical characteristics of LUAD patients (age, gender, smoking status, T stage, and N stage). The accuracy of the model was evaluated with calibration curves. A calibration curve close to the 45-degree diagonal indicates good agreement between predicted and observed outcomes.

Drug sensitivity analysis

The “pRRophetic” R package was used to predict the half-maximal inhibitory concentration (IC50) of each small-molecule compound for the low-risk and high-risk groups25.

Cell lines and culture

The human LUAD cell lines A549 (RRID:CVCL_0023) and PC-9 (RRID:CVCL_B260) and the normal human lung epithelial cell line BEAS-2B were purchased from the Chinese Academy of Sciences (Shanghai, China). A549 and PC-9 cells were cultured in RPMI-1640 (KeyGen Biotechnology, Nanjing, China), and BEAS-2B cells were cultured in DMEM (KeyGen Biotechnology, Nanjing, China). Both media were supplemented with 10% fetal bovine serum (FBS), and cells were maintained at 37 °C with 5% CO2 in a humidified atmosphere. All cell lines were authenticated by STR profiling.

Quantitative real-time polymerase chain reaction (qPCR)

Total RNA was isolated from cells using TRIzol reagent (Invitrogen, Carlsbad, CA) according to the manufacturer’s protocol and then reverse transcribed into cDNA using the 4 × EZscript RT Mix II Kit (EZBioscience, USA). cDNA was amplified using the SYBR Green PCR Kit (EZBioscience, USA) on a QuantStudio5 Sequence Detector. The primers used for amplification are listed in Supplementary Table 2.

Patient consent statement

TCGA and GEO are public databases. The patients included in these databases provided written informed consent. Users can download the relevant data free of charge for research and publication.

Results

Construction and evaluation of the myeloid cell differentiation-related gene signature

A flow chart summarizing the entire study is provided in Fig. 1. The RNA-seq data of 58 normal lung tissues and 501 LUAD tissues from the TCGA-LUAD cohort were merged and normalized for further comparison. The basic characteristics of these patients are shown in Table 1. In total, 83 DEGs were identified; the volcano plot and heatmap of the DEGs are shown in Fig. 2A and Fig. 2B. We then performed GO functional annotation and KEGG analyses of the DEGs using the DAVID online tools (Fig. 2C,D). Univariate Cox regression analysis was conducted to estimate the prognostic significance of the DEGs, and 13 DEGs were correlated with OS (all P < 0.05). Based on the selection criteria, we then carried out backward stepwise selection to identify the predictive model with the minimum AIC value. Ultimately, a risk model including F2RL1, PRKDC, TNFSF11, INHA, PLA2G3 and TUBB1 was constructed. Each patient’s risk score was calculated with the formula: risk score = (−0.102 × expression level of PLA2G3) + (−0.129 × expression level of TUBB1) + (0.081 × expression level of TNFSF11) + (0.053 × expression level of INHA) + (0.199 × expression level of F2RL1) + (0.132 × expression level of PRKDC). Patients were then divided into low-risk (n = 390) and high-risk (n = 111) subsets based on the optimal cut-off point, which was calculated with the “survminer” package. The K-M curve showed that the high-risk group had a greater mortality risk than the low-risk group (Fig. 3A). Meanwhile, the risk heatmap shows that F2RL1, PRKDC, TNFSF11 and INHA were up-regulated as the risk score increased, whereas PLA2G3 and TUBB1 were down-regulated (Fig. 3B).
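The risk formula in this paragraph is a weighted sum of six expression values. The Python sketch below implements it with the coefficients stated in the text. The expression values and the cut-off are hypothetical placeholders, since the text reports neither patient-level data nor the numerical cut-off, and the rule that scores above the cut-off are "high" is an assumption made for illustration.

```python
# Coefficients as reported in the text for the six-gene signature.
COEFFICIENTS = {
    "PLA2G3": -0.102,
    "TUBB1": -0.129,
    "TNFSF11": 0.081,
    "INHA": 0.053,
    "F2RL1": 0.199,
    "PRKDC": 0.132,
}


def risk_score(expression):
    """Weighted sum of expression levels: sum over genes of coefficient * level."""
    missing = set(COEFFICIENTS) - set(expression)
    if missing:
        raise KeyError(f"missing expression values for: {sorted(missing)}")
    return sum(coef * expression[gene] for gene, coef in COEFFICIENTS.items())


def risk_group(score, cutoff):
    """Assign 'high' when the score exceeds the cut-off, otherwise 'low'
    (assumed convention; the cut-off itself is supplied by the caller)."""
    return "high" if score > cutoff else "low"


# Hypothetical expression profile for one patient (not real data).
patient = {
    "PLA2G3": 2.0,
    "TUBB1": 1.0,
    "TNFSF11": 3.0,
    "INHA": 0.5,
    "F2RL1": 4.0,
    "PRKDC": 5.0,
}

HYPOTHETICAL_CUTOFF = 1.0  # placeholder; the study's cut-off is not reported here

score = risk_score(patient)
group = risk_group(score, HYPOTHETICAL_CUTOFF)
print(round(score, 4), group)
```

The signs of the coefficients match the heatmap description: higher PLA2G3 or TUBB1 expression lowers the score, while higher expression of the other four genes raises it.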

Figure 1

Flow chart of data collection and analysis in the present study.

Table 1 Basic characteristics of lung adenocarcinoma patients.
Figure 2

Identification of the myeloid cell differentiation-related DEGs in the TCGA cohort. (A, B) Volcano plot and heatmap of the myeloid cell differentiation-related DEGs in TCGA-LUAD. The red dots represent up-regulated genes and the blue dots represent down-regulated genes (FDR < 0.05, |logFC| > 1). (C) Top 20 BP, CC, and MF terms in the GO analysis of the gene signature-related DEGs. (D) Top 20 pathways in the KEGG analysis of the gene signature-related DEGs.

Figure 3

The overall performance of the myeloid cell differentiation-related gene signature in the TCGA and GEO cohorts. (A, C) K-M survival analysis of the gene signature in the TCGA (A) and GEO (C) cohorts. (B, D) Distributions of risk scores and patients’ survival times in TCGA and GEO LUAD patients, and heatmaps of the six-gene expression profiles in both cohorts.

Validation of the six-gene signature in the GSE68465 cohort

To further test the performance of the constructed model, the risk score of each sample (n = 442) in GSE68465 was calculated using the same risk formula. Of the LUAD patients in the GSE68465 cohort, 49.32% (n = 218) were categorized into the low-risk subgroup, and the remaining patients (n = 224) were categorized into the high-risk subgroup. Consistent with the results in the TCGA-LUAD dataset, patients in the high-risk group had a lower survival rate than those in the low-risk group (p = 0.003; Fig. 3C and Fig. 3D).

Correlation between risk score and immune landscape

To characterize the immune environment of LUAD patients, we assessed tumor immune infiltration using the TIMER, XCELL, CIBERSORT abs, QUANTISEQ, MCPCOUNTER, EPIC, and CIBERSORT algorithms. Spearman’s rank correlation analysis showed that most immune and stromal cells were negatively correlated with the risk score (Figs. 4A and 5A). Furthermore, the ESTIMATE algorithm showed that the estimate score and immune score of the low-risk group were significantly higher than those of the high-risk group, and lower tumor purity was observed in the low-risk group (Figs. 4B and 5B). The relative proportions of 28 tumor-infiltrating immune cells (TIICs) calculated by the ssGSEA algorithm in both cohorts are presented in Fig. 4C and Fig. 5C. Compared with the high-risk subset, the low-risk subset had higher proportions of activated B cells, activated CD4 T cells, T follicular helper cells, eosinophils, mast cells, monocytes, and natural killer cells.

Figure 4

Evaluation of the TME in the low-risk and high-risk groups of the TCGA cohort. (A) Bubble chart showing that most immune and stromal cells were negatively correlated with the risk score. (B) Violin plots showing the differences in immune score, stromal score, estimate score and tumor purity between the high-risk and low-risk groups in TCGA-LUAD patients. (C) Infiltrating abundances of 28 TIICs estimated by ssGSEA in the two subgroups.

Figure 5

Evaluation of the TME in the low-risk and high-risk groups of the GEO cohort. (A) Bubble chart showing that most immune and stromal cells were negatively correlated with the risk score. (B) Violin plots showing the differences in immune score, stromal score, estimate score and tumor purity between the high-risk and low-risk groups in GSE68465 patients. (C) Infiltrating abundances of 28 TIICs estimated by ssGSEA in the two subgroups.

Therapeutic potential of the myeloid cell differentiation-related gene signature in LUAD

To explore the potential correlation between the myeloid cell differentiation-related gene signature and immunotherapy, we compared the expression of 12 immune checkpoints (CD28, CD274, CTLA4, TNFSF4, BTLA, TGFBR1, TNFSF18, CD40LG, CD70, CD200, TNFSF9 and CD48) between the two subgroups and found that five immunotherapy targets were more highly expressed in the high-risk subgroup (Fig. 6A).

Figure 6

Expression of immune checkpoints, m6A genes and sensitivity to chemotherapeutic drugs between the high-risk and low-risk subgroups in TCGA cohort. (A) The expression level of key immune checkpoint genes and (B) m6A genes between the two groups divided by risk score level. (C) The IC50 of six chemotherapeutic drugs compared between high-risk and low-risk groups.

An increasing number of studies have focused on m6A, the most abundant mRNA modification, which is involved in tumor occurrence and development26. Our results showed that the expression of m6A-related genes (METTL14, YTHDC1, YTHDF1, YTHDF2, FMR1 and IGF2BP1) was remarkably higher in the high-risk group than in the low-risk group (Fig. 6B).

As for chemotherapy sensitivity, we estimated the IC50 values of six drugs with the “pRRophetic” R package. JNK inhibitor VIII, cisplatin, paclitaxel, cyclopamine, rapamycin and erlotinib had lower predicted IC50 values in the high-risk group (Fig. 6C). Thus, patients classified as high risk by the myeloid cell differentiation-related gene signature may benefit more from these agents.

Construction of a nomogram

Time-dependent ROC curves were plotted with the “survivalROC” R package, and the AUC was computed at different time points to estimate the predictive performance of the gene signature. In the TCGA dataset, the AUCs for 1-, 3- and 5-year OS were 0.682, 0.675 and 0.656, respectively (Fig. 7A). Furthermore, the risk score was an independent prognostic factor in both univariate and multivariate Cox analyses (P < 0.001, HR: 2.718, 95% CI: 1.957–3.776 and P < 0.001, HR: 2.315, 95% CI: 1.513–3.541, respectively) (Table 2).

Figure 7

Evaluation of the predictive ability of the myeloid cell differentiation-related gene signature. (A, B) Time-dependent ROC curves of the risk model at 1, 3, and 5 years in the TCGA and GEO cohorts. (C) The nomogram for predicting the overall survival of TCGA-LUAD patients. (D–F) Calibration curves for evaluating the agreement between predicted and actual outcomes at 1, 3, and 5 years in the TCGA cohort. (G–I) Calibration curves for evaluating the agreement between predicted and actual outcomes at 1, 3, and 5 years in the GEO cohort.

Table 2 Univariate analysis and multivariate analysis of the correlation of risk score with outcomes among lung adenocarcinoma patients in two cohorts.

In the GEO dataset, the AUCs for OS were 0.658 at 1 year, 0.619 at 3 years, and 0.589 at 5 years (Fig. 7B). The six-gene signature was again an independent prognostic factor in both the univariate and multivariate Cox analyses (P < 0.001, HR: 3.553, 95% CI: 1.783–7.080; and P = 0.010, HR: 2.651, 95% CI: 1.264–5.560, respectively) (Table 2).
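The hazard ratios and 95% confidence intervals reported for the TCGA and GEO cohorts can be sanity-checked with simple arithmetic. The sketch below assumes the intervals are Wald intervals, which are symmetric around the estimate on the log scale. Under that assumption, the geometric mean of the bounds should reproduce the HR, and the standard error of the log hazard ratio can be recovered from the interval width. The function names are illustrative only.

```python
from math import log, sqrt

Z_95 = 1.959964  # standard normal quantile for a two-sided 95% interval


def log_scale_midpoint(lower, upper):
    """Geometric mean of the CI bounds; equals the HR for a Wald interval."""
    return sqrt(lower * upper)


def standard_error_from_ci(lower, upper, z=Z_95):
    """Standard error of log(HR) implied by a Wald confidence interval."""
    return (log(upper) - log(lower)) / (2 * z)


# (label, reported HR, CI lower bound, CI upper bound), taken from the text.
reported = [
    ("TCGA univariate", 2.718, 1.957, 3.776),
    ("TCGA multivariate", 2.315, 1.513, 3.541),
    ("GEO univariate", 3.553, 1.783, 7.080),
    ("GEO multivariate", 2.651, 1.264, 5.560),
]

for label, hr, lower, upper in reported:
    midpoint = log_scale_midpoint(lower, upper)
    se = standard_error_from_ci(lower, upper)
    print(f"{label}: reported HR {hr}, midpoint {midpoint:.3f}, SE(log HR) {se:.3f}")
```

In all four cases the geometric mean of the bounds agrees with the reported HR to within rounding, which is consistent with the Wald assumption. The wider GEO intervals correspond to larger implied standard errors.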

To further test the performance of the constructed model, we combined it with the clinical characteristics of the TCGA-LUAD cohort (age, gender, T stage, N stage, and smoking) to construct a quantitative nomogram for predicting the 1-year, 3-year, and 5-year OS of LUAD patients (Fig. 7C). Calibration curves were then plotted to compare the predicted probability of OS with the actual OS. The predicted 1-, 3-, and 5-year OS was consistent with the actual results in both the TCGA-LUAD cohort (Fig. 7D–F) and the GSE68465 cohort (Fig. 7G–I).

Expression analysis of the six genes in silico and in cell lines

We analyzed the normalized mRNA expression of the six genes in the TCGA cohort. The expression of F2RL1, INHA, PRKDC and TNFSF11 was significantly higher in LUAD tumor tissues than in adjacent normal samples, whereas the expression of PLA2G3 and TUBB1 was significantly lower (Fig. 8A). qPCR was used to validate the mRNA levels of the six genes in LUAD cell lines (PC-9 and A549) and human normal lung epithelial cells (BEAS-2B). The expression of F2RL1, INHA, PRKDC and TNFSF11 was significantly increased in the PC-9 and A549 cell lines, whereas PLA2G3 and TUBB1 were significantly down-regulated (Fig. 8B). These data were in accordance with the results of our bioinformatics analysis.

Figure 8

Differences in the expression levels of the six genes. (A) F2RL1, PRKDC, TNFSF11, INHA, PLA2G3 and TUBB1 mRNA expression levels differed significantly between unpaired tumor and adjacent normal samples in TCGA LUAD. (B) F2RL1, PRKDC, TNFSF11, INHA, PLA2G3 and TUBB1 mRNA expression levels differed significantly between LUAD cell lines and human normal lung epithelial cells.

Discussion

LUAD is the most common malignant tumor and a major cause of cancer-related deaths worldwide27. Owing to advances in sequencing technologies, molecular targeted therapy, and immunotherapy, significant progress has been made in treating LUAD, but the prognosis of LUAD remains poor28. Thus, identifying reliable prognostic biomarkers will help predict patient survival and guide precision medical intervention.

Myeloid cells constitute the primary component of the immunosuppressive tumor microenvironment29,30. In particular, the presence of tumor-infiltrating myeloid cells in LUAD tumors has been confirmed to be associated with patient prognosis, implying that myeloid cell differentiation-related genes could serve as prognostic indicators for LUAD patient survival31,32. An increasing number of studies indicate that targeting myeloid cells in the TME might be an emerging and promising immunotherapy strategy that reshapes the immunosuppressive TME and enhances the efficacy of tumor immunotherapy33. In this research, we explored the prognostic value of a risk model based on myeloid cell differentiation-related pathway genes in LUAD, rather than directly detecting the presence of myeloid cells in LUAD tissue. Given the crucial role that myeloid cells play in cellular immune surveillance, it is necessary to identify potential biomarkers based on myeloid cell differentiation-related pathway genes to stratify patients for personalized therapy.

In the present study, we used a backward stepwise regression model to develop a six-gene myeloid cell differentiation-related signature comprising F2RL1, PRKDC, TNFSF11, INHA, PLA2G3 and TUBB1 in the TCGA-LUAD cohort. This signature could efficiently and independently predict the prognosis and therapy response of LUAD patients. Furthermore, we validated the signature in an independent LUAD cohort from the GEO database.

Based on the expression of the six genes, the LUAD patients were divided into high- and low-risk subsets. The prognosis of the low-risk subset was significantly better than that of the high-risk subset. Univariate and multivariate Cox regression analyses confirmed that the myeloid cell differentiation-related gene signature was an independent risk factor for LUAD. Moreover, when we further assessed the differences in immune cell subtypes, we found that the low-risk group defined by the gene signature had higher levels of several immune cell types, namely activated B cells, activated CD4 T cells, T follicular helper cells, eosinophils, mast cells, monocytes, and natural killer cells, which may benefit prognosis34.

Subsequently, to further evaluate the performance of the constructed model, we built a nomogram that integrates clinical characteristics, providing clinicians with a personalized scoring system. The calibration curves confirmed that the nomogram-predicted 1-, 3-, and 5-year survival probabilities were in accordance with the observed survival results. Moreover, the analysis of immune checkpoint genes and m6A genes in the two risk subsets indicated that the risk score was significantly associated with the expression of immune checkpoint molecules and m6A genes.

We also found that JNK inhibitor VIII, cisplatin, paclitaxel, cyclopamine, rapamycin and erlotinib were predicted to be more effective in LUAD patients with high risk scores, suggesting that the gene signature may help predict immunotherapy response and drug sensitivity and could support personalized precision treatment in the future.

Among the six genes in the risk signature, F2R-like trypsin receptor 1 (F2RL1), also known as PAR2, is a G-protein-coupled receptor that can be activated by serine proteases and plays an oncogenic role in a wide variety of cellular processes35,36. In lung cancer, F2RL1 can promote tumor proliferation and angiogenesis by promoting EGFR phosphorylation and upregulating VEGFA expression37. Protein kinase, DNA-activated, catalytic polypeptide (PRKDC) is a component of the non-homologous end joining pathway involved in the DNA double-strand break (DSB) repair response38. Several studies have shown that PRKDC plays an important role in regulating the motility, migration, and development of lung cancer38,39. In addition, Tan et al. demonstrated that PRKDC mutations are potential biomarkers to guide the selection of LUAD patients for checkpoint blockade immunotherapy40. TNFSF11, also known as RANKL, promotes osteoclastogenesis and facilitates bone metastasis of lung cancer cells41. The RANKL-mediated pathway is also an important immune mediator in lung cancer, and anti-RANKL therapy has been linked to improved responses to immunotherapy42. INHA was found to be remarkably overexpressed in lung cancer tissue compared with normal tissue, and high INHA expression was associated with shorter survival and a higher pathological stage43. Group III sPLA2 (PLA2G3) regulates antigen- and IgE-dependent mast cell degranulation and can induce mast cell maturation44. PLA2G3 is also a possible target for modulating neuronal damage induced by free radicals45. However, to our knowledge, the regulatory function of PLA2G3 in lung cancer has not been reported. Finally, TUBB1, which encodes the tubulin β-1 chain, has been shown to be expressed mainly in platelets and megakaryocytes, and TUBB1 dysfunction can cause genome instability. Its biological role in lung cancer progression is as yet unknown.

Unlike previous risk models in LUAD, our model is, to our knowledge, the first myeloid cell differentiation-associated gene signature, and it is closely associated with immune cell infiltration in LUAD. It may help predict which LUAD patients are more likely to respond to chemotherapy agents and immune checkpoint inhibitors.

Our study has several limitations. First, the construction and validation of the myeloid cell differentiation-related gene signature were based on data from public databases, and its robustness should be further validated by multicenter clinical research. Second, more experiments are needed to verify the individual or combined roles of the six signature genes in LUAD. Additional investigations, both in vitro and in vivo, are required to explore these interactions fully.

Conclusions

In summary, the myeloid cell differentiation-related gene signature is an efficient and independent prognostic model for LUAD patients. The risk model was significantly associated with immune cell infiltration in LUAD, providing new insights into the role of myeloid cell differentiation-related genes in anti-tumor immunity, and it may help guide therapy selection for LUAD patients in the future.