1 Introduction

Prostate cancer (PCa) is one of the most common malignant tumors in men, and the global burden of this disease is rising. It is estimated that the number of new cases of PCa will reach approximately 1.7 million worldwide in 2030 [1]. Epidemiologic studies have shown that age, family history, and genetic susceptibility are significant risk factors for PCa [2]. The incidence of PCa is highest among men of African descent, followed by men of European and Asian ancestry, underscoring the important role of genetics in the development of PCa [3]. Lifestyle modifications, such as smoking cessation, exercise and weight control, may reduce the risk of PCa, but few modifiable behavioral risk factors have been identified [4].

Proteins are important molecules that perform biological functions. Plasma proteins secreted by tissues or organs can serve as predictive targets for many diseases or symptoms in humans [5]. Previous epidemiologic studies have identified a number of circulating proteins that are associated with cancer risk [6,7,8]. However, most of these studies have been based on observational designs or have focused on only a few proteins. Mendelian randomization (MR) analysis has recently emerged as a tool for re-evaluating drugs and discovering new therapeutic targets [9]. Large-scale proteomics studies have identified more than 18,000 protein quantitative trait loci (pQTLs) for more than 4,000 proteins [10], providing a valuable source of data for exploring associations between circulating proteins and disease using MR. MR uses these pQTLs as instrumental variables to explore potential causal relationships between exposures and outcomes and to screen for biomarkers that can be used as drug targets [11]. Compared with observational studies, MR helps to mitigate the effects of confounding factors, thereby enhancing the reliability and accuracy of causality assessment [12]. In addition, phenome-wide association studies (PheWAS) allow target-associated adverse effects to be investigated across a wide range of phenotypes [13].

Plasma proteins play crucial roles in a variety of cellular biological processes such as signaling, transport, growth, repair and immune defense. Dysregulation of plasma proteins is strongly associated with many diseases and is an important source of targets for drug development [14]. Therefore, we performed an MR analysis of the plasma proteome to explore causal associations between plasma proteins and PCa.

2 Methods

2.1 Study design

The flowchart of the study design is shown in Fig. 1. We first performed a two-sample MR analysis using 4907 plasma proteins from the deCODE genetics dataset and the PCa GWAS dataset from the PRACTICAL consortium, which yielded 143 plasma proteins with a putative causal association with PCa. The same analysis using the UK Biobank GWAS dataset for PCa yielded 126 plasma proteins. The intersection of the two lists contained 13 plasma proteins. These 13 proteins were then analyzed by colocalization and drug-target summary-data-based Mendelian randomization (SMR). Subsequently, we performed a downstream PheWAS of three plasma proteins to identify possible side effects of drugs targeting them. Finally, a systematic MR analysis of 26 lifestyle-related factors against the 13 plasma proteins was performed to determine which lifestyle factors might serve as upstream intervening factors for the target proteins.

Fig. 1 Flowchart of study design

2.2 Study population and data sources

Summary statistics of genetic associations for plasma proteins were obtained from a large-scale proteomics study (deCODE genetics) of 35,559 Icelandic participants whose plasma proteins were measured using the SOMAscan platform, which provides comprehensive pQTL mapping for 4907 plasma proteins [10]. Summary-level data for PCa and lifestyle-related factors were obtained from publicly available genome-wide association study (GWAS) datasets. Details of the related GWASs are described in Supplementary Table 1. All participants were of European ancestry and provided informed consent. Because our study was based on publicly available summary statistics, no further ethical review was required.

2.3 Statistical analysis

2.3.1 Proteome-wide Mendelian randomization analysis

We performed two-sample MR analyses with plasma proteins as the exposure and PCa as the outcome. The screening criteria for pQTLs as instrumental variables were as follows. First, SNPs had to lie within ± 1 Mb of the gene region (cis-pQTLs). Second, SNPs had to be associated with plasma protein levels at the genome-wide significance threshold of P < 5 × 10⁻⁸. Finally, to reduce the effect of linkage disequilibrium (LD) on the results, SNPs were clumped with an LD threshold of r² < 0.001 within a window of 10,000 kb. The strength of instrumental variables was measured using the F-statistic, with an F-statistic of less than 10 considered to indicate a weak instrumental variable [15]. Detailed information on the instrumental variables is provided in Supplementary Table 2.
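The significance and instrument-strength criteria above can be illustrated with a minimal Python sketch. It assumes the common per-SNP approximation F = (beta / se)², which the text does not specify; the `Snp` record, the `rs_demo_*` identifiers, and the numeric values are hypothetical. The cis-window filter and LD clumping are not shown because they require genomic positions and a reference panel.

```python
from dataclasses import dataclass

P_THRESHOLD = 5e-8   # genome-wide significance threshold stated in the text
F_THRESHOLD = 10.0   # instruments with F < 10 are treated as weak


@dataclass
class Snp:
    rsid: str    # hypothetical identifier, for illustration only
    beta: float  # SNP effect on the plasma protein level
    se: float    # standard error of beta
    pval: float  # P value of the SNP-protein association


def f_statistic(beta: float, se: float) -> float:
    """Approximate per-SNP F-statistic as (beta / se)^2 (assumed formula)."""
    return (beta / se) ** 2


def select_instruments(snps: list[Snp]) -> list[Snp]:
    """Keep SNPs that pass both the P-value and the F-statistic thresholds."""
    return [
        s for s in snps
        if s.pval < P_THRESHOLD and f_statistic(s.beta, s.se) >= F_THRESHOLD
    ]


# Made-up candidates: the first passes both filters, the second fails both.
example = [
    Snp("rs_demo_1", 0.30, 0.03, 1e-20),
    Snp("rs_demo_2", 0.05, 0.02, 1e-2),
]
kept = select_instruments(example)
print([s.rsid for s in kept])
```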

Proteome-wide MR analysis was performed using the R package “TwoSampleMR”. When only one SNP was available for a specific protein, we applied the Wald ratio method. If two or more SNPs were available, we applied the inverse variance weighting (IVW) method. A P value below 0.05 was considered statistically significant.
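The choice between the two estimators can be sketched in Python as follows. This is an illustrative re-implementation under stated assumptions, not the TwoSampleMR code: the Wald ratio uses a first-order standard error, and the IVW estimate is the fixed-effect form, whereas packages may apply a random-effects correction to the standard error. All function names and input values are hypothetical.

```python
import math


def two_sided_p(z: float) -> float:
    """Two-sided P value for a standard normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2.0))


def wald_ratio(beta_exp: float, beta_out: float, se_out: float):
    """Single-SNP causal estimate: outcome effect divided by exposure effect."""
    beta = beta_out / beta_exp
    se = se_out / abs(beta_exp)  # first-order approximation
    return beta, se, two_sided_p(beta / se)


def ivw_fixed(beta_exp, beta_out, se_out):
    """Fixed-effect inverse-variance-weighted estimate across several SNPs."""
    weights = [bx * bx / (s * s) for bx, s in zip(beta_exp, se_out)]
    num = sum(bx * by / (s * s) for bx, by, s in zip(beta_exp, beta_out, se_out))
    beta = num / sum(weights)
    se = math.sqrt(1.0 / sum(weights))
    return beta, se, two_sided_p(beta / se)


def mr_estimate(beta_exp, beta_out, se_out):
    """Wald ratio for one instrument, IVW for two or more (as in the text)."""
    if len(beta_exp) == 1:
        return wald_ratio(beta_exp[0], beta_out[0], se_out[0])
    return ivw_fixed(beta_exp, beta_out, se_out)


# Made-up summary statistics for one protein with two instruments.
beta, se, p = mr_estimate([0.1, 0.2], [0.02, 0.04], [0.01, 0.01])
print(f"log(OR) per unit protein = {beta:.3f}, SE = {se:.3f}, OR = {math.exp(beta):.3f}")
```

The causal estimate is on the log-odds scale, so exponentiating it gives the odds ratio of the kind reported in the forest plots.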

2.3.2 Colocalization analysis

We performed Bayesian colocalization analysis using the coloc package on SNPs within ± 1 Mb of the gene region (cis-pQTLs) for proteins with positive MR results, to assess whether the identified plasma proteins shared common causal genetic variants with PCa [16]. The colocalization analysis involves five hypotheses: H0, no SNP within the selected locus is associated with either protein A or disease B; H1, a SNP within the locus is associated with protein A but not with disease B; H2, a SNP within the locus is associated with disease B but not with protein A; H3, both protein A and disease B are associated with SNPs within the locus, but through two independent SNPs; H4, both protein A and disease B are associated with the same shared SNP within the locus. In view of the tissue specificity of blood and prostate, we performed the colocalization analysis separately in blood (eQTLgen and GTEx V8 Blood) and prostate tissue (GTEx V8 Prostate).
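The way posterior probabilities for H0 to H4 arise from per-SNP summary statistics can be sketched in Python. This is a simplified illustration of the approximate-Bayes-factor approach under a single-causal-variant assumption, not the coloc package itself. The prior probabilities (`p1`, `p2`, `p12`) and prior effect standard deviations (`sd1`, `sd2`) are illustrative assumptions, since the text does not report the settings used, and all input values are made up.

```python
import math


def log_sum(xs):
    """Numerically stable log(sum(exp(x)))."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def log_diff(a, b):
    """log(exp(a) - exp(b)); returns -inf when the difference is not positive."""
    if b >= a:
        return float("-inf")
    return a + math.log(-math.expm1(b - a))


def log_abf(beta, se, prior_sd):
    """Wakefield approximate Bayes factor for one SNP, on the log scale."""
    z = beta / se
    r = prior_sd ** 2 / (prior_sd ** 2 + se ** 2)
    return 0.5 * (math.log(1.0 - r) + r * z * z)


def coloc_posteriors(betas1, ses1, betas2, ses2,
                     p1=1e-4, p2=1e-4, p12=1e-5, sd1=0.15, sd2=0.2):
    """Return [PP.H0, PP.H1, PP.H2, PP.H3, PP.H4] for one locus.

    Trait 1 is the molecular trait, trait 2 the disease; SNP order must match.
    """
    l1 = [log_abf(b, s, sd1) for b, s in zip(betas1, ses1)]
    l2 = [log_abf(b, s, sd2) for b, s in zip(betas2, ses2)]
    s1, s2 = log_sum(l1), log_sum(l2)
    s12 = log_sum([a + b for a, b in zip(l1, l2)])
    log_h = [
        0.0,                                                    # H0: no association
        math.log(p1) + s1,                                      # H1: trait 1 only
        math.log(p2) + s2,                                      # H2: trait 2 only
        math.log(p1) + math.log(p2) + log_diff(s1 + s2, s12),   # H3: distinct SNPs
        math.log(p12) + s12,                                    # H4: shared SNP
    ]
    total = log_sum(log_h)
    return [math.exp(x - total) for x in log_h]


# Made-up locus with five SNPs; both traits peak at the second SNP.
ses = [0.05] * 5
trait1 = [0.025, 0.30, 0.015, -0.01, 0.05]
trait2_shared = [0.01, 0.275, -0.02, 0.005, 0.04]
# Same locus, but the disease signal peaks at the fourth SNP instead.
trait2_distinct = [0.01, 0.005, -0.02, 0.275, 0.04]

pp_shared = coloc_posteriors(trait1, ses, trait2_shared, ses)
pp_distinct = coloc_posteriors(trait1, ses, trait2_distinct, ses)
print("shared peak:   PP.H3 = %.3f, PP.H4 = %.3f" % (pp_shared[3], pp_shared[4]))
print("distinct peak: PP.H3 = %.3f, PP.H4 = %.3f" % (pp_distinct[3], pp_distinct[4]))
```

In this toy example the posterior mass moves from H4 to H3 when the two traits peak at different SNPs, which is the distinction between a shared variant and two independent variants drawn in the hypotheses above.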

2.3.3 SMR analysis

We applied the SMR approach to test for associations between PCa and the expression levels of genes encoding the plasma proteins, using PCa GWAS data from PRACTICAL together with eQTL summary data, and subsequently validated the results with the same methodology using data from UK Biobank. The eQTL summary data comprised blood and prostate data from eQTLgen and GTEx. The eQTLgen summary statistics include genetic data on blood gene expression from 31,684 individuals across 37 datasets [17]. The GTEx summary statistics include genetic data on gene expression from 670 blood samples and 221 prostate tissue samples [18]. A Psmr value below 0.05 was considered statistically significant.
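The core SMR test for a single gene can be sketched in Python. It assumes the standard formulation of the method, in which the top cis-eQTL serves as the instrument and the test statistic is approximately chi-squared with one degree of freedom. It is not the SMR software, it omits the HEIDI heterogeneity test, and the function name and input values are hypothetical.

```python
import math


def smr_test(b_zx: float, se_zx: float, b_zy: float, se_zy: float):
    """SMR estimate of the effect of gene expression (x) on disease (y).

    b_zx, se_zx: effect of the top cis-eQTL (z) on expression, with its SE.
    b_zy, se_zy: effect of the same SNP on the disease from the GWAS, with its SE.
    Returns (b_xy, se_xy, t_smr, p_smr).
    """
    z_zx = b_zx / se_zx
    z_zy = b_zy / se_zy
    b_xy = b_zy / b_zx
    # Delta-method standard error of the ratio estimate.
    se_xy = math.sqrt(se_zy ** 2 / b_zx ** 2 + b_zy ** 2 * se_zx ** 2 / b_zx ** 4)
    # Test statistic, approximately chi-squared with 1 degree of freedom.
    t_smr = (z_zx ** 2 * z_zy ** 2) / (z_zx ** 2 + z_zy ** 2)
    # Upper-tail probability of a chi-squared(1) variable.
    p_smr = math.erfc(math.sqrt(t_smr / 2.0))
    return b_xy, se_xy, t_smr, p_smr


# Made-up example: a strong eQTL with a moderate GWAS signal at the same SNP.
b_xy, se_xy, t_smr, p_smr = smr_test(0.5, 0.05, 0.04, 0.01)
print(f"b_xy = {b_xy:.3f}, SE = {se_xy:.4f}, T_SMR = {t_smr:.2f}, Psmr = {p_smr:.2e}")
```

A positive `b_xy` with `p_smr` below 0.05 corresponds to the situation described in the text as expression being positively associated with PCa.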

2.3.4 Phenome-wide association study

PheWAS, a method used to explore the relationship between SNPs or specific phenotypes and a broad range of phenotypes, is particularly valuable for investigating potential side effects associated with drug targets [19, 20]. PheWAS can identify diseases or traits that are positively or negatively correlated with the target protein; these can serve as indicators of potential side effects or complications to monitor clinically when developing agonists or inhibitors of the target. We chose as exposures the three plasma proteins (ZG16B, PEX14 and NAPG) that had a Psmr < 0.05 in the SMR analyses, and obtained data on 2803 phenotypes from the FinnGen database. Screening criteria for instrumental variables remained consistent with those described previously. A P value below 0.05 was considered statistically significant.

2.3.5 Modulation of PCa-related plasma proteins by lifestyle-related factors

A total of 26 lifestyle-related factors (Supplementary Table 1) were used to assess associations with PCa-related proteins. The analytical methods for the MR were consistent with the description provided in the proteome-wide MR analysis. All statistical analyses were performed in R Software 4.1.0.

3 Results

3.1 Proteome-wide MR identified 13 PCa-related plasma proteins

After strict application of the instrumental variable screening criteria of this study, 1605 (PRACTICAL) and 1603 (UK Biobank) plasma proteins were included in the MR analysis. MR results based on the IVW or Wald ratio method showed that 143 plasma proteins (PRACTICAL) and 126 plasma proteins (UK Biobank) were associated with PCa (Fig. 2). The intersection of the two lists of positive proteins contained 13 common plasma proteins. Nine proteins were positively associated with PCa (EFNA3, LIMA1, HDGF, PEX14, DLL4, NAPG, ZG16B, TPST1, and NFASC) and four proteins were negatively associated with PCa (SERPINA5, RBP7, CCL27, and LAYN). Detailed results of the MR analysis are shown in Fig. 3 and Supplementary Tables 3, 4. Detailed results of the heterogeneity and horizontal pleiotropy analyses are shown in Supplementary Tables 5–8.

Fig. 2 Volcano plot of MR results in PRACTICAL (A) and UK Biobank (B): causal relationship between plasma proteins and prostate cancer. OR: odds ratio

Fig. 3 A Venn plot of 13 common plasma proteins in PRACTICAL and UK Biobank. B, C Forest plots of the MR results: effects of 13 plasma proteins on prostate cancer in PRACTICAL and UK Biobank. OR: odds ratio

3.2 Results of colocalization analysis

We performed colocalization analysis of the genes encoding each of the 13 proteins obtained in the previous step, separately in blood and prostate tissue. The results showed that PEX14 (PPH3 + PPH4 = 0.99/0.98, eQTLgen/GTEx Blood) and RBP7 (PPH3 + PPH4 = 0.96/0.99, eQTLgen/GTEx Blood) in blood, and NFASC (PPH3 + PPH4 = 0.98, GTEx Prostate) in prostate tissue, had a higher likelihood of sharing genetic variants with PCa (PPH3 + PPH4 > 0.8). Detailed results of the colocalization analysis are shown in Fig. 4 and Supplementary Table 9.

Fig. 4 Heatmap of colocalization analysis of 13 plasma proteins and prostate cancer in eQTLgen, GTEx Blood and GTEx Prostate. H3: both protein and disease are associated with SNPs within the selected locus, but through two independent SNPs; H4: both protein and disease are associated with the same shared SNP within the selected locus

3.3 Results of drug target SMR analysis

We performed SMR analysis of the expression levels of genes encoding plasma proteins in blood and prostate tissues with GWAS data (PRACTICAL) for PCa and validated them against GWAS data from UK Biobank. The results showed that the expression levels of ZG16B and PEX14 in blood were positively correlated with PCa. The expression levels of ZG16B and NAPG in prostate tissues were positively correlated with PCa. Detailed results of SMR analysis are shown in Fig. 5A–F and Supplementary Table 10.

Fig. 5 SMR analysis of the expression levels of genes encoding plasma proteins in blood tissue (eQTLgen) with prostate cancer in two GWAS datasets. A–C ZG16B, PEX14 and NAPG (PRACTICAL). D–F ZG16B, PEX14 and NAPG (UK Biobank). G Dot plot of the PheWAS results for associations between ZG16B, PEX14 and NAPG and other disease outcomes

3.4 Phenome-wide association study of three plasma proteins associated with PCa

We applied a phenome-wide association study to assess the potential impact on other phenotypes of the three plasma proteins associated with prostate cancer in the SMR analysis. Among 2803 phenotypes screened from the FinnGen database, we observed a significant causal association (P < 0.05) between plasma ZG16B and 78 phenotypes: ZG16B levels were positively associated with the risk of diseases such as corneal degeneration, thyroid disorders, and chronic bronchitis, and negatively associated with the risk of disorders of mineral metabolism, cerebrovascular disorders, and intervertebral disc infections. There was a significant causal relationship (P < 0.05) between plasma PEX14 and 88 phenotypes: PEX14 levels were positively associated with the risk of diseases such as disorders of porphyrin and bilirubin metabolism, liver disease, and esophagitis, and negatively associated with the risk of diseases such as benign tumors of the rectum or anal canal, autism, and urticaria nodosa. There was a significant causal relationship (P < 0.05) between plasma NAPG and 121 phenotypes: NAPG levels were positively associated with the risk of diseases such as personality or behavioral disorders, vascular occlusive disease, and pulmonary cysts, and negatively associated with the risk of diseases such as parathyroid disorders, liver disease, and ulcerative colitis. Detailed data are shown in Fig. 5G and Supplementary Table 11.

3.5 Effects of 26 lifestyle-related factors on PCa-related proteins

Among the 26 common lifestyle-related factors we selected, the frequency of alcohol intake was positively associated with TPST1 and SERPINA5; BMI was positively associated with RBP7, TPST1, NFASC, LAYN, HDGF, SERPINA5, DLL4, LIMA1, and CCL27; body weight was positively associated with RBP7, LAYN, TPST1, HDGF, SERPINA5, DLL4, NFASC, and LIMA1; hip circumference was positively associated with RBP7, LAYN, and HDGF; cannabis use was negatively associated with SERPINA5; fat intake was negatively associated with EFNA3; hypertension was negatively associated with EFNA3; aspirin use was positively associated with LAYN and NFASC; and type 2 diabetes was negatively associated with EFNA3 (Pfdr < 0.05). Detailed results are shown in Fig. 6 and Supplementary Table 12.

Fig. 6 Heatmap showing results from Mendelian randomization between lifestyle factors and 13 proteins associated with prostate cancer

4 Discussion

PCa is the second most common cause of cancer-related deaths in men worldwide [21]. The limited specificity of serum prostate-specific antigen, the most widely used test, has been shown to lead to over-diagnosis and over-treatment of PCa [22]. The prostate gland is located deep in the pelvis, which limits its direct contact with other organs. The circulatory system is the primary system that comes into contact with and potentially affects the prostate, and blood samples are relatively easy to obtain and less invasive for the patient. Therefore, investigating the association between plasma proteins and PCa and discovering protein targets in the blood has practical clinical implications for the prediction and treatment of PCa.

In this study, the plasma proteins positively associated with PCa were EFNA3, LIMA1, HDGF, PEX14, DLL4, NAPG, ZG16B, TPST1 and NFASC. EFNA3 (Ephrin-A3) belongs to the family of membrane-bound ligands of Eph receptors. Many members of the Eph/ephrin family are aberrantly expressed in cancer cells, often predicting more aggressive tumors, a higher likelihood of metastasis, and poorer prognosis [23]. Overexpression of EFNA3 accelerates self-renewal of hepatocellular carcinoma cells and enhances their proliferation and migration [24]. Gastric cancer patients with high expression of EFNA3 have worse overall and disease-free survival, the mechanisms of which may involve immune cell infiltration and immune checkpoint regulation [25]. LIMA1 (LIM Domain And Actin Binding 1) plays an important role in regulating actin cytoskeleton dynamics, and its defective expression may lead to dysregulation of cytoskeleton dynamics, altered cell motility, and disruption of intercellular adhesion, thus promoting tumor proliferation, invasion, and migration [26]. LIMA1 up-regulation leads to prostate cancer cell invasion and reduced extracellular matrix adhesion [27, 28]. These studies emphasize the importance of endogenous LIMA1 in regulating PCa cell growth and invasiveness. However, tumor cells still require sufficient exogenous LIMA1 protein to shape their own cytoskeleton. In addition, whether plasma LIMA1 protein inhibits LIMA1 expression levels in tumor cells remains to be further investigated. HDGF (Hepatoma Derived Growth Factor) up-regulation is present in many types of tumors and positively correlates with clinicopathological features [29]. Mechanisms by which HDGF regulates tumor progression include PI3K/AKT and ERK signaling pathway activation [29], epithelial-mesenchymal transition promotion [30] and vascular endothelial growth factor induction [31].
HDGF plays an important role in prostate cancer cell growth, apoptosis, and invasion, which may be mediated by the AKT and NF-κB pathways [32]. PEX14 (Peroxisomal Biogenesis Factor 14) is a peroxisomal biogenesis factor involved in cellular redox homeostasis and lipid metabolism [33]. Its aberrant regulation causes tumor angiogenesis and extracellular matrix degradation [34]. PEX14 is up-regulated in aggressive PCa with loss-of-function mutations in TP53, suggesting the importance of peroxisomes in advanced PCa [35]. The results of the colocalization and SMR analyses in our study support a positive association between PEX14 expression levels and PCa. According to the results of the PheWAS analysis, the use of PEX14 inhibitors may be beneficial in ameliorating disorders of porphyrin and bilirubin metabolism, tear overflow, and liver-related disorders, but may increase the risk of diseases such as esophagitis and benign tumors of the rectum and anal canal. DLL4 (Delta Like Canonical Notch Ligand 4) plays an important role in tumor angiogenesis [36]. Knockdown of DLL4 impedes Notch-1 signaling pathway activation and inhibits the self-renewal and invasive ability of gastric cancer stem cells as well as their resistance to 5-FU chemotherapy [37]. NAPG (NSF Attachment Protein Gamma) encodes a gamma-soluble NSF attachment protein that mediates platelet exocytosis and controls membrane fusion events. Mutations in NAPG may lead to insufficient platelet spreading, triggering autosomal dominant angiodysplasia syndrome [38]. Our study found that high expression of NAPG in prostate tissue increases the risk of PCa, but no other literature is currently available to support this finding. Interestingly, a paper describing genetic variation and adaptive phenotypes in high-altitude populations offers a possible lead: NAPG may be involved in oxygen sensing, metabolism, and vascular homeostasis, and is associated with physiological adaptations to low-pressure hypoxic environments [39].
In the PheWAS analysis, plasma NAPG was positively associated with vascular occlusion, HER2-negative breast cancer, and nonorganic psychiatric disorders, and negatively associated with hyperparathyroidism, colitis, and retinal chorioretinitis. The biological function of NAPG in PCa remains to be further explored. ZG16B (Zymogen granule protein 16B), a growth factor overexpressed in pancreatic cancer, inhibits NF-κB signaling, thereby promoting tumor proliferation and enabling immune escape [40]. ZG16B also enhances angiogenesis through activation of CXCR4 and FAK, increases vascular permeability, and facilitates pancreatic cancer progression and metastasis [41]. In colorectal cancer, up-regulation of ZG16B, through modulation of the Wnt/β-catenin pathway, enhances immunosuppressive activity in the tumor microenvironment and enhances tumor cell migration and invasion, leading to a poor prognosis [42]. In addition, ZG16B is similarly highly expressed in cervical, oral squamous cell, and ovarian cancers [43,44,45], which is consistent with our findings in PCa. Of interest, ZG16B up-regulation seems to be associated with a better prognosis in PCa patients [46]. This phenomenon suggests that the biological mechanisms of ZG16B may change with the progression of PCa. The results of the PheWAS analysis suggest that the use of ZG16B inhibitors may ameliorate diseases such as corneal degeneration, chronic bronchitis, and thyrotoxicosis, but may increase the risk of neurologic abnormalities, mineral metabolism disorders and cerebrovascular disease. Therefore, the development of drugs targeting ZG16B still requires more comprehensive and careful evaluation. TPST1 (Tyrosylprotein Sulfotransferase 1) is overexpressed in breast cancer [47], oral squamous cell carcinoma [48] and nasopharyngeal carcinoma [49].
In bladder cancer, elevated expression of TPST1 is associated with high pathologic stage and poor survival, and correlates with immune characteristics and responsiveness to immunotherapy [50]. NFASC (Neurofascin), a member of the immunoglobulin superfamily of adhesion molecules, may play a role in the regulation of the cytoskeleton and migration of cancer cells [51]. NFASC and its potential regulatory variants suggest that long-range chromatin interactions may be an etiological factor in PCa [52].

The plasma proteins that were negatively associated with prostate cancer risk in this study were SERPINA5, RBP7, CCL27, and LAYN. SERPINA5 (Serpin Family A Member 5) is a member of the serine protease inhibitor superfamily. Previous reports have shown that SERPINA5 is expressed at low levels in a range of cancers, including renal, breast, thyroid, colorectal, prostate, and ovarian cancers [53,54,55,56,57,58]. SERPINA5 exerts its anticancer properties by inhibiting angiogenesis and tumor metastasis [59]. RBP7 (Retinol Binding Protein 7), a member of the cellular retinol-binding protein family, is involved in immune regulation and can be used to predict the prognosis of patients with urothelial carcinoma of the bladder [60]. High expression of RBP7 correlates with tumor invasion and epithelial-mesenchymal transition, which predicts a poor prognosis for patients with colorectal carcinoma [61]. CCL27 (C-C Motif Chemokine Ligand 27), which is mainly produced by keratinocytes, is critical in directing immune cells to epithelial and mucosal tissues. Down-regulation of CCL27 in melanoma, basal cell carcinoma, colon cancer, and breast cancer allows tumor cells to achieve immune escape, which in turn leads to tumor progression [62]. LAYN (Layilin), which is located on chromosome 11 and shares homology with C-type lectins, also acts as a cell-surface receptor for hyaluronic acid and plays an important role in cell adhesion, motility, and migration [63]. In most tumors, LAYN up-regulation is associated with poor prognosis, but LAYN expression levels are low in PCa. This phenomenon reflects the differential expression of LAYN among tumors as well as its different biological functions [64]. A similar situation can be seen in melanoma, where LAYN is highly expressed on CD8+ T cells, which promotes integrin-mediated cell adhesion and thus enhances anti-tumor immunity [65].

The strengths of the current study are as follows. First, this is a systematic and extensive study exploring the causal relationship between plasma proteins and PCa, which helps to provide a comprehensive perspective on the etiologic role of circulating proteins in PCa. Second, this study employed a variety of MR-based analytical methods and validated them in multiple datasets to identify associations between plasma proteins and PCa in blood and prostate tissue separately, improving the accuracy and robustness of the analysis. Third, we performed an extensive evaluation of potential downstream associated phenotypes and upstream intervening factors for PCa-related plasma proteins. However, this study also has limitations. First, all GWAS participants were of European ancestry, which reduces the interference of population differences but also limits the generalizability of the results. Second, the expression levels or predictive values of the target genes we identified may vary across different types of PCa or different stages of PCa progression. Therefore, future studies would best be conducted in specific types or stages of prostate cancer, and experimental studies are needed to confirm these findings.

5 Conclusion

In conclusion, by applying an MR approach to a broad range of plasma proteins, we identified causal associations between 13 plasma proteins and PCa and prioritized protein targets that could potentially be modified by drugs or lifestyle changes, providing new insights into the etiology, prevention and treatment of PCa.