Introduction

Chronic kidney disease (CKD) is highly prevalent worldwide and contributes substantially to both morbidity and mortality. It is linked to various complications, including cardiovascular disease and kidney failure, and consequently imposes a significant burden on individuals and their families1,2. A key indicator of CKD is a reduction in glomerular filtration rate (GFR)3, and GFR estimated from serum creatinine (eGFRcrea) is of particular concern4. In most cases, the underlying causes of CKD remain unclear, and its complex biological etiology hampers the development of novel drugs5. Circulating proteins, which act as key regulators of molecular pathways, are widely recognized as major targets for pharmacological intervention. Previous studies have reported associations between CKD and specific factors such as sodium–glucose cotransporter-2 inhibitors (SGLT-2i), glucagon-like peptide-1 (GLP-1) and the urinary CKD273 classifier6,7. However, these associations are susceptible to confounding or reverse causality. Although randomized controlled trials are widely regarded as the gold standard for establishing causality, conducting such trials to test the causal relationships between thousands of proteins and CKD is cost-prohibitive and impractical.

By exploiting the random allocation of alleles at conception, Mendelian randomization (MR) studies offer a reliable way to reduce confounding bias. MR studies also help avoid the reverse causality that commonly affects other observational designs8,9. A previous study analyzed more than 1.2 million individuals to identify genetic variants and genes linked to kidney function10. Genetic studies of circulating protein levels now make it possible to examine the role of the proteome in CKD comprehensively. In this study, we combined proteome-wide MR with colocalization analysis to identify candidate therapeutic targets for CKD and its defining indicators, eGFRcrea and eGFR.

Methods

Study design

Fig. 1 summarizes the study design. The study followed the Strengthening the Reporting of Observational Studies in Epidemiology using Mendelian Randomization (STROBE-MR) guidelines (Supplementary Table 1)11. All GWAS and protein quantitative trait locus (pQTL) data were obtained from publicly available databases for which the necessary ethical approvals had been granted. Specifically, we used summary statistics from a large-scale genome-wide association study of the blood proteome (https://www.decode.com/summarydata/)12, a plasma proteome analysis in individuals of European descent (nilanjanchatterjeelab.org/pwas/)13 and the GWAS Catalog (https://www.ebi.ac.uk/gwas/)14,15,16 (Supplementary Table 2). We adopted a two-phase strategy. In the discovery phase, we used data on 4,657 circulating proteins from the ARIC study to investigate their associations with CKD. In the replication phase, we validated our findings using proteomic data on 4,907 proteins from the Icelandic study. This two-phase design is intended to strengthen the robustness and reproducibility of our conclusions.

Fig. 1

Study design. FDR, false discovery rate; MR, Mendelian randomization.

Data source for proteomics

Genetic associations for circulating proteins in the discovery phase were obtained from the ARIC study13. We used proteins with common-variant associations in the cis-region, because cis-pQTLs show higher reproducibility across proteomic platforms than trans-pQTLs. A second set of summary statistics for circulating proteins was obtained from the study by Ferkingstad et al.12. Genome-wide association studies (GWASs) of circulating protein levels identify the sequence determinants of protein levels, known as protein quantitative trait loci (pQTLs)17,18,19,20. When pQTLs colocalize with disease-associated variants, they can serve as informative markers for pinpointing causal genes and disease pathways21,22.

Data source for outcome data

Summary statistics for CKD, eGFRcrea and eGFR were obtained from the GWAS Catalog. The CKD dataset included 625,219 individuals of European ancestry, with CKD defined as an eGFR below 60 mL/min per 1.73 m². The eGFRcrea analysis by Wuttke et al., which estimated GFR using the Renal Disease Study equation, included 1,004,040 individuals of European descent. The eGFR dataset comprised 567,460 individuals, also of European ancestry. CKD served as the primary outcome of our analysis, and eGFRcrea and eGFR as secondary outcomes.

MR analysis

Our MR analysis exploits the random allocation of alleles at conception, using genetic variants as instrumental variables (IVs) to infer causal relationships. A valid IV rests on three key assumptions: (1) it must be strongly associated with the exposure; (2) it must be independent of confounders of the exposure–outcome relationship, so that confounding does not distort the causal inference; and (3) it must affect the outcome only through the exposure, not through any other pathway23.

To examine the relationships between circulating proteins and the risk of CKD and its indicators, we performed two-sample MR analyses using index SNPs for each protein. We restricted instruments to a genomic region extending 100 kilobases (kb) around the start and end sites of each target gene. Within this region, we selected SNPs that were significantly associated with the protein (P < 1 × 10−5) and independent of each other (linkage disequilibrium [LD] r2 < 0.01). Associations between proteins and outcomes were expressed as odds ratios (ORs) with 95% confidence intervals (CIs), estimated with the Wald ratio when a protein had a single instrument and with the inverse-variance weighted (IVW) method when it had several. We applied false discovery rate (FDR) correction to limit false-positive findings and calculated the F-statistic to assess the strength of our instrumental variables.
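The effect estimators and multiple-testing correction described above can be sketched as follows. This is a minimal illustration, not the TwoSampleMR or MendelianRandomization code used in the study. It assumes harmonized SNP–protein (beta_exp) and SNP–outcome (beta_out, se_out) estimates on the log-odds scale, uses the first-order Wald-ratio standard error (ignoring uncertainty in the SNP–protein estimate), a fixed-effect IVW estimator, and the Benjamini–Hochberg procedure as the FDR method; the text does not name the FDR procedure, so that choice is an assumption. All function names and input values are hypothetical.

```python
import math

Z_95 = 1.959964  # two-sided 95% standard normal quantile

def wald_ratio(beta_exp, beta_out, se_out):
    """Single-instrument estimate: SNP-outcome effect / SNP-protein effect.
    The SE is the first-order approximation that ignores error in beta_exp."""
    return beta_out / beta_exp, se_out / abs(beta_exp)

def ivw_fixed(betas_exp, betas_out, ses_out):
    """Fixed-effect inverse-variance weighted estimate over several instruments."""
    num = sum(bx * by / se ** 2 for bx, by, se in zip(betas_exp, betas_out, ses_out))
    den = sum(bx ** 2 / se ** 2 for bx, se in zip(betas_exp, ses_out))
    return num / den, math.sqrt(1.0 / den)

def odds_ratio_ci(beta, se):
    """Convert a log-odds estimate per SD of protein into (OR, lower, upper)."""
    return math.exp(beta), math.exp(beta - Z_95 * se), math.exp(beta + Z_95 * se)

def bh_adjust(pvals):
    """Benjamini-Hochberg adjusted P values, returned in input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, keeping adjusted values monotone and <= 1.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical protein with a single cis instrument
beta, se = wald_ratio(beta_exp=0.5, beta_out=-0.1, se_out=0.02)
print(odds_ratio_ci(beta, se))
# Hypothetical nominal P values for four proteins
print(bh_adjust([0.01, 0.04, 0.03, 0.005]))
```

In practice, packages such as TwoSampleMR may use a multiplicative random-effects version of IVW, which inflates the standard error when instruments are heterogeneous; the fixed-effect form is shown here only because it is the simplest correct instance of the estimator.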

To assess instrument strength for each plasma protein, we calculated the F-statistic as \( F = \frac{R^{2}_{\text{combined}}\,(N - k - 1)}{k\,(1 - R^{2}_{\text{combined}})} \), where \( R^{2}_{\text{combined}} \) is the sum of the proportions of plasma protein variance explained by each instrument, N is the sample size of the GWAS for SNP–plasma protein associations, and k is the number of instrumental variables. The proportion of variance explained by each instrument was calculated as \( R^{2} = 2\beta^{2}\,\text{MAF}\,(1 - \text{MAF}) \), where MAF is the minor allele frequency and β is the estimated genetic effect on the plasma protein, and \( R^{2}_{\text{combined}} = \sum_{i=1}^{k} R_{i}^{2} \). Higher F-statistics indicate stronger instruments; an F-statistic greater than 10 is conventionally taken to indicate a strong instrument, reducing the risk of weak instrument bias in our MR estimates.
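The F-statistic formula can be checked numerically with a short sketch. It assumes that each β is the SNP effect on the protein in standard-deviation units, so that 2β²MAF(1 − MAF) is a proportion of variance; the β, MAF and N values below are hypothetical and are not taken from ARIC or the Icelandic study.

```python
def variance_explained(beta, maf):
    """R^2 = 2 * beta^2 * MAF * (1 - MAF) for one SNP (beta in SD units)."""
    return 2 * beta ** 2 * maf * (1 - maf)

def f_statistic(betas, mafs, n):
    """F = R2_combined * (N - k - 1) / (k * (1 - R2_combined))."""
    k = len(betas)
    r2_combined = sum(variance_explained(b, m) for b, m in zip(betas, mafs))
    return r2_combined * (n - k - 1) / (k * (1 - r2_combined))

# Hypothetical single cis-pQTL: beta = 0.3 SD, MAF = 0.2, GWAS sample size N = 9,000
f = f_statistic([0.3], [0.2], 9000)
print(round(f, 1), "strong" if f > 10 else "weak")
```

With these illustrative inputs R² = 0.0288 and F ≈ 267, well above the conventional threshold of 10.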

Colocalization analysis

We used the coloc R package to perform colocalization analysis, assessing whether the observed associations between proteins and CKD or its indicators reflected a shared causal variant rather than distinct causal variants in linkage disequilibrium24. This Bayesian method evaluates the evidence for five hypotheses at each locus: (H0) no association with either trait; (H1) association with trait 1 only; (H2) association with trait 2 only; (H3) association with both traits, with distinct causal variants; and (H4) association with both traits through a single shared causal variant25. The analysis yields a posterior probability for each hypothesis (PH0–PH4). We used the following prior probabilities26: 1 × 10−4 that a SNP is associated with trait 1 only (p1), 1 × 10−4 that a SNP is associated with trait 2 only (p2), and 1 × 10−5 that a SNP is associated with both traits (p12). For each target gene, we included SNPs within 250 kb upstream and downstream of the gene in both the Icelandic and ARIC datasets.
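The calculation behind coloc's approximate Bayes factor approach can be sketched as below. This is a simplified re-implementation for illustration, not the coloc package itself. It assumes each trait is summarized by per-SNP effect estimates and standard errors over the same set of at least two SNPs, computes Wakefield approximate Bayes factors with a prior effect SD (0.15 by default here; to our understanding coloc uses 0.15 for quantitative traits and 0.2 for case-control traits), and combines them with the priors p1, p2 and p12 stated above. The example z-scores are invented.

```python
import math

def log_abf(betas, ses, prior_sd=0.15):
    """Wakefield log approximate Bayes factor for each SNP."""
    w = prior_sd ** 2
    labf = []
    for b, s in zip(betas, ses):
        r = w / (w + s ** 2)
        z = b / s
        labf.append(0.5 * (math.log(1 - r) + r * z * z))
    return labf

def logsumexp(xs):
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))

def logdiff(x, y):
    """log(exp(x) - exp(y)) for x > y, computed stably (needs >= 2 SNPs)."""
    return x + math.log(-math.expm1(y - x))

def coloc_posteriors(labf1, labf2, p1=1e-4, p2=1e-4, p12=1e-5):
    """Posterior probabilities [PH0, PH1, PH2, PH3, PH4] for one region."""
    joint = [a + b for a, b in zip(labf1, labf2)]
    s1, s2, s12 = logsumexp(labf1), logsumexp(labf2), logsumexp(joint)
    log_h = [
        0.0,                                                   # H0: no association
        math.log(p1) + s1,                                     # H1: trait 1 only
        math.log(p2) + s2,                                     # H2: trait 2 only
        math.log(p1) + math.log(p2) + logdiff(s1 + s2, s12),   # H3: distinct variants
        math.log(p12) + s12,                                   # H4: shared variant
    ]
    total = logsumexp(log_h)
    return [math.exp(h - total) for h in log_h]

# Invented 5-SNP region in which both traits peak at the third SNP
z_protein = [0.5, 1.0, 8.0, 1.0, 0.3]
z_ckd = [0.2, 1.2, 7.0, 0.8, 0.5]
se_protein, se_ckd = [0.02] * 5, [0.01] * 5
labf_protein = log_abf([z * s for z, s in zip(z_protein, se_protein)], se_protein)
labf_ckd = log_abf([z * s for z, s in zip(z_ckd, se_ckd)], se_ckd, prior_sd=0.2)
pp = coloc_posteriors(labf_protein, labf_ckd)
print("PH4 = %.3f" % pp[4])
```

Because both invented signals peak at the same SNP, nearly all posterior mass falls on H4; moving the CKD peak to a different SNP would shift it to H3 instead.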

Protein–CKD associations with an FDR-corrected P value < 0.05 in the MR analysis were then classified into three tiers. Tier 1 targets were proteins with significant associations in both the discovery and replication cohorts and strong colocalization support (PH4 > 0.8). Tier 2 targets were proteins with significant associations in both cohorts and moderate colocalization support (0.5 < PH4 < 0.8). All remaining proteins were classified as Tier 3 targets.
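The tiering rule can be expressed as a small decision function. This is a hypothetical helper, not code from the study. It assumes the protein has already passed FDR correction in the discovery MR, that "significant" in the replication cohort means P < 0.05 (the text does not state whether this threshold is FDR-corrected), and that a single PH4 value summarizes colocalization; because the text leaves PH4 = 0.8 unassigned, the sketch places that boundary value in Tier 2.

```python
def assign_tier(p_discovery_fdr, p_replication, pp_h4, alpha=0.05):
    """Return 1, 2 or 3 following the tier definitions (hypothetical helper)."""
    significant_in_both = p_discovery_fdr < alpha and p_replication < alpha
    if significant_in_both and pp_h4 > 0.8:
        return 1  # replicated, strong colocalization
    if significant_in_both and pp_h4 > 0.5:
        return 2  # replicated, moderate colocalization (0.5 < PH4 <= 0.8)
    return 3      # not replicated, or weak colocalization

# Hypothetical inputs for three proteins
print(assign_tier(0.001, 0.01, 0.93), assign_tier(0.001, 0.01, 0.65), assign_tier(0.001, 0.2, 0.93))
```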

All statistical analyses were performed with the R packages TwoSampleMR (version 0.5.6)27, MendelianRandomization (version 0.5.0)28, coloc24 and locuscomparer (version 1.0.0)29.

Results

Proteome-wide MR analysis

We used MR analysis to examine the associations between 1788 proteins with available index pQTL signals and the risk of CKD. The MR results are summarized in Fig. 2. We identified 155 protein–CKD pairs with nominal significance (P < 0.05). However, only 10 of these associations remained significant after FDR correction for multiple testing (Supplementary Table 3). Among the associations that passed FDR correction, seven circulating proteins, namely Transcription elongation factor A protein 2 (TCEA2), Isopentenyl-diphosphate delta-isomerase 2 (IDI2), Microfibril-associated glycoprotein 4 (MFAP4), Lactosylceramide 4-alpha-galactosyltransferase (A4GALT), Leukocyte immunoglobulin-like receptor subfamily A member 5 (LILRA5), N-acetylglucosamine-1-phosphotransferase subunit gamma (GNPTG) and Neuregulin-4 (NRG4), were inversely associated with CKD risk. In contrast, higher levels of Glucokinase regulatory protein (GCKR) and Rho GTPase-activating protein 1 (ARHGAP1) were associated with an increased risk of CKD (Fig. 2 and Supplementary Table 3).

Fig. 2

Manhattan plot of associations between genetically predicted levels of 1778 circulating proteins and CKD. The X-axis shows the autosomes, and each point above a chromosome represents a circulating protein. Blue and orange points denote plasma proteins on odd- and even-numbered chromosomes, respectively, and green points denote plasma proteins confirmed as Tier 1 targets. The horizontal blue line marks the FDR-corrected significance threshold; proteins above it are associated with the outcome. The horizontal red line marks the genome-wide significance threshold (P < 5 × 10−8); proteins above it show genome-wide significant associations with the outcome. The Y-axis shows −log10(P) from the association analysis, so higher points indicate stronger associations.

Per one standard deviation (SD) increase in genetically predicted protein levels, the OR for CKD was 0.614 (95% CI 0.507–0.743) for TCEA2, 0.642 (95% CI 0.527–0.783) for NRG4 and 1.188 (95% CI 1.087–1.281) for GCKR. These associations were replicated in 35,559 individuals from the Icelandic cohort, with consistent directions of effect (Table 1 and Supplementary Table 4). In the replication analysis, per SD increase in genetically predicted protein levels, the OR for CKD was 0.742 (95% CI 0.662–0.831) for TCEA2, 0.341 (95% CI 0.211–0.552) for NRG4 and 1.460 (95% CI 1.210–1.762) for GCKR.

Table 1 Mendelian randomization analysis and colocalization of circulating proteins (Icelandic cohort) with CKD and its indicators.

After excluding associations that did not pass FDR correction, 125 proteins showed significant associations with eGFRcrea in the two-sample MR analysis (FDR-corrected P < 0.05; Fig. 3 and Supplementary Table 5). Consistent with the CKD findings, higher levels of TCEA2 (OR 1.020, 95% CI 1.013–1.027) and NRG4 (OR 1.055, 95% CI 1.049–1.061) were associated with higher eGFRcrea, whereas GCKR (OR 0.975, 95% CI 0.972–0.977) was inversely associated with eGFRcrea. The associations for TCEA2, NRG4 and GCKR were replicated in 35,559 individuals from the Icelandic cohort, with consistent directions of effect (Table 1 and Supplementary Table 4). In the replication analysis, per SD increase in genetically predicted protein levels, the OR for eGFRcrea was 1.012 (95% CI 1.008–1.016) for TCEA2, 1.139 (95% CI 1.124–1.155) for NRG4 and 0.943 (95% CI 0.936–0.949) for GCKR.

Fig. 3

Manhattan plot of associations between genetically predicted levels of 1778 circulating proteins and eGFRcrea. The X-axis shows the autosomes, and each point above a chromosome represents a circulating protein. Blue and orange points denote plasma proteins on odd- and even-numbered chromosomes, respectively, and green points denote plasma proteins confirmed as Tier 1 targets. The horizontal blue line marks the FDR-corrected significance threshold; proteins above it are associated with the outcome. The horizontal red line marks the genome-wide significance threshold (P < 5 × 10−8); proteins above it show genome-wide significant associations with the outcome. The Y-axis shows −log10(P) from the association analysis, so higher points indicate stronger associations.

For eGFR, after FDR correction, higher genetically predicted levels of circulating GCKR were associated with lower eGFR, whereas higher levels of TCEA2 and NRG4 were associated with higher eGFR. Per one-SD increase in protein levels, the OR for eGFR was 1.024 (95% CI 1.015–1.032) for TCEA2, 1.049 (95% CI 1.041–1.057) for NRG4 and 0.977 (95% CI 0.974–0.981) for GCKR (Fig. 4 and Supplementary Table 6). These findings for TCEA2, NRG4 and GCKR were replicated in the 35,559 participants of the Icelandic cohort (Table 1 and Supplementary Table 4): higher levels of TCEA2 (OR = 1.014, 95% CI 1.009–1.019) and NRG4 (OR = 1.123, 95% CI 1.102–1.144) were associated with higher eGFR, whereas higher GCKR levels (OR = 0.949, 95% CI 0.941–0.956) were associated with lower eGFR.

Fig. 4

Manhattan plot of associations between genetically predicted levels of 1775 circulating proteins and eGFR. The X-axis shows the autosomes, and each point above a chromosome represents a circulating protein. Blue and orange points denote plasma proteins on odd- and even-numbered chromosomes, respectively, and green points denote plasma proteins confirmed as Tier 1 targets. The horizontal blue line marks the FDR-corrected significance threshold; proteins above it are associated with the outcome. The horizontal red line marks the genome-wide significance threshold (P < 5 × 10−8); proteins above it show genome-wide significant associations with the outcome. The Y-axis shows −log10(P) from the association analysis, so higher points indicate stronger associations.

Colocalization analysis

We performed colocalization analyses to examine whether the associations between circulating proteins and CKD, as well as its indicators, were driven by shared causal variants (Table 2, Supplementary Figs. 1–9). We found strong evidence of colocalization between three proteins (TCEA2, NRG4 and GCKR) and CKD and classified them as Tier 1 targets. These three proteins also showed strong colocalization support for eGFRcrea and eGFR and were likewise classified as Tier 1 targets. Seven plasma proteins did not pass replication and were classified as Tier 3 targets, and no protein met the Tier 2 criteria (Tables 1 and 2).

Table 2 Mendelian randomization analysis and colocalization of circulating proteins (ARIC study) with CKD and its indicators.

To assess the robustness of these results, we repeated the colocalization analyses in the Icelandic dataset of 35,559 individuals. We again observed strong evidence of colocalization between CKD and three proteins: TCEA2, NRG4 and GCKR. These three circulating proteins also showed strong colocalization support with eGFRcrea and eGFR (Table 1, Supplementary Figs. 10–18).

Discussion

To assess the potential causal effects of more than 4000 circulating proteins on CKD and its indicators, we performed a proteome-wide analysis combining MR and colocalization, aiming to provide useful preclinical evidence for drug development. The MR analysis identified three proteins associated with CKD, and these associations were further confirmed for its indicators. Colocalization analysis, which accounts for the effects of linkage disequilibrium, then provided strong evidence that higher levels of TCEA2 and NRG4 are associated with lower CKD risk, whereas higher circulating GCKR levels are associated with increased susceptibility to CKD. In addition, higher levels of TCEA2 and NRG4 were positively associated with eGFRcrea and eGFR. After FDR correction, higher genetically predicted circulating levels of IDI2, MFAP4, A4GALT, LILRA5 and GNPTG were also associated with lower CKD risk, and higher levels of ARHGAP1 with higher CKD risk. However, because these associations were not confirmed in the replication analysis, we did not classify these proteins as Tier 1 targets.

Our study provides strong evidence supporting TCEA2, GCKR and NRG4 as potential therapeutic targets for CKD. GCKR, a member of the Sugar ISomerase (SIS) protein family, encodes a regulatory protein that inhibits glucokinase by forming a non-covalent complex with the enzyme and rendering it inactive. Previous research has linked the GCKR gene to CKD, eGFRcrea and eGFR14,30,31,32. Notably, GCKR has been reported as a potential genetic susceptibility locus for non-alcoholic fatty liver disease (NAFLD)33, and individuals with NAFLD and advanced fibrosis are at increased risk of developing CKD34.

NRG4 belongs to the neuregulin family, which is known to activate type-1 growth factor receptors. Previous research has shown that NRG4 attenuates tubulointerstitial fibrosis by regulating TNF-R1 signaling35. Notably, administration of exogenous NRG4 during colitis has been reported to reduce the number of colon macrophages and ameliorate inflammation36, and disruption of the gut–kidney axis creates a vicious cycle that accelerates CKD progression37. The effects of NRG4 on CKD observed in this study are consistent with this previous research, supporting the plausibility of our results. TCEA2 encodes a predominantly nuclear protein that acts as an SII-class transcription elongation factor. Previous research has reported differential expression of TCEA2 in the kidney38,39. Whereas the associations of NRG4 and GCKR with CKD have been studied extensively, the relationship between TCEA2 and CKD remains understudied. Given that our findings for GCKR and NRG4 agree with previous research, the potential effects of TCEA2 on CKD merit further investigation.

Our study has several strengths. First, the MR design approximates a randomized controlled trial. Randomized controlled trials are widely regarded as the gold standard for establishing causality but are often expensive and impractical to conduct. MR studies instead mitigate confounding by leveraging the random allocation of alleles at conception, and they also help address reverse causation, a common concern in other observational studies. Second, we used colocalization analysis, a powerful tool for determining whether a locus influences multiple traits through a shared causal variant, which offered insight into the shared genetic basis of the traits studied. Third, our study drew on GWASs with large sample sizes, and we used only strong instrumental variables (IVs) with F-statistics greater than 10, which reduces weak instrument bias and, together with the large sample sizes, improved our power to detect modest associations. Additionally, we conducted analyses across multiple datasets, reinforcing the robustness and consistency of our findings. Finally, by restricting our analysis to individuals of European ancestry, we minimized potential bias from population stratification, thereby strengthening the internal validity of our results, although this may limit their generalizability to other populations.

Nonetheless, our study has several limitations. First, our ability to perform MR sensitivity analyses, including the weighted median, MR-PRESSO and MR-Egger intercept tests, was limited. Our analysis relied predominantly on a single SNP as the instrumental variable, which may have restricted the detection of complex causal relationships and led to an underestimation of the risk of horizontal pleiotropy. Although the weighted median and MR-Egger methods provide supplementary evidence on the robustness of causal effects, their performance depends on sample size and instrument strength, and MR-Egger is particularly sensitive to instrument imbalance. Second, although colocalization analysis helped mitigate bias arising from linkage disequilibrium, horizontal pleiotropy could not be completely ruled out and may introduce confounding. We sought to address this limitation through a thorough colocalization analysis of circulating proteins, which allowed us to identify candidate genes that may have a causal effect on CKD. Colocalization results can occasionally appear to conflict with MR results, and such discrepancies should be interpreted with caution. Finally, the broader source datasets include some samples of non-European ancestry, so the proportion of European-ancestry participants should be examined to ensure the applicability of our findings across populations. In addition, future clinical trials are needed to confirm the therapeutic potential of these proteins.

Conclusion

Our study provides evidence that TCEA2, NRG4 and GCKR are promising candidates for targeted drug interventions in CKD. Nevertheless, future clinical trials are warranted to confirm their therapeutic potential.