Pan-cancer study detects genetic risk variants and shared genetic basis in two large cohorts

Rashkin, Sara R.; Graff, Rebecca E.; Kachuri, Linda; Thai, Khanh K.; Alexeeff, Stacey E.; Blatchins, Maruta A.; Cavazos, Taylor B.; Corley, Douglas A.; Emami, Nima C.; Hoffman, Joshua D.; Jorgenson, Eric; Kushi, Lawrence H.; Meyers, Travis J.; Van Den Eeden, Stephen K.; Ziv, Elad; Habel, Laurel A.; Hoffmann, Thomas J.; Sakoda, Lori C.; Witte, John S.

doi:10.1038/s41467-020-18246-6

Article
Open access
Published: 04 September 2020

Volume 11, article number 4423, (2020)

Abstract

Deciphering the shared genetic basis of distinct cancers has the potential to elucidate carcinogenic mechanisms and inform broadly applicable risk assessment efforts. Here, we undertake genome-wide association studies (GWAS) and comprehensive evaluations of heritability and pleiotropy across 18 cancer types in two large, population-based cohorts: the UK Biobank (408,786 European ancestry individuals; 48,961 cancer cases) and the Kaiser Permanente Genetic Epidemiology Research on Adult Health and Aging cohorts (66,526 European ancestry individuals; 16,001 cancer cases). The GWAS detect 21 genome-wide significant associations independent of previously reported results. Investigations of pleiotropy identify 12 cancer pairs exhibiting either positive or negative genetic correlations; 25 pleiotropic loci; and 100 independent pleiotropic variants, many of which are regulatory elements and/or influence cross-tissue gene expression. Our findings demonstrate widespread pleiotropy and offer further insight into the complex genetic architecture of cross-cancer susceptibility.

Introduction

The global burden of cancer is substantial, with an estimated 18.1 million individuals diagnosed each year and approximately 9.6 million deaths attributed to the disease¹. Efforts toward cancer prevention, screening, and treatment are thus imperative, but they require a more comprehensive understanding of the underpinnings of carcinogenesis than we currently possess. While studies of twins², families³, and unrelated populations^4,5,6 have demonstrated substantial heritability and familial clustering for many cancers, the extent to which genetic variation is unique versus shared across different types of cancer remains unclear.

Genome-wide association studies (GWAS) of individual cancers have identified loci associated with multiple cancer types, including 1q32 (MDM4)^7,8; 2q33 (CASP8-ALS2CR12)^9,10; 3q28 (TP63)^11,12; 4q24 (TET2)^13,14; 5p15 (TERT-CLPTM1L)^9,12; 6p21 (HLA complex)^15,16; 7p15¹⁷; 8q24^12,18; 11q13^18,19; 17q12 (HNF1B)^18,20; and 19q13 (MERIT40)²¹. In addition, recent studies have tested single-nucleotide polymorphisms (SNPs) previously associated with one cancer to discover pleiotropic associations with other cancer types^22,23,24,25. Consortia, such as the Genetic Associations and Mechanisms in Oncology, have looked for variants and pathways shared by breast, colorectal, lung, ovarian, and prostate cancers^26,27,28,29,30. Comparable studies for other cancers, including those that are less common, have yet to be reported.

In addition to individual variants, recent studies have evaluated genome-wide genetic correlations between pairs of cancer types^4,5,6. One evaluated 13 cancer types and found shared heritability between kidney and testicular cancers, diffuse large B-cell lymphoma (DLBCL) and osteosarcoma, DLBCL and chronic lymphocytic leukemia (CLL), and bladder and lung cancers⁴. Another study of six cancer types found correlations between colorectal cancer and both lung and pancreatic cancers⁵. In an updated analysis with increased sample size, the same group identified correlations of breast cancer with colorectal, lung, and ovarian cancers and of lung cancer with colorectal and head/neck cancers⁶. While these studies provide compelling evidence for shared heritability across cancers, they lack data on several cancer types (e.g., cervix, melanoma, and thyroid).

Here, we present analyses of genome-wide SNP data on 18 cancer types, examining 475,312 individuals of European ancestry from two large, independent, and contemporary cohorts unselected for phenotype: the UK Biobank (UKB; 408,786 individuals) and the Kaiser Permanente Genetic Epidemiology Research on Adult Health and Aging (GERA; 66,526 individuals) cohorts. We seek to detect risk SNPs and pleiotropic loci and variants and to estimate the heritability of and genetic correlations between cancer types. We then conduct in silico functional analyses of pleiotropic variants to catalog biological mechanisms potentially shared across cancers. Leveraging the wealth of individual-level genetic and phenotypic data from both cohorts allows us to extensively interrogate the shared genetic basis of susceptibility to different cancer types, with the ultimate goal of better understanding common genetic mechanisms of carcinogenesis and improving risk assessment. We find widespread pleiotropy that offers further insights into the complex genetic architecture of cross-cancer susceptibility.

Results

Genome-wide association analyses of individual cancers

We found 21 previously unreported genome-wide significant associations between variants and cancers at P < 5 × 10⁻⁸ upon meta-analysis of the UKB and GERA results (Table 1). These included 20 unique variants, with 1 variant that was associated with two cancers (rs78378222). Nine of these 21 associations were in known susceptibility regions for the cancer of interest but independent of previously reported variants (r² < 0.1; see “Methods”). The remaining 12 were in regions not previously associated with the cancer of interest in individuals of European ancestry. Fourteen of these 21 associations indicated pleiotropy in that the relevant variants were in regions previously associated with at least one of the other cancer types evaluated in this study (Table 1). The effect estimates for these 21 associations were not materially changed when stratified by age at diagnosis, Surveillance, Epidemiology, and End Results Program (SEER) grade, or SEER stage (heterogeneity P > 0.05/[number of strata and variants]; see “Methods”).

Table 1 Previously unreported genome-wide significant loci from meta-analysis of UKB and GERA SNPs for each cancer site.

In addition, there were nine previously unreported variants associated with cancers at P < 5 × 10⁻⁸ that were only genotyped or imputed in one cohort (Supplementary Data 1; yellow rows). For the sake of completeness and future efforts, Supplementary Data 1 also includes the 21 associations from Table 1 (green rows) and an additional 113 suggestive associations (P < 1 × 10⁻⁶) independent of previously reported results. Finally, we replicated 308 independent cancer risk variants identified as GWAS significant by previous studies (Supplementary Data 2; P < 1 × 10⁻⁶).

In genome-wide sensitivity analyses in the UKB cohort restricted to incident cases (i.e., excluding prevalent cases), our findings for significant and suggestive associations were essentially unchanged (heterogeneity P > 0.05/[number of variants per cancer]; see “Methods”; Supplementary Fig. 1). Similarly, genome-wide sensitivity analysis results in the UKB cohort for esophageal and stomach cancers separately were comparable to those for the two phenotypes combined (heterogeneity P > 0.05/6; see “Methods”; Supplementary Fig. 2).

Genome-wide heritability and genetic correlation

Array-based heritability estimates across cancers ranged from h² = 0.04 (95% CI: 0.00–0.13) for oral cavity/pharyngeal cancer to h² = 0.26 (95% CI: 0.15–0.38) for testicular cancer (Table 2). For some of the cancers, our array-based heritability estimates were comparable to twin- or family-based heritability estimates^2,3 but were more precise. Several were also similar to array-based heritability estimates from consortia comprised of multiple studies^4,5,6. One of our highest heritability estimates was observed for thyroid cancer (h² = 0.21; 95% CI: 0.09–0.33), a cancer that has not been evaluated in other array-based studies.

Table 2 Heritability estimates (h²) and 95% confidence intervals (CIs) for each cancer based on the union set of UKB and GERA SNPs and previous estimates.

Among pairs of cancers, only colon and rectal cancers (r_g = 0.85, P = 5.33 × 10⁻⁷) were genetically correlated at a Bonferroni-corrected significance threshold of P = 0.05/153 = 3.27 × 10⁻⁴ or using a false discovery rate (FDR) threshold of q < 0.1 (Fig. 1a, Table 3 and Supplementary Data 3). However, at a nominal threshold of P = 0.05, we observed suggestive relationships between 11 other pairs. Seven pairs showed positive correlations: esophageal/stomach cancer was correlated with Non-Hodgkin’s lymphoma (NHL; r_g = 0.40, P = 0.0089), breast (r_g = 0.26, P = 0.0069), lung (r_g = 0.44, P = 0.0035), and rectal (r_g = 0.32, P = 0.024) cancers; bladder and breast cancers (r_g = 0.22, P = 0.017); melanoma and testicular cancer (r_g = 0.23, P = 0.028); and prostate and thyroid cancers (r_g = 0.23, P = 0.013). The remaining four pairs showed negative correlations: endometrial and testicular cancers (r_g = −0.41, P = 0.0064); esophageal/stomach cancer and melanoma (r_g = −0.27, P = 0.038); lung cancer and melanoma (r_g = −0.28, P = 0.0048); and NHL and prostate cancer (r_g = −0.21, P = 0.012).
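The Bonferroni threshold quoted above follows from the number of unordered pairs that can be formed from 18 cancer types. A minimal sketch of that arithmetic is given here; the function name is illustrative and not taken from the study's code.

```python
from math import comb

N_CANCERS = 18   # cancer types analysed in the study
ALPHA = 0.05     # family-wise error rate used in the text


def bonferroni_pair_threshold(n_traits, alpha=ALPHA):
    """Return (number of unordered trait pairs, per-test P threshold)."""
    n_pairs = comb(n_traits, 2)        # 18 choose 2 = 153
    return n_pairs, alpha / n_pairs    # Bonferroni: alpha divided by tests


n_pairs, threshold = bonferroni_pair_threshold(N_CANCERS)
print(n_pairs, f"{threshold:.2e}")     # prints: 153 3.27e-04

# The colon/rectal correlation (P = 5.33e-7) clears this threshold,
# whereas the nominal pairs (e.g. P = 0.0035) do not.
print(5.33e-7 < threshold, 0.0035 < threshold)
```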

**Fig. 1: Cross-cancer genetic correlations (r_g) calculated via LD-score regression (LDSC) and associated cancers from the locus-specific pleiotropy analysis.**

Table 3 Cross-cancer genetic correlations (r_g) calculated via LD-score regression (LDSC) for all cancer pairs with P < 0.05.

Locus-specific pleiotropy

We detected 25 pleiotropic regions associated with more than one cancer (P < 5 × 10⁻⁸ for each cancer; independent regions were defined using our linkage disequilibrium [LD] clumping procedure; see “Methods”; Fig. 1b and Supplementary Table 1). Most were at known cancer pleiotropic loci: HLA (14 regions), 8q24 (7 regions), TERT-CLPTM1L (2 regions), and TP53 (1 region). All of the HLA regions were associated with both cervical cancer and NHL. Five regions in 8q24 were associated with prostate and colon cancers (one also associated with rectal cancer), and two were associated with prostate and breast cancers. Of the regions in TERT-CLPTM1L, one was associated with breast cancer and melanoma, and the other was associated with melanoma and cervical and pancreatic cancers. The TP53 region, indexed by rs78378222, was associated with melanoma and lymphocytic leukemia. The remaining pleiotropic region, indexed by rs6507874, was in SMAD7, which has been previously linked to colorectal cancer³¹, and we confirmed its association with colon and rectal cancers separately.

Genome-wide variant-specific pleiotropy

We assessed variant-specific pleiotropy by applying ASSET to the summary statistics for each cancer, testing all variants genome-wide. We found 85 independent (LD r² < 0.1) one-directional pleiotropic variants with at least two associated cancers, the same direction of effect for all associated cancers, and an overall pleiotropic P < 5 × 10⁻⁸ (Supplementary Data 4). Of these one-directional pleiotropic variants, there were 17 for which the overall pleiotropic P was smaller than the P for each of the associated cancers (Fig. 2 and Table 4). While 84 of the 85 one-directional pleiotropic variants were in regions that have previously been associated with any cancer, 68 were associated with at least one cancer not previously reported. The variant in a region not previously associated with any cancer is rs150260898, intronic of RABIF5, which was associated with melanoma and oral cavity/pharyngeal cancer.

**Fig. 2: Manhattan plot displaying one-directional variant-specific pleiotropy from ASSET.**

Table 4 Top independent variants from the one-directional variant-specific pleiotropic analysis.

We also considered bidirectional pleiotropic associations, wherein the same allele for a given variant was associated with an increased risk for some cancers but a decreased risk for others. We found 15 such variants with P < 5 × 10⁻⁸, all of which were independent from one another and from the one-directional pleiotropic variants (LD r² < 0.1; Fig. 3, Table 5 and Supplementary Data 5). There were eight variants where the overall pleiotropic P was smaller than the P for the associated cancers. While all of the bidirectional pleiotropic variants were in regions that have previously been associated with cancer, six were independent of known risk variants, and all 15 were associated with at least one cancer not previously reported.

**Fig. 3: Manhattan plot displaying bidirectional variant-specific pleiotropy from ASSET.**

Table 5 Top independent variants from the bidirectional variant-specific pleiotropic analysis.

For any pair of cancers associated with the same variant, the type of association falls in one of three categories: (1) SNPs identified in the one-directional analysis, where all associations are in the same direction; (2) SNPs identified in the bidirectional analysis, where both cancers in the pair are associated in the same direction (both risk increasing or both risk decreasing), even though at least one other cancer is associated in the opposite direction; and (3) SNPs identified in the bidirectional analysis, where the pair of cancers are associated in opposite directions (one risk increasing and one risk decreasing). For each of the possible 153 pairs of cancers, we tabulated how many of the 100 pleiotropic SNPs fall into each category (Fig. 4a and Supplementary Data 6). The number of one- and bidirectional SNPs shared by cancer pairs ranged from one (bladder and breast) to 13 (lymphocytic leukemia and testis) (Fig. 4a and Supplementary Data 6). For 30 cancer pairs, the shared associations had exclusively the same direction of effect (i.e., tabulating across the first two categories of pleiotropic SNPs). For three cancer pairs, at least 50% of the shared variants were associated in opposite directions.
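The three categories described above can be written as a small decision rule. The following is only an illustrative sketch: the function name, the string labels for the two ASSET analyses, and the toy input records are assumptions made for the example, not the study's data or code.

```python
from collections import Counter


def classify_pair(analysis, sign_a, sign_b):
    """Assign a cancer pair sharing one SNP to category 1, 2, or 3.

    analysis: "one-directional" or "bidirectional" (which analysis found the SNP)
    sign_a, sign_b: +1 if the allele increases risk for that cancer, -1 if it decreases it
    """
    same_direction = sign_a == sign_b
    if analysis == "one-directional":
        if not same_direction:
            raise ValueError("one-directional SNPs cannot have opposing effects")
        return 1                            # category (1)
    if analysis == "bidirectional":
        return 2 if same_direction else 3   # categories (2) and (3)
    raise ValueError(f"unknown analysis type: {analysis}")


# Hypothetical records: (cancer pair, analysis that found the SNP, sign for each cancer)
toy_records = [
    (("colon", "prostate"), "one-directional", +1, +1),
    (("cervix", "NHL"), "bidirectional", +1, -1),
    (("cervix", "NHL"), "bidirectional", -1, -1),
]

tally = Counter()
for pair, analysis, sign_a, sign_b in toy_records:
    tally[(pair, classify_pair(analysis, sign_a, sign_b))] += 1

for (pair, category), count in sorted(tally.items()):
    print(pair, "category", category, "SNPs:", count)
```

Tabulating over all pleiotropic SNPs in this way gives, for each of the 153 pairs, a count per category of the kind summarized in Fig. 4a.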

**Fig. 4: Summary of cancer pairs associated with and functional consequences of the 100 one- and bidirectional pleiotropic variants.**

For each of the 100 independent SNPs showing either one- or bidirectional pleiotropy (Supplementary Data 4–5), we assessed whether the results differed according to age at diagnosis, SEER grade, or SEER stage for any of the associated cancers. After correcting for the number of SNPs and strata tested, only a single one-directional pleiotropic SNP showed heterogeneity across case subtypes. rs111362352-C was significantly positively associated with the risk of low-grade prostate cancer in GERA, while it was not associated with high-grade disease. These results are consistent with previous findings for this SNP (or SNPs in strong LD): the C allele has been associated with lower Gleason score, and it is located at KLK3, the prostate-specific antigen gene, which may reflect its previous association with lower-grade prostate cancer^32,33.

Functional characterization of pleiotropic variants

The biological significance of these 100 independent pleiotropic variants (Supplementary Data 4–5) was evaluated using in silico annotation tools (Supplementary Data 7)^34,35,36. Pleiotropic variants were enriched in intergenic (P = 0.043) and non-coding RNA transcripts (P = 0.015) compared to all variants in the reference panel of UKB European descent individuals (Fig. 4b). The distribution of DeepSEA functional significance scores was skewed toward 0 (P = 7.3 × 10⁻⁴), indicating a higher likelihood of regulatory effects compared to a reference distribution of 1000 Genomes variants (Fig. 4d). Suggestively functional variants (n = 26, DeepSEA score < 0.05) were also predicted to be pathogenic by Combined Annotation-Dependent Depletion³⁶ (CADD; mean score of 10.66, corresponding to the top 10% of deleterious substitutions). Twenty-two of the 100 pleiotropic variants were characterized by active chromatin states, 33 were classified as enhancers, and 64 had significant (FDR < 0.05) effects on gene expression (Fig. 4c). Five variants belonged to all three classes (Fig. 4c).

Consistent with hypothesized pleiotropy, 78.1% of the 64 expression quantitative trait loci (eQTLs) identified among the pleiotropic variants had more than one target tissue, and 78.1% influenced the expression of more than one gene (Supplementary Fig. 3), for a total of 596 significant SNP-gene pairs. The most common expression tissues for eQTLs among pleiotropic variants were whole blood (49.2%), followed by adipose (14.8%) and esophageal (4.7%) tissues. Regulatory effects mediated by chromatin looping were observed for 28 variants, including 3 enhancer-promoter links in 6p21.23 (rs535777, rs73728618) and 22q13.2 (rs5759167, PACSIN2 promoter; Supplementary Fig. 4). Notably, rs5759167 is an eQTL for PACSIN2 in whole blood (BIOS QTL: P = 9.89 × 10⁻¹⁴; GTEx v8: P = 3.39 × 10⁻⁷).

The functional profile of the 100 pleiotropic variants was significantly different across multiple features when compared to a randomly selected set of 100 independent variants. Pleiotropic variants had a significantly higher proportion of enhancers (P = 3.38 × 10⁻⁴), eQTLs (P = 3.38 × 10⁻⁴; >1 tissue: P = 2.33 × 10⁻³; >1 gene: P = 1.34 × 10⁻⁴), and chromatin interactions (P = 3.48 × 10⁻⁴). Pleiotropic variants did not have a significantly higher proportion in active chromatin states (P = 0.48).

Genes represented by pleiotropic variants were significantly enriched for 36 KEGG pathways that formed two clusters broadly characterized by immune-related functions and cancer-specific genes (Supplementary Table 2 and Supplementary Fig. 5). Top-ranking pathways in the first cluster included antigen processing and presentation (P = 4.29 × 10⁻⁶), cell adhesion molecules (P = 4.29 × 10⁻⁶), allograft rejection (P = 4.29 × 10⁻⁶), cancer-related infections (human T cell leukemia virus 1: P = 4.35 × 10⁻⁶; Epstein-Barr virus: P = 3.49 × 10⁻⁵), and autoimmune diseases (type I diabetes: P = 9.84 × 10⁻⁶; inflammatory bowel disease: P = 1.16 × 10⁻³). The second cluster was enriched for genes related to multiple cancers (gastric: P = 9.94 × 10⁻⁵; small cell lung cancer: P = 3.14 × 10⁻³; prostate: P = 3.65 × 10⁻³), drug resistance (endocrine resistance: P = 2.55 × 10⁻⁴), and cellular senescence (P = 0.014).

Discussion

In this study of cancer pleiotropy in two large cohorts, we found multiple lines of evidence for a shared genetic basis of several cancer types. By characterizing pleiotropy at the genome-wide, locus-specific, and variant-specific levels for a large number of cancer sites, we generated several insights into cancer susceptibility. Specifically, we detected 21 previously unreported genome-wide significant variant associations across 11 of the 18 individual cancers examined. We also detected 100 independent variants displaying one- or bidirectional pleiotropy that were enriched for a number of regulatory functions that reflect hallmarks of carcinogenesis.

One notable finding from our cervical cancer GWAS was rs10175462 in PAX8 on 2q13, which, to our knowledge, is the first genome-wide significant cervical cancer risk SNP identified outside of the HLA region in a European ancestry population¹⁵. In a candidate SNP study of PAX8 eQTLs in a Han Chinese population, two variants in LD with rs10175462 in Europeans (rs1110839, r² = 0.33; rs4848320, r² = 0.34) were suggestively associated with cervical cancer risk in the same direction³⁷. Several GWAS findings also provided evidence of pleiotropy, in that previously unreported risk variants for one cancer had known associations with one or more other cancers. For instance, rs9818780 was associated with melanoma and has been implicated in sunburn risk³⁸. This intergenic variant is an eQTL for LINC00886 and METTL15P1 in skin tissue. The former gene has previously been linked to breast cancer³⁹, and both genes have been implicated in ovarian cancer⁴⁰. Beyond the previously unreported associations, our GWAS detected 308 independent associations with P < 1 × 10⁻⁶ that confirmed signals identified in previous GWAS with P < 5 × 10⁻⁸. This finding strengthened our confidence in using our genome-wide summary statistics for subsequent analyses of cancer pleiotropy.

In evaluating pairwise genetic correlations between the 18 cancer types, we observed the strongest signal for colon and rectal cancers—an expected relationship consistent with findings from a twin study⁴¹. We also identified several cancer pairs for which the genetic correlations were nominally significant. One pair supported by previous evidence is melanoma and testicular cancer; some studies have found that individuals with a family history of the former are at an increased risk for the latter^42,43. Esophageal/stomach cancer was a component of five correlated pairs—with melanoma, NHL, and breast, lung, and rectal cancers. Despite some similarities between esophageal and stomach cancers, testing them as a combined phenotype may have inflated the number of correlated cancers.

Our genetic correlation results contrast with some previous findings^4,5,6; we did not find several correlations that they did and found others that they did not. The differences may be partly due to a smaller number of cases in our cohorts for some sites. Further studies with larger sample sizes are necessary to validate our correlations, as those that did not attain Bonferroni-corrected significance may have been due to chance. However, we achieved comparable or higher cancer-specific heritability estimates for breast, colon, and lung cancers, which suggests that differences in study design may also play a role. Previous analyses aggregated case–control studies recruited during different time periods. While such meta-analyses can be effective at reducing residual population stratification, our extensive quality control processes also seemingly mitigated population stratification; the mean λ_GC across the 18 cancers was 1.02 (standard deviation = 0.027). Moreover, our design allowed for the assessment of cross-cancer relationships in the same set of individuals and the examination of several cancers that have yet to be studied in large consortia.
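For readers unfamiliar with λ_GC, the genomic inflation factor is conventionally defined as the median observed 1-df chi-square statistic divided by its expected median under the null (about 0.455). The sketch below assumes that conventional definition; the text here does not spell out the exact procedure used, and the function name and synthetic z-scores are hypothetical.

```python
from statistics import NormalDist, median

# Median of a chi-square distribution with 1 degree of freedom,
# obtained from the standard normal quantile: (Phi^-1(0.75))^2, about 0.4549.
CHI2_1DF_MEDIAN = NormalDist().inv_cdf(0.75) ** 2


def genomic_inflation(z_scores):
    """Return lambda_GC = median(z^2) / median of chi-square(1 df)."""
    chi2_stats = [z * z for z in z_scores]
    return median(chi2_stats) / CHI2_1DF_MEDIAN


# Synthetic null z-scores taken at evenly spaced normal quantiles (illustration only).
n = 10001
null_z = [NormalDist().inv_cdf((i + 0.5) / n) for i in range(n)]
lambda_null = genomic_inflation(null_z)

# Inflating every test statistic by 10% scales lambda_GC by the same factor.
inflated_z = [z * 1.1 ** 0.5 for z in null_z]
lambda_inflated = genomic_inflation(inflated_z)
print(round(lambda_null, 3), round(lambda_inflated / lambda_null, 3))
```

Values close to 1, such as the reported mean of 1.02, indicate little systematic inflation of test statistics from residual population stratification.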

The assessment of pleiotropy at the locus level confirmed previously reported associations at 5p15.33, HLA, and 8q24 (refs. ^9,12,15,16,18). Out of the 25 pleiotropic loci that we identified, most were at these known cancer pleiotropic loci. Over half, all in the HLA locus, were associated with cervical cancer and NHL. The two cancers were weakly negatively correlated in the two cohorts combined and nominally significantly negatively correlated in the UKB alone (Supplementary Data 8). The difference may reflect better coverage and imputation of the HLA region in the UKB than in GERA.

Variant-specific analyses provided further evidence in support of locus-specific cancer pleiotropy, including validation of previously reported signals at 1q32 (refs. ^7,8) and 2q33 (ALS2CR12; refs. ^9,10). Interestingly, our lead 1q32 variant (rs1398148) maps to PIK3C2B and is in LD (r² > 0.60) with known MDM4 cancer risk variants^7,8, suggesting that the 1q32 locus may be involved in modulating both p53- and PI3K-mediated oncogenic pathways. The 100 independent pleiotropic variants (with overall pleiotropic P < 5 × 10⁻⁸) mapped to a total of 56 genomic locations (defined by cytoband), which included the six genomic locations to which all 25 of the regions identified from the locus-specific analysis map. Although 99 of the 100 variants showing one- or bidirectional pleiotropic associations are in regions previously associated with cancer, 83 of the 99 were associated with at least one cancer not previously reported.

Out of 100 independent variants identified from the variant-specific pleiotropy analyses, 17 were in 8q24 and 15 were in the HLA region. Different distributions of one- and bidirectional results highlight patterns of directional pleiotropy: of the 15 HLA variants, 7 were bidirectional, while only three of the 17 variants in 8q24 were bidirectional. The HLA region is critical for innate and adaptive immune response and has a complex relationship with cancer risk. Heterogeneous associations with HLA haplotypes have been reported for different subtypes of NHL⁴⁴ and lung cancer⁴⁵, suggesting that relevant risk variants are likely to differ within, as well as between, cancers. Studies have further demonstrated that somatic mutation profiles are associated with HLA class I (ref. ⁴⁶) and class II alleles⁴⁷. Specifically, mutations that create neoantigens more likely to be recognized by specific HLA alleles are less likely to be present in tumors from patients carrying such alleles. It is thus possible that some of the positive and negative pleiotropy we identified is related to mutation type. These results reinforce the importance of the immune system playing a role in cancer susceptibility.

In contrast to the HLA region, the majority of the 8q24 pleiotropic variants had the same direction of effect for all associated cancers, implying the existence of shared genetic mechanisms driving tumorigenesis across sites. The proximity of the well-characterized MYC oncogene makes it a compelling candidate for such a consistent, one-directional effect. It could work via regulatory elements, such as acetylated and methylated histone marks⁴⁸. Consistent with this hypothesis, we observed heritability enrichment⁴⁹ for variants with the H3K27ac annotation for breast (P = 3.09 × 10⁻⁴), colon (P = 4.44 × 10⁻⁴), prostate (P = 2.74 × 10⁻⁵), and rectal (P = 0.036) cancers—all of which share susceptibility variants in 8q24, according to our analyses and previous studies⁴⁸.

In silico analyses found the 100 pleiotropic variants to be enriched across multiple regulatory domains compared to non-pleiotropic randomly selected variants and highlighted cross-cancer susceptibility loci. The 11q13.3 region includes rs12275055, which maps to active enhancers and is also an eQTL for TPCN2, a gene involved in controlling the angiogenic response to VEGF and extracellular vesicle trafficking in cancer cells^50,51. An additional interesting region, 22q13.2, is indexed by rs5759167, an intergenic variant linked to prostate and lung cancer risk. Its pleiotropic effects are likely mediated by regulation of PACSIN2, which codes for a cyclin D1 binding partner that serves as a brake for CCND1-mediated cellular migration⁵². This is consistent with our observation that the risk-increasing G-allele is associated with decreased PACSIN2 expression in whole blood⁵³. Lastly, our pathway analysis indicated that pleiotropic variants as a group are enriched for genes involved in immune regulation and infection, as well as cancer development and progression. Our in silico findings highlight loci that are good candidates for investigation in future in vivo studies.

It is important to acknowledge some limitations of our study. First, counts for some of the cancer types were limited. However, small sample sizes are partially offset by the advantages of using two population-based cohorts. Second, due to the complexity of the LD structure in the HLA region, we may have overestimated the number of distinct, independent signals. Slight overestimation, however, does not affect our overall conclusions regarding the pleiotropic nature of this region. Third, our analyses included both prevalent and incident cases. Nevertheless, sensitivity analyses restricted to incident cancers yielded comparable results. Fourth, we grouped esophageal and stomach cancers despite possible differences in their risk factor profiles. However, there is precedent for using a composite phenotype⁵⁴, and analyses of stomach and esophageal tumors suggest that they have many overlapping molecular features^55,56. In addition, sensitivity analyses for each cancer alone gave similar results, suggesting that they may have similar genetic bases despite potentially having different environmental risk factors. Fifth, we focused solely on individuals of European ancestry. Further analyses are needed to accurately assess patterns of pleiotropy in non-Europeans. Finally, the two distinct cohorts studied here—the UKB and GERA—were recruited from different populations and time periods and were genotyped with different versions of Axiom GWAS arrays. Only variants genotyped or well-imputed across the cohorts were combined in our meta-analysis. Moreover, studying two cohorts provides complementary evidence for pleiotropy.

The characterization of pleiotropy is fundamental to understanding the genetic architecture of cross-cancer susceptibility and its biological underpinnings. The availability of two large, independent cohorts provided an opportunity to efficiently evaluate the shared genetic basis of many cancers, including some not previously studied together. The result was a multifaceted assessment of common genetic factors implicated in carcinogenesis, and our findings illustrate the importance of investigating different aspects of cancer pleiotropy. Broad analyses of genetic susceptibility and targeted analyses of specific loci and variants may both contribute insights into different dimensions of cancer pleiotropy. Future studies should consider the contribution of rare variants to cancer pleiotropy and aim to elucidate the functional pathways mediating associations observed at pleiotropic regions. Such research, combined with our findings, has the potential to inform drug development, risk assessment, and clinical practice toward reducing the burden of cancer.

Methods

Study populations and phenotyping

The UKB is a population-based cohort of 502,611 individuals in the United Kingdom. Study participants were aged 40–69 at recruitment between 2006 and 2010, at which time all participants provided detailed information about lifestyle and health-related factors and provided biological samples⁵⁷. GERA participants were drawn from adult Kaiser Permanente Northern California (KPNC) health plan members who provided a saliva sample for the Research Program on Genes, Environment and Health (RPGEH) between 2008 and 2011. Individuals included in this study were selected from the 102,979 RPGEH participants who were successfully genotyped as part of GERA and answered a baseline survey concerning lifestyle and medical history^58,59.

Cancer cases in the UKB were identified via linkage to various national cancer registries established in the early 1970s⁵⁷. Data in the cancer registries are compiled from hospitals, nursing homes, general practices, and death certificates, among other sources. The latest cancer diagnosis in our data from the UKB occurred in August 2015. GERA cancer cases were identified using the KPNC Cancer Registry, including all diagnoses captured through June 2016. Following SEER standards, the KPNC Cancer Registry contains data on all primary cancers (i.e., cancer diagnoses that are not secondary metastases of other cancer sites; excluding non-melanoma skin cancer) diagnosed or treated at any KPNC facility since 1988.

In both cohorts, individuals with at least one recorded prevalent or incident diagnosis of a borderline, in situ, or malignant primary cancer were defined as cases for our analyses. Individuals with multiple cancer diagnoses were classified as a case only for their first cancer. For the UKB, all diagnoses described by International Classification of Diseases (ICD)-9 or ICD-10 codes were converted into ICD-O-3 codes; the KPNC Cancer Registry already included ICD-O-3 codes. We then classified cancers according to organ site using the SEER site recode paradigm⁶⁰. We grouped all esophageal and stomach cancers and, separately, all oral cavity and pharyngeal cancers to ensure sufficient statistical power. The 18 most common cancer types (except non-melanoma skin cancer) were examined. Testicular cancer data were obtained from the UKB only due to the small number of cases in GERA.

Controls were restricted to individuals who had no record of any cancer in the relevant registries, who did not self-report a prior history of cancer (other than non-melanoma skin cancer), and, if deceased, who did not have cancer listed as a cause of death. Individuals whose first cancer diagnosis was for a cancer not among our 18 cancers of interest were excluded. For analyses of sex-specific cancer sites (breast, cervix, endometrium, ovary, prostate, and testis), controls were restricted to individuals of the appropriate sex.

Quality control

For the UKB population, genotyping was conducted using either the UKB Axiom array (436,839 total; 408,841 self-reported European) or the UK BiLEVE array (49,747 total; 49,746 self-reported European)⁵⁷. The former is an updated version of the latter, such that the two arrays share over 95% of their marker content. UKB investigators undertook a rigorous quality control (QC) protocol⁵⁷. Genotype imputation was performed using the Haplotype Reference Consortium as the main reference panel and the merged UK10K and 1000 Genomes phase 3 reference panels for additional data, resulting in a unified set of 93,095,623 imputed SNPs⁵⁷, which is used for all analyses. Ancestry principal components (PCs) were computed using fastPCA based on a set of 407,219 unrelated samples and 147,604 genetic markers⁵⁷.

For GERA participants, genotyping was performed using an Affymetrix Axiom array (Affymetrix, Santa Clara, CA, USA) optimized for individuals of European race/ethnicity. Details about the array design, estimated genome-wide coverage, and QC procedures have been published previously⁵⁹,⁶¹. The genotyping produced high-quality data with average call rates of 99.7% and average SNP reproducibility of 99.9%. Variants that were not directly genotyped (or that were excluded by QC procedures) were imputed to generate genotypic probability estimates. After pre-phasing genotypes with SHAPE-IT v2.5, IMPUTE2 v2.3.1 was used to impute SNPs relative to the cosmopolitan reference panel from 1000 Genomes. Ancestry PCs were computed based on 144,799 high-performing SNPs using the smartpca program in the EIGENSOFT4.2 software package⁵⁸.

For both cohorts, analyses were limited to self-reported European ancestry individuals for whom self-reported and genetic sex matched. To further minimize potential population stratification, we excluded individuals for whom either of the first two ancestry PCs fell outside five standard deviations of the mean of the population. Based on a subset of genotyped autosomal variants with minor allele frequency (MAF) ≥ 0.01 and genotype call rate ≥ 97%, we excluded samples with call rates < 97% and/or heterozygosity more than five standard deviations from the mean of the population. With the same subset of SNPs, we used KING to estimate relatedness among the samples. We excluded one individual from each pair of first-degree relatives, prioritizing first the number of cancer cases relevant to these analyses and then the total number of individuals retained. Our study population ultimately included 408,786 UKB participants and 66,526 GERA participants. We excluded SNPs with imputation quality score (r²_INFO) < 0.3, call rate < 95% (alternate allele dosage required to be within 0.1 of the nearest hard call to be non-missing; UKB only), Hardy–Weinberg equilibrium P among controls < 1 × 10⁻⁵, and/or MAF < 0.01, leaving 8,876,519 variants for analysis for the UKB and 8,973,631 for GERA.
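The exclusion thresholds above can be illustrated with a minimal Python sketch. It is not the pipeline used in the study: the function names are hypothetical, the population (rather than sample) standard deviation is an assumption, and the call-rate filter is applied unconditionally here although the text restricts it to the UKB.

```python
from statistics import mean, pstdev


def within_k_sd(values, k=5.0):
    """Flag each value as True if it lies within k SDs of the mean.

    Mirrors the five-standard-deviation rule for ancestry PCs and
    heterozygosity; the use of the population SD is an assumption.
    """
    mu = mean(values)
    sd = pstdev(values)
    return [abs(v - mu) <= k * sd for v in values]


def passes_variant_qc(info, call_rate, hwe_p_controls, maf):
    """Apply the variant-level thresholds quoted in the text.

    A variant is kept only if none of the exclusion criteria is met.
    """
    return (
        info >= 0.3                # imputation quality score
        and call_rate >= 0.95      # described as UKB only in the text
        and hwe_p_controls >= 1e-5  # Hardy-Weinberg P among controls
        and maf >= 0.01            # minor allele frequency
    )
```

A sample would be retained only if `within_k_sd` returns True for both of its first two PCs and for its heterozygosity.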

For indels, the r²_INFO scores indicated extremely high accuracy, ranging from 0.81 to 0.99 in the UKB (median = 0.99) and from 0.72 to 0.99 in GERA (median = 0.99) (Supplementary Data 1–2). In addition, the correlation was very high between imputed and sequenced genotypes for 44 EUR samples from the 1000 Genomes Project genotyped with the Axiom UK Biobank array and imputed using the 1KGP WGS Phase 3 reference panel: the average r² was 0.97 for SNPs and 0.90 for indels (MAF > 0.01; Jeremy Gollub, Personal Communication).

Genome-wide association analyses of individual cancers

We used PLINK to implement within-cohort logistic regression models of additively modeled SNPs genome-wide, comparing cases of each cancer type to cancer-free controls. All models were adjusted for age at specimen collection, sex (non-sex-specific cancers only), first ten ancestry PCs, genotyping array (UKB only), and reagent kit used for genotyping (Axiom v1 or v2; GERA only). Case counts ranged from 471 (pancreatic cancer) to 13,903 (breast cancer) in the UKB and from 162 (esophageal/stomach cancer) to 3978 (breast cancer) in GERA (Supplementary Table 3). Control counts were 359,825 (189,855 females) and 50,525 (29,801 females) in the UKB and GERA, respectively. After separate GWAS were conducted in each cohort, association results for the 7,846,216 SNPs in both cohorts were combined via meta-analysis. For variants that were only examined in one cohort (22% of the total 10,003,934 SNPs analyzed), original summary statistics were merged with the meta-analyzed SNPs to create a union set of SNP statistics for each cancer for use in downstream analyses (Supplementary Fig. 6).
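To make the construction of the union set concrete, the following Python sketch combines per-cohort estimates for one cancer. The text does not state which meta-analysis model was used; a fixed-effects inverse-variance-weighted model is assumed here, and the function names and the dictionary layout are hypothetical.

```python
import math


def two_sided_p(beta, se):
    """Two-sided P from a Wald z-statistic under a standard normal."""
    return math.erfc(abs(beta / se) / math.sqrt(2))


def ivw_meta(beta1, se1, beta2, se2):
    """Fixed-effects inverse-variance-weighted combination (assumed model)."""
    w1, w2 = 1.0 / se1 ** 2, 1.0 / se2 ** 2
    beta = (w1 * beta1 + w2 * beta2) / (w1 + w2)
    se = math.sqrt(1.0 / (w1 + w2))
    return beta, se, two_sided_p(beta, se)


def union_set(ukb, gera):
    """Merge two dicts of SNP -> (log OR, SE) into one union set.

    SNPs present in both cohorts are meta-analyzed; SNPs present in only
    one cohort keep their original statistics. Effect alleles are assumed
    to be aligned between cohorts already.
    """
    merged = {}
    for snp in ukb.keys() | gera.keys():
        if snp in ukb and snp in gera:
            merged[snp] = ivw_meta(*ukb[snp], *gera[snp]) + ("meta",)
        else:
            source, label = (ukb, "UKB only") if snp in ukb else (gera, "GERA only")
            beta, se = source[snp]
            merged[snp] = (beta, se, two_sided_p(beta, se), label)
    return merged
```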

To determine independent signals in our union set of SNPs, we implemented the LD clumping procedure in PLINK based on genotype hard calls from a reference panel comprising a downsampled subset of 10,000 random UKB participants. For each cancer separately, LD clumps were formed around index SNPs with the smallest P not already assigned to another clump. Only variants with P < 5 × 10⁻⁸ were considered significant, but to also identify suggestive variants for supplementary results, index SNPs were required only to have a suggestive association (P < 1 × 10⁻⁶). Other SNPs were added to a clump if they were marginally significant (P < 0.05), were within 500 kb of the index SNP, and had r² > 0.1 with the index SNP. To confirm independence, we implemented GCTA’s conditional and joint analysis (COJO) method with the aforementioned downsampled subset of UKB participants as a reference panel, performing stepwise selection of the index SNPs within a ±1000 kb region of one another. SNPs were deemed independent if they maintained a P < 1 × 10⁻⁶ in the joint model. The remaining independent variants were determined to be novel if they were independent of previously reported risk variants in European ancestry populations (as described below).
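The clumping step can be sketched as a greedy procedure. The Python below is an illustration of the logic with the thresholds quoted above, not PLINK's implementation; the function name, the input layout, and the `r2` callback supplying pairwise LD are assumptions, and the subsequent COJO step is not shown.

```python
def clump(snps, r2, index_p=1e-6, member_p=0.05,
          window_bp=500_000, r2_min=0.1):
    """Greedy LD clumping for one cancer.

    snps: dict of SNP id -> (chromosome, position, P).
    r2:   function (snp_a, snp_b) -> LD r^2 from a reference panel.
    Returns a dict of index SNP -> list of SNPs clumped with it.
    """
    assigned = set()
    clumps = {}
    # Visit SNPs from smallest to largest P so that each new index SNP is
    # the most significant SNP not yet assigned to a clump.
    for sid in sorted(snps, key=lambda s: snps[s][2]):
        chrom, pos, p = snps[sid]
        if p >= index_p:
            break  # no remaining SNP can serve as an index SNP
        if sid in assigned:
            continue
        assigned.add(sid)
        members = []
        for other, (chrom2, pos2, p2) in snps.items():
            if other in assigned:
                continue
            if (chrom2 == chrom and abs(pos2 - pos) <= window_bp
                    and p2 < member_p and r2(sid, other) > r2_min):
                members.append(other)
                assigned.add(other)
        clumps[sid] = members
    return clumps
```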

To identify SNPs previously associated with each cancer type, we abstracted all genome-wide significant SNPs from relevant GWAS published through June 2018. We determined that a SNP was potentially novel if it had LD r² < 0.1 with all previously reported SNPs for the relevant cancer based on both the UKB reference panel and the 1000 Genomes EUR superpopulation via LDlink. As an additional filter for novelty, we again used COJO to condition each potentially novel SNP on previously reported SNPs for the relevant cancer using the UKB reference panel, and SNPs were not considered novel if they did not maintain P < 1 × 10⁻⁶ in the joint model. To confirm novelty and consider pleiotropy, we conducted an additional literature review to investigate whether these SNPs had previously been reported for the same or other cancers, including those not attaining genome-wide significance and those in non-GWAS analyses. For this additional review, we used the PhenoScanner database to search for SNPs of interest and variants in LD in order to comprehensively scan previously reported associations. We then supplemented with more in-depth PubMed searches to determine if the genes in which novel SNPs were located had previously been reported for the same or other cancers. Finally, for cancers with publicly available summary statistics (breast [>120,000 cases]³⁹, prostate [~80,000 cases]⁶², and ovarian [~30,000 cases]⁴⁰), we tested our potentially previously unreported SNPs with P < 1 × 10⁻⁶ for replication (defined as having the same direction of effect and P < 0.05). Tested SNPs that did not replicate were not considered previously unreported.

We considered whether clinical characteristics of the cases were informative about associated phenotypes by examining SEER stage and grade (both GERA only) and age at cancer diagnosis (UKB and GERA). For each clinical variable, we decomposed cases into one of two categories: grade 1–2 (well or moderately differentiated) or grade 3–4 (poorly or undifferentiated); stage 0–1 (in situ or localized) or stage 2–7 (regional or distant metastases); age < median or age ≥ median. The case counts for all cancer-outcome strata are tabulated in Supplementary Table 4. For each of the previously unreported GWAS SNPs, we conducted logistic regression comparing controls to each of the relevant case subtypes. We then compared the effect estimates across the strata for each clinical variable (e.g., for each relevant SNP–cancer pair, we compared the OR for grade 1–2 with the OR for grade 3–4) and calculated Cochran’s Q statistic to test for heterogeneity, adjusting for multiple testing for the number of strata and SNPs tested.

To assess whether our results were influenced by factors associated with survival, we conducted sensitivity analyses restricted to incident cases in the larger UKB cohort. For each cancer, we compared the independent SNPs that were suggestively associated in the analysis using both prevalent and incident cases (P < 1 × 10⁻⁶) with those in the incident only analysis. We assessed whether the effect sizes varied by calculating Cochran’s Q statistic to test for heterogeneity, adjusting for multiple testing across the number of SNPs tested for each cancer. Additional sensitivity analyses evaluated esophageal and stomach cancers as separate phenotypes in the UKB cohort. For independent SNPs with P < 1 × 10⁻⁶ in the analysis of the composite phenotype in UKB alone, we compared effect sizes for the composite phenotype to effect sizes for esophageal and stomach cancers separately and calculated Cochran’s Q statistic to test for heterogeneity, adjusting for multiple testing across the number of SNPs tested. For both of these sensitivity analyses, we assessed all SNPs with P < 1 × 10⁻⁶ to allow for a sufficient number of variants for comparison.
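For reference, Cochran’s Q for a set of stratum-specific log odds ratios can be computed as in the sketch below. The function names are hypothetical, and the P value helper covers only the two-stratum case (one degree of freedom), which corresponds to the binary grade, stage, and age splits described above; the multiple-testing adjustment is not shown.

```python
import math


def cochrans_q(betas, ses):
    """Cochran's Q: weighted squared deviations from the pooled estimate.

    betas are log odds ratios and ses their standard errors; weights are
    inverse variances.
    """
    weights = [1.0 / se ** 2 for se in ses]
    pooled = sum(w * b for w, b in zip(weights, betas)) / sum(weights)
    return sum(w * (b - pooled) ** 2 for w, b in zip(weights, betas))


def q_pvalue_two_strata(q):
    """Heterogeneity P for exactly two strata (chi-squared, 1 df)."""
    return math.erfc(math.sqrt(q / 2.0))
```

With more than two strata, Q would be referred to a chi-squared distribution with (number of strata − 1) degrees of freedom instead.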

Genome-wide heritability and genetic correlation

We used LD-score regression (LDSC) on summary statistics from the union set of all SNPs genome-wide to calculate the genome-wide liability-scale heritability of each cancer type and the genetic correlation between each pair of cancer types. Internal LD scores were calculated using the aforementioned downsampled subset of UKB participants. To convert to liability-scale heritability, we adjusted for lifetime risks of each cancer based on SEER 2012–2014 estimates (Supplementary Table 5)⁶³. LDSC was unable to estimate genetic correlations for testicular cancer with both oral cavity/pharyngeal and pancreatic cancers, likely due to small sample sizes.
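The conversion from observed-scale to liability-scale heritability follows the standard liability-threshold transformation, sketched below in Python. This is an illustration rather than LDSC code: the function name is hypothetical, and treating the lifetime risk as the population prevalence K is the assumption implied by the text.

```python
from statistics import NormalDist


def liability_scale_h2(h2_obs, sample_prev, pop_prev):
    """Convert observed-scale h2 from a case-control study to the liability scale.

    sample_prev: proportion of cases in the analyzed sample (P).
    pop_prev:    population prevalence, here the lifetime risk (K).
    """
    std_normal = NormalDist()
    threshold = std_normal.inv_cdf(1.0 - pop_prev)  # liability threshold
    z = std_normal.pdf(threshold)                   # normal density at threshold
    factor = (pop_prev ** 2 * (1.0 - pop_prev) ** 2
              / (sample_prev * (1.0 - sample_prev) * z ** 2))
    return h2_obs * factor
```

Because the factor depends on both K and P, the same observed-scale estimate maps to different liability-scale values for common and rare cancers.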

Locus-specific pleiotropy

Using our union set of SNP-based summary statistics, we constructed pleiotropic regions of SNPs associated with more than one cancer with P < 5 × 10⁻⁸. Non-overlapping regions were iteratively formed around index SNPs associated with any cancer, beginning with the SNP associated with the smallest P. SNPs were added to a region if they were associated with any cancer with P < 5 × 10⁻⁸, were within 500 kb of the index SNP, and had LD r² > 0.5 with the index SNP. We used a larger threshold for assessing pleiotropic regions (r² > 0.5) than for identifying truly independent signals (r² > 0.1; above) to ensure that all SNPs within a region were in LD. If all SNPs in a region were associated with the same cancer, the region was not considered pleiotropic.
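The region-building rule can be sketched as follows. The Python below is a simplified illustration under stated assumptions: the function name, the tuple layout of `hits`, and the `r2` callback supplying pairwise LD are hypothetical, and the SNP and cancer labels used to exercise it would be toy values.

```python
def pleiotropic_regions(hits, r2, window_bp=500_000, r2_min=0.5, alpha=5e-8):
    """Form non-overlapping regions and flag those spanning more than one cancer.

    hits: iterable of (snp, chromosome, position, cancer, P) across all cancers.
    r2:   function (snp_a, snp_b) -> LD r^2 from a reference panel.
    """
    info = {}
    for snp, chrom, pos, cancer, p in hits:
        if p >= alpha:
            continue  # only genome-wide significant associations are used
        rec = info.setdefault(
            snp, {"chrom": chrom, "pos": pos, "min_p": p, "cancers": set()})
        rec["min_p"] = min(rec["min_p"], p)
        rec["cancers"].add(cancer)

    unassigned = sorted(info, key=lambda s: info[s]["min_p"])
    regions = []
    while unassigned:
        index = unassigned.pop(0)  # SNP with the smallest remaining P
        members = [index]
        for other in list(unassigned):
            same_chrom = info[other]["chrom"] == info[index]["chrom"]
            close = abs(info[other]["pos"] - info[index]["pos"]) <= window_bp
            if same_chrom and close and r2(index, other) > r2_min:
                members.append(other)
                unassigned.remove(other)
        cancers = set().union(*(info[m]["cancers"] for m in members))
        regions.append({"index": index, "members": members,
                        "cancers": cancers, "pleiotropic": len(cancers) > 1})
    return regions
```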

Genome-wide variant-specific pleiotropy

We quantified one-directional and, separately, bidirectional variant-specific pleiotropy via the R package ASSET (association analysis based on subsets)⁶⁴. Briefly, ASSET explores all possible subsets of traits for the presence of association signals, resulting in the best combination of traits to maximize the test statistic⁶⁴. ASSET has two procedures: in one, all traits are assumed to be associated with a variant in the same effect direction (one-directional pleiotropy); in the other, variants can be associated with traits in opposite directions (bidirectional pleiotropy)⁶⁴. In the one-directional pleiotropy analysis, an overall P across the selected traits is provided, and in the bidirectional pleiotropy analysis, a P for each direction is provided as well as an overall P for the total association signal for both directions combined. ASSET corrects for the internal multiple testing burden accrued by iterating through all possible trait subsets for each variant as well as controlling for shared samples among the traits⁶⁴.

Genome-wide ASSET analyses were conducted on the union sets of summary statistics for all 18 cancers. Independent variants were determined via LD clumping, where index SNPs were suggestively significant (overall P < 1 × 10⁻⁶), and other SNPs were clumped with the lead variant if they had overall P < 0.05, were within 500 kb of the index SNP, and had r² > 0.1 with the index SNP. While we only considered variants with an overall P < 5 × 10⁻⁸ significant, we used a suggestive significance threshold to comprehensively assess all potentially pleiotropic variants. A SNP was determined to have a one-directional pleiotropic association if the overall P was <1 × 10⁻⁶ and it was associated with at least two cancers. A SNP was determined to have a bidirectional pleiotropic association if the overall P was <1 × 10⁻⁶ and the P for each direction was <0.05. For one- and bidirectional SNPs in LD with each other, the SNP with the smaller overall P was retained. We deconstructed bidirectional associations into cancers with risk-increasing effects and cancers with risk-decreasing effects.
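The decision rules applied to the ASSET output can be written compactly. The Python sketch below encodes only the thresholds stated above; it does not call or reproduce the ASSET R package, and the function and argument names are hypothetical.

```python
def is_one_directional(overall_p, n_cancers, threshold=1e-6):
    """One-directional pleiotropy: suggestive overall P and at least two cancers."""
    return overall_p < threshold and n_cancers >= 2


def is_bidirectional(overall_p, p_increasing, p_decreasing,
                     threshold=1e-6, per_direction=0.05):
    """Bidirectional pleiotropy: suggestive overall P and P < 0.05 in each direction."""
    return (overall_p < threshold
            and p_increasing < per_direction
            and p_decreasing < per_direction)


def is_genome_wide_significant(overall_p, alpha=5e-8):
    """Only variants passing this stricter threshold were considered significant."""
    return overall_p < alpha
```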

To assess whether clinical aspects of the cases could be informative about the pleiotropic variants, for each of the one-directional and bidirectional pleiotropic SNPs, we conducted logistic regression comparing controls to each of the relevant case subtypes described above and calculated Cochran’s Q statistic to test for heterogeneity between estimates across the strata for each clinical variable.

Functional characterization of pleiotropic variants

Functional consequences for the 100 pleiotropic variants identified in the ASSET analysis were obtained from ANNOVAR. Enrichment of functional classes was evaluated using Fisher’s exact test, comparing the distribution observed among the pleiotropic variants to that of all variants with INFO > 0.90 in the reference panel of UKB European descent individuals (16,972,700 SNPs total).

Overall functional significance was assessed using DeepSEA, a deep learning tool that prioritizes functional variants by integrating regulatory binding and ENCODE modification patterns of ~900 cell-factor combinations with evolutionary conservation features. Resulting functional significance scores, ranging from 0 to 1, represent the degree of deviation from a reference distribution of 1000 Genomes variants, with lower scores indicating a higher likelihood of functional significance. We also report CADD scores, which combine over 60 diverse annotations to predict deleteriousness³⁶. CADD scores are transformed into a log10-derived rank score based on the genome-wide distribution of scores for 8.6 billion single-nucleotide variants in GRCh37/hg19 (i.e., CADD = 10 corresponds to top 10% most deleterious substitutions)³⁶.
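The relationship between a scaled CADD score and a variant's rank can be made explicit with a short Python sketch. The function names are hypothetical; the formula is the PHRED-like transformation, in which a score of 10 corresponds to the top 10% and a score of 20 to the top 1% of substitutions.

```python
import math


def cadd_phred(rank, total):
    """Scaled score for the variant ranked `rank` (1 = most deleterious) of `total`."""
    return -10.0 * math.log10(rank / total)


def top_fraction(phred):
    """Fraction of substitutions scoring at least as deleterious as `phred`."""
    return 10.0 ** (-phred / 10.0)
```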

To assess more specific functional features, we annotated each SNP according to Roadmap’s 15-core chromatin states across 127 cell or tissue types³⁵,⁶⁵. Chromatin state was assigned by taking the most common state, with values ≤7 indicating open, accessible chromatin regions. Three-dimensional chromatin interactions were explored to identify significant interaction and enhancer-promoter links. We also explored associations with gene expression using data from the GTEx v8 and BIOS QTL databases. The distribution of functional features among pleiotropic cancer risk variants was compared to a random sample of the same number of SNPs. Chromatin features and BIOS QTL annotations were obtained from the FUMA (Functional Mapping and Annotation) database. Differences in the proportion of variants belonging to each functional class were tested using a two-sample chi-squared test. Lastly, after annotating variants to their nearest gene, we conducted gene-set pathway enrichment analyses using the Kyoto Encyclopedia of Genes and Genomes (KEGG) database⁶⁶ with an FDR q < 0.05 significance threshold.
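Two of the steps above, assigning a consensus chromatin state and comparing proportions between pleiotropic and random variants, can be sketched in Python. The function names are hypothetical, the tie-breaking rule (lowest state number) is an assumption not specified in the text, and the chi-squared test is written without a continuity correction, which is also an assumption.

```python
import math
from collections import Counter


def consensus_state(states):
    """Most common chromatin state across cell or tissue types.

    Ties are broken in favor of the lowest state number (assumption).
    """
    counts = Counter(states)
    top = max(counts.values())
    return min(s for s, c in counts.items() if c == top)


def is_open_chromatin(state):
    """States 1-7 of the 15-state model are treated as open chromatin."""
    return state <= 7


def two_proportion_chi2(x1, n1, x2, n2):
    """Pearson chi-squared test (1 df) comparing x1/n1 with x2/n2.

    Returns (chi2, P). Assumes every expected cell count is positive.
    """
    observed = [[x1, n1 - x1], [x2, n2 - x2]]
    rows = [n1, n2]
    cols = [x1 + x2, (n1 - x1) + (n2 - x2)]
    total = n1 + n2
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = rows[i] * cols[j] / total
            chi2 += (observed[i][j] - expected) ** 2 / expected
    return chi2, math.erfc(math.sqrt(chi2 / 2.0))
```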

Ethics

The study was approved by the University of California and KPNC Institutional Review Boards and the UKB data access committee, and informed consent was obtained from all participants.

Reporting summary

Further information on research design is available in the Nature Research Reporting Summary linked to this article.

Data availability

Our meta-analysis summary statistics are publicly available at https://github.com/Wittelab/pancancer_pleiotropy. The UKB cohort data are publicly available from the UKB access portal at https://www.ukbiobank.ac.uk. The Kaiser Permanente data are available via application with a local collaborator at https://researchbank.kaiserpermanente.org/our-research/for-researchers/. All remaining relevant data are available in the article, supplementary information, or from the corresponding author upon reasonable request.

References

Bray, F. et al. Global cancer statistics 2018: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA Cancer J. Clin. 68, 394–424 (2018).
Mucci, L. A. et al. Familial risk and heritability of cancer among twins in Nordic countries. JAMA 315, 68–76 (2016).
Czene, K., Lichtenstein, P. & Hemminki, K. Environmental and heritable causes of cancer among 9.6 million individuals in the Swedish Family-Cancer Database. Int. J. Cancer 99, 260–266 (2002).
Sampson, J. N. et al. Analysis of heritability and shared heritability based on genome-wide association studies for 13 cancer types. J. Natl Cancer Inst. 107, djv279 (2015).
Lindström, S. et al. Quantifying the genetic correlation between multiple cancer types. Cancer Epidemiol. Biomark. Prev. 26, 1427–1435 (2017).
Jiang, X. et al. Shared heritability and functional enrichment across six solid cancers. Nat. Commun. 10, 431 (2019).
Couch, F. J. et al. Genome-wide association study in BRCA1 mutation carriers identifies novel loci associated with breast and ovarian cancer risk. PLoS Genet. 9, e1003212 (2013).
Eeles, R. A. et al. Identification of 23 new prostate cancer susceptibility loci using the iCOGS custom genotyping array. Nat. Genet. 45, 385–391, 391e1-2 (2013).
Barrett, J. H. et al. Genome-wide association study identifies three new melanoma susceptibility loci. Nat. Genet. 43, 1108–1113 (2011).
Broeks, A. et al. Low penetrance breast cancer susceptibility loci are associated with specific breast tumor subtypes: findings from the Breast Cancer Association Consortium. Hum. Mol. Genet. 20, 3289–3303 (2011).
Ellinghaus, E. et al. Identification of germline susceptibility loci in ETV6-RUNX1-rearranged childhood acute lymphoblastic leukemia. Leukemia 26, 902–909 (2012).
Rothman, N. et al. A multi-stage genome-wide association study of bladder cancer identifies multiple susceptibility loci. Nat. Genet. 42, 978–984 (2010).
Eeles, R. A. et al. Identification of seven new prostate cancer susceptibility loci through a genome-wide association study. Nat. Genet. 41, 1116–1121 (2009).
Michailidou, K. et al. Large-scale genotyping identifies 41 new loci associated with breast cancer risk. Nat. Genet. 45, 353–361 (2013).
Bahrami, A. et al. Genetic susceptibility in cervical cancer: from bench to bedside. J. Cell Physiol. 233, 1929–1939 (2017).
Smedby, K. E. et al. GWAS of follicular lymphoma reveals allelic heterogeneity at 6p21.32 and suggests shared genetic susceptibility with diffuse large B-cell lymphoma. PLoS Genet. 7, e1001378 (2011).
Jin, G. et al. Genetic variants at 6p21.1 and 7p15.3 are associated with risk of multiple cancers in Han Chinese. Am. J. Hum. Genet. 91, 928–934 (2012).
Eeles, R. A. et al. Multiple newly identified loci associated with prostate cancer susceptibility. Nat. Genet. 40, 316–321 (2008).
Purdue, M. P. et al. Genome-wide association study of renal cell carcinoma identifies two susceptibility loci on 2p21 and 11q13.3. Nat. Genet. 43, 60–65 (2011).
Spurdle, A. B. et al. Genome-wide association study identifies a common variant associated with risk of endometrial cancer. Nat. Genet. 43, 451–454 (2011).
Couch, F. J. et al. Common variants at the 19p13.1 and ZNF365 loci are associated with ER subtypes of breast cancer and ovarian cancer risk in BRCA1 and BRCA2 mutation carriers. Cancer Epidemiol. Biomark. Prev. 21, 645–657 (2012).
Setiawan, V. W. et al. Cross-cancer pleiotropic analysis of endometrial cancer: PAGE and E2C2 consortia. Carcinogenesis 35, 2068–2073 (2014).
Rafnar, T. et al. Sequence variants at the TERT-CLPTM1L locus associate with many cancer types. Nat. Genet. 41, 221–227 (2009).
Cheng, I. et al. Pleiotropic effects of genetic risk variants for other cancers on colorectal cancer risk: PAGE, GECCO and CCFR consortia. Gut 63, 800–807 (2014).
Jones, C. C. et al. Cross-cancer pleiotropic associations with lung cancer risk in African Americans. Cancer Epidemiol. Biomark. Prev. 28, 715–723 (2019).
Hung, R. J. et al. Cross cancer genomic investigation of inflammation pathway for five common cancers: lung, ovary, prostate, breast, and colorectal cancer. J. Natl. Cancer Inst. 107, djv246 (2015).
Qian, D. C. et al. Identification of shared and unique susceptibility pathways among cancers of the lung, breast, and prostate from genome-wide association studies and tissue-specific protein interactions. Hum. Mol. Genet. 24, 7406–7420 (2015).
Fehringer, G. et al. Cross-cancer genome-wide analysis of lung, ovary, breast, prostate, and colorectal cancer reveals novel pleiotropic associations. Cancer Res. 76, 5103–5114 (2016).
Toth, R. et al. Genetic variants in epigenetic pathways and risks of multiple cancers in the GAME-ON consortium. Cancer Epidemiol. Biomark. Prev. 26, 816–825 (2017).
Karami, S. et al. Telomere structure and maintenance gene variants and risk of five cancer types. Int. J. Cancer 139, 2655–2670 (2016).
Broderick, P. et al. A genome-wide association study shows that common alleles of SMAD7 influence colorectal cancer risk. Nat. Genet. 39, 1315–1317 (2007).
Kote-Jarai, Z. et al. Identification of a novel prostate cancer susceptibility variant in the KLK3 gene transcript. Hum. Genet. 129, 687–694 (2011).
Parikh, H. et al. Fine mapping the KLK3 locus on chromosome 19q13.33 associated with prostate cancer susceptibility and PSA levels. Hum. Genet. 129, 675–685 (2011).
Zhou, J. & Troyanskaya, O. G. Predicting effects of noncoding variants with deep learning-based sequence model. Nat. Methods 12, 931–934 (2015).
Watanabe, K., Taskesen, E., van Bochoven, A. & Posthuma, D. Functional mapping and annotation of genetic associations with FUMA. Nat. Commun. 8, 1826 (2017).
Rentzsch, P., Witten, D., Cooper, G. M., Shendure, J. & Kircher, M. CADD: predicting the deleteriousness of variants throughout the human genome. Nucleic Acids Res. 47, D886–D894 (2019).
Han, J. et al. Expression quantitative trait loci in long non-coding RNA PAX8-AS1 are associated with decreased risk of cervical cancer. Mol. Genet. Genomics 291, 1743–1748 (2016).
Kichaev, G. et al. Leveraging polygenic functional enrichment to improve GWAS power. Am. J. Hum. Genet. 104, 65–75 (2019).
Michailidou, K. et al. Association analysis identifies 65 new breast cancer risk loci. Nature 551, 92–94 (2017).
Phelan, C. M. et al. Identification of 12 new susceptibility loci for different histotypes of epithelial ovarian cancer. Nat. Genet. 49, 680–691 (2017).
Graff, R. E. et al. Familial risk and heritability of colorectal cancer in the Nordic Twin Study of Cancer. Clin. Gastroenterol. Hepatol. 15, 1256–1264 (2017).
Hemminki, K. & Chen, B. Familial risks in testicular cancer as aetiological clues. Int. J. Androl. 29, 205–210 (2006).
Zhang, L. et al. Familial associations in testicular cancer with other cancers. Sci. Rep. 8, 10880 (2018).
Wang, S. S. et al. HLA Class I and II diversity contributes to the etiologic heterogeneity of non-Hodgkin lymphoma subtypes. Cancer Res. 78, 4086–4096 (2018).
Ferreiro-Iglesias, A. et al. Fine mapping of MHC region in lung cancer highlights independent susceptibility loci by ethnicity. Nat. Commun. 9, 3927 (2018).
Marty, R. et al. MHC-I genotype restricts the oncogenic mutational landscape. Cell 171, 1272–1283.e15 (2017).
Marty Pyke, R. et al. Evolutionary pressure against MHC class II binding cancer mutations. Cell 175, 416–428.e13 (2018).
Grisanzio, C. & Freedman, M. L. Chromosome 8q24-associated cancers and MYC. Genes Cancer 1, 555–559 (2010).
Finucane, H. K. et al. Partitioning heritability by functional annotation using genome-wide association summary statistics. Nat. Genet. 47, 1228–1235 (2015).
Favia, A. et al. VEGF-induced neoangiogenesis is mediated by NAADP and two-pore channel-2-dependent Ca2+ signaling. Proc. Natl Acad. Sci. USA 111, E4706–E4715 (2014).
Sun, W. & Yue, J. TPC2 mediates autophagy progression and extracellular vesicle secretion in cancer cells. Exp. Cell Res. 370, 478–489 (2018).
Meng, H. et al. PACSIN 2 represses cellular migration through direct association with cyclin D1 but not its alternate splice form cyclin D1b. Cell Cycle 10, 73–81 (2011).
Zhernakova, D. V. et al. Identification of context-dependent expression quantitative trait loci in whole blood. Nat. Genet. 49, 139–145 (2017).
Okines, A. F. C. et al. Biomarker analysis in oesophagogastric cancer: results from the REAL3 and TransMAGIC trials. Eur. J. Cancer Oxf. Engl. 49, 2116–2125 (2013).
Barra, W. F. et al. GEJ cancers: gastric or esophageal tumors? searching for the answer according to molecular identity. Oncotarget 8, 104286–104294 (2017).
Cancer Genome Atlas Research Network et al. Integrated genomic characterization of oesophageal carcinoma. Nature 541, 169–175 (2017).
Bycroft, C. et al. The UK Biobank resource with deep phenotyping and genomic data. Nature 562, 203–209 (2018).
Banda, Y. et al. Characterizing race/ethnicity and genetic ancestry for 100,000 subjects in the Genetic Epidemiology Research on Adult Health and Aging (GERA) Cohort. Genetics 200, 1285–1295 (2015).
Kvale, M. N. et al. Genotyping informatics and quality control for 100,000 subjects in the Genetic Epidemiology Research on Adult Health and Aging (GERA) cohort. Genetics 200, 1051–1060 (2015).
Site Recode ICD-O-3/WHO 2008 Definition. https://seer.cancer.gov/siterecode/icdo3_dwhoheme/index.html. Accessed 30, 2017.
Hoffmann, T. J. et al. Next generation genome-wide association tool: design and coverage of a high-throughput European-optimized SNP array. Genomics 98, 79–89 (2011).
Schumacher, F. R. et al. Association analyses of more than 140,000 men identify 63 new prostate cancer susceptibility loci. Nat. Genet. 50, 928–936 (2018).
Howlader, N. et al. SEER Cancer Statistics Review, 1975-2014, National Cancer Institute. Bethesda, MD, https://seer.cancer.gov/csr/1975_2014/, based on November 2016 SEER data submission, posted to the SEER web site, April 2017.
Bhattacharjee, S. et al. A subset-based approach improves power and interpretation for the combined analysis of genetic association studies of heterogeneous traits. Am. J. Hum. Genet. 90, 821–835 (2012).
Roadmap Epigenomics Consortium. et al. Integrative analysis of 111 reference human epigenomes. Nature 518, 317–330 (2015).
Kanehisa, M., Sato, Y., Furumichi, M., Morishima, K. & Tanabe, M. New approach for understanding genome variations in KEGG. Nucleic Acids Res. 47, D590–D595 (2019).


Acknowledgements

This research was supported by the following National Institutes of Health grants: R01CA088164, R01CA201358, R25CA112355, K07CA188142, K24CA169004, and U01CA127298, and the UCSF Goldberg-Benioff Program in Cancer Translational Biology. The UK Biobank analyses were conducted using the UKB Resource under application number 14105. Support for participant enrollment, survey completion, and biospecimen collection for the RPGEH was provided by the Robert Wood Johnson Foundation, the Wayne and Gladys Valley Foundation, the Ellison Medical Foundation, and Kaiser Permanente national and regional community benefit programs. Genotyping of the GERA cohort was funded by a grant from the National Institute on Aging, the National Institute of Mental Health, and the NIH Common Fund (RC2 AG036607). We thank the Breast Cancer Association Consortium (BCAC) for breast cancer summary statistics (http://bcac.ccge.medschl.cam.ac.uk/bcacdata/oncoarray/gwas-icogs-and-oncoarray-summary-results/), the Ovarian Cancer Association Consortium (OCAC) for ovarian cancer summary statistics (http://ocac.ccge.medschl.cam.ac.uk/data-projects/results-lookup-by-region/), and the Prostate Cancer Association Group to Investigate Cancer Associated Alterations in the Genome (PRACTICAL) consortium for prostate cancer summary statistics (http://practical.icr.ac.uk/blog/?page_id=8164). We thank Drs. Jeremy Gollub and Anuradha Mittal at Thermo Fisher Scientific for providing information on the correlation between imputed and sequenced genotypes on the Axiom UKBiobank array.

Author information

These authors contributed equally: Sara R. Rashkin, Rebecca E. Graff.
These authors jointly supervised this work: Lori C. Sakoda, John S. Witte.

Authors and Affiliations

Department of Epidemiology and Biostatistics, University of California, San Francisco, San Francisco, CA, USA
Sara R. Rashkin, Rebecca E. Graff, Linda Kachuri, Taylor B. Cavazos, Nima C. Emami, Joshua D. Hoffman, Travis J. Meyers, Thomas J. Hoffmann & John S. Witte
Division of Research, Kaiser Permanente Northern California, Oakland, CA, USA
Rebecca E. Graff, Khanh K. Thai, Stacey E. Alexeeff, Maruta A. Blatchins, Douglas A. Corley, Eric Jorgenson, Lawrence H. Kushi, Stephen K. Van Den Eeden, Laurel A. Habel, Thomas J. Hoffmann & Lori C. Sakoda
Program in Biological and Medical Informatics, University of California, San Francisco, San Francisco, CA, USA
Taylor B. Cavazos & Nima C. Emami
Department of Urology, University of California, San Francisco, San Francisco, CA, USA
Stephen K. Van Den Eeden & John S. Witte
Institute for Human Genetics, University of California, San Francisco, San Francisco, CA, USA
Elad Ziv, Thomas J. Hoffmann & John S. Witte
Department of Medicine, University of California, San Francisco, San Francisco, CA, USA
Elad Ziv
Helen Diller Family Comprehensive Cancer Center, University of California, San Francisco, San Francisco, CA, USA
Elad Ziv & John S. Witte

Authors

Sara R. Rashkin
Rebecca E. Graff
Linda Kachuri
Khanh K. Thai
Stacey E. Alexeeff
Maruta A. Blatchins
Taylor B. Cavazos
Douglas A. Corley
Nima C. Emami
Joshua D. Hoffman
Eric Jorgenson
Lawrence H. Kushi
Travis J. Meyers
Stephen K. Van Den Eeden
Elad Ziv
Laurel A. Habel
Thomas J. Hoffmann
Lori C. Sakoda
John S. Witte

Contributions

S.R.R. and R.E.G. designed the presented idea, conducted the analyses, and wrote the manuscript. L.K. contributed to writing the manuscript. K.K.T. contributed by conducting analyses. S.E.A., M.A.B., T.B.C., D.A.C., N.C.E., J.D.H., E.J., L.H.K., T.J.M., S.K.V.D.E., E.Z., L.A.H., and T.J.H. aided in data acquisition, provided critical feedback, and helped shape the research, analysis, and manuscript. L.C.S. and J.S.W. contributed to study conception and design, supervised the project, and contributed to writing the manuscript.

Corresponding authors

Correspondence to Lori C. Sakoda or John S. Witte.

Ethics declarations

Competing interests

J.S.W. is a non-employee co-founder of Avail.bio and serves as an expert witness for Pfizer and Sanofi. All other authors declare no competing interests.

Additional information

Peer review information: Nature Communications thanks Angela Cox, Marza de Andrade, and the other, anonymous reviewer(s) for their contribution to the peer review of this work.

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Information

Descriptions of Additional Supplementary Files

Supplementary Data 1

Supplementary Data 2

Supplementary Data 3

Supplementary Data 4

Supplementary Data 5

Supplementary Data 6

Supplementary Data 7

Supplementary Data 8

Reporting Summary

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Rashkin, S.R., Graff, R.E., Kachuri, L. et al. Pan-cancer study detects genetic risk variants and shared genetic basis in two large cohorts. Nat Commun 11, 4423 (2020). https://doi.org/10.1038/s41467-020-18246-6

Received: 02 January 2020
Accepted: 13 August 2020
Published: 04 September 2020
DOI: https://doi.org/10.1038/s41467-020-18246-6
Springer Nature Limited
