Introduction

Breast cancer is known to have both genetic and non-genetic etiology. Several common genetic susceptibility variants have recently been identified, predominantly by genome-wide association studies (GWAS). These include single nucleotide polymorphisms (SNPs) at loci containing the genes FGFR2, LSP1, MAP3K1, TOX3, MRPS30, COX11 and SLC4A7, and at chromosomes 8q24 and 2q35 [1–5]. To date, the only SNP identified through a candidate gene approach that is associated with breast cancer risk at genome-wide statistical significance (P < 10⁻⁷) is in CASP8 [6]; more equivocal evidence has been reported for SNPs in TGFB1 [6] and ESR1 [7], among others.

It is important to determine how these common SNPs combine with other known risk factors such as age at menarche, parity, age at first birth and body mass index (BMI) [8, 9] to influence breast cancer risk because this knowledge could be used to improve risk prediction models [10, 11]. The identification of modification of SNP associations by other risk factors could also provide insight into the biological mechanisms by which genetic variants are implicated in breast cancer etiology. Many of these SNPs and other risk factors have been observed to be differentially associated with estrogen receptor (ER)-positive and ER-negative disease [1, 4, 5, 7, 12, 13] and so interactions between them may also differ by disease subtype.

We therefore aimed to assess effect modification for 12 SNPs, 10 of which have been clearly associated with breast cancer risk (10q26-rs2981582 (FGFR2), 8q24-rs13281615, 11p15-rs3817198 (LSP1), 5q11-rs889312 (MAP3K1), 16q12-rs3803662 (TOX3), 2q35-rs13387042, 5p12-rs10941679, 17q23-rs6504950, 3p24-rs4973768 and CASP8-rs17468277) and two for which there is less clear evidence of a main effect (TGFB1-rs1982073 and ESR1-rs3020314). The potential effect modifiers considered were age at menarche, ever having had a live birth, number of live births, age at first birth and BMI. A secondary aim was to evaluate these interactions in susceptibility to breast cancer subtypes defined by ER and progesterone receptor (PR) status. Data for white women of European ancestry were combined from 21 case-control studies participating in the Breast Cancer Association Consortium (BCAC).

Materials and methods

A description of the 21 case-control studies participating in this pooled BCAC analysis is provided in Table 1, with more detailed information given in Additional Data Table S1 in Additional file 1. These included 11 population-based studies and seven studies with at least 1,000 cases and 1,000 controls. All studies collected self-reported information for cases and controls on age at diagnosis (cases) or interview (controls), racial/ethnic group (white European, Asian or other) and at least one of the following: age at menarche, ever having had a live birth, number of live births, age at first live birth (if parous) and BMI (or height and weight). The time-point at which these variables were assessed for each study is detailed in Additional Data Table S1 in Additional file 1. Information on additional risk factors and other lifestyle factors was not available at the time of the present analysis. All studies used structured questionnaires to collect these data, with the exception of the CNIO-BCS and the LMBC study, for which the information was abstracted from medical records. Nineteen studies also provided information on the ER and PR status of the tumors for a subset of cases. This information was mostly abstracted from medical records. Subjects who reported an ethnicity other than white European were excluded, as were cases with non-invasive disease. All study participants gave written informed consent and each study was approved by the relevant local institutional review board(s).

Table 1 List of participating studies and number of subjects included in at least one analysis

Genotyping methods have been previously described [1, 6, 7, 12, 14]. Briefly, five studies (ABCFS, GENICA, kConFab/AOCS, MARIE and SASBAC) used Sequenom's MassARRAY system and iPLEX technology (Sequenom, San Diego, CA, USA) for most SNPs. All other genotyping was done using TaqMan® Assays-by-Design℠ (Applied Biosystems, Foster City, CA, USA). SNP CASP8-rs17468277 is in complete linkage disequilibrium with CASP8-rs1045485, which has previously been reported to be associated with breast cancer [6]. All studies included at least one blank well (containing no DNA) per 384-well assay plate, at least 2% of samples in duplicate, and a common set of 93 samples from the Centre d'Etude du Polymorphisme Humain (CEPH) used by the HapMap Consortium (HAPMAPPT01, Coriell Institute for Medical Research, Camden, NJ, USA). Genotyping call rates and duplicate concordance rates were calculated after excluding samples that had previously failed repeatedly; all were greater than 95%. Concordance with CEPH genotypes was greater than 98%.

Statistical methods

Overall genetic associations were evaluated for each of the 12 SNPs by estimating odds ratios (ORs) and their 95% confidence intervals (CI) via logistic regression, assuming multiplicative per-allele effects for the risk allele, as first reported in the literature (see Table 2). Main effects of risk factors were assessed only in the 11 population-based studies using logistic regression, adjusted for age (categorical: ≤34, 35 to 39, 40 to 44, 45 to 49, 50 to 54, 55 to 59, 60 to 64, 65 to 69, 70 to 74, ≥75 years; and continuous, the latter to account for differences between cases and controls in the extreme age-groups) and study (categorical). Risk factors considered were age at menarche (categorical: ≤11, 12, 13, 14, ≥15 years; and continuous), ever having had a live birth (no, yes), number of live births (parous women only, categorical: 1, 2, 3, ≥4; and continuous), age at first birth (parous women only, categorical: ≤19, 20 to 24, 25 to 29, ≥30 years; and continuous) and BMI, defined as weight in kilograms divided by the square of height in meters (categorical: ≤24.99, 25.00 to 29.99, ≥30.00 kg/m²; and continuous). Since BMI is known to be positively associated with breast cancer risk in postmenopausal women, but inversely associated with risk in premenopausal women [9], we analyzed the interactions with BMI separately for women aged <55 years and ≥55 years, considering these as surrogates for pre- and post-menopausal status, respectively. Results from analyses using a younger age limit (50 years) to define the surrogate category for premenopausal status were similar and are therefore not presented. Estimates of per-allele ORs for SNPs stratified by risk factors (for the categories defined above) were obtained using a single logistic regression model including appropriate dummy variables, in addition to those for the main effects of the risk factor categories.
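The BMI definition and the age-based surrogate for menopausal status described above can be illustrated with a minimal sketch. The function names are hypothetical and are not taken from the study's analysis code. The cut-points at 25 and 30 kg/m² are one reading of the published categories (≤24.99, 25.00 to 29.99, ≥30.00), assuming BMI was recorded to two decimal places.

```python
def bmi(weight_kg: float, height_m: float) -> float:
    """Body mass index: weight in kilograms divided by the square of height in meters."""
    if weight_kg <= 0 or height_m <= 0:
        raise ValueError("weight and height must be positive")
    return weight_kg / height_m ** 2


def bmi_category(value: float) -> str:
    """Assign one of the three BMI categories (assumed cut-points: 25 and 30 kg/m^2)."""
    if value < 25.0:
        return "<25"
    if value < 30.0:
        return "25 to <30"
    return ">=30"


def menopausal_surrogate(age_years: float) -> str:
    """Age-based surrogate used for the BMI analyses: <55 vs >=55 years."""
    return "pre" if age_years < 55 else "post"


# Example subject: 70 kg, 1.75 m, aged 48 years at interview (made-up values).
value = bmi(70.0, 1.75)
print(round(value, 2), bmi_category(value), menopausal_surrogate(48))
```

Keeping the categorization in one place makes it straightforward to repeat the analysis with the alternative age limit of 50 years mentioned in the text.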

Table 2 Estimated per-allele odds ratios and 95% confidence intervals for 12 SNPs, by availability of non-genetic risk factor information*

Interaction, or modification of genetic associations by other risk factors, was assessed for each SNP/risk factor combination by fitting logistic regression models. Each model included dummy variables for study plus three parameters, one for the main per-risk-allele effect, one for the main risk factor effect (all modeled as continuous variables, except ever having had a live birth) and a single interaction term for the product of the number of risk alleles and the value of the risk factor. This was tested statistically by a likelihood ratio test comparing this model to that without the interaction term. Effect modification by BMI was assessed separately for women <55 and ≥55 years of age.
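As an illustration of the test described above, the following sketch fits the main-effects and interaction models by Newton-Raphson and compares them with a one-degree-of-freedom likelihood ratio test. It is a simplified stand-in rather than the Stata code used in the study: study dummy variables are omitted (a single study is assumed), the data are simulated, and all function names and simulation parameters are hypothetical.

```python
import math
import random


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def fit_logistic(X, y, iterations=20):
    """Fit a logistic regression by Newton-Raphson; return (coefficients, log-likelihood)."""
    p = len(X[0])
    beta = [0.0] * p
    for _ in range(iterations):
        grad = [0.0] * p
        hess = [[0.0] * p for _ in range(p)]
        for xi, yi in zip(X, y):
            mu = 1.0 / (1.0 + math.exp(-sum(b * v for b, v in zip(beta, xi))))
            w = mu * (1.0 - mu)
            for j in range(p):
                grad[j] += (yi - mu) * xi[j]
                for k in range(p):
                    hess[j][k] += w * xi[j] * xi[k]
        beta = [b + s for b, s in zip(beta, solve(hess, grad))]
    loglik = 0.0
    for xi, yi in zip(X, y):
        eta = sum(b * v for b, v in zip(beta, xi))
        loglik += yi * eta - math.log1p(math.exp(eta))
    return beta, loglik


def interaction_lrt(alleles, factor, status):
    """Return (interaction OR, LRT statistic, P-value) for the SNP x risk factor product term."""
    main = [[1.0, float(g), float(f)] for g, f in zip(alleles, factor)]
    full = [row + [row[1] * row[2]] for row in main]
    _, ll_main = fit_logistic(main, status)
    beta_full, ll_full = fit_logistic(full, status)
    stat = max(0.0, 2.0 * (ll_full - ll_main))
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return math.exp(beta_full[3]), stat, p_value


# Simulated data (illustrative values only, not study data).
rng = random.Random(0)
alleles, births, status = [], [], []
for _ in range(1000):
    g = int(rng.random() < 0.3) + int(rng.random() < 0.3)  # number of risk alleles
    f = rng.randint(1, 4)                                  # number of live births
    eta = -0.3 + 0.08 * g - 0.10 * f + 0.05 * g * f
    alleles.append(g)
    births.append(f)
    status.append(1 if rng.random() < 1.0 / (1.0 + math.exp(-eta)) else 0)

print(interaction_lrt(alleles, births, status))
```

In a pooled analysis such as this one, the design matrices would also carry one indicator column per study, which leaves the form of the test unchanged.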

In addition, a parametric bootstrap test was used to estimate interaction P-values adjusted for multiple testing [15]. For each of the 72 interactions tested, we estimated the probability of being a case for each subject under the null hypothesis of no interaction, by applying the logistic regression model including only main effects for study (categorical), SNP (per-allele) and risk factor (continuous, except ever having had a live birth). Each replicate of the parametric bootstrap consisted of, for each interaction tested: (i) generating a dummy case-control status for each subject by sampling from a binomial distribution based on the estimated probability of being a case (by generating a single random number from the uniform distribution and assigning "case" to subjects for which this was less than the probability of being a case and "control" otherwise); and (ii) based on this dummy case-control status and the actual data for all other variables, fitting the interaction model described above and noting the likelihood ratio test P-value for the comparison of this model to the main effects only model applied to the same data. The minimum P-value across the 72 tests was recorded for each of 10,000 replicates and each adjusted P-value was estimated as the proportion of replicate minimum P-values less than the corresponding unadjusted P-value. Results rounded to two decimal places were identical to those obtained using a standard non-parametric permutation test [15].
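The bootstrap procedure described above can be sketched as follows. This is an outline under stated assumptions, not the authors' implementation: the null case probabilities are taken as already estimated, the model fitting in step (ii) is delegated to a hypothetical callback `interaction_p_value`, and all function names are illustrative.

```python
import random


def simulate_status(case_probabilities, rng):
    """Step (i): assign "case" (1) when a uniform random number is below the
    subject's estimated null probability of being a case, else "control" (0)."""
    return [1 if rng.random() < p else 0 for p in case_probabilities]


def bootstrap_minimum_p(null_probabilities, interaction_p_value, n_replicates, rng):
    """Record the minimum interaction P-value over all tests in each replicate.

    null_probabilities: one list of per-subject case probabilities per interaction test.
    interaction_p_value: hypothetical callback (test_index, dummy_status) -> LRT P-value,
    standing in for step (ii), the refit of the interaction and main-effects models.
    """
    minima = []
    for _ in range(n_replicates):
        p_values = [
            interaction_p_value(i, simulate_status(probs, rng))
            for i, probs in enumerate(null_probabilities)
        ]
        minima.append(min(p_values))
    return minima


def adjusted_p_values(unadjusted, replicate_minima):
    """Adjusted P-value: proportion of replicate minimum P-values below the unadjusted one."""
    n = len(replicate_minima)
    return [sum(m < p for m in replicate_minima) / n for p in unadjusted]


# Toy illustration with made-up replicate minima (not study results).
print(adjusted_p_values([0.002, 0.5], [0.001, 0.01, 0.2, 0.6]))
```

Because each replicate contributes only its smallest P-value, the adjustment accounts for all tests simultaneously without assuming that they are independent.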

All statistical analyses were carried out using Stata: Release 10 (StataCorp, College Station, TX, USA) [16], with the exception of power calculations, which were done using Quanto (University of Southern California, Los Angeles, CA, USA) [17, 18].

Results

The 21 participating studies contributed 26,349 cases and 32,208 controls of self-reported white European race/ethnicity, all with available data for at least one of the 12 SNPs considered and at least one of the other risk factors considered (minimal data). Of these, 17,603 cases from 18 studies (all except BBCS, MCCS and USRT) were interviewed within two years after their breast cancer diagnosis and 29,187 controls came from the same 18 studies. Forty-six percent of cases and 38% of controls were under age 55 years at diagnosis and interview, respectively. ER and PR status was known for 19,561 and 16,962 cases, respectively. Details by study are provided in Table 1. In total, 12,822 cases and 19,703 controls with minimal data were included from 11 population-based studies and 16,107 cases and 23,140 controls with minimal data were included from seven studies with at least 1,000 cases and 1,000 controls.

When analyses were restricted to population-based studies, the expected associations with breast cancer were observed for the risk factors, with one exception. After adjustment for age and study, each one-year increase in age at menarche was associated with a 4% (95% CI = 2 to 5%) decrease in breast cancer risk, and being parous was associated with a 16% (95% CI = 10 to 22%) decreased risk. For parous women, each additional live birth was associated with an 11% (95% CI = 8 to 13%) decrease in risk, while each five-year increment in age at first birth was associated with a 7% (95% CI = 4 to 10%) increase in risk. Obesity (BMI ≥30.0 kg/m²) was associated with a 20% (95% CI = 10 to 29%) lower risk of breast cancer for women under age 55 years. The one unexpected observation was that obesity was not associated with breast cancer risk in women aged 55 years and older (OR = 0.96, 95% CI = 0.88 to 1.04).

Table 2 provides estimated per-allele ORs and their 95% CIs for the 12 SNPs considered, for all included subjects with genotype data, and for the subsets of women with information available for each of the four risk factors considered. All ORs were adjusted for study, and each subset was adjusted for study, age and the relevant risk factor. The OR estimates in the overall and subset analyses were very similar, and provide no evidence of confounding by the risk factors, nor of bias in OR estimates related to data availability.

For the vast majority of SNP/risk factor combinations, there was no evidence that the per-allele OR for the SNP varied by category of the risk factor. This was true for analyses based on data from all studies (Additional Data Table S2 in Additional file 1), for analyses based on population-based studies only (Additional Data Table S3 in Additional file 1) and for analyses based on the seven studies with at least 1,000 cases and 1,000 controls (Additional Data Table S4 in Additional file 1). Restricting analyses to the 18 studies with cases interviewed within two years after their breast cancer diagnosis made no substantial difference to the results obtained (data not shown). Similarly null results were observed for analyses restricted to ER-positive and ER-negative breast cancer (Additional Data Tables S5 and S6 in Additional file 1) and for analyses restricted to PR-positive and PR-negative breast cancer (Additional Data Tables S7 and S8 in Additional file 1).

The strongest evidence of interaction (unadjusted P = 0.002) was for the modification of the association with 11p15-rs3817198 (LSP1) by number of live births. Per-allele OR estimates increased from 1.04 (95% CI = 0.97 to 1.11) for women who had had just one live birth to 1.24 (95% CI = 1.11 to 1.38) for women with at least four live births, and an interaction OR of 1.05 per live birth and per allele was estimated. This trend was also observed when data from only population-based studies and from only studies with at least 1,000 cases and 1,000 controls were considered (P = 0.01 in both sub-analyses). Evidence for this interaction was observed when the analysis was restricted to ER-positive and PR-positive disease (P = 0.004 and P = 0.01, respectively; Figure 1), but not for analyses based on ER-negative and PR-negative cases (P = 0.3 and P = 0.06, respectively). However, considering that 72 tests for interaction were carried out, chance cannot be excluded as an explanation for these results. The multiple-testing-adjusted P-value for the modification of the 11p15-rs3817198 association by number of live births was 0.12. The adjusted P-values for all other interactions tested were ≥0.61.

Figure 1 Per-allele OR estimates for 11p15-rs3817198 (LSP1) stratified by number of live births (parous women only), for breast cancer subtypes defined by estrogen receptor (ER) and progesterone receptor (PR) status. The size of the box is inversely proportional to the standard error of the log OR estimate.

Post-hoc power calculations estimated that for age at menarche (per year), parity (per live birth) and age at first birth (per five-year age increase), our study had 90% power at a significance level of 0.0007 (corresponding to a multiple-testing-adjusted P-value of 0.05) to detect interaction ORs of at least 1.06 for all loci tested except CASP8-rs17468277, for which the minimum was 1.08. For BMI (per five-unit increase) the minimum interaction OR detectable with 90% power in both age strata (<55 and ≥55) was 1.08 for the more common variants and 1.10 for CASP8-rs17468277. For parity, considered as never or ever having had a live birth, the study had similar power to detect interaction ORs of at least 1.20 for CASP8-rs17468277 and 1.16 for the remaining loci.
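The significance level of 0.0007 quoted above can be compared with what simple multiple-testing corrections would give for 72 tests. The sketch below is only a plausibility check: the source does not state how the level was derived, and the decomposition of the 72 tests into 12 SNPs by six risk-factor analyses (with BMI counted once per age stratum) is an assumption.

```python
# Assumed decomposition of the 72 tests: 12 SNPs x 6 risk-factor analyses
# (age at menarche, ever parous, number of live births, age at first birth,
# BMI at <55 years, BMI at >=55 years).
n_snps = 12
n_risk_factor_analyses = 6
n_tests = n_snps * n_risk_factor_analyses

alpha = 0.05  # target multiple-testing-adjusted significance level

bonferroni_level = alpha / n_tests                 # Bonferroni correction
sidak_level = 1.0 - (1.0 - alpha) ** (1.0 / n_tests)  # Sidak correction (independent tests)

print(n_tests, round(bonferroni_level, 4), round(sidak_level, 4))
```

Both corrections round to 0.0007, in line with the level used for the power calculations.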

Discussion

This combined analysis of more than 25,000 cases and 30,000 controls found no conclusive evidence that age at menarche, parity, age at first birth or BMI modify the established associations of breast cancer risk with 10q26-rs2981582 (FGFR2), 8q24-rs13281615, 11p15-rs3817198 (LSP1), 5q11-rs889312 (MAP3K1), 16q12-rs3803662 (TOX3), 2q35-rs13387042, 5p12-rs10941679, 17q23-rs6504950, 3p24-rs4973768 and CASP8-rs17468277, nor the putative associations with TGFB1-rs1982073 or ESR1-rs3020314. This was also true for disease subtypes defined by ER and PR status.

The strongest evidence of effect modification was for number of live births and 11p15-rs3817198 (LSP1). However, the observed trend of increasing relative risk with increasing parity was not statistically significant after correction for multiple testing. It should be noted that the interaction OR was 1.05 per allele and per live birth. This corresponds to an estimated per-allele OR increasing from 1.04 for women with one child to 1.24 for women with four or more children, for a SNP with an estimated average OR of 1.08 across all levels of parity. Such weak interactions would only result in very small differences in estimates of joint effects relative to those from models assuming multiplicative effects. This finding in this very large study highlights the difficulty of identifying modifying effects of this magnitude.
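The size of this interaction can be made concrete with a short calculation. The sketch below assumes a log-linear trend anchored at the estimate for women with one live birth, which is a simplification: the published stratified estimates came from a categorical model, and the highest category pools women with four or more births. The function name is hypothetical.

```python
def implied_per_allele_or(or_at_one_birth: float, interaction_or: float, live_births: int) -> float:
    """Per-allele OR implied by a multiplicative interaction per additional live birth."""
    return or_at_one_birth * interaction_or ** (live_births - 1)


# Reported values: per-allele OR 1.04 at one live birth, interaction OR 1.05.
for births in (1, 2, 3, 4):
    print(births, round(implied_per_allele_or(1.04, 1.05, births), 2))
```

Under these assumptions the implied per-allele OR rises only from 1.04 to about 1.20 across one to four births, which illustrates how small the departure from a purely multiplicative model is.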

A recent study by Travis et al. of 7,610 cases and 10,196 controls reported null results for interactions in breast cancer susceptibility between nine of the same genetic loci and 10 risk factors, including age at menarche, parity, age at first birth and BMI [19]. Our null findings replicate the results from that prospective study of older women (over age 50 years), with more than twice the sample size in this age group, and indicate that they also apply to women under age 50 years. Our study also extends the genetic loci evaluated for interactions with a subset of established breast cancer risk factors to include 17q23-rs6504950 and 3p24-rs4973768 [1] and ESR1-rs3020314 [7], which were not considered by Travis et al. [19]. Furthermore, with regard to the susceptibility locus at 5p12, we considered the more strongly associated SNP rs10941679 rather than rs981782 (which is only weakly correlated with rs10941679) [5]. Of note, Travis et al. found no evidence of interaction between 11p15-rs3817198 (LSP1) and number of children (P = 0.9) [19].

One of the strengths of the BCAC is the large combined sample size achieved through international collaboration. This has proven to be very effective in confirming or ruling out association with breast cancer for common SNPs identified through GWAS and candidate gene studies [1, 2, 6, 14, 20, 21]. The BCAC has also been able to provide highly precise estimates of the ORs associated with susceptibility alleles, with very high consistency observed between the many studies that participate in the consortium, despite the range of study designs represented. The inclusion of multiple studies that recruited selected cases and/or volunteer controls means that the main effects for some risk factors cannot be appropriately evaluated across the whole consortium. However, this potential selection bias in estimating main effects should not influence the assessment of interactions [22]. Nevertheless, we carried out sensitivity analyses considering only data from population-based studies and only data from studies with at least 1,000 cases and 1,000 controls and observed no substantial change in the results obtained regarding interactions. We also performed analyses of the full dataset, allowing for between-study heterogeneity in the main effects for the risk factors by including interaction terms for each, and similarly observed that this did not influence the results obtained (data not shown).

A potential limitation of our study derives from heterogeneity in data collection methods across studies. All studies except two (neither population-based) used structured questionnaires administered by a variety of means, including in-person interviews, phone interviews and self-administration. Nevertheless, the measurement of age at menarche, ever having had a live birth, number of live births and age at first birth seems likely to be robust to these differences in data collection method. Our results for BMI may be more likely to be affected by heterogeneity in data collection methods, although standardized measurement within studies and adjustment for study as a covariate should limit this to a loss of power, rather than any systematic bias. We repeated our primary analyses excluding cases interviewed before, or more than two years after, their breast cancer diagnosis and the results were not substantially different. This suggests that between-study differences in the reference time at which BMI was reported did not influence the inference from our study. A further limitation of our study was that we did not collate information on hormone therapy (HT) use from the majority of participating studies and so were unable to evaluate interactions between SNPs and BMI by HT use in older women. This requires further investigation because HT has been observed to modify the effect of obesity on post-menopausal breast cancer risk [23]. Since menopausal status was not assessed and/or derived uniformly across all studies, we used age as a surrogate to more appropriately stratify analyses of effect modification by BMI. Finally, the present study had limited statistical power to detect interactions in susceptibility to ER-negative and PR-negative disease.

Conclusions

In summary, in the largest collaborative analysis of gene-environment interactions carried out to date, we have observed no conclusive evidence for modification of the per-allele relative risk associated with common breast cancer susceptibility variants by age at menarche, parity, age at first birth or BMI. This finding is consistent with those from a recently published smaller prospective study. These results imply that the combined effects of these common susceptibility alleles and other established risk factors can be assumed to be multiplicative in risk prediction models for breast cancer.