Abstract
Purpose
Few breast cancer risk assessment models account for the risk profiles of different tumor subtypes. This study evaluated whether a subtype-specific approach improves discrimination.
Methods
Among 3839 women who had a screening mammogram and were later diagnosed with invasive breast cancer, we performed multinomial logistic regression with tumor subtype as the outcome and known breast cancer risk factors as predictors. Tumor subtypes were defined by expression of estrogen receptor (ER), progesterone receptor (PR), and human epidermal growth factor receptor 2 (HER2) based on immunohistochemistry. Discrimination was assessed with the area under the receiver operating characteristic curve (AUC). Absolute risk of each subtype was estimated by proportioning Gail model (Breast Cancer Risk Assessment Tool, BCRAT) absolute risk estimates by the predicted probabilities for each subtype. We then compared risk factor distributions for women in the highest deciles of risk for each subtype.
Results
There were 3073 ER/PR+HER2−, 340 ER/PR+HER2+, 126 ER/PR−HER2+, and 300 triple-negative breast cancers (TNBC). Discrimination differed by subtype; ER/PR−HER2+ (AUC: 0.64, 95% CI 0.59, 0.69) and TNBC (AUC: 0.64, 95% CI 0.61, 0.68) had better discrimination than ER/PR+HER2+ (AUC: 0.61, 95% CI 0.58, 0.64). Compared to other subtypes, patients at high absolute risk of TNBC were younger and mostly Black, had no family history of breast cancer, and had higher BMI. Those at high absolute risk of HER2+ cancers were younger and had lower BMI.
Conclusion
Our study provides proof of concept that stratifying risk prediction for breast cancer subtypes may enable identification of patients with unique profiles conferring increased risk for tumor subtypes.
Introduction
Despite reductions in cancer mortality in the past several decades, breast cancer remains the second leading cause of cancer death among women in the U.S. [1]. Breast cancer has been classified into four main subtypes based on tumor molecular profiling [2]. In clinical practice, immunohistochemistry is typically used to classify tumors into subtypes defined by expression of estrogen receptor (ER), progesterone receptor (PR), and human epidermal growth factor receptor 2 (HER2). Tumors expressing ER and PR respond to endocrine therapies, and tumors expressing HER2 are treated with the anti-HER2 antibody drug trastuzumab. Tumors that do not express ER, PR, or HER2, termed triple-negative breast cancers (TNBCs), do not respond to endocrine therapy or HER2-targeted therapy. TNBCs tend to be more aggressive, are more likely to recur, and have higher mortality than other breast cancer subtypes [3, 4]. Breast cancer subtypes also display etiologic heterogeneity, with some risk factors having different or opposite associations with TNBC versus hormone receptor-positive subtypes [5]. For example, while prior biopsy and atypical hyperplasia are strongly associated with ER/PR+HER2− breast cancers and are included in several existing breast cancer risk prediction models [6–9], there appears to be no association between these factors and TNBC [5]. We previously demonstrated that existing breast cancer risk prediction models perform more poorly in predicting TNBC compared to other subtypes [10]. Failure to account for the different risk profiles of breast cancer subtypes may contribute to the limited discriminatory accuracy of existing breast cancer risk prediction models. Risk models developed for specific breast cancer subtypes may improve discriminatory accuracy and would be useful to direct more intensive screening to women at highest risk for cancer, particularly poor-prognosis TNBC, for whom early detection may help improve outcomes.
The purpose of this study was to evaluate the ability to discriminate between breast cancer subtypes using known breast cancer risk factors among a large cohort of women undergoing mammography screening in the U.S.
Methods
We used a case-only design to evaluate differences in risk factors by tumor subtype and to derive the probability of each subtype given a breast cancer diagnosis. The absolute risk for each subtype was then obtained by proportioning the Breast Cancer Risk Assessment Tool (BCRAT) 5-year or lifetime risk according to these subtype probabilities. Methods for computing BCRAT absolute risk scores have been previously described [10].
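The proportioning step can be illustrated with a minimal Python sketch. It assumes that the absolute risk for a subtype is the product of the overall BCRAT absolute risk and the predicted probability of that subtype given a breast cancer diagnosis. The function name and all numeric inputs are hypothetical and chosen only for illustration; they are not estimates from this study.

```python
def subtype_absolute_risks(overall_risk, subtype_probs):
    """Split an overall absolute risk across tumor subtypes.

    overall_risk  -- overall absolute risk of invasive breast cancer
                     (e.g. a BCRAT 5-year risk), as a proportion
    subtype_probs -- dict mapping subtype -> P(subtype | breast cancer);
                     the probabilities must sum to 1
    """
    total = sum(subtype_probs.values())
    if abs(total - 1.0) > 1e-6:
        raise ValueError("subtype probabilities must sum to 1")
    # Each subtype receives its share of the overall risk.
    return {s: overall_risk * p for s, p in subtype_probs.items()}


# Hypothetical inputs, for illustration only (not study estimates).
example_probs = {
    "ER/PR+HER2-": 0.80,
    "ER/PR+HER2+": 0.09,
    "ER/PR-HER2+": 0.03,
    "TNBC": 0.08,
}
example_risks = subtype_absolute_risks(0.02, example_probs)
```

Because the subtype probabilities sum to 1, the subtype-specific risks in this sketch always add back up to the overall risk that was supplied.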
The study population included a cohort of women aged 40–84 years who had a screening mammogram at Massachusetts General Hospital (MGH), Newton Wellesley Hospital (NWH), or the University of Pennsylvania Health System (Penn) from 2006 to 2015 and who were diagnosed with invasive breast cancer at least 6 months after their initial mammogram. Patients completed a questionnaire at the time of the mammogram assessing reproductive history and family history. Body mass index (BMI) and breast density were determined from electronic health records (EHR). Missing BMI was supplemented with the closest available measurement from EHR within 1 year prior to or 6 months after the mammogram. Missing information on prior breast biopsy was also supplemented using EHR biopsy reports. Race/ethnicity was based on identity self-reported in EHR and categorized as Asian/Pacific Islander, Black/African American, Hispanic/Latino, White, or Other/Unknown, with the last category including patients who reported other racial/ethnic groups or who had missing data on race/ethnicity in EHR. Breast cancer diagnoses were obtained from the Massachusetts Cancer Registry for MGH and NWH and from the state cancer registries of Pennsylvania, New Jersey, and Delaware for Penn. Patients from all three sites were also linked to each site’s institutional cancer registry. Cancer diagnoses from state cancer registries were available through 2016, while diagnoses from institutional registries were available through 2018.
For individuals with multiple mammograms over the study period, risk factors were extracted at the time of the earliest mammogram. Patients were included if they were diagnosed with invasive breast cancer at least 6 months after the mammogram, were not missing information on molecular subtype, and had not been diagnosed with breast cancer at any time prior to their initial mammogram. Individuals with breast implants, with a known BRCA1/2 mutation or who were missing breast density information were excluded. This resulted in a final study sample of 3839 individuals with invasive breast cancer. Tumor subtypes were classified into four mutually exclusive categories based on immunohistochemistry: (1) ER or PR positive and HER2 negative (ER/PR+HER2−), (2) ER or PR positive and HER2 positive (ER/PR+HER2+), (3) ER and PR negative and HER2 positive (ER&PR−HER2+), and (4) ER and PR and HER2 negative (triple-negative breast cancer, TNBC).
Multinomial logistic regression was used to estimate associations of risk factors with the four-level outcome, breast cancer molecular subtype, using ER/PR+HER2− as the reference category. Age, race, prior breast biopsy, atypical hyperplasia, age at menarche, age at first live birth, family history of breast cancer, BMI, and breast density were assessed as predictors. Atypical hyperplasia was removed from the final model due to lack of statistical significance and low prevalence in our data. Furthermore, we tested interaction terms between BMI and menopause status, BMI and breast density, and BMI and race, but these were dropped from the final model as they were not statistically significant. Calibration plots were derived by plotting the proportion of each observed outcome against the predicted probabilities within deciles. Discrimination was assessed using the area under the receiver operating characteristic curve (AUC) for each subtype. AUCs were calculated using a one vs. rest approach, in which the predicted probabilities for each subtype were classified using a dummy variable taking the value of 1 if the patient had that subtype and 0 if they had any other subtype [11]. Multiple imputation using chained equations (MICE) was used to fill in additional missing data for BMI, age at menarche, and age at first live birth. Model estimates, predictions, and AUCs were pooled across 50 imputed datasets. Absolute risk estimates were generated by multiplying the 5-year and lifetime BCRAT absolute risk estimates by the predicted probability from the new model. Patients were stratified into deciles of absolute 5-year risk, and characteristics of patients in the top decile of risk for each subtype were compared. Analyses were conducted using Stata 17 (College Station, TX).
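The one vs. rest AUC calculation described above can be sketched in Python (the study itself used Stata 17, so this is an illustration rather than the analysis code). The sketch assumes the AUC is computed as the probability that a randomly chosen patient with the subtype has a higher predicted probability than a randomly chosen patient without it, with ties counted as one half. The function names and the toy data are hypothetical, and the pairwise loop favors clarity over speed.

```python
def auc(scores, labels):
    """AUC as P(score of a positive > score of a negative); ties count 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def one_vs_rest_aucs(pred_probs, observed):
    """Compute one AUC per subtype.

    pred_probs -- list of dicts (one per patient), subtype -> predicted probability
    observed   -- list of observed subtypes, one per patient
    """
    result = {}
    for subtype in sorted(set(observed)):
        # Dummy variable: 1 if the patient had this subtype, 0 otherwise.
        labels = [1 if o == subtype else 0 for o in observed]
        scores = [p[subtype] for p in pred_probs]
        result[subtype] = auc(scores, labels)
    return result


# Toy data with two hypothetical subtypes "A" and "B" (not study data).
toy_probs = [
    {"A": 0.7, "B": 0.3},
    {"A": 0.6, "B": 0.4},
    {"A": 0.4, "B": 0.6},
    {"A": 0.2, "B": 0.8},
]
toy_observed = ["A", "B", "A", "B"]
toy_aucs = one_vs_rest_aucs(toy_probs, toy_observed)
```

In the toy data, three of the four positive-negative pairs are ordered correctly for each subtype, so both AUCs equal 0.75.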
Results
Table 1 shows the distribution of risk factors for the study population. Among 3839 invasive breast cancers, there were 3073 ER/PR+HER2−, 340 ER/PR+HER2+, 126 ER&PR−HER2+, and 300 TNBCs. Women with HER2+ disease were younger at diagnosis than women with ER/PR+HER2− disease or TNBC, and women with ER&PR−HER2+ disease were more likely to report a family history of breast cancer. A smaller proportion of women with TNBC had prior biopsies or atypical hyperplasia. Women with TNBC had earlier age at first birth and were more likely to be Black.
The multinomial regression model (Table 2) included age, race/ethnicity, atypical hyperplasia, number of biopsies, age at menarche, age at first live birth, first degree family history of breast cancer, BMI category, and breast density. In this model, age was associated with slightly lower likelihood of ER/PR+HER2+, ER&PR−HER2+, and TNBC compared to the ER/PR+HER2− reference group. Black women had a higher likelihood of ER&PR−HER2+ (RRR: 1.85, 95% CI 1.04, 3.31) and TNBC (RRR: 3.64, 95% CI 2.65, 5.0) compared with ER/PR+HER2−. Prior breast biopsy was associated with a lower likelihood of TNBC (RRR: 0.71, 95% CI 0.51, 0.99).
Model calibration was evaluated by plotting the observed versus expected proportions within deciles of predicted probabilities of each subtype (Fig. 1). We observed good calibration for all three subtypes compared with ER/PR+HER2−, and calibration was particularly good for TNBC. We observed moderately good discrimination by subtype, as measured by AUCs ranging from 0.61 to 0.64 (Table 3).
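The decile-based calibration check described above can be sketched as follows. This is a minimal illustration that assumes patients are ranked by predicted probability and divided into equal-sized groups, so tied predictions may fall into adjacent groups; the function name and the toy data are hypothetical and are not drawn from the study.

```python
def calibration_by_group(pred, outcome, n_groups=10):
    """Compare mean predicted probability with observed proportion per group.

    pred     -- predicted probabilities of one subtype, one per patient
    outcome  -- 1 if the patient had that subtype, 0 otherwise
    n_groups -- number of rank-based groups (10 gives deciles)

    Returns a list of (mean_predicted, observed_proportion, group_size).
    """
    if len(pred) != len(outcome):
        raise ValueError("pred and outcome must have the same length")
    # Rank patients from lowest to highest predicted probability.
    order = sorted(range(len(pred)), key=lambda i: pred[i])
    n = len(order)
    rows = []
    for g in range(n_groups):
        idx = order[g * n // n_groups:(g + 1) * n // n_groups]
        if not idx:
            continue  # skip empty groups when n < n_groups
        mean_pred = sum(pred[i] for i in idx) / len(idx)
        observed = sum(outcome[i] for i in idx) / len(idx)
        rows.append((mean_pred, observed, len(idx)))
    return rows


# Toy data (not study data): 20 patients split into two groups.
toy_pred = [i / 20 for i in range(20)]
toy_outcome = [0] * 10 + [1] * 10
toy_rows = calibration_by_group(toy_pred, toy_outcome, n_groups=2)
```

Plotting the observed proportion against the mean predicted probability for each group gives a calibration plot; points near the diagonal indicate good agreement.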
Table 4 compares characteristics of women in the uppermost deciles of risk for each breast cancer subtype. In contrast to other subtypes, most patients at high risk of TNBC were Black, had no known family history of breast cancer, had a BMI of over 25 kg/m², and were more likely to have younger age at first live birth than patients with ER/PR+HER2− disease. Women in the highest decile of TNBC risk had a lower percentage of dense breasts (57%) compared to other subtypes (76–88%). Those at high risk of HER2+ cancers were on average younger than those at high risk of other cancers, and mostly had BMIs of less than 25 kg/m². The median 5-year BCRAT absolute risk estimate ranged from 1.73% among patients at highest risk for TNBC to 3.45% for those at highest risk for ER/PR+HER2− breast cancer. Absolute risk estimates generated by multiplying the 5-year and lifetime BCRAT absolute risk estimates by the predicted probability from the new model are displayed for various strata of risk factors in Online Resources 1 and 2.
Discussion
In this study, we explored the feasibility of a breast cancer subtype-specific risk prediction model. Using a two-stage approach, we first fit a case-only multinomial model and then used its predicted subtype probabilities to proportion absolute risk estimates obtained from the BCRAT model. The model, which utilized established breast cancer risk factors, showed moderately good discrimination between breast cancer subtypes (AUCs 0.61–0.64) and was well calibrated. Patients with high risk for TNBC were more likely to be Black and were considerably less likely to have had a prior biopsy than patients with ER/PR+HER2− breast cancers. Our results suggest that estimating risk for specific breast cancer subtypes, specifically TNBC, may better identify women at highest risk for this poor-prognosis cancer than existing breast cancer risk prediction models.
While numerous studies have identified distinct risk factor profiles between breast cancer subtypes [5, 12], to our knowledge there are no validated breast cancer subtype-specific risk models that incorporate ER, PR, and HER2 status. Our prior work found that AUCs from the BCRAT, BCSC, BRCAPRO, and BRCAPRO + BCRAT models were all considerably lower for HER2+ and triple-negative breast cancers (AUC range 0.513–0.585) than for ER/PR+HER2− disease (AUC range 0.605–0.629) [10]. The Rosner–Colditz breast cancer incidence model provides prediction of subtypes based on ER and PR only, with poorer performance for ER/PR− disease (AUC = 0.598) than ER/PR+ disease (AUC = 0.629) [13]. Incorporating adolescent body size, vegetable intake, and breastfeeding duration into the model improved accuracy for ER/PR− disease (AUC = 0.630) [14], yet these factors tend not to be assessed clinically or recorded in electronic health records, making their use in clinical risk prediction models difficult. A model developed and validated among Black women in the U.S. that included age, family history, age at menarche, parity, breastfeeding, oral contraceptive use, oophorectomy, and breast biopsy achieved an AUC of 0.57 for ER− disease [15].
Our finding that patients with TNBC were less likely to have had a prior biopsy than patients with hormone receptor-positive disease is consistent with prior studies demonstrating that prior biopsy and/or benign breast disease is associated with estrogen receptor-positive disease and not TNBC [5, 16–19]. This is believed to reflect different etiologic pathways between TNBC and other breast cancer subtypes [19, 20], as well as the benign-appearing imaging characteristics of TNBC on mammography, the aggressive growth of TNBCs, and younger age at diagnosis, which may lower the likelihood of patients having biopsies prior to diagnosis [21]. Therefore, while presence of a prior biopsy significantly increases risk of estrogen receptor-positive disease, having had a prior biopsy is less relevant for risk of a future TNBC.
Our results provide proof of concept that a subtype-specific risk model may prove useful. We observed stark differences in characteristics of patients in the highest decile of risk for each subtype. Most strikingly, 75% of women at highest risk for TNBC were Black, compared with only 2% of women at highest risk for ER/PR+HER2+. In addition, only 20% of women at highest risk for TNBC had a prior breast biopsy, compared to 62% of ER/PR+HER2+ and 41% of ER/PR+HER2− patients. Furthermore, the median BCRAT 5-year risk score among women at highest risk for TNBC was only 1.73%, compared with 3.45% among women at highest risk for ER/PR+HER2−. This result highlights that women at high risk for TNBC are poorly identified by existing risk models. Given that TNBC has a younger age at onset than ER/PR+HER2− disease and that it is more likely to be diagnosed as an interval cancer [5, 22, 23], knowledge of elevated risk for TNBC might be used to begin screening earlier, screen more frequently, and/or incorporate supplemental screening with breast ultrasound or breast MRI.
The main limitation of our analysis is the lack of external validation of our model. Given that most invasive breast cancers are ER/PR+HER2−, developing and validating a subtype-specific risk model has proven challenging due to the need for very large prospective samples to achieve a reasonable number of TNBCs for model training and validation. Given this reality, the two-stage approach that leverages existing absolute risk models allows us to bypass the need for a prospective cohort; however, this approach makes it challenging to accommodate risk factors that are not present in the overall risk model. In addition, we lacked information on risk factors that have been shown to be differentially associated with triple-negative breast cancer, such as breastfeeding, oral contraceptive use, or early-life adiposity. Furthermore, an ER-negative-specific polygenic risk score has been developed and validated, yet we lacked genetics data in this sample. Adding these important risk factors will be key to developing the most accurate subtype-specific risk model. Additional limitations include inability to evaluate temporal changes in characterization of subtypes by immunohistochemistry and lack of state cancer registry data in later years of the study.
In summary, our work suggests the potential utility of a subtype-specific approach to improve breast cancer risk prediction, particularly for women at elevated risk for TNBC. Given the promising findings, future studies incorporating additional risk factors and validating in large screening cohorts should commence.
Data availability
Deidentified data underlying this article will be shared upon reasonable request to the corresponding author.
References
Siegel RL, Miller KD, Fuchs HE, Jemal A (2021) Cancer statistics, 2021. CA Cancer J Clin 71(1):7–33
Koboldt DC, Fulton RS, McLellan MD, Schmidt H, Kalicki-Veizer J, McMichael JF et al (2012) Comprehensive molecular portraits of human breast tumours. Nature 490(7418):61–70
Haque R, Ahmed SA, Inzhakova G, Shi J, Avila C, Polikoff J et al (2012) Impact of breast cancer subtypes and treatment on survival: an analysis spanning two decades. Cancer Epidemiol Biomark Prev 21(10):1848–1855
Parise CA, Caggiano V (2014) Breast cancer survival defined by the ER/PR/HER2 subtypes and a surrogate classification according to tumor grade and immunohistochemical biomarkers. J Cancer Epidemiol 2014:469251
McCarthy AM, Friebel-Klingner T, Ehsan S, He W, Welch M, Chen J et al (2021) Relationship of established risk factors with breast cancer subtypes. Cancer Med 10(18):6456–6467
Gail MH, Brinton LA, Byar DP, Corle DK, Green SB, Schairer C et al (1989) Projecting individualized probabilities of developing breast cancer for white females who are being examined annually. J Natl Cancer Inst 81(24):1879–1886
Gail MH, Costantino JP, Pee D, Bondy M, Newman L, Selvan M et al (2007) Projecting individualized absolute invasive breast cancer risk in African American women. J Natl Cancer Inst 99(23):1782–1792
Tice JA, Cummings SR, Smith-Bindman R, Ichikawa L, Barlow WE, Kerlikowske K (2008) Using clinical factors and mammographic breast density to estimate breast cancer risk: development and validation of a new predictive model. Ann Intern Med 148(5):337–347
Tyrer J, Duffy SW, Cuzick J (2004) A breast cancer prediction model incorporating familial and personal risk factors. Stat Med 23(7):1111–1130
McCarthy AM, Liu Y, Ehsan S, Guan Z, Liang J, Huang T et al (2021) Validation of breast cancer risk models by race/ethnicity, family history and molecular subtypes. Cancers. https://doi.org/10.3390/cancers14010045
Biesheuvel CJ, Vergouwe Y, Steyerberg EW, Grobbee DE, Moons KGM (2008) Polytomous logistic regression analysis could be applied more often in diagnostic research. J Clin Epidemiol 61(2):125–134
Jung AY, Ahearn TU, Behrens S, Middha P, Bolla MK, Wang Q et al (2022) Distinct reproductive risk profiles for intrinsic-like breast cancer subtypes: pooled analysis of population-based studies. J Natl Cancer Inst 114(12):1706–1719
Glynn RJ, Colditz GA, Tamimi RM, Chen WY, Hankinson SE, Willett WW et al (2017) Extensions of the Rosner–Colditz breast cancer prediction model to include older women and type-specific predicted risk. Breast Cancer Res Treat 165(1):215–223
Rice MS, Tworoger SS, Hankinson SE, Tamimi RM, Eliassen AH, Willett WC et al (2017) Breast cancer risk prediction: an update to the Rosner–Colditz breast cancer incidence model. Breast Cancer Res Treat 166(1):227–240
Palmer JR, Zirpoli G, Bertrand KA, Battaglia T, Bernstein L, Ambrosone CB et al (2021) A validated risk prediction model for breast cancer in US Black women. J Clin Oncol 39(34):3866–3877
Gaudet MM, Press MF, Haile RW, Lynch CF, Glaser SL, Schildkraut J et al (2011) Risk factors by molecular subtypes of breast cancer across a population-based study of women 56 years or younger. Breast Cancer Res Treat 130(2):587–597
Mitro SD, Ali-Fehmi R, Bandyopadhyay S, Alosh B, Albashiti B, Radisky DC et al (2014) Clinical characteristics of breast cancers in African-American women with benign breast disease: a comparison to the surveillance, epidemiology, and end results program. Breast J 20(6):571–577
Visscher DW, Frost MH, Hartmann LC, Frank RD, Vierkant RA, McCullough AE et al (2016) Clinicopathologic features of breast cancers that develop in women with previous benign breast disease. Cancer 122(3):378–385
Newman LA, Stark A, Chitale D, Pepe M, Longton G, Worsham MJ et al (2017) Association between benign breast disease in African American and White American women and subsequent triple-negative breast cancer. JAMA Oncol 3(8):1102–1106
Newman LA, Reis-Filho JS, Morrow M, Carey LA, King TA (2015) The 2014 Society of Surgical Oncology Susan G Komen for the cure symposium: triple-negative breast cancer. Ann Surg Oncol 22(3):874–882
Adrada BE, Moseley TW, Kapoor MM, Scoggins ME, Patel MM, Perez F et al (2023) Triple-negative breast cancer: histopathologic features, genomics, and treatment. Radiographics 43(10):e230034
O’Brien KM, Mooney T, Fitzpatrick P, Sharp L (2018) Screening status, tumour subtype, and breast cancer survival: a national population-based analysis. Breast Cancer Res Treat 172(1):133–142
Niraula S, Biswanger N, Hu P, Lambert P, Decker K (2020) Incidence, characteristics, and outcomes of interval breast cancers compared with screening-detected breast cancers. JAMA Netw Open 3(9):e2018179
Acknowledgements
The funder did not play a role in the design of the study; the collection, analysis, and interpretation of the data; the writing of the manuscript; and the decision to submit the manuscript for publication.
Funding
This work was supported by the American Cancer Society—131052-MRSG-17–144-01-CCE.
Author information
Contributions
Conceptualization: AM and KA; Data Curation: AM and SE; Formal Analysis: AM and SE; Funding Acquisition: AM; Investigation: AM; Methodology: AM, SE, and JC; Project Administration: AM and SE; Resources: AM, KH, CL, EC, DK, KA, and JC; Software: Not applicable; Supervision: AM; Validation: AM and JC; Visualization: SE; Writing – original draft: AM and SE; Writing – review & editing: all.
Ethics declarations
Competing interests
Author C. Lehman: Institutional Grants/Research support from Breast Cancer Research Foundation, National Cancer Institute, GE Healthcare, Inc. and Co-founder of Clairity, Inc.
Ethical approval
This study was reviewed and approved by the Institutional Review Boards of the University of Pennsylvania and Massachusetts General Hospital.
Consent to participate
The need for informed consent was waived for this observational study given that there was no contact with study participants.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
McCarthy, A.M., Ehsan, S., Hughes, K.S. et al. Feasibility of risk assessment for breast cancer molecular subtypes. Breast Cancer Res Treat (2024). https://doi.org/10.1007/s10549-024-07404-9
DOI: https://doi.org/10.1007/s10549-024-07404-9