Comprehensive multi-omics analysis of breast cancer reveals distinct long-term prognostic subtypes

Sharma, Abhibhav; Debik, Julia; Naume, Bjørn; Ohnstad, Hege Oma; Bathen, Tone F.; Giskeødegård, Guro F.

doi:10.1038/s41389-024-00521-6

Comprehensive multi-omics analysis of breast cancer reveals distinct long-term prognostic subtypes

Article
Open access
Published: 13 June 2024

Volume 13, article number 22, (2024)
Cite this article

Download PDF

You have full access to this open access article

Oncogenesis

Comprehensive multi-omics analysis of breast cancer reveals distinct long-term prognostic subtypes

Download PDF

Abhibhav Sharma ORCID: orcid.org/0000-0001-5513-6210¹,
Julia Debik ORCID: orcid.org/0000-0002-2765-8043^1,2,
Bjørn Naume^3,4,
Hege Oma Ohnstad³,
Oslo Breast Cancer Consortium (OSBREAC),
Tone F. Bathen² &
…
Guro F. Giskeødegård¹

1887 Accesses
Explore all metrics

Abstract

Breast cancer (BC) is a leading cause of cancer-related death worldwide. The diverse nature and heterogeneous biology of BC pose challenges for survival prediction, as patients with similar diagnoses often respond differently to treatment. Clinically relevant BC intrinsic subtypes have been established through gene expression profiling and are implemented in the clinic. While these intrinsic subtypes show a significant association with clinical outcomes, their long-term survival prediction beyond 5 years often deviates from expected clinical outcomes. This study aimed to identify naturally occurring long-term prognostic subgroups of BC based on an integrated multi-omics analysis. This study incorporates a clinical cohort of 335 untreated BC patients from the Oslo2 study with long-term follow-up (>12 years). Multi-Omics Factor Analysis (MOFA+) was employed to integrate transcriptomic, proteomic, and metabolomic data obtained from the tumor tissues. Our analysis revealed three prominent multi-omics clusters of BC patients with significantly different long-term prognoses (p = 0.005). The multi-omics clusters were validated in two independent large cohorts, METABRIC and TCGA. Importantly, a lack of prognostic association to long-term follow-up above 12 years in the previously established intrinsic subtypes was shown for these cohorts. Through a systems-biology approach, we identified varying enrichment levels of cell-cycle and immune-related pathways among the prognostic clusters. Integrated multi-omics analysis of BC revealed three distinct clusters with unique clinical and biological characteristics. Notably, these multi-omics clusters displayed robust associations with long-term survival, outperforming the established intrinsic subtypes.

Breast cancer in the era of integrating “Omics” approaches

Article Open access 14 April 2022

Omics of Chemoresistant and Metastatic Breast Cancer

Comprehensive profiling of biological processes reveals two major prognostic subtypes in breast cancer

Article 07 October 2015

Background

With an estimated 2.3 million new cases worldwide, female breast cancer (BC) became the most diagnosed cancer type among women globally in 2020, accounting for 11.7% of all cancer cases [1]. Despite the recent decline in death rates, BC is still a leading cause of cancer death [1]. BC’s diverse nature and heterogeneous biology impose challenges in assessing prognosis as patients with similar clinical subtypes often experience different responses to treatment [2]. These challenges engendered many studies to probe BC at the molecular level in aspiration to reveal new potential treatment targets and clinically relevant prognostic clusters. Perou et al. characterized BC based on the gene expression variations in human breast tumors, revealing the prominent subtypes basal-like, luminal, Erb-B2+ (Her2 enriched), and normal-like- with distinctive molecular portraits [3]. The luminal subtype was subsequently divided into more refined groups -luminal A, B and C [4]. These molecular subtypes showed a strong association with clinical outcomes; patients with basal-like tumors had the worst prognosis while patients with the luminal A (LumA) subtype had the best prognosis [3,4,5]. These BC subtypes were later classified by gene expression of 50 discriminatory genes called prediction analysis of microarrays 50 (PAM50) and are now implemented clinically to classify BC intrinsic subtypes [6].

Studies have shown heterogeneity within the PAM50 transcriptomic-based intrinsic subtypes, suggesting refined subclusters within the subtypes that better explain patient survival [7, 8]. Although luminal-like subtypes generally have a good prognosis, the LumA subtype itself exhibits significant biological and clinical heterogeneity [9]. The prognostic value of the PAM50 intrinsic subtype for long-term survival above 10 years has shown varying results. For instance, studies suggest that luminal B (LumB) exhibits a poorer long-term prognosis than basal, particularly evident after a 10-year follow-up [10, 11]. Characterizing BC in a more comprehensive biological manner may improve patient stratification, the assessment of long-term prognosis, and optimal treatment. Investigating the mRNA expression alone for the characterization of BC may not capture the full picture as several studies report low correlations between the expression level of mRNA and tumor-related proteins [12,13,14]. For instance, in a study quantifying 52 BC-related proteins, only 22 proteins were reported to correlate with the level of transcripts [14]. The observed limited protein-mRNA correlations deliver substantial rationale to incorporate proteins in the BC characterization. As a result, six clinically relevant BC subtypes have already been described using reverse phase protein arrays (RPPA) [15]. Among these subtypes, four subgroups remained highly concordant with PAM50 subtypes and are consequently named Luminal A, Luminal B, Basal and Her2 rich. Further, two additional subgroups, reactive I and reactive II, were described as products of the surrounding microenvironment [16].

Metabolites are small molecules downstream to proteins, RNA and DNA, thus reflecting intermediates and endpoints of biological pathways. Metabolomics provides unique pathophysiological signatures influenced by gene variability and the surrounding environment, thus revealing information that is closely associated with the phenotype. Altered metabolism is described as a hallmark of cancer [17]. We have previously reported metabolic variation between BC tumors and normal adjacent tissue, and metabolic patterns related to clinical and prognostic factors of BC [18,19,20]. Furthermore, three distinct clusters of BC patients were reported based on tumor metabolic profiles, where the clusters expressed differences in BC-related proteins and genes, particularly those associated with the extracellular matrix and cancer-related metabolic pathways [21].

Based on previous molecular characterizations of BC tissue, it is evident that a global assessment of biomolecules across different omics modalities may provide deeper insights into the heterogeneity of BC, thus improving the prediction of long-term survival. Integration of multi-omics profiles has already revealed clinical subtypes of several heterogeneous cancers including colon cancer, urothelial bladder cancer and esophageal carcinoma [22,23,24,25]. Despite these recent advancements in molecular research techniques, a limited number of comprehensive studies have been reported that utilize multi-omics data to probe BC heterogeneity [26]. In this study, we aimed to categorize BC patients into clinically relevant clusters that improve prediction of long-term survival, based on extensive biological information rather than single molecular levels. We hypothesized that multi-omics data provides a more holistic view of the molecular patterns of BC thus providing a clinically credible assessment during longer follow-ups. Using unsupervised machine learning to integrate transcriptomic, proteomic and metabolomic data from tumor tissues of BC patients prior to adjuvant therapy, we revealed three prominent clusters of BC patients with significant differences in long-term survival beyond 10 years. These findings were validated with high consistency in large cohorts of BC patients with long-term follow-up (Fig. 1) (Table 1). Moreover, we unmasked the key characterizing pathways and possible metastasis drivers through a systems-biology approach, ultimately providing deeper molecular insights into the bio-mechanisms underlying BC heterogeneity.

**Fig. 1: A graphical overview of the study framework.**

Table 1 Clinical characteristics of the patient cohorts.

Full size table

Results

Multi-omics integration revealed molecular characteristics associated with clinical factors

Trained on the Oslo2 data, the MOFA+ model with 20 latent factors provided a favorable balance between the average variance inflation factor (VIF) and the total variance explained by the model (Fig. 2a) (Supplementary Fig. 1). Factors 1 and 2 explained most of the molecular heterogeneity (25.7% and 16.2%, respectively). Interestingly, factor 1 captured higher variance through the protein expressions than via mRNA expressions, followed by metabolomics data (Fig. 2a). The Spearman’s rank correlation between clinical features of the tumor (tumor grade, histology, tumor size, lymph node status, hormone receptor status) and multi-omics factors (MOFs) is illustrated in Supplementary Fig. 2 (Supplementary Data 1). Factor 1 and 2 showed a significant correlation with the clinical characteristics of the tumor. (Fig. 2b) (Supplementary Data 1, Supplementary Fig. 2). Factor 1 showed a significant association with tumor grade (Kruskal–Wallis p = 8.1e-10). Factor 2 exhibited a strong association with tumor grade (Kruskal–Wallis p = 7.9e-22), histology (Kruskal–Wallis p = 1.9e-07) and size (Kruskal–Wallis p = 0.017) (Supplementary Data 1). The absolute loadings for each latent factor reveal important biological features that are responsible for data variability. We observed that estrogen receptor alpha (ERα), progesterone receptor (PR) and GATA binding protein-3 highly influenced factor 1 (Fig. 2c). Further, AGR3, FOXA1 and GATA3 genes were some of the most important gene targets underlying the variability captured by factor 1. The important metabolites according to factor 1 were glutamate (Glu), phosphocholine (pCho) and glutamine (Gln). The biological signals captured by factor 2 remained moderately even from all three modalities. The variance captured by metabolites was higher in factor 2 compared to factor 1 with pCho, lactate (Lac) and taurine (Tau) as important metabolites. Factor 2 highlighted Cyclin B1, Caveolin-1, and ERα as crucial proteins, while the genes UBE2C, HJURP, and C2orf40 were identified as significant contributors to the total variance explained. Interestingly, Factor 13 also showed a significant correlation with body mass index (BMI) and tumor size (Supplementary Fig. 2).

**Fig. 2: MOFA+ analysis of the Oslo2 cohort.**

Multi-omics clusters of breast cancer are related to long-term survival

To assess the prognostic value of the MOFA+ model, multivariate Cox proportional-hazards models were constructed and hazard ratios (HR) with confidence intervals (CI) are reported. In the Oslo2 cohort, 62 patients died due to BC or had BC metastasis during the follow-up period (average time to event = 9.5 years, SD = ± 3.5 years). Crude Cox regression analyses revealed that factor 1 (HR 0.84, 95% CI: 0.72–0.97, P = 0.021), factor 2 (HR 1.18, 95% CI: 1.00–1.39, P = 0.045), and factor 13 (HR 1.29, 95% CI: 1.05–1.58, P = 0.015) were significantly associated with metastasis and/or BC specific death (Fig. 3a). Adjusting for age did not substantially alter the hazard ratios of these factors. Forest plots for the Cox regression are provided in Supplementary Fig. 3.

**Fig. 3: Molecular characterization of the multi-omics clusters.**

Clustering based on MOFs 1,2 and 13 revealed three multi-omics clusters (MOC) (Fig. 3b). These clusters -MOC1, MOC2, and MOC3- included 50, 137 and 148 patients, respectively. An overview of clinical information for each cluster is presented in Table 2. MOC1 demonstrated a predominant enrichment of the basal-like subtype (74%) and a smaller presence of HER2-enriched samples (14%), with no other intrinsic subtypes observed (Fig. 3c). MOC2 contained a mixture of LumB (42%), LumA (26%) and HER2 (20%). The largest cluster MOC3 was highly enriched with LumA (29%) and Normal type (34%) with few basal, HER2 and LumB type samples. In addition, the distribution of ER, PR and HER2 hormone receptor statuses (measured by immunohistochemistry) was noticeably different between the MOCs (Fig. 3d). As expected, the MOCs varied in the pattern of expression in transcriptomic, proteomic, and metabolomic profiles (Fig. 3e). Key breast cancer-associated biomarker genes such as CRYAB and GABRP showed high upregulation in MOC1, while they remained lowly expressed in MOC2 and displayed intermediate expression levels in MOC3 (Supplementary Data 3) [27]. MOC1 exhibited a significant downregulation of GATA3 (Kruskal–Wallis test, P_adj = 8.3e-22), AGR3 (P_adj = 4.0e-20) and FOXA1 (P_adj = 7.0e-22) compared to the other MOCs. In the protein expression profile, MOC1 had upregulated Cyclin B1 (P_adj = 5.3e-22). MOC1 demonstrated significantly low expression levels of ER-alpha (P_adj = 1.01e-21) and PR (P_adj = 5.9e-15) proteins, consistent with the low proportion of hormone receptor-positive cells in this cluster (Fig. 3d, e). The metabolite level of alanine (P_adj = 4.8e-03), β-glucose (P_adj = 5.8e-07), taurine (P_adj = 4.8e-03), lactate (P_adj = 8.0e-06) and glutamate (P_adj = 1.6e-09) were significantly different between the MOCs (Supplementary Data 3). MOC3 exhibited a higher expression of β-glucose, while MOC2 displayed a high level of glutamine expression (Fig. 3e).

Table 2 Clinical characteristics of the multi-omics defined clusters in the study cohorts.

Full size table

Interestingly, the investigation of long-term prognosis among intrinsic subtypes for the Oslo2 cohort yielded outcomes that differed from the expected clinical patterns. On a longer follow-up (~120 months), the luminal subtypes demonstrated an unexpectedly poor prognosis when compared to the basal-like subtypes, defying the expected clinical pattern (Fig. 4a). Such anomalous observations were also seen in the TCGA and METABRIC cohorts when the long-term survival of intrinsic subtypes was accessed (Fig. 4b, c), where the long-term survival of the intrinsic subtypes was not significantly different.

There was a significant difference in the overall survival rates between the MOCs (log-rank test p = 5e-03) when assessing the long-term survival up to 15 years after surgery. (Fig. 4d). MOC1 exhibited a good prognosis compared to MOC2 while remaining moderately better than MOC3. Further, similar survival curves between the MOCs were observed when only death due to BC was examined (log-rank test p = 0.024) (Fig. 4e). MOC2 had the poorest prognosis compared to MOC1 and MOC3 (Fig. 4f). Significant differences in outcome between the MOCs were also demonstrated when investigating the cohort with (a) BC-specific death and/or metastasis and, (b) BC-specific death and/or metastasis including the relapse event (Supplementary Fig. 4). MOC2 comprises more metastasis and relapse events compared to the other MOCs.

Validation of the multi-omics clusters in external cohorts

To validate the differences in survival between the MOCs, a machine-learning (ML) based validation scheme was used for each external cohort separately i.e., TCGA and METABRIC. The common transcriptomes or/and proteins between the validation cohorts and Oslo2 cohort were identified (Supplementary Data 4). These omics features were then used to train the ML ensemble to classify the MOCs of the Oslo2 cohort. Classification results for the models are presented in Supplementary Data 5–6. The derived models were then used to classify samples from the validation cohorts as MOC1, 2 or 3 (Table 2).

For the TCGA cohort, the all-cause survival rates between the MOCs remained highly concordant with the Oslo2 cohort, where MOC1 had a good prognosis while MOC2 depicted the worst prognosis (Fig. 4g). The gene expression profiles of MOCs in TCGA and Oslo2 cohorts remained highly congruent. Similar to the MOC1 in Oslo2, the MOC1 in the TCGA cohort had high enrichment of the Basal-like subtype and few HER2-enriched subtypes (Fig. 4h). Similarly, MOC3 in TCGA remained highly enriched for the LumA and Normal subtypes. MOC2 in TCGA remained predominantly enriched by LumB as for MOC2 from Oslo2. However, the MOC2 samples in TCGA also consisted of samples of Normal subtypes.

In the METABRIC cohort, the MOCs had significant differences in the long-term all-cause survival rates with survival trends similar to those found in the Oslo2 cohort. Initially, the prognosis of MOC1 was the poorest; however, it gradually improved compared to MOC2 during the follow-up period. (Fig. 4i). The PAM50 enrichment of MOCs was also similar to Oslo2, where MOC1 was basal rich while the enrichment of MOC2 and MOC3 were particularly conformable with the Oslo2 cohort’s MOCs (Fig. 4j). Similar observations were also found for both cohorts when only the BC-specific mortality was assessed (Supplementary Fig. 5). The clinical features of MOCs across the validation cohort are presented in Table 2. The proportion distribution of these features between the MOCs remained consistent across all the cohorts.

These validation results remained highly consistent also when the analysis encompassed all the samples, irrespective of their history of receiving neoadjuvant therapy (Supplementary Fig. 6).

Functional analysis

Functional analyses were performed to determine differentially expressed pathways between the multi-omics clusters. The analysis was focused on MOC1 and MOC2 because of their large difference in prognosis. Two class significance analysis of microRNA (SAM) method revealed 9450 differentially expressed genes (DEGs) between MOC1 and MOC2 at false discovery rate (FDR) of 0.01 (Supplementary Fig. 7) (Supplementary Data 7). The KEGG database-based gene set enrichment analysis (GSEA) revealed the upregulation of prominent pathways related to cytokine-cytokine receptor interactions in MOC1 (KEGG ID: hsa04060, hsa04061, hsa04650) (Supplementary Fig. 8). The gene ontology (GO) MSigDB (C5) database suggested the upregulation of pathways related to the immune system (GO: 0002684, 0045087, 0002250) in MOC (Supplementary Fig. 8). In addition, the GSEA based on MSigDB (C6) database showed upregulation of Raf1 pathways in MOC2 (Supplementary Fig. 8)

Multi-omics pathway analysis was carried out by incorporating the differentially expressed omics features between MOC1 and MOC2. The network generated through metabolite-protein interactions and protein-protein interactions were merged to form a multi-omics network through shared nodes (Fig. 5a). The network comprises 4 metabolites and 114 mRNA/proteins. The network matrix and detailed topology are provided in Supplementary Data 8. To further explore the functional pathways, three KEGG databases were referred including KEGG (Metabolites), KEGG (Gene + Metabolites) and KEGG (gene/protein) to highlight the nodes from the enriched pathways. The analysis identified aberrations in alanine, aspartate and glutamate metabolism (P_adj = 1.07e-38), Endocrine resistance (P_adj = 6.42e-9), and the D-glutamine and D-glutamate pathways (P_adj = 1.78e-6) between the MOC1 and MOC2 clusters (Supplementary Data 8).

**Fig. 5: Systems-biology analysis comparing MOC1 and MOC2.**

Discussion

The heterogeneity of BC is rooted in the biomolecular variations present among the cancer cells. These variations affect tumor characteristics such as therapeutic responses and metastasis. RNA-based intrinsic subtyping of BC is an established approach for BC characterization that focuses on inter-tumor heterogeneity. However, several studies suggest that the established intrinsic subtyping does not emphasize the intra-tumor admixture of multiple subtypes leading to imprecise subtyping [28, 29]. We here show that the intrinsic subtyping based on the PAM50 gene set distinctly demonstrates long-term survival outcomes that are contrary to the expected clinical prognosis, for the Oslo2 cohort, but also in the large validation cohorts METABRIC and TCGA. Patients with a basal-like subtype showed relatively good long-term prognosis compared to the remaining subtypes, while the luminal type suffered poorer survival outcomes. In addition, we have previously observed that LumA tumors often have a delayed local relapse and metastasis when compared to other subtypes [30]. We observe a similar trend in our analysis (Supplementary Fig. 4). These observations challenge the conventional assumptions about the clinical nature of these intrinsic subtypes. Earlier research has corroborated these findings, wherein some studies attempted to reclassify intrinsic subtypes while providing plausible explanations for the observed outcomes [7, 8]. To address these challenges in assessing long-term prognosis, it is crucial to incorporate multiple omics modalities to capture the deep-embedded biological heterogeneity. Using this approach, we revealed three new clusters of BC patients with significantly different long-term survival rates. The survival trends between the clusters were further validated in the external cohorts -TCGA and METABRIC- with high consistency. Moreover, the distribution of intrinsic subtypes and clinical variables exhibited significant variation between these diverse cohorts, highlighting the robustness of our clusters. This, in turn, may offer new grounds for enhanced characterization and understanding of BC. It is worth noting that even though metabolomics data were not available for the validation cohorts, we observed a clear survival curve separation between the MOCs, especially in the TCGA cohort. However, this does not necessarily imply that metabolic markers explain less biological heterogeneity compared to transcripts and protein markers. The metabolic data explained the lowest amount of variation in the discovery cohort; however, this may be attributed to high missing values and fewer variables in Oslo2 metabolomic data compared to the other omics data. As metabolic data were used to define the original MOCs, and supervised models were built to classify the validation cohorts based on available data, the metabolic data are likely to have influenced the MOCs also in the cohorts where metabolites are not directly measured, as there is a delicate interplay between metabolites, proteins and transcripts.

Of the three multi-omics clusters, MOC1 showed the best prognosis, while MOC2 had the worst. Interestingly, MOC1 is also highly rich in basal-type cancer which also commonly occurs in younger women [31]. However, the factors underpinning MOCs (i.e., Factor 1, Factor 2 and Factor 13), show a clear association with BC-specific survival even in age-adjusted models. This suggests intricate underlying biological causes other than age or clinical features for such observation in MOC1. Multi-omics pathway analysis suggested differences related to the D-glutamine and D-glutamate metabolism to be among the main molecular differences between the clusters. D-glutamine and D-glutamate metabolism has been studied for their role in endothelial cell regulation and potential role in the metastasis of BC [32]. Comparing the molecular characteristics of MOC1 and MOC2 (Fig. 3e) we observed that MOC2 had a significantly higher level of glutamine (Gln) and a lower level of glutamate (Glu) which is formed directly from Gln through the process of glutaminolysis. This indicates a high degree of altered glutamine metabolism between MOC1 and MOC2 where the former exhibits an increased level of glutaminolysis. The transcript level of glutaminase (GLS), the enzyme that catalyzes glutaminolysis, was significantly upregulated in MOC1 when compared to MOC2 and MOC3, thus supporting the hypothesis. Gln is a conditionally essential amino acid and Gln-metabolism is considered an important therapeutic target, especially since certain cancer cells exhibit heightened glutamine uptake and dependency [33, 34]. Through transporters such as SLC1A5 and SLC7A5, which were also found upregulated in MOC1, the Glu further gets converted into enzymes that enable ATP production via the TCA cycle which is necessary for cancer cell growth and survival (Fig. 5b, c) [35]. Furthermore, activation of alanine, aspartate and glutamate metabolism (KEGG pathway map00250) was observed in MOC2. This pathway is known to provide critical fuel for survivability and proliferation of cancer cells [36]. The proteomic characteristics of MOC2 include a high level of Cyclin-Dependent Kinase 1 (CDK1), CDK2 and Cyclin D protein. Upregulation of CDKs and Cyclin D proteins along with the phosphatidylinositol 3-kinase (PIK3) gene family in MOC2 indicate the activation of the PI3K/Akt/mTOR pathway [37, 38]. PI3K-AKT signaling pathway has been implicated in endocrine resistance [39]. Supporting this, multi-omics pathway analysis revealed the endocrine resistance pathway as a significant pathway (13 hits) when comparing MOC1 and MOC2.

Interesting differences between the omics characteristics of MOC3 and the other two MOCs were also observed. The concentration levels of Gln, Glu and Cho were significantly different between all clusters with an intermediate expression level of these metabolites in MOC3 when compared to MOC1 and MOC2. MOC3 tumors are also characterized by significantly higher concentration levels of taurine and moinositol compared to MOC1 and MOC2 suggesting high apoptosis in BC cells [40, 41]. Interestingly, the GSEA analysis support the increased apoptosis of tumor cells in MOC3 as pathways crucial for the mitotic cell cycle were significantly downregulated in MOC3 when compared to MOC2 (Supplementary Fig. 9). These observations may explain the low mortalities in MOC3 compared to MOC2, however, further investigation is warranted.

This study showed that the integrative multi-omics approach allowed the discovery of robust long-term prognostic MOCs, however, it is important to highlight the challenge this approach poses in terms of clinical practicality. Although incorporating more omics modalities in the framework could improve our understanding of BC heterogeneity and prognosis, it is uncommon to obtain such information at the individual level. However, as demonstrated through TCGA and METABRIC cohorts, not every level of omics is required for classifying MOCs. A robust supervised model, trained on a larger cohort, could attain clinically credible precision in MOC classification using a limited panel, similar to the PAM50. Furthermore, as the high-throughput technology continues to advance, integrating multi-omics into clinical practice may become feasible. One of the challenges in investigating long-term outcomes is that the treatment may be partly outdated during the long-term follow-up, especially for cancer where remarkable advancements in treatment regimens have been made in previous decades (endocrine treatment, chemotherapy including intensity, HER2 directed treatment, post-neoadjuvant treatment alterations). This may be reflected in our validation study where the curve separation in TCGA is in agreement with Oslo2, while the basal type rich MOC1 in the METABRIC cohort shows a different 5-year breast cancer mortality. This difference in outcomes may be attributed to the fact that TCGA and Oslo2 are relatively newer cohorts compared to METABRIC, consequently benefiting from more advanced and improved treatment strategies. However, it is also interesting to observe that MOC1 has 80% pN0 in Oslo2, versus 50% and 60% for MOC2 and MOC3 respectively (Table 2). In TCGA, 63% of patients are pN0 in MOC1 versus 43% and 45% in MOC2 and MOC3 respectively (Table 2). Since lymph node status is an important prognostic factor (Supplementary Fig. 3), this variation may also contribute to the favorable prognosis of patients in the MOC1 compared to MOC2/MOC3. It is worth noting that treatment information was not considered in our study, under the assumption that all patients received optimal treatment according to the disease type at the time of their diagnosis. Optimally, to obtain clinically reliable prognostic information from the proposed multi-omics approach, MOC subgroups should be compared within an independent patient cohort that has not received any systemic adjuvant treatment.

Conclusion

Overall, the analysis presented in this study shows that integrated analysis of metabolomics, transcriptomics, and proteomics, can differentiate highly aggressive breast cancer from less aggressive types. Three prominent clusters of BC patients with different prognoses were revealed in this study, which was validated in two independent large cohorts with high consistency. Importantly, the multi-omics clusters showed robust associations with long-term survival compared to the established intrinsic subtypes. Using systems-biology approaches we unmasked possible key pathways and drivers that characterized the found multi-omics clusters, ultimately providing deeper molecular insights into the biological mechanisms underlying BC heterogeneity. Nonetheless, our study emphasizes the significance of the integrative multi-omics approach in future clinical studies of BC.

Methods

Oslo2 cohort

As a part of the Oslo2 study, tissue samples from untreated patients of primary breast carcinoma were collected at the Oslo University Hospital in the years 2006–2009 (Radium Hospital and Ullevål Hospital) [21]. Tumor tissue from a total of 335 patients were included in this study (Table 1). The obtained tissue samples were freshly frozen after surgery and stored at −80 °C. Further, the samples were dissected into three sections as previously described [21]. The two parts comprising the outer edges were used to perform histological evaluation including tumor cell percentage, estrogen receptor (ER) and progesterone receptor (PR) status. The middle section was used for metabolomics analysis, while the remnant of all three pieces was used for the extraction of DNA, RNA, and proteins.

Of 335 patients, metabolomics data, gene expression and protein expression profiles were available for 228, 308 and 315 samples, respectively. A complete dataset for all three omics profiles was available for 196 samples.

Transcriptome profiling

Transcriptomics analysis was performed as previously described by Haukaas et al. [21]. In short, Total RNA was extracted with TRIzol. SurePrint G3 Human GE 8x60K (Agilent Technologies) was used to measure mRNA expression as per the manufactory’s protocol (one-color microarray-based gene expression analysis, low input Quick Amp Labeling, v.6.5, May 2010) and 100 ng RNA was used as input for labeling. Further, the obtained 21,851 profiles were quantile normalized, log2-transformed, and hospital-adjusted [21]. Measurements from the identical Entrez ID probes were averaged to create a single expression value per gene. Finally, intrinsic subtypes were assigned to each sample using the previously defined PAM50 algorithm [6].

Protein expression

A total of 151 primary antibodies were probed using the RPPA technique to detect breast cancer-related proteins. Tumor tissue was lysed using a lysis buffer containing proteinase and phosphatase inhibitors. The lysates were diluted to a concentration of 1.33 mg/ml and boiled with SDS and 2-mercaptoethanol. The supernatants were further diluted in serial dilutions with lysis buffer. These samples were then spotted and immobilized on nitrocellulose-coated FAST slides. Probing with primary antibodies, the signal intensity was captured, amplified, and analyzed using MicroVigene software. The spot signal intensities were processed using SuperCurve (version 1.01) to derive protein concentrations. The concentrations were log2-transformed and normalized using median centering for each antibody across the samples. The samples were categorized into five groups (basal, luminal, HER2, and reactive I and II as defined in The Cancer Genome Atlas Network data set [16]) based on best-fitting consensus clustering.

Metabolomics analysis

The concentration level of 18 metabolites was measured by high-resolution magic angle spinning magnetic resonance (HR-MAS MR) spectroscopy as previously described [21] using a Bruker Avance III 600 MHz/54 mm US (Bruker, Biospin GmbH, Germany) equipped with a 1H/13 C MAS probe. The mean sample weight was 7.3 ± 2.6 mg. Metabolites were quantified from Carr–Purcell–Meiboom–Gill (cpmg) pulse sequence (cpmgpr1d; Bruker) spin-echo spectra. The spectral region between 1.40 and 4.70 ppm was mean normalized after the removal of lipid residuals, and relative quantification was done by peak integration.

Data analysis

Multi-omics factor analysis

Multi-omics Factor Analysis + (MOFA+) is an unsupervised data integration framework that allows for identification of the variation-explaining principal axes when multiple omics profiles of the same sample are analyzed concurrently [42]. These principal axes are low-dimensional interpretable representations of multi-omics datasets called latent factors (referred to as multi-omics factors, MOFs), thus allowing to perform downstream analyses such as clustering and feature selection.

MOFA+ training is carried out under a probabilistic Bayesian framework with an assumed prior joined distribution of unobserved variables within the model [42]. For $M$ different types of omics modalities (matrices) given as ${Y}_{1},\ldots {Y}_{m}$ with dimensions $N\times {D}_{m}$, where $N$ is the total number of samples and ${D}_{m}$ is the features available in the modality $m$, MOFA decomposes each ${Y}_{m}$ as (Eq1):

$${Y}_{m}=Z{W}^{{mT}}+{\epsilon }^{m}{;}\,m=1,\ldots ,M$$

(1)

Here, ${W}^{m}$ is the regularized weight matrix for each modality $m$, and ${\varepsilon }^{m}$ is the residual noise term specific to the modality. The latent factors matrix, Z, is a lower-dimensional representation of the original matrix Y, which stands common for all modalities. To determine the optimal MOFA+ model, four analogous MOFA+ models with latent factors of n = 10, 15, 20, and 30 dimensions were compared. The comparison was based on assessing the multicollinearity between the MOFs and the total variance explained (${R}^{2}$ by individual factors and modality) [43]. Average variational inflation factor (VIF) was used to analyze multicollinearity between the factors [43]. A high or low VIF is suggestive of a poor model. The spearman correlation coefficient was evaluated between the MOFs and the tumor’s characteristics. In addition, associations between clinical parameters (tumor grade, histology, tumor size, lymph node status, hormone receptor status) and MOFs were assessed by Wilcoxon and Kruskal–Wallis tests. For the association analyses, histology was dichotomized into two categories ductal and non-ductal type and tumor size was dichotomized into tumor size less than or larger than 20 mm. Further, a multivariate Cox proportional-hazards model was fitted over the MOFs to identify factors associated with patient prognosis. Schoenfeld test was used to test the proportionality of the hazards (PH) assumption of Cox model [44]. The significance of the hazard ratio and the overall significance of the multivariate models were evaluated using the likelihood ratio test (LRT) and Wald tests. For the Kaplan–Meier curve, a log-rank test was used to test the difference in the time-to-event curve between the different groups. Survival analysis was performed using the “survival” package. MOFA+ was implemented on R Statistical Software (version 4.2.1) using the “MOFA2” package [45].

Cluster analysis

Cluster analysis by k-means clustering was carried out on the survival-associated latent factors of the trained MOFA+ model. The Hartigan-Wong algorithm for K-means clustering was executed using Euclidean distance as the distance parameter [46]. Gap statistics and Silhouette scores were used to evaluate the optimal number of clusters, where the number of clusters (maximum 10) was chosen as the number corresponding to the largest GAP [47]. K-means clustering was implemented using the R (4.2.1) package “stats”, and the Gap statistics were obtained using the “cluster” package in R (4.2.1). (Supplementary Fig. 10).

Validation framework

To test the robustness and clinical credibility of the found clusters on independent validation cohorts, a supervised learning-based validation scheme was used. We used two large-scale study cohorts for validation: The Cancer Genome Atlas (TCGA) and the Molecular Taxonomy of Breast Cancer International Consortium (METABRIC) (Table 1). From the validation cohorts, only those genes or proteins that were in common with the omics assays of the Oslo2 cohort were included in the validation scheme.

TCGA cohort

The Cancer Genome Atlas Pan-Cancer analysis project (TCGA-PanCanAtlas 2018) analyzed a large number of cancer tissues obtained from patients with primary tumors in different sites [48]. The study covered 12 tumor types including breast neoplasm. The data acquisition time for the TCGA began in 2006 (https://www.cancer.gov/ccg/research/genome-sequencing/tcga/history/timeline-milestones). Molecular profiles including the z-score normalized mRNA expression and the protein expression profiles for 1,084 primary BC samples along with their clinical information files were downloaded from cBioPortal (https://www.cbioportal.org/). To match the Oslo2 cohort, only patients with no history of neoadjuvant therapy were included in the analysis, resulting in 859 female BC samples.

METABRIC cohort

Molecular Taxonomy of Breast Cancer International Consortium (METABRIC) collected over 2000 primary BC tumor tissue samples from biobanks based in Canada and the UK [49]. The first study based on the METABRIC dataset was published in 2012 with a maximum follow-up of ~28 year [49]. This suggests that the compilation of the METABRIC dataset began around ~1985. These fresh-frozen samples are clinically annotated. The clinical information, gene expression data and metadata for 1904 samples were downloaded from cBioPortal (https://www.cbioportal.org/). Samples undergoing any therapy including chemotherapy or radiation therapy were excluded from the analysis. The final cohort comprised 694 samples.

Since the TCGA cohort comprises of transcriptome and protein profiles while the METABRIC cohort only offers transcriptomic profiles of BC patients, we perform a supervised analysis to overcome the incompleteness concerning the complete analysis performed on the Oslo2 cohort. As part of a sensitivity analysis, we also conducted validation across all samples in the cohort, regardless of their history of neoadjuvant therapy.

Supervised learning

Once the multi-omics clusters were identified in the Oslo2 cohort, multiple supervised classifiers were trained over multi-omics features set to classify the multi-omics clusters. These machine learning (ML) classifiers learned the intricate patterns embedded in the multi-omics assays in a supervised fashion which could be translated for the validation cohorts to predict the respective cluster of their samples. Three state-of-the-art machine learning classifiers were used in this study: random forest (RF), support vector machines (SVM) with linear kernel and partial least square discriminant analysis (PLS-DA) [50,51,52]. As an ensemble, the different mathematical foundations underlying these models offer a holistic approach to learning the intrinsic patterns, thus improving the reliability of the predictions (Supplementary Figs. 11–12) [53]. The RF, SVM and PLS-DA were implemented in R (4.2.1) using the packages “randomForest”, “e1071” and “caret” respectively.

To validate our results on the external cohorts, for each cohort, the supervised classifiers were trained over the Oslo2 cohort, by incorporating the set of common multi-omics features between Oslo2 and the external cohort (Fig. 1). Furthermore, to mitigate the challenge of dealing with high-dimensional data of the transcriptomic data, only the genes that exhibited differential expression (differentially expressed genes, DEGs) between the clusters in the Oslo2 dataset were utilized for the machine learning modeling. To identify DEGs, a significance analysis of microarrays (SAM) was performed for the Oslo2 mRNA profiles using the 21,851 genes with 100 permutations and a false discovery rate value of less than 0.01 (Supplementary Data 7). The “siggenes” package in R (4.2.1) was used to perform the SAM analysis.

The best-performing hyperparameters of the model were determined by assessing the model’s performance through 5-fold cross-validation and the area under the receiver operating characteristic (ROC) curve (AUC) [54] (Supplementary Data 5, 6) (Supplementary Figs. 11–12). The models were then trained for the entire Oslo2 cohort’s samples by setting the best-performing hyperparameters. Finally, these fully trained classifiers were fitted over the external cohort to predict the clusters for its samples. The cluster or class probabilities for each sample in the validation cohort were averaged across the trained classifiers and the class with the highest average probability was assigned to the sample.

Integrated pathway analysis

To identify the enriched gene sets in the obtained MOCs, gene set enrichment analysis (GSEA) was implemented [55]. GSEA was performed pairwise on the multi-omics clusters. For the pairwise comparison, the ranking of the DEGs was based on the standardized size effect taking signal-to-noise ratio into account (Eq. 2) where ${\mu }_{N}$ is the mean and ${\sigma }_{N}$ is the standard deviation of the gene in cluster N, respectively (Supplementary Data 9).

$$\frac{{\mu }_{A}-{\mu }_{B}}{{\sigma }_{A}+{\sigma }_{B}}$$

(2)

KEGG database was referred to perform the GSEA [56]. Additionally, the gene sets C5 (gene ontology (GO) and Human Phenotype Ontology (HPO) gene sets) and C6 (oncogenic signature gene sets) derived from the Molecular Signatures Database (MSigDB) by The Broad Institute were also referred to evaluate the enrichments [57]. The gene sets database was filtered for the maximum and minimum size of 500 and 15 genes, respectively. For each comparison, n = 10,000 permutations were performed on phenotypes and the false discovery rate (FDR) cutoff was set to 25% (as recommended in the manual). The enrichment analysis was performed in R using the packages “msigdbr”, “clusterProfiler” and “fgsea”.

To further probe the obtained higher-order clusters at the molecular level, the top-weighted multi-omics features set was identified that contributed the most to the multi-omics latent factors. An integrated pathway analysis was carried out using the OmicsNet 2.0 software by incorporating these multi-omics features [58]. For network modeling, metabolite-protein was set as the primary network followed by protein-protein interaction as secondary molecular interaction referring to the STRING database [59]. A pairwise comparative study between the clusters was also performed. Pathways with an adjusted p-value less than 0.05 were considered significant. The network connections were identified using the current nodes and no new nodes were introduced.

Code availability

All code for data analysis associated with the current submission is available at https://github.com/AbhibhavS/BreastCancer-MultiOmics.

Data availability

The supporting datasets of this article are included within the article Haukaas, T.H., et al., Metabolic clusters of breast cancer in relation to gene- and protein expression subtypes. Cancer Metab, 2016. 4: p. 12.; where mRNA expression data have been submitted to the Gene Expression Omnibus (GEO) database as a SuperSeries record with accession number GSE58215.

References

Sung H, Ferlay J, Siegel RL, Laversanne M, Soerjomataram I, Jemal A, et al. Global Cancer Statistics 2020: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA Cancer J Clin. 2021;71:209–49.
Article PubMed Google Scholar
Loibl S, Poortmans P, Morrow M, Denkert C, Curigliano G. Breast cancer. Lancet. 2021;397:1750–69.
Article CAS PubMed Google Scholar
Perou CM, Sorlie T, Eisen MB, van de Rijn M, Jeffrey SS, Rees CA, et al. Molecular portraits of human breast tumours. Nature. 2000;406:747–52.
Article CAS PubMed Google Scholar
Sorlie T, Perou CM, Tibshirani R, Aas T, Geisler S, Johnsen H, et al. Gene expression patterns of breast carcinomas distinguish tumor subclasses with clinical implications. Proc Natl Acad Sci USA. 2001;98:10869–74.
Article CAS PubMed PubMed Central Google Scholar
Prat A, Perou CM. Deconstructing the molecular portraits of breast cancer. Mol Oncol. 2011;5:5–23.
Article CAS PubMed Google Scholar
Parker JS, Mullins M, Cheang MC, Leung S, Voduc D, Vickery T, et al. Supervised risk predictor of breast cancer based on intrinsic subtypes. J Clin Oncol. 2009;27:1160–7.
Article PubMed PubMed Central Google Scholar
Mathews JC, Nadeem S, Levine AJ, Pouryahya M, Deasy JO, Tannenbaum A. Robust and interpretable PAM50 reclassification exhibits survival advantage for myoepithelial and immune phenotypes. NPJ Breast Cancer. 2019;5:30.
Article PubMed PubMed Central Google Scholar
Milioli HH, Tishchenko I, Riveros C, Berretta R, Moscato P. Basal-like breast cancer: molecular profiles, clinical features and survival outcomes. BMC Med Genomics. 2017;10:19.
Article PubMed PubMed Central Google Scholar
Ciriello G, Sinha R, Hoadley KA, Jacobsen AS, Reva B, Perou CM, et al. The molecular diversity of Luminal A breast tumors. Breast Cancer Res Treat. 2013;141:409–20.
Article CAS PubMed PubMed Central Google Scholar
Prat A, Pineda E, Adamo B, Galvan P, Fernandez A, Gaba L, et al. Clinical implications of the intrinsic molecular subtypes of breast cancer. Breast. 2015;24:S26–35.
Article PubMed Google Scholar
Staaf J, Hakkinen J, Hegardt C, Saal LH, Kimbung S, Hedenfalk I, et al. RNA sequencing-based single sample predictors of molecular subtype and risk of recurrence for clinical assessment of early-stage breast cancer. NPJ Breast Cancer. 2022;8:94.
Article CAS PubMed PubMed Central Google Scholar
Akbani R, Ng PK, Werner HM, Shahmoradgoli M, Zhang F, Ju Z, et al. A pan-cancer proteomic perspective on The Cancer Genome Atlas. Nat Commun. 2014;5:3887.
Article CAS PubMed Google Scholar
Zhang B, Wang J, Wang X, Zhu J, Liu Q, Shi Z, et al. Proteogenomic characterization of human colon and rectal cancer. Nature. 2014;513:382–7.
Article CAS PubMed PubMed Central Google Scholar
Myhre S, Lingjaerde OC, Hennessy BT, Aure MR, Carey MS, Alsner J, et al. Influence of DNA copy number and mRNA levels on the expression of breast cancer related proteins. Mol Oncol. 2013;7:704–18.
Article CAS PubMed PubMed Central Google Scholar
Hennessy BT, Lu Y, Gonzalez-Angulo AM, Carey MS, Myhre S, Ju Z, et al. A technical assessment of the utility of reverse phase protein arrays for the study of the functional proteome in non-microdissected human breast cancers. Clin Proteom. 2010;6:129–51.
Article CAS Google Scholar
Cancer Genome Atlas N. Comprehensive molecular portraits of human breast tumours. Nature. 2012;490:61–70.
Article Google Scholar
Hanahan D, Weinberg RA. Hallmarks of cancer: the next generation. Cell. 2011;144:646–74.
Article CAS PubMed Google Scholar
Bathen TF, Geurts B, Sitter B, Fjosne HE, Lundgren S, Buydens LM, et al. Feasibility of MR metabolomics for immediate analysis of resection margins during breast cancer surgery. PLoS One. 2013;8:e61578.
Article CAS PubMed PubMed Central Google Scholar
Cao MD, Sitter B, Bathen TF, Bofin A, Lonning PE, Lundgren S, et al. Predicting long-term survival and treatment response in breast cancer patients receiving neoadjuvant chemotherapy by MR metabolic profiling. NMR Biomed. 2012;25:369–78.
Article CAS PubMed Google Scholar
Giskeodegard GF, Lundgren S, Sitter B, Fjosne HE, Postma G, Buydens LM, et al. Lactate and glycine-potential MR biomarkers of prognosis in estrogen receptor-positive breast cancers. NMR Biomed. 2012;25:1271–9.
Article CAS PubMed Google Scholar
Haukaas TH, Euceda LR, Giskeodegard GF, Lamichhane S, Krohn M, Jernstrom S, et al. Metabolic clusters of breast cancer in relation to gene- and protein expression subtypes. Cancer Metab. 2016;4:12.
Article PubMed PubMed Central Google Scholar
Chu G, Ji X, Wang Y, Niu H. Integrated multiomics analysis and machine learning refine molecular subtypes and prognosis for muscle-invasive urothelial cancer. Mol Ther Nucleic Acids. 2023;33:110–26.
Article CAS PubMed PubMed Central Google Scholar
Liu Z, Zhao Y, Kong P, Liu Y, Huang J, Xu E, et al. Integrated multi-omics profiling yields a clinically relevant molecular classification for esophageal squamous cell carcinoma. Cancer Cell. 2023;41:181–195 e189.
Article CAS PubMed Google Scholar
Roelands J, Kuppen PJK, Ahmed EI, Mall R, Masoodi T, Singh P, et al. An integrated tumor, immune and microbiome atlas of colon cancer. Nat Med. 2023;29:1273–86.
Article CAS PubMed PubMed Central Google Scholar
Yao Z, Xu N, Shang G, Wang H, Tao H, Wang Y, et al. Proteogenomics of different urothelial bladder cancer stages reveals distinct molecular features for papillary cancer and carcinoma in situ. Nat Commun. 2023;14:5670.
Article CAS PubMed PubMed Central Google Scholar
Rossi C, Cicalini I, Cufaro MC, Consalvo A, Upadhyaya P, Sala G, et al. Breast cancer in the era of integrating “Omics” approaches. Oncogenesis. 2022;11:17.
Article PubMed PubMed Central Google Scholar
Milioli HH, Vimieiro R, Riveros C, Tishchenko I, Berretta R, Moscato P. The discovery of novel biomarkers improves breast cancer intrinsic subtype prediction and reconciles the labels in the METABRIC data set. PLoS One. 2015;10:e0129711.
Article PubMed PubMed Central Google Scholar
Allott EH, Geradts J, Sun X, Cohen SM, Zirpoli GR, Khoury T, et al. Intratumoral heterogeneity as a source of discordance in breast cancer biomarker classification. Breast Cancer Res. 2016;18:68.
Article PubMed PubMed Central Google Scholar
Martelotto LG, Ng CK, Piscuoglio S, Weigelt B, Reis-Filho JS. Breast cancer intra-tumor heterogeneity. Breast Cancer Res. 2014;16:210.
Article PubMed PubMed Central Google Scholar
Aure MR, Vitelli V, Jernstrom S, Kumar S, Krohn M, Due EU, et al. Integrative clustering reveals a novel split in the luminal A subtype of breast cancer with impact on outcome. Breast Cancer Res. 2017;19:44.
Article PubMed PubMed Central Google Scholar
Azim HA Jr, Partridge AH. Biology of breast cancer in young women. Breast Cancer Res. 2014;16:427.
Article PubMed PubMed Central Google Scholar
Huang H, Vandekeere S, Kalucka J, Bierhansl L, Zecchin A, Bruning U, et al. Role of glutamine and interlinked asparagine metabolism in vessel formation. EMBO J. 2017;36:2334–52.
Article CAS PubMed PubMed Central Google Scholar
Altman BJ, Stine ZE, Dang CV. From Krebs to clinic: glutamine metabolism to cancer therapy. Nat Rev Cancer. 2016;16:749.
Article CAS PubMed Google Scholar
Jin J, Byun JK, Choi YK, Park KG. Targeting glutamine metabolism as a therapeutic strategy for cancer. Exp Mol Med. 2023;55:706–15.
Article CAS PubMed PubMed Central Google Scholar
Wise DR, Thompson CB. Glutamine addiction: a new therapeutic target in cancer. Trends Biochem Sci. 2010;35:427–33.
Article CAS PubMed PubMed Central Google Scholar
Huang S, Chong N, Lewis NE, Jia W, Xie G, Garmire LX. Novel personalized pathway-based metabolomics models reveal key metabolic pathways for breast cancer diagnosis. Genome Med. 2016;8:34.
Article PubMed PubMed Central Google Scholar
Gul A, Leyland-Jones B, Dey N, De P. A combination of the PI3K pathway inhibitor plus cell cycle pathway inhibitor to combat endocrine resistance in hormone receptor-positive breast cancer: a genomic algorithm-based treatment approach. Am J Cancer Res. 2018;8:2359–76.
PubMed PubMed Central Google Scholar
Qie S, Diehl JA. Cyclin D1, cancer progression, and opportunities in cancer treatment. J Mol Med (Berl). 2016;94:1313–26.
Article CAS PubMed Google Scholar
Abu-Khalaf MM, Alex Hodge K, Hatzis C, Baldelli E, El Gazzah E, Valdes F, et al. AKT/mTOR signaling modulates resistance to endocrine therapy and CDK4/6 inhibition in metastatic breast cancers. NPJ Precis Oncol. 2023;7:18.
Article CAS PubMed PubMed Central Google Scholar
Bizzarri M, Dinicola S, Bevilacqua A, Cucina A. Broad spectrum anticancer activity of myo-inositol and inositol hexakisphosphate. Int J Endocrinol. 2016;2016:5616807.
Article PubMed PubMed Central Google Scholar
Zhang X, Lu H, Wang Y, Liu C, Zhu W, Zheng S, et al. Taurine induces the apoptosis of breast cancer cells by regulating apoptosis-related proteins of mitochondria. Int J Mol Med. 2015;35:218–26.
Article CAS PubMed Google Scholar
Argelaguet R, Arnol D, Bredikhin D, Deloro Y, Velten B, Marioni JC, et al. MOFA+: a statistical framework for comprehensive integration of multi-modal single-cell data. Genome Biol. 2020;21:111.
Article PubMed PubMed Central Google Scholar
Goncalves E, Poulos RC, Cai Z, Barthorpe S, Manda SS, Lucas N, et al. Pan-cancer proteomic map of 949 human cell lines. Cancer Cell. 2022;40:835–849 e838.
Article CAS PubMed PubMed Central Google Scholar
SCHOENFELD D. Partial residuals for the proportional hazards regression model. Biometrika. 1982;69:239–41.
Article Google Scholar
R Core Team (2023). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. https://www.R-project.org/.
Hartigan JA, Wong MA. Algorithm AS 136: A K-Means Clustering Algorithm. J R Stat Soc Ser C (Appl Stat). 1979;28:100–8.
Google Scholar
Tibshirani R, Walther G, Hastie T. Estimating the number of clusters in a data set via the gap statistic. J R Stat Soc B. 2001;63:411–23.
Article Google Scholar
Cancer Genome Atlas Research N, Weinstein JN, Collisson EA, Mills GB, Shaw KR, Ozenberger BA, et al. The Cancer Genome Atlas Pan-Cancer analysis project. Nat Genet. 2013;45:1113–20.
Article Google Scholar
Curtis C, Shah SP, Chin SF, Turashvili G, Rueda OM, Dunning MJ, et al. The genomic and transcriptomic architecture of 2,000 breast tumours reveals novel subgroups. Nature. 2012;486:346–52.
Article CAS PubMed PubMed Central Google Scholar
Barker M, Rayens W. Partial least squares for discrimination. J Chemometrics. 2003;17:166–73.
Article CAS Google Scholar
Boser BE, Guyon IM, Vapnik VN. A training algorithm for optimal margin classifiers. Proceedings of the Fifth Annual Workshop on Computational Learning Theory. Association for Computing Machinery, 1992, pp 144–52.
Breiman L. Random forests. Mach Learn. 2001;45:5–32.
Article Google Scholar
Dietterich TG. Ensemble methods in machine learning. Multiple Classifier Systems. Berlin Heidelberg: Springer; 2000. p. 1–15.
Hajian-Tilaki K. Receiver Operating Characteristic (ROC) curve analysis for medical diagnostic test evaluation. Casp J Intern Med. 2013;4:627–35.
Google Scholar
Subramanian A, Tamayo P, Mootha VK, Mukherjee S, Ebert BL, Gillette MA, et al. Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc Natl Acad Sci USA. 2005;102:15545–50.
Article CAS PubMed PubMed Central Google Scholar
Kanehisa M, Furumichi M, Sato Y, Kawashima M, Ishiguro-Watanabe M. KEGG for taxonomy-based analysis of pathways and genomes. Nucleic Acids Res. 2023;51:D587–92.
Article CAS PubMed Google Scholar
Liberzon A, Birger C, Thorvaldsdottir H, Ghandi M, Mesirov JP, Tamayo P. The Molecular Signatures Database (MSigDB) hallmark gene set collection. Cell Syst. 2015;1:417–25.
Article CAS PubMed PubMed Central Google Scholar
Zhou G, Pang Z, Lu Y, Ewald J, Xia J. OmicsNet 2.0: a web-based platform for multi-omics integration and network visual analytics. Nucleic Acids Res. 2022;50:W527–33.
Article CAS PubMed PubMed Central Google Scholar
Szklarczyk D, Franceschini A, Wyder S, Forslund K, Heller D, Huerta-Cepas J, et al. STRING v10: protein-protein interaction networks, integrated over the tree of life. Nucleic Acids Res. 2015;43:D447–52.
Article CAS PubMed Google Scholar

Download references

Funding

Open access funding provided by Norwegian University of Science and Technology.

Author information

Authors and Affiliations

Department of Public Health and Nursing (ISM), Norwegian University of Science and Technology- NTNU, Trondheim, Norway
Abhibhav Sharma, Julia Debik & Guro F. Giskeødegård
Department of Circulation and Medical Imaging, NTNU, Trondheim, Norway
Julia Debik & Tone F. Bathen
Department of Oncology, Division of Cancer Medicine, Oslo University Hospital, Oslo, Norway
Bjørn Naume & Hege Oma Ohnstad
Institute of Clinical Medicine, University of Oslo, Oslo, Norway
Bjørn Naume
Vestre Viken Hospital Trust, Oslo, Norway
Kristine Kleivi Sahlber & Helle Kristine Skjerven
Oslo University Hospital, Oslo, Norway
Elin Borgen, Anne-Lise Børresen-Dale, Olav Engebråten, Øystein Garred, Gry Aarum Geitvik, Rolf Kåresen, Anita Langerød, Gunhild Mari Mælandsmo, Hege G Russnes, Torill Sauer, Ellen Schlichting & Therese Sørlie
Østfold Hospital, Oslo, Norway
Britt Fritzman
Akershus University Hospital, Oslo, Norway
Jürgen Geisler & Vessela N Kristensen
Cancer Registry of Norway, Oslo, Norway
Solveig Hofvind
University of Oslo, Oslo, Norway
Ole Christian Lingjærde

Authors

Abhibhav Sharma
View author publications
You can also search for this author in PubMed Google Scholar
Julia Debik
View author publications
You can also search for this author in PubMed Google Scholar
Bjørn Naume
View author publications
You can also search for this author in PubMed Google Scholar
Hege Oma Ohnstad
View author publications
You can also search for this author in PubMed Google Scholar
Tone F. Bathen
View author publications
You can also search for this author in PubMed Google Scholar
Guro F. Giskeødegård
View author publications
You can also search for this author in PubMed Google Scholar

Consortia

Oslo Breast Cancer Consortium (OSBREAC)

Bjørn Naume
, Tone F. Bathen
, Kristine Kleivi Sahlber
, Elin Borgen
, Anne-Lise Børresen-Dale
, Olav Engebråten
, Britt Fritzman
, Øystein Garred
, Jürgen Geisler
, Gry Aarum Geitvik
, Solveig Hofvind
, Vessela N Kristensen
, Rolf Kåresen
, Anita Langerød
, Ole Christian Lingjærde
, Gunhild Mari Mælandsmo
, Hege G Russnes
, Torill Sauer
, Helle Kristine Skjerven
, Ellen Schlichting
& Therese Sørlie

Contributions

AS, JD, BN, HOO, OSBREAC, TFB and GFG designed the study. AS performed the computation, statistical analysis and drafted the manuscript. All the authors have read and contributed toward building the final draft. The final manuscript is approved by all the authors.

Corresponding authors

Correspondence to Abhibhav Sharma or Guro F. Giskeødegård.

Ethics declarations

Competing interests

The authors declare no competing interests.

Ethical approval and consent to participate

The study is approved by the Norwegian Regional Committee for Medical Research Ethics (Biobank approval 1.2006.1607) and is in accordance with the Declaration of Helsinki. All patients have given written consent for the use of material for research purposes.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary File Description

Supplementary Figure

Supplementary Data 1

Supplementary Data 2

Supplementary Data 3

Supplementary Data 4

Supplementary Data 5

Supplementary Data 6

Supplementary Data 7

Supplementary Data 8

Supplementary Data 9

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

Reprints and permissions

About this article

Cite this article

Sharma, A., Debik, J., Naume, B. et al. Comprehensive multi-omics analysis of breast cancer reveals distinct long-term prognostic subtypes. Oncogenesis 13, 22 (2024). https://doi.org/10.1038/s41389-024-00521-6

Download citation

Received: 16 January 2024
Revised: 20 May 2024
Accepted: 22 May 2024
Published: 13 June 2024
DOI: https://doi.org/10.1038/s41389-024-00521-6
Springer Nature Limited

Comprehensive multi-omics analysis of breast cancer reveals distinct long-term prognostic subtypes

Abstract

Similar content being viewed by others

Background

Results

Multi-omics integration revealed molecular characteristics associated with clinical factors

Multi-omics clusters of breast cancer are related to long-term survival

Validation of the multi-omics clusters in external cohorts

Functional analysis

Discussion

Conclusion

Methods

Oslo2 cohort

Transcriptome profiling

Protein expression

Metabolomics analysis

Data analysis

Multi-omics factor analysis

Cluster analysis

Validation framework

TCGA cohort

METABRIC cohort

Supervised learning

Integrated pathway analysis

Code availability

Data availability

References

Funding

Author information

Authors and Affiliations

Consortia

Oslo Breast Cancer Consortium (OSBREAC)

Contributions

Corresponding authors

Ethics declarations

Competing interests

Ethical approval and consent to participate

Additional information

Supplementary information

Rights and permissions

About this article

Cite this article

Share this article

Search

Navigation