Introduction

Axillary lymph node (ALN) status in breast cancer plays a critical role in determining patient prognosis and guiding treatment strategies [1, 2]. Recent guidelines from the American College of Surgeons Oncology Group advise against axillary lymph node dissection (ALND) in specific cases. Specifically, patients with early-stage (T1 or T2) breast cancer who have one or two positive sentinel lymph nodes are recommended not to undergo ALND if they are undergoing mastectomy followed by radiotherapy [3,4,5,6]. Conversely, a higher ALN burden (≥ 3 metastatic ALNs) often leads to recommendations for more aggressive interventions, such as comprehensive chemotherapy or additional surgical procedures, to effectively address the spread of cancer [7].

Currently, the principal method for evaluating ALN burden is sentinel lymph node biopsy (SLNB) [8]. However, as an invasive axillary procedure, SLNB not only adds medical expense and anesthesia time but may also cause complications, such as lymphedema or upper limb numbness, at reported rates of 1.8–29.9% [9]. If a reliable method for predicting ALN burden preoperatively existed, patients with three or more positive lymph nodes could be selected directly for ALND or receive neoadjuvant therapy without SLNB [10].

Currently, MRI and ultrasound examinations are the primary noninvasive imaging modalities used for assessing ALN burden [11, 12]. Their efficacy in this context is constrained by their comparatively high false-negative rates [12]. Radiomics, a novel approach for extracting quantitative features from medical images, can be regarded as a form of digital biopsy, facilitating comprehensive tumor phenotype and spatial heterogeneity characterization across diverse clinical scenarios [13,14,15]. Several previous radiomics models aimed to predict ALN metastasis, but small sample sizes and insufficient independent validation limited their efficacy [16,17,18,19]. Additionally, the exploration of the association between radiomic phenotypes and their biological foundations remains superficial [20]. Elucidating the biological underpinnings of radiomic predictions is crucial for their broader acceptance by clinicians as independent diagnostic, predictive, or prognostic tools [21]. Therefore, further research is crucial to validate radiomics’ robustness in predicting ALN burden and investigate its molecular genetic basis, thereby facilitating its clinical application.

This study aims to develop a radiomic model for preoperatively predicting ALN burden in early-stage breast cancer and to elucidate the biological basis of the MRI radiomic phenotypes associated with ALN burden by integrating radiomic and genomic data.

Materials and methods

Patient recruitment and clinicopathological data collection

The ethics committees of four institutions granted ethical approval for this study. Informed consent was waived because of the retrospective study design. Patients with early-stage invasive breast cancer (T1 or T2, tumor size ≤ 5 cm) were recruited from four institutions. These individuals had undergone contrast-enhanced breast MRI examinations within two weeks before SLNB or ALN surgery. This study enrolled 1211 patients, with Center I enrolling 532 cases (May 2014 to December 2020) for model training, and Centers II, III, and IV enrolling 113, 185, and 381 cases, respectively, for validation over different periods. Furthermore, The Cancer Genome Atlas-Breast Cancer (TCGA-BRCA) and the Cancer Imaging Archive-Breast Cancer (TCIA-BRCA) data, including dynamic contrast-enhanced MRI (DCE-MRI) and RNA data, served as the radiogenomics cohort (n = 99). Data from the Duke Breast Cancer DCE-MRI set, also from TCIA, served as a prognostic cohort of 226 patients. Figure 1 provides detailed inclusion and exclusion criteria for the study.

Fig. 1 Flowchart of patient selection and study design. SLNB: sentinel lymph node biopsy; ALND: axillary lymph node dissection; ALN: axillary lymph node

The baseline clinical characteristics and pathological data were obtained from the patients’ medical records, including age, menopausal status, clinical T stage, estrogen receptor (ER) status, progesterone receptor (PR) status, human epidermal growth factor receptor-2 (HER-2) status, histological grade, histological type, molecular subtype [22], Ki-67 proliferation index [23], ALN burden, and ALN status. Patients were categorized by pathological ALN burden into a low-ALN burden group (< 3 metastatic ALNs) and a high-ALN burden group (≥ 3 metastatic ALNs).
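The grouping rule above can be written down as a short sketch. The function names are hypothetical, and treating one or more metastatic nodes as a positive ALN status is an assumption that mirrors the MRI-based rule used later in the text; only the ≥ 3 threshold for high burden is stated in the source.

```python
HIGH_BURDEN_THRESHOLD = 3  # >= 3 metastatic ALNs defines the high-burden group


def aln_status(n_metastatic: int) -> str:
    """Return 'positive' when at least one metastatic ALN is found (assumed rule)."""
    if n_metastatic < 0:
        raise ValueError("node count cannot be negative")
    return "positive" if n_metastatic >= 1 else "negative"


def aln_burden(n_metastatic: int) -> str:
    """Return 'high' for >= 3 metastatic ALNs and 'low' otherwise."""
    if n_metastatic < 0:
        raise ValueError("node count cannot be negative")
    return "high" if n_metastatic >= HIGH_BURDEN_THRESHOLD else "low"
```

Under this rule a patient with two metastatic nodes is ALN-positive but still belongs to the low-burden group.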

MRI information acquisition

The breast MRI acquisition and scanning parameters of the four centers, Duke University (DUKE), and TCIA are detailed in the supplementary materials (Table S1). The MRI enhancement pattern of the tumor region was recorded as mass or nonmass. MRI-ALN status was assessed by MRI [24, 25], where the presence of one or more suspicious metastatic lymph nodes was considered positive, and their absence was deemed negative. MRI-ALN burden was categorized as low for 0–2 and high for ≥ 3 suspicious nodes. MRI images were independently evaluated by two experienced radiologists, referred to here as Physician A (M.P.H., 8 years of experience) and Physician B (Z.F., 7 years of experience). To ensure consistency in evaluation and to resolve any discrepancies between the two initial assessments, a third, senior radiologist (M.S.X., 31 years of experience) was consulted. This senior radiologist reviewed any conflicting cases and facilitated a consensus decision.

Radiomics analysis

A deep learning model based on the U2Net architecture was used for automatic volume of interest (VOI) segmentation on the peak-phase images of the DCE-MRI scan. Two experienced radiologists, M.P.H. and J.Y.W. (22 years of experience), corrected the segmented VOIs. Image preprocessing was applied before feature extraction to minimize the effect of differences between scanners. The images were first resampled to a standardized voxel spacing of 1 × 1 × 1 mm³ (x, y, z) using a linear interpolation algorithm. The tumor and the surrounding 15-pixel three-dimensional area were segmented, and histogram equalization was conducted based on the field strength intensity. The PyRadiomics package (version 3.0.1) was used to extract radiomic features, including shape, first-order, and texture features, as well as features computed from wavelet-, exponential-, and square-transformed images. The same image segmentation and feature extraction processes were repeated on 50 cases in the cohort after 2 months to assess the intraclass correlation coefficient (ICC).
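The text does not state which ICC form was used. As an illustration only, the sketch below computes ICC(2,1) (two-way random effects, absolute agreement, single measurement) for one feature measured on the same subjects in repeated sessions or by several readers. The function name and the choice of ICC form are assumptions, not the authors' documented method.

```python
from typing import Sequence


def icc_2_1(ratings: Sequence[Sequence[float]]) -> float:
    """ICC(2,1) from a subjects-by-measurements table.

    ratings[i][j] is the feature value for subject i in session (or reader) j.
    """
    n = len(ratings)
    k = len(ratings[0]) if n else 0
    if n < 2 or k < 2 or any(len(row) != k for row in ratings):
        raise ValueError("need a rectangular table with >= 2 rows and >= 2 columns")

    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Two-way ANOVA sums of squares without replication.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    denominator = ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    return (ms_rows - ms_error) / denominator


def is_reproducible(ratings: Sequence[Sequence[float]], threshold: float = 0.75) -> bool:
    """Apply the ICC > 0.75 retention rule described in the text."""
    return icc_2_1(ratings) > threshold
```

A feature would be kept only if `is_reproducible` holds for both the inter-observer and the intra-observer tables, matching the selection rule described in the next subsection.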

Construction of the ALN-burden prediction radiomic model

ComBat harmonization and z-score normalization were first applied to standardize the radiomics data [26, 27]. A series of feature selection strategies was then used for dimensionality reduction. Initially, features with both inter- and intra-observer ICC values > 0.75 were retained. Subsequently, rank-sum tests were performed to identify features that differed significantly between the low- and high-ALN burden groups. The Spearman correlation coefficient (r) was then computed between each pair of features. For each feature pair with |r| > 0.6, the feature with the higher mean absolute correlation was eliminated. Upsampling of the minority class was performed to mitigate model bias favoring the majority class. Elastic net-regularized logistic regression was applied to identify the most predictive features. Finally, a backpropagation neural network (BPNN) was used to develop the radiomic model, and its predicted probability was regarded as the radscore.
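The correlation-based redundancy step can be sketched as follows. This is one possible reading of the rule, assuming that pairs are processed from the most to the least correlated and that the mean absolute correlation is taken over the features still retained; the source does not specify either detail, and all names are hypothetical.

```python
from math import sqrt
from typing import Dict, List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Ranks starting at 1; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for pos in range(i, j + 1):
            ranks[order[pos]] = shared
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman r computed as the Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        return 0.0  # constant feature: treated as uncorrelated
    return cov / sqrt(vx * vy)


def drop_redundant(features: Dict[str, Sequence[float]], threshold: float = 0.6) -> List[str]:
    """Repeatedly drop one feature from the most correlated pair above threshold."""
    names = list(features)
    abs_r = {a: {b: abs(spearman(features[a], features[b])) for b in names if b != a}
             for a in names}
    kept = list(names)
    while True:
        pairs = [(abs_r[a][b], a, b)
                 for i, a in enumerate(kept) for b in kept[i + 1:]
                 if abs_r[a][b] > threshold]
        if not pairs:
            return kept
        _, a, b = max(pairs)
        mean_a = sum(abs_r[a][c] for c in kept if c != a) / (len(kept) - 1)
        mean_b = sum(abs_r[b][c] for c in kept if c != b) / (len(kept) - 1)
        # Remove the member of the pair that is more correlated with everything else.
        kept.remove(a if mean_a >= mean_b else b)
```

Calling `drop_redundant` on a mapping from feature name to per-patient values returns the names that survive the |r| > 0.6 filter.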

Prognostic stratification analysis of ALN status-related Radscore

To evaluate and validate the potential prognostic stratification value of the ALN status-related radscore, patients were divided into predicted high- and low-ALN burden groups based on the radscore cutoff value calculated using the minimum log-rank P value method in the training cohort. Kaplan–Meier curves and the log-rank test were then used to assess the prognostic differences between the predicted high- and low-ALN burden groups in both the Center I and DUKE cohorts.
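A minimal sketch of the minimum log-rank P value idea is given below, assuming a two-group log-rank test with the usual hypergeometric variance and a chi-squared approximation with one degree of freedom. The function names and the `min_group_size` guard are hypothetical and are not taken from the source. Cutoffs chosen this way tend to be optimistically biased in the cohort they are derived from, which is one reason to confirm them in an independent cohort.

```python
from math import erfc, sqrt
from typing import Sequence, Tuple


def logrank_p(times: Sequence[float], events: Sequence[int], in_high: Sequence[bool]) -> float:
    """Two-group log-rank P value (chi-squared approximation, 1 degree of freedom)."""
    observed = expected = variance = 0.0
    event_times = sorted({t for t, e in zip(times, events) if e})
    for t in event_times:
        n = n_high = d = d_high = 0
        for ti, e, g in zip(times, events, in_high):
            if ti < t:
                continue  # subject is no longer at risk at time t
            n += 1
            n_high += 1 if g else 0
            if e and ti == t:
                d += 1
                d_high += 1 if g else 0
        observed += d_high
        expected += d * n_high / n
        if n > 1:
            variance += d * (n_high / n) * (1 - n_high / n) * (n - d) / (n - 1)
    if variance == 0:
        return 1.0
    chi2 = (observed - expected) ** 2 / variance
    return erfc(sqrt(chi2 / 2))  # upper tail of chi-squared with 1 df


def min_p_cutoff(scores: Sequence[float], times: Sequence[float], events: Sequence[int],
                 min_group_size: int = 2) -> Tuple[float, float]:
    """Scan every observed score as a cutoff and return (cutoff, P) with the smallest P."""
    best = None
    for cutoff in sorted(set(scores)):
        high = [s >= cutoff for s in scores]
        n_high = sum(high)
        if n_high < min_group_size or len(scores) - n_high < min_group_size:
            continue  # skip splits that leave a group too small to compare
        p = logrank_p(times, events, high)
        if best is None or p < best[1]:
            best = (cutoff, p)
    if best is None:
        raise ValueError("no cutoff leaves both groups large enough")
    return best
```

Here `events` uses 1 for an observed death and 0 for censoring, and patients with a score at or above the cutoff form the predicted high-burden group.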

External validation of ALN-burden prediction

Cohorts from Centers II–IV were used for the external validation of the radscore to further confirm the generalization performance of the radscore in assessing the ALN burden. The evaluation metrics used included receiver operating characteristic curves and area under the curve (AUC), sensitivity, accuracy, and specificity.
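For reference, the listed metrics can be computed as in the sketch below. It assumes binary labels (1 = high ALN burden, 0 = low), classifies a case as high burden when its score is at or above the cutoff, and computes the AUC as the probability that a randomly chosen high-burden case scores above a randomly chosen low-burden case, with ties counted as one half. The function names are hypothetical.

```python
from typing import Dict, Sequence


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC via pairwise comparison of positive and negative scores."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def metrics_at_cutoff(labels: Sequence[int], scores: Sequence[float],
                      cutoff: float) -> Dict[str, float]:
    """Sensitivity, specificity, and accuracy when score >= cutoff means 'high burden'."""
    tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= cutoff)
    fn = sum(1 for y, s in zip(labels, scores) if y == 1 and s < cutoff)
    tn = sum(1 for y, s in zip(labels, scores) if y == 0 and s < cutoff)
    fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= cutoff)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present")
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / len(labels),
    }
```

In an external validation setting, the cutoff would be fixed from the training cohort and then passed unchanged to `metrics_at_cutoff` for each validation cohort.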

Development and validation of the clinical model

Univariate and multivariate logistic regression analyses were used to select the ALN-related clinical variables. The candidate clinical factors included age, menopausal status, histological grade, histological type, MRI-ALN status, MRI-ALN burden, Ki-67, ER, PR, HER-2, molecular subtype, and clinical T stage (Table S2). Subsequently, a combined model integrating all significant clinical predictive factors and the radscore was developed, whereas a separate clinical model was established exclusively from the relevant clinical factors.

Analysis of potential biological significance underlying the ALN-related Radscore

To improve the biological interpretability of the radiomics model, RNA transcriptomic data were collected from the TCGA-BRCA cohort and matched with the MRI data from TCIA-BRCA to explore the potential biological significance underlying the ALN-related radscore. A total of 99 patients were categorized into predicted high- and low-ALN burden groups using the same image preprocessing and radiomic modeling analysis.

For differential expression analysis, we utilized count type gene data, which is ideal for analyzing raw RNA-seq outputs. This analysis was performed on the TCGA-BRCA dataset using the edgeR R package, applying stringent selection criteria of a false discovery rate (FDR) less than 0.01 and an absolute log2 fold change (|log2FC|) greater than 2. The results from this differential expression analysis served as the basis for subsequent Gene Ontology (GO) Biological Process enrichment analysis. This enrichment analysis was conducted using the clusterProfiler R package, which facilitated the identification of significantly enriched molecular functions and pathways through hierarchical clustering analysis of the differential gene set.
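The selection criteria can be illustrated with a small filter over per-gene results. This sketch does not reproduce the edgeR analysis itself; it only applies the stated thresholds (FDR < 0.01 and |log2FC| > 2) to a table of results. The `GeneResult` record, the function name, and the direction convention (positive log2FC meaning higher expression in the predicted high-ALN burden group) are assumptions.

```python
from typing import Dict, Iterable, List, NamedTuple


class GeneResult(NamedTuple):
    gene: str       # gene identifier
    log2_fc: float  # log2 fold change, predicted high- vs low-ALN burden group (assumed)
    fdr: float      # multiple-testing adjusted P value


def split_degs(results: Iterable[GeneResult], fdr_max: float = 0.01,
               abs_log2fc_min: float = 2.0) -> Dict[str, List[str]]:
    """Return up- and downregulated genes that pass both thresholds."""
    up: List[str] = []
    down: List[str] = []
    for r in results:
        if r.fdr >= fdr_max or abs(r.log2_fc) <= abs_log2fc_min:
            continue  # fails the FDR or the fold-change criterion
        (up if r.log2_fc > 0 else down).append(r.gene)
    return {"up": up, "down": down}
```

The two returned lists correspond to the upregulated and downregulated gene sets that feed the enrichment analysis.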

For Gene Set Enrichment Analysis (GSEA) and immune infiltration analysis, we utilized log2-transformed FPKM type gene data. This transformation standardizes expression levels, enabling more accurate comparisons across samples. The GSEA was employed to determine pathways differentially enriched between patients with varying ALN burdens, thereby providing insights into the biological underpinnings of radiomic features.

Association analysis between radiomics and tumor immune infiltration

To test the hypothesis that the ALN-related radscore reflects the heterogeneity of the tumor microenvironment, scores for 64 cell types were calculated with the xCell package from RNA-seq FPKM data, which were obtained through count normalization. Rank-sum tests were used to identify differences in the 64 cell-type scores between the high- and low-ALN burden groups. Finally, the Spearman method was used to evaluate the correlation between radiomic features and immune cell scores.

Statistical analysis

Chi-squared, Fisher’s exact, or Mann–Whitney U tests were used to assess differences in clinical features. AUC values were used to assess model performance. The DeLong test was used to compare differences in AUC values. McNemar’s test was used to assess performance differences between the radiomic model and expert reader review. The cutoff values for the prediction model and the survival analysis were calculated in the training cohort using the maximum Youden index and the minimum log-rank P value methods, respectively. Kaplan–Meier curves and the log-rank test were used to assess the prognostic differences between the predicted high- and low-ALN burden groups. R (version 4.3.1) and Python (version 3.7.3) were used for all analyses. P values < 0.05 were considered statistically significant.
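The maximum Youden index rule selects the cutoff that maximizes J = sensitivity + specificity − 1. A minimal sketch follows, assuming every observed score is tried as a candidate cutoff, scores at or above the cutoff are classified as high burden, and ties in J are resolved in favor of the lowest cutoff; the function name and the tie rule are illustrative assumptions.

```python
from typing import Sequence, Tuple


def youden_cutoff(labels: Sequence[int], scores: Sequence[float]) -> Tuple[float, float]:
    """Return (cutoff, J) maximizing J = sensitivity + specificity - 1."""
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("both classes must be present")

    best_cutoff, best_j = None, float("-inf")
    for cutoff in sorted(set(scores)):
        tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= cutoff)
        tn = sum(1 for y, s in zip(labels, scores) if y == 0 and s < cutoff)
        j = tp / n_pos + tn / n_neg - 1
        if j > best_j:  # strict comparison keeps the lowest cutoff on ties
            best_cutoff, best_j = cutoff, j
    return best_cutoff, best_j
```

The cutoff would be derived once from training-cohort labels and radscores and then reused unchanged in the validation cohorts.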

Results

Characteristics of the study population

Table 1 shows the clinicopathologic factors of 1536 patients with early-stage breast cancer from four Chinese centers, TCIA, and DUKE. The average age of the enrolled patients was 52.0 ± 10.7 years (range: 21–85 years), with 540 (44.6%) and 671 (55.4%) having positive and negative ALN status, respectively. Of the 540 patients with positive ALN status, 215 (39.8%) had a high-ALN burden. Patients from Center I and DUKE were followed up for overall survival (OS), with a median [interquartile range] OS of 19.8 [9.42–41.2] months and 46.4 [28.8–63.1] months, respectively.

Table 1 Demographic and clinicopathological characteristics

Feature selection and radscore calculation

A total of 944 features were extracted from the volume of interest. Figures S1a–1b show that data distributions from different centers were relatively scattered before ComBat was applied, but converged after the center effect was eliminated using ComBat. First, 736 features with ICC > 0.75 were selected. Second, 388 features were selected using independent sample t-tests or Mann–Whitney U tests (P < .05). Third, 37 features were selected based on correlation analysis. Fourth, elastic-logistic regression analysis was performed on the remaining features after upsampling high-burden patients four times (Figures S1c–1d), reducing inter-sequence redundancy and ultimately yielding nine features (Table S3). Finally, a neural network method (BPNN model) was used to develop an ALN burden prediction model from the nine most predictive features, with its predicted probability constituting the radscore.

Performance of the prediction models

Univariate and multivariate logistic analyses identified menopausal status, MRI-ALN status, MRI-ALN burden, and radscore as independent predictive factors for ALN burden (Table 2). A clinical model and a combined model were constructed based on these factors (Table 2 and Table S2).

Table 2 Univariate and multivariate logistic regression analysis to assess the association of clinical characteristics and radscore with ALN burden

The BPNN radiomics model (with AUCs of 0.856, 0.781, 0.809, and 0.783 in the training and three external validation cohorts) performed comparably to the combined model (AUCs of 0.899, 0.826, 0.812, and 0.803; DeLong’s test, P = .112–.850) and significantly outperformed the clinical model (AUCs of 0.771, 0.689, 0.620, and 0.643, P < .01). Table 3 and Fig. 2a–d show the model performance details. The BPNN radiomics model performed better than the MRI-ALN burden in all four cohorts (McNemar’s test, P < .001). Figure 3 presents three typical cases demonstrating the clinical application of the radiomics model. Patient 1 was pathologically ALN-negative but was misclassified as ALN-positive by MRI. In contrast, patient 2 was initially diagnosed as ALN-negative but was later found to have a low-ALN burden pathologically. Similarly, patient 3 was initially deemed ALN-negative but was later revealed to have a high-ALN burden.

Fig. 2 Model performance analysis. Receiver operating characteristic (ROC) curves for the models in Centers I (a), II (b), III (c), and IV (d), with the AUCs and 95% confidence intervals shown at the bottom right. Kaplan–Meier curves illustrate the survival probabilities of patients in the predicted high- and low-ALN burden groups in Center I (e) and DUKE (f)

Fig. 3 Three typical cases demonstrating the clinical application of the radiomics model. The ALN burden was correctly classified according to the radscore, and MRI image-based decision support was provided. Pathological images are shown at 20× magnification

Table 3 Model performance evaluation and comparison

Prognostic stratification analysis of ALN status-related radscore

A radscore cutoff value of 0.542 was calculated, and patients were categorized into predicted high- and low-ALN burden groups. Kaplan–Meier survival curves revealed that the predicted low-ALN burden group had significantly better OS than the predicted high-ALN burden group in both the Center I cohort (hazard ratio [HR] = 31.52, P = .034) and the DUKE cohort (HR = 20.72, P = .031) (Fig. 2e and f).

Biological significance of the ALN-related Radscore

This study identified 231 differentially expressed genes (DEGs) between the predicted high- and low-ALN burden groups, comprising 120 upregulated and 111 downregulated DEGs in the predicted high-ALN burden group (Fig. 4a). Hierarchical clustering analysis revealed that the DEGs were mainly distributed in five functional modules, including response chemotaxis to adhesion; ERK1, ERK2, and MAPK cascade; blood activation body coagulation; receptor surface signaling pathway; and epidermal epidermis development differentiation (Fig. 4b). GO functional analysis revealed that epidermal cell differentiation-related, keratinocyte differentiation, and epidermis pathways were downregulated in the predicted high-ALN burden group (Figures S2a-c). Conversely, migration/invasion pathways, such as cell chemotaxis, regulation of chemotaxis, and cell-substrate adhesion were upregulated in the predicted high-ALN burden group (Figures S2d-f).

Fig. 4 Transcriptomic and immunological analysis related to radscore. (a) Volcano plot of differentially expressed genes, with upregulation indicated in red, downregulation in blue, and nonsignificant genes in gray. (b) Clustered heatmap of pathway enrichment analysis. (c) Bubble chart of pathway enrichment based on gene ratio, with bubble size representing gene count. (d) Box plots of immune cell scores across two radscore groups

Associations between radiomic features and tumor immune infiltration

Significant differences in eight types of RNA-based immune markers were observed between the low and high radscore groups (Fig. 4d). Specifically, microvascular (mv) endothelial cells, pericytes, and others were more abundant in the predicted high-ALN burden group, whereas common lymphocyte precursors and smooth muscle cells were more prevalent in the predicted low-ALN burden group. These results indicate differences in tumor immunity and the tumor microenvironment between the high- and low-score groups. Correlation analysis revealed a strong negative correlation between radiomic features and immune cell scores (Fig. 5).

Fig. 5 Spearman correlation heatmap between radiomic features and immune markers. Blue indicates a negative correlation, while red signifies a positive correlation. The deeper the color, the stronger the correlation. CLP, common lymphocyte precursors

Discussion

This study developed a radiomic model for assessing the ALN burden in patients with early-stage breast cancer based on the BPNN algorithm and successfully validated its performance across three external cohorts. Furthermore, we observed a correlation between higher radscores and poorer breast cancer prognosis. The heterogeneity of tumor cells and the tumor microenvironment across different radscore groups was revealed by integrating genomic data.

Although the performance of our radiomics signature was moderate compared with previous models when judged by AUC values (AUCs of 0.781–0.856 versus 0.810–0.865, respectively) [16,17,18], our model exhibited robust generalizability across validation datasets comprising a large number of patients (n = 1211) scanned on six different scanners. The broad applicability of the model to diverse patient samples may be attributed to the use of the ComBat technique, which effectively mitigates potential center effects [28]. Additionally, this study used a deep learning method to automatically delineate the volume of interest, further improving the reproducibility of the radiomic features.

The status of ALN metastasis is well-established as a critical factor influencing the prognosis of patients with breast cancer [29, 30]. This study revealed a significant correlation between the radscore and breast cancer prognosis, indicating that the lower the radscore, the better the prognosis. This result indicates that radscore can be used as a valuable tool to tailor personalized treatment plans for patients.

Further exploration of RNA-seq data provided insight into the molecular underpinnings of the radiomic predictions and lent biological support to the radiomic score. GSEA revealed that the upregulation of cell adhesion pathways was particularly pronounced in individuals with a higher radiomic score. Notably, the pathway from cell chemotaxis to adhesion plays a critical role in the mechanisms of tumor invasion and metastasis, possessing the capacity to facilitate tumor cell migration through the extracellular matrix and develop connections with distant tissues [31]. This process likely represents a potential mechanism associated with the observed increase in ALN burden. Conversely, pathways associated with epidermal cell differentiation and keratinocyte differentiation were downregulated in the high radiomic score group. The suppression of differentiation pathways indicates a disruption in normal cell function and an inclination toward a more undifferentiated, aggressive tumor phenotype capable of enhanced metastatic dissemination [32]. These results not only improve the biological interpretability of the model at the genetic level but also suggest potential therapeutic strategies that target these pathways to curb tumor dissemination.

Notably, significant differences in immune scores were observed between high and low radscore groups, indicating that radiomics predictions reflect variations in tumor immune status and tumor microenvironment. Higher scores were noted for common lymphocyte precursors within the low radscore group. Common lymphocyte precursors, closely related to new blood vessel formation, can disrupt the local microenvironment balance through active metabolism, potentially promoting tumor growth [33, 34]. Further analysis revealed a strong negative correlation between radiomic features and immune scores, indicating a potential for stronger immune escape capabilities in tumors within the high radscore group. This observation indicates that tumors may suppress effective immune cell infiltration through specific mechanisms, thereby diminishing the body’s immune response to the tumor and resulting in a poorer prognosis [35].

Additionally, the study determined that menopausal status, along with MRI-ALN status and MRI-ALN burden, serves as an independent clinical predictor of ALN burden in early-stage breast cancer. This could be attributed to hormone receptor-positive breast cancer being more prevalent in premenopausal women, with hormonal level fluctuations potentially affecting tumor aggressiveness and metastatic potential [36]. However, the addition of the clinical features did not produce a significant incremental increase in predictive value for ALN burden (AUCs of 0.826, 0.812, and 0.823, DeLong test, P > .05). This finding may stem from the interplay of coexisting features and the superior predictive performance of radiomic features compared with clinical features.

This study has several limitations. First, this was a retrospective analysis; despite the use of independent external validation cohorts, future research will require larger datasets in multicenter, prospective studies. Second, the finding that radiomic features may reflect biological characteristics is based on indirect evidence. Future work will include targeted modulation of immune cells in mouse models of breast cancer metastasis, with MRI examinations to monitor dynamic changes in radiomic features.

In summary, our study successfully developed and validated an MRI radiomics model for predicting ALN burden in patients with early-stage breast cancer. The model demonstrated robust performance across different imaging centers, along with prognostic significance. Furthermore, our study provided valuable insights into the biological implications of the radiomic predictions, uncovering associations with tumor microenvironment variations, key signaling pathways, and cellular processes.