Introduction

Between 80% and 90% of lung adenocarcinomas show heterogeneous histological patterns, and the predominant pattern has been found to correlate with prognosis [1, 2]. Accordingly, in 2015 the World Health Organization (WHO) proposed a three-tier grading system that classifies lung adenocarcinomas as grade 1 (lepidic-predominant), grade 2 (papillary- or acinar-predominant), or grade 3 (solid- or micropapillary-predominant) [1, 2]. In 2020, the Pathology Committee of the International Association for the Study of Lung Cancer (IASLC) proposed a new grading system that combines the predominant subtype with the proportion of high-grade components. This system has shown better prognostic performance than the previous system, which graded tumors on the predominant subtype alone [3].

Nearly 20% of lung adenocarcinomas identified in computed tomography (CT) lung cancer screening present as pure ground-glass nodules (pGGNs): homogeneous hazy lesions with preserved vascular and bronchial structures and no solid component [4]. Under the 2020 grading system, most pGGNs are classified as grade 1 on pathological examination, and a few are classified as grade 2 [5]. However, in some cases the conventional radiographic characteristics do not agree with the pathological findings, and the latter are considered the “gold standard” for diagnosis and treatment planning.

The radiomic approach analyzes tumor imaging phenotypes quantitatively and noninvasively in relation to pathological and clinical outcomes, with the aim of building models for classifying pulmonary lesions and estimating prognosis [6]. Inflammation, invasiveness, cell migration, and subtle changes in the microenvironment are hallmarks of malignant tumors; therefore, the peritumoral parenchyma may also carry useful information [7, 8]. Recent reports have indicated that peritumoral radiomics improves the prediction and classification of aggressive biological behavior [9,10,11]. However, its potential remains insufficiently explored, and few studies have systematically evaluated radiomic models across multiple peritumoral ranges for predicting the IASLC grade.

Here, we therefore compared the efficacy of radiomic models based on multiscale intra- and perinodular regions with that of a radiological model for classifying pGGNs according to the new IASLC grading system.

Materials and methods

Patient population

This multicenter retrospective study was approved by the relevant ethics committee, and the requirement for informed consent was waived owing to its retrospective design.

A total of 151 patients with pGGNs were recruited from three centers: the Department of Radiology, Xiangtan Central Hospital (Xiangtan, Hunan, China); the Department of Radiology, Affiliated Hospital of Guilin Medical University (Guilin, Guangxi, China); and the Department of Radiology, Liuzhou People’s Hospital Affiliated to Guangxi Medical University (Liuzhou, China). All patients underwent non-contrast chest CT between January 2016 and June 2022. The inclusion criteria were as follows: (1) lobectomy or sublobar resection for lung cancer with histological evidence of adenocarcinoma; (2) surgical resection within 2 weeks after the CT scan; (3) radiological diagnosis of a pGGN (< 3 cm); and (4) CT slice thickness of 0.625–1.250 mm. The exclusion criteria were as follows: (1) history of pulmonary surgery, chemoradiotherapy, or another malignancy; (2) inadequate CT image quality, such as motion artifacts or a low signal-to-noise ratio; (3) a largest nodule diameter smaller than 1.250 mm; and (4) missing or incomplete CT scans. For patients with multiple pGGNs, only one nodule with a conclusive pathological diagnosis was included. Data from centers 1 and 2 were used as the training cohort, and data from center 3 as the validation cohort (Fig. 1).

Fig. 1 Flow diagram of the study

CT image acquisition

CT examinations were performed with a 64- or 128-slice spiral CT scanner (Revolution CT [GE Healthcare, Chicago, IL, USA] or MX16 CT [Philips Healthcare, Best, Netherlands] at center 1; uCT550 or uCT760 [Shanghai United Imaging Healthcare, Shanghai, China] at center 2; and VCT 610 [Philips Healthcare] at center 3). The CT protocol has been published previously [12]. The acquisition parameters were as follows: tube voltage, 120 kV; tube current–time product, 200 mAs; slice thickness, 5.0 mm; slice interval, 5.0 mm; pitch, 1.2; and reconstruction thickness, 0.625–1.250 mm. Images were reconstructed with a standard or lung reconstruction kernel.

Radiological and histological evaluations

In accordance with the recommendations of the Fleischner Society and previous studies [4, 13], pGGNs were defined radiologically as hazy opacities through which the underlying pulmonary architecture remains visible. The features considered for the radiological model were location, margin, lobulation, spiculation, vessel change, bubble-like sign, pleural attachment, size, and CT density. Size was defined as the longest diameter on the axial plane, according to the 8th edition of the IASLC TNM classification for lung cancer [13]. CT density was calculated as the mean of three measurements obtained with a region-of-interest (ROI) cursor [13]. All image features were determined by consensus between two chest radiologists with 5–15 years of experience, using lung window settings with a width of 1500–2000 HU and a level of −450 to −700 HU.
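The lung window settings above can be illustrated with a minimal Python sketch of linear windowing, in which a window of width WW centred on level WL is stretched over the display range and values outside it are clipped. This is a simplified assumption (DICOM viewers use a closely related linear function), not the viewer software used in the study; the function name `apply_window` is hypothetical.

```python
import numpy as np


def apply_window(hu, width, level):
    """Map Hounsfield units to display intensities in [0, 1].

    Simplified linear windowing (hypothetical helper): HU values at or below
    level - width/2 become 0 (black), values at or above level + width/2
    become 1 (white), and values in between are scaled linearly.
    """
    hu = np.asarray(hu, dtype=float)
    lower = level - width / 2.0  # lowest HU still shown with gray shading
    return np.clip((hu - lower) / width, 0.0, 1.0)


# A lung window inside the reported ranges: WW = 1600 HU, WL = -600 HU
display = apply_window([-2000, -1400, -600, 200, 500], width=1600, level=-600)
print(display)  # values 0.0, 0.0, 0.5, 1.0, 1.0
```

With these settings, attenuation values typical of ground-glass opacity fall in the middle of the gray scale, which is why wide lung windows are used to judge faint lesions.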

Two board-certified pathologists independently verified the diagnosis and subtyped and graded the tumors. Comprehensive histological subtyping was performed for each tumor according to the IASLC grading system, recording the percentage of each histological component in 5% increments [3]. Solid, micropapillary, and complex glandular components were regarded as high-grade patterns, and tumors were classified into three grades: grade 1 (lepidic-predominant with < 20% high-grade patterns), grade 2 (papillary- or acinar-predominant with < 20% high-grade patterns), and grade 3 (any tumor with ≥ 20% high-grade patterns).
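The grading rule above can be written as a small Python function. This is a sketch under stated assumptions: the pattern labels and the helper `iaslc_grade` are hypothetical, percentages are assumed to be recorded in 5% increments and to sum to 100, and ties in the predominant pattern are not resolved clinically (the first pattern listed wins).

```python
# Hypothetical pattern labels; "complex_glandular" stands for complex glandular patterns.
HIGH_GRADE_PATTERNS = {"solid", "micropapillary", "complex_glandular"}


def iaslc_grade(percentages):
    """Return the 2020 IASLC grade (1, 2 or 3) from a mapping of
    histological pattern -> percentage of the tumor."""
    if sum(percentages.values()) != 100:
        raise ValueError("pattern percentages must sum to 100")
    if any(v % 5 for v in percentages.values()):
        raise ValueError("percentages are expected in 5% increments")

    # Grade 3 applies to any tumor with >= 20% high-grade patterns.
    high_grade = sum(v for k, v in percentages.items() if k in HIGH_GRADE_PATTERNS)
    if high_grade >= 20:
        return 3

    # Otherwise the predominant pattern decides (ties: first listed wins).
    predominant = max(percentages, key=percentages.get)
    if predominant == "lepidic":
        return 1
    if predominant in {"acinar", "papillary"}:
        return 2
    raise ValueError(f"unexpected predominant pattern: {predominant}")


print(iaslc_grade({"lepidic": 70, "acinar": 25, "solid": 5}))  # -> 1
```

Checking the high-grade threshold before the predominant pattern mirrors the rule as stated: a lepidic-predominant tumor with 20% solid component is still grade 3.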

Nodule segmentation

In preparation for the radiomic analysis, all images were processed with ITK-SNAP software (version 3.6.0, http://www.itksnap.org). Lesions were segmented by manually drawing the region of interest (ROI) on each slice until the entire lesion was included. Segmentation was performed by a junior trainee with 6 years of experience in thoracic imaging and then reviewed and adjusted by a senior radiologist with 16 years of experience. During segmentation, the lung window width was set between 1500 and 2000 HU and the window level between −450 and −700 HU. The primary tumor was defined as the gross tumor volume (GTV). To obtain the peritumoral volume (PTV), three additional regions were generated by dilating the GTV with radial distances of 5, 10, and 15 mm, as in previous studies [9,10,11, 14]. The resulting volumes of interest (VOIs), GTV + 5/10/15 mm PTV, therefore encompass the whole pGGN together with the surrounding parenchyma.
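The dilation step can be sketched in Python with NumPy as below. Assumptions: masks are boolean arrays indexed (z, y, x), `spacing_mm` gives the voxel size along those axes, and the dilation uses a spherical (Euclidean) neighbourhood. The helper names `ball_offsets` and `dilate_gtv` are hypothetical, and the sketch does not restrict the PTV to lung parenchyma (for example, by excluding chest wall or mediastinum), a step the text does not describe.

```python
import numpy as np


def ball_offsets(radius_mm, spacing_mm):
    """Integer voxel offsets whose physical distance from the origin is <= radius_mm."""
    reach = [int(np.floor(radius_mm / s)) for s in spacing_mm]
    axes = [np.arange(-r, r + 1) for r in reach]
    grid = np.stack(np.meshgrid(*axes, indexing="ij"), axis=-1).reshape(-1, 3)
    dist = np.sqrt(((grid * np.asarray(spacing_mm, dtype=float)) ** 2).sum(axis=1))
    return grid[dist <= radius_mm + 1e-9]


def dilate_gtv(gtv, radius_mm, spacing_mm=(1.0, 1.0, 1.0)):
    """Return the GTV + PTV mask: every voxel within radius_mm of the GTV."""
    gtv = np.asarray(gtv, dtype=bool)
    offsets = ball_offsets(radius_mm, spacing_mm)
    pad = np.abs(offsets).max(axis=0)
    # Pad with False so shifted views never wrap around the volume edges.
    padded = np.pad(gtv, [(int(p), int(p)) for p in pad])
    out = np.zeros_like(gtv)
    nz, ny, nx = gtv.shape
    for dz, dy, dx in offsets:
        z0, y0, x0 = pad[0] + dz, pad[1] + dy, pad[2] + dx
        # out[i] becomes True if gtv[i + offset] is True for any offset in the ball
        out |= padded[z0:z0 + nz, y0:y0 + ny, x0:x0 + nx]
    return out


# Usage: a toy 5 x 5 x 5-voxel "nodule" on 1 mm isotropic voxels
gtv = np.zeros((21, 21, 21), dtype=bool)
gtv[8:13, 8:13, 8:13] = True
vois = {r: dilate_gtv(gtv, r) for r in (5, 10, 15)}  # GTV + 5/10/15 mm PTV
ring_10mm = vois[10] & ~gtv  # peritumoral shell alone, if needed
```

Working in millimetres through `spacing_mm` keeps the radius physically meaningful; after resampling to 1 × 1 × 1 mm³ voxels, a 10 mm dilation is simply a 10-voxel ball.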

Feature extraction

The open-source Python package PyRadiomics v2.2.0 (http://www.radiomics.io/pyradiomics.html) was used to extract radiomic features [15]. To standardize gray-level intensity ranges across participants, we applied z-score normalization, z = (x − µ)/σ, where x is the voxel intensity, µ is the mean, and σ is the standard deviation. Next, to minimize acquisition-related variability in the radiomic features, voxels in all cohorts were resampled with cubic interpolation to an isotropic resolution of 1 × 1 × 1 mm³. Gray levels were then discretized using a fixed bin width of 10. A total of 1222 radiomic features were extracted from each VOI, including first-order statistics, shape- and size-based features, textural features, and filter-based features such as sigma-dependent Laplacian of Gaussian (LoG) and wavelet features.
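The intensity normalization and gray-level discretization described above can be sketched in NumPy as follows. Assumptions: statistics are computed over the whole array passed in, the standard deviation is the population value, and discretization uses the fixed-bin-width formula documented for PyRadiomics, X_b = floor(X/W) − floor(min(X)/W) + 1, with the minimum taken over the values supplied (the ROI). The function names are hypothetical, and resampling to 1 × 1 × 1 mm³ is omitted. In PyRadiomics itself these steps are configured through extraction settings (for example `normalize`, `resampledPixelSpacing`, and `binWidth`) rather than written by hand.

```python
import numpy as np


def zscore(image):
    """z = (x - mu) / sigma over all voxels of the array (population sigma)."""
    image = np.asarray(image, dtype=float)
    return (image - image.mean()) / image.std()


def discretize_fixed_bin_width(values, bin_width):
    """Fixed-bin-width discretization: floor(X / W) - floor(min(X) / W) + 1.

    Bin edges are anchored at multiples of bin_width, so the same intensity
    falls in the same bin regardless of the ROI; the lowest bin is numbered 1.
    """
    values = np.asarray(values, dtype=float)
    bins = np.floor(values / bin_width) - np.floor(values.min() / bin_width) + 1
    return bins.astype(int)


print(discretize_fixed_bin_width([0, 5, 10, 25], bin_width=10))  # bins 1, 1, 2, 3
```

A fixed bin width (rather than a fixed number of bins) keeps the gray-level-to-bin mapping comparable across patients, which is why it is often preferred for CT.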

Feature selection and radiomic model

To remove redundant features, pairwise correlations were assessed using Spearman’s rank or Pearson’s correlation coefficients, and features with correlation coefficients greater than 0.9 were excluded. The least absolute shrinkage and selection operator (LASSO) with 10-fold cross-validation was then used to identify the optimal regularization parameter λ, which controls the number of selected features and thereby defines the final radiomic signature. LASSO was chosen for its good predictive value and its ability to reduce overfitting by selecting a subset of weakly correlated features [16]. A radiomic score (rad-score) was then calculated for each patient as the sum of the selected features weighted by their coefficients (Fig. 2): $\text{Rad-score} = \beta_0 + \beta_1 F_1 + \beta_2 F_2 + \beta_3 F_3 + \dots + \beta_n F_n$, where $\beta_0$ is the intercept, $F_i$ ($i = 1, 2, \dots, n$) are the selected radiomic features, and $\beta_i$ is the coefficient of $F_i$.
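The redundancy filter and the rad-score formula can be sketched in NumPy as below. The filter is a greedy assumption: features are scanned in column order and a feature is kept only if its absolute correlation with every feature already kept is ≤ 0.9, since the text does not say which member of a correlated pair was retained. Spearman correlation is computed as Pearson correlation on ranks, with ties broken by order for brevity. The LASSO fit itself is not reproduced: `coefs` and `intercept` stand for coefficients already estimated (for example with glmnet in R). All function names are hypothetical.

```python
import numpy as np


def drop_correlated(X, threshold=0.9, method="spearman"):
    """Greedy redundancy filter over the columns (features) of X.

    Returns the indices of kept columns: a column is kept only if its
    absolute correlation with every previously kept column is <= threshold.
    """
    X = np.asarray(X, dtype=float)
    if method == "spearman":
        # Spearman = Pearson on ranks (ties broken by order: a simplification).
        data = X.argsort(axis=0).argsort(axis=0).astype(float)
    else:
        data = X
    corr = np.abs(np.corrcoef(data, rowvar=False))
    kept = []
    for j in range(X.shape[1]):
        if all(corr[j, k] <= threshold for k in kept):
            kept.append(j)
    return kept


def rad_score(X_selected, coefs, intercept):
    """Rad-score = beta0 + sum_i beta_i * F_i for each patient (row)."""
    return intercept + np.asarray(X_selected, dtype=float) @ np.asarray(coefs, dtype=float)


# Toy example: column 1 duplicates column 0 (scaled), column 2 is distinct.
a = np.array([1, 2, 3, 4, 5], dtype=float)
b = np.array([2, 1, 4, 3, 5], dtype=float)
X = np.column_stack([a, 2 * a, b])
print(drop_correlated(X))  # -> [0, 2]
```

Because the rad-score is a linear predictor, in a logistic LASSO it is the log-odds of grade 2 under the fitted model, which is why it can be compared directly between grades.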

Fig. 2 Diagram depicting the study design

Statistical analysis

Differences between the training and validation cohorts were assessed with independent Student’s t-tests for normally distributed variables, Mann–Whitney U tests for non-normally distributed variables, and chi-square tests for categorical variables. The radiological model was developed using univariate and multivariate logistic regression, with stepwise backward elimination in the multivariate analysis. P values < 0.05 were considered statistically significant.

Receiver operating characteristic (ROC) curves were constructed to assess the diagnostic performance of the radiomic and radiological models, and the area under the curve (AUC), sensitivity, and specificity were calculated. Figure 2 illustrates the study design.
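The ROC quantities can be illustrated with a short NumPy sketch. It uses the standard identity between the AUC and the Mann–Whitney statistic: the probability that a randomly chosen grade 2 nodule receives a higher score than a randomly chosen grade 1 nodule, with ties counting one half. The text does not state how the cut-off for sensitivity and specificity was chosen; the Youden index used here is an assumption, and all function names are hypothetical.

```python
import numpy as np


def auc_mann_whitney(scores, labels):
    """AUC = P(score_pos > score_neg) + 0.5 * P(score_pos == score_neg)."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels).astype(bool)  # True = positive (e.g. grade 2)
    pos, neg = scores[labels], scores[~labels]
    diff = pos[:, None] - neg[None, :]  # every positive-negative pair
    return ((diff > 0).sum() + 0.5 * (diff == 0).sum()) / (pos.size * neg.size)


def sens_spec(scores, labels, threshold):
    """Sensitivity and specificity when scores >= threshold are called positive."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels).astype(bool)
    pred = scores >= threshold
    sensitivity = (pred & labels).sum() / labels.sum()
    specificity = (~pred & ~labels).sum() / (~labels).sum()
    return sensitivity, specificity


def youden_threshold(scores, labels):
    """Cut-off maximizing J = sensitivity + specificity - 1 (first maximum kept)."""
    best_t, best_j = None, -np.inf
    for t in np.unique(scores):
        se, sp = sens_spec(scores, labels, t)
        if se + sp - 1 > best_j:
            best_t, best_j = t, se + sp - 1
    return best_t, best_j


scores = [0.1, 0.4, 0.35, 0.8]  # e.g. rad-scores
labels = [0, 0, 1, 1]           # 1 = grade 2, 0 = grade 1
print(auc_mann_whitney(scores, labels))  # -> 0.75
```

The pairwise formulation is quadratic in cohort size, which is negligible for cohorts of this size; rank-based formulas give the same value more efficiently for large samples.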

Statistical analyses were conducted using R software (version 3.6.3; R Core Team (2020). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. https://www.r-project.org/) and the R packages glmnet, rms, survival, reshape2, ggplot2, and plotROC.

Results

Demographic characteristics

A total of 151 pure ground-glass nodules (pGGNs) were included in the study, comprising 117 grade 1, 34 grade 2, and no grade 3 lesions. The pGGNs were divided by center of origin into a training cohort (n = 87) and a validation cohort (n = 64).

The study population had a mean age of 55.2 ± 11.6 years, and 29.8% (45/151) were male. The training cohort included 70 grade 1 and 17 grade 2 pGGNs, and the validation cohort comprised 47 grade 1 and 17 grade 2 pGGNs. There were no significant differences in clinical and radiographic parameters between the two cohorts (all P > 0.05), as detailed in Table 1.

Table 1 Baseline characteristics of the training and validation cohorts

Radiological model

In the training cohort, grade 1 and grade 2 pGGNs differed significantly in size and spiculation (P = 0.031 and P = 0.025, respectively; Table 2). Multivariate logistic regression showed that size (odds ratio [OR], 1.11; P < 0.001) was the only radiographic characteristic independently associated with the IASLC grade of pGGNs.
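As a worked reading of the odds ratio: in a logistic model, OR = e^β for a one-unit increase in a covariate, with the other covariates held constant. Assuming size was entered as a continuous variable in millimetres (the unit is not stated in the text), the reported OR of 1.11 corresponds to:

```latex
\beta = \ln(\mathrm{OR}) = \ln(1.11) \approx 0.104, \qquad
\mathrm{OR}_{k\ \mathrm{mm}} = e^{k\beta} = 1.11^{k}, \qquad
\mathrm{OR}_{5\ \mathrm{mm}} = 1.11^{5} \approx 1.69
```

Under that assumption, each additional millimetre would be associated with roughly 11% higher odds of grade 2, and a 5 mm larger nodule with about 1.7-fold higher odds.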

Table 2 Patient characteristics of grade 1 and grade 2 pGGNs

Radiomic models based on GTV and PTV

We implemented four different prediction models based on GTV and GTV + 5/10/15 mm PTV to classify pGGNs.

In the GTV model, 977 highly correlated features were eliminated, leaving 245 features with correlation coefficients < 0.9, which LASSO regression reduced to seven. The same process was repeated for the other models; in total, 7, 8, 11, and 5 features were incorporated into the final GTV and GTV + 5/10/15 mm PTV models, respectively (Fig. 3).

Fig. 3 Lollipop diagrams of the final radiomic features in the four models

In both the training and validation cohorts, the GTV rad-scores of grade-2 pGGNs were significantly higher than those of grade-1 pGGNs (P < 0.05, Fig. 4). Similar results were obtained for the GTV + PTV models; the rad-scores of grade-2 nodules were higher than those of grade-1 nodules in both cohorts (P < 0.05, Fig. 4).

Fig. 4 Violin plots of rad-scores for grade 1 and grade 2 pGGNs in the four models

Performance and comparison

The ROC analysis showed that the radiological model had an AUC of 0.756, sensitivity of 45.5%, and specificity of 92.2% in the training cohort, and an AUC of 0.665, sensitivity of 77.3%, and specificity of 53.1% in the validation cohort.

The performance of the GTV and GTV + PTV models was analyzed and compared in the training and validation cohorts (Table 3; Fig. 5). In the training cohort, the AUCs of the GTV and GTV + 5/10/15 mm PTV models were 0.869, 0.910, 0.951, and 0.872, respectively; in the validation cohort, they were 0.700, 0.715, 0.745, and 0.724, respectively.

Table 3 Performance of the four radiomic models

Fig. 5 Receiver operating characteristic curves showing the diagnostic performance of the four models

Notably, the GTV + 10 mm PTV radiomic model achieved the highest AUC in both cohorts, outperforming the radiomic models at the other scales and the radiological model.

Discussion

In this study, radiomic models based on the intratumoral and peritumoral regions of pGGNs were constructed to predict the 2020 IASLC grade. Four radiomic models were built from different VOIs: the GTV and the GTV expanded by 5, 10, and 15 mm (GTV + 5/10/15 mm PTV). These models, together with a radiological model, were evaluated for their ability to differentiate grade 1 from grade 2 pGGNs. In particular, the GTV + 10 mm PTV radiomic model showed the highest efficacy of all the radiomic models and the radiological model.

The new IASLC grading system was proposed to improve prognostic prediction in lung adenocarcinoma. The present study shows that pGGN lung adenocarcinomas of different grades can also be distinguished on the basis of radiomic features. Although pGGNs were previously regarded as an “indolent” type of lung adenocarcinoma, some radiographic features, such as lobulation, spiculation, pleural invasion, and bubble-like sign, are associated with tumor invasion [17,18,19]. In the present study, only CT size was independently associated with the IASLC grade of pGGN lung adenocarcinomas, consistent with an earlier study by Fu et al. [13]. It is not surprising that larger nodules are more likely to be of higher grade; nonetheless, this association warrants further investigation.

Compared with conventional CT characteristics, radiomic features can characterize intratumoral differences more objectively, quantitatively, and effectively [6]. In contrast, the assessment of conventional CT features is subject to intra- and interobserver variability that depends on the experience and expertise of the radiologist [19]. All four radiomic models in this study outperformed the radiological model (Table 3). Because peritumoral radiomic features may reflect hallmarks of malignancy such as tumor cell migration, inflammatory infiltration, and subtle microscopic changes, they complement conventional intratumoral features [7, 8]. Wang et al. combined radiomic features from the GTV and PTV to predict lymph node metastasis preoperatively in T1 peripheral lung adenocarcinomas [9]; the combined model achieved an AUC of 0.843, higher than that of GTV or PTV features alone (AUCs of 0.829 and 0.825, respectively). Calheiros et al. observed improved classification of solid lung nodules when peritumoral radiomic features were incorporated [10]. Similarly, another study found that combining GTV and PTV features improved prediction of the Ki-67 labeling index in early-stage lung adenocarcinomas [11]. However, it remains unclear whether radiomic analysis of the peritumoral parenchyma can capture the tumor-level and microscopic differences between IASLC grades of pGGNs.

In the present study, we investigated whether high-throughput radiomic features captured from the intra- and peritumoral regions could differentiate grade 1 from grade 2 pGGNs; the GTV + 10 mm PTV model incorporated the largest number of selected radiomic features. These results suggest that the peritumoral parenchyma within 10 mm of the tumor contains information that reflects differences between grade 1 and grade 2 pGGNs. In contrast, Wu et al. found that 5 mm PTV radiomic features provided no additional benefit over 2 mm PTV features in distinguishing invasive adenocarcinoma from adenocarcinoma in situ/minimally invasive adenocarcinoma [20].

This study has some limitations. First, CT protocols and images varied between and within centers owing to the retrospective design. Second, the GTV was delineated manually on the basis of visual inspection of CT images, and the PTVs were derived from it; thus, interobserver variability may limit the clinical value of the models. Third, the number of lesions examined was limited, and confirmation in larger studies is required.

In conclusion, radiomic features extracted from the GTV and PTV regions of pGGN images can effectively differentiate grade 1 from grade 2 pGGNs, particularly with a 10 mm PTV. This radiomic model may help physicians formulate comprehensive treatment strategies for patients with pGGNs.