Introduction

Hepatocellular carcinoma (HCC), the predominant histologic type of liver cancer, is an aggressive and highly lethal malignancy [1] and the third leading cause of cancer-related death [2]. Patients with advanced HCC are often assessed as unsuitable for surgery. Such patients typically undergo palliative operations or receive nonsurgical systemic approaches, including chemotherapy, radiation, targeted therapy, and immunotherapy [3, 4]. Unfortunately, these treatments offer a poor prognosis, with a median overall survival (OS) of approximately one year [5, 6]. Monitoring a patient's disease status and identifying those with poor prognoses can aid clinicians in making informed treatment decisions and timely interventions.

The Response Evaluation Criteria in Solid Tumors (RECIST) serves as a standardized method for evaluating oncologic treatment responses in clinical practice [7, 8]. It provides guidelines for measuring changes in tumor size on longitudinal radiographic imaging and categorizes treatment outcomes into four classes: complete response, partial response, stable disease, or progressive disease [9]. While disease progression is commonly associated with a poorer prognosis, a strict correspondence between the two has not been definitively established [10]. Therefore, the traditional guideline does not reliably predict the OS of HCC patients in clinical practice.

Currently, the rising popularity of artificial intelligence has led to the widespread use of convolutional neural networks (CNNs) to automatically extract features from clinical images and provide insight into disease prognosis [11, 12]. Many deep learning models predict survival outcomes in HCC patients from pathological images, primarily hematoxylin and eosin-stained slides [13,14,15]. Additionally, gene sequencing has been utilized to differentiate survival subpopulations of HCC patients [16]. These quantitative imaging studies all achieved favorable performance in forecasting prognostic risk. However, single time-point biomarkers may be less effective in advanced stages, and some patients lack pathological and genetic data. Serial CT scans offer a non-invasive and informative approach to capturing tumor dynamics; however, to the best of our knowledge, no prognostic prediction model based on serial CT scans is currently available for advanced HCC.

Therefore, we aimed to develop a deep learning model to predict survival outcomes for advanced HCC patients. The study data comprise a multi-center clinical trial cohort of patients previously treated with conventional therapies before participating in an immunotherapy trial. We collected longitudinal CT imaging and a small set of clinical variables to construct a multi-modal prognostic model. The model uses a convolutional-recurrent neural network (CRNN) structure to decode spatial information from CT images and extract temporal information on tumor changes between baseline and follow-up. Prediction models with different input modalities were compared. To evaluate the added benefit of the proposed model, its risk stratification ability was compared with that of the traditional RECIST criteria.

Materials and methods

Data collection

All data passed ethics review (application no. I2021173I) and were approved by the Human Genetic Resource Administration of China (approval no. [2021] GH5565), with all participants providing informed consent. This retrospective, multi-center study includes patients with unresectable HCC treated with anti-PD-1 monoclonal antibodies (clinicaltrials.gov number: NCT03419897) from April 9, 2018, to July 6, 2022. Patients' CT scans and clinical information were collected. Exclusion criteria were: (1) lack of complete abdominal CT scans in the venous phase and (2) loss to follow-up. After sample selection, 207 patients from 52 multi-national centers (sites located in China, France, Germany, Italy, Poland, Spain, and the United Kingdom) were used to train and validate the model. Patients were randomly assigned by center to model development (161 patients from 37 centers) and evaluation (46 patients from 15 centers). The development dataset was further randomly divided into training and validation sets at an 8:2 ratio (Fig. 1).

Fig. 1

Workflow of the patient selection process

CT imaging and clinical information acquisition

Patients were required to receive oral or intravenous contrast agent before enhanced CT scans. Enhanced abdominal imaging was mandatory for all participants, and chest imaging was strongly recommended. The imaging methodology was kept consistent across visits for each patient (i.e., acquisition time for each phase, contrast agent, scan mode, and parameters). Venous-phase CT images were acquired on various scanners and then resampled with linear interpolation to a 5-mm section thickness. Abdominal scans were normalized to a window width (WW) of 400 and a window level (WL) of 0, and chest scans to a WW of 1200 and a WL of −600.
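As an illustration, the windowing operation amounts to clipping the Hounsfield-unit intensities to the interval [WL − WW/2, WL + WW/2] and rescaling. A minimal sketch (assuming voxel values are already in Hounsfield units; `apply_window` and `ct_volume` are illustrative names, not part of the released code):

```python
import numpy as np

def apply_window(hu, level, width):
    """Clip a CT volume (in Hounsfield units) to a display window
    and rescale the result to [0, 1]."""
    lo, hi = level - width / 2, level + width / 2
    return (np.clip(hu, lo, hi) - lo) / (hi - lo)

abdomen = apply_window(ct_volume, level=0, width=400)      # WW 400 / WL 0
chest   = apply_window(ct_volume, level=-600, width=1200)  # WW 1200 / WL -600
```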

To reduce the computational load while keeping the most informative parts of the whole CT scans, three liver slices and three lung slices were automatically selected from each 3D CT scan as model input. Representative slices of the liver and lung were selected as follows: if a tumor was present, the three 2D axial slices with the largest tumor areas were chosen from the 3D scan. If fewer than three slices contained tumors, the tumor-bearing slices were selected first, and the remaining slices were chosen based on the largest organ area (liver or lung) to represent the overall state of that organ. If no tumor was present, all representative slices were selected based on the largest organ area (Fig. 2a). To achieve this, four pre-trained automated models were utilized to segment the boundaries of the lung, lung tumors, liver, and liver tumors. These pretrained segmentation models are based on the nnU-Net architecture [17], a self-adaptive framework that automatically adjusts model hyperparameters to dataset characteristics to achieve optimal performance; it has achieved state-of-the-art (SOTA) results in multiple organ and tumor segmentation tasks, including the liver and lung [18]. The selected 2D CT images were resized to 224 × 224 and standardized to zero mean and unit variance to serve as input for the deep learning model. Training images were randomly flipped with a probability of 0.5 for data augmentation.
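The slice-selection rule can be summarized as follows (a sketch under the assumption that the nnU-Net masks are binary arrays of shape (slices, H, W); names are illustrative):

```python
import numpy as np

def select_slices(organ_mask, tumor_mask, k=3):
    """Pick k axial slice indices: tumor-bearing slices with the largest
    tumor area first, then slices with the largest organ area."""
    tumor_area = tumor_mask.sum(axis=(1, 2))  # tumor pixels per slice
    organ_area = organ_mask.sum(axis=(1, 2))  # organ pixels per slice
    by_tumor = [i for i in np.argsort(tumor_area)[::-1] if tumor_area[i] > 0][:k]
    by_organ = [i for i in np.argsort(organ_area)[::-1] if i not in by_tumor]
    return (by_tumor + by_organ)[:k]
```

The same function would be applied separately to the liver and lung masks of each scan.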

Fig. 2

Image selection and model illustration. a The process of representative CT image selection, from 3D scans to selected 2D liver and lung slices. b The flowchart of the deep learning model. Feature dimensions are marked. M, taking the average

The clinical data comprised seven variables: tumor histological differentiation type; presence of non-alcoholic steatohepatitis (NASH) or non-alcoholic fatty liver disease (NAFLD); prior surgical history; presence of partial or complete portal vein tumor thrombosis (PVTT); and prior treatment with external beam radiation therapy (EBRT), with transarterial embolization (TAE)/transarterial chemoembolization (TACE), and with radiofrequency ablation (RFA)/microwave ablation (MWA).

The study label, i.e., the OS, was retrieved from the case report form (CRF). For censored patients, the censoring time was defined as the later of the time recorded in the CRF and the time of the latest CT scan.

Model development

Figure 2b presents the workflow of the deep learning model that processes radiological images ('Rad-D' in the subsequent content). The model inputs consist of the baseline and first follow-up scans. Every patient has a baseline and a follow-up abdominal scan as part of the inclusion criteria and imaging acquisition protocols. If follow-up chest images were unavailable, the baseline image was replicated and used as a substitute for the follow-up, a common practice referred to as the last observation carried forward (LOCF) strategy [19] in longitudinal data analysis. LOCF imputes missing data by replacing any missing value with the last known, non-missing value for that data point; it increases data utilization without increasing model complexity.
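In code, the LOCF substitution reduces to a single guard (a sketch; variable names are illustrative):

```python
def locf_chest(baseline_slices, followup_slices):
    # LOCF: when the follow-up chest scan is missing, carry the baseline
    # images forward as a stand-in for the second time point
    return baseline_slices if followup_slices is None else followup_slices
```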

A total of 12 images, three per organ per time point, were fed into EfficientNet-b0 [20] with ImageNet-pretrained weights to extract features. EfficientNet-b0 is a lightweight CNN backbone that adopts compound scaling to enhance performance. The extracted features of the three representative images at baseline and at follow-up were averaged and then fed into a long short-term memory (LSTM) module [21] to capture temporal information. LSTM is a type of recurrent neural network (RNN) that leverages memory cells and gating mechanisms to effectively capture and process sequential data. The liver and lung features were then concatenated, and linear layers were used to predict the risk score.
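A condensed PyTorch sketch of this architecture is shown below. It assumes the single-channel CT slices have been replicated to three channels to match the ImageNet-pretrained weights; the LSTM hidden size and the use of one LSTM per organ are our assumptions, not confirmed design details:

```python
import torch
import torch.nn as nn
from torchvision.models import efficientnet_b0, EfficientNet_B0_Weights

class CRNNProg(nn.Module):
    """Sketch of the described CRNN: a shared EfficientNet-b0 encoder,
    an LSTM per organ over the two time points, and a linear risk head."""

    def __init__(self, hidden=128):
        super().__init__()
        backbone = efficientnet_b0(weights=EfficientNet_B0_Weights.IMAGENET1K_V1)
        self.encoder = nn.Sequential(backbone.features,
                                     nn.AdaptiveAvgPool2d(1), nn.Flatten())  # -> 1280-d
        self.lstm_liver = nn.LSTM(1280, hidden, batch_first=True)
        self.lstm_lung = nn.LSTM(1280, hidden, batch_first=True)
        self.head = nn.Linear(2 * hidden, 1)

    def encode(self, x):                            # x: (B, T=2, S=3, 3, 224, 224)
        b, t, s = x.shape[:3]
        feats = self.encoder(x.flatten(0, 2))       # (B*T*S, 1280)
        return feats.view(b, t, s, -1).mean(dim=2)  # average the 3 slices

    def forward(self, liver, lung):
        h_liver, _ = self.lstm_liver(self.encode(liver))
        h_lung, _ = self.lstm_lung(self.encode(lung))
        z = torch.cat([h_liver[:, -1], h_lung[:, -1]], dim=1)
        return self.head(z).squeeze(1)              # one risk score per patient
```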

Brief descriptions of models with different input modalities are listed here:

CLN: a multivariate Cox proportional hazards model that performs risk regression using the seven clinical variables (Fig. S1a).

Rad-S: a deep learning model without follow-up scans or RNN layers; it uses only EfficientNet-b0 to extract features from baseline images (Fig. S1b).

RadCLN-S: a multi-modal model that combines the risk scores from CLN and Rad-S and uses a bivariate Cox model to predict the prognostic risk (Fig. S1c).

Rad-D: the above-mentioned deep learning model that assesses both baseline and follow-up CT scans using the CRNN structure.

RadCLN-D: a multi-modal model that combines the clinical risk score from CLN and the radiological score output by Rad-D using a bivariate Cox model to predict the prognostic risk (Fig. S1d).

The deep learning model was optimized by gradient descent on the negative log partial likelihood, an efficient way to estimate the parameters of the Cox proportional hazards model.
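For a batch of patients, this loss can be written compactly by sorting on survival time so that a cumulative log-sum-exp yields each patient's risk set (a sketch ignoring tie corrections; not the authors' released code):

```python
import torch

def neg_log_partial_likelihood(risk, time, event):
    """Negative log Cox partial likelihood.
    risk: predicted log-hazard scores; event: 1 = death, 0 = censored."""
    order = torch.argsort(time, descending=True)    # longest survivors first
    risk, event = risk[order], event[order]
    log_risk_set = torch.logcumsumexp(risk, dim=0)  # log sum over each risk set
    return -((risk - log_risk_set) * event).sum() / event.sum().clamp(min=1)
```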

The model was trained for 200 epochs on one NVIDIA A100 (40 GB) GPU, with the CRNN implemented in PyTorch 1.11.0 and MONAI 0.9.1, using a batch size of 16 and the Adam optimizer with a learning rate of 5 × 10−5. The code is available at https://github.com/EstelleXIA/ProgHCC.
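Combining the sketches above, the training loop would take roughly the following form (`train_loader` is an assumed DataLoader yielding image tensors, survival times, and event indicators):

```python
model = CRNNProg().cuda()
optimizer = torch.optim.Adam(model.parameters(), lr=5e-5)  # settings from the paper

for epoch in range(200):
    for liver, lung, time, event in train_loader:          # batch size 16
        risk = model(liver.cuda(), lung.cuda())
        loss = neg_log_partial_likelihood(risk, time.cuda(), event.cuda())
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```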

Evaluation metrics and statistical analyses

Model performance was evaluated using Harrell's concordance index (C-index) [22] and time-dependent areas under the receiver operating characteristic curve (AUCs) at different time points [23]. The C-index quantifies the model's ability to correctly rank the relative risks of pairs of individuals. Survival estimates were calculated using the Kaplan–Meier method for low- and high-risk groups, stratified by the median prediction score of the training set. Hazard ratios (HRs) were computed, and significance was assessed using the log-rank test. For the RadCLN-D model, Cox regression coefficients were used to generate a nomogram. Calibration curves were used to show the concordance between actual outcomes and those predicted by the nomogram.
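These analyses were performed in R (see the packages below); for orientation, a rough Python equivalent of the C-index and stratified Kaplan–Meier analysis using lifelines, with `time`, `event`, `risk_score`, and `train_median` as assumed arrays/values, might look like:

```python
from lifelines import KaplanMeierFitter
from lifelines.statistics import logrank_test
from lifelines.utils import concordance_index

# Harrell's C-index: lifelines expects higher scores for longer
# survival, so the risk score is negated
cindex = concordance_index(time, -risk_score, event)

# Stratify at the training-set median risk and compare the groups
high = risk_score > train_median
km_high, km_low = KaplanMeierFitter(), KaplanMeierFitter()
km_high.fit(time[high], event[high], label="high risk")
km_low.fit(time[~high], event[~high], label="low risk")
p_value = logrank_test(time[high], time[~high],
                       event[high], event[~high]).p_value
```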

Statistical tests were performed with the survival, survcomp, survminer, timeROC, and rms packages in R 4.2.2. A two-sided p < 0.05 indicated statistical significance. Model interpretation used Gradient-weighted Class Activation Mapping (Grad-CAM) [24], visualized with pytorch-gradcam 0.2.1.

Results

Patient characteristics

Characteristics of the 207 patients (mean age, 61 years ± 12 [SD]; 180 male) are summarized in Table 1. Among all patients, the median interval between baseline and follow-up CTs was 55 days. The median survival time was 475 days, and 138 patients (66.7%) died. There was no significant difference in survival among the training, validation, and test datasets (Fig. S2; train vs validation: HR, 1.051, 95% confidence interval [CI]: 0.662–1.669, p = 0.833; train vs test: HR, 1.032, 95% CI: 0.686–1.552, p = 0.880; validation vs test: HR, 1.039, 95% CI: 0.604–1.790, p = 0.889). Regarding histological types, 29 of the 207 patients (14%) were highly differentiated, 154 (74%) moderately differentiated, 23 (11%) poorly differentiated, and 1 (< 1%) undifferentiated. At baseline, 35 patients (17%) had NASH/NAFLD and 36 (17%) had PVTT. For additional treatments, 109 (53%) had undergone surgery, 10 (5%) received EBRT, 121 (58%) received TAE/TACE, and 58 (28%) received RFA/MWA. A multivariable Cox regression computed a risk score from the seven variables using the formula \(\mathrm{Score} = 0.3747 \times \mathrm{Differentiation} + 0.1593 \times \mathrm{NASH|NAFLD} - 0.1801 \times \mathrm{Surgery} + 0.6732 \times \mathrm{PVTT} - 0.8235 \times \mathrm{EBRT} + 0.6482 \times \mathrm{TAE|TACE} - 0.4497 \times \mathrm{RFA|MWA}\) (Fig. S3, C-index = 0.630, p = 0.018).

Table 1 Patient characteristics

Comparisons of different models on the survival prediction

Prediction performance was compared among CLN, Rad-S, RadCLN-S, Rad-D, and RadCLN-D on both the validation and independent test sets (Tables 2 and 3 and Fig. S4). Clinical variables alone displayed unfavorable prediction performance, with a C-index of 0.537 (95% CI: 0.406–0.668) on the validation set and 0.622 (95% CI: 0.500–0.744) on the test set. Using the baseline radiological images achieved a C-index of 0.692 (95% CI: 0.569–0.815) on the validation set and 0.608 (95% CI: 0.504–0.712) on the test set. Incorporating the first follow-up images improved performance considerably, reaching 0.748 (95% CI: 0.664–0.832) and 0.681 (95% CI: 0.573–0.789) on the validation and test sets, respectively. Multi-modal inputs (RadCLN-S and RadCLN-D) outperformed the uni-modal models (CLN, Rad-S, and Rad-D): RadCLN-S reached a C-index of 0.697 (95% CI: 0.574–0.820) on the validation set and 0.638 (95% CI: 0.536–0.740) on the test set, while RadCLN-D attained 0.752 (95% CI: 0.660–0.844) and 0.695 (95% CI: 0.581–0.809), respectively. Time-dependent ROCs showed a similar pattern in survival prediction performance (Figs. S4b and S4c).

Table 2 Concordance index of the prognostic prediction models
Table 3 Time-dependent AUCs of the prognostic prediction models

To further demonstrate the prognostic value of the model, the models' capability for risk stratification was assessed. RadCLN-S, Rad-D, and RadCLN-D effectively stratified patients into high-risk and low-risk groups (Figs. 3 right, S5, and S6), demonstrating the models' ability to identify survival risk using clinical information and baseline CT scans, or solely baseline and first follow-up scans. Among all models, RadCLN-D exhibited the highest predictive performance, as it included the most comprehensive information.

Fig. 3

Performance of RadCLN-D. Left, the distributions of risk scores based on the multi-modal predictions, with heatmaps illustrating the levels of the two modalities (radiological and clinical). Middle, time-dependent ROC curves at 1 year and 2 years. Right, Kaplan–Meier survival estimates for OS, stratified into low-risk and high-risk groups according to the median risk score in the training set. AUC, area under the curve; HR, hazard ratio

RadCLN-D accurately predicts the OS

The performance of RadCLN-D was further illustrated in detail. RadCLN-D combines the radiological score from the output of the CRNN structure and the clinical score via the formula \(\mathrm{Score} = 9.8834 \times \mathrm{Radiology}_{\mathrm{score}} + 0.5300 \times \mathrm{Clinical}_{\mathrm{score}}\); both modalities contributed significantly to the OS prediction (Fig. 4b; radiological, p < 0.001; clinical, p = 0.0056; Wald test). For 1-year OS predictions, the AUC was 0.966 in the training set, 0.777 in the validation set, and 0.704 in the test set; for 2-year OS predictions, it was 0.983, 0.839, and 0.652, respectively (Fig. 3, middle). Patients with lower multi-modal scores tended to be censored or to survive relatively longer, while most patients with higher scores died early (Fig. 3, left). The median score of the training data served as the cutoff for stratifying patients into high-risk and low-risk groups, i.e., score > 0.66 signifies high risk and score ≤ 0.66 low risk. To examine generalizability, the risk score calculation and cutoff used in the validation and test sets were identical to those of the training set. The multi-modal score displayed reliable predictive accuracy, yielding significant risk stratification in the training (HR, 24.173, 95% CI: 12.181–47.971, p < 0.001), validation (HR, 3.330, 95% CI: 1.369–8.102, p = 0.008), and test sets (HR, 2.024, 95% CI: 1.009–4.064, p = 0.047).
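In code, applying the published coefficients and cutoff is a two-line computation (`rad_scores` and `clin_scores` are assumed per-patient arrays):

```python
import numpy as np

# Bivariate Cox coefficients reported above
score = 9.8834 * rad_scores + 0.5300 * clin_scores
risk_group = np.where(score > 0.66, "high", "low")  # training-set median cutoff
```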

Fig. 4

Establishment and validation of the nomogram of RadCLN-D. a For the radiological score and the clinical score, locate the corresponding value on the nomogram scale, then add the points to obtain the total points. A vertical line drawn from the total-points value gives the predicted 1-year and 2-year survival probabilities. b Importance of the radiological score and the clinical score. c Calibration of the nomogram in terms of agreement between predicted and observed 1-year survival outcomes

A nomogram was developed based on the RadCLN-D prediction model to estimate the OS of individual patients (Fig. 4a). It allows clinicians to estimate 1-year and 2-year survival probabilities in a clear and concise manner. Calibration plots indicated favorable agreement between the nomogram and an ideal model across the training, validation, and test datasets (Fig. 4c).

Moreover, to assess the robustness of the model, patients in the validation and test sets were grouped by CT scanner manufacturer. Among these 78 patients, the main manufacturers were Siemens (37 patients) and General Electric (33 patients). Because of the small sample sizes for other manufacturers, such as Philips, TOSHIBA, and Hitachi Medical, those patients were excluded from this analysis. Prognostic performance was comparable between the two major manufacturers, with a C-index of 0.728 for the Siemens group and 0.707 for the General Electric group.
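A sketch of this subgroup check (`df` is an assumed table of the 78 pooled validation/test patients with manufacturer, time, event, and risk-score columns):

```python
import pandas as pd
from lifelines.utils import concordance_index

subset = df[df["manufacturer"].isin(["Siemens", "General Electric"])]
for vendor, grp in subset.groupby("manufacturer"):
    # negate the risk score: lifelines treats higher values as longer survival
    print(vendor, concordance_index(grp["time"], -grp["score"], grp["event"]))
```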

Interpretation of the deep learning model

To demonstrate the explainability of the deep learning model, four patients from the test set, two predicted high-risk and two predicted low-risk by Rad-D, were used to interpret the constructed CRNN architecture. Heatmaps highlight the regions of the image that contribute most to the network's decision. The prognostic model focused particularly on tumor regions (Fig. 5a, b, liver scans; the hottest region lies on the tumor), consistent with the medical knowledge that regions of high malignancy correlate strongly with prognosis. In the two low-risk patients, whose malignancies were not in the liver, the model's attention spread over the whole liver (Fig. 5c, d, liver scans). Similar heatmap patterns were observed in lung scans, i.e., the model focused more on suspicious lesion areas.

Fig. 5

Interpretation of the deep learning model. Grad-CAM computes the gradients of the target score (i.e., the risk score) with respect to the feature maps in the last convolutional layer of the network. These gradients are average-pooled to obtain importance weights for the feature maps, which are then linearly combined and normalized to [0, 1] (blue close to zero, red close to one). The four demonstrated cases are from the independent test set. a, b Two patients predicted high-risk by the model. c, d Two patients predicted low-risk by the model
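The study used the pytorch-gradcam package; as a hand-rolled illustration of the same mechanics (hooking the last convolutional layer, backpropagating the risk score, and weighting the feature maps by their pooled gradients), one might write:

```python
import torch
import torch.nn.functional as F

def grad_cam(model, x, target_layer):
    """Minimal Grad-CAM sketch for a scalar risk output; `model` is
    assumed to be a single-input wrapper around the network."""
    acts, grads = {}, {}
    fwd = target_layer.register_forward_hook(lambda m, i, o: acts.update(a=o))
    bwd = target_layer.register_full_backward_hook(lambda m, gi, go: grads.update(g=go[0]))
    model(x).sum().backward()                            # gradients of the risk score
    fwd.remove(); bwd.remove()
    weights = grads["g"].mean(dim=(2, 3), keepdim=True)  # average-pool the gradients
    cam = F.relu((weights * acts["a"]).sum(dim=1, keepdim=True))
    cam = (cam - cam.amin()) / (cam.amax() - cam.amin() + 1e-8)  # normalize to [0, 1]
    return F.interpolate(cam, size=x.shape[-2:], mode="bilinear", align_corners=False)
```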

Incremental value of RadCLN-D to traditional size-based method

RECIST outcomes assessed by an independent review committee were adopted, and patients with progression status were assigned to the high-risk group. The risk stratification performance of RadCLN-D was compared with that of the conventional RECIST criteria (Fig. 6). RECIST outcomes showed acceptable risk prediction performance, as the response status significantly separated the high-risk and low-risk groups (HR, 1.992, 95% CI: 1.119–3.545, p = 0.019). However, RadCLN-D exhibited stronger categorization capability (HR, 2.450, 95% CI: 1.424–4.214, p = 0.001), suggesting an improvement of the deep learning-based method over the conventional size-based method.

Fig. 6

Risk stratification by the deep learning model and the conventional RECIST criteria. Results are based on the combined validation and test sets. a Risk stratification by the deep learning model RadCLN-D; the high-risk group is defined as score > 0.66 and the low-risk group as score ≤ 0.66. b Risk stratification by the RECIST criteria; the high-risk group is defined by disease progression at the first follow-up and the low-risk group by no progression at the first follow-up. HR, hazard ratio; RECIST, response evaluation criteria in solid tumors

Discussion

In this study, we developed and validated a deep learning model that decodes spatial-temporal information from radiological imaging and clinical variables to predict the prognosis of advanced HCC patients. Our multi-modal approach combines the baseline and first follow-up scans with clinical information, reaching a 1-year AUC of 0.777 on the validation set and 0.704 on the independent test set. Additionally, models with missing modalities, i.e., the single-modal imaging-based model (Rad-D) and the model incorporating only baseline scans (RadCLN-S), still achieved favorable risk stratification (all p < 0.05, except RadCLN-S on the test set, p = 0.053). Compared with the conventional RECIST criteria, the deep learning model exhibited superior prognostic ability (RECIST: HR, 1.992, p = 0.019; RadCLN-D: HR, 2.450, p = 0.001). This study shows that deep learning analysis of CT scans can yield valuable prognostic information to guide the surveillance of immunotherapy-treated advanced HCC patients.

Prognostic prediction for HCC can help doctors formulate more targeted treatment plans and thereby maximize the treatment effect for patients. With the development of radiomics and deep learning techniques, several prognostic prediction models for HCC have emerged. He et al [25] developed a survival model for macrotrabecular-massive HCC patients using radiomics features extracted from enhanced abdominal CT scans. Meng et al [26] adopted a deep learning model that used a CNN to analyze MRI images for early recurrence prediction after hepatectomy. Xu et al [27] proposed a deep prediction network that uses information from both full liver and tumor masks on CT images to predict early recurrence. Zhang et al [28] used pretrained CNN models to extract features from CT scans and then employed machine learning methods to predict OS in unresectable patients treated with sorafenib. Wei et al [29] proposed a deep learning model with an automated segmentation-based MRI radiomic signature to estimate postsurgical early recurrence risk. However, these prognostic models were not specifically developed for advanced HCC patients receiving immunotherapy. HCC patients treated with immunotherapy usually undergo multiple imaging follow-ups to monitor drug efficacy; models that only use data from a single time point can therefore fail to capture the dynamic growth characteristics of the tumor. Moreover, the lung is the most common site of metastasis in advanced liver cancer, so evaluation of the chest is important for prognosis prediction. Directly applying existing methods to advanced HCC can degrade model performance, as they focus solely on the liver and neglect the lung.

In our developed model, we used both liver and lung images at baseline and follow-up, with a CRNN structure, to make prognostic predictions. CNNs can proficiently extract prognostically relevant features from radiological images. Integrated with clinical information, RadCLN-S enables precise survival predictions. Moreover, the RNN architecture efficiently captures tumor changes between baseline and follow-up measurements. The integration of temporal and spatial information results in improved risk stratification outcomes.

Validation of the model performance from multiple perspectives (i.e., time-dependent AUC, risk stratification capability, and overall C-index) on multi-center data illustrates the practicability and superiority of the deep learning model. The comparative results under various input settings indicate that the model can still yield reasonably accurate predictions despite the absence of certain modalities. Furthermore, the comparison with the RECIST criteria demonstrated that, relative to manual assessment of the growth trend of whole-body lesions, deep learning encapsulates information beyond size, adding incremental value to prognostic predictions. Our model could thus be used for prognostic prediction in advanced HCC patients and aid clinicians with adaptive treatment planning to improve patient outcomes.

Our proposed prognostic model is interpretable. Heatmaps provide a coarse localization of prognosis-relevant regions, demonstrating the interpretability of the radiological input. Although we did not explicitly supply the liver or liver-tumor region as auxiliary input, the liver heatmaps consistently reveal highlighted areas within the liver. Specifically, in the two high-risk patients, the deepest-red core areas localized to the tumor region, suggesting a strong association between the presence of the tumor and the unfavorable prognosis predicted by the model. Moreover, the model's focus on the tumor area allows a more comprehensive capture of tumor characteristics, such as size and morphology, which are closely associated with prognosis.

Moreover, the chosen clinical variables in this study are also explainable. They comprise frequently used treatments for HCC, such as surgical resection and local non-surgical interventions, as well as previously established and reported prognostic variables. The results of the clinical Cox regression model (CLN model, Fig. S3) align with previous evidence. The German Cancer Research Center [30] reported that immunotherapy fails to provide a survival benefit in patients with NASH/NAFLD, owing to an accumulation of abnormal CD8+/PD-1+ T cells within the liver. PVTT is considered a sign of advanced-stage disease, and untreated cases have a median OS of only 2.7–4.0 months [31]. Histological differentiation is a recognized prognostic factor, with higher differentiation typically indicating a better prognosis [32, 33]. As for treatments, prior RFA/MWA, EBRT, and surgical resection all exhibited positive prognostic effects. TAE/TACE therapy was inversely related to prognosis, possibly because TAE/TACE is performed more frequently in patients with more severe disease.

Our proposed model is efficient in processing radiological imaging. Representative 2D slices were automatically selected from the original whole-body CT scans using pretrained SOTA segmentation models. Most previous works manually delineate tumor regions to generate the inputs [34, 35], which is labor-intensive and impracticable for advanced HCC because of multifocal hepatic lesions. Furthermore, including the tumor and its surrounding tissue in the model inputs can provide additional cues for survival prediction. Some studies [36, 37] have employed semi-automatic seed growing to create regions of interest, but this still requires substantial human effort for lesion identification. Deep learning-based segmentation models achieve considerable performance in automatically delineating tumor and tissue regions of the lung and liver [17, 38, 39]. We therefore incorporated a SOTA model to assist with representative slice selection, making the model more user-friendly.

Our study data come from multiple centers, and this diversity enables the model to adapt to different vendors, making it more robust and transferable across imaging settings. Experimental results show that our model performs similarly across devices, supporting its generalizability.

Our study has limitations. First, some advanced liver cancer patients may have metastatic lesions beyond the lung, such as in the brain, lymph nodes, and adrenal glands. Although it has been reported that distant organ metastases other than pulmonary metastases typically do not have a detrimental effect on prognosis [40], future work could consider more metastatic organs to potentially improve prediction performance. Second, the study data were collected retrospectively, which can introduce various biases. Prospective studies with larger sample sizes should be conducted to further validate the practicability of the proposed model.

In conclusion, deep learning analysis of CT scans using the multi-modal CRNN model can provide valuable prognostic information, enabling the effective surveillance of patients with advanced HCC. The proposed approach could empower clinicians to make informed decisions regarding patient management and follow-up strategies based on the identified risk stratification patterns derived from the CT scans and the clinical information.