Background

Colorectal cancer (CRC) was the second leading cause of cancer-related death worldwide in 2020 [1]. Although patients with stage I disease generally have a favorable prognosis, a reported 10% of them develop recurrent tumors within 5 years after curative resection [2,3,4,5,6,7]. With strengthened population-based screening programs, a continuing increase in the proportion of stage I cases is expected [8], providing a compelling rationale to identify stage I patients who are predisposed to inferior survival outcomes.

Our previous umbrella review found neither solid evidence on factors influencing survival outcomes nor available prediction tools for stage I CRC [9]. The major challenge lies in the low frequency of unfavorable outcome events as well as the lack of prior knowledge on candidate factors. Meanwhile, well-established prognostic indicators used to inform adjuvant treatment for stage II tumors, such as suboptimal lymph node retrieval and perineural invasion (PNI), have been recommended by guidelines from international societies, such as the European Society for Medical Oncology (ESMO) and the National Comprehensive Cancer Network (NCCN) [10, 11]. Given that most of these factors are also present in stage I CRC, they may also affect the survival outcomes of these patients. Herein, we report a discovery-validation study investigating the prognostic effects of these factors and developing a prognostication tool for stage I patients using two prospective cohorts.

Methods

Study population

This study enrolled consecutive patients diagnosed with CRC at the West China Hospital (WCH, Chengdu, China) between April 2009 and April 2016 as the discovery cohort, whereas the validation cohort comprised individuals diagnosed within the same time span at the Peking University Cancer Hospital & Institute (PUCHI, Beijing, China). The study was performed in accordance with the Declaration of Helsinki and reported in adherence to the STROBE statement for observational studies [12], with the STROBE checklist presented in Additional file 1.

We included patients who underwent R0 resection with a pathologically diagnosed stage I tumor according to the American Joint Committee on Cancer (AJCC) TNM staging manual. The exclusion criteria were as follows: (1) patients who had received any neoadjuvant treatment, (2) patients with multiple tumors, (3) patients who underwent endoscopic resection only, because pathological evidence of lymph node status was unavailable, and (4) patients who died within one month after surgery. Of note, patients who had received endoscopic resection prior to the curative surgery were also enrolled, as evidence showed that the preceding endoscopic procedure had no significant adverse effects on long-term survival outcomes [5].

Candidate prognostic factors

Based on the recommendations from NCCN, prognostic parameters for high-risk stage II colon cancer patients included pathologically confirmed T4 stage, undifferentiated or poorly differentiated tumors (G3 or G4), lymphovascular invasion (LVI), PNI, fewer than 12 lymph nodes sampled, obstruction or perforation, and positive resection margins [11, 13]. These factors were also listed in the recommendations from the Chinese Society of Clinical Oncology (CSCO) [14]. In addition, the ESMO guideline noted the level of preoperative carcinoembryonic antigen (CEA) as a prognostic factor for stage II tumors [10]. As positive margins and presentations of obstruction or perforation rarely occurred in stage I patients, we included the remaining six factors, namely T stage (T2 vs. T1), tumor grade (G3 or G4 vs. G1 or G2), LVI, PNI, suboptimal lymph node examination (< 12), and CEA, in conjunction with age and gender as candidate covariates.

Patient follow-up and survival outcomes

A standard follow-up scheme was applied to participants in both cohorts, and details can be found in Additional file 2. In view of the low prevalence of inferior outcomes, we adopted disease-free survival (DFS) as the primary outcome to obtain maximum statistical power. DFS was defined as the time span from the date of surgery to recurrence at any site, death, or the date of last follow-up. Recurrence was confirmed by biopsy or diagnosed by at least two radiologists via CT or MRI scans. We also employed overall survival (OS) and recurrence rate as secondary outcomes. CRC-specific deaths and site-specific recurrences were investigated as additional outcomes for sensitivity analysis. The latest patient follow-up for both cohorts was completed in May 2020.
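The DFS definition above can be illustrated with a minimal sketch. The function name `dfs_record`, its argument names, and the conversion of days to months using an average month length of 30.4375 days are assumptions made for illustration; they are not taken from the study's analysis code.

```python
from datetime import date

DAYS_PER_MONTH = 30.4375  # assumption: average month length used for conversion


def dfs_record(surgery, recurrence=None, death=None, last_follow_up=None):
    """Return (time_in_months, event) for disease-free survival.

    event is 1 when recurrence or death was observed (whichever came first),
    and 0 when the patient is censored at the date of last follow-up.
    """
    observed = [d for d in (recurrence, death) if d is not None]
    if observed:
        end, event = min(observed), 1
    elif last_follow_up is not None:
        end, event = last_follow_up, 0
    else:
        raise ValueError("need a recurrence, death, or last follow-up date")
    return (end - surgery).days / DAYS_PER_MONTH, event


# Hypothetical patients, for illustration only.
relapsed = dfs_record(date(2012, 1, 1), recurrence=date(2013, 1, 1),
                      death=date(2014, 1, 1), last_follow_up=date(2020, 5, 1))
censored = dfs_record(date(2015, 1, 1), last_follow_up=date(2015, 1, 31))
```

Taking the earliest of recurrence and death as the event date reflects the composite nature of DFS; patients with neither event contribute follow-up time only up to their last visit.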

Statistical analysis

Identification of prognostic indicators

In descriptive analysis, log-transformation was conducted to normalize continuous variables with skewed distributions. In consideration of the power loss and possible biases caused by dichotomization, we kept the original scale of continuous variables in survival analysis [15]. With respect to missing data, we adopted a multiple imputation approach to impute the missing values of patient characteristics [16]. Survival rates were estimated using the Kaplan–Meier approach. In discovery analysis, we first fitted univariable Cox proportional hazards models to estimate the effects of candidate risk factors on DFS and other outcomes. A two-sided p value < 0.05 was used as the threshold to select potential factors, which were then validated using the PUCHI cohort. A successful validation was determined by a two-sided p < 0.05.
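As an illustration of the Kaplan–Meier approach mentioned above, the following is a minimal pure-Python sketch of the product-limit estimator. The function name and the toy data are hypothetical; the study's actual software is not specified here, and this sketch omits confidence intervals.

```python
def kaplan_meier(times, events):
    """Product-limit estimate of the survival function.

    times  : follow-up time for each patient
    events : 1 if the event was observed at that time, 0 if censored
    Returns a list of (time, survival_probability) at each distinct event time.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = 0
        n_leaving = 0
        # Collect everyone whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            n_events += data[i][1]
            n_leaving += 1
            i += 1
        if n_events > 0:
            survival *= 1 - n_events / n_at_risk
            curve.append((t, survival))
        # Events and censored patients both leave the risk set after t.
        n_at_risk -= n_leaving
    return curve


# Toy data (made up): five patients, two of them censored.
km_curve = kaplan_meier([1, 2, 3, 4, 5], [1, 0, 1, 0, 1])
```

Censored patients reduce the number at risk for later time points without lowering the survival estimate themselves, which is what distinguishes this estimator from a simple proportion of event-free patients.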

Predictive modelling and visualization

A multivariable Cox model including features identified from univariable analysis was fitted in the WCH cohort. Factors with significant impact (p < 0.05) in this model were selected, and their coefficients were retained to generate predicted survival estimates in the validation cohort in order to evaluate the external model performance. We conducted analysis of variance to examine non-linearity of continuous predictors, and restricted cubic splines (RCS) were utilized to model any non-linear associations [17]. We then calculated a concordance index (C-statistic) along with the 95% confidence interval (CI) to assess the discriminative ability of the model. A C-statistic with a 95% CI excluding 0.5 indicated significant discriminative ability. We then constructed calibration curves by plotting the predicted 5-year survival estimates against the observed rates and visually assessed the discrepancies. A nomogram along with an online calculator was developed based on the validated model to provide a plug-in tool for clinical use.
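To make the concordance index concrete, here is a minimal sketch of Harrell's C-statistic for right-censored data. This is one common definition and is shown as an assumption; the source does not state which estimator or software was used, and the confidence interval is not computed here. The function name and toy data are hypothetical.

```python
def harrell_c(times, events, risk_scores):
    """Harrell's concordance index for right-censored survival data.

    A pair (i, j) is usable when patient i had an observed event strictly
    before patient j's follow-up time. The pair is concordant when the patient
    who failed earlier also has the higher predicted risk; ties in predicted
    risk count as one half.
    """
    concordant = 0.0
    usable = 0
    n = len(times)
    for i in range(n):
        if events[i] != 1:
            continue  # a censored patient cannot anchor a usable pair
        for j in range(n):
            if times[i] < times[j]:
                usable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if usable == 0:
        raise ValueError("no usable pairs")
    return concordant / usable


# Toy data (made up): four patients, one censored at time 6.
c_toy = harrell_c([2, 4, 6, 8], [1, 1, 0, 1], [0.9, 0.7, 0.8, 0.1])
c_perfect = harrell_c([2, 4, 6, 8], [1, 1, 0, 1], [4, 3, 2, 1])
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering, which is why a confidence interval excluding 0.5 is read as evidence of discriminative ability.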

To visualize the model performance, a prognostic index was created using the linear predictor, and a Kaplan–Meier curve for the validation cohort was plotted based on the optimal cut-off value derived from the discovery phase using an iterative approach via the X-tile software [18]. We also plotted the time-dependent trend of the area under the curve (AUC) to show the discriminative ability of the prediction model at various follow-up times [19]. With respect to possible clinical utility, decision curve analysis (DCA) was conducted to estimate the net benefit of applying the prediction model in clinical decision-making compared to the null model in which all participants were considered at the same risk level [20].
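The net benefit quantity underlying decision curve analysis can be sketched as follows. This is the standard formulation for a binary outcome at a fixed time horizon and ignores censoring, which is a simplifying assumption; the study's survival-based implementation may differ. Function names and toy data are hypothetical.

```python
def net_benefit(predicted_risk, outcome, threshold):
    """Net benefit of flagging patients whose predicted risk >= threshold.

    net benefit = TP/n - FP/n * threshold / (1 - threshold)
    """
    n = len(outcome)
    tp = sum(1 for p, y in zip(predicted_risk, outcome) if p >= threshold and y == 1)
    fp = sum(1 for p, y in zip(predicted_risk, outcome) if p >= threshold and y == 0)
    return tp / n - fp / n * threshold / (1 - threshold)


def net_benefit_treat_all(outcome, threshold):
    """Net benefit of the reference strategy that flags every patient."""
    prevalence = sum(outcome) / len(outcome)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)


# Toy data (made up): predicted 5-year event risk and observed events.
risk = [0.05, 0.30, 0.15, 0.02, 0.50, 0.08, 0.12, 0.01, 0.25, 0.03]
event = [0, 1, 0, 0, 1, 0, 0, 0, 0, 0]
nb_model = net_benefit(risk, event, threshold=0.2)
nb_all = net_benefit_treat_all(event, threshold=0.2)
```

A decision curve repeats this calculation across a range of thresholds; the model is considered useful at thresholds where its net benefit exceeds both the flag-everyone strategy and the flag-no-one strategy (net benefit of zero).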

Results

Patient characteristics

Based on the inclusion criteria, 728 patients from WCH and 413 patients from PUCHI were included (diagram of patient selection in Fig. 1). Essential characteristics of enrolled patients are summarized in Table 1. There was a significantly larger proportion of rectal cancer patients and a smaller proportion of left colon cancer patients in the discovery cohort (p < 0.001). In addition, we observed a higher percentage of patients with poor tumor differentiation (G3 or G4) in the discovery cohort (19.7% vs. 7.3%).

Fig. 1 Diagram for patient selection of two study cohorts

Table 1 Summarized distribution of essential characteristics of the discovery and validation cohorts

During the follow-up time span, 39 deaths along with 49 recurrences (44 distant and five local) occurred in the discovery cohort; meanwhile, 28 patients died and 17 developed recurrent tumors, among whom 13 had distant recurrences, in the validation cohort. We observed a 5-year DFS of 91% (95% CI: 89–93%) for the discovery cohort and 92% for the validation cohort (95% CI: 90–95%, log-rank test p = 0.950) (Additional file 3: Figure S1). Similarly, we did not find significant differences with respect to 5-year OS (95% vs. 94%, p = 0.330) and recurrence rate (6.9% vs. 3.8%, p = 0.130) across the two cohorts.

Risk factors for survival outcomes

In discovery analysis, older age at diagnosis and higher preoperative CEA were significantly associated with worse DFS (age per 1 year: HR = 1.04, 95% CI: 1.02–1.07, p < 0.001; CEA per 1 log-transformed unit: HR = 1.46, 95% CI: 1.13–1.87, p = 0.003). We also found a significant effect of PNI on DFS (HR = 4.26, 95% CI: 1.70–10.67, p = 0.002). These three factors retained their significant influence on DFS in the validation cohort (details in Table 2). We failed to validate T2 stage and suboptimal lymph node examination (< 12) as prognostic indicators in validation analysis, although they were significantly associated with inferior DFS in the WCH cohort (Table 2). With regard to secondary outcomes, age and CEA were linked with OS, while PNI was linked to tumor recurrence rates in both discovery and validation cohorts (p < 0.05, details in Table 2).
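Because the hazard ratios above are reported per unit of the covariate, they can be rescaled to other increments under the log-linear form of the Cox model. The sketch below does this for the reported point estimates. It assumes that the log transformation of CEA was the natural logarithm, which the text does not state, and it applies only to these univariable, linear effect estimates.

```python
import math


def rescale_hazard_ratio(hr_per_unit, units):
    """Hazard ratio for a change of `units` in a covariate with a linear Cox term."""
    return hr_per_unit ** units


# Reported univariable estimates in the discovery cohort.
hr_age_per_year = 1.04
hr_cea_per_log_unit = 1.46

# Ten additional years of age.
hr_age_per_decade = rescale_hazard_ratio(hr_age_per_year, 10)

# A doubling of CEA corresponds to ln(2) log units (assuming natural log).
hr_cea_doubling = rescale_hazard_ratio(hr_cea_per_log_unit, math.log(2))
```

Under these assumptions, ten additional years of age correspond to a hazard ratio of about 1.48, and a doubling of preoperative CEA to a hazard ratio of about 1.30.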

Table 2 Summarized effect estimates of univariable Cox regression in discovery and validation analysis

As for sensitivity analysis, higher preoperative CEA was also associated with inferior CRC-specific survival (CSS) in the two cohorts (p < 0.05, Additional file 4: Table S1). With respect to recurrence types, risk factors presented similar distributions across patients with recurrences at local and distant sites (Additional file 4: Table S2). Given the rarity of recurrences at local sites (14%), site-specific survival analysis was only conducted for distant sites, and the presence of PNI remained a significant indicator of a higher risk of distant recurrence in both cohorts (p < 0.05, Additional file 4: Table S3).

Prediction of inferior outcomes

By fitting multivariable Cox models in the discovery set, we identified four predictors, i.e., Age, CEA, PNI, and LYmph nodes examined (Additional file 4: Table S4, p < 0.05), which were subsequently utilized to develop the ACEPLY model forecasting DFS. Analysis of variance found significant non-linearity (p < 0.05) between age at diagnosis and DFS, and therefore, a restricted cubic spline (RCS) was applied to model the non-linear effect of age. The dose–response relations between age, CEA, and DFS are shown in Fig. 2. The prediction rule is presented as a nomogram in Fig. 3. We also created a web-based ACEPLY tool to provide plug-in calculation as well as visualization of predicted DFS (https://webcalculator.shinyapps.io/DFS_ACEPLY/). For example, the ACEPLY model yielded an expected 5-year DFS of 57.0% for a 65-year-old stage I patient with preoperative CEA of 20 ng/ml, positive PNI, and fewer than 12 nodes examined.
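For readers unfamiliar with how a Cox model turns covariates into a predicted survival probability, the following sketch shows the general relation S(t | x) = S0(t)^exp(lp), where lp is the linear predictor. The baseline survival and coefficient used here are placeholder values chosen for illustration only; they are not the ACEPLY estimates, and the spline term for age is omitted.

```python
import math


def predicted_survival(baseline_survival, coefficients, covariates):
    """Predicted survival at a fixed time from a Cox model.

    baseline_survival : S0(t), survival at time t when the linear predictor is 0
    coefficients      : dict of log hazard ratios by covariate name
    covariates        : dict of patient values by covariate name
    """
    linear_predictor = sum(coefficients[name] * covariates[name]
                           for name in coefficients)
    return baseline_survival ** math.exp(linear_predictor)


# Placeholder values (hypothetical, NOT the published model):
# baseline 5-year DFS of 0.90 and a binary factor that doubles the hazard.
example_coefficients = {"pni": math.log(2.0)}
dfs_without_factor = predicted_survival(0.90, example_coefficients, {"pni": 0})
dfs_with_factor = predicted_survival(0.90, example_coefficients, {"pni": 1})
```

In this toy setting a doubling of the hazard lowers predicted 5-year DFS from 0.90 to about 0.81; a nomogram performs the same calculation graphically by mapping each covariate to points that sum to the linear predictor.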

Fig. 2 Dose–response association of age at diagnosis and preoperative CEA level with disease-free survival. A Non-linear relationship between age and DFS. B Linear relationship between log-transformed CEA and DFS

Fig. 3 Nomogram of the ACEPLY tool predicting 3- and 5-year disease-free survival of stage I colorectal cancer patients

With respect to the model performance, the validation cohort was divided into high- and low-risk groups based on the optimal cut-off value of the prognostic index derived from the discovery cohort. Patients in the high-risk group of the validation cohort showed significantly inferior survival outcomes (Fig. 4A). We evaluated the external discriminative performance and obtained a significant overall concordance index of 0.69 (95% CI: 0.60–0.77). This was further confirmed by time-dependent AUC analysis at various time points, with discriminative ability peaking 5 to 6 years after diagnosis (Fig. 4B). Acceptable calibration was observed based on the overall agreement between the predicted and observed 5-year DFS (calibration plot in Fig. 4C). DCA identified significant net benefit of adding the model to decision-making at a relatively low threshold probability (0.1–0.2, Fig. 4D). Prediction models were also developed for OS and recurrence using the same approach (effect estimates presented in Additional file 4: Table S4), and their performance is shown in Additional file 3 (Figure S2 for OS and Figure S3 for recurrence).

Fig. 4 Performance of the ACEPLY model on DFS. A Kaplan–Meier curve of high- and low-risk groups of stage I patients in the validation cohort based on the linear prognostic index with a cut-off value derived from the discovery cohort. B Time-dependent area under the curve (AUC) of the prediction model validated in the external cohort. C Model calibration in the validation cohort. D Decision curve analysis of the prediction model. The threshold probability reflects how the benefit of a true positive is weighed against the harm of a false positive

Discussion

With the improvement in cancer screening, more tumors will be detected at an earlier stage, necessitating more attention to stage I patients. The current study identified and validated prognostic indicators and provided a clinically useful prediction tool that enables early recognition of stage I patients at higher risk of poor outcomes.

PNI represents the growth and invasion of tumor cells into nerves in the surrounding microenvironment. Previous evidence indicated that peripheral nerves, as essential components of the tumor microenvironment, can facilitate tumor progression and metastasis via the nerve sheaths [21, 22]. This resonates with our findings on the specific effect of PNI on a higher risk of distant CRC recurrence. A higher prevalence of PNI has been observed in more lethal cancers (present in 90% of pancreatic cancers [23]). In the case of CRC, Liebig et al. found that PNI prevalence increased with more advanced tumor stages [24]. However, they failed to observe any PNI in 46 stage I patients, pointing to a pressing need for investigation in larger cohorts. Our multi-center study observed positive PNI in 2.5–3.6% of stage I patients. Although PNI occurred infrequently, eight out of 33 (24.2%) PNI-positive patients developed unfavorable outcomes, which was comparable to the reported DFS of stage II patients [25], indicating that PNI can provide clinical utility in identifying the small subset of stage I patients who are predisposed to inferior outcomes.

Preoperative serum CEA level is a well-studied biomarker for recurrence risk in stage II–III CRC [26, 27] but remains less investigated for stage I patients. An empirical cut-off of 5 ng/ml was widely used by registry-based studies [28]; however, this value has been shown to be suboptimal by later modeling efforts using pooled data from trials [26]. Moreover, evidence has demonstrated that dichotomization of continuous variables could result in loss of statistical power and possibly biased estimation [15, 29], which would be detrimental for investigations with infrequent outcome events such as the current study. Thus, we modeled the relation between CEA and DFS while retaining the original continuous scale, and our findings add to current knowledge by unveiling the linear relationship, which was subsequently leveraged to strengthen the predictive performance of the ACEPLY tool.

With regard to other factors, we observed a similar yet non-significant (p = 0.09) impact of lymph node sampling < 12 on DFS in validation, which could be attributed to inadequate power of the validation cohort. However, it still showed predictive performance. The ESMO guideline listed the number of lymph nodes sampled as a major prognostic factor for stage II disease [10]. Because missing affected nodes is less likely in tumors at an earlier stage, the expected effect of node sampling could be smaller among stage I than stage II patients. Therefore, a large validation study is still needed to confirm the exact effect. Insufficient statistical power might also have played a role in our analysis of the effects of T2 stage and LVI, for which a consistent direction of effect was reported by previous studies [3, 30]. In concordance with our findings, Lee et al. did not observe a significant influence of tumor grade on recurrence risk for stage I CRC [3]. This could be attributed to the strong correlation between tumor grade and TNM stage [31], which could confound the underlying effect of tumor grade.

Our study observed that older age at diagnosis was independently associated with poor OS rather than with recurrence rate. A recent population-based study across all tumor stages reported similar findings, but it also detected a significant effect modification of age by tumor stage [32]. More importantly, a reduction in survival benefit was observed in patients aged 24 years or younger compared with those aged 35 to 39 years [32]. Another analysis combining data from six trials suggested an adverse prognostic impact of young age in stage III patients [33], highlighting the need to investigate the stage-specific effect of age. Our study offered a glimpse into a potential non-linear relation between age at diagnosis and DFS of stage I CRC patients, although this finding merits further verification by future evidence.

The low prevalence of both risk factors and outcome events renders it rather challenging to develop statistical models for risk stratification among stage I patients. Given the absence of published prognostication tools [34], the ACEPLY model represents a pioneering effort in the field and, more importantly, one with externally validated model performance, to help clinicians estimate individualized patient outcomes. Although clinical net benefit was identified, the low probability threshold pointed to elevated odds of false positive predictions, and this caveat needs to be fully considered before any adjuvant treatment or more intensive follow-up scheme is adopted. In addition, the prediction accuracy could be further improved as more risk factors are discovered and added to the current model.

As opposed to the European population [35], rectal cancer tends to be more dominant in eastern Asia. In accordance with our results, a reported 50–80% of CRC patients in China presented with rectal tumors [30, 36]. Similarly, rectal cancer had the highest incidence among all sites along the large bowel in South Korea [37]. This might be attributed to the distinct genetic background and dietary style in the area [37]. Rectal cancer has reportedly been enriched at early stages in particular. The US national cancer registry identified a significantly greater percentage of rectal cancer among stage 0 or I CRC patients than among stage II patients (35% vs. 24%) [38]. More prominent symptoms, such as bleeding, might make tumors in the rectum easier to detect at an earlier stage. Our cohorts also featured a large proportion of open resections, in line with the dominance of rectal tumors. Although tumor site and surgical approach had no significant impact on survival outcomes in our study as well as in previous reports [39, 40], our findings, including the ACEPLY tool, merit re-calibration when applied to other populations with different distributions of these covariates.

Although this is, to our knowledge, the largest study with independent validation targeting survival outcomes of stage I CRC, the sample size is still the major limitation of the current study. The relatively rare events, such as PNI and tumor recurrences, hindered more extensive investigation of possible factors and further subgroup analysis (e.g. by recurrence subtypes), given the substantially inflated type I error due to multiple testing. A second limitation is that our validation cohort is smaller in sample size than the discovery cohort, leading to inadequate statistical power to replicate potential discoveries, such as the impact of suboptimal lymph node examination. Last but not least, our study reflected the local patient population, for example the high proportion of rectal cancer, and thus our findings, including the prediction tool, merit further external validation and re-calibration before being applied to other populations.

Conclusions

In conclusion, the present study discovered and validated the utility of PNI and preoperative CEA in prognostication of stage I CRC. Moreover, an externally validated prediction tool was developed for clinical use to identify stage I CRC patients at high risk of inferior survival outcomes. Future collaborative efforts are warranted to aggregate larger patient cohorts with the hope of revealing more prognostic factors to further improve prediction accuracy.