Introduction

Chronic kidney disease (CKD) is a major worldwide health burden and the most common microvascular complication of type 2 diabetes (T2D) [1, 2]. In 2017, more than 840 million individuals developed CKD [3], increasing health care demand, particularly in low- and middle-income countries (LMICs) [1]. In the UK and the United States, the prevalence of CKD in T2D was reported to range between 25% and 36%, of which 19% was estimated to be advanced (stages 3–5) [4]. The age-standardised global mortality of CKD due to diabetes has been estimated at 7.6 per 100,000 [5].

Early detection and treatment help to prevent or delay CKD progression. Despite improved screening, many patients with CKD are not diagnosed until an advanced stage because overt symptoms are lacking. Prognostic models for complications associated with T2D progression that incorporate clinical information systems would facilitate improved treatment allocation and healthcare management, and would improve understanding of clinical research strategies [6, 7].

Currently, several prognostic equations [8,9,10,11,12,13,14,15] are available for the prediction of CKD in T2D patients, but their generalisability remains uncertain due to limited external and independent validation, particularly in Asian populations [16, 17]. Indeed, external validation is essential and has become mandatory before implementation in clinical practice [16, 18, 19].

Despite many potential advantages, prognostic models have several shortcomings and frequently reported deficiencies [20]. Multiple models have been developed in different ethnicities [8,9,10,11,12,13,14,15, 21,22,23,24,25,26,27,28,29], but no single model has consistently outperformed all others in Asian populations. For instance, a study based in China performed only a limited temporal internal validation on the same data [10]. Most importantly, adaptation of a suitable prognostic model by ethnicity is particularly important in an Asian context, given that half of the ten countries most affected by diabetes worldwide are Asian [4]. Furthermore, recent recommendations have proposed re-evaluating the inclusion of race/ethnicity in CKD prediction models [30].

Therefore, this study conducted external validation and improvement of previously published prognostic models of CKD and end-stage renal disease (ESRD) in Thai T2D patients.

Methods

We adhered to the TRIPOD guidelines for the development and validation of a clinical prediction score [31, 32]. We focused on external validation of existing models of CKD-ESRD risk predictions in T2D, supplemented with the addition of routine clinical factors to potentially increase the discriminatory power in our local population [18].

We first identified previous prognostic models by performing a systematic review and meta-analysis (SR/MA), see Figure S1. We selected prognostic models if they: (1) had been internally or externally validated; and (2) reported moderate to excellent discrimination, i.e., a C-statistic ≥ 0.70. We identified six studies that met the inclusion criteria for CKD [8,9,10] and ESRD [11,12,13] (Table 1).

Table 1 Characteristics of prognostic studies that were used for external validations

Study design and data sources

Data from the Thailand National Health Examination Survey (Thai-NHES) and the standard health databases version 2.4 2019 edition (http://spd.moph.go.th/healthdata/) were used for model validation. The NHES IV and V were population-based cross-sectional surveys conducted in 2009 and 2014, respectively. These surveys captured: health interviews, physical examination, nutrition assessment, and health-related behaviours [33]. Briefly, a multi-stage sampling of adult subjects from the regions, provinces, and districts across the country was used [34, 35].

The standard health databases included medical service records from hospitals, mostly under the direction of the Ministry of Public Health. They comprised a set of tables of all transactions from outpatient and inpatient services for each individual; of 43 files available, only the six that were related to outpatient services (i.e., Person, Diagnosis, Chronic, Drug, Laboratory, and Death) were used for this study.

Settings and participants

A total of 19,671 and 18,564 participants were de-identified from NHES IV-V, respectively; removal of duplicates and missing or invalid citizen identification (CID) resulted in 29,089 participants remaining, see Fig. 1. These were linked with the standard hospital health databases (1999–2019) using an encrypted CID to construct the initial sampling frame, leaving a total of 26,170 participants.

Fig. 1

Flowchart for participant inclusion

We confirmed T2D status based on self-report, medication use, and/or pathology tests (fasting plasma glucose (FPG) ≥ 126 mg/dL or HbA1c ≥ 6.5%). We excluded participants with type 1 diabetes (T1D), defined as age at onset of less than 30 years with severe insulin treatment. There were 3416 participants with identified T2D, of whom 270 (7.9%) were excluded on the basis that CKD was diagnosed prior to T2D, leaving a total of 3146 participants. Of these, 3014 (10.4%) participated in both NHES IV-V, with 402 newly diagnosed participants identified after the survey, see Fig. 1. These participants were followed up from 1999 to 31 October 2019.
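The T2D confirmation rule described above can be written as a simple predicate. The following is a minimal sketch, not the authors' implementation; the function name and argument names are hypothetical, and missing laboratory values are assumed to be passed as `None`.

```python
from typing import Optional


def is_confirmed_t2d(
    self_reported: bool,
    on_diabetes_medication: bool,
    fpg_mg_dl: Optional[float] = None,
    hba1c_percent: Optional[float] = None,
) -> bool:
    """Return True if any of the stated T2D criteria is met.

    Thresholds follow the text: FPG >= 126 mg/dL or HbA1c >= 6.5%.
    A missing laboratory value (None) simply does not contribute.
    """
    lab_positive = (
        (fpg_mg_dl is not None and fpg_mg_dl >= 126.0)
        or (hba1c_percent is not None and hba1c_percent >= 6.5)
    )
    return self_reported or on_diabetes_medication or lab_positive
```

The T1D exclusion (age at onset and insulin treatment) would be applied as a separate filter after this step.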

Outcomes

The primary study outcomes included diabetic nephropathy (CKD stage 3–5) based on the International Classification of Diseases, Tenth Revision (ICD-X), which was confirmed by an estimated glomerular filtration rate (eGFR) < 60 mL/min/1.73 m² measured within 3 months before or after diagnosis, see Table S1. ESRD (CKD stage 5) was defined as eGFR < 15 mL/min/1.73 m², or dialysis identified by ICD-X diagnosis code. eGFR was calculated using the Chronic Kidney Disease Epidemiology Collaboration (CKD-EPI) formula [36].
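To make the outcome definitions concrete, the sketch below computes eGFR and applies the two thresholds given above. It assumes that the CKD-EPI formula cited is the 2009 creatinine equation, with serum creatinine in mg/dL; the text does not state the version. The 2009 equation's coefficient for Black race is omitted here as not applicable to this cohort, and the function names are hypothetical.

```python
def egfr_ckd_epi_2009(scr_mg_dl: float, age_years: float, female: bool) -> float:
    """eGFR (mL/min/1.73 m^2) from the 2009 CKD-EPI creatinine equation.

    Assumption: this is the version cited in the text. The race
    coefficient of the 2009 equation is deliberately left out.
    """
    kappa = 0.7 if female else 0.9
    alpha = -0.329 if female else -0.411
    ratio = scr_mg_dl / kappa
    egfr = (
        141.0
        * min(ratio, 1.0) ** alpha
        * max(ratio, 1.0) ** -1.209
        * 0.993 ** age_years
    )
    if female:
        egfr *= 1.018
    return egfr


def classify_renal_outcome(egfr: float, dialysis: bool = False) -> str:
    """Apply the thresholds from the text; ESRD is a subset of CKD stage 3-5."""
    if dialysis or egfr < 15.0:
        return "ESRD"
    if egfr < 60.0:
        return "CKD stage 3-5"
    return "no CKD stage 3-5"
```

For example, `classify_renal_outcome(egfr_ckd_epi_2009(2.0, 65, female=False))` returns the CKD category for a 65-year-old man with a serum creatinine of 2.0 mg/dL. The ICD-X code confirmation and the 3-month window described above are not modelled here.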

Established prognostic factors

We focused on prognostic factors identified through our systematic review, including demographics (age, sex, education, income, and area of residence), biomarkers, comorbidities, medication usage, and clinical features; the latter included diabetes duration, body mass index (BMI; kg/m²), waist and hip circumference (cm), systolic and diastolic blood pressure (SBP/DBP; mmHg), pulse (beats/min), smoking, alcohol consumption, dietary control measures, physical activity, dyslipidaemia, hypertension, and family history of diabetes (FHD, presence of T2D in first-degree relatives). Biomarkers included lipid profile (i.e., high-density lipoprotein (HDL), low-density lipoprotein (LDL), triglycerides (TG), and total cholesterol (TC), all in mg/dL), FPG (mg/dL), haemoglobin (g/dL), and dipstick proteinuria. Comorbidities included a history of cardiovascular disease (CVD) and stroke. CVD was defined by self-report, clinical diagnosis, or receipt of treatment for coronary heart disease. Medications recorded included oral diabetic, blood pressure-lowering, and cholesterol-lowering drugs.

We included clinical data associated with diabetic complications (i.e., retinopathy, stroke, and composite CVDs) based on ICD-X diagnostic codes (Table S1), laboratory follow-up, medication treatment (Table S2), or death certification (based on ICD-X).

Hypertension was defined as SBP ≥ 140 mmHg, DBP ≥ 90 mmHg, or use of anti-hypertensive medication. Dyslipidaemia was defined as HDL ≤ 40 mg/dL, or LDL, TG, and TC levels ≥ 130, ≥ 130, and ≥ 200 mg/dL, respectively, according to ATP-III guidelines [37].
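These two definitions translate directly into boolean rules. The sketch below uses exactly the thresholds stated in the paragraph above; the function and argument names are hypothetical and all lipid values are assumed to be in mg/dL.

```python
def has_hypertension(sbp_mmhg: float, dbp_mmhg: float, on_antihypertensive: bool) -> bool:
    """SBP >= 140 mmHg, DBP >= 90 mmHg, or anti-hypertensive medication use."""
    return sbp_mmhg >= 140.0 or dbp_mmhg >= 90.0 or on_antihypertensive


def has_dyslipidaemia(hdl: float, ldl: float, tg: float, tc: float) -> bool:
    """Any abnormal lipid value, using the cut-offs stated in the text (mg/dL)."""
    return hdl <= 40.0 or ldl >= 130.0 or tg >= 130.0 or tc >= 200.0
```

Because a single abnormal value is sufficient, the dyslipidaemia rule is sensitive by construction.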

All factors were included according to their definitions in the original studies (Table S3–S4).

Statistical analysis

Descriptive statistics for predictor variables were summarised as mean (± standard deviation) or median (interquartile range) for continuous variables and as frequency (percentage) for categorical variables. Participant characteristics were compared between groups using the Chi-square or Fisher’s exact test, as appropriate, for categorical variables, and one-way ANOVA or the Kruskal–Wallis test for continuous variables. The proportion of missing values per predictor ranged from 0.1% (n = 3) to 5.8% (n = 199); therefore, a complete case analysis was applied throughout.

We evaluated prognostic models originally derived by logistic [8, 10] or Cox regression [9, 11,12,13] that were identified from our systematic review (PROSPERO: CRD42018105287). Prognostic scores were calculated according to the published regression formulae using the coefficients and the intercept or baseline hazard, see Table S4.
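As an illustration of how a published formula is turned into a predicted risk, the sketch below shows the standard forms for the two model types. It is a generic sketch: the coefficient names and values in the usage note are hypothetical placeholders, not those in Table S4, and the Cox form assumes the original study reports a baseline survival at a fixed horizon.

```python
import math
from typing import Mapping


def linear_predictor(coefs: Mapping[str, float], values: Mapping[str, float]) -> float:
    """Sum of coefficient * predictor value over the model's predictors."""
    return sum(beta * values[name] for name, beta in coefs.items())


def logistic_risk(intercept: float, lp: float) -> float:
    """Predicted probability from a logistic model: 1 / (1 + exp(-(a + lp)))."""
    return 1.0 / (1.0 + math.exp(-(intercept + lp)))


def cox_risk(baseline_survival_t: float, lp: float) -> float:
    """Predicted risk at time t from a Cox model: 1 - S0(t) ** exp(lp)."""
    return 1.0 - baseline_survival_t ** math.exp(lp)
```

For example, with hypothetical coefficients `{"age": 0.03, "sbp": 0.01}` and a participant aged 60 with SBP 150, `linear_predictor` returns about 3.3, which is then passed to `logistic_risk` or `cox_risk` together with the published intercept or baseline survival. Some published Cox models centre the linear predictor on the cohort mean, in which case the mean would need to be subtracted first.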

External validation was undertaken in accordance with guidelines for the validation and interpretation of risk prediction models [18, 19]. In brief, we evaluated model performance through comparisons between the original published equation and models that included additional adjustment (e.g., intercept, regression coefficients) for other potential predictors, see Appendix [18, 38,39,40].

Briefly, model performance was evaluated as follows [7]. Discrimination was assessed by concordance (C-statistics) and the area under the receiver operating characteristic curve (AUROC) [41], with 95% confidence intervals (CIs). Calibration, i.e., the closeness between the observed and predicted values, was assessed using the Hosmer–Lemeshow goodness-of-fit test, the observed to expected (O/E) ratio with 95% CI, and calibration plots. We also used global heuristic shrinkage factors and penalised regression to address over-optimism in updated prognostic models [39, 42, 43].
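The main performance measures can be illustrated with a few lines of code. The sketch below is not the STATA analysis used in the study. It computes the C-statistic for a binary outcome by pairwise comparison (for Cox models a time-to-event concordance would be needed instead), the O/E ratio, and the heuristic shrinkage factor in its usual form, (model χ² − df) / model χ², which is assumed here to be the form the authors used.

```python
from typing import Sequence


def c_statistic(risks: Sequence[float], outcomes: Sequence[int]) -> float:
    """Proportion of event/non-event pairs in which the event has the higher risk.

    Ties in predicted risk count as 0.5. Equivalent to the AUROC for a
    binary outcome. Quadratic in sample size, so intended for illustration.
    """
    events = [r for r, y in zip(risks, outcomes) if y == 1]
    non_events = [r for r, y in zip(risks, outcomes) if y == 0]
    if not events or not non_events:
        raise ValueError("need at least one event and one non-event")
    score = 0.0
    for e in events:
        for ne in non_events:
            if e > ne:
                score += 1.0
            elif e == ne:
                score += 0.5
    return score / (len(events) * len(non_events))


def oe_ratio(risks: Sequence[float], outcomes: Sequence[int]) -> float:
    """Observed number of events divided by the sum of predicted risks."""
    return sum(outcomes) / sum(risks)


def heuristic_shrinkage(model_chi2: float, df: int) -> float:
    """Global heuristic shrinkage factor: (chi2 - df) / chi2."""
    return (model_chi2 - df) / model_chi2
```

An O/E ratio of 1 indicates that the total number of predicted events matches the number observed, and a shrinkage factor close to 1 suggests little over-optimism.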

All statistical analyses were conducted using STATA version 16.0. A two-sided p-value less than 0.05 was considered significant.

Results

Characteristics of prognostic models

We identified a total of six prognostic studies for CKD-ESRD in T2D patients; see the PRISMA flow diagram in Figure S1. Of these, two [8, 10] and four [9, 11,12,13] applied logistic and Cox regression, respectively (see Table S4).

Five [8,9,10, 12, 13] models were developed in Asia and one in New Zealand [11]. Only two [10, 11] models had been externally validated in either a Chinese or New Zealand population. Five [8,9,10,11,12,13] studies used hospital-based data. The mean age of T2D subjects ranged from 55.4 to 62.9 years with study size ranging between 1582 and 116,509. Five [8,9,10, 12, 13] studies performed internal validation by splitting samples for discovery and validation, and three [9, 12, 13] applied multiple imputation to account for missing data (see Table 1).

T2D was characterised on the basis of an FPG ≥ 7.0 mmol/L in four [9,10,11,12] studies, or medical record review in the remaining two studies [8, 13]. Identification of CKD was mainly based on eGFR and ICD-X codes. The number of prognostic factors included in each model varied between 4 and 11, with age, sex, SBP, creatinine, and diabetes duration as common predictor variables. These models had fair to good calibration, and discrimination C-statistics ranged between 0.713 [10] and 0.920 [12].

NHES population characteristics

The T2D cohort included 3,416 participants with a median diabetes duration and follow up time of 5.7 (IQR: 2.6–10.1) and 9.9 (IQR: 6.8–12.7) years, respectively, see Table 1. Of these, 1383 and 186 participants developed CKD and ESRD with an incidence (95% CI) of 43.9% (42.2%, 45.7%) and 5.9% (5.1%, 6.8%), respectively; 704 (22.3%) and 495 (14.5%) developed CVD and retinopathy, and 420 (12.3%) died from any cause.
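The reported incidence and its interval can be checked with a normal-approximation (Wald) confidence interval for a proportion. This is a sketch under two assumptions: that the denominator is the 3146 participants remaining after exclusions, and that a Wald interval is an acceptable stand-in for whatever method the authors used in STATA. The function name is hypothetical.

```python
import math
from typing import Tuple


def wald_ci(events: int, n: int, z: float = 1.959964) -> Tuple[float, float]:
    """Normal-approximation 95% CI for a proportion (events / n)."""
    p = events / n
    half_width = z * math.sqrt(p * (1.0 - p) / n)
    return p - half_width, p + half_width


# CKD: 1383 events among an assumed denominator of 3146 participants
ckd_low, ckd_high = wald_ci(1383, 3146)
```

Under these assumptions the CKD interval agrees with the reported 42.2% to 45.7% after rounding. Small differences for rarer outcomes such as ESRD would be expected if an exact binomial interval was used instead.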

Baseline characteristics of T2D patients are described in Table 2. The mean (SD) age was 56.6 (12.4) years, and 60.2% were female. The mean age at diabetes onset was 60.0 (12.3) years, and 26.5% of patients had a first-degree relative with diabetes. Mean BMI was 26.4 (4.7) kg/m², and the prevalence of hypertension and dyslipidaemia was 52.3% and 85.2%, respectively.

Table 2 Baseline characteristics of T2D in Thailand NHES IV-V

A total of 1,222 (35.8%), 280 (8.2%), and 1,188 (34.8%) participants were undergoing treatment for diabetes, including oral diabetic medications, insulin, or diet-control, respectively. In general, all prognostic factors including demographics, socioeconomic status, clinical features, biomarkers, treatments, and complications demonstrated significant differences between CKD stages 3–5 (Table 2).

Participant characteristics comparisons

Participants in our study were slightly younger and included fewer males (39.8% vs 43.7%–56.2%) than those in the six CKD-ESRD studies (Table S5). Mean diabetes duration, BMI, serum creatinine, eGFR, and SBP/DBP for our cohort fell within the range reported across the various models, but the prevalence of dyslipidaemia and hypertension was much higher among our participants. Our cohort had lower FPG and HDL-C, but higher levels of other lipids (i.e., LDL-C, TG, and TC). Moreover, the percentages using anti-hypertensive, anti-hyperlipidaemic, and oral diabetic medications were lower than in the other reported models.

CKD incidence in our study was similar to that reported by Low and colleagues [8] (i.e., 43.9% vs 42.9%), but much higher than that reported in the remaining studies [9, 10], which ranged from 0.7% to 12.3%. The incidence of ESRD in the study by Lin et al. [12] was comparable to that in our study (5.04% vs 5.90%), but much higher than in the other two studies that reported it [11, 13] (0.4% and 2.5%), see Table 1. The coefficients for the associations between prognostic factors and CKD/ESRD in our cohort were estimated and compared to those in the original models, see Table S6. Our coefficients were mostly similar to those of the model proposed by Low and colleagues [8], but several predictors (i.e., sex, BMI, location, HDL-C, presence of hypertension, and/or dyslipidaemia) included in the models proposed by Miao et al. [9] were not significant in our data. Most predictors in Wu’s model were also significant in our data; however, the effect sizes were lower for SBP and diabetes duration, and the direction of effect was reversed for BMI. Ranking predictors by effect size within their respective CKD models identified creatinine (β = 4.653) and retinopathy (β = 1.045) as having the strongest effects for females in Miao’s model, whereas SBP (β = 0.902) and diabetes duration (β = 0.891) were most strongly associated with CKD in Wu’s model (Table S6).

For modelling ESRD, only three of the 10 predictors were significant in Elley’s [11] equations, including creatinine, diabetes duration and microalbuminuria, whereas in Wan’s [13] models for female participants, insulin use, oral diabetic drug, and SBP were significantly correlated with ESRD in our multivariate analyses (Table S6).

External validation

External validations were performed for models M1 to M6 where applicable (Table S7). Results of CKD-ESRD models are summarised in Table 3. At baseline (M0), all prognostic models showed fair calibration, but discrimination varied from poor to moderate, i.e., 0.585 to 0.707 and 0.671 to 0.760 for CKD and ESRD, respectively (Fig. 2). Sex-specific CKD and ESRD models performed better for females. For CKD, Miao’s model for females generated a C-statistic of 0.786 (0.765–0.806) compared to 0.720 (0.691–0.749) for males, see Table 3.

Table 3 Details of external validation performance for CKD-ESRD models
Fig. 2

Receiver operating characteristic (ROC) curve comparisons between (a) baseline and (b) updated prognostic equations of CKD; and (c) baseline and (d) updated prognostic equations of ESRD

All CKD-ESRD models provided improved C-statistics following additional adjustment of the regression coefficients (M3) and in the updated models (M4–M6), see Figure S2. We updated CKD models by adding biomarkers (i.e., FPG groups < 126 vs ≥ 126 mg/dL) and/or interaction effects with oral diabetic drug use; the greatest improvement was observed in the model by Wu and colleagues, with a C-statistic of 0.790 (0.774–0.806), see Table 3.

In the baseline validation, most CKD models were well-calibrated in our population, with O/E ranging from 0.999 (0.975–1.024) to 1.009 (0.929–1.090). Model calibration remained similar after updating, although Miao’s model for males and females showed a slight overestimation of 1.052 (0.868–1.235) and 1.036 (0.917–1.156), respectively.

Four ESRD risk scores showed moderate to good calibration for baseline validation, recalibration, and updated models, see Figure S3 and Table 3. Refitting the ESRD equations in our validation set (M5) showed worsening shrinkage, with penalties of 12.31% and 15.55% for Lin’s and Wan’s male models, respectively.

The Brier score is another measure of prediction accuracy, ranging between 0 and 1, where lower scores indicate better accuracy. The Brier scores for the baseline and updated models are presented in Table 3. Among the updated CKD models, the Brier score was lowest for Miao’s model for females (0.162), followed by Low’s model (0.168), Miao’s model for males (0.178), and Wu’s model (0.185). Of the four ESRD models, the Brier score for the updated models (M4) was superior and ranged from 0.043 to 0.061.
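For a binary outcome, the Brier score is the mean squared difference between the predicted risk and the observed 0/1 outcome. The sketch below is a generic illustration with a hypothetical function name, not the study's STATA code.

```python
from typing import Sequence


def brier_score(risks: Sequence[float], outcomes: Sequence[int]) -> float:
    """Mean of (predicted risk - observed outcome) ** 2 over all participants."""
    if len(risks) != len(outcomes) or not risks:
        raise ValueError("risks and outcomes must be non-empty and of equal length")
    return sum((p - y) ** 2 for p, y in zip(risks, outcomes)) / len(risks)
```

A model that assigns a risk of 0.5 to everyone scores 0.25 regardless of the outcomes, which is why 0.25 is often used as a benchmark for a non-informative model. The Brier score also depends on event incidence, so scores for a rare outcome such as ESRD tend to be lower than those for a common outcome such as CKD.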

Table S8 provides a summary of the model improvements implemented following baseline validation. Additional predictor variables (i.e., glucose level and/or interaction with oral diabetic medication) significantly improved the discrimination of the CKD models. The greatest improvement was observed in Wu’s model, with a ∆C-statistic of 0.214 (0.193–0.234). Most ESRD models showed minor but significant improvements in discrimination in the updated models.

Discussion

We externally evaluated, validated, compared and updated six previously published models for predicting CKD/ESRD in a nationwide cohort of Thai participants with T2D, in line with recent framework guidelines [18, 19, 31, 38]. At baseline, most models provided only modest discrimination of T2D patients who developed CKD/ESRD. Two [10, 12] models demonstrated similar performance to their parent models. All models showed good calibration and upon modification, the agreement between observed and expected risk was fair, with only a few models showing slight overestimation.

In this study, the associations observed between prognostic factors and CKD/ESRD risk in Thai participants with T2D differed from previous studies. For instance, either hypertension or dyslipidaemia, LDL-C, and BMI were negatively associated with CKD risk in some models [8,9,10], with only a few predictors (i.e., diabetes duration, creatinine, and oral diabetic medications) significantly correlated with ESRD risk. We suspect that the lack of associations, or the variation in direction of effect relative to previous reports, may have resulted from heterogeneity in the predictors and outcomes between our data and the development sets. In addition, we were unable to include two important biomarker predictor variables for four [8, 11,12,13] models (i.e., UACR and HbA1c) as they were unavailable in our data.

We postulate that the magnitude of the C-statistics and miscalibration observed may be explained by case-mix effects represented by the number of events, predictor effects, and heterogeneity in the population characteristics [19, 44, 45]. Variation of the included predictor variables, and sample size characteristics between derivation and validation settings, are likely responsible for the modest model performance in our population [19, 46].

In general, discrimination and calibration improved in our updated models. Although most models demonstrated lower discrimination in our data compared to their original settings, our updated models showed consistent improvement for all evaluation metrics (i.e., Brier score, shrinkage factor, penalised regression, and C-statistics). Most CKD-ESRD models also showed better reclassification (i.e., ∆C-statistic) for the enhanced models. Despite a lack of existing standards, Pencina et al. proposed that a ∆C-statistic greater than 0.01 represents a relevant improvement in model prediction [47, 48]. For our data, all models showed significant improvement on modification, with ∆C-statistics ranging from 0.041 to 0.214 for CKD and from 0.025 to 0.089 for ESRD equations.

The Brier score has been proposed as a measure of discrimination and calibration for model validation [49]. In this study, ESRD models performed better than those for CKD as determined by Brier scores. Almost every validation and updated model showed improved predictions (as judged by a benchmark value of less than 0.25) [40].

In our updated models, four proved more effective either for the prediction of CKD [8, 9] or ESRD [11, 13] in our population, without the need for recalibration or updated equations. These models consistently exceeded all others in terms of calibration and discrimination, and were more comparable to the derived models. Only Elley’s model [11] provided a web calculator (http://www.nzssd.org.nz/cvd_renal/) to facilitate use in routine clinical practice.

The strengths of our study include the long-term follow-up of diabetic progression in 26,170 individuals over 20 years, the definition of CKD from multiple data sources, and the evaluation of previously published prognostic models identified from a current SR/MA. This study was based on real-world data from a clinical setting, in which a broad range of routinely captured potential predictor variables was evaluated for prognostic performance for renal outcomes in those with incident diabetes. To our knowledge, this is the first independent validation of CKD-ESRD prognostic models in an Asian population using real-world data, beyond the populations from which the models originated. Therefore, our findings should be useful in predicting CKD-ESRD occurrence in other Asian regions where settings are similar to those in Thailand.

Our study highlighted that eGFR assessment using creatinine was beneficial to kidney disease surveillance in a Thai population. By avoiding specific race/ethnicity coefficients, our updated models still offered accurate prognostic estimates which could be enhanced further through improved clinical and laboratory standards [30, 50].

Our study has several limitations. Markers of kidney damage, such as albuminuria and cystatin-C, were not available in our data, and missing data for some predictor variables precluded prognostic risk estimates for some models.

Conclusions

In conclusion, we have provided an independent external validation of prognostic models for the prediction of incident CKD/ESRD in participants with T2D from Thailand. All evaluated prognostic models showed only moderate discriminative performance, but fair calibration at baseline validation. Updated prognostic scores improved predictive performance in most of the evaluation metrics (i.e., discrimination, calibration, and Brier score). An updated prognostic model for clinical use in Asian populations is provided.

Although no model was excellent, prognostic equations not delimited by sex (i.e., Low’s [8] and Elley’s [11]) performed better in our data and may offer clinical utility as a CKD screening tool in primary care for patients with diabetes.