Introduction

Although outcomes of cardiovascular surgery have improved over time, the incidence of deep sternal wound infection (DSWI) remains an important issue. Recently published data showed an incidence of DSWI ranging between 1.3 and 1.6% in patients undergoing coronary artery bypass grafting (CABG) alone1,2. Morbidities associated with DSWI include prolonged hospital stays, increased use of antibiotics and consequently increased costs. Patients with DSWI had a threefold increase in hospitalization costs compared with patients without DSWI. In addition, mortality from this complication remains a concern.

Prediction models for DSWI exist but may not be generalizable across different geographic settings. The incidence of DSWI in developed countries may be lower, given their resources, application of best practices to avoid these complications and population characteristics. Moreover, models that include more than one cardiac surgery type may have a heterogeneous case-mix, thereby limiting their discriminative ability3,4,5,6,7. Healthcare systems, patients’ characteristics, and adherence to quality protocols vary widely between institutions.

The Magedanz score, a specific prediction tool for mediastinitis, was based on data from a single center for which some risk factors were missing, data quality assessment was incomplete, and no external validation in independent samples was conducted, thus impacting generalizability4,5. Methodologically, two critical challenges require attention: the observational nature of the data, given that patients are not randomized to institutions, and missing information for key patient-level variables6.

Valid statistical approaches to missing confounder information include multiple imputation of the missing information, inverse-probability weighting, or comprehensive sensitivity analyses. The inclusion of more confounders and multiple imputation of missing information should enhance the predictive performance of a model3,7,8. We hypothesized that specific clinical characteristics, including pre- and intraoperative factors, would be associated with better accuracy in predicting DSWI in a multicenter registry. This study aimed to develop and validate a prediction model, the REPINF, using data from the Cardiovascular Surgery Registry of the state of São Paulo, Brazil (REPLICCAR II), and to compare it with the validated Society of Thoracic Surgeons (STS) model.

Methods

The REPLICCAR II study is an observational, multicenter, prospective cohort study (9 hospitals in the state of São Paulo) conducted between August 2017 and June 2019. The Ethical Committee Board of the Heart Institute of the Hospital das Clínicas, Faculty of Medicine, University of São Paulo, Brazil approved this study as a sub-analysis of the REPLICCAR project (CAPPesq: 2.507.078). Informed consent was waived because the analysis used pre-established data logs.

We declare that all methods were performed in accordance with relevant guidelines and regulations. All consecutive patients over 18 years of age undergoing isolated CABG surgery (first cardiac surgery) constituted the sample. The indications for CABG surgery followed current guidelines9. All patients received antibiotic prophylaxis at least one hour before skin incision according to institutional policies.

The variables included in REPLICCAR II were defined using the STS ACSD (Adult Cardiac Surgery Database) collection tool (version 2.9, 2017). Approximately 760 variables were collected preoperatively, intraoperatively, and postoperatively, and included risk factors, clinical and laboratory characteristics, and complications of surgery. The data were collected using a secure web application for building and managing online surveys and databases, the REDCap platform (Research Electronic Data Capture, https://www.project-redcap.org/).

The participating hospitals, their researchers, and data managers participated in meetings and data training before and during the data collection period. Data were audited twice by the REPLICCAR II team to evaluate the accuracy and validity of the information collected by the trained data managers10.

A trained surgical clinical nurse reviewed the infection criteria and definitions following the infection control surveillance system (standard definitions of the CDC National Healthcare Safety Network, NHSN)11. All infections involving the subcutaneous tissue to the mediastinum within 30 days following CABG surgery were considered DSWI. This involves fascia and muscle layers as well as organs, spaces and/or deep soft tissues. “Mediastinitis” refers to an infection of the mediastinum, which can be caused by different etiologies, including DSWI following sternotomy11. A computed tomography imaging study was performed in all patients with suspected DSWI and/or mediastinitis for diagnostic confirmation. The infection control services of the hospitals participating in REPLICCAR II routinely perform active surveillance to report surgical wound infections that progress to deep planes and mediastinitis. Data were verified by a specialist nurse from the coordinating center as part of her doctoral project. Thus, only cases diagnosed with the definitions and criteria of the CDC–NHSN were included in our analysis. The reference definition was reviewed and replaced by the one used by the infection control services during the study. Patients in whom fascia or muscle is affected by infection during hospitalization often receive surgical wound debridement, antibiotics and negative-pressure wound therapy (vacuum-assisted closure) to prevent mediastinitis.

Approach

Confounders and predictors

First, we eliminated all variables missing in more than 30% of the patients because their imputed values would be driven largely by the imputation model. We next identified variables related to the incidence of DSWI in the scientific literature and found 160 such variables in our database. Of these, 55 variables with statistical association or clinical significance for DSWI were considered as predictors (supplementary Table 1). Thus, given the multifactorial nature of infections, variables were chosen for the initial analysis first according to their support in the scientific literature and subsequently according to their statistical significance. Our goal was to build a model that was as rational as possible and at the same time scientifically robust, including both pre- and intraoperative variables.
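The first filtering step can be illustrated with a minimal sketch. It assumes a column-oriented table in which `None` marks a missing value; the function name, the toy variables and their values are hypothetical and are not taken from the REPLICCAR II database.

```python
from typing import Dict, List, Optional


def drop_sparse_variables(
    columns: Dict[str, List[Optional[float]]],
    max_missing: float = 0.30,
) -> Dict[str, List[Optional[float]]]:
    """Keep only variables whose fraction of missing values is <= max_missing."""
    kept = {}
    for name, values in columns.items():
        if not values:
            continue  # an empty column carries no information
        missing_fraction = sum(v is None for v in values) / len(values)
        if missing_fraction <= max_missing:
            kept[name] = values
    return kept


# Hypothetical toy data: None marks a missing value.
toy_columns = {
    "bmi": [27.1, None, 30.2, 25.4, 28.0],   # 20% missing, kept
    "hba1c": [None, None, 6.1, None, 7.2],   # 60% missing, dropped
    "diabetes": [1, 0, 1, 1, 0],             # complete, kept
}
filtered_columns = drop_sparse_variables(toy_columns)
```

Variables that pass this filter would then be candidates for imputation and for the literature-based screening described above.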

Treating missing data with multivariate imputation

We used multivariate imputation by chained equations (MICE) to impute missing data and created 10 imputed datasets.

Sample distribution was captured with histograms and descriptive statistics.

Statistical analysis

The training sample was created using the REPLICCAR II database, which included 4085 patients. Information from an additional 498 patients from a different set of hospitals (treated from 2015 to 2016) was assembled to create an external validation set. Variable selection and regularization during model development were performed with least absolute shrinkage and selection operator (LASSO) logistic regression with tenfold cross-validation, to enhance the prediction accuracy and interpretability of the resulting statistical model.

We calculated the area under the receiver operating characteristic curve (c-index) to evaluate the discriminatory performance of the model, and calibration-in-the-large (CL), the ratio of observed to predicted values, to evaluate calibration. The discriminative ability was also evaluated by net reclassification improvement (NRI) and integrated discrimination improvement (IDI)12. The results were plotted to compare the new model (REPINF) with the STS in both the training and validation databases.
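The two headline metrics can be sketched in a few lines. This is an illustrative implementation under stated assumptions, not the code used in the study: the c-index is computed as the proportion of concordant event/non-event pairs (ties counted as one half), calibration-in-the-large as the ratio of observed events to the sum of predicted risks, and all function names and toy values are hypothetical.

```python
from typing import Sequence


def c_index(outcomes: Sequence[int], risks: Sequence[float]) -> float:
    """Probability that a randomly chosen event has a higher predicted risk
    than a randomly chosen non-event (ties count as one half)."""
    events = [p for y, p in zip(outcomes, risks) if y == 1]
    non_events = [p for y, p in zip(outcomes, risks) if y == 0]
    if not events or not non_events:
        raise ValueError("need at least one event and one non-event")
    concordant = 0.0
    for p_event in events:
        for p_non_event in non_events:
            if p_event > p_non_event:
                concordant += 1.0
            elif p_event == p_non_event:
                concordant += 0.5
    return concordant / (len(events) * len(non_events))


def observed_expected_ratio(outcomes: Sequence[int], risks: Sequence[float]) -> float:
    """Calibration-in-the-large: observed events divided by expected events."""
    return sum(outcomes) / sum(risks)


# Hypothetical toy data: 1 = DSWI, 0 = no DSWI, with predicted risks.
toy_outcomes = [0, 0, 1, 0, 1, 0]
toy_risks = [0.01, 0.02, 0.10, 0.03, 0.02, 0.01]

toy_c = c_index(toy_outcomes, toy_risks)
toy_oe = observed_expected_ratio(toy_outcomes, toy_risks)
```

A ratio above 1 indicates that the model predicts fewer events than were observed (underestimation); a ratio below 1 indicates overestimation.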

We followed the guidelines recommended in the Transparent Reporting of a multivariable prediction model for Individual Prognosis or Diagnosis (TRIPOD) statement checklist13.

Ethics approval

This study was submitted to and approved by the Ethics Commission for Analysis of Research Projects (CAPPesq) under number 2016/15163-0. Free and informed consent was waived because the analysis dealt with pre-established data logs.

Results

Nine hospitals started data collection, but 7 hospitals actively participated during the 2 years of the project. After exclusion of the patients from 2 hospitals (n = 53), our final sample size was 4085 patients undergoing isolated CABG surgery as the first cardiac surgery. The mean age was 63.3 years (95%CI 62.9–63.5) and 74% were male. The mean body mass index (BMI) was 27 kg/m2 (95%CI 26.9–27.2) and common comorbidities included diabetes (49%), hypertension (88%), dyslipidemia (62%) and previous myocardial infarction (52%). The baseline characteristics and missing percentages are described in supplementary Table 2. After excluding variables with more than 30% missing, 5% of all patients had at least one missing variable in the REPLICCAR II database (n = 4085 and 160 variables).

The incidence of DSWI within 30 days of surgery was 2.47% (n = 101). We observed 104 deaths, a competing risk for DSWI, within 30 days (3.1%); of these, 3 patients died with DSWI in the period (2.9%). Characteristics of infected and non-infected patients are described in Table 1.

Table 1 Baseline characteristics of patients undergoing isolated CABG surgery with and without DSWI (n = 4085). REPLICCAR II, São Paulo, Brazil, 2017–2019.

Model development

Out of a total of 55 variables related to pre- and intraoperative factors, 7 were retained by the LASSO model after tenfold cross-validation (Table 2). In the training sample (n = 4085), the REPINF had a c-index of 0.81 (95%CI 0.77–0.86) compared to the STS c-index of 0.70 (95%CI 0.64–0.75) (Fig. 1). The mean predicted risk of DSWI was 0.12% (SD = 0.08) using the STS. The calibration-in-the-large plot (Fig. 2) demonstrated that STS predictions tended to underestimate the DSWI risk in our sample.
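As a rough worked illustration of calibration-in-the-large as defined in the Methods (ratio of observed to predicted), the training-sample figures reported above can be combined as follows. This assumes the observed 30-day incidence (2.47%) and the mean STS-predicted risk (0.12%) are directly comparable; the symbols O and E are shorthand introduced here for illustration only.

```latex
\left(\frac{O}{E}\right)_{\text{STS, training}} = \frac{2.47\%}{0.12\%} \approx 20.6
```

A ratio far above 1 is consistent with the underestimation by the STS visible in Fig. 2.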

Table 2 LASSO logistic regression tenfold cross-validation coefficients. REPINF, REPLICCAR II, São Paulo, Brazil, 2017–2019.

Figure 1 Receiver operating characteristic curve (ROC; c-index) in the external validation sample of the REPINF and STS in patients undergoing isolated CABG. São Paulo, Brazil, 2017–2019.

Figure 2 Calibration-in-the-large plot on training and external validation. REPINF, São Paulo, Brazil, 2017–2019.

External model validation

The validation database included 498 patients undergoing isolated CABG during 2015–2016. The mean age was 61.7 years (SD = 9.5), 78.5% were male, 54.6% had diabetes and 88% had hypertension. Relative to the STS, REPINF demonstrated improved classification, with an NRI of 29% (Table 3) and an IDI of 0.065 (6.5%). The incidence of DSWI in this sample was 1.61%. The STS model predicted a mean DSWI risk of 0.24% (SD = 0.15) and REPINF 2.65% (SD = 1.58). The c-index was 0.83 (95%CI 0.72–0.95) and 0.72 (95%CI 0.56–0.88) for REPINF and STS in the external validation sample, respectively.
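For readers unfamiliar with these two measures, the following minimal sketch shows how a category-based NRI and the IDI are commonly computed when a new model is compared with a reference model. It is an illustration only: the cut-offs, function names and toy data are hypothetical, whereas the study itself used quintile-based categories (Table 3).

```python
from bisect import bisect_right
from typing import Sequence


def risk_category(risk: float, cutoffs: Sequence[float]) -> int:
    """Index of the risk category defined by ascending cut-offs."""
    return bisect_right(cutoffs, risk)


def net_reclassification_improvement(outcomes, old_risks, new_risks, cutoffs) -> float:
    """Categorical NRI: net upward moves among events plus net downward
    moves among non-events, each as a proportion of its group."""
    up_events = down_events = up_non = down_non = 0
    n_events = n_non = 0
    for y, old, new in zip(outcomes, old_risks, new_risks):
        move = risk_category(new, cutoffs) - risk_category(old, cutoffs)
        if y == 1:
            n_events += 1
            if move > 0:
                up_events += 1
            elif move < 0:
                down_events += 1
        else:
            n_non += 1
            if move > 0:
                up_non += 1
            elif move < 0:
                down_non += 1
    return (up_events - down_events) / n_events + (down_non - up_non) / n_non


def integrated_discrimination_improvement(outcomes, old_risks, new_risks) -> float:
    """IDI: gain in mean predicted risk among events minus the gain
    in mean predicted risk among non-events."""
    def mean(values):
        return sum(values) / len(values)

    gain_events = mean([n - o for y, o, n in zip(outcomes, old_risks, new_risks) if y == 1])
    gain_non = mean([n - o for y, o, n in zip(outcomes, old_risks, new_risks) if y == 0])
    return gain_events - gain_non


# Hypothetical toy data: reference-model risks versus new-model risks.
toy_y = [1, 1, 0, 0, 0, 0]
toy_old = [0.02, 0.04, 0.02, 0.04, 0.01, 0.06]
toy_new = [0.08, 0.04, 0.01, 0.02, 0.01, 0.08]
toy_cutoffs = [0.03, 0.05]  # hypothetical category boundaries

toy_nri = net_reclassification_improvement(toy_y, toy_old, toy_new, toy_cutoffs)
toy_idi = integrated_discrimination_improvement(toy_y, toy_old, toy_new)
```

Positive values of either measure favour the new model over the reference model.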

Table 3 Reclassification data table with quintiles for net reclassification improvement (NRI). REPINF, REPLICCAR II, São Paulo, Brazil, 2017–2019.

EuroSCORE II and STS are the risk models most widely used worldwide to predict mortality after cardiac surgery. However, only the STS has a validated model to predict the risk of DSWI. Because our outcome was DSWI rather than mortality, we compared our model only with the STS, to avoid the comparison bias that using EuroSCORE II would introduce. Table 4 shows some basic characteristics of the STS, REPINF and EuroSCORE II models.

Table 4 Comparison of baseline characteristics of STS, REPINF and EuroSCORE II models, São Paulo, Brazil, 2017–2019.

Discussion

The development of prognostic models that combine patient characteristics, risk profiles, and surgical practice to produce predictions about future outcomes allows informed clinical decision-making8. Risk models can be used for quality measurement, clinical practice improvement, voluntary public reporting, and research.

The STS systematically underestimates the risk of infection in CABG patients, possibly due to the low rate of this complication for this type of procedure (less than 0.5%), yielding a c-index of 0.68 for CABG surgery patients3. The REPINF demonstrated better discrimination (IDI = 6.5%) and net reclassification improvement (NRI = 29%) compared to the STS model.

Differences in health management and performance across countries may also affect model discrimination. The MagedanzSCORE4 (2010) was created using information from a single Brazilian institution among adults (n = 2809) undergoing isolated CABG and valve surgery. The score was developed and validated in the same population and thus provides overly optimistic model performance metrics5.

In this study, the incidence of DSWI was 2.47% (30-day follow-up) and the prognostic model was restricted to patients undergoing primary isolated CABG. Of the 55 candidate variables in REPINF, 7 emerged from the LASSO logistic regression: female gender14,15,16, BMI14,17,18,19, diabetes1,17,18,19, hemoglobin18, emergency surgery status18,20,21, surgery duration18,21 and use of bilateral internal thoracic artery (BITA) grafts22,23,24. All variables retained by the LASSO regression in the new model have already been described as risk factors for DSWI. In particular, the REPINF model included intraoperative variables already described for the DSWI endpoint, such as surgery duration18,21 and use of BITA22,23,24.

A recent prospective multicentric study with 16 centers of cardiac surgery in 6 European countries (England, Finland, France, Germany, Italy and Sweden) reported an incidence of DSWI of 2.5% and the following independent predictors: female gender, BMI ≥ 30 kg/m2, estimated glomerular filtration rate < 45 mL/min/1.73 m2, diabetes, chronic lung disease, preoperative atrial fibrillation, critical preoperative state and BITA grafting. The model achieved better discrimination than the usual scores (Alfred Hospital Risk Index, Friedman Score, and Brompton-Harefield Infection Score). Compared to these values, improvements in discrimination (IDI) ranged from 1.2 to 2.1%, but were not compared with the validated cardiac surgery model STS24.

Developing a single “calibrated model” to make predictions across patients undergoing many different surgery types is challenging25,26. Our model achieved better accuracy than the STS, with c-indices of 0.83 (REPINF) and 0.72 (STS) in the validation cohort. It is important to note that the external validation database comes from a center that has actively participated in STS reports since 2014 and is likely not representative of all Brazilian hospitals. The STS systematically underestimated DSWI in both the internal and external validation datasets (calibration: Fig. 2). Our score overestimated risk in the external validation cohort for those at the lower end of the risk scale. This overestimation may be related to the fact that most patients used to develop the REPINF came from the public health system, whereas the validation was performed in a specific population of private network patients.

Calibration is an important aspect of models constructed for predictive purposes. Data collection must be guided by rigorous quality registries and criteria to achieve and maintain the best accuracy in predictions, considering that this information is often contaminated by noise26. To improve calibration, risk scores should be adjusted for the case-mix of hospitals, with recalibration or remodeling being recommended27. Adding more variables and optimizing estimates of improvement may increase model performance but at the same time cause overfitting. The REPINF model was created with these issues in mind; because all candidate variables entered into the LASSO regression were associated with DSWI, variable selection was a difficult task.

LASSO was originally formulated for linear regression and is applied in statistics and machine learning for variable selection and regularization. Before LASSO, the stepwise approach was the most widespread method for choosing covariates. LASSO improves prediction error by constraining the sum of the absolute values of the regression coefficients to be less than a fixed value, which shrinks some coefficients to exactly zero and reduces overfitting26,28.
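For reference, the standard L1-penalized (LASSO) logistic regression objective can be written as follows. The notation is an assumption made for illustration: y_i is the DSWI indicator for patient i, x_i the vector of p candidate predictors, n the number of patients, and λ ≥ 0 the penalty weight, which is typically chosen by cross-validation.

```latex
(\hat{\beta}_0, \hat{\beta}) = \arg\min_{\beta_0,\,\beta}
\left\{
  -\frac{1}{n}\sum_{i=1}^{n}
  \left[ y_i\,(\beta_0 + x_i^{\top}\beta)
         - \log\!\left(1 + e^{\beta_0 + x_i^{\top}\beta}\right) \right]
  + \lambda \sum_{j=1}^{p} \lvert \beta_j \rvert
\right\}
```

The penalized form is equivalent to constraining \sum_{j} \lvert \beta_j \rvert \le t for some bound t. Larger values of λ set more coefficients exactly to zero, which is how a small subset of the candidate variables is selected.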

Accurate information is essential to assess a patient’s prognosis. A prognostic model simultaneously considers a number of factors and provides an estimate of the patient’s absolute risk of an event; for DSWI, this is a great challenge. Clinicians and surgeons need an accurate risk prediction for decision support, quality of care assessment, and patient education. Continuous evaluation of the model performance is important to ascertain that the classification performance does not degrade with time. Some models are redeveloped periodically to adjust for temporal trends. Recently, STS updated the model to predict mortality in children following cardiac surgery using the proposed machine learning method29,30,31,32,33.

We suggest that the REPINF score should be estimated when the patient arrives at the intensive care unit, a fundamental moment in the management of patients after CABG. In this way, the professional team would be able to establish a clear plan of care based on the patient's risk, thereby minimizing potential complications and reducing costs and hospital length of stay. Specific protocols may also be developed by the infection control team34. More investigation should be performed to determine cut-offs for risk classification and the timing for applying such preventive strategies.

This paper describes the development and validation of the deep sternal wound infection model for CABG patients. According to the medical literature, factors related to intraoperative timing are also associated with mediastinitis, so we included these variables in our registry for analysis. Limitations related to data completeness and accuracy were carefully addressed during quality audits for all institutions. Still, some important clinical aspects were not evaluated and could increase the sensitivity and precision of the model, for example, the use of pedicled or skeletonized conduit harvesting, glycated hemoglobin, albumin, bilirubin, and variables related to DSWI treatment (fluid or tissue culture, antibiotics, wound intervention, bandages, and others). In our institutions, the infection control service follows these data for epidemiological surveillance and to guide preventive protocols according to the CDC surgical site prevention manuals34. Future studies should consider all possible detailed information and recommend standard prevention interventions to avoid bias and increase accuracy.

Another important issue is related to the DSWI detection method (30-day follow-up), which may vary across institutions24,25. In our study, this limitation was controlled by having trained researchers contact each patient 30 days after surgery, with only 5.97% incomplete follow-up.

In summary, this study used a structured, standardized approach to model development and validation to identify factors that can help multidisciplinary teams prevent DSWI after CABG. More studies should be performed to validate these findings, but we suggest that REPINF, as well as the STS prediction models35, provides the highest generalizability for future data. Our findings suggest that different populations may require independent scoring systems to achieve the best predictive performance.