Introduction

Coronary computed tomography angiography (CTA) is increasingly used to diagnose coronary artery disease (CAD). Indeed, clinical guideline 95 of the National Institute for Health and Care Excellence on chest pain of recent onset recommends CTA as the first diagnostic test in all patients with possible angina [1]. Functional stress testing, including exercise electrocardiography (exercise-ECG) or single-photon emission computed tomography (SPECT), is recommended when it is uncertain whether chest pain is caused by myocardial ischaemia in patients with known CAD. In contrast, the recent ESC guideline on chronic coronary syndrome (CCS) recommends coronary CTA as the first-line diagnostic imaging test in patients with a low pretest probability for CCS, whereas functional cardiac imaging is recommended in patients with a high pretest probability for CCS [2]. The ISCHEMIA trial showed that an invasive interventional strategy was not superior to a conservative strategy in patients with stable chest pain and test-based ischaemia [3].

Results of the SCOT-HEART trial showed a significant reduction of fatal and non-fatal myocardial infarction by CTA compared with diagnostic standard of care in patients with recent onset stable chest pain [4]. However, there is a lack of large diagnostic comparison studies of CTA for coronary stenosis evaluation with functional stress testing for ischaemia evaluation for the detection of obstructive CAD. Previous investigations have suggested that coronary CT may have higher sensitivity and specificity than functional stress testing for the detection of anatomically defined CAD with invasive coronary angiography (ICA) as the reference standard [5,6,7]. Within the Collaborative Meta-Analysis of Cardiac CT (COME-CCT) [8] of patients with symptomatic stable chest pain, we compared the effectiveness of functional stress testing using exercise-ECG or SPECT with CTA for diagnosis of CAD using ICA as the reference standard. Further, the association of non-invasive diagnostic tests and pretest probability was assessed for evaluation of the ability to exclude obstructive CAD.

Methods

Patients

A total of 7813 patients with stable chest pain and suspected CAD who had a clinical indication for ICA and were also prospectively enrolled to undergo cardiac CT were included in the COME-CCT Consortium. The study protocol of the COME-CCT collaborators was previously published, including detailed information on search strategy and inclusion and exclusion criteria for this individual patient data (IPD) meta-analysis [8]. Patients with stents or bypasses, unstable angina, and non-diagnostic were excluded, as were patients with incomplete information for pretest probability calculation. Data were available on the per-patient level. The study was prospectively registered in the PROSPERO Database for Systematic Reviews (CRD42012002780). Obstructive CAD was defined as a coronary diameter stenosis of ≥ 50% by ICA, with 81% of patients receiving quantitative coronary analysis (QCA). All participants gave written informed consent to participate in the local studies, which were approved by the local ethics committees of the participating centres. For quality assessment and comparability, an additional questionnaire regarding exercise-ECG and SPECT was sent to all participating sites. For this subanalysis, patients with data on functional testing from the studies eligible for inclusion were included. To avoid inclusion bias, studies were excluded from further analysis if fewer than 5% of the patients in the site cohort had results on either exercise-ECG or SPECT.

Statistical analysis

Raw datasets were merged in an Excel spreadsheet and exported as comma-separated values for statistical analysis using “R” [9]. Continuous data are reported as mean (standard deviation (SD)) and categorical variables as percentages (absolute numbers). Diagnostic accuracy of all tests using obstructive CAD defined by ICA as the reference standard was modelled using generalised linear mixed models (GLMM), i.e. multivariable logistic regression model with a study-specific random intercept to take heterogeneity between studies into account [10] by extending the method suggested by Coughlin et al with random effects, which provides a one-step approach for a diagnostic IPD meta-analysis [11]. The current model is a univariate logistic regression model extended by incorporating a random effect for the study and a random slope for ICA results, respectively, which is equivalent to a bivariate generalised linear mixed model [12]. Based on this model using the test result as the dependent variable, mean logit sensitivity and specificity, the estimates of the between-study variability in logit sensitivity and specificity, and the covariance between them were estimated. These estimates quantify heterogeneity between studies and patients within studies and investigate the effect of covariates such as type of diagnostic procedure. Covariates were: the reference standard ICA and the type of non-invasive diagnostic method and their interactions. Post-test probabilities (positive (PPVs) and negative predictive values (NPVs)) of the respective diagnostic procedures for the presence of CAD as a function of the pretest probability of CAD were analysed by a generalised linear mixed model as described above. In a similar way, models were applied when studies with a high risk of bias were analysed in a sensitivity analysis. 
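
The relationship between the model's fixed effects and the reported accuracy measures can be sketched as follows. This is a minimal illustration, assuming the linear predictor for a positive test result is b0 + b1 × ICA, so that the false-positive rate is the inverse logit of b0 and sensitivity is the inverse logit of b0 + b1. The function names and the coefficients (back-calculated here from a sensitivity of 94.6% and a specificity of 76%) are assumptions for illustration; the study-specific random effects are omitted, and this is not the code used in the analysis.

```python
import math
from typing import Tuple


def expit(x: float) -> float:
    """Inverse logit: map a log-odds value to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def logit(p: float) -> float:
    """Log-odds of a probability p, with 0 < p < 1."""
    return math.log(p / (1.0 - p))


def accuracy_from_fixed_effects(b0: float, b1: float) -> Tuple[float, float]:
    """Return (sensitivity, specificity) for the linear predictor b0 + b1 * ICA.

    b0: log-odds of a positive test when ICA is negative (false-positive rate).
    b1: change in log-odds of a positive test when ICA is positive.
    """
    sensitivity = expit(b0 + b1)
    specificity = 1.0 - expit(b0)
    return sensitivity, specificity


# Illustrative coefficients (assumption): back-calculated, not model output.
b0 = logit(1.0 - 0.76)
b1 = logit(0.946) - b0
sens, spec = accuracy_from_fixed_effects(b0, b1)
print(f"sensitivity={sens:.3f}, specificity={spec:.3f}")
```

In the full model, a random intercept and a random slope per study would be added to b0 and b1, which is what allows between-study variability in logit sensitivity and specificity to be estimated.
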
In another analysis, we compared the diagnostic accuracy of CT in the 2920 patients from studies with functional tests performed with 2412 patients who were included in studies in which no functional tests were performed (Fig. 1) applying the covariate test performed (yes/no). Using an intention-to-diagnose approach, we implemented the worst-case scenario in which non-diagnostic CTA results were considered false positive if ICA was negative and false negative if ICA was positive [13]. Clinical pretest probability was calculated using a validated prediction tool, which was an updated version of the Diamond and Forrester model [14, 15]. Clinical pretest probability was estimated based on patient age, gender, and clinical presentation. We also performed a statistical prediction for a new cohort following the ideas presented by Skrondal and Rabe-Hesketh [16].
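
The worst-case handling of non-diagnostic CTA results described above can be illustrated with a short sketch. The record layout, the labels "pos", "neg", and "nondiag", and the toy data are assumptions made for this example only and do not come from the COME-CCT dataset.

```python
from collections import Counter
from typing import Iterable, Tuple


def worst_case_accuracy(records: Iterable[Tuple[str, bool]]) -> Tuple[float, float]:
    """Return (sensitivity, specificity) under an intention-to-diagnose rule.

    Each record is (test_result, ica_positive), where test_result is one of
    "pos", "neg", or "nondiag" (hypothetical labels). A non-diagnostic result
    is counted against the test: false positive if ICA is negative and false
    negative if ICA is positive.
    """
    counts = Counter()
    for test_result, ica_positive in records:
        if test_result == "nondiag":
            # Worst case: assign the result that disagrees with ICA.
            test_positive = not ica_positive
        else:
            test_positive = test_result == "pos"
        if ica_positive:
            counts["TP" if test_positive else "FN"] += 1
        else:
            counts["FP" if test_positive else "TN"] += 1
    sensitivity = counts["TP"] / (counts["TP"] + counts["FN"])
    specificity = counts["TN"] / (counts["TN"] + counts["FP"])
    return sensitivity, specificity


# Toy data (assumption): 8 ICA-positive and 12 ICA-negative patients.
toy = (
    [("pos", True)] * 6 + [("neg", True)] * 1 + [("nondiag", True)] * 1
    + [("neg", False)] * 9 + [("pos", False)] * 2 + [("nondiag", False)] * 1
)
sens, spec = worst_case_accuracy(toy)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")
```

Because every non-diagnostic scan is scored as an error, this rule can only lower the estimated sensitivity and specificity relative to an analysis that drops such scans.
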

Fig. 1

Flow chart for study selection. Part of the study flow referring to the COME-CCT main analysis paper as published by Haase et al [70]. In this subanalysis of the international COME-CCT Consortium, only patients with functional testing data were included, and studies for pooled analysis were only available if at least 5% of the patients of each of the 31 included studies received functional testing in order to avoid inclusion bias

Diagnostic performance was evaluated using a complete case analysis (basic generalised linear model), while the role of potential covariates was investigated in a sensitivity analysis using multiple imputation of missing data for patients who did not undergo functional testing in the original studies. Statistical analysis was performed with “R” (R-package lme4) [17]. Multiple imputation was performed to reduce bias from missing data. Post-test probabilities were obtained with STATA 14 (packages GLLAMM, GLLAPRED). Cross-hair plots, which show per-study sensitivity and false-positive rate as a scatter plot with corresponding confidence intervals, were produced with the R package mada [18, 19].

Results

Study characteristics

Pooled data on the per-patient level from 31 eligible studies, comprising 2920 patients from 21 sites in 16 countries, were available for analysis (Fig. 1) [5,6,7, 20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46]. Results of consensus reviewer judgments of the methodological quality of included studies regarding risk of bias and applicability can be found in the Appendix (Figs. 1 and 2). The risk of bias was high in eight studies and high applicability concerns were not present [7, 25, 30, 33, 37, 40, 41, 45]. Included participant data varied in size from 3 to 243 participants (mean (SD) of 91.2 (53.9)); 67% were male (Table 1). All patients included underwent clinically indicated ICA (81% with QCA) as the reference standard for detection of obstructive CAD.

Fig. 2

Analysis of diagnostic performance for CTA, Exercise-ECG, SPECT. The lines represent the positive and negative predictive values of CAD after a positive (solid lines) or negative (dashed lines) diagnostic test result for obstructive coronary artery disease, defined as a patient with at least 50% coronary diameter stenosis. CTA was significantly more accurate than exercise-ECG and SPECT. Predictive values including 95% confidence intervals for all three tests are provided in Appendix Figs. 3–5

Table 1 Patient characteristics

Imaging test characteristics

Fifty-three percent of the patients had an additional exercise-ECG (1540/2920), 37% had no functional test (1066/2920) and 18% had an additional SPECT (532/2920). Approximately 7% of patients underwent all three non-invasive tests and ICA (218/2920). The study population had a high number of cardiovascular risk factors, with a cumulative of 0.2 (0.4) risk factors per patient (Table 1) while the average pretest probability was 47.9% (22.2%). The prevalence of obstructive CAD in the 31 eligible studies varied between 22 and 90% (Appendix Table 8) depending on the type of patients included as well as local patient selection for ICA and conduct of functional testing (Appendix Tables 5 and 6).

Effectiveness of CTA and functional testing for the diagnosis of obstructive CAD: individual-patient data analysis

For CTA, sensitivity in the 2072 patients with CT and functional test results, using ICA as the reference standard in a generalised linear model, was 94.6% (95% CI 92.7–96) and specificity was 76% (72.2–80) (Table 2). The sensitivity of exercise-ECG was 54.9% (47.9–61.7) and specificity was 60.9% (55.2–66.3), while the sensitivity of SPECT was 72.9% (65–79.6) and specificity was 44.9% (36.7–54.4). Table 2 additionally shows all characteristics in 10%-steps of pretest probability. The sensitivity and specificity of CTA and functional stress testing differed significantly (p < 0.0001 for all, see Table 3). When the eight studies with a high risk of bias (Appendix Table 2) [7, 25, 30, 33, 37, 40, 41, 45] were excluded in a sensitivity analysis, results remained similar for all three tests (Fig. 3 and Appendix Table 11). When comparing the diagnostic accuracy of CTA in the 2920 patients included from 31 studies with functional tests performed with that in the 2412 patients who were included in studies without functional tests performed, we found no differences, indicating no relevant selection bias (Appendix Table 12).

Table 2 Diagnostic accuracy of CTA, exercise-ECG, and SPECT
Table 3 Overall statistical model without additional covariates
Fig. 3

Similar diagnostic performance of CTA, Exercise-ECG, and SPECT after excluding studies with a high risk of bias. Diagnostic performance similar to that shown in Fig. 2, which includes all individual-patient data, is found in this analysis, in which studies with a high risk of bias [7, 25, 30, 33, 37, 40, 41, 45] were excluded and only studies with low risk of bias were included (details on the risk of bias assessment are shown in Appendix Table 2)

There was better diagnostic differentiation using CTA compared with both exercise-ECG and SPECT (Fig. 2). Reliably excluding CAD with an NPV of 85% was possible with a negative CTA in patients with a pretest probability of up to 74%, whereas negative exercise-ECG and SPECT excluded CAD only up to pretest probabilities of 7% and 11%, respectively (Fig. 2), with variability between studies (Table 3). Gender comparison showed similar results for CTA and functional tests in women and men (Appendix Tables 7, 9, 10).

Effectiveness of CTA and functional testing for the diagnosis of obstructive CAD: study-level analysis

On the study level, the sensitivity and specificity of CT, exercise-ECG, and SPECT are reported in Fig. 4. At a pretest probability of 10%, the positive predictive value (PPV) of CTA was 50.9% (95% CI 40.9–60.2) while the PPV of exercise-ECG was 19.1% (95% CI 12.8–27.5) and that of SPECT was 32.2% (95% CI 22.5–45.2). At a pretest probability of 74%, the NPV of CTA was 85.2% (95% CI 78.0–90.4) while the NPV of exercise-ECG was 41.9% (95% CI 32.5–50.6) and that of SPECT was 34.8% (95% CI 24.6–49.0).
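
For orientation, the dependence of predictive values on pretest probability can be sketched with Bayes' theorem. This is a simplified illustration assuming a single fixed sensitivity and specificity; the predictive values reported in this analysis were estimated with the mixed model and therefore will not be reproduced exactly by this calculation. The function name is hypothetical.

```python
from typing import Tuple


def predictive_values(sensitivity: float, specificity: float, pretest: float) -> Tuple[float, float]:
    """Return (PPV, NPV) from Bayes' theorem for a given pretest probability.

    Assumes sensitivity and specificity do not vary with pretest probability,
    which is a simplification relative to the mixed-model estimates.
    """
    true_pos = sensitivity * pretest
    false_pos = (1.0 - specificity) * (1.0 - pretest)
    true_neg = specificity * (1.0 - pretest)
    false_neg = (1.0 - sensitivity) * pretest
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Pooled CTA estimates from the text, evaluated at a pretest probability of 74%.
ppv, npv = predictive_values(0.946, 0.76, 0.74)
print(f"PPV={ppv:.3f}, NPV={npv:.3f}")
```

With the pooled CTA sensitivity of 94.6% and specificity of 76%, this simplified calculation gives an NPV of roughly 83% at a pretest probability of 74%, close to but not identical with the model-based 85.2%.
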

Fig. 4

Cross-hair comparison of per-study sensitivity and false-positive rate for CTA, Exercise-ECG, and SPECT. The lines represent 95% confidence intervals for sensitivity and false-positive rate based on the per-study data for CT, exercise-ECG, and SPECT. The per-study forest plots for all three tests and the results of all individual studies are also shown in Appendix Figs. 5–8

Multiple imputation analysis and covariates

Results of the multiple imputation analysis based on all 2920 patients (Appendix Table 7) revealed a significant influence of the covariates Agatston Score, heart rate, and chest pain on the specificity of CTA and functional test results, with CTA outperforming SPECT and ECG in terms of sensitivity and specificity. Patients with an increased heart rate and a higher Agatston Score had lower specificity with all diagnostic tests. The type of chest pain mainly influenced the specificity of all (functional and anatomical) diagnostic tests, which was best in patients with typical angina pectoris. Higher heart rates led to lower sensitivity of all tests. Models investigating test-specific effects of the covariates failed to converge, so only the overall influence of these covariates is reported.

Discussion

In this pooled analysis of patient-level data, we show that both the sensitivity and the specificity of coronary CTA are higher than those of exercise-ECG and SPECT for the diagnostic assessment of CAD using ICA as the reference standard. The findings are not applicable to the detection of myocardial ischaemia, which was not included in the COME-CCT protocol. Across a wide range of clinical pretest probabilities, the diagnostic performance of CTA was better than that of functional stress testing. Results were consistent across populations from 21 different sites in 16 countries, suggesting that the benefit of CTA is generalisable and that CTA should be more widely adopted in patients with suspected CAD based on stable chest pain. This adds clear evidence to previous small studies that indicated CTA might outperform functional testing for the diagnosis of obstructive CAD [5,6,7]. Thus, CTA may provide a more solid basis for diagnostic and treatment decision-making. However, the broad inclusion, for instance, of all patients with atypical or typical angina pectoris might require revision, as we have shown that reliably excluding obstructive CAD by CTA (NPV of at least 85%) works best up to a clinical pretest probability of 74%. Confirming obstructive CAD based on a positive CTA yields post-test probabilities of > 75% above clinical pretest probabilities of 39%, which should be considered in decision-making.

Comparison with previous studies

In a per-study-level meta-analysis of randomised trials, CTA compared with functional stress testing was associated with a reduced incidence of myocardial infarction [47]. This further supports the conclusions from the present IPD meta-analysis of diagnostic accuracy studies. In a network meta-analysis comparing CTA, SPECT, PET, and MRI on the per-study (not per-patient) level with ICA or fractional flow reserve (FFR) it was demonstrated that each diagnostic modality has its own optimal performance pretest probability [48]. For the choice between stress testing and coronary CTA, the ESC guideline recommends considering whether patients are suitable and if local expertise in one or the other diagnostic test is present. Nowadays local expertise is commonly present for both stress testing and CTA, while our study shows that if local expertise is available, coronary CTA should be considered as the primary test for the exclusion of obstructive CAD. Our results may help in choosing the most appropriate non-invasive test before proceeding to ICA potentially resulting in an increase of the reportedly lower diagnostic yield of invasive angiography [49]. Importantly, the ISCHEMIA trial suggests that an ischaemia detection strategy with subsequent invasive interventions may not result in improved outcomes [3]. The COME-CCT consortium used (quantitative) coronary angiography as the reference standard for the direct visualisation of coronary obstructions [8]. Considering the low uptake of invasive FFR worldwide [50], the pragmatic reference standard used in COME-CCT reflects clinical practice at the time of data collection.

In line with our results, previous results indicate that CT reduces false-positive rates compared with functional testing [51]. In contrast to most previous publications, exercise-ECG and SPECT performed worse in the present study. While past meta-analyses and current guidelines report 61%–68% sensitivity and 70%–77% specificity for exercise-ECG and 73%–91% sensitivity and 48%–90% specificity for SPECT, our analysis reveals a much lower diagnostic performance for the two tests, likely due to the selected population [52,53,54,55,56,57,58,59]. In the prospective multicentre PICTURE trial, which directly compared SPECT and CTA using ICA with 50% lumen reduction as the reference standard for CAD detection, sensitivity was 92.0% versus 54.5% for CTA and SPECT, respectively, while specificity was 87.0% versus 78.3%. Applying a 70% lumen reduction threshold for the definition of significant CAD, CTA and SPECT yielded similar results (sensitivity 92.6% versus 59.3% and specificity 88.9% versus 81.5%) [60]. In contrast, the COME-CCT protocol prespecified 50% coronary stenosis as the definition of obstructive CAD [8], similar to almost all studies available at the time of planning this IPD analysis [49]. Moreover, using a cut-off of 50% was assumed to not miss obstructive disease as defined [61]. Importantly, we found no evidence of selection bias in our cohort when comparing the diagnostic accuracy of CTA in the patients with and without available information on functional testing. Moreover, CTA as a non-invasive anatomical test holds an advantage regarding the evaluation of further imaging criteria, such as coronary plaque analysis, which plays an important role in further risk stratification and may be useful for the prediction of future cardiovascular events. Functional tests, in contrast, have the advantage of functional and flow-relevant assessment of the coronary arteries.

Quality assurance and interpretation of results

To verify whether exercise-ECG and SPECT were conducted according to quality standards, participating sites reported their protocols of functional tests (Appendix Tables 2 and 3). According to these data, all SPECT examinations and the majority of exercise-ECG examinations of CAD patients were done using standardised criteria [2, 52]. Thus, a likely reason for the lower diagnostic accuracy of functional stress testing compared with CTA is that these tests cannot directly visualise obstructive disease. However, in light of the ORBITA trial, a much more comprehensive strategy for the diagnosis of CAD that includes anatomic and functional criteria will be required to improve the selection of patients who benefit from the most aggressive treatment [53]. A second aspect that may influence reported diagnostic accuracy is verification bias [53]. The methodologically robust inclusion of all patients with functional testing and CT prior to the reference standard most likely reduced referral bias, which cannot be entirely avoided and has been reported to lead to erroneously high diagnostic sensitivity, as shown by Ladapo and co-workers [54, 55]. The solid approach of comparing CT and functional testing with the reference standard ICA used in the current collaborative meta-analysis may thus explain especially the lower sensitivity for functional testing compared to reports that were influenced by referral bias. Moreover, the reference standard used for this comparison was also a morphological imaging test (invasive catheter angiography), similar to CT, which may also explain the low diagnostic accuracy of functional tests in this analysis. In addition, evidence is missing on whether functional tests using state-of-the-art technology provide better diagnostic accuracy, as most of the mentioned studies were performed a SPECT generation ago.
Magnetic resonance imaging, which has shown higher diagnostic accuracy than SPECT in the CE-MARC study [54], was only rarely performed as a cardiac stress test in patients included in COME-CCT; thus, an intraindividual comparison was not performed. Our results are also supported by a recent study by Patel et al, demonstrating that performing CTA first leads to the highest diagnostic yield of ICA (70%) while using functional testing leads to a lower diagnostic yield (45%) [55]. Furthermore, functional stress tests have been shown not to improve discriminative ability [56]. With our results, we have also clearly shown the ability of CTA to identify patients with moderate-to-severe CAD. However, especially in this patient cohort, these results do not necessarily prove or evaluate the possible reduction of unnecessary ICA, which should be addressed in future studies and analyses. In summary, the present work provides further evidence for the superior diagnostic accuracy of CTA compared to exercise-ECG and SPECT.

Moreover, the SCOT-HEART and the CRESCENT trial (Calcium imaging and selective CTA in comparison to functional testing for suspected CAD), both comparing CTA with functional cardiac tests, found a reduction in cardiac events for the CTA group after a median follow-up of 1.7 years or 1.2 years [4, 57, 58]. The improvement is likely due to the change in preventive therapy regimens through CTA, such as prescription of statins, aspirin and smoking cessation, especially in the large patient group with non-obstructive CAD [58]. Interestingly, a post-hoc analysis of the SCOT-HEART trial revealed an association of exercise-ECG with revascularisation procedures and future risk of adverse coronary events, but to a lower extent than CTA while CTA also offers information about undetected CAD and improves clinical decision-making [59].

Study limitations and strengths

Our meta-analysis had three major limitations. First, fourteen studies including 1367 patients (47.8%) used CT scanners with less than 64 detector rows [5, 6, 24, 26, 30,31,32, 34, 35, 37, 39, 41,42,43, 45]. These studies contributed the majority of the non-diagnostic test results, and because of the conservative approach that was used, this led to lower sensitivities and specificities for CTA. However, CTA still outperformed SPECT and exercise-ECG. Nowadays, the use of updated state-of-the-art technology with more than 64 detector-row CT scanners may increase diagnostic performance in general.

Assuming increased diagnostic accuracy, this would also most likely lead to less frequent non-diagnostic results and an overall reduction in radiation dose, leading to wider availability of CTA for further patients. Second, the use of obstructive CAD defined by the COME-CCT collaborators as the reference standard is not optimal for evaluating functional tests. The PACIFIC study demonstrated that, when FFR during ICA was used as the reference standard for detecting haemodynamically significant stenoses, CTA with a ≥ 50% lumen reduction as the cut-off for significant stenosis performed worse than PET [62]. Yet, using obstructive CAD in ICA as the reference standard reflects clinical practice, with a low adoption rate of below 10% of FFR during ICA [50]. The third limitation is the amount of missing data, which was addressed by using multiple imputation for reduction of bias, an approach described as superior to complete case analysis even with large proportions of missing data [63, 64].

The major strength of this study is the IPD meta-analysis approach to diagnostic accuracy using GLMM, which has not been used before in comparing the diagnostic accuracy of CTA with SPECT and exercise-ECG and is generally rarely employed in diagnostic accuracy studies [65,66,67,68,69]. There was between-study heterogeneity for 1-specificity (intercept) and sensitivity. We assume that heterogeneity was most likely due to differences in the patient population. However, GLMM can account for some degree of heterogeneity when the study is introduced into the model as a random effect, as has been done in this IPD meta-analysis [65].

Conclusions

Coronary CTA improves the diagnostic assessment of patients with suspected obstructive CAD based on stable chest pain when compared with functional stress testing. Diagnostic benefits of CTA over cardiac stress testing are seen across a wide range of clinical pretest probabilities and CTA should become widely adopted in patients with intermediate pretest probability.