Introduction

Coronary artery disease (CAD) is the most prevalent heart disease [1] and affects 7.2% of adults aged ≥ 20 years in the United States [2]. It is a particular health concern in densely populated countries [3]. CAD mainly manifests as coronary artery stenosis, which arises from plaques formed within the arteries by the deposition of cholesterol, calcium, and fat [4]. These plaques can obstruct the arteries and restrict the flow of oxygen-rich blood to the heart, potentially with fatal outcomes. Accurately quantifying the degree of coronary artery stenosis is therefore crucial for early diagnosis, risk assessment, and management of CAD [5], and early identification and management in clinical practice can reduce the risk of acute coronary events and sudden death.

Although invasive coronary angiography (CAG) remains the gold standard for diagnosing stenosis [6], several non-invasive techniques are available, such as electron beam computed tomography (EBCT), multi-detector computed tomography (MDCT), and coronary magnetic resonance angiography [7]. However, these methods rely heavily on the expertise and experience of clinical practitioners. With recent advances in artificial intelligence and statistical theory, image-based artificial intelligence has been successfully applied to the diagnosis and treatment of diseases in clinical practice, and medical applications of artificial intelligence, particularly deep learning, have attracted considerable attention from researchers. Deep learning has been used across many domains, including drug discovery and biomedical signal analysis. For instance, neural network-based models have been used to predict drug permeability across the placenta [8], and machine learning approaches that combine fingerprint amalgamation with data balancing have been used to analyze drug permeability through the blood-brain barrier [9]. Deep learning methods have also been applied to estimate age and gender from electrocardiogram signals [10] and to automate food recognition [11]. Over the past decade, deep learning has substantially advanced medical information science and image analysis [12], particularly through convolutional neural networks (CNNs), which have also driven progress in computer vision and speech recognition. For example, efficient CNNs such as ConvUNeXt and DRU-Net have been developed for medical image segmentation [13, 14], and deep learning has shown particular strengths in ultrasound image segmentation [15].
Neural networks with improved segmentation accuracy have also been developed for liver CT and liver ultrasound image segmentation [16, 17], and a framework that bridges CNNs and transformers (CoTr) has been proposed for precise 3D medical image segmentation [18]. In liver tumor diseases, image segmentation is the first step for clinicians in optimizing diagnosis, staging, treatment planning, and interventions, and can therefore affect diagnostic and therapeutic outcomes [19]. Researchers have also assessed the risk of computer-aided diagnostic software for hepatic resection [20] and evaluated the effectiveness of fusion imaging for immediate post-ablation assessment of malignant liver neoplasms [21]. Deep learning also shows promising results in diagnosing certain complex cardiac diseases; for instance, deep neural networks have been used to identify electrocardiographically concealed long QT syndrome from surface 12-lead electrocardiograms (ECGs), demonstrating their potential for intelligent disease diagnosis [22].

In this context, some researchers have developed deep learning models based on coronary angiography (CAG) or coronary CT angiography (CCTA) images to identify the degree of coronary artery stenosis. However, comprehensive evidence of their efficacy is still lacking. We therefore conducted a systematic review and meta-analysis of published studies to provide the following evidence on the use of deep learning for diagnosing coronary artery stenosis: (1) the accuracy of deep learning in differentiating levels of coronary artery stenosis on CAG or CCTA; and (2) the accuracy of deep learning in diagnosing coronary artery stenosis in both binary and multiclass classification tasks.

Methods

Study registration

This study was conducted in accordance with the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) 2020 guidelines and was prospectively registered with PROSPERO (ID: CRD42023444635).

Eligibility criteria

Inclusion criteria

(1) Studies that fully developed deep learning models for identifying the degree of coronary artery stenosis, in either binary or multiclass classification tasks.

(2) Some studies used the same publicly available database. We acknowledged the contribution of each and included all deep learning investigations conducted on the same dataset.

(3) A small number of studies validated previously developed deep learning models; these were also included in our systematic review.

(4) Eligible study designs were case-control, cohort, and cross-sectional studies.

(5) Studies had to be published in English.

Exclusion criteria

(1) Meta-analyses, reviews, guidelines, and expert opinions.

(2) Studies without model validation.

(3) Studies that reported none of the following measures of model accuracy: ROC curve, c-statistic, c-index, sensitivity, specificity, accuracy, recall, precision, confusion matrix, diagnostic fourfold (2 × 2) table, F1 score, or calibration curve.

(4) Studies that focused solely on image segmentation or reconstruction.
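Most of the accuracy measures listed above derive from the same diagnostic fourfold (2 × 2) table. The sketch below shows the standard definitions, assuming that stenosis above the chosen threshold is the positive class; `FourfoldTable` and `accuracy_metrics` are hypothetical names for illustration, not code from any included study.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class FourfoldTable:
    """Hypothetical container for a diagnostic 2 x 2 table (positive = stenosis above threshold)."""
    tp: int  # stenosis present, model positive
    fp: int  # stenosis absent, model positive
    fn: int  # stenosis present, model negative
    tn: int  # stenosis absent, model negative


def _ratio(num: float, den: float) -> float:
    # Return NaN instead of raising when a margin of the table is empty.
    return num / den if den else math.nan


def accuracy_metrics(t: FourfoldTable) -> dict[str, float]:
    """Standard accuracy measures derived from one fourfold table."""
    sensitivity = _ratio(t.tp, t.tp + t.fn)  # also called recall
    specificity = _ratio(t.tn, t.tn + t.fp)
    precision = _ratio(t.tp, t.tp + t.fp)    # positive predictive value
    accuracy = _ratio(t.tp + t.tn, t.tp + t.fp + t.fn + t.tn)
    f1 = _ratio(2 * precision * sensitivity, precision + sensitivity)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "accuracy": accuracy,
        "f1": f1,
    }
```

For example, 45 true positives, 15 false positives, 5 false negatives, and 35 true negatives give a sensitivity of 0.90, a specificity of 0.70, and an accuracy of 0.80.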

Data sources and search strategy

PubMed, Cochrane, Embase, and Web of Science were systematically searched up to April 11, 2023, using both MeSH terms and free-text keywords, without restrictions on publication region or year. The full search strategy is given in Table S1.

Study selection and data extraction

All identified articles were imported into EndNote. Duplicates were removed automatically and then manually. Irrelevant studies were excluded after screening titles and abstracts, and the full texts of the remaining articles were reviewed to determine eligibility.

A structured form was used to extract the following data: title, first author, year of publication, author's country, study type, patient source, image source, diagnostic criteria for coronary artery stenosis, degree of coronary artery stenosis, total number of cases, number of stenosis cases and total number of cases in the training set, method used to generate the validation set, number of stenosis cases and total number of cases in the validation set, and type of model used. Literature screening and data extraction were performed independently by two researchers and then cross-checked. Disagreements were resolved by a third researcher (a cardiology expert).

Risk of bias assessment

The Quality Assessment of Diagnostic Accuracy Studies-2 (QUADAS-2) tool was used to assess the risk of bias and applicability of the included studies [23]. The tool comprises signalling questions in four domains: patient selection, index test, reference standard, and flow and timing. Each question was answered Yes, No, or Unclear, indicating a low, high, or unclear risk of bias, respectively. A study was graded as low risk of bias if all signalling questions in each domain were answered Yes; if all were answered No, a potential for bias was flagged. Risk was rated unclear when the information reported was insufficient for a definitive judgment.

The quality assessment was performed independently by two clinicians, one with 5 years and one with more than 5 years of cardiology experience, and their results were cross-checked. Discrepancies were resolved by a third researcher (a cardiology expert).

Outcomes

This study assessed the accuracy of deep learning in detecting the degree of coronary artery stenosis. The thresholds used to define stenosis varied across studies from different countries; > 25%, > 50%, and > 70% were the most common.
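To make the threshold conventions concrete, the sketch below maps a percent diameter stenosis to the binary and multiclass labels used in this review. How each study assigned values falling exactly on a boundary (for example, exactly 50%) is not reported; here a value equal to a threshold is assumed to fall in the lower category, consistent with the "> threshold" definitions. `binary_label` and `multiclass_label` are hypothetical names.

```python
import bisect

# Upper edges of the multiclass categories (0-25%, 25-50%, 50-70%, 70-100%).
MULTICLASS_EDGES = (25.0, 50.0, 70.0)
MULTICLASS_LABELS = ("0-25%", "25-50%", "50-70%", "70-100%")


def _check(percent_stenosis: float) -> None:
    if not 0.0 <= percent_stenosis <= 100.0:
        raise ValueError(f"percent stenosis must be in [0, 100], got {percent_stenosis}")


def binary_label(percent_stenosis: float, threshold: float = 50.0) -> int:
    """Return 1 if stenosis exceeds the threshold (e.g. > 50%), else 0."""
    _check(percent_stenosis)
    return int(percent_stenosis > threshold)


def multiclass_label(percent_stenosis: float) -> str:
    """Assign a multiclass category; a value on an edge goes to the lower category (assumption)."""
    _check(percent_stenosis)
    # bisect_left returns the index of the first edge >= value, so 50.0 -> "25-50%".
    return MULTICLASS_LABELS[bisect.bisect_left(MULTICLASS_EDGES, percent_stenosis)]
```

The same measurement can therefore be negative under the 50% threshold but positive under the 25% threshold, which is one reason results at different thresholds are pooled separately.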

Synthesis methods

Because few of the included studies reported the numbers of cases and images for each degree of stenosis, we could not perform the meta-analysis planned in the registered protocol, which used a bivariate mixed-effects model and diagnostic 2 × 2 tables. A meta-analysis of sensitivity was conducted instead. Heterogeneity across studies was assessed with Cochran's Q test and the I² statistic. A fixed-effects model was used if I² < 50%, and a random-effects model if I² > 50%. Subgroup analyses, sensitivity analyses, and meta-regression were performed to identify sources of heterogeneity.
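The model-selection rule above can be sketched as follows. This is a minimal illustration, not the software used in this review: it assumes study-level proportions (such as sensitivity or accuracy) are pooled on the raw scale with binomial variances p(1 − p)/n and inverse-variance weights, and, for the random-effects case, it assumes the DerSimonian-Laird estimator of between-study variance, which the protocol does not specify. The handling of I² exactly equal to 50% is also unspecified; here it falls to the random-effects model. `pool_proportions` and `PooledResult` are hypothetical names.

```python
import math
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class PooledResult:
    model: str       # "fixed" or "random"
    estimate: float
    ci_low: float
    ci_high: float
    q: float         # Cochran's Q
    i2: float        # I^2 in percent


def pool_proportions(p: Sequence[float], n: Sequence[int], i2_cutoff: float = 50.0) -> PooledResult:
    """Inverse-variance pooling of study-level proportions with I^2-based model choice."""
    if len(p) != len(n) or len(p) < 2:
        raise ValueError("need at least two studies with matching p and n")
    # Binomial within-study variance; a small floor avoids division by zero at p = 0 or 1.
    v = [max(pi * (1 - pi) / ni, 1e-12) for pi, ni in zip(p, n)]
    w = [1 / vi for vi in v]
    fixed = sum(wi * pi for wi, pi in zip(w, p)) / sum(w)

    # Cochran's Q and I^2 = max(0, (Q - df) / Q) * 100.
    q = sum(wi * (pi - fixed) ** 2 for wi, pi in zip(w, p))
    df = len(p) - 1
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0

    if i2 < i2_cutoff:
        model, weights = "fixed", w
    else:
        # DerSimonian-Laird estimate of between-study variance tau^2 (assumed estimator).
        c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
        tau2 = max(0.0, (q - df) / c)
        model, weights = "random", [1 / (vi + tau2) for vi in v]

    estimate = sum(wi * pi for wi, pi in zip(weights, p)) / sum(weights)
    se = math.sqrt(1 / sum(weights))
    return PooledResult(model, estimate, estimate - 1.96 * se, estimate + 1.96 * se, q, i2)
```

Because the interval here is computed on the raw proportion scale, its upper bound can exceed 1 when estimates are close to 1 and studies are few; pooling on the logit scale is a common alternative that keeps intervals within [0, 1].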

Results

Study selection

A total of 2,139 reports were identified from PubMed, Cochrane, Embase, and Web of Science. After the removal of 749 duplicate publications, titles and abstracts were checked, and 42 potentially relevant studies were selected. After a thorough full-text review, 18 studies were eligible and included [4, 24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39]. The literature screening process is shown in Fig. 1.

Fig. 1

Flowchart of the literature screening process

Study characteristics

The 18 included studies comprised 3,568 patients and 13,362 vascular images. They were published between 2019 and 2023 and were conducted in China, the United States, Australia, Japan, Mexico, Portugal, and the Netherlands. Five studies were multicenter [25, 32, 33, 35, 37], and the remainder were single-center. Regarding image sources, 3 studies relied solely on CAG images [27, 28, 31], 11 used only CCTA images [4, 24,25,26, 28,29,30, 33, 37,38,39], and 4 used both modalities [27, 32, 35, 36]. Twelve studies addressed binary classification tasks [24,25,26, 28, 29, 31, 32, 34,35,36], and six addressed multiclass tasks [4, 27, 30, 33, 37, 38]. The threshold for defining coronary artery stenosis varied across studies, with > 25%, > 50%, and > 70% the most commonly used: five studies used 25% [26, 28, 30, 33, 37], seven used 50% [24, 25, 28, 32, 36,37,38], four used 70% [30, 33, 37, 38], and two did not explicitly define a threshold [4, 35]. Detailed information is provided in Table 1.

Table 1 Basic characteristics of the included studies

Risk of bias

Although all included studies used a case-control design, we judged that this design did not affect the assessment of deep learning performance, so the risk of bias in patient selection was considered low. Similarly, although the included studies did not state whether the index tests were interpreted without knowledge of the reference standard results, the index test results were produced by artificial intelligence, which did not affect the assessment of deep learning performance; the risk of bias in the index test was therefore also deemed low. Most studies did not explicitly report whether the reference standard was assessed blind. However, the interval between the index tests and the reference standard was reasonable, and all patients received the reference standard, suggesting a low risk of bias in flow and timing.

Meta-analysis

Binary classification tasks

For binary classification tasks, most studies dichotomized stenosis at 50% (< 50% vs > 50%), whereas few used 25% or 70%. At the vessel level, the meta-analysis showed that the accuracy of deep learning models was 0.79 (95% CI: 0.64–0.94) for detecting < 50% stenosis and 0.73 (95% CI: 0.58–0.88) for > 50% stenosis; the results for the 25% and 70% thresholds should be interpreted with caution because few studies contributed to them (Fig. 2). Likewise, at the patient level, deep learning models performed better for < 50% stenosis (0.83, 95% CI: 0.74–0.93) than for > 50% stenosis (0.79, 95% CI: 0.66–0.91) (Fig. 3).
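Vessel-level and patient-level results can differ partly because of how labels are aggregated. The aggregation rules of the included studies are not described here; a common convention, assumed in this sketch, is that a patient is positive if any of their vessels exceeds the stenosis threshold. `patient_level_labels` is a hypothetical name.

```python
from typing import Iterable, Tuple


def patient_level_labels(vessel_labels: Iterable[Tuple[str, int]]) -> dict[str, int]:
    """Collapse (patient_id, vessel_label) pairs into one label per patient.

    vessel_label is 1 if that vessel's stenosis exceeds the threshold, else 0.
    Assumed rule: a patient is positive if any of their vessels is positive.
    """
    patients: dict[str, int] = {}
    for patient_id, label in vessel_labels:
        if label not in (0, 1):
            raise ValueError(f"vessel label must be 0 or 1, got {label!r}")
        patients[patient_id] = max(patients.get(patient_id, 0), label)
    return patients
```

Under this rule a single misclassified vessel can flip a patient's label, so patient-level and vessel-level accuracy need not move together.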

Fig. 2

Results of vessel-based meta-analysis for binary classification tasks

Fig. 3

Results of per-patient-based meta-analysis for binary classification tasks

Based on vascular images, the meta-analysis showed that the accuracy of CCTA-based models was 0.80 (95% CI: 0.76–0.84) for detecting < 50% stenosis, 0.76 (95% CI: 0.53–0.99) for > 50% stenosis, 0.81 (95% CI: 0.76–0.84) for < 25% stenosis, and 0.81 (95% CI: 0.75–0.88) for > 25% stenosis (Fig. 4). Based on patients, the meta-analysis revealed that the accuracy of these models was 0.87 (95% CI: 0.67–1.07) for detecting < 50% stenosis, 0.70 (95% CI: 0.62–0.79) for > 50% stenosis, 0.85 (95% CI: 0.81–0.90) for < 25% stenosis, and 0.72 (95% CI: 0.62–0.82) for > 25% stenosis (Fig. 5).

Fig. 4

Results of vessel-based meta-analysis for models based on CCTA images

Fig. 5

Results of per-patient-based meta-analysis for models based on CCTA images

Multiclass classification tasks

For multiclass classification tasks, based on vascular images, the meta-analysis revealed that the accuracy of deep learning models was 0.78 (95% CI: 0.73–0.84) for detecting 0–25% stenosis, 0.86 (95% CI: 0.78–0.93) for 25–50% stenosis, 0.83 (95% CI: 0.70–0.97) for 50–70% stenosis, and 0.70 (95% CI: 0.42–0.98) for 70–100% stenosis (Fig. 6). Based on patients, the meta-analysis revealed that the accuracy of deep learning models was 0.72 (95% CI: 0.48–0.95) for detecting 0–25% stenosis, 0.78 (95% CI: 0.55–1.00) for 25–50% stenosis, 0.65 (95% CI: 0.30–1.00) for 50–70% stenosis, and 0.74 (95% CI: 0.54–0.94) for 70–100% stenosis (Fig. 7).
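The source does not state how per-category accuracy was computed in the multiclass studies. One common approach, assumed in this sketch, treats each stenosis category as positive and all other categories as negative (one-vs-rest) and reads accuracy and sensitivity from the multiclass confusion matrix. `one_vs_rest_metrics` is a hypothetical name.

```python
import math
from typing import Sequence

CATEGORIES = ("0-25%", "25-50%", "50-70%", "70-100%")


def one_vs_rest_metrics(
    cm: Sequence[Sequence[int]], labels: Sequence[str] = CATEGORIES
) -> dict[str, dict[str, float]]:
    """cm[i][j] counts vessels whose reference category is i and predicted category is j."""
    k = len(cm)
    if any(len(row) != k for row in cm) or len(labels) != k:
        raise ValueError("confusion matrix must be square and match the labels")
    total = sum(sum(row) for row in cm)
    results: dict[str, dict[str, float]] = {}
    for c in range(k):
        tp = cm[c][c]
        fn = sum(cm[c]) - tp                       # category c predicted as another category
        fp = sum(cm[r][c] for r in range(k)) - tp  # other categories predicted as c
        tn = total - tp - fn - fp
        results[labels[c]] = {
            "accuracy": (tp + tn) / total if total else math.nan,
            "sensitivity": tp / (tp + fn) if tp + fn else math.nan,
        }
    return results
```

Because one-vs-rest accuracy counts every correctly rejected vessel from the other categories as a true negative, it tends to be higher than per-category sensitivity, so the two measures should not be compared directly.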

Fig. 6

Results of vessel-based meta-analysis for multiclass classification tasks

Fig. 7

Results of per-patient-based meta-analysis for multiclass classification tasks

Discussion

Summary of the main findings

Deep learning is relatively accurate in detecting coronary artery stenosis, especially when stenosis is dichotomized at 50%. In binary classification tasks at the vessel level, the accuracy of deep learning models was 0.79 (95% CI: 0.64–0.94) for < 50% stenosis and 0.73 (95% CI: 0.58–0.88) for > 50% stenosis. In multiclass classification tasks, the accuracy was 0.86 (95% CI: 0.78–0.93) for 25–50% stenosis and 0.83 (95% CI: 0.70–0.97) for 50–70% stenosis.

Coronary artery stenosis is often caused by atherosclerosis, which narrows the coronary artery lumen. In most studies, significant stenosis is defined as a stenosis of 50% or more [40]. Treatment of CAD relies primarily on medication, often combined with reperfusion therapy through interventional procedures or bypass surgery. These treatments aim to improve coronary blood flow, prevent cardiovascular events, and enhance quality of life [41]. The specific treatment approach depends on the severity and number of stenotic lesions. Existing studies mainly focus on significant stenosis (> 50%). Our findings show that deep learning detects > 50% stenosis with reasonable accuracy (0.73 at the vessel level and 0.79 at the patient level), suggesting that it is a feasible technique.

Recent research has mainly used image-based techniques to detect coronary artery stenosis. Song-Bai Deng et al. [42] reported that FFRCT had a sensitivity of 90% (95% CI: 85–93%) and a specificity of 72% (95% CI: 67–76%) for diagnosing CAD. A review by Zhenhua Xing et al. [43] reported that quantitative flow ratio (QFR) assessed moderate coronary artery stenosis with a sensitivity of 0.89 (95% CI: 0.86–0.92) and a specificity of 0.88 (95% CI: 0.86–0.91). Shun-Lin Guo [44] reported that dual-source computed tomography (DSCT) achieved a sensitivity of 0.957 (95% CI: 0.943–0.969) and a specificity of 0.930 (95% CI: 0.910–0.940) for diagnosing CAD when ≥ 50% stenosis was used as the threshold for significant stenosis. Although these studies reported promising performance in binary classification, they did not address multiclass scenarios, even though assessing varying degrees of coronary artery stenosis is common in clinical practice. Our review shows that deep learning performs favorably in the multiclass diagnosis of vascular stenosis, with an accuracy of 0.78 (95% CI: 0.73–0.84) for 0–25% stenosis, 0.86 (95% CI: 0.78–0.93) for 25–50% stenosis, and 0.83 (95% CI: 0.70–0.97) for 50–70% stenosis.

Image segmentation is a key technique in computer vision that divides an image into multiple regions or objects [45]. In medical imaging, particularly cardiovascular imaging, segmentation is a critical step in improving the diagnosis of cardiovascular diseases [46]. Segmentation facilitates the extraction of quantitative data such as the degree of luminal stenosis, plaque burden, and arterial wall thickness [47]. Segmented images also provide a clear and detailed visual representation of the coronary arteries, enabling clinicians to identify areas of concern easily [48]. Reproducible segmentation is vital for disease detection, yet it remains a significant challenge in ongoing research [20]. Segmented images can be integrated with other diagnostic modalities, such as CT angiography, MR angiography, or intravascular ultrasound (IVUS), to provide a comprehensive assessment of coronary artery disease [49, 50]. Segmentation can be performed manually or with automated and semi-automated deep learning approaches; deep learning-based segmentation reduces the time and effort of manual segmentation and minimizes human error [51]. In current research, segmented images are often analyzed with specialized software to extract texture features, which are then used to build traditional machine learning models for disease diagnosis or prognosis prediction [52, 53]. These approaches have shown promising results. Alternatively, deep learning models can be trained directly on segmented images for disease diagnosis, and for some diseases deep learning appears to have advantages over traditional machine learning [54]. The original studies included in our systematic review focused on deep learning models built on image segmentation, which showed promising accuracy in identifying coronary artery stenosis.
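As an illustration of how segmentation yields the degree of luminal stenosis, the sketch below computes percent diameter stenosis, (1 − MLD / reference diameter) × 100, where MLD is the minimal lumen diameter, from a lumen-diameter profile sampled along a vessel centerline. It assumes that the first and last few samples lie in healthy reference segments and that their mean is the reference diameter; clinical software uses more elaborate reference estimation, and `percent_diameter_stenosis` is a hypothetical name.

```python
from statistics import mean
from typing import Sequence


def percent_diameter_stenosis(profile_mm: Sequence[float], n_ref: int = 3) -> float:
    """Percent diameter stenosis from lumen diameters ordered proximal to distal.

    Assumption: the first and last n_ref samples are healthy reference segments.
    """
    if n_ref < 1 or len(profile_mm) < 2 * n_ref + 1:
        raise ValueError("profile too short for the requested reference length")
    reference = mean(list(profile_mm[:n_ref]) + list(profile_mm[-n_ref:]))
    mld = min(profile_mm[n_ref:-n_ref])  # minimal lumen diameter within the lesion segment
    # Clamp at 0 so a lesion segment wider than the reference reports no stenosis.
    return max(0.0, (1 - mld / reference) * 100)
```

For example, a profile of [3.0, 3.0, 3.0, 2.4, 1.5, 2.4, 3.0, 3.0, 3.0] mm gives a reference diameter of 3.0 mm, an MLD of 1.5 mm, and a 50.0% diameter stenosis.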

Imaging-based artificial intelligence research currently faces several major challenges. Developing AI across different imaging modalities is difficult; for example, Snigdha Mohanty et al. [55] introduced a novel non-rigid method to register images both within and across modalities such as MRI, CT, and 3D rotational angiography. Even within a single modality, noise during image segmentation remains a serious challenge in practical applications [56, 57]. Segmentation speed is another challenge, and some researchers have explored ways to improve running speed [58]. In addition, the reproducibility of manual segmentation remains a significant challenge because it relies on the prior knowledge of human experts.

Strengths and limitations

To our knowledge, this is the first study to assess the accuracy of deep learning in detecting coronary artery stenosis, providing valuable evidence for future research. However, it has several limitations. First, despite our systematic search, the number of included studies is limited, and more studies are needed for validation. Second, the studies used different thresholds to define coronary artery stenosis, and some thresholds were analyzed in only a few studies, so the results should be interpreted cautiously. Third, the models were built mainly on CCTA images and seldom on CAG images, so the results for CAG-based models in particular should be interpreted with caution.

Conclusions

In summary, deep learning methods have recently gained significant attention and have been widely used for the intelligent detection of coronary artery stenosis. Our meta-analysis shows that these methods are relatively accurate for detecting coronary artery stenosis. Existing studies address both binary and multiclass classification tasks, and the latter appears more applicable to clinical practice. However, accurately detecting smaller degrees of stenosis in multiclass settings remains challenging. Future research should therefore develop more efficient deep learning models to improve the detection of coronary artery stenosis.