Abstract
Background
Consistency of results between different readers is an important issue in medical imaging, as it affects portability of results between institutions and may affect patient care. The International Atomic Energy Agency (IAEA) in pursuing its mission of fostering peaceful applications of nuclear technologies has supported several training activities in the field of nuclear cardiology (NC) and SPECT myocardial perfusion imaging (MPI) in particular. The aim of this study was to verify the outcome of those activities through an international clinical audit on MPI where participants were requested to report on studies distributed from a core lab.
Methods
The study was run in two phases: in phase 1, SPECT MPI studies were distributed as raw data and full processing was requested as per local practice. In phase 2, images from studies pre-processed at the core lab were distributed. Data to be reported included summed stress score (SSS); summed rest score (SRS); summed difference score (SDS); left ventricular (LV) ejection fraction (EF) and end- diastolic volume (EDV). Qualitative appraisals included the assessment of perfusion and presence of ischemia, scar or mixed patterns, presence of transient ischemic dilation (TID), and risk for cardiac events (CE). Twenty-four previous trainees from low- and middle-income countries participated (core participants group) and their results were assessed for inter-observer variability in each of the two phases, and for changes between phases. The same evaluations were performed for a group of eleven international experts (experts group). Results were also compared between the groups.
Results
Expert readers showed an excellent level of agreement for all parameters in both phase 1 and 2. For core participants, the concordance of all parameters in phase 1 was rated as good to excellent. Two parameters which were re-evaluated in phase 2, namely SSS and SRS, showed an increased level of concordance, up to excellent in both cases. Reporting of categorical variables by expert readers remained almost unchanged between the two phases, while core participants showed an increase in phase 2. Finally, pooled LVEF values did not show a significant difference between core participants and experts. However, significant differences were found between LVEF values obtained using different software packages for cardiac analysis.
Conclusions
In this study, inter-observer agreement was moderate-to-good for core group readers and good-to-excellent for expert readers. The quality of reporting is affected by the quality of processing. These results confirm the important role of the IAEA training activities in improving imaging in low- and middle-income countries.
Similar content being viewed by others
Explore related subjects
Discover the latest articles, news and stories from top researchers in related subjects.Avoid common mistakes on your manuscript.
Introduction
The International Atomic Energy Agency (IAEA) is an independent, intergovernmental science and technology-based organization which is part of the United Nations family of organizations.1 The IAEA works with its 170 Member States (MS) and multiple partners worldwide to promote the safe, secure and peaceful use of nuclear technologies. The IAEA supports nuclear medicine through activities of the Nuclear Medicine and Diagnostic Imaging Section (NMDI) within a quality assurance framework.2,3 The nuclear medicine programme contributes to achieving the sustainable development goals (SDGs) set by the United Nations, one of which is “by 2030, reduce by one third premature mortality from non-communicable diseases through prevention and treatment and promote mental health and well-being”.4
Considering the burden of cardiovascular diseases (CVD) as a major threat to public health worldwide,5,6 and the important role of nuclear techniques such as myocardial perfusion imaging (MPI) in the management of patients with ischemic heart disease (IHD),7,8,9 the NMDI Section adopted a strategic decision of strengthening capacity building in nuclear cardiology (NC), providing training through national and regional projects,10 supported by the Technical Cooperation Programme (TCP), which is the IAEA’s main mechanism for transferring nuclear technology to Low- and Middle-Income Countries (LMICs).11 Educational activities in NC include several Regional Training Courses (RTC) carried out over the past ten years.
This paper reports the results of an audit of NC practices (the I-MAP study), initiated in 2015 to assess whether and how training provided through RTCs impacted the quality of clinical practice. The primary goal was to assess homogeneity (i.e. intra- and inter-observer variability) within a group of core participants from LMICs. As secondary goals the study aims at a) evaluating the impact of IAEA activities in NC; b) comparing the readings of MPI studies in limited resource centres with those of international experts; c) evaluating the quality of reporting and d) assessing the impact of the reconstruction of MPI studies on the quality of reporting.
Methods
Recorded contact data from all attendees to RTCs in NC was retrieved. In the preceding 10 years, 896 participants had attended a total of 41 RTCs. Their regional distribution is reported in Appendix (Table 5). To make sure that those trainees, prospective participants to this study, were still actively involved in NC, that list was cross-checked with data from an international database managed by the IAEA.12 Of the 896 participants, 275 were identified as being currently active as nuclear cardiologists, and were approached for potential participation this study. Of these, 24/275 (8.7%), participated in the study. They formed the group referred to as “core participants.” Figure 1 reports their distribution around the world. The “core participants” group included physicians trained in nuclear medicine, with limited formal training in nuclear cardiology, in most cases acquired through short-term fellowships supported by the IAEA and/or trained “on the job.” Their yearly average volume of MPI studies was 880, with a minimum of 559 and a maximum of 1200.
The second group of “expert readers” consisted of eleven international experts identified by the Agency from a pool of its consultants and lecturers, and internationally recognized nuclear cardiologists. Overall, for the experts, the yearly volume of SPECT-MPI studies was on average double that of the core participants.
Both core participants and expert readers were requested to report anonymized case studies provided by a Core Lab, chosen on the basis of sound NC practice and significant record of research. The core lab identified 15 studies which, after anonymization, were uploaded onto a cloud-based collaborative platform (SharePoint™) and then downloaded from both core participants and experts.
All studies were carried out with the two-day protocol, using Tc99m labelled perfusion agents, and patients were imaged only in supine position. To provide readable studies for centres with limited technical resources, the core lab was asked to send studies processed with neither resolution recovery, nor scatter or attenuation correction, nor studies acquired with CZT cameras. Clinical data, including patients’ history, rest and stress ECG recordings and symptoms during stress were made available to participants. Relevant demographic and clinical data are summarized in Table 1.
We designed I-MAP to be run in two phases. In Phase 1, all 15 patient studies were provided as raw data. Both groups were requested to process them according to their own routine practice. For Phase 2, the same 15 cases were re-submitted in a different order, but pre-processed at the core lab using Myovation v3 software (GE Health Care; Haifa, Israel) with an iterative reconstruction ordered subset expectation maximization algorithm (2 iterations, 10 subsets) and motion correction. The “cool” GE colour scale was applied for tomographic slices representation. Both groups of participants were unaware that they were re-reading the same studies. This second phase was aimed at assessing whether reconstruction could have any impact on the overall quality of the study and consistency of interpretation. An example of a pre-processed patient study, as distributed in phase 2, is illustrated in Figure 2.
We used standardized forms for data collection which were forwarded to the core lab for statistical analysis. After on-site processing for phase 1, and based on images provided by the core lab for phase 2, readers were requested to score tracer uptake in polar maps using a 17-segment model (Figure 3A). An important distinction is that while in phase 1 readers could accept any score given by the cardiac software, in phase 2 they had to digit their own interpretation. The severity of perfusion defects in each of the 17 myocardial segments, as defined by the American Heart Association13 is scored on a 0-4 scale.
Data to be reported included quantitative perfusion metrics such as Summed Stress Score (SSS); Summed Rest Score (SRS); and Summed difference Score (SDS). SDS results were pooled to generate three categories: (a) SDS ≤ 3; (b) 4 ≤ SDS ≤ 7 and (c) SDS ≥ 8.14.
For left ventricular function, quantitative data were reported on Left Ventricular Ejection Fraction (LVEF) and End Diastolic Volume (EDV), while regional wall motion was reported based on visual assessment. Other qualitative, or visual, appraisals included the assessment of perfusion, classified as normal or abnormal. In this latter case, readers had to report presence of ischemia, scar or mixed patterns. Another parameter visually analysed was presence or absence of Transient Ischemic Dilation (TID). Both groups were also requested to provide an overall judgment about patients being at high risk or not (PHR).
Furthermore, we aimed at assessing the relationship between the overall judgment of the status of perfusion, either normal or abnormal, and uptake scores (SSS; SRS) as the sum of scores assigned to each single segment. To this purpose and to avoid the possibility that high SSS values could just be the result of the sum of mild defects scattered throughout the myocardial wall, not representing significant perfusion defects, we defined “hypoperfusion cluster” as the presence of a real perfusion defect, when two adjacent segments scored ≥2. Then, we assessed the relationship between SSS values and the number of hypoperfusion clusters identified in the polar maps.
To evaluate the inter-reader concordance of hypoperfusion assessments, SDS values were stratified into three categories, a) SDS ≤3; b) 4 ≤ SDS ≤ 7 and c) SDS ≥ 8.14 For each study, each group of readers (both experts and core participants), and for both phase 1 and 2, the rate of responses for each of the three different SDS categories, was evaluated. These three categories have been called “SDS strat.”
For phase 1 we also tested the consistency of quantitative data, such as LVEF and EDV, since they were calculated using different software. This evaluation was run only for phase 1, since in phase 2 participants were provided pre-processed studies. Variables LVEF post stress and LVEF rest were analyzed using univariate analysis of variance (ANOVA).
Finally, we tested the repeatability of LVEF values when different processing software was used. To avoid the increased risk of Type I errors because of the multiple simultaneous hypotheses being tested, we adjusted P values using the Bonferroni method.15
Statistical Analysis
For statistical analysis, data were collected on Excel spread sheets and analysed using the Statistical Package for Social Sciences (SPSS; IBM® SPSS® Statistics Release 24); For hypothesis testing, Student’s t-test, analysis of covariance (ANCOVA), ANOVA, and Chi-square test for proportions were used as appropriate, the latter for assessing difference in response rates between groups and phases. Intra-rater and inter-rater agreement were assessed:
by means of the intra-class correlation coefficient (ICC), for continuous measurements (EDV, LVEF, SSS, SRS, SDS). ICC is a measure of agreement that combines information on both the correlation and the systematic differences between readings16,17; using ICC, the level of agreement is classified into four categories
by means of the Fleiss’ kappa, for categorical variables (Function, Perfusion, TID, SDS strat, patient high risk). Using Fleiss’ kappa (κ) scores, the level of agreement is classified into seven categories.18,19,20
Values for either SSS and SDS reported from the two groups in phase 1, when MPI studies were supplied as raw data and each participant had to completely process and assess using their own software, were compared with those reported from phase 2, where studies were supplied pre-processed at the core lab and participants had to visually score segmental perfusion.
Results
For continuous variables (EDV; LVEF; SSS; SRS) ICC values and the corresponding concordance category are reported in Figure 3 and in Table 2.
Metrics for EDV and LVEF are assessed only for phase 1, as in phase 2 these data were already calculated at the Core lab. Expert readers showed an excellent level of agreement for all parameters in both phase 1 and 2, spanning from 0.85 for LVEF at rest to 0.94 for EDV post-stress. In phase 1, concordance levels for core participants were rated as good for all parameters (from 0.64 to 0.71), except for LVEF at rest and EDV post stress, which were rated as excellent (0.75 and 0.76, respectively). Interestingly, both parameters which were re-evaluated in phase 2, i.e. SSS and SRS, showed an increased level of concordance, up to 0.87 and 0.86 (excellent).
Fleiss’ kappa values for categorical variables are summarized in Figure 4 and Table 3, along with the significance of concordance. In this case, reports from phase 1 and 2 are compared for all variables. For those variables, categories of agreement for expert readers between the two phases remained almost unchanged, with the exception of TID, while core participants showed an increase for all variables.
Relationship between SSS values as reported from both experts and core participants and the number of “hypoperfusion clusters”, as derived from polar maps, is summarized in Figure 5. In more detail, Figures 5A and B represent results from experts in phases 1 and phase 2, respectively; while in Figures 5C and D the same analysis is reported for Core participants.
If we consider SSS mean values as a function of cluster number and then we determine a linear interpolation between the experimental data, we observe a tendency towards statistical significance (F=3.64 and p=0.057) for curve slopes only between phase 1 and phase 2 for core participants (Table 4).
As already described, based on SDS values, patients have been stratified (SDS-strat) as “low risk” (SDS ≤3); “intermediate risk” (4 ≤ SDS ≤ 7) and “high risk” (SDS ≥ 8), according to their SDS value. Analysing differences in risk stratification as described by SDS values between phases, we found that there is a significant difference for SDS strat between phases 1 and 2, for 3 studies out of 15 in the core participants group, and in 2 of 15 in experts.
As already described, participants were encouraged to analyze and report submitted studies according to their daily routine, including use of their cardiac software. Well aware of the possible impact on calculated values such as LVEF and EDV, information was also collected on type of cardiac software utilized. For both Core participants and Experts, the distribution of the different cardiac software available on the market is reported in Appendix (Table 6).
Evaluations on LVEF values included the following factors: Group (2 levels: Experts, Core Participants); cardiac software (5 levels: 4DMCardio; CedarsSinai; EmoryCardiacToolBox; InterView; Other); Case Study number (15 levels: patient studies 1-15). Changes in variables were assessed as a function of factors and interaction between factors themselves. Results are shown in Table 7 of the Appendix. There were significant differences in the LVEF values calculated both post-stress and at rest and for values calculated from the different types of software. The Bonferroni post-hoc analysis of multiple comparisons shows that one of the software packages (EmoryCardiacToolBox) systematically produces an LVEF value significantly lower than 4DMCardio, CedarsSinai, and Other software (range of differences: − 8.2% to − 10.8%); while no significant differences are found with the InterView software (see Table 8 in Appendix for details).
Overall, LVEF post-stress values are not significantly different between core participants and experts (Table 9 in Appendix). Average SD levels for the readings of core participants were about twice as high as the average SD levels for the experts group (10.4% vs 5.8%), a finding which was also expressed in the higher ICC for the latter group (Figure 3). Case 11 that caused relatively larger SD values in both core participants and experts readers groups (18.5 and 19.4, respectively) is represented in Figure 6.
Discussion
Often, in medical imaging, interpretation of results is subjective21,22,23,24,25,26,27,28,29,30 and can be influenced by technical considerations. Quality plays a pivotal role when analysing and reporting an imaging study. Several factors can affect the results of the analysis and the value of the studies. This is true for all modalities and in the case of SPECT MPI,31,32,33,34,35 which is the subject of this study, it is crucial to ensure that the acquisition and reconstruction parameters are consistent and optimized, thus allowing accurate and reproducible results.
Several factors, in different phases of the procedure, might influence the final results of MPI studies and require scrutiny. They include, but are not limited to, pre-examination checks, such as appropriateness of reference, QA/QC of equipment and radiopharmaceutical preparation, to steps to be taken during examination, such as QA/QC of acquisition parameters and of processing and reporting. We geared the I-MAP study towards assessing the quality of processing and reporting.
We examined the reliability of SPECT MPI studies using inter-observer variability within two groups of participants: one made of practitioners from LMICs, which are indeed the target of IAEA’s educational activities, and a second group of expert readers. The first group of “core participants” was composed of nuclear cardiology professionals who attended training events managed by the IAEA, many of them working in settings where financial resources might be limited, therefore with limited experience and limited resources for improving their expertise. As regards the study, it was run in two phases and in both of them participants had to report the same group of 15 cases, with the important difference that in phase 1 all participants were provided raw data and were requested to process them according to their routine practice and then report. In phase 2, all participants were given, in different order, the same 15 cases pre-reconstructed and were requested to provide their segmental uptake score, visually assessed, as well as other qualitative interpretations. Both groups were unaware that in phase 2 they were re-evaluating the same studies.
For quantitative data such as EDVs and LVEFs, an excellent level of concordance was found within both groups for both phase 1 and 2 (Table 2; Figure 3). Concordance was also excellent within the experts group for SSS and SRS values in both phases.
It’s very Interesting that, for the latter two parameters (SSS and SRS), core readers showed an excellent intra-group agreement in phase 2 when they had to provide their own evaluation on pre-processed images (0.87 and 0.86; for SSS and SRS respectively), while in phase 1, when they had to process the studies and scores were automatically calculated by their software, concordance was only good, being 0.66 and 0.64; for SSS and SRS respectively).
It should be remembered that while in phase 1 readers could accept segmental scores from their own software, or override if needed, in phase 2 scores had to be visually assessed and manually entered into the forms, therefore reflecting a qualitative rather than a semi-quantitative evaluation. Therefore, we relate this improvement to the central role of processing: when less experienced readers are presented with well processed studies and are forced to score perfusion status, their readings are as good as experts’ readings. This finding confirms that processing remains a crucial step for the overall SPECT MPI evaluation and that experience and training plays a major role for good quality processing. Furthermore, this finding tells us that, besides physicians who actually are those who read studies, IAEA training events should also involve technologists who often perform the processing.
Further confirmation of the importance of processing is found when we compare performances between the two groups for risk stratification. In this case, when we analysed differences between the experts panel and the core participants group, we have found that in phase 1 a significant difference could be seen in 2/15 cases, while no difference could be seen between the two groups for phase 2, when the core lab distributed pre-processed studies.
Fleiss’ kappa value is a rather stringent index, very sensitive to even small deviations between readers which may cause an important worsening of calculated values. In this study, it showed that experts, as expected, had a greater concordance in interpretation, in both phases of the study, while for core participants concordance improved significantly between phase 1 and 2. This finding holds true for both the analysis of continuous variables and for SSS and SRS indexes. Once more, this finding supports the notion that interpretation in itself is not the issue, but what is going to be interpreted is. When study processing is not properly carried out, then interpretation suffers.
A tendency of core participants to give an overall evaluation of “normal perfusion” even in presence of significant SSS values and hypoperfusion clusters was observed (Figure 5).
The greater variability in interpreting on-site processed images, as requested in phase 1, might well be affected by poor alignment of slices because of bad selection of left ventricular axes, valve planes and apex. So, while experts were able to minimize the impact of processing on the quality of images, this was not the case for core participants, who indeed markedly improved their performance when they were given studies which had been pre-processed at the core lab. Pre-processing included motion correction, careful slice realignment between stress and rest acquisitions, correct choice of slice thickness to avoid artefacts due to partial volume effect, and correct colour scale levelling in presence of extracardiac hot-spots such as sub-diaphragmatic activity.
Finally, we found, as reported by other groups36,37 that important parameters such as LVEF, calculated through gated SPECT, may differ significantly when different processing software packages are used, as shown in Table 8. One software deviates substantially and significantly from almost all the other software packages, with a systematic bias in LVEF of − 8.3% down to − 10.8% which could be clinically significant when LVEF is used in clinical decision making, such as in longitudinal studies of cardio-oncological patients.
The univariate analysis of variance for LVEF post-stress and LVEF at rest was run considering the different factors involved and their interactions. Results of that analysis reported in Table 7 also show significant differences for LVEF values calculated both post-stress and at rest, and for values calculated from the different types of software.
Overall, LVEF values are not significantly different between the two groups, core participants and experts, as shown in Table 9. A relatively wide SD shown for case #11 could be attributed to factors such as patient movement during acquisition (which could have been corrected for by readers), small heart with partial volume effect, hypertrophic left ventricular walls due to hypertension, and attenuation due to obesity (Figure 6).
New Knowledge Gained
This study has shown that the quality of processing remains a crucial step for SPECT MPI and that experience helps overcome possible artefacts that may hamper the quality of reporting. As concerns the IAEA, this study shows that the outcomes of training events in NC are satisfactory, as the performance of NC professionals from LMICs does not differ significantly from expert readers in many circumstances, and particularly when good quality processing was applied to clinical studies. This latter consideration supports the concept that training courses should necessarily cover basic issues such as study processing. In addition, this study shows that LVEF values may differ significantly depending on the cardiac package employed and this should be kept in mind particularly when patients are studied in different institutions or when an institution adopts a different software package.
Limitations of the Study
The small sample size of 24 participants from LMICs is a very low response rate for survey data, challenging the generalizability of findings. Furthermore, we don’t know to what extent “core participants” are representative of the reading pattern in LMICs. This is, however, unavoidable when dealing with centres from developing world because of difficult communication as well as technical problems affecting data transfers and report transmission, which may affect active participation.
One more important limit of the study design is the choice of not requiring participants to provide images along with reporting forms. This choice was made to minimize image transmission problems, but prevented full quality checks from being performed for the processed studies.
Conclusions
The quality of reporting SPECT MPI could be rated as moderate-to-good for participants from emerging economies and good-to-excellent for expert readers. It is clearly affected by the quality of processing. Indeed, when readers with less experience are asked to report on studies pre-processed at an experienced core lab and by professionals well-trained to avoid sources of artefacts, inter-observer agreement between readers with less experience improves substantially. To our knowledge, this is the first study reporting these findings.
Significant differences were found between LVEF values obtained using different software packages for cardiac analysis. This should be kept in mind particularly when patients are studied in different institutions or when an institution adopts a different software.
This study calls for attention from scientific societies on the issue of the quality of study processing, suggesting the need for more stringent guidelines about this aspect of NC practice.
Finally, these results suggest that the outcomes of training events conducted by the IAEA in NC are satisfactory. However, in order to improve the quality of processing, future training courses should necessarily cover this issue, and should also involve technologists.
Abbreviations
- IAEA:
-
International Atomic Energy Agency
- MPI:
-
Myocardial perfusion imaging
- LVEF:
-
Left ventricle ejection fraction
- SSS:
-
Summed stress score
- SRS:
-
Summed rest score
- SDS:
-
Summed difference score
- EDV:
-
End-diastolic volume
- TID:
-
Transient ischemic dilation
- CE:
-
Cardiac events
- PHR:
-
Patient high risk
References
https://www.iaea.org/. Last accessed 22 May 2018.
https://www.iaea.org/about/organizational-structure/department-of-nuclear-sciences-and-applications/division-of-human-health. Last accessed 22 May 2018.
https://www.iaea.org/topics/nuclear-medicine-and-diagnostic-imaging-section. Last accessed 22 May 2018.
https://sustainabledevelopment.un.org/sdg3. Last accessed on 08 June 2018.
http://www.who.int/nmh/en/. Last accessed 08 June 2018.
Pradeepa R, Prabhakaran D, Mohan V. Emerging economies and diabetes and cardiovascular disease. Diabetes Technol Ther. 2012;14(Suppl 1):S59-67. https://doi.org/10.1089/dia.2012.00.
Jaarsma C, Leiner T, Bekkers SC, Crijns HJ, Wildberger JE, Nagel E, Nelemans PJ. Schalla S Diagnostic performance of noninvasive myocardial perfusion imaging using single-photon emission computed tomography, cardiac magnetic resonance, and positron emission tomography imaging for the detection of obstructive coronary artery disease: A meta-analysis. J Am Coll Cardiol. 2012;59(19):1719-28.
Metz LD, Beattie M, Hom R, Redberg RF, Grady D. Fleischmann KE The prognostic value of normal exercise myocardial perfusion imaging and exercise echocardiography: A meta-analysis. J Am Coll Cardiol. 2007;49(2):227-37.
Hachamovitch R, Hayes SW, Friedman JD, Cohen I, Berman DS. Comparison of the short-term survival benefit associated with revascularization compared with medical therapy in patients with no prior coronary artery disease undergoing stress myocardial perfusion single photon emission computed tomography. Circulation. 2003;107(23):2900-7.
Dondi M, Andreo P. Developing nuclear medicine in developing countries: IAEA’s possible mission. Eur J Nucl Med Mol Imaging. 2006;33:514-5.
Casas-Zamora JA, Kashyap R. The IAEA technical cooperation programme and nuclear medicine in the developing world: Objectives, trends, and contributions. Semin Nucl Med. 2013;43(3):172-80. https://doi.org/10.1053/j.semnuclmed.2012.11.007.
https://humanhealth.iaea.org/HHW/NuclearMedicine/NUMDAB/index.html. Last accessed 22 May 2018.
Cerqueira MD, Weissman NJ, Dilsizian V, Jacobs AK, Kaul S, Laskey WK, et al. Standardized myocardial segmentation and nomenclature for tomographic imaging of the heart. Circulation. 2002;105:539-42.
IAEA Human Health Series No. 23 (Rev. 1) Nuclear Cardiology: Guidance on the Implementation of SPECT Myocardial Perfusion Imaging International Atomic Energy Agency, Vienna; 2016.
Armstrong RA. When to use the Bonferroni correction. Ophthalmic Physiol Opt. 2014;34(5):502-8. https://doi.org/10.1111/opo.12131.
Deyo RA, Diehr P, Patrick DL. Reproducibility and responsiveness of health status measures. Statistics and strategies for evaluation. Control Clin Trials. 1991;12(Suppl 4):142S-58S.
Hallgren KA. Computing inter-rater reliability for observational data: An overview and tutorial. Tutor Quant Methods Psychol. 2012;8(1):23-34.
Fleiss JL, Levin B, Paik MC. The measurement of interrater agreement. In: Fleiss JL, Levin B, Paik MC, editors. Statistical methods for rates and proportions. 3rd ed. New York: Wiley; 2003. p. 598-626.
Cicchetti DV. Guidelines, criteria and rules of thumb for evaluating normed and standardized assessment instruments in psychology. Psychol Assess. 1994;6(4):284-90.
Hartling L, Hamm M, Milne A, Vandermeer B, Lina Santaguida P, Ansari M, et al. Validity and inter-rater reliability testing of quality assessment instruments. Rockville, MD: Agency for Healthcare Research and Quality (US); 2012. Table B, Interpretation of Fleiss’ kappa (κ) (from Landis and Koch 1977).
Papadopoulou SL, Garcia-Garcia HM, Rossi A, Girasis C, Dharampal AS, et al. Reproducibility of computed tomography angiography data analysis using semiautomated plaque quantification software: Implications for the design of longitudinal studies. Int J Cardiovasc Imaging. 2013;29:1095-104. https://doi.org/10.1007/s10554-012-0167-5.
Herzog C, Kerl JM, De Rosa S, Tekin T, Boehme E, Liem S, et al. Influence of observer experience and training on proficiency in coronary CT angiography interpretation. Eur J Radiol. 2013;82:1240-7.
Taylor AJ, Patrick J, Abbara S, Berman DS, Halliburton SS, Hines JL, et al. Relationship between previous training and experience and results of the certification examination in cardiovascular computed tomography. JACC Cardiovasc Imaging. 2010;9:976-80.
Chauvela C, Abergel E, Renault L, Chatellier G, Cohen I, Attane C, et al. Improving stress echocardiography accuracy for detecting left circumflex artery stenosis: A new echocardiographic sign? Arch Cardiovasc Dis. 2012;105:196-202.
Kataoka A, Scherrer-Crosbie M, Senior R, Gosselin G, Phaneuf D, Guzman G, et al. The value of core lab stress echocardiography interpretations: Observations from the ISCHEMIA Trial. Cardiovasc Ultrasound. 2015;13:47.
Knight DS, Schwaiger JP, Krupickova S, Davar J, Muthurangu V, Coghlan JG, et al. Accuracy and test-retest reproducibility of two-dimensional knowledge-based volumetric reconstruction of the right ventricle in pulmonary hypertension. J Am Soc Echocardiogr. 2015;28:989-98.
Berman DS, Kang X, Gransar H, Gerlach J, Friedman JD, Hayes SW, et al. Quantitative assessment of myocardial perfusion abnormality on SPECT myocardial perfusion imaging is more reproducible than expert visual analysis. J Nucl Cardiol. 2009;16:45.
Nakajima K, Higuchi T, Taki J, Kawano M, Tonami N, et al. Accuracy of ventricular volume and ejection fraction measured by gated myocardial SPECT: Comparison of 4 software programs. J Nucl Med. 2001;42:1571-8.
Larghat AM, Maredia N, Biglands J, Greenwood JP, Ball SG, Jerosch-Herold M, et al. Reproducibility of first-pass cardiovascular magnetic resonance myocardial perfusion. J Magn Reson Imaging. 2013;37:865-74.
Meriki N, Izurieta A, Welsh A. Reproducibility of constituent time intervals of right and left fetal modified myocardial performance indices on pulsed Doppler echocardiography: A short report. Ultrasound Obstet Gynecol. 2012;39:654-8.
Wackers FJ. Artifacts in planar and SPECT myocardial perfusion imaging. Am J Card Imaging. 1992;6:42-57.
Germano G, Kavanagh PB, Waechter P, Areeda J, Van Kriekinge S, Sharir T, et al. A new algorithm for the quantitation of myocardial perfusion SPECT. I: Technical principles and reproducibility. J Nucl Med. 2000;41:712-9.
Ljungberg M, Pretorius PH. SPECT/CT: An update on technological developments and clinical applications. Br J Radiol. 2018;91(1081):20160402. https://doi.org/10.1259/bjr.20160402. Epub 2017 Jan 16.
Malek H, Yaghoobi N, Hedayati R. Artifacts in quantitative analysis of myocardial perfusion SPECT, using Cedars-Sinai QPS Software. J Nucl Cardiol. 2017;24(2):534-42. https://doi.org/10.1007/s12350-016-0726-6Epub 2016 Nov 10.
Chrysanthou-Baustert I, Polycarpou I, Demetriadou O, Livieratos L, Lontos A, Antoniou A, et al. Characterization of attenuation and respiratory motion artifacts and their influence on SPECT MP image evaluation using a dynamic phantom assembly with variable cardiac defects. J Nucl Cardiol. 2017;24:698-707. https://doi.org/10.1007/s12350-015-0378-yEpub 2016 Feb 4.
Foley TA, Mankad SV, Anavekar NS, Bonnichsen CR, Morris MF, Miller TD, et al. Measuring left ventricular ejection fraction—Techniques and potential pitfalls. Eur Cardiol. 2012;8(2):108-14.
Steyn R, Boniaszczuk J, Geldenhuys T. Comparison of estimates of left ventricular ejection fraction obtained from gated blood pool imaging, different software packages and cameras. Cardiovasc J Afr. 2014;25:44-9.
Acknowledgements
The authors are indebted to Mr Fabio Maiorana and Mr Felix Barajas-Ordonez; interns of the NMDI Section at IAEA, who were instrumental in collecting data and maintaining contacts with participants.
List of I-MAP Investigators Beretta M, Uruguay; Better N, Australia; Bouyoucef S, Algeria; Cabrera Rodríguez LO, Cuba; Chalal G, Algeria; Cittanti C, Italy; Cruz C, Venezuela; Cuocolo A, Italy; Girotto N, Croatia; Huong NT, Vietnam; Iqbal SS, Pakistan; Klaipetch A, Thailand; Marcassa C, Italy; Milan E, Italy; Mut Bastos F, Uruguay; Naïli Q , Algeria; Nanayakkara D, Sri Lanka; Obaldo J, Philippine; Ouattara TF, Burkina Faso; Padrón García KM, Cuba; Peix A, Cuba; Peña Y, Cuba; Poyraz NY, Turkey; Prpic M, Croatia; Rochela A, Cuba; Ruiz Castañeda DF, Colombia; Sciagra R, Italy; Scotti S, Italy; Sereegotov E, Mongolia; Sestini S, Italy; Sobic Saranovic D, Serbia; Spuler J, Chile; Thientunyakit T, Thailand; Vangu W, South Africa; Vitola J, Brazil; Vuleta G, Bosnia.
Disclosure
Andrew J. Einstein has received grants from GE Healthcare, Philips Healthcare, Toshiba America Medical Systems, and Roche Medical Systems; Maurizio Dondi, Carlo Rodella, Raffaele Giubbini, Luca Camoni, Ganesan Karthikeyan, Joao V. Vitola, Bertjan J. Arends, Olga Morozova, Thomas N. Pascual, and Diana Paez have nothing to disclose.
Author information
Authors and Affiliations
Consortia
Corresponding author
Additional information
The authors of this article have provided a PowerPoint file, available for download at SpringerLink, which summarizes the contents of the paper and is free for re-use at meetings and presentations. Search for the article DOI on SpringerLink.com.
The members of the I-MAP investigators are listed in Acknowledgements.
Electronic supplementary material
Below is the link to the electronic supplementary material.
Rights and permissions
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
About this article
Cite this article
Dondi, M., Rodella, C., Giubbini, R. et al. Inter-reader variability of SPECT MPI readings in low- and middle-income countries: Results from the IAEA-MPI Audit Project (I-MAP). J. Nucl. Cardiol. 27, 465–478 (2020). https://doi.org/10.1007/s12350-018-1407-4
Received:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s12350-018-1407-4