Introduction

Opioids are highly effective at relieving pain, making them among the most widely prescribed, yet deadliest, drugs used globally. Opioids are used to reduce pain in a variety of conditions, including cancer, rheumatic diseases, and surgery, and were estimated to be used by 60 million people, representing 1.2% of the global population, in 2020 (Anastasiou & Yazdany, 2022; Dalal & Bruera, 2019; Fiore et al., 2022). However, opioids are responsible for more than two thirds (69%) of drug overdose deaths worldwide, making them the most lethal class of drugs (Penington Institute, 2022; UNODC, 2022). Opioid overdose (OOD) is an acute clinical condition characterized by a triad of unconsciousness, miotic pupils, and respiratory depression, and it accounts for a large proportion of opioid-related deaths (Parthvi et al., 2019). OOD can occur in various settings, including misuse, chronic pain management, accidental exposure, and inpatient care (Algera et al., 2021; Bateman et al., 2023; Khanna et al., 2020; Madadi et al., 2013), and may involve pharmaceutical or non-pharmaceutical opioids (Ogeil et al., 2018; Velagapudi & Sethi, 2023).

The number of deaths due to OOD is increasing in Canada. Between January 2016 and June 2023, an estimated 40,642 deaths associated with the opioid crisis were reported (PHAC, 2023). From January to June 2023 alone, 3970 deaths apparently related to opioid intoxication occurred, corresponding to 22 deaths per day, much higher than the 8 and 12 deaths per day reported in 2016 and 2018, respectively. British Columbia, Alberta, and Ontario reported 89% of these deaths during the first 6 months of 2023. This public health crisis is of greatest concern among young to middle-aged (29–59 years old) men, who represented 72% of all deaths during the same period, and fentanyl was the leading opioid, accounting for 84% of all OOD deaths across Canada (PHAC, 2023).
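As a quick check of the daily rate quoted above, the sketch below divides the reported January to June 2023 count by the number of calendar days in that window. It assumes the reporting window runs from 1 January to 30 June 2023 inclusive; the variable names are illustrative only.

```python
from datetime import date

# Deaths reported from January to June 2023 (PHAC, 2023, as quoted in the text).
deaths_jan_to_jun_2023 = 3970

# Calendar days from 1 January to 30 June 2023, inclusive (assumed reporting window).
days = (date(2023, 6, 30) - date(2023, 1, 1)).days + 1

deaths_per_day = deaths_jan_to_jun_2023 / days
print(f"{days} days, {deaths_per_day:.1f} deaths per day")  # 181 days, 21.9 deaths per day
```

The result, about 21.9, rounds to the 22 deaths per day cited above.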

The International Classification of Diseases (ICD) is the most widely used system for recording morbidity and mortality data in electronic medical records (Lutomski et al., 2012; Walker et al., 2012; Zhu et al., 2022). However, ICD codes assigned by medical coders do not always accurately reflect the diagnoses made by clinicians or the procedures performed by medical staff (Liang et al., 2011; McGrew et al., 2020; Sarrazin & Rosenthal, 2012). Such misclassification may bias estimates of health condition prevalence based on administrative data (McGrew et al., 2020). This has been highlighted in particular for opioid use disorder, a chronic relapsing disorder involving the use of opioids, for which limitations of ICD-9/10 codes, including misclassification, low sensitivity, and underestimation of prevalence, have been reported (Hallgren et al., 2020; Ranapurwala et al., 2023; Roland et al., 2016; Zhu et al., 2022). In the case of OOD, misclassification of ICD codes may arise from incorrect or incomplete documentation by treating clinicians, or when other drugs are implicated as potential causes of the patient’s medical visit or death. ICD codes for fatal or non-fatal OOD cases and poisonings are numerous, and their assignment or processing can be challenging, especially when multiple substances are involved. In addition, the validity of ICD codes for classifying many conditions, including OOD, is often assessed by comparison with manual review of medical records, which is considered a reference standard (Chartash et al., 2019; Green et al., 2017, 2019; Ranapurwala et al., 2023; Rowe et al., 2017; Ward et al., 2023). However, manual chart review is itself likely imperfect, since reviewers’ expertise, training, and interpretation of what is written in the medical chart may differ (Gladstone et al., 2016; Green et al., 2017, 2019). Furthermore, the drug may have been tested for too late to be detected, and therefore not reported, even though it caused the medical visit; such cases would be missed by both medical record review and ICD coding, adding uncertainty to the reported validity of ICD codes.

Statistical methods such as Bayesian latent class models (BLCM) model test sensitivity and specificity, as well as the true disease prevalence, as unobserved latent parameters, while incorporating prior information on their values when available (Angelidou et al., 2014; Arango-Sabogal et al., 2018; Berman et al., 2019; Branscum et al., 2005). To date, no study has estimated the performance of ICD codes in classifying OOD cases without assuming that a gold standard exists. Adequate planning of the response to the growing opioid crisis requires accurate knowledge of the extent of OOD across the country and over time, which in turn requires that medical administrative data provide valid information on the frequency of OOD (Walker et al., 2012). With the increasing use of ICD codes to monitor drug overdose events (Coben et al., 2010; Rowe et al., 2017; Slavova et al., 2014; Xiang et al., 2012), a comprehensive examination of the available evidence is needed to determine whether ICD codes are a good tool for estimating and comparing OOD frequency.

The goal of this study was to systematically review published studies evaluating the validity of ICD algorithms, compared with any other source of information, for diagnosing OOD events in data from emergency departments, emergency medical services, inpatient and outpatient care, administrative databases, medical claims, and death reports, and to estimate the misclassification-adjusted sensitivity and specificity of ICD algorithms in identifying OOD-related events.

Methods

This systematic review was conducted according to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) checklist (Moher et al., 2009) (Online Resource 1). The study protocol was registered with PROSPERO (registration number CRD42023408943), and is available from https://www.crd.york.ac.uk/prospero/display_record.php?RecordID=408943.

Study search

The search strategy was developed by a trained health sciences librarian (SF) at Université de Montréal in collaboration with the researchers, and was implemented in the MEDLINE and EMBASE databases. Studies published in English, French, Italian, or Spanish were eligible. An initial search conducted by SET on May 6, 2023, yielded a limited number of relevant articles, only one of which was deemed eligible for inclusion. A new search was run by FINM on December 13, 2023, adding more general keywords to the previous search strategy. Both full search strategies are provided in Online Resource 2. We also performed manual snowball searches of each study eligible for inclusion in our review, as well as of review articles, to identify any additional relevant studies that our search strategy may have missed.

Study selection

All identified studies were uploaded into the Covidence web-based collaboration platform (Veritas Health Innovation Ltd, Melbourne, Australia) for the screening phase of the review.

First, titles and abstracts of the identified papers were screened using the following inclusion criteria: (1) the study population consisted of individuals who used emergency medical services (ambulances), visited an emergency room, were hospitalized or seen as outpatients, had medical claims created, or died; (2) the outcome of interest was fatal or non-fatal OOD-related events; and (3) participants were evaluated with at least two diagnostic tests to identify OOD-related events. We excluded editorials, comments, literature reviews, and letters to the editor. Titles and abstracts were screened by two independent reviewers (SET and FINM for the initial search, and FINM and APJY for the second search).

Second, the full texts of articles deemed eligible after screening were independently reviewed by the same two reviewers, using the following inclusion criteria: (1) ICD-9 or ICD-10 algorithms were used as one of the diagnostic tests for OOD-related events; and (2) the data presented in the article allowed construction of a two-by-two table (presence/absence of OOD-related events) of agreement between ICD algorithms and a reference standard or another test. We excluded studies that used a mixture of ICD and other classification system codes as the index test, because these did not allow classification by ICD codes alone to be isolated. Any disagreement between the reviewers was resolved by discussion until consensus was reached.

Data extraction

Two authors (FINM and APJY) independently extracted relevant data from included studies using a standardized form for study evidence synthesis. Discrepancies and unclear issues were addressed in discussion with a third reviewer (HC). In the event of missing data, we contacted the authors of the study to obtain additional information. Extracted data included study setting, study population and its demographic characteristics, data sources used, study methodology, outcomes and measurement times, definition of the outcome according to the ICD codes and the reference standard used, agreement between the two tests with the contents of the 2 × 2 table, and measures of validity of ICD codes versus the reference standard.

Study quality assessment

The two primary reviewers (FINM and APJY) independently assessed the risk of bias in the selected studies using the QUADAS-2 (Quality Assessment of Diagnostic Accuracy Studies 2) tool (Whiting et al., 2011), which covers four key domains: patient selection; the conduct or interpretation of the index test (i.e. ICD codes); the conduct or interpretation of the reference standard or other comparison test; and flow and timing, including the way in which missing data were handled and their potential impact on the study results. Risk of bias was judged “low”, “high”, or “unclear” based on the answers to the signaling questions in each of the four QUADAS-2 domains. If all signaling questions for a domain were answered “yes”, risk of bias was judged low. If any signaling question was answered “no”, risk of bias was judged high. If any signaling question was answered “unclear”, risk of bias was judged unclear. The second signaling question of domain 2 (index test), relating to the index test threshold, was not considered, since the outcome of interest of the review was binary (presence/absence of an OOD event). Both reviewers also assessed the applicability of each included study to our review, and applicability concerns for the first three QUADAS-2 domains were judged “low”, “high”, or “unclear” (Whiting et al., 2011). Disagreements between the two reviewers about risk of bias or applicability concerns were resolved through discussion with a third reviewer (HC). The instructions used to answer the QUADAS-2 signaling questions were adapted from McGrew et al. (2020) (Online Resource 3).
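The domain-level decision rule described above can be written as a small function. This is a sketch, not the reviewers’ actual tooling: the function name and the example answers are hypothetical, and it assumes that “no” takes precedence when a domain contains both “no” and “unclear” answers, a case the rule as stated does not settle.

```python
from typing import Iterable

VALID_ANSWERS = {"yes", "no", "unclear"}


def domain_risk_of_bias(answers: Iterable[str]) -> str:
    """Judge one QUADAS-2 domain from its signaling-question answers.

    All "yes" -> low; any "no" -> high; any "unclear" -> unclear.
    Assumption: "no" takes precedence over "unclear" when both occur.
    """
    normalized = [a.strip().lower() for a in answers]
    if not normalized:
        raise ValueError("a domain needs at least one signaling question")
    unexpected = set(normalized) - VALID_ANSWERS
    if unexpected:
        raise ValueError(f"unexpected answers: {sorted(unexpected)}")
    if "no" in normalized:
        return "high"
    if "unclear" in normalized:
        return "unclear"
    return "low"


# Hypothetical answers for one study; the threshold question of domain 2
# is simply omitted, mirroring its exclusion in this review.
study_answers = {
    "patient selection": ["yes", "yes", "yes"],
    "index test": ["yes"],
    "reference standard": ["unclear", "yes"],
    "flow and timing": ["yes", "unclear", "yes", "yes"],
}
judgments = {domain: domain_risk_of_bias(a) for domain, a in study_answers.items()}
print(judgments)
```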

Data synthesis and analysis

Characteristics of the selected studies were described and presented in a table. No meta-analysis was performed because of the small number (n = 3) and heterogeneity of the included studies.

We used BLCM to estimate misclassification-adjusted sensitivity and specificity of ICD-10 algorithms and coroner’s report review (CRR) for the one study (Gladstone et al., 2016) with low risk of bias in all four QUADAS-2 domains, as recommended (Whiting et al., 2011).

Traditionally, the sensitivity and specificity of a diagnostic test are assessed against a reference test assumed to classify disease status perfectly (i.e. without error) (Cheung et al., 2021; Collins & Huynh, 2014). When the reference test is imperfect, estimates of sensitivity and specificity can be biased (Collins & Huynh, 2014), resulting in inaccurate estimates of disease prevalence. BLCM can adjust for such misclassification error by treating each subject’s disease status as latent (existing but unobserved) and estimating the probability that each subject has the disease conditional on the observed diagnostic test results and on prior information about test accuracy and disease prevalence (Berman et al., 2019; Cheung et al., 2021). A key requirement of BLCM is the availability of reliable prior values and a justification for their distributions, since prior information can influence the results (Kostoulas et al., 2017).
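The core of a two-test BLCM is the probability of each cell of the 2 × 2 table when the two tests are conditionally independent given true disease status. The sketch below writes out those cell probabilities, the resulting multinomial log-likelihood, and the per-subject probability of disease given both test results. Function names and parameter values are illustrative assumptions, not estimates from this review.

```python
import math


def cell_probabilities(prev, se1, sp1, se2, sp2):
    """Probabilities of the result patterns (T1+T2+, T1+T2-, T1-T2+, T1-T2-)
    for two tests assumed conditionally independent given true disease status."""
    pp = prev * se1 * se2 + (1 - prev) * (1 - sp1) * (1 - sp2)
    pn = prev * se1 * (1 - se2) + (1 - prev) * (1 - sp1) * sp2
    np_ = prev * (1 - se1) * se2 + (1 - prev) * sp1 * (1 - sp2)
    nn = prev * (1 - se1) * (1 - se2) + (1 - prev) * sp1 * sp2
    return pp, pn, np_, nn


def log_likelihood(counts, prev, se1, sp1, se2, sp2):
    """Multinomial log-likelihood of observed 2x2 counts, up to an additive constant."""
    total = 0.0
    for n, p in zip(counts, cell_probabilities(prev, se1, sp1, se2, sp2)):
        if n == 0:
            continue
        if p <= 0.0:
            return -math.inf  # an observed pattern that these parameters call impossible
        total += n * math.log(p)
    return total


def prob_diseased(t1_pos, t2_pos, prev, se1, sp1, se2, sp2):
    """Posterior probability of true disease for one subject, given both test results."""
    diseased = prev * (se1 if t1_pos else 1 - se1) * (se2 if t2_pos else 1 - se2)
    healthy = (1 - prev) * ((1 - sp1) if t1_pos else sp1) * ((1 - sp2) if t2_pos else sp2)
    return diseased / (diseased + healthy)


# Illustrative parameter values only (not estimates from this review).
probs = cell_probabilities(0.01, 0.90, 0.99, 0.85, 0.995)
print([round(p, 5) for p in probs], round(sum(probs), 10))
print(round(prob_diseased(True, True, 0.01, 0.90, 0.99, 0.85, 0.995), 4))
```

A single 2 × 2 table has only three free cell probabilities, whereas this model has five parameters (prevalence plus two sensitivities and two specificities), so the data alone cannot identify them. This is one reason the prior distributions carry so much weight in a one-population, two-test BLCM.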

In the study by Gladstone et al. (2016), five ICD-10 algorithms were each used as an independent index test against CRR as reference standard, to identify prescription opioid–related deaths (POD) among all drug- or alcohol-related deaths. CRR was considered an imperfect test given the lack of standards for coroners’ forensic investigation or classification of deaths, and the absence of a nationally recognized training program or credentialing system for coroners in Canada (Kelsall & Bowes, 2016). Thus, the certification of some deaths by different coroners may not be uniform and may be subject to misclassification (McLean, 2017; Parai et al., 2006).

We were unable to find prior information on the performance of either CRR or ICD-10 codes in identifying POD. We therefore created 12 scenarios combining uniform prior distributions for the sensitivity and specificity of CRR and the ICD-10 algorithms, and we present the two extreme scenarios in terms of prior values. Scenario 1 used more informative priors, with uniform distributions on (0.75–1.00) for the sensitivity and on (0.90–1.00) for the specificity of CRR and all ICD-10 algorithms. Scenario 2 used vague uniform distributions on (0.00–1.00) for the sensitivity and specificity of CRR and all ICD-10 algorithms. The other ten sets of priors gave similar results and are not presented. In all scenarios, the prior distribution of the true prevalence was loosely based on a Finnish study (2000–2004) of medico-legally autopsied deaths in which blood analysis showed evidence of drug use (Lahti et al., 2009). That study found that 10.2% of these deaths in 2004 had evidence of opiates in the blood. The population in Gladstone et al. (2016) was at lower risk of opioid-related death, because it included all drug- and alcohol-related deaths and the outcome was limited to prescription opioids. We therefore assumed a uniform prior distribution between 0% and 10.2% for the true prevalence.
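The two reported prior scenarios can be encoded as parameter bounds. The sketch below does this and evaluates the log-density of independent uniform priors for a candidate parameter set; the parameter labels and the candidate values are hypothetical, while the bounds are the ones stated above.

```python
import math

# Uniform prior bounds (lower, upper) for the two scenarios reported in the text.
# Parameter names are hypothetical labels chosen for this sketch.
PREV_BOUNDS = (0.0, 0.102)  # true prevalence, same in both scenarios

SCENARIOS = {
    "scenario 1 (more informative)": {
        "prev": PREV_BOUNDS,
        "se_crr": (0.75, 1.0), "sp_crr": (0.90, 1.0),
        "se_icd": (0.75, 1.0), "sp_icd": (0.90, 1.0),
    },
    "scenario 2 (vague)": {
        "prev": PREV_BOUNDS,
        "se_crr": (0.0, 1.0), "sp_crr": (0.0, 1.0),
        "se_icd": (0.0, 1.0), "sp_icd": (0.0, 1.0),
    },
}


def log_prior(params, bounds):
    """Log density of independent uniform priors: -log(width) inside the box, -inf outside."""
    total = 0.0
    for name, (lo, hi) in bounds.items():
        if not lo <= params[name] <= hi:
            return -math.inf
        total -= math.log(hi - lo)
    return total


# Hypothetical candidate values, chosen so that se_crr falls outside scenario 1.
candidate = {"prev": 0.006, "se_crr": 0.70, "sp_crr": 0.999, "se_icd": 0.88, "sp_icd": 0.998}
for label, bounds in SCENARIOS.items():
    print(label, log_prior(candidate, bounds))
```

Because a uniform log-density is constant inside its bounds, these priors act only by truncating the parameter space; within the bounds, the shape of the posterior comes entirely from the likelihood.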

We assumed conditional independence between the two tests, because prescription opioid–related death diagnoses by ICD-10 codes and by CRR were extracted from two separate databases and were based on different criteria. The BLCM were run for 600,000 iterations with a burn-in of 3000 iterations. We report the medians and 95% Bayesian credible intervals (95% BCI) of the posterior distributions of the sensitivity and specificity of the two tests. Convergence of the Markov chain Monte Carlo sampling was assessed by visual inspection of trace plots and using the Gelman-Rubin diagnostic (Gelman et al., 1996). Analyses were performed in R using JAGS (Just Another Gibbs Sampler) software (code presented in Online Resource 4).
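The sampling and summary steps described above can be illustrated without JAGS. The sketch below swaps in a plain random-walk Metropolis sampler (not the Gibbs sampler JAGS uses) for the conditionally independent two-test model with the scenario 1 bounds, discards a 3000-iteration burn-in, and reports posterior medians, equal-tailed 95% credible intervals, and the Gelman-Rubin statistic across three chains. The 2 × 2 counts, proposal step sizes, and the shorter run length (20,000 rather than 600,000 iterations) are hypothetical choices for illustration only.

```python
import math
import random
import statistics

# Hypothetical 2x2 counts (ICD+/CRR+, ICD+/CRR-, ICD-/CRR+, ICD-/CRR-), for illustration only.
COUNTS = (450, 40, 80, 9430)
NAMES = ("prev", "se_icd", "sp_icd", "se_crr", "sp_crr")
# Scenario 1 uniform prior bounds from the text, in NAMES order.
BOUNDS = ((0.0, 0.102), (0.75, 1.0), (0.90, 1.0), (0.75, 1.0), (0.90, 1.0))
STEP = (0.003, 0.02, 0.002, 0.02, 0.002)  # random-walk proposal SDs (tuning assumption)


def log_post(theta):
    """Unnormalized log posterior: uniform priors times the conditionally independent likelihood."""
    if any(not lo < x < hi for x, (lo, hi) in zip(theta, BOUNDS)):
        return -math.inf
    p, se1, sp1, se2, sp2 = theta
    probs = (
        p * se1 * se2 + (1 - p) * (1 - sp1) * (1 - sp2),
        p * se1 * (1 - se2) + (1 - p) * (1 - sp1) * sp2,
        p * (1 - se1) * se2 + (1 - p) * sp1 * (1 - sp2),
        p * (1 - se1) * (1 - se2) + (1 - p) * sp1 * sp2,
    )
    return sum(n * math.log(q) for n, q in zip(COUNTS, probs))


def run_chain(n_iter, burn_in, seed):
    """Random-walk Metropolis; returns the draws kept after burn-in."""
    rng = random.Random(seed)
    theta = [(lo + hi) / 2 for lo, hi in BOUNDS]  # start at the centre of each prior
    current = log_post(theta)
    kept = []
    for i in range(n_iter):
        proposal = [x + rng.gauss(0.0, s) for x, s in zip(theta, STEP)]
        candidate = log_post(proposal)
        # Symmetric proposal, so accept with probability min(1, posterior ratio).
        if math.log(1.0 - rng.random()) < candidate - current:
            theta, current = proposal, candidate
        if i >= burn_in:
            kept.append(tuple(theta))
    return kept


def gelman_rubin(chains):
    """Potential scale reduction factor (R-hat) for one scalar parameter."""
    n = len(chains[0])
    between = n * statistics.variance([statistics.fmean(c) for c in chains])
    within = statistics.fmean(statistics.variance(c) for c in chains)
    pooled_var = (n - 1) / n * within + between / n
    return math.sqrt(pooled_var / within)


def summarize(draws):
    """Posterior median and equal-tailed 95% credible interval."""
    cuts = statistics.quantiles(draws, n=40, method="inclusive")
    return statistics.median(draws), cuts[0], cuts[-1]


chains = [run_chain(20_000, 3_000, seed) for seed in (1, 2, 3)]
for j, name in enumerate(NAMES):
    per_chain = [[draw[j] for draw in chain] for chain in chains]
    med, lo, hi = summarize([x for c in per_chain for x in c])
    print(f"{name}: median {med:.3f} (95% BCI {lo:.3f}-{hi:.3f}), R-hat {gelman_rubin(per_chain):.3f}")
```

An R-hat close to 1 indicates that between-chain variation is small relative to within-chain variation. In an R and JAGS workflow, the same diagnostic is typically obtained from `gelman.diag` in the coda package rather than written by hand.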

Results

Study selection

Figure 1 presents the PRISMA flowchart of the study selection process. We identified a total of 1990 unique studies through the database search, of which three were included after the selection process. No studies were identified through manual snowball searching.

Fig. 1

PRISMA flow diagram illustrating selection process of relevant articles. ICD, International Classification of Diseases; SNOMED CT, Systematized Nomenclature of Medicine Clinical Terms

Characteristics of included studies

Characteristics of the three included studies are summarized in Table 1. The first study (Rowe et al., 2017) was conducted in the United States and used data from 3203 emergency department visits between 2012 and 2014 made by 804 patients (mean age 56 years, 58.1% male) who were prescribed opioids at least daily for at least 3 consecutive months. The outcome was OOD events, assessed using several ICD-9 algorithms as index tests, with medical chart review, conducted by four research staff including one physician and one nurse practitioner, as the reference standard.

Table 1 Characteristics of the three studies included in the review

The second study (Green et al., 2019) included administrative, clinical, inpatient, outpatient, and claims information on 872 events (development dataset; mean age 48.8 years, 39.5% male) and 1136 events (validation dataset; mean age 46.9 years, 39.6% male) experienced by 845 and 1100 members of Kaiser Permanente Northwest in Oregon and southwest Washington State, respectively, who were suspected of or at risk of OOD between 2008 and 2014. Clinical care information from non-Kaiser Permanente Northwest settings was also considered. As in the previous study, the outcome was OOD events, and the reference standard was medical chart review, conducted independently by two trained professionals for all eligible charts. The initial index test algorithms combined ICD-9 codes for non-fatal and fatal events and ICD-10 codes for fatal events.

The last study (Gladstone et al., 2016) was conducted in Canada and used data on 88,550 drug- or alcohol-related deaths among Ontario residents between 2003 and 2010. The outcome was POD, identified using five ICD-10 algorithms applied to the Statistics Canada Vital Statistics Databases for Ontario as index tests, with CRR of the Chief Coroner Database for Ontario, conducted by two independent extractors, as the reference standard.

Accuracy of ICD algorithms in identifying OOD

Table 2 presents the estimates of sensitivity and specificity of ICD algorithms assuming that the reference standard is perfect, as well as the 2 × 2 table contents of agreement between the two tests, and the outcome prevalence based on the reference standard in the eligible studies. The three studies varied in the ICD algorithms, the version of ICD (9 and 10), and the reference standards.
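For readers reconstructing figures like those in Table 2, the sketch below computes naive sensitivity, specificity, and reference-based prevalence from a 2 × 2 table, treating the reference standard as perfect. The Wilson score interval is our own choice for illustration; the included studies may have used other interval methods, and the counts are hypothetical rather than taken from any included study.

```python
import math


def wilson_interval(successes, total, z=1.96):
    """Approximate 95% Wilson score interval for a binomial proportion."""
    p = successes / total
    denom = 1 + z**2 / total
    centre = (p + z**2 / (2 * total)) / denom
    half_width = z * math.sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    return centre - half_width, centre + half_width


def accuracy_vs_reference(tp, fp, fn, tn):
    """Naive accuracy of an index test, treating the reference standard as perfect.

    tp: index+/reference+, fp: index+/reference-, fn: index-/reference+, tn: index-/reference-.
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    prevalence = (tp + fn) / (tp + fp + fn + tn)  # prevalence according to the reference
    return {
        "sensitivity": (sensitivity, wilson_interval(tp, tp + fn)),
        "specificity": (specificity, wilson_interval(tn, tn + fp)),
        "reference_prevalence": prevalence,
    }


# Hypothetical 2x2 table, not taken from any included study.
result = accuracy_vs_reference(tp=30, fp=20, fn=10, tn=2940)
print(result)
```

With only 40 reference-positive cases, the sensitivity interval is wide (roughly 0.60 to 0.86), while the specificity interval is narrow, which mirrors why sensitivity estimates are imprecise when the outcome is rare.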

Table 2 Estimates (95% confidence interval) of sensitivity and specificity and agreement of ICD algorithms (index test) versus reference standard (RS), and prevalence of opioid overdose–related events in the three studies included in the review

Compared with medical chart review (Rowe et al., 2017), the six primary ICD-9 algorithms showed low sensitivity, ranging from 25% (95% confidence interval 13.6–37.8) for the more precise opioid-poisoning ICD-9 algorithm to 56.8% (43.6–72.7) when the initial algorithm was expanded to include unspecified and general drug poisoning and drug abuse codes. Specificity was highest with the more targeted ICD-9 algorithm (99.9%) and decreased to 96.2% with the inclusion of less specific codes. The prevalence of OOD events estimated from the reference standard was very low (1.37%).

Green et al. (2019) also used medical chart review as the reference standard, but in a population where the prevalence of OOD events based on the reference standard was very high (48.5–53.26%). The combination of ICD-9 and ICD-10 codes for poisoning by opioids and related narcotics as an index test showed higher sensitivity (over 97%) in diagnosing OOD events, whereas specificity was 88.9% (85.6–91.6) and 84.6% (81.3–87.5) in the development and validation datasets, respectively.

Finally, Gladstone et al. (2016) reported moderate to good sensitivity of ICD-10 algorithms, ranging from 72% (68–75%) to 89% (87–92%), and consistently excellent specificity (99%) in identifying POD, assuming CRR is a gold standard. The prevalence of POD detected by CRR was low (0.61%).

Study quality assessment

Table 3 shows the assessment of bias and applicability of the eligible studies. No applicability concerns were identified for any of the three included studies. The risk of bias was judged “low” in all four QUADAS-2 domains in one study (Gladstone et al., 2016), and only in the patient selection and index test domains in the other two studies (Green et al., 2019; Rowe et al., 2017). In Rowe et al. (2017), the medical chart review was performed by more than one person and no information on inter-reader agreement was reported, so it was unclear whether the reference standard was likely to correctly classify OOD events or whether all patients received the same reference standard; risk of bias was therefore scored “unclear” in both the reference standard and the flow and timing domains. In Green et al. (2019), risk of bias was scored “high” in the reference standard domain, because Kaiser Permanente Northwest chart auditors were provided with inclusion diagnoses derived from the code-based algorithm, suggesting that the reference standard results were interpreted with knowledge of the index test results. As in Rowe et al. (2017), Green et al. (2019) did not report inter-reader agreement for the reference standard, so risk of bias was scored “unclear” in the flow and timing domain.

Table 3 Assessment of bias and applicability for eligible studies using QUADAS-2 tool

Estimation of validity of ICD-10 codes and CRR in diagnosing POD

Table 4 reports BLCM estimates of the sensitivity and specificity of each of the five ICD-10 algorithms and CRR in diagnosing POD. Sensitivity estimates for the two tests did not vary substantially with the prior values used, but rather with the ICD-10 algorithm. The median sensitivity (95% BCI) of ICD-10 algorithms 1 to 4 was similar, ranging from 84.4% (71.0–99.2) to 87.1% (75.8–99.3), while that of ICD-10 algorithm 5 was higher (94.5% (88.6–99.7)). For CRR, sensitivity estimates were similar when CRR was compared with ICD-10 algorithms 1 to 4, ranging from 93.3% (86.3–99.7) to 95.4% (90.1–99.8), but were lower when CRR was compared with ICD-10 algorithm 5, ranging from 83.6% (69.8–99.1) with the least informative priors to 86.6% (75.5–99.3) with the most informative priors. The sensitivity estimates of both tests were imprecise because of the rarity of the outcome. In contrast, the median specificity estimates of both tests were similar, near-perfect (median values ≥ 99.8%), and highly precise (minimum 95% BCI lower limit of 99.8%).

Table 4 Posterior medians and 95% Bayesian credible intervals of sensitivity and specificity of ICD-10 algorithms and coroner’s report review (CRR) in diagnosing prescription opioid–related deaths using Bayesian latent class models

Discussion

To our knowledge, this study is the first to systematically review published articles assessing the validity of ICD codes for OOD events, and the first to estimate the sensitivity and specificity of ICD codes for identifying POD in the absence of a gold standard. Our review showed that very few studies have examined the validity of ICD codes as diagnostic tools for OOD-related events, as previously observed for illicit drugs (McGrew et al., 2020). The included studies were highly heterogeneous with regard to study population, data sources, ICD revision (9th and/or 10th) and algorithms, specific OOD events considered, and reference standards. This heterogeneity explains the variation in sensitivity and specificity, and calls for cautious interpretation of OOD frequency estimates based on ICD codes in administrative data, taking into account the specific context of each study.

Despite this heterogeneity, one common finding is the good specificity of ICD algorithms, suggesting that ICD codes may be useful for documenting the absence of OOD cases in administrative data, at least when the outcome is relatively common. Conversely, the sensitivity of ICD codes for correctly detecting OOD cases varied widely between studies and between ICD algorithms within a study. When used alone, ICD-9 algorithms were poorly sensitive in detecting OOD events, but sensitivity increased as codes were added to the algorithms, at the expense of lower specificity (Rowe et al., 2017). Compared with ICD-9 algorithms, ICD-10 algorithms alone showed moderate to good sensitivity (Gladstone et al., 2016), but the ICD-9 and ICD-10 sensitivity estimates reported in these two studies are uncertain because OOD events were uncommon in both study populations. In contrast, the combination of ICD-9 and ICD-10 codes (Green et al., 2019) yielded higher and more precise sensitivity estimates, but the study population was already under surveillance for addiction, and the presumed true prevalence of OOD events was very high, at about 50%.

Results from Rowe et al. (2017) and Green et al. (2019), although they require cautious interpretation because of suspected risk of bias, highlight the challenge of obtaining precise sensitivity estimates for ICD codes at the population level. A precise estimate of ICD code sensitivity requires a setting where the outcome is relatively frequent, but in such a setting the search for OOD is likely to be more extensive than in a lower-risk population. A study in the general population would require a very large sample to obtain a sufficient number of positive tests, which may not be feasible. In addition, improved sensitivity generally came at the cost of lower specificity, leading to overestimation of prevalence when ICD codes are used. However, the good sensitivity and excellent specificity estimates of algorithm 5, when CRR is assumed perfect (Gladstone et al., 2016), would result in less bias in estimating POD prevalence than the other algorithms and studies across a range of contexts in terms of the frequency of OOD-related events.
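The effect of imperfect sensitivity and specificity on a prevalence estimate follows from a standard identity: the apparent (test-based) prevalence equals p·Se + (1 − p)(1 − Sp). The sketch below applies it to the Rowe et al. (2017) figures quoted earlier, treating the reported sensitivities and specificities as if they were the true values (an assumption, since they were estimated against an imperfect reference), and inverts it with the Rogan–Gladen correction, a standard estimator swapped in here for illustration.

```python
def apparent_prevalence(true_prev, se, sp):
    """Expected proportion flagged by an imperfect test: true positives plus false positives."""
    return true_prev * se + (1 - true_prev) * (1 - sp)


def rogan_gladen(apparent, se, sp):
    """Invert apparent_prevalence to recover the true prevalence (requires se + sp > 1)."""
    if se + sp <= 1:
        raise ValueError("correction is undefined unless se + sp > 1")
    return (apparent + sp - 1) / (se + sp - 1)


true_prev = 0.0137  # reference-standard prevalence reported by Rowe et al. (2017)
for label, se, sp in [("targeted ICD-9", 0.25, 0.999), ("expanded ICD-9", 0.568, 0.962)]:
    ap = apparent_prevalence(true_prev, se, sp)
    print(f"{label}: apparent {ap:.4f}, corrected {rogan_gladen(ap, se, sp):.4f}")
```

With a rare outcome, the expanded algorithm’s 3.8% false-positive rate alone more than triples the apparent prevalence (about 4.5% versus 1.37%), whereas the targeted algorithm underestimates it (about 0.44%). The correction only works when Se and Sp are known, which is the information BLCM aims to supply.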

This makes BLCM crucial for estimating the performance of ICD codes in the absence of a gold standard and, ultimately, for adjusting for misclassification error when estimating the frequency of OOD-related events. Our BLCM analyses allowed us to estimate the sensitivity and specificity of ICD-10 algorithms and CRR in diagnosing POD without assuming a perfect test. Estimates from our models showed that CRR was not a perfect diagnostic test for POD. Regardless of prior values, the specificity estimates of the ICD-10 algorithms from our models were similar to those reported by the authors assuming that CRR is perfect. Given the large relative weight of negative cases (which account for over 99% of the 88,550 deaths) and the use of uniform priors in our models, the posterior distribution for specificity was largely driven by the observed data.

As for specificity, the estimated sensitivities of the ICD-10 algorithms from our models remained largely consistent regardless of the prior values assigned. In contrast, the ICD-10 sensitivity estimates reported by the authors were systematically lower than those from our models. This result demonstrates the relevance of controlling for misclassification error when evaluating the validity of imperfect diagnostic tests and may support the use of ICD-10 codes for detecting POD. The variability in sensitivity across ICD-10 algorithms suggests the need to standardize OOD case definitions, which would improve the quality and comparability of OOD occurrence data across the country and over time, for better service planning and research (Vivolo-Kantor et al., 2021). It is also essential that more studies evaluate ICD codes for OOD, to provide better prior values for specific ICD algorithms, and that public health estimates of OOD prevalence use such priors to adjust for ICD algorithm misclassification bias.

Strengths and limitations

The strengths of this study include the conduct of a comprehensive review of published studies on the validity of ICD codes for OOD, and rigorous assessment of the quality of eligible studies using the QUADAS-2 tool. Our review is the first study to use BLCM for estimating the misclassification-adjusted sensitivity and specificity of both ICD-10 codes and CRR in identifying POD in Canada. Our estimates of the sensitivity and specificity of ICD-10 codes and CRR can therefore be used as priors in other studies, whether in Canada or in other countries where the context of our study could be generalized, although the information on sensitivity estimates remains uncertain. Due to the lack of prior information on the sensitivity and specificity of both ICD-10 codes and CRR in diagnosing POD, we considered several scenarios of uniform prior distributions, providing greater insight into plausible variations in test sensitivity and specificity estimates as a function of the prior values used.

Several limitations should be noted. The three included studies were based on specific populations at higher risk of OOD than the general population, including people under routine medical care, whether for long-term opioid use for pain management (Rowe et al., 2017) or for comprehensive inpatient and outpatient care including addiction and mental health treatment (Green et al., 2019), as well as alcohol- or drug-related deaths that underwent forensic evaluation by a coroner (Gladstone et al., 2016). The data reported in these studies may differ from national OOD data, which would include OOD events in any individual; this limits the generalizability of the reported ICD code sensitivity and specificity estimates to the general population and calls for caution in their interpretation. We found a high or unclear risk of bias in two QUADAS-2 domains in two of the three included studies, which also calls for caution in interpreting the reported ICD code validity estimates. In addition, we were unable to perform a meta-analysis to estimate summary measures of ICD code sensitivity and specificity because of the small number of eligible studies and their substantial heterogeneity, as noted above. Finally, the sensitivity estimates of the ICD-10 algorithms and CRR in the absence of a gold standard are imprecise because of the very low prevalence of POD (Gladstone et al., 2016) and the use of uniform prior distributions for all parameters.

Conclusion

As the opioid crisis expands across Canada and around the world, the ICD coding system is increasingly emerging as an essential tool for tracking and monitoring opioid overdose events in administrative data from emergency departments, emergency services, hospitalizations, outpatient visits, or claims. Our review highlighted the paucity of studies on the validity of ICD codes in the diagnosis of OOD events. Estimates of the misclassification-adjusted sensitivity and specificity of ICD codes support the usefulness of relying on ICD codes as diagnostic tools for identifying POD in Canada and provide prior information to better assess the validity of ICD codes for OOD in similar populations.

Contributions to knowledge

What does this study add to existing knowledge?

  • Evidence on the validity of ICD codes for opioid overdose (OOD) is scarce, and none is available for the general population.

  • ICD-10 algorithm sensitivity estimates in classifying prescription opioid–related deaths (POD) in the absence of a gold standard are moderate and would result in the omission of some truly positive individuals.

  • Moderate sensitivity and excellent specificity estimates of ICD-10 algorithms would result in only minimal bias in estimating the prevalence of POD where this event is rare, whereas the impact could differ in populations where opioid use is more frequent.

What are the key implications for public health interventions, practice, or policy?

  • Public health programs responding to the opioid crisis in Canada can rely on ICD-10 algorithms as diagnostic tools for POD, and should encourage further studies of the validity of ICD codes for OOD in various population strata, so that misclassification error can be adjusted for when estimating population-level frequency.