Methamphetamine and amphetamine (MA/A) use poses a growing global health concern, affecting approximately 34 million individuals worldwide (UNODC, 2022). Rates of methamphetamine use vary across regions, with recent years showing an increase in North America, East and Southeast Asia, and some Middle Eastern countries (UNODC, 2022). MA/A use disorders are associated with several acute and chronic health risks, such as overdose, infections, mental health disorders, and cognitive decline, and impose significant economic and social burdens (Cumming et al., 2020; Han et al., 2021; Marshall & Werb, 2010). Management strategies for MA/A use disorders include psychosocial, pharmacological, and harm reduction interventions (Chan et al., 2019; Han et al., 2021; Roll et al., 2006). While psychosocial interventions aim to modify behaviour (De Giorgi et al., 2018; DeCrescenzo et al., 2018; Siefried et al., 2020), pharmacological interventions consider the stimulant mechanism of action and affect neurotransmitters (Chan et al., 2019; Ronsley et al., 2020; Siefried et al., 2020). Harm reduction interventions for managing MA/A use disorders (e.g., education, needle and syringe programs, safe smoking kits, safer sex promotion, supervised consumption sites) aim to minimize the negative health consequences of use.

Psychosocial interventions may be effective, but they are associated with high attrition and relapse rates (De Giorgi et al., 2018). In addition, the evidence for the effectiveness of pharmacotherapies is uncertain and therefore insufficient to justify routine widespread use (Chan et al., 2019). A German guideline by Braunwarth et al. recommended a number of psychotherapeutic and pharmacological treatment options for patients with MA/A use disorder. It provided two conditional recommendations: the use of tranquilizers for short-term management of agitation and the as-needed use of atypical antipsychotics. However, the guideline advised against using sertraline due to potential significant adverse events (Li & Loshak, 2019). In contrast, the Australian guideline for management of MA/A use disorder recommends a stepped care approach involving access to care in hospital/psychiatric settings based on each patient’s needs, with a focus on pharmacological management of symptoms as needed (Roche et al., 2019). The recent American Society of Addiction Medicine clinical practice guideline suggested a number of pharmacological agents for management of MA/A use disorder, including bupropion, mirtazapine, and methylphenidate. The guideline also recommended that contingency management (CM) be a primary component of the treatment plan in conjunction with other psychosocial interventions (Batki et al., 2024).

Several systematic reviews have explored the effectiveness of pharmacological and psychosocial interventions for management of MA/A use disorders, but almost all have important limitations, including failure to generate accurate pooled effect estimates (Siefried et al., 2020; Stuart et al., 2020); failure to assess the comparative effectiveness of management strategies (Apuy et al., 2023; Bhatt et al., 2016; Naji et al., 2022); failure to appraise the overall certainty of the evidence (Chan et al., 2019); and combining MA/A with other stimulant disorders (DeCrescenzo et al., 2018). Given these limitations and wide variability in potential treatment approaches, the objective of this study was to conduct a comprehensive systematic review and network meta-analysis (NMA) of randomized controlled trials (RCTs) investigating the effectiveness and tolerability of pharmacological, psychosocial, and harm reduction interventions for management of MA/A use disorders on a range of patient-important outcomes.

Methods

We registered our review with PROSPERO (CRD42023406460) and followed the PRISMA-NMA guideline statement to report our findings (e-Table 1) (Hutton et al., 2015).

Data Sources and Searches

We searched MEDLINE, CINAHL, Cochrane Library, PsycInfo, Web of Science, and Embase from inception to September 30, 2023, for published RCTs on pharmacological, psychosocial, and harm reduction interventions for management of MA/A use disorders. e-Table 2 provides our search strategy. We also reviewed reference lists of included trials and relevant reviews to identify additional eligible trials.

Study Selection

Eligible studies were RCTs that (1) enrolled adults (18 years or older) diagnosed with MA/A use disorders according to the Diagnostic and Statistical Manual of Mental Disorders (DSM), International Classification of Diseases 10th Revision (ICD-10) criteria, the Severity of Dependence Scale, or clinical interview; (2) randomized participants to any pharmacological, psychosocial, or harm reduction interventions specifically targeted at management of MA/A use disorders compared to an alternative intervention or usual care, placebo, waitlist control, or no treatment; and (3) reported at least one effectiveness outcome. We excluded trials or trial arms that compared different doses or frequencies of the same intervention. Our outcomes of interest included (1) abstinence, defined as proportion of participants with negative urine samples at the last follow-up, (2) weekly average percentage of patients with negative urine samples, (3) longest duration of abstinence (mean of days), (4) proportion of patients with ≥ 2 weeks of abstinence, (5) self-reported duration of MA/A use (days per month), (6) self-reported duration of stimulants use (days per month), defined as occasional or frequent use of cocaine, crack, ecstasy, or other stimulants alone or combined with amphetamines, (7) craving for MA/A, measured using a validated instrument or visual analogue scale (VAS), and (8) tolerability based on all-cause drop-out.

Pairs of reviewers (ASM, ES, FMF, MN, NN, SJ, SM, SMD, SMFL, and SS) independently screened titles and abstracts of records retrieved through searches to identify potentially eligible studies. The same pairs of reviewers subsequently screened the full report of included studies to confirm eligibility. Discrepancies were resolved through discussion or involving a third reviewer (MK).

Data Extraction and Risk of Bias Assessment

Pairs of reviewers (ASM, ES, FMF, MN, NN, SJ, SM, SMD, SMFL, and SS) extracted data independently and resolved discrepancies through discussion. We used pilot-tested Excel spreadsheets for data extractions and performed calibration exercises with our review team prior to beginning data abstraction to ensure consistency and accuracy of extractions. We extracted the following information: (1) study characteristics (e.g., publication year, author); (2) participant and trial characteristics (e.g., age, sex, diagnostic tools); (3) details of interventions and comparators (e.g., treatment strategies, frequency, co-interventions); and (4) outcomes of interest. For outcomes that were reported at multiple follow-up times, we used data from the longest follow-up time.

The same pairs of reviewers independently assessed risk of bias (RoB) among the included trials using the Cochrane risk of bias tool that addresses the following potential sources of bias: sequence generation, allocation concealment, blinding of study participants, healthcare providers and outcome assessors, and missing outcome data (Higgins et al., 2011). The modified tool used “definitely low risk” or “probably low risk” (considered as low risk of bias), or “definitely high risk” or “probably high risk” (considered as high risk of bias) rather than standard response options (high, low, or unclear) (Guyatt & Busse, 2015; Higgins & Green, 2011). Disagreements in data extraction or risk of bias assessments were resolved through discussion or involving a third reviewer (MKh or BS).

Data Synthesis and Analysis

For continuous outcomes, we calculated the mean difference (MD) and its 95% confidence interval (CI) using change scores from baseline to the end of follow-up to account for interpatient variability (Higgins et al., 2019; Weir et al., 2018). When the standard deviation (SD) for the change score was not reported, we imputed it using baseline and end-of-study SDs and a median correlation coefficient derived from the trials at low risk of bias. We used the methods described by Weir et al. (2018) to impute the mean and SD when only the median, range or interquartile range, and sample size were reported. For the outcomes of abstinence, proportion of patients with ≥ 2 weeks of abstinence, and tolerability (drop-out rate), we calculated the relative risk (RR) and the associated 95% CI. Craving was measured using instruments with different scale ranges, so we used methods suggested by Thorlund et al. (2011) to transform all study estimates to the most commonly reported scale, a 10-point VAS, and then performed the analysis using MD for change from baseline.
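The effect-size calculations described above can be sketched as follows. This is a minimal illustration rather than the analysis code used in the review (which was run in STATA): the function names are hypothetical, the change-score SD uses the standard formula for the variance of a difference with an assumed correlation, the RR interval is a Wald interval on the log scale, and the rescaling to a 10-point range is assumed to be a simple linear transformation.

```python
import math


def change_sd(sd_baseline, sd_final, r):
    """SD of a change score from baseline and final SDs.

    r is the assumed baseline-to-final correlation (the review used a
    median correlation from trials at low risk of bias).
    """
    return math.sqrt(sd_baseline**2 + sd_final**2 - 2.0 * r * sd_baseline * sd_final)


def relative_risk(events_t, n_t, events_c, n_c, z=1.96):
    """Relative risk with a Wald 95% CI computed on the log scale.

    Assumes non-zero event counts in both arms (no continuity correction).
    """
    rr = (events_t / n_t) / (events_c / n_c)
    se_log = math.sqrt(1.0 / events_t - 1.0 / n_t + 1.0 / events_c - 1.0 / n_c)
    lower = math.exp(math.log(rr) - z * se_log)
    upper = math.exp(math.log(rr) + z * se_log)
    return rr, lower, upper


def rescale_to_10_point(md, scale_min, scale_max):
    """Linearly rescale a mean difference to a 10-point range (assumed linear)."""
    return md * 10.0 / (scale_max - scale_min)


# Illustrative inputs only; these are not data from any included trial.
print(change_sd(10.0, 10.0, 0.5))
print(relative_risk(20, 100, 10, 100))
print(rescale_to_10_point(-20.0, 0.0, 100.0))
```

A higher assumed correlation yields a smaller change-score SD and therefore a narrower interval for the MD, which is why the source of the correlation coefficient matters.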

The feasibility of NMA for each outcome was assessed by checking network connectivity and availability of at least ten studies. Our primary analysis focused on a network of individual interventions. We also performed a post hoc sensitivity analysis combining individual pharmacotherapies into drug classes. e-Table 3 provides drug classes for included pharmacotherapies. We used a DerSimonian–Laird random-effects model for all direct comparisons and performed a random-effects network meta-analysis assuming a common heterogeneity parameter using a frequentist approach (Sadeghirad et al., 2023; White, 2009). We used I2 to determine statistical heterogeneity. Transitivity and coherence (a.k.a. consistency) are key assumptions for NMA. We ensured that all interventions among the included trials were jointly randomizable to construct a network and that the distribution of the potential effect modifiers (mean age, proportion of male participants, diagnostic criteria, co-interventions, and duration of interventions) was similar across trials and comparisons. The “design-by-treatment” model (global test) was used to assess the coherence assumption for each network, and the side-splitting method was used to evaluate local (loop-specific) incoherence in each closed loop of the network as the difference between direct and indirect evidence (Higgins et al., 2012; White et al., 2012). We estimated the ranking probabilities among individual interventions and reported the Surface Under the Cumulative Ranking Curves (SUCRA) values, mean ranks, and rankograms (Chaimani & Salanti, 2015; Chaimani et al., 2013). We used STATA 17.0 (StataCorp, College Station, TX, USA) for all analyses.
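For readers unfamiliar with the pairwise step, the DerSimonian–Laird random-effects estimator and the I2 statistic can be sketched as below. This is a generic textbook implementation under stated assumptions (study effects on an additive scale such as MD or log RR, with known within-study variances); the function name and the example inputs are hypothetical, and it is not the STATA code used for the review.

```python
import math


def dersimonian_laird(effects, variances):
    """Pool study effects with the DerSimonian-Laird random-effects method.

    effects:   per-study estimates on an additive scale (e.g. MD or log RR)
    variances: matching within-study variances
    Returns the pooled estimate, its standard error, tau^2, Q, and I^2 (%).
    """
    k = len(effects)
    w = [1.0 / v for v in variances]                      # inverse-variance weights
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum_w
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))  # Cochran's Q
    df = k - 1
    c = sum_w - sum(wi**2 for wi in w) / sum_w
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0       # between-study variance
    w_star = [1.0 / (v + tau2) for v in variances]        # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum(w_star)
    se = math.sqrt(1.0 / sum(w_star))
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return {"pooled": pooled, "se": se, "tau2": tau2, "q": q, "i2": i2}


# Illustrative inputs only; these are not data from the included trials.
result = dersimonian_laird([0.0, 1.0], [0.1, 0.1])
print(result)
```

The network model in the review additionally assumes a single heterogeneity parameter shared across comparisons, which this pairwise sketch does not cover.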

Assessment of Certainty of the Evidence

We used the Grading of Recommendations Assessment, Development, and Evaluation (GRADE) approach to assess the certainty of evidence for direct, indirect, and network estimates across all outcomes (Brignardello-Petersen et al., 2018; Puhan et al., 2014). First, we rated the certainty of direct estimates considering the limitations due to the risk of bias, indirectness, publication bias, and inconsistency. Then, the certainty in the indirect estimates was rated, with a focus on the dominant lowest-order loops. Finally, we rated the certainty of network estimates, considering further limitations due to incoherence and imprecision. We judged imprecision using the network estimate and rated down for imprecision if the 95% CI included the null value (Zeng et al., 2021).

Treatment Hierarchy

We used a GRADE minimally contextualized approach to develop a hierarchy of interventions across all outcomes of interest (Ceriello et al., 2023; Phillips et al., 2022). In this approach, interventions are categorized from the most effective to the least effective based on their relative effects and their associated certainty of evidence. We considered the null effect as the decision threshold and placebo as the reference intervention. Interventions with no evidence of difference compared to placebo (i.e., 95% CI includes the null value) were categorized as “among the worst;” interventions superior to placebo, but not superior to any other intervention that was itself superior to placebo, were categorized as “among the intermediate;” and interventions superior to at least one of the “among the intermediate” interventions were categorized as “among the best.” We then further categorized interventions based on their certainty of evidence into two groups of “high to moderate certainty” or “low to very low certainty.”
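The categorization rule above can be expressed as a short sketch. It is a simplified reading of the rule, not the authors' procedure: the function name, the input structures, and the example values are hypothetical, effects are assumed to be oriented so that values above the null favour the intervention, and interventions whose interval lies entirely on the harmful side of the null are flagged as not covered because the text does not describe that case.

```python
def classify_interventions(vs_placebo, superior_pairs, null=0.0):
    """Assign hierarchy categories from network estimates.

    vs_placebo:     {name: (ci_low, ci_high)} for each intervention vs placebo,
                    oriented so that values above `null` favour the intervention
    superior_pairs: set of (a, b) tuples meaning the CI for a vs b excludes
                    the null in favour of a
    """
    # Interventions whose whole CI lies on the beneficial side of the null.
    better = {name for name, (low, _) in vs_placebo.items() if low > null}
    labels = {}
    for name, (low, high) in vs_placebo.items():
        if low <= null <= high:
            labels[name] = "among the worst"
        elif name in better:
            beats_another = any(
                (name, other) in superior_pairs for other in better if other != name
            )
            labels[name] = "among the best" if beats_another else "among the intermediate"
        else:
            # CI entirely below the null: not addressed by the rule as stated.
            labels[name] = "not covered by the stated rule"
    return labels


# Hypothetical interventions and intervals, for illustration only.
example = classify_interventions(
    {"A": (1.0, 5.0), "B": (0.5, 3.0), "C": (-1.0, 2.0)},
    superior_pairs={("A", "B")},
)
print(example)
```

Each label would then be paired with the certainty rating ("high to moderate" or "low to very low") to give the final position in the hierarchy.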

Results

Search Results and Characteristics of Included Studies

We retrieved 4555 unique records from our searches and reviewed the full texts of 183 studies for eligibility. A total of 72 published RCTs involving 6836 participants were included in our systematic review; three of these trials (Abdoli et al., 2021; Sorsdahl et al., 2021; Wu et al., 2020) were excluded from the analysis because their interventions were not connected to the rest of the network across outcomes. Figure 1 provides details of the study selection process; excluded studies and their reasons for exclusion are presented in e-Table 4 in the supplementary file.

Fig. 1 PRISMA flow diagram for study selection

As shown in Table 1, the median mean age for participants was 36.0 years (interquartile range (IQR) 32.2–38.4 years), and on average, 75.6% of the enrolled participants were male (IQR 63.2–98.5). Fifty-seven (79.2%) trials explored the efficacy of pharmacological interventions alone or in combination with psychological interventions and none investigated the impact of MA/A use-related harm reduction interventions. The median duration of intervention was 12 weeks ranging from 8 to 12 weeks. Most trials were conducted in the USA (n = 33) and Iran (n = 19). Supplementary e-Tables 5 and 6 provide a detailed summary of included trial characteristics in the systematic review and NMA.

Table 1 Characteristics of randomized controlled trials included in the network meta-analysis

Risk of Bias

Of the 72 trials, 9 studies (12.5%) were at low risk of overall bias; 50 trials (69.4%) adequately generated their randomization sequence, and 34 (47.2%) were at low RoB for allocation concealment; 17 studies (23.6%) were at high RoB for inadequate blinding of participants, 25 (34.7%) for inadequate blinding of care providers, and 44 (61.1%) for inadequate blinding of outcome assessors. A total of 46 trials (63.9%) were at high risk of bias due to ≥ 10% loss to follow-up. Supplementary e-Table 7 provides RoB assessment details.

Abstinence (Longest Follow-Up)

The 39 RCTs involving 2780 participants reported abstinence (Fig. 2A). The median duration of follow-up was 12 weeks ranging from 4 to 36 weeks. In 9 of 25 direct comparisons, there were at least two trials for each comparison, and among them, four comparisons showed significant heterogeneity (I2 ≥ 65%; e-Table 8). We did not find evidence of global or loop-specific incoherence (e-Table 9). Our results showed that quetiapine [RR 3.17 (95% CI, 1.24 to 8.07), low certainty] in comparison to placebo may be associated with a higher likelihood of abstinence (Table 2, e-Table 10). The results of sensitivity analysis combining individual pharmacotherapies into drug classes showed no treatment was superior to the placebo (e-Table 43).
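As a reading aid for ratio estimates such as the RR above, the sketch below recovers the point estimate and log-scale standard error implied by a reported interval. It assumes the interval is a symmetric Wald interval on the log scale, which the text does not state explicitly; the function name is hypothetical.

```python
import math


def implied_from_ratio_ci(lower, upper, z=1.96):
    """Point estimate and log-scale SE implied by a ratio confidence interval.

    Assumes the CI is symmetric on the log scale (an assumption for
    illustration, not something reported in the review).
    """
    log_low, log_high = math.log(lower), math.log(upper)
    estimate = math.exp((log_low + log_high) / 2.0)   # geometric midpoint
    se_log = (log_high - log_low) / (2.0 * z)
    return estimate, se_log


# Interval reported in the text for quetiapine vs placebo (abstinence).
estimate, se_log = implied_from_ratio_ci(1.24, 8.07)
print(round(estimate, 2), round(se_log, 2))
```

The geometric midpoint of the reported limits lies close to the reported RR of 3.17, as expected for an interval built on the log scale. The wide interval reflects a large standard error, in keeping with the low certainty rating.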

Fig. 2 Network of abstinence (A), % patients with negative urine samples (B), longest duration of abstinence (C), and all-cause drop out (D). Notes: The size of the circles (nodes) is proportionate to the number of patients randomized to each intervention. The thickness of the line and the associated numbers correspond to the number of trials comparing the two linked interventions. CBT cognitive behavioural therapy, CM contingency management

Table 2 GRADE summary of findings for the comparisons of active treatments and placebo for each outcome

Negative Urine Samples

The weekly average proportion of patients with negative urine samples was reported in 35 trials involving 3449 participants. In 7 of 37 direct comparisons, there were at least two trials for each comparison, and among them, 5 comparisons showed significant heterogeneity (I2 > 75%; e-Table 11). Figure 2B displays the network of interventions. We did not find evidence of global or loop-specific incoherence. The results of incoherence assessments and estimates of direct and indirect effects are provided in e-Table 12.

Low certainty evidence suggests methylphenidate alone [MD 10.24 (95% CI, 3.49 to 16.99)] or in combination with the matrix model [MD 23.55 (95% CI, 7.64 to 39.46)], quetiapine [MD 32.17 (95% CI, 14.08 to 50.26)], and riluzole [MD 24.10 (95% CI, 5.54 to 42.66)] may increase the weekly average proportion of patients with negative urine samples compared to placebo (Table 2, e-Table 13). The sensitivity analysis showed stimulants alone [MD 8.22 (95% CI, 0.69 to 15.75)] or in combination with the matrix model [MD 22.61 (95% CI, 2.62 to 42.60)] were superior compared to placebo (e-Table 44).

Longest Duration of Abstinence (Days)

The 13 RCTs reporting the longest duration of abstinence enrolled 1835 participants (Fig. 2C). Of the available 12 direct comparisons, 4 comparisons were informed by 2 or more trials; of these, one had substantial heterogeneity (I2 = 95.6%) (e-Table 14). We did not find evidence of global or loop-specific incoherence (e-Table 15). The median duration of follow-up time for this outcome was 15 weeks ranging from 8 to 24 weeks.

Low certainty evidence suggested varenicline may be associated with a longer duration of abstinence [compared to placebo, MD 16.65 days (95% CI, 5.31 to 27.99)]. Very low certainty of evidence showed CM alone [MD 21.20 days (95% CI, 11.39 to 31.00)] or in combination with cognitive behavioural therapy (CBT) [MD 34.85 days (95% CI, 19.63 to 50.08)] may be associated with an increased duration of abstinence compared to placebo (Table 2, e-Table 16). The results of sensitivity analysis showed only combinations of CBT and CM may be associated with an increase in the longest duration of abstinence [MD 34.93 days (95% CI, 19.70 to 50.16)] (e-Table 45).

Proportion of Patients with ≥ 2 Weeks of Abstinence

The 10 RCTs that reported ≥ 2 weeks of consecutive abstinence enrolled 1101 participants (e-Fig. 1). Of the 16 direct comparisons, only a single comparison was informed by 2 RCTs (e-Table 17). We did not find evidence of global incoherence. The available closed loops of evidence were informed by multi-arm trials and thus were coherent by definition. The median duration of follow-up for this outcome was 12 weeks ranging from 4 to 36 weeks (IQR 9.5–14.6 weeks). Compared to placebo, no intervention showed any statistically significant change in the proportion of patients with ≥ 2 weeks of abstinence (Table 2, e-Tables 18 and 46).

Duration of MA/A Use

The 13 RCTs reporting a change in the duration of MA/A use enrolled 1780 participants (e-Fig. 2). Of the available 12 direct comparisons, 3 comparisons were informed by 2 or more trials, of which one comparison demonstrated substantial heterogeneity (I2 = 60.0%) (e-Table 19). We did not find evidence of global or loop-specific incoherence. The results of incoherence assessments and estimates of direct and indirect effects are provided in e-Table 20. Low and very low certainty of evidence indicates all interventions may result in little to no change in the days of MA/A use per month (Table 2, e-Tables 21 and 47).

Craving for MA/A Use

The 30 RCTs reporting craving enrolled 2587 participants (e-Fig. 3). Of the available 26 direct comparisons, 7 comparisons were informed by 2 or more trials, of which 4 comparisons demonstrated substantial heterogeneity (I2 > 80%) (e-Table 22). We did not find evidence of global or loop-specific incoherence. The results of incoherence assessments and estimates of direct and indirect effects are provided in e-Table 23.

Compared to placebo, very low certainty of evidence suggests that perphenazine [MD −4.75 (95% CI, −7.71 to −1.78)] and N-acetylcysteine [MD −2.90 (95% CI, −4.58 to −1.21)] may be associated with a decreased craving (Table 2 and e-Table 24). The results of sensitivity analysis showed only N-acetylcysteine [MD −2.91 (95% CI, −4.69 to −1.12)] may be associated with a decreased craving (e-Table 48).

Duration of Stimulant Use

Only two RCTs reported changes in days of stimulant use per month. Ling et al. (2014) reported no statistically significant change in days of stimulant use from baseline for methylphenidate compared to placebo. Shearer et al. (2009) showed little to no difference for modafinil compared to placebo (e-Table 25).

All-Cause Drop-Out

The 72 RCTs that reported an all-cause drop-out rate (tolerability) enrolled 6321 participants (Fig. 2D). Of the available 47 direct comparisons, 16 comparisons were informed by 2 or more trials, of which 3 comparisons demonstrated substantial heterogeneity (I2 > 60.0%) (e-Table 26). The results of incoherence assessments and estimates of direct and indirect effects are provided in e-Table 27. Compared to placebo, venlafaxine [RR 0.27 (95% CI 0.08 to 0.90), low certainty] and citicoline [RR 0.69 (95% CI 0.49 to 0.99), low certainty] may be among the most tolerable interventions. Naltrexone in combination with bupropion [RR 1.77 (95% CI, 1.10 to 2.85), low certainty] may result in higher drop-out (Table 2 and e-Table 28). The results from the sensitivity analysis were similar to the findings from our primary analysis (e-Table 49).

Additional Analysis

e-Tables 29–35 provide ranking probabilities and SUCRA values for all outcomes. We performed network meta-regression to explore the impact of allocation concealment, blinding, missing participant data, sex, and co-interventions and found no credible effect modification for any of these variables across outcomes. The results of network meta-regression are provided in e-Tables 36–42. There were not enough data available to investigate effect modification by sexual orientation or psychiatric comorbidity.
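To make the ranking summaries concrete, the sketch below computes a SUCRA value and a mean rank from one intervention's vector of ranking probabilities, using the standard definition of SUCRA as the average of the cumulative ranking probabilities over the first T − 1 ranks. The function names and the example probabilities are hypothetical and are not taken from the supplementary tables.

```python
def sucra(rank_probs):
    """SUCRA for one intervention.

    rank_probs[k] is the probability of being ranked (k + 1)-th best among
    T interventions; the probabilities are assumed to sum to 1.
    """
    t = len(rank_probs)
    if t < 2:
        raise ValueError("SUCRA needs at least two interventions")
    cumulative = 0.0
    total = 0.0
    for p in rank_probs[:-1]:      # cumulative probabilities for ranks 1..T-1
        cumulative += p
        total += cumulative
    return total / (t - 1)


def mean_rank(rank_probs):
    """Expected rank, where rank 1 is best."""
    return sum((k + 1) * p for k, p in enumerate(rank_probs))


# Illustrative ranking probabilities for one intervention among three.
probs = [0.5, 0.3, 0.2]
print(sucra(probs), mean_rank(probs))
```

The two summaries carry the same information, since SUCRA equals (T − mean rank) / (T − 1). Neither reflects the certainty of the underlying evidence, which is why the hierarchy in this review rests on GRADE ratings rather than on rankings alone.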

Discussion

Our systematic review included 72 RCTs investigating the effects of pharmacological and psychosocial interventions for the management of MA/A use disorder. We found that the included studies were extremely heterogeneous in terms of interventions and outcomes, potentially reflecting the limited understanding of treatment approaches. Nonetheless, compared to placebo, quetiapine may be associated with increased abstinence at longest follow-up and a higher weekly proportion of patients with negative urine samples throughout and at the end of the intervention. CM alone or in combination with CBT, as well as varenicline, may be associated with a longer duration of abstinence. Methylphenidate alone or in combination with the matrix model of addiction treatment, as well as riluzole, may increase the weekly proportion of patients with negative urine samples. The evidence was very uncertain about the effectiveness of interventions on 2 weeks or more of abstinence and duration of MA/A use. Venlafaxine and citicoline may be associated with a slight reduction in all-cause drop out, while the combination of naltrexone and bupropion may be associated with a slight increase in all-cause drop out. The results of the sensitivity analysis by drug class were largely similar to those of the analysis of individual drugs, with the exception of quetiapine and riluzole: as classes, antipsychotics and anticonvulsants showed no statistically significant benefit compared to placebo.

Our findings are consistent with other reviews on management of MA/A use disorder suggesting little to no benefit for pharmacotherapies, which underscores the need for further, larger, and higher quality trials (Chan et al., 2019; Paulus & Stewart, 2020; Siefried et al., 2020). The evidence for the benefits of quetiapine and varenicline may be encouraging, but it should be interpreted with extreme caution and validated in larger trials, given that it comes from two very small trials of 60 and 52 patients. In addition, although the quetiapine trial (Javdan et al., 2020) excluded patients with mental health comorbidities, it only followed patients for 8 weeks. Moreover, the varenicline trial (Briones et al., 2018) was halted halfway, before reaching its target sample size, and had a 50% drop-out rate. The findings of the single trial of riluzole should also be interpreted with caution given its substantial loss to follow-up (41% in the riluzole arm and 62% in the placebo arm) and unequal allocation of patients, with 34 randomized to intervention and 52 to placebo (Farahzadi et al., 2019).

We found that while CM alone or in combination with CBT may impact longest duration of abstinence, their effects on other patient-important outcomes are uncertain. These findings are partly in line with a previous network meta-analysis on psychosocial interventions in individuals with cocaine and/or amphetamine addiction that suggested combining CM with a community reinforcement approach was among the most efficacious interventions in achieving abstinence (DeCrescenzo et al., 2018). However, we found that despite CM and the matrix model being among the most studied interventions, most trials were small and limited by risk of bias issues. We also found limited and uncertain evidence on combinations of pharmacotherapies with non-pharmacological interventions.

Our NMA is the first review to assess the comparative effects of both pharmacological and psychosocial interventions for the management of MA/A use disorder and has several notable strengths. We used explicit eligibility criteria, focused on patient-important outcomes to assess the comparative effectiveness and tolerability of interventions available for management of MA/A use disorder, and applied the GRADE approach to evaluate the certainty of evidence and generate a hierarchy of interventions. The limitations of the review are largely those of the included studies. The majority of included studies had small sample sizes; many comparisons were only informed by a small number of trials, which may have affected the overall certainty in network estimates and reduced the power of our analysis leading to imprecise effect estimates. Most trials were at high risk of bias due to inadequate concealment of allocation and blinding, and considerable loss to follow-up, particularly in longer follow-up. These limitations were the main reasons for rating down our certainty in evidence leading to the majority of network estimates being judged to be at low to very low certainty.

Furthermore, there is variation in route of administration, dose, and frequency of use across specific medications for pharmacological interventions. As for psychosocial interventions, we observed variations in the frequency and/or duration of the intervention. This variability could contribute to conceptual heterogeneity of results, thereby presenting a challenge for practitioners in applying these findings. Finally, our review did not identify any eligible trials of harm reduction interventions, because the studies examining these interventions had objectives that did not match the outcomes of interest in our review. These studies often included participants with diverse substance use patterns and disorders, which diverged from the specific focus of our review. Additionally, practical or ethical challenges could limit the testing of harm reduction interventions using traditional patient-level RCTs (Reddon et al., 2020).

Implication for Practice and Future Research

Our findings are limited by the quality and quantity of the data but highlight the limited evidence of the efficacy of some behavioural interventions, as well as the potential efficacy of some pharmacological interventions for the management of MA/A use disorder. While the evidence may suggest potential benefits for a few interventions in increasing abstinence at longest follow-up (quetiapine), having weekly negative urine samples (methylphenidate alone or combined with matrix model, quetiapine, riluzole), and having the longest duration of abstinence (CM alone or combined with CBT, varenicline), the evidence is very uncertain about their tolerability or their impact on craving, ≥ 2 weeks of consecutive abstinence, or self-reported duration of MA/A use. No intervention showed moderate or high certainty evidence for important change across any patient-important outcomes.

Given the significant public health burden of MA/A use disorders, the focus of future research and investments should be on larger and higher quality trials with well-defined interventions and longer follow-up periods, prioritizing combination therapies. In addition, trialists should consider a set of core patient-important outcomes, including non-abstinence-based clinically relevant outcomes and mortality, in future studies. It is crucial to recognize that focusing on abstinence may not always align with the immediate objectives or priorities of people who use drugs, given competing concerns such as socioeconomic stressors, unstable employment, or lack of affordable housing (Coffin & Suen, 2023). In such cases, which are not uncommon (Rosenberg et al., 2020), management strategies should prioritize reducing the harms associated with MA/A use. Therefore, harm reduction interventions tailored towards MA/A use disorders should be further examined given their role in developing trust and willingness among people living with MA/A use disorders to seek and adhere to other therapeutic interventions.