Abstract
Response defined as a 50% reduction in the sum score of the Hamilton Depression Rating Scale (HDRS-17-sum) is often used to assess the efficacy of antidepressants. Critics have, however, argued that dichotomising ratings with a cutoff close to the median may lead to scores clustering on either side, the result being inflation of minuscule drug-placebo differences. Using pooled patient-level data sets from trials of three selective serotonin reuptake inhibitors (SSRIs) (citalopram, paroxetine and sertraline) (n = 7909), and from similar trials of duloxetine (n = 3478), we thus assessed the impact of different cutoffs on response rates. Response criteria were based on (i) HDRS-17-sum, (ii) the sum score of the HDRS-6 subscale (HDRS-6-sum) and (iii) the depressed mood item. The separation between SSRI and placebo with respect to response rates increased when HDRS-17-sum was replaced by HDRS-6-sum or depressed mood as effect parameter and was markedly dependent on SSRI dose. With the exception of extreme cutoff values, differences in response rates were largely similar regardless of where the cutoff was placed, and also not markedly changed by the exclusion of subjects close to the selected cutoff (e.g., ±10%). The observation of similar response rate differences between active drugs and placebo for different cutoffs was corroborated by the analysis of duloxetine data. In conclusion, the suggestion that using a cutoff close to the median when defining response has markedly overestimated the separation between antidepressants and placebo may be discarded.
Introduction
A 50% reduction in the total sum of the 17 items comprising the Hamilton Depression Rating Scale (HDRS-17-sum) has been a common definition of response in trials assessing the efficacy of selective serotonin reuptake inhibitors (SSRIs) and other antidepressants. Critics have, however, argued that selecting a cutoff close to the median of the endpoint score distribution curve may result in significant differences in response rates between groups even when the actual differences in mean rating are minuscule and clinically irrelevant [1]. Indeed, if, for example, all antidepressant-treated patients improve by 50% and all those treated with placebo by 49%, the resultant 100% difference in response rates would be statistically highly significant but clinically unimportant. Conversely, it has been argued that using HDRS-17-sum as an effect parameter may underestimate the actual antidepressant effects of SSRIs, one reason being that several items included in this scale may capture common side effects of these drugs [2,3,4].
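The critics' hypothetical case can be reproduced in a few lines. The sketch below is an illustration only: it uses constructed improvement values rather than trial data, and the function name `response_rate` is a hypothetical helper, not something taken from the analyses reported here.

```python
def response_rate(pct_reductions, cutoff):
    """Share of patients whose % reduction from baseline meets the cutoff."""
    responders = sum(1 for r in pct_reductions if r >= cutoff)
    return responders / len(pct_reductions)

# Constructed extreme case from the text: every drug-treated patient
# improves by 50%, every placebo-treated patient by 49%.
drug = [50.0] * 100
placebo = [49.0] * 100

for cutoff in (40, 50, 60):
    diff = response_rate(drug, cutoff) - response_rate(placebo, cutoff)
    print(f"cutoff {cutoff}%: difference in response rate = {diff:.0%}")
```

In this constructed case the groups separate only at the 50% cutoff and not at 40% or 60%. Whether real trial data show such cutoff dependence is the empirical question addressed in this report.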
Using pooled patient-level data from 28 trials comparing an SSRI with placebo, we explored the impact of the placement of dichotomised cutoffs on the SSRI versus placebo separation. To this end, three different outcome measures were used: HDRS-17-sum, the sum score of the unidimensional HDRS-6 subscale including six core symptoms of depression from the HDRS (HDRS-6-sum) [5], and the depressed mood item [2]. To corroborate the results obtained using the SSRI data, corresponding analyses were performed using patient-level data from 13 trials comparing the serotonin and noradrenaline reuptake inhibitor (SNRI) duloxetine to placebo. To further address concerns that subjects clustering close to but on either side of the cutoff might inflate negligible mean drug-placebo differences [1], we also used the data from the SSRI trials to assess the effect of excluding subjects closest to the cutoffs (e.g., ±10%) from the analyses. Finally, we assessed response and remission rates for what we have previously [6] suggested to be optimal and suboptimal SSRI doses, respectively.
Materials and methods
Data acquisition
Patient-level data from 28 industry-sponsored, placebo-controlled, acute phase trials for adults with major depression using the HDRS-17 as symptom inventory were obtained for citalopram (Lundbeck, Valby, Denmark), paroxetine (GlaxoSmithKline, Brentford, UK) and sertraline (Pfizer, New York, NY, USA). In two of the paroxetine studies and one sertraline study, fluoxetine was used as active control; while these patients were also included, those treated with non-SSRI comparators were not. To corroborate results obtained using data from the SSRI trials, we also analysed patient-level data from 13 trials comparing duloxetine to placebo (Lilly, Indianapolis, IN, USA). Patients in the duloxetine trials who had received SSRI comparators (escitalopram, fluoxetine and paroxetine) were excluded. Both these data sets have been previously described in greater detail [7, 8].
Statistical analyses
To visualise the separation of SSRIs and placebo at different cutoffs for percentage reduction from baseline, we plotted the cumulative proportions of SSRI- and placebo-treated patients whose remaining symptoms at endpoint corresponded to every 5% fraction of their baseline score for HDRS-17-sum and HDRS-6-sum, respectively. For depressed mood, comprising only five possible levels (scored 0–4), change scores rather than % reduction were plotted. For simplified visualisation, patients who were unchanged or had deteriorated during treatment were all included in one data point as if they had displayed 100% of baseline scores at endpoint for HDRS-17-sum and HDRS-6-sum or had displayed no change for depressed mood.
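The cumulative proportions described above could be tabulated along the following lines. This is a minimal sketch, not the code used for the figures: the function name, the (baseline, endpoint) input layout and the example patients are hypothetical, and it assumes integer scores, a baseline above zero, and that "cumulative" means remaining symptoms at or below each 5% step.

```python
def cumulative_proportions(patients, step=5):
    """Proportion of patients whose endpoint score is at most pct% of
    baseline, for pct = 0, step, ..., 100.

    `patients` is a list of (baseline, endpoint) integer score pairs.
    """
    n = len(patients)
    result = {}
    for pct in range(0, 101, step):
        # min(e, b) folds unchanged or deteriorated patients into the
        # 100% data point; cross-multiplying keeps the comparison in
        # exact integer arithmetic.
        count = sum(1 for b, e in patients if min(e, b) * 100 <= pct * b)
        result[pct] = count / n
    return result

# Hypothetical patients: 25% remaining, 50% remaining, unchanged, deteriorated.
example = [(24, 6), (20, 10), (22, 22), (18, 25)]
curve = cumulative_proportions(example)
print(curve[25], curve[50], curve[95], curve[100])
```

Computing one such curve per treatment group and plotting both against the cutoff gives the kind of display in which parallel lines indicate a cutoff-independent absolute difference in response rates.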
Corresponding visualisations were produced for endpoint scores with respect to the HDRS-17-sum, HDRS-6-sum and the depressed mood item; for visualisation purposes, maximum scores were capped at 40 points for HDRS-17-sum (range 0–52) and at 20 points for HDRS-6-sum (range 0–22); subjects displaying higher scores being included in the highest visualised score.
All visualisations were performed on patients with at least one pre- and post-baseline HDRS-17 measure in both the intention-to-treat (ITT) last observation carried forward (LOCF) population and in the observed cases (OC) population. Week 6 was used as the primary endpoint since most studies had an evaluation at that time. For studies with no week 6 observation, the closest observation was used (week 4 for five studies and week 8 for one study; see Supplementary Table 1).
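The difference between the LOCF and OC populations can be made concrete with a small sketch. It is a generic illustration of last observation carried forward, assuming a hypothetical mapping from week number to score for each patient with week 0 as baseline; the implementation actually used for the analyses is not described here.

```python
def locf_score(visits, target_week):
    """Last observed post-baseline score at or before `target_week`.

    `visits` maps week number -> score; week 0 is baseline and is never
    carried forward. Returns None if no post-baseline score exists.
    """
    eligible = [week for week in visits if 0 < week <= target_week]
    if not eligible:
        return None
    return visits[max(eligible)]

# Hypothetical patient who dropped out after week 4.
visits = {0: 24, 1: 22, 2: 19, 4: 15}
print(locf_score(visits, 6))  # LOCF value at week 6: 15
print(visits.get(6))          # observed-cases value at week 6: None
```

Under LOCF this patient contributes the week 4 score to the week 6 endpoint, whereas in the OC population the patient is simply absent at week 6.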
Similar visualisations were undertaken also with respect to the duloxetine versus placebo trials. For these data, week 8 was used as endpoint observation; if week 8 data were missing, data from the closest observation were used (week 7 for one study and week 9 for two studies; Supplementary Table 2).
We also used data from the SSRI trials to model odds ratios (ORs) for response (SSRI versus placebo) for all possible 10% intervals of HDRS-17-sum and HDRS-6-sum reductions from baseline, and also for all possible change scores for depressed mood. Similar analyses were undertaken with respect to endpoint scores in the range of 0–10 points for HDRS-17-sum and HDRS-6-sum and 0–4 points for depressed mood, hence including cutoffs commonly used to define remission [9,10,11,12,13,14,15]. This was done using a generalised linear mixed model which included treatment, time (week) and trial as fixed factors as well as the interaction between treatment and time. The baseline rating on the outcome measure in question (HDRS-17-sum, HDRS-6-sum, or depressed mood) was included as a covariate. The model utilised a binary distribution with a logit link, the Kenward–Roger method was used to estimate denominator degrees of freedom and an unstructured covariance matrix was used to model within-patient errors. All time points between week 1 and endpoint were included in the model but only results at week 6 are reported. If the models did not converge, we first excluded the observation at week 5 (which was available only for a minority of studies) and then, if needed, the trial fixed factor. If convergence was still not attained, unmodelled LOCF numbers were used with statistical significance analysed using the chi-square test.
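For orientation, the unmodelled fallback (a 2×2 table of responders by treatment, tested with the chi-square test) can be sketched as follows. This does not reproduce the mixed model, which was fitted in SAS; the function names and counts are hypothetical, and the sketch assumes a Pearson chi-square without continuity correction, a detail not specified above.

```python
import math

def odds_ratio(resp_drug, n_drug, resp_placebo, n_placebo):
    """Unadjusted odds ratio for response, drug versus placebo."""
    a, b = resp_drug, n_drug - resp_drug
    c, d = resp_placebo, n_placebo - resp_placebo
    return (a * d) / (b * c)

def chi_square_2x2(resp_drug, n_drug, resp_placebo, n_placebo):
    """Pearson chi-square (no continuity correction) and two-tailed p value.

    With 1 degree of freedom, the upper tail probability of the
    chi-square distribution equals erfc(sqrt(chi2 / 2)).
    """
    a, b = resp_drug, n_drug - resp_drug
    c, d = resp_placebo, n_placebo - resp_placebo
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    return chi2, math.erfc(math.sqrt(chi2 / 2))

# Made-up counts: 60 of 100 responders on drug, 40 of 100 on placebo.
print(odds_ratio(60, 100, 40, 100))
chi2, p = chi_square_2x2(60, 100, 40, 100)
print(chi2, p)
```

Unlike the modelled ORs, this crude estimate ignores trial, baseline severity and the repeated measurements within each patient, which is presumably why it was reserved for cases where the model failed to converge.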
To explore whether differences between treatment groups with respect to response rates might be explained by patients clustering close to, but on either side of, any particular cutoff, we also used the SSRI data to model ORs for established response or remission criteria [9, 10] after removing patients with endpoint scores just above and/or below the cutoff in question, i.e., ±5% and ±10% for percentage reduction-based outcomes and ±1 and ±2 points for endpoint score cutoffs.
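The exclusion step can be illustrated with a short sketch. It assumes that ±10% refers to percentage points of reduction around the cutoff and that patients exactly on the edge of the band are also removed; both are assumptions, as are the function name and the example values, which are not trial data.

```python
def response_rate_excluding_band(pct_reductions, cutoff=50, band=10):
    """Response rate after dropping patients within +/- `band` percentage
    points of `cutoff`. Returns (rate, number of patients retained)."""
    kept = [r for r in pct_reductions if abs(r - cutoff) > band]
    responders = sum(1 for r in kept if r >= cutoff)
    return responders / len(kept), len(kept)

# Hypothetical % reductions from baseline in two small groups.
drug = [10, 35, 45, 52, 58, 70, 85]
placebo = [5, 20, 30, 42, 48, 55, 61]

rate_drug, n_drug = response_rate_excluding_band(drug)
rate_placebo, n_placebo = response_rate_excluding_band(placebo)
print(rate_drug, n_drug, rate_placebo, n_placebo)
```

If a between-group difference were driven mainly by patients sitting just above or just below the cutoff, it should shrink markedly once that band is removed; a difference that persists points to separation across the wider distribution.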
Finally, to assess the impact of SSRI dose on dichotomous definitions of response and remission, we pooled data from those trials that had assessed fixed SSRI doses, hence excluding flexible-dose trials (Supplementary Table 1). In line with a previous report based on the same data set [6], suboptimal doses were defined as citalopram 10–20 mg, sertraline 50 mg and paroxetine 10 mg, and optimal doses as citalopram 40–60 mg, sertraline 100–400 mg and paroxetine 20–40 mg. Paroxetine controlled-release doses of 12.5 mg and 25 mg were assumed to correspond to 10 mg and 20 mg of paroxetine immediate-release, respectively. A three-level variable coding for placebo, suboptimal SSRI dose and optimal SSRI dose, respectively, replaced the treatment variable in a generalised linear mixed model otherwise identical to the initial model used to obtain ORs for all possible 10% intervals.
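The three-level dose variable can be expressed as a simple lookup. The sketch below encodes only the dose bands and the controlled-release conversion stated above; the names `DOSE_BANDS`, `PAROXETINE_CR_TO_IR` and `dose_level` are hypothetical and do not come from the original analysis code.

```python
# Dose bands as stated in the text (mg, immediate-release equivalents).
DOSE_BANDS = {
    "citalopram": {"suboptimal": (10, 20), "optimal": (40, 60)},
    "sertraline": {"suboptimal": (50, 50), "optimal": (100, 400)},
    "paroxetine": {"suboptimal": (10, 10), "optimal": (20, 40)},
}

# Paroxetine controlled-release doses and their assumed
# immediate-release equivalents.
PAROXETINE_CR_TO_IR = {12.5: 10, 25: 20}

def dose_level(drug, dose_mg=None, controlled_release=False):
    """Return 'placebo', 'suboptimal' or 'optimal' for a fixed-dose arm."""
    if drug == "placebo":
        return "placebo"
    if drug == "paroxetine" and controlled_release:
        dose_mg = PAROXETINE_CR_TO_IR[dose_mg]
    for level, (low, high) in DOSE_BANDS[drug].items():
        if low <= dose_mg <= high:
            return level
    raise ValueError(f"{drug} {dose_mg} mg falls outside the defined bands")

print(dose_level("sertraline", 50))
print(dose_level("paroxetine", 25, controlled_release=True))
print(dose_level("placebo"))
```

Doses falling between the bands raise an error rather than being assigned silently, since the text defines no category for them.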
All analyses were performed using SAS version 9.4 (SAS Institute, Cary, NC, USA). For the duloxetine data, remote desktop access to the Clinical Trial Data Transparency environment was provided through SAS Solutions OnDemand by the Clinical Study Data Request website. All p values were two-tailed and the significance level was set at α = 0.05.
Ethics
The Regional Ethics Review Board of Gothenburg, Sweden, issued an advisory opinion stating no objection to the conduct of post hoc analyses of clinical trial data. As data were anonymised, informed patient consent was waived.
Results
In total, 7909 patients from SSRI versus placebo trials with a post-baseline HDRS-17 observation were included in the analyses. Inspection of the cumulative plots for relative score reductions in SSRI- and placebo-treated patients, respectively, revealed largely parallel lines across a wide range of possible outcomes (Fig. 1). With the exception of very low (≤20%) and very high (≥90%) cutoffs, SSRIs thus separated from placebo with roughly the same absolute magnitude regardless of cutoff. The SSRI versus placebo separation was numerically greater for HDRS-6-sum and depressed mood than for HDRS-17-sum.
Cumulative plots for endpoint scores were similar to those for percentage reduction and change scores. Drug versus placebo differences were again most prominent when assessed using HDRS-6-sum or the depressed mood item as effect parameter, but were largely independent of the chosen cutoff, except, with respect to HDRS-17-sum and HDRS-6-sum, for values near the bottom or the top of the range. For depressed mood, only values near the top of the range showed less SSRI versus placebo separation. The plots for the observed cases population displayed a similar pattern (Supplementary Fig. 1).
Visualisations similar to those produced for the SSRI trials were created also for the duloxetine trials (number of subjects: 3478) and are displayed in Supplementary Figs. 2 and 3. Again, the placement of the cutoff for dichotomisation was found to exert no major impact on the separation of active drug versus placebo.
The modelled analyses confirmed that the separation between SSRIs and placebo on relative measures of improvement was largely independent of the cutoff used except for at very high and very low values (Table 1). ORs were generally higher for HDRS-6-sum and depressed mood than for HDRS-17-sum. Removing patients close to the commonly used cutoffs (±5 or ±10%) did not markedly impact ORs or absolute differences. Similar patterns were observed for endpoint score-based cutoffs (Table 2).
For all tested cutoffs, drug versus placebo differences were larger for optimal SSRI doses than for suboptimal ones (Table 3). Differences between doses were also larger for most comparisons when using outcome measures based on HDRS-6-sum or depressed mood rather than on HDRS-17-sum.
Discussion
In this report, we rebut the claim [1] that differences between SSRIs and placebo with respect to response rates are inflated due to the choice of a cutoff for defining response near the median; the discrimination of treatments in this regard was hence largely independent of the placement of the cutoff. Moreover, in line with previous studies based on mean ratings, we show the separation between active treatment and placebo also for dichotomous responses to be larger when using HDRS-6-sum or the individual item depressed mood rather than HDRS-17-sum as effect parameter. Finally, between-treatment differences were larger when including optimal SSRI doses only. We hence conclude that differences between SSRIs and placebo with respect to their ability to induce response or remission in previous meta-analyses [16] may have been underrated by the use of an insensitive measure, i.e., HDRS-17-sum, and by the inclusion of suboptimal doses, but that the definition of response or remission is less consequential for the outcome. The finding that the choice of response-defining cutoff does not exert a major impact on the separation between active treatment and placebo was confirmed in an independent sample of trials comparing duloxetine and placebo.
The lack of impact of the placement of the cutoff is visually illustrated by the largely parallel lines over most of the range with respect to the cumulative distribution of symptom reduction scores (Fig. 1, Supplementary Figs. 1–3). In line with this, and further demonstrating that differences in response rates are not caused by subjects in the respective treatment groups clustering near but on either side of a particular cutoff, exclusion of subjects close to common cutoffs did not markedly impact response rates (Tables 1 and 2). Instead, differences between groups with respect to common definitions of response and remission primarily reflect differences in distribution with respect to low remaining symptom scores, where there is a predominance of subjects in the active treatment group and with respect to high remaining symptom scores, where there is a corresponding clustering of placebo-treated subjects [17]. The placement of the cutoff is hence largely irrelevant as long as it is not placed near the extreme values (i.e., where the lines in the figures are not parallel).
We [2, 8] and others [18, 19] have previously reported that using HDRS-17-sum as an effect parameter may make SSRIs and SNRIs appear less effective than they actually are in reducing core symptoms of depression such as depressed mood. Although the separation of active drug from placebo with respect to mean symptom rating has thus been shown more robust when using HDRS-6-sum instead of HDRS-17-sum as a measure, we now report that this difference, not unexpectedly, also translates into corresponding differences with respect to ORs for dichotomous criteria of response and remission. A similar observation (based on the same SSRI data set) was previously reported for the depressed mood item [6] and was here extended to include additional definitions of response or remission.
Of note is that the separation between active drug and placebo with respect to the proportion of subjects displaying very low remaining endpoint scores as well as proportions of baseline scores was substantial when assessed using the shorter and unidimensional HDRS-6 subscale but less so when using HDRS-17-sum. Since healthy volunteers on average score about three points on the HDRS [20], since particularly some of the items not included in the HDRS-6 subscale may capture side effects of active treatment [4, 19], and since residual symptoms of depression usually remain after only 6 weeks of treatment also in responders [21, 22], it is not surprising that only a few subjects displayed very low HDRS-17-sum scores also in the actively treated group.
A factor possibly impacting efficacy is the dose of active treatment. While the dose–response curve for the SSRIs has often been described as flat [23, 24], and while most trial-based meta-analyses have included all SSRI doses as if they were equally effective [16, 25], using the same data set as in the present study, we have previously reported [6] 50 mg of sertraline, 20 mg of citalopram and 10 mg of paroxetine to demonstrate lower efficacy than higher doses of the same compounds both in terms of mean symptom rating and dichotomous assessment of response and remission; similar conclusions have subsequently been advocated also by others [26]. We have now expanded these analyses to comprise additional dichotomous outcome measures, again yielding consistently higher response and remission rates for optimal doses as compared to what is obtained with doses at the lower end of the dosing interval. Of note is the impressive difference between optimal doses of SSRIs versus placebo, e.g., with respect to obtaining a 50% reduction on HDRS-17-sum or HDRS-6-sum (~60% versus 40%).
This study has some limitations. First, HDRS scores might be inflated at baseline [27], which may result in an artificial symptom reduction, regardless of treatment, when the same instrument is used for inclusion and evaluation of response. Second, methodological problems related, e.g., to poor compliance [28, 29] or to overly liberal recruitment of participants [30], which may also reduce the apparent difference between active drug and placebo, are bound to impact not only the outcome of the individual trials but also that of post hoc analyses. Third, the results presented may not necessarily translate to antidepressants with other mechanisms of action or to subjects below the age of eighteen.
In conclusion, this report rebuts the previous claim that differences between antidepressants and placebo with respect to response rates are inflated by the common use of a cutoff (50%) close to the median with subjects clustering on either side. Instead, we report response rate differences consistently larger for higher SSRI doses and for outcomes based on the unidimensional HDRS-6 subscale or the single item depressed mood, indicating that previous meta-analyses in this field may have, on the contrary, deflated the differences between SSRIs and placebo with respect to response rates by including suboptimal doses and applying an insensitive outcome measure.
Data availability
Qualified researchers can apply to obtain the data underlying this publication through the respective pharmaceutical companies. Supplementary information is available on MP’s website.
Code availability
The computer code used to generate the results published here can be obtained by contacting the authors.
References
Kirsch I, Moncrieff J. Clinical trials and the response rate illusion. Contemp Clin Trials. 2007;28:348–51.
Hieronymus F, Emilsson JF, Nilsson S, Eriksson E. Consistent superiority of selective serotonin reuptake inhibitors over placebo in reducing depressed mood in patients with major depression. Mol Psychiatry. 2016;21:523–30.
Ostergaard SD. Do not blame the SSRIs: blame the Hamilton Depression Rating Scale. Acta Neuropsychiatr. 2018;30:241–3.
Hieronymus F, Lisinski A, Eriksson E, Ostergaard SD. Do side effects of antidepressants impact efficacy estimates based on the Hamilton Depression Rating Scale? A pooled patient-level analysis. Transl Psychiatry. 2021;11:249.
Bech P, Gram LF, Dein E, Jacobsen O, Vitger J, Bolwig TG. Quantitative rating of depressive states. Acta Psychiatr Scand. 1975;51:161–70.
Hieronymus F, Nilsson S, Eriksson E. A mega-analysis of fixed-dose trials reveals dose-dependency and a rapid onset of action for the antidepressant effect of three selective serotonin reuptake inhibitors. Transl Psychiatry. 2016;6:e834.
Hieronymus F, Lisinski A, Nilsson S, Eriksson E. Influence of baseline severity on the effects of SSRIs in depression: an item-based, patient-level post-hoc analysis. Lancet Psychiatry. 2019;6:745–52.
Lisinski A, Hieronymus F, Naslund J, Nilsson S, Eriksson E. Item-based analysis of the effects of duloxetine in depression: a patient-level post hoc study. Neuropsychopharmacology. 2020;45:553–60.
Frank E, Prien RF, Jarrett RB, Keller MB, Kupfer DJ, Lavori PW, et al. Conceptualization and rationale for consensus definitions of terms in major depressive disorder. Remission, recovery, relapse, and recurrence. Arch Gen Psychiatry. 1991;48:851–5.
Ruhe HG, Dekker JJ, Peen J, Holman R, de Jonghe F. Clinical use of the Hamilton Depression Rating Scale: is increased efficiency possible? A post hoc comparison of Hamilton Depression Rating Scale, Maier and Bech subscales, Clinical Global Impression, and Symptom Checklist-90 scores. Compr Psychiatry. 2005;46:417–27.
Ballesteros J, Bobes J, Bulbena A, Luque A, Dal-Re R, Ibarra N, et al. Sensitivity to change, discriminative performance, and cutoff criteria to define remission for embedded short scales of the Hamilton depression rating scale (HAMD). J Affect Disord. 2007;102:93–9.
Riedel M, Moller HJ, Obermeier M, Schennach-Wolff R, Bauer M, Adli M, et al. Response and remission criteria in major depression—a validation of current practice. J Psychiatr Res. 2010;44:1063–8.
Romera I, Perez V, Menchon JM, Polavieja P, Gilaberte I. Optimal cutoff point of the Hamilton Rating Scale for Depression according to normal levels of social and occupational functioning. Psychiatry Res. 2011;186:133–7.
Zimmerman M, Martinez J, Attiullah N, Friedman M, Toba C, Boerescu DA, et al. Further evidence that the cutoff to define remission on the 17-item Hamilton Depression Rating Scale should be lowered. Depress Anxiety. 2012;29:159–65.
Kyle PR, Lemming OM, Timmerby N, Sondergaard S, Andreasson K, Bech P. The validity of the different versions of the hamilton depression scale in separating remission rates of placebo and antidepressants in clinical trials of major depression. J Clin Psychopharmacol. 2016;36:453–6.
Cipriani A, Furukawa TA, Salanti G, Chaimani A, Atkinson LZ, Ogawa Y, et al. Comparative efficacy and acceptability of 21 antidepressant drugs for the acute treatment of adults with major depressive disorder: a systematic review and network meta-analysis. Focus (Am Psychiatr Publ). 2018;16:420–9.
Hieronymus F, Hieronymus M, Nilsson S, Eriksson E, Ostergaard SD. Individual variability in treatment response to antidepressants in major depression: comparing trial-level and patient-level analyses. Acta Psychiatr Scand. 2020;142:443–5.
Faries D, Herrera J, Rayamajhi J, DeBrota D, Demitrack M, Potter WZ. The responsiveness of the Hamilton Depression Rating Scale. J Psychiatr Res. 2000;34:3–10.
Timmerby N, Andersen JH, Sondergaard S, Ostergaard SD, Bech P. A systematic review of the clinimetric properties of the 6-item version of the hamilton depression rating scale (HAM-D6). Psychother Psychosom. 2017;86:141–9.
Zimmerman M, Chelminski I, Posternak M. A review of studies of the Hamilton depression rating scale in healthy controls: implications for the definition of remission in treatment studies of depression. J Nerv Ment Dis. 2004;192:595–601.
Moller HJ. Outcomes in major depressive disorder: the evolving concept of remission and its implications for treatment. World J Biol Psychiatry. 2008;9:102–14.
Hieronymus F, Jauhar S, Ostergaard SD, Young AH. One (effect) size does not fit at all: Interpreting clinical significance and effect sizes in depression treatment trials. J Psychopharmacol. 2020;34:1074–8.
Bollini P, Pampallona S, Tibaldi G, Kupelnick B, Munizza C. Effectiveness of antidepressants. Meta-analysis of dose-effect relationships in randomised clinical trials. Br J Psychiatry. 1999;174:297–303.
Berney P. Dose-response relationship of recent antidepressants in the short-term treatment of depression. Dialogues Clin Neurosci. 2005;7:249–62.
Gibbons RD, Hur K, Brown CH, Davis JM, Mann JJ. Benefits from antidepressants: synthesis of 6-week patient-level outcomes from double-blind placebo-controlled randomized trials of fluoxetine and venlafaxine. Arch Gen Psychiatry. 2012;69:572–9.
Furukawa TA, Cipriani A, Cowen PJ, Leucht S, Egger M, Salanti G. Optimal dose of selective serotonin reuptake inhibitors, venlafaxine, and mirtazapine in major depression: a systematic review and dose-response meta-analysis. Lancet Psychiatry. 2019;6:601–9.
Mundt JC, Greist JH, Jefferson JW, Katzelnick DJ, DeBrota DJ, Chappell PB, et al. Is it easier to find what you are looking for if you think you know what it looks like? J Clin Psychopharmacol. 2007;27:121–5.
Reis M, Aberg-Wistedt A, Agren H, Akerblad AC, Bengtsson F. Compliance with SSRI medication during 6 months of treatment for major depression: an evaluation by determination of repeated serum drug concentrations. J Affect Disord. 2004;82:443–6.
Akerblad AC, Bengtsson F, von Knorring L, Ekselius L. Response, remission and relapse in relation to adherence in primary care treatment of depression: a 2-year outcome study. Int Clin Psychopharmacol. 2006;21:117–24.
Brody B, Leon AC, Kocsis JH. Antidepressant clinical trials and subject recruitment: just who are symptomatic volunteers? Am J Psychiatry. 2011;168:1245–7.
Acknowledgements
Patient-level data for citalopram, paroxetine and sertraline were obtained from Lundbeck, GSK and Pfizer, respectively. Patient-level data for duloxetine were obtained from Lilly through Clinical Study Data Request. The study was supported by the Swedish Medical Research Council, AFA Insurance, the Swedish Brain Foundation and by a grant from the Swedish state under the agreement between the Swedish government and the county councils (ALF).
Funding
Open access funding provided by University of Gothenburg.
Author information
Contributions
A.L., F.H., S.N. and E.E. designed the study. A.L., F.H. and S.N. conducted the statistical analyses. All authors participated in the interpretation of data. A.L. produced the first draft of the manuscript. F.H., S.N. and E.E. provided input on the manuscript. A.L. designed the figures. All authors have given final approval to the version being submitted.
Ethics declarations
Competing interests
F.H. has received speaker fees from Servier and H Lundbeck. E.E. has been on advisory boards and/or received speaker’s honoraria and/or research grants from Eli Lilly, Servier, GSK, H Lundbeck and Janssen Cilag. A.L. and S.N. declare no competing interests.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Lisinski, A., Hieronymus, F., Nilsson, S. et al. Impact of chosen cutoff on response rate differences between selective serotonin reuptake inhibitors and placebo. Transl Psychiatry 12, 160 (2022). https://doi.org/10.1038/s41398-022-01882-5