The validation of short eating disorder, body dysmorphia, and Weight Bias Internalisation Scales among UK adults

Lantos, Dorottya; Moreno-Agostino, Darío; Harris, Lasana T.; Ploubidis, George; Haselden, Lucy; Fitzsimons, Emla

doi:10.1186/s40337-024-01095-9

Research
Open access
Published: 09 September 2024

Volume 12, article number 137, (2024)

Journal of Eating Disorders

Dorottya Lantos¹,
Darío Moreno-Agostino^2,4,
Lasana T. Harris³,
George Ploubidis²,
Lucy Haselden² &
…
Emla Fitzsimons²

Abstract

Background

When collecting data from human participants, it is often important to minimise the length of questionnaire-based measures. This makes it possible to ensure that the data collection is as engaging as possible, while it also reduces response burden, which may protect data quality. Brevity is especially important when assessing eating disorders and related phenomena, as minimising questions pertaining to shame-ridden, unpleasant experiences may in turn minimise any negative affect experienced whilst responding.

Methods

We relied on item response theory to shorten three eating disorder and body dysmorphia measures, while aiming to ensure that the information assessed by the scales remained as close to that assessed by the original scales as possible. We further tested measurement invariance, correlations among different versions of the same scales as well as different measures, and explored additional properties of each scale, including their internal consistency. Additionally, we explored the performance of the 3-item version of the modified Weight Bias Internalisation Scale and compared it to that of the 11-item version of the scale.

Results

We introduce a 5-item version of the Eating Disorder Examination Questionnaire, a 3-item version of the SCOFF questionnaire, and a 3-item version of the Dysmorphic Concern Questionnaire. The results revealed that, across a sample of UK adults (N = 987, ages 18–86, M = 45.21), the short scales had a reasonably good fit. Significant positive correlations between the longer and shorter versions of the scales and their significant positive, albeit somewhat weaker correlations to other, related measures support their convergent and discriminant validity. The results followed a similar pattern across the young adult subsample (N = 375, ages 18–39, M = 28.56).

Conclusions

These results indicate that the short forms of the tested scales may perform similarly to the full versions.

Plain English summary

This manuscript introduces short versions of existing measures of eating disorders and body dysmorphia, specifically the Eating Disorder Examination Questionnaire, the SCOFF Questionnaire, and the Dysmorphic Concern Questionnaire. We further investigate the properties of the recently introduced 3-item short version of the modified Weight Bias Internalisation Scale. Across analyses including measurement invariance testing and bivariate correlations aiming to assess convergent and discriminant validity, we find support that the short scales may perform similarly to their longer versions. These short scales may contribute in meaningful ways to research where the brevity of questionnaire-type measures may make a difference by contributing to data quality.

Background

The time participants volunteer to partake in research is invaluable. Ensuring that participants spend this time in a meaningful way and no unnecessary time is granted is not only an ethical priority, but also a way to obtain high quality data [1, 2]. Questionnaire-type measures are among the most often used methods for data collection in the psychological and social sciences [3, 4]. Historically, such scales have been designed to capture a given construct in an in-depth manner, often resulting in a long series of items, taking a long time to complete. More recently, advances in psychometrics have revealed that fewer items are in many cases sufficient to capture the same underlying construct, without losing meaningful information [5, 6].

The amount of time taken to complete questionnaires is an especially important consideration whilst designing test packages for longitudinal cohort studies. In preparation for the upcoming 2023 data sweep of the Millennium Cohort Study (MCS, 7,8), which has followed the lives of nearly 19,000 UK individuals born in 2000–2001, we aimed to optimise selected eating disorder and body dysmorphia scales among a sample of UK adults. The analyses presented here complement our recent analyses performed with the aim of comparing and optimising measures of depression, anxiety, and psychological distress in preparation for the same MCS data sweep [9]. Specifically, here we examined the properties of the 12-item short version of the Eating Disorder Examination Questionnaire (EDE-QS, 10), the 5-item SCOFF questionnaire [11], the 7-item Dysmorphic Concern Questionnaire (DCQ, 12), and the 11-item and 3-item versions of the modified Weight Bias Internalisation Scale (WBIS, 13,14).

Our aim was to ensure that these widely used scales can be administered in as little time as possible, whilst capturing similar variance and information to their original versions. Experiences of eating disorders and body dysmorphia may be highly unpleasant and shame-ridden [15,16,17]. Thus, asking a limited number of questions regarding such experiences may be especially important in ensuring that participants are exposed to as little stress as possible, without compromising data quality. We further tested the measurement invariance of these self-report scales in order to ensure that they can be used across cohorts, enabling measurement harmonisation and thus facilitating cross-cohort comparisons [18, 19].

Optimising questionnaires

Brevity

Questionnaire-based measures historically tended to comprise many, often dozens of items. This was driven by the intention to truly capture an underlying construct as accurately as possible. However, using such measures may be counterproductive in some cases, as studies which last too long also compromise data quality [2]. Often-cited causes for this include boredom effects (i.e., participants’ performance/attention decreases as they become bored and lose interest), response burden (i.e., the effort required to complete a questionnaire, which increases as the length of the questionnaire increases), and fatigue (i.e., participants’ performance/attention decreases as they become tired). Longer scales are additionally more likely to result in missing data. One of our aims in this study is to optimise self-report questionnaires for brevity.

Bias

Another key issue with questionnaires is potential bias. Several factors may influence the way in which people perceive certain questions, including cultural differences or other differences related to age, including the historical time during which one grew up, etc. [20]. Such bias may lead individuals to interpret questions differently, which ultimately may lead to different constructs being assessed by the same scale [21,22,23]. Thus, in this study we further assess measurement invariance across sex and age groups.

Selecting and developing measures for cohort studies

Longitudinal birth cohort studies follow a cohort of participants born around the same time. Such designs allow researchers the opportunity to study the effects of social, economic, and environmental factors on key outcomes across the lifespan [24, 25]. Several birth cohort studies conducted throughout the past decades in the UK are currently still running, including cohorts born in 1946, 1958, 1970, and 2000/01. An important factor when selecting and developing measures for inclusion in birth cohort studies is brevity. Brevity contributes to ensuring that participants in longitudinal studies remain engaged and minimises attrition. One way to achieve brevity is to optimise scales by minimising their number of items. However, it must be ensured that the included scales are valid and reliable. In addition, it is important to ensure that all measures assess the same construct across different groups, such as across sex or age groups in a population [18, 19]. This further facilitates the comparison across studies, including cross-cohort comparisons.

Overview of the study

Using an online survey, we explored the properties of existing self-report measures of eating disorders, body dysmorphia, and weight bias internalisation among UK adults. We aimed to optimise selected measures by reducing the number of items which participants are required to respond to. We further examined the same characteristics among only the young adult subsample (18–39 years) and ensured that the optimised short versions of each scale exhibit similar properties in the young adult sample and the full sample. In preparation for the next MCS [7, 8] data sweep, this age group is of special interest. To gather data of the highest possible quality, keeping in mind the limited availability of survey time, we aim to inform the selection of self-report questionnaires for use in the upcoming data sweep (age 22, 2023) with the results presented here.

More specifically, our aim was to find a short set of items that correlate highly with longer widely used scales, but which are less time-consuming to complete. We have tried to shorten the scales based on multiple factors: retaining the maximum amount of information across different levels of the underlying construct, thinking of the general (non-clinical) population, and focusing on reducing participant burden. We have assessed whether these shorter measures may rank-order the participants in a similar way as the longer versions. While undoubtedly there is a loss of granularity with the shortening of scales, data quality may, overall, be improved this way if, for example, these scales are to be embedded in lengthy questionnaires. Under such circumstances, reducing participant burden is especially important, as a high burden may lead to a lack of attention, disengagement, or missing data, among others. Thus, while shorter scales do not necessarily mean better scales, there may certainly be cases where shorter options are better at meeting the researchers’ aims.

As the analyses presented here additionally allow us to optimise these same scales across UK adults of all ages, we aim to inform other researchers who may be conducting studies in this population. We tested measurement invariance to ensure that the scales tested here assessed the same constructs across sex and age groups. The online survey included additional measures of depression, anxiety, and psychological distress as well, which are explored in detail elsewhere [9]. The study was preregistered (https://osf.io/bk9xs)^{Footnote 1}. Ethical approval was obtained from the Ethics Committee of University College London. All data and syntax files are available via OSF (https://osf.io/vg4a9/).

Method

Participants

A sample of 1,068 UK adults started the survey. The sample was recruited via Prolific (www.prolific.com) to closely mimic one that is representative of the UK population. To recruit a sample that approximates representativeness, Prolific uses data from the UK Office for National Statistics, and matches participants to the national population as closely as possible on age, gender, and ethnicity. We removed the data of 8 participants who gave consent to partaking but did not consent to the storage of their data, as well as 40 participants who only filled in the consent form and nothing else. We excluded a further 33 participants from data analysis due to incorrect responses to (one or both) attention check questions (e.g., Please select agree). The final sample consisted of 987 participants (463 males, 505 females, 2 participants indicated that they did not wish to share their sex^{Footnote 2}), ages 18–86, M = 45.21, SD = 15.61. Seventeen participants only partially completed the survey, and their demographic details were thus missing. Participants were reimbursed £7.50 for their time. Across some of the analyses we were interested primarily in the responses of young adults, and hence completed them by including only the 375 participants who were aged 18–39 (M = 28.56, SD = 6.39, 184 males, 191 females).

Procedure

Data were collected as part of a larger project in November 2021 (for further details, see [9]). We created an online survey using Qualtrics software. Participants were first presented with an informed consent form and information sheet detailing their tasks throughout the study. They next completed several psychometric questionnaires, including measures focusing on the assessment of eating disorders and body dysmorphia described below. All scales were presented in a randomised order across participants. Finally, participants responded to demographic questions (sex, gender, age, ethnicity), were debriefed and thanked for their time.

Measures

Eating disorders were assessed using the 12-item short version of the EDE-QS [10], the 5-item SCOFF questionnaire [11], and the 22-item eating disorder diagnostic scale (EDDS, 26,27).

The EDE-QS [10] was completed by 972 participants. Participants responded to 10 items of the EDE-QS (e.g., On how many of the past 7 days have you had a definite fear that you might gain weight?) on a 4-point scale with response options 0 = 0 days, 1 = 1–2 days, 2 = 3–5 days, 3 = 6–7 days; and to two items (e.g., Over the past 7 days, how dissatisfied have you been with your weight or shape?) on a 4-point scale with response options 0 = not at all, 1 = slightly, 2 = moderately, 3 = markedly. Participants’ responses were summed, with higher scores indicating an increased presence of characteristics of eating disorders.

The SCOFF [11] was completed by 975 participants. Participants completed 5 items of the questionnaire (e.g., Do you make yourself sick because you feel uncomfortably full?) using binary yes/no responses. We scored ‘yes’ responses as 1 and ‘no’ responses as 0, and summed participants’ answers, with higher scores indicating a greater likelihood for the presence of eating disorders.
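The sum scoring described for the SCOFF can be sketched in a few lines. This is only an illustrative sketch, not the authors' scoring syntax (which is available via OSF); the function name `score_scoff` and the input format (a list of 'yes'/'no' strings) are assumptions made for the example.

```python
def score_scoff(responses):
    """Sum-score five yes/no answers, coding 'yes' as 1 and 'no' as 0.

    The function name and the string input format are hypothetical;
    only the coding rule (yes = 1, no = 0, summed) comes from the text.
    """
    if len(responses) != 5:
        raise ValueError("the SCOFF has five items")
    total = 0
    for answer in responses:
        normalised = answer.strip().lower()
        if normalised not in ("yes", "no"):
            raise ValueError(f"unexpected response: {answer!r}")
        total += 1 if normalised == "yes" else 0
    return total


# Invented responses from one hypothetical participant.
example_total = score_scoff(["yes", "no", "no", "yes", "no"])
```

The same pattern applies to the EDE-QS, where the 0–3 response codes are summed directly.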

The EDDS [26, 27] was completed by 974 participants. The 22 items which participants completed included a variety of response methods, e.g., questions asked participants to enter their weight and height, to respond to binary questions with yes/no responses (e.g., During the times when you ate an unusually large amount of food, did you experience a loss of control (feel you couldn’t stop eating or control what or how much you were eating)?), or to respond on 15-point scales (e.g., How many times per week on average over the past 3 months have you made yourself vomit to prevent weight gain or counteract the effects of eating?, with response options between 0 and 14), among others. We used existing code [27] to calculate index scores (raw eating disorder composite score and Z-transformed eating disorder composite score) based on participants’ responses, where higher scores indicate a greater likelihood for the presence of eating disorders. Note that as a diagnostic tool this scale corresponds directly to the DSM-IV rather than the DSM-5 diagnostic criteria of eating disorders.

Body dysmorphia was assessed using the 4-item body dysmorphic disorder questionnaire (BDDQ, 25) and the 7-item DCQ [12]. The BDDQ [28] was completed by 997 participants. This scale is made up of four core questions, where each question is presented based on participants’ previous responses (e.g., the question ‘Is your main concern with how you look that you aren’t thin enough or that you might get too fat?’ is only presented if a participant responds ‘yes’ to the question ‘Are you worried about how you look?’). This scale functions as a diagnostic tool for eating disorders. Following the scoring guidelines, we coded participants either as being at risk of an eating disorder (coded 1, overall sample: N = 183 out of 987; young adults: N = 109 out of 375) or not (coded 0).

The DCQ [12] was completed by 977 participants. Participants responded to the 7 items of the DCQ (e.g., Have you ever been very concerned about some aspect of your physical appearance?) on a 4-point scale with response options 0 = not at all, 1 = same as most people, 2 = more than most people, 3 = much more than most people. Participants’ responses were summed, with higher scores indicating increased body dysmorphia.

Weight bias internalisation was assessed using the 11-item [14] and 3-item [13] versions of the WBIS. The scales were completed by 978 participants. Participants responded to the items (e.g., I hate myself for my weight) on a 7-point Likert scale with response options ranging from 1 = strongly disagree to 7 = strongly agree. Participants’ responses on selected items were reverse scored and all scores were summed in a way that higher scores reflect increased weight bias internalisation.
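Reverse scoring on a 7-point scale maps a response x to 8 − x (more generally, low + high − x) before summing. The sketch below illustrates this under stated assumptions: the function name and the `reverse_items` parameter are hypothetical, and which WBIS items are reverse scored is not stated in the text, so the positions used in the example are invented.

```python
def score_likert_sum(responses, reverse_items=(), low=1, high=7):
    """Sum Likert responses after reverse scoring the selected positions.

    `reverse_items` holds zero-based positions of reverse-keyed items;
    it is a hypothetical parameter, not taken from the source.
    """
    total = 0
    for position, value in enumerate(responses):
        if not low <= value <= high:
            raise ValueError(f"response {value} outside {low}-{high}")
        if position in reverse_items:
            # On a 1-7 scale this turns 1 into 7, 2 into 6, and so on.
            value = low + high - value
        total += value
    return total


# Invented responses; position 1 is treated as reverse keyed for illustration.
example_total = score_likert_sum([7, 1, 4], reverse_items={1})
```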

Depression, anxiety, and psychological distress were also assessed as part of the survey, though these scales are examined in detail elsewhere [9]. The 10-item K10 scale and the 6-item K6 scale embedded in it [29], the 9-item version of the Malaise Inventory [30, 31], the PHQ-9 [32, 33], PHQ-2 [34], GAD-7 [35], and GAD-2 [36] were included (see the Supplementary Materials for further details).

Data analyses

Measurement properties

We used MPlus version 8.7 [37] to explore measurement properties with a latent variable modelling approach. To test the latent structure of each self-report measure we used confirmatory factor analyses with a robust mean and variance adjusted weighted least squares (WLSMV) estimator, with either a model for binary (Yes vs. No responses) or ordered categorical data (questionnaires with multiple ordered response options) depending on the type of responses used for each scale. Because each of the self-report questionnaires which we focus on here has a well-established factor structure, we relied on confirmatory factor analyses. We used the root mean square error of approximation (RMSEA, [38]), the comparative fit index (CFI, [39]), and the Tucker-Lewis Index (TLI, [40]) to determine model fit. We interpreted RMSEA values up to 0.05 as indicating good fit, and values up to 0.08 as indicating adequate fit [41]. In the cases of CFI and TLI, we interpreted values greater than 0.90 as indicating adequate, and those greater than 0.95 as indicating good model fit [42].
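The cut-offs above can be written down as a small helper, which may be convenient when tabulating fit across many models. This is a sketch only: the function names are hypothetical, and the label "below adequate" for values outside the stated ranges is an assumption, since the text does not name that category.

```python
def classify_rmsea(value):
    """Label an RMSEA value: up to 0.05 good, up to 0.08 adequate."""
    if value <= 0.05:
        return "good"
    if value <= 0.08:
        return "adequate"
    return "below adequate"  # label assumed; not named in the text


def classify_cfi_tli(value):
    """Label a CFI or TLI value: above 0.95 good, above 0.90 adequate."""
    if value > 0.95:
        return "good"
    if value > 0.90:
        return "adequate"
    return "below adequate"  # label assumed; not named in the text


# Invented fit values, not results from the study.
example = (classify_rmsea(0.07), classify_cfi_tli(0.97))
```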

Finally, we plotted test information functions (TIF) to evaluate the precision of measurement of the self-report questionnaires using MPlus version 8.7 [37]. TIF plots illustrate Fisher information - i.e., an indicator of the precision or reliability of the measure due to its inverse relationship with the standard error of measurement - at different levels of the underlying latent variable [43]. All analyses exploring the properties of the self-report questionnaires were conducted on the complete sample as well as on the young adult subsample.
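To make the link between information and the standard error of measurement concrete, the sketch below computes a test information function for binary items. It is a simplification: it assumes a two-parameter logistic (2PL) model, whereas the analyses reported here were estimated in MPlus with WLSMV, and the item parameters are invented for illustration.

```python
from math import exp, sqrt


def item_information_2pl(theta, discrimination, difficulty):
    """Fisher information of one binary item under a 2PL logistic model."""
    p = 1.0 / (1.0 + exp(-discrimination * (theta - difficulty)))
    return discrimination ** 2 * p * (1.0 - p)


def total_information(theta, items):
    """Test information: the sum of item information at a given theta."""
    return sum(item_information_2pl(theta, a, b) for a, b in items)


def standard_error(theta, items):
    """Standard error of measurement, the inverse square root of information."""
    return 1.0 / sqrt(total_information(theta, items))


# Hypothetical (discrimination, difficulty) pairs, not estimates from the study.
hypothetical_items = [(2.0, 0.0), (2.0, 0.0)]
information_at_zero = total_information(0.0, hypothetical_items)
```

Evaluating `total_information` over a grid of theta values gives the curve that a TIF plot displays; higher information at a given theta corresponds to a smaller standard error there.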

Item reduction

We aimed to optimise two of the eating disorder measures, the EDE-QS and SCOFF, and two of the body dysmorphia measures, the DCQ and WBIS, by shortening them using item response theory. The diagnostic measures, the EDDS and BDDQ, served instead as measures against which we could validate the emerging results. We relied on the factor analyses conducted for the EDE-QS, SCOFF, DCQ, and WBIS to examine their general properties. Our approach was to take a small number of items which load the highest on the underlying factors (i.e., those with the highest discrimination parameter, ideally three items) to create the short scale, while ensuring that the TIF remains as similar as possible to that of the original scale and that internal consistency also remains optimal.

As the measures included in the present study may be used to screen clinical populations, certain items may provide limited information in the general population despite being important in clinical samples. As here we aimed to develop short measures for use in nonclinical samples, we additionally took into consideration the thresholds related to the items. This way, we attempted to avoid the inclusion of any items which may be less informative in the target sample. Where item thresholds were very high, thus resulting in low item endorsement and, subsequently, low variability in a general (not clinical) population like that of MCS, lower item loadings but thresholds closer to the centre of the distribution of latent factor scores were preferred. Unless otherwise noted, the thresholds did not suggest that items should not be retained.

Measurement invariance

To determine whether the measurement properties of the scales were equivalent across sex and age groups, we used a measurement invariance testing strategy. To compare ages, we split the sample according to younger adults (18–39 years) and older adults (40+ years), as in the previous analyses. We tested measurement invariance to explore any potential bias within the self-report questionnaires across sexes or age groups caused by measurement error [18, 19, 44, 45]. We conducted the analyses across four groups (sex * age: younger males, older males, younger females, and older females). We used a WLSMV estimator and tested two levels of invariance: configural invariance, without constraining any measurement parameters to be equal across the groups, and scalar invariance, where the items’ loadings as well as their thresholds are constrained to be equal across the groups. We compared the goodness-of-fit indices of the two models. Since the chi-square difference test is very sensitive to sample size, invariance was also informed by additional fit indices. Models where the loss of fit was less than 0.01 for CFI and 0.015 for RMSEA met the criteria for invariance [46, 47]. These analyses were conducted using MPlus version 8.7 [37].
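The decision rule for invariance can be sketched as a comparison of two sets of fit indices. The function name, the dictionary keys, and the example values are assumptions for illustration; only the two thresholds (a CFI decrease below 0.01 and an RMSEA increase below 0.015) come from the text.

```python
def meets_invariance(configural, scalar,
                     max_cfi_drop=0.01, max_rmsea_rise=0.015):
    """Check whether the scalar model loses less fit than the thresholds.

    Both arguments are dicts with hypothetical keys 'cfi' and 'rmsea'.
    """
    cfi_drop = configural["cfi"] - scalar["cfi"]
    rmsea_rise = scalar["rmsea"] - configural["rmsea"]
    return cfi_drop < max_cfi_drop and rmsea_rise < max_rmsea_rise


# Invented fit indices, not results from the study.
configural_fit = {"cfi": 0.980, "rmsea": 0.040}
scalar_fit = {"cfi": 0.975, "rmsea": 0.045}
invariant = meets_invariance(configural_fit, scalar_fit)
```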

Note that this type of strategy could not be implemented in scales with three or fewer items, since in those cases the configural model is just-identified at best, thus resulting in non-meaningful goodness-of-fit indices that cannot be compared to those from models with invariance constraints. It was thus not possible to test measurement invariance in the short versions of the scales comprising only three items. We performed the analyses on the 12- and 5-item EDE-QS, 5-item SCOFF, 7-item DCQ, and 11-item WBIS scales. This allowed us to detect potential differences in the measurement properties of the larger scales that may impact the shorter versions.
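The identification issue can be illustrated by counting degrees of freedom. The sketch assumes a single-group, one-factor model in which p items contribute p(p − 1)/2 correlations and the model estimates p loadings, giving p(p − 3)/2 degrees of freedom; this is a simplification of the multi-group models described above, and the function name is hypothetical.

```python
def one_factor_degrees_of_freedom(n_items):
    """Degrees of freedom of a single-group one-factor model.

    Assumes n_items * (n_items - 1) / 2 correlations to reproduce and
    one free loading per item (a simplifying assumption for illustration).
    """
    if n_items < 2:
        raise ValueError("need at least two items")
    correlations = n_items * (n_items - 1) // 2
    free_loadings = n_items
    return correlations - free_loadings


# Scale lengths discussed in the text.
df_by_length = {p: one_factor_degrees_of_freedom(p) for p in (3, 5, 7, 11, 12)}
```

With three items the count is zero, so the model reproduces the data exactly and fit indices carry no information; with two items it is negative, which is why the text describes such models as just-identified at best.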

Scale properties

We first explored scale properties by examining descriptive statistics. To test whether any differences exist in the sample on key measures among sex and age groups (i.e., 18–39 year olds vs. 40+ year olds), we ran independent samples t-tests. We also conducted 2 × 2 ANOVAs to explore any interactions across sex and age groups. The two participants who did not disclose their sex were excluded from the analyses where sex differences were tested. We used SPSS 27.0 to conduct these analyses. We used the Omega macro for SPSS [48] to test the internal consistency of the scales with McDonald’s omega total (ω_t) coefficient [49].
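For intuition about the omega coefficient, the sketch below computes it from standardised loadings of a one-factor model, as the squared sum of loadings divided by that quantity plus the summed unique variances. This is an illustrative simplification and not the SPSS macro used in the study; the function name and the loadings are invented.

```python
def omega_total_one_factor(standardised_loadings):
    """Omega for a one-factor model with standardised loadings.

    Unique variances are taken as 1 - loading**2, which assumes
    standardised items and uncorrelated residuals.
    """
    for loading in standardised_loadings:
        if not -1.0 <= loading <= 1.0:
            raise ValueError("standardised loadings must lie in [-1, 1]")
    common = sum(standardised_loadings) ** 2
    unique = sum(1.0 - loading ** 2 for loading in standardised_loadings)
    return common / (common + unique)


# Invented loadings for a hypothetical three-item scale.
example_omega = omega_total_one_factor([0.8, 0.8, 0.8])
```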

Correlations

We conducted bivariate correlations between the long and short versions of the eating disorder and body dysmorphia measures, and between these measures and those of depression, anxiety, and psychological distress. This way, we were able to explore the equivalence in the rank ordering across the measures, along with convergent and discriminant validity.
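A correlation between long-form and short-form totals can be sketched as follows. The text does not state which coefficient was used, so a Pearson correlation is assumed here, and the scores are invented for illustration.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equally long score lists."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two lists of equal length, at least 2 values")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant scores")
    return sxy / sqrt(sxx * syy)


# Invented long-form and short-form totals for four hypothetical participants.
long_form_totals = [3, 7, 12, 20]
short_form_totals = [1, 3, 5, 9]
r_long_short = pearson_r(long_form_totals, short_form_totals)
```

A high positive value indicates that the short form orders participants in much the same way as the long form.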

Results

Measurement properties & item reduction

We first conducted a confirmatory factor analysis on the EDE-QS, SCOFF, DCQ, and WBIS scores. Based on these analyses, we created the short versions of the scales, relying on the items with the highest discrimination parameters (Figs. 1, 2, 3 and 4). The fit statistics of the full and shortened administered scales are presented in Table 1, while the TIFs of the scalar models are presented in the Supplementary Materials. While RMSEA values were adequate in the cases of the long and short versions of the SCOFF, the 3-item WBIS, and the 3-item DCQ, the CFI and TLI scores indicated the remaining scales had a good fit as well. The only exception was the 12-item EDE-QS scale assessed in the overall sample rather than the young adult subsample. Nevertheless, this scale also showed an adequate fit.

Table 1 Fit statistics for the administered scales

Eating disorder measures

In the case of the 12-item EDE-QS [10], the three items with the highest loading did not match across the analysis conducted on the full sample and that conducted on the young adult subsample (Fig. 1). Our aim throughout the study was to develop short scales which are optimal for use both in the general UK population and in the young adult population. This way, test-retest within a single cohort, as well as measurement harmonisation across different UK-based cohorts could be facilitated. For this reason, we chose items with the three highest loadings from both analyses, resulting in a five-item scale. The final items were ‘On how many of the past 7 days has thinking about your weight or shape made it very difficult to concentrate on things you are interested in (such as working, following a conversation or reading)?’, ‘On how many of the past 7 days have you had a definite fear that you might gain weight?’, ‘On how many of the past 7 days have you had a strong desire to lose weight?’, ‘Over the past 7 days has your weight or shape influenced how you think about (judge) yourself as a person?’, ‘Over the past 7 days, how dissatisfied have you been with your weight or shape?’ (Appendix A).^{Footnote 3} These items cover a range of the characteristics of eating disorders, but do not include more clinically salient behaviours such as purging. This indicates that it may be an ideal measure to use among the general, rather than a clinical population.
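The selection rule used for the EDE-QS (take the three highest-loading items in each analysis and keep their union) can be sketched as below. The function names, item labels, and loadings are all invented for illustration and are not the estimates shown in Fig. 1.

```python
def top_items(loadings, k=3):
    """Return the labels of the k items with the highest loadings."""
    ranked = sorted(loadings, key=loadings.get, reverse=True)
    return set(ranked[:k])


def select_short_form(full_sample, subsample, k=3):
    """Union of the top-k items from two analyses, in sorted order."""
    return sorted(top_items(full_sample, k) | top_items(subsample, k))


# Invented loadings for six hypothetical items in two analyses.
full_sample_loadings = {"item_a": 0.90, "item_b": 0.88, "item_c": 0.85,
                        "item_d": 0.80, "item_e": 0.78, "item_f": 0.60}
young_adult_loadings = {"item_a": 0.89, "item_b": 0.80, "item_c": 0.79,
                        "item_d": 0.86, "item_e": 0.87, "item_f": 0.55}
short_form = select_short_form(full_sample_loadings, young_adult_loadings)
```

When the two analyses agree on the top three items, the union contains three items; when they overlap on only one, as in this invented example, it contains five.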

In the case of the 5-item SCOFF [11], we selected the three items with the highest loadings, which matched across the analysis conducted on the full sample and that conducted on the young adult subsample. These items were ‘Do you make yourself sick because you feel uncomfortably full?’, ‘Do you worry you have lost control over how much you eat?’, ‘Would you say that food dominates your life?’ (Fig. 2, Appendix B). Note that the threshold of item 1 of the SCOFF (overall sample: item 1 = 1.63, item 2 = 0.51, item 3 = 0.99, item 4 = 1.09, item 5 = 0.74; young adult sample: item 1 = 1.39, item 2 = 0.34, item 3 = 0.86, item 4 = 0.88, item 5 = 0.61) suggested that though it may hold valuable information in a clinical sample, it may be less useful when assessed in the general population. Indeed, this item was endorsed least often in both the overall sample (only 50 out of 975 participants responded ‘yes’) and the young adult sample (only 31 out of 375 participants responded ‘yes’). This corresponds to the content of the item, asking individuals about vomiting on purpose, which may be more applicable to clinical populations. For this reason, we explored a 3-item version of the SCOFF where item 1 was not included. These analyses, however, indicated that when item 1 was replaced with the next highest loading item, item 3, the loading of item 3 in the three-item model was poor (overall sample: item 2 = 0.92, item 3 = 0.32, item 5 = 0.77; young adult sample: item 2 = 0.89, item 3 = 0.44, item 5 = 0.73). We thus retained the 3-item SCOFF which included item 1, despite its seemingly increased suitability for clinical populations.

Body dysmorphia measure

In the case of the 7-item DCQ [12], we selected the three items with the highest loadings, which matched across the analysis conducted on the full sample and that conducted on the young adult subsample. These were ‘Have you ever been very concerned about some aspect of your physical appearance?’, ‘Have you ever spent a lot of time worrying about a defect in your appearance / bodily functioning?’, ‘Have you ever spent a lot of time covering up defects in your appearance / bodily functioning?’ (Fig. 3, Appendix C).

Weight bias internalisation measure

A 3-item short version of the 11-item modified WBIS has previously been introduced [13, 14]. The same three items were implicated by our analyses as those with the highest loading when taking the full sample into account. These were ‘I feel anxious about my weight because of what people might think of me’, ‘Whenever I think a lot about my weight, I feel depressed’, ‘I hate myself for my weight’ (Fig. 4, Appendix D). Although when we only included the young adult subsample in the analyses the results did not completely overlap with these, the loadings of the selected three items were nevertheless high. For this reason, and because the three-item version of the scale has already been introduced and used, we decided to keep these items across the remaining analyses.