The Bergen Shopping Addiction Scale (BSAS) is a popular measure for assessing shopping addiction (SA). Andreassen (2014) defined SA as “being overly concerned about shopping, driven by an uncontrollable shopping motivation, and investing so much time and effort into shopping that it impairs other important life areas” (p. 198). Existing data show that SA is positively and closely linked to emotional, relational, health-related, and performance problems (Andreassen et al., 2015; Niedermoser et al., 2021). Although various studies have explored the BSAS’s unidimensionality and psychometric properties, its factor structure has not been clearly established, and data on its longitudinal measurement invariance are notably absent. This study aims to fill this gap by examining the longitudinal measurement invariance of the BSAS over a two-year interval involving three time points in a group of adults from the general community.

The BSAS was developed to align with the components model of addiction (Griffiths, 2005). Applied to SA, the components in this model are salience (preoccupation with shopping), mood modification (involvement with shopping improves mood), tolerance (an increasing amount of involvement with shopping is required to be satisfied), withdrawal symptoms (reduction or preclusion from involvement with shopping creates restlessness and negative feelings), conflict (involvement with shopping creates conflicts and causes problems for the individual), relapse (return to old shopping patterns after a period of control or absence), and problems (impaired wellbeing due to excessive shopping; Andreassen et al., 2015; Griffiths, 2005). The BSAS consists of seven items corresponding to these addiction components and is considered technologically and culturally neutral and applicable for measuring both online and offline SA.

In developing and validating the Bergen Shopping Addiction Scale (BSAS), Andreassen et al. (2015) administered an initial version of the BSAS (comprising 28 items covering the seven SA criteria: salience, mood modification, conflict, tolerance, withdrawal, relapse, and problems) to a sample of 23,537 participants in Norway. The highest-loading item for each of the seven addiction criteria formed the final seven-item version of the BSAS. The subsequent confirmatory factor analysis (CFA; extraction method not specified) did not support the simple one-factor model. However, the addition of covariance between the error variances of items one and two resulted in a good-fitting model. Internal consistency reliability and convergent and divergent validity were also supported. Other noteworthy findings included a positive correlation of the total score with female gender, and a negative correlation with age.

Two additional studies, conducted by Uzarska et al. (2023) and Zarate et al. (2022), examined the factor structure of the BSAS. Uzarska et al. (2023) administered the Polish version of the BSAS to 1156 participants. CFA with WLSMV extraction did not support the one-factor model. As in the Andreassen et al. (2015) study, the addition of covariance between the error variances of items one and two resulted in a good-fitting model. In contrast, Zarate et al. (2022) employed CFA with WLS extraction, and their findings supported the one-factor model. Both studies also supported the internal consistency reliability and convergent and divergent validity of the BSAS. It is worth noting that the participants in Zarate et al.'s (2022) study were those recruited at Time 1 (N = 1097) of the present study. Overall, validation of the original one-factor BSAS model has been mixed, and these studies gave no attention to longitudinal measurement invariance.

Longitudinal measurement invariance, distinct from test-retest reliability, ensures comparable metric and scalar factorial structures at different time points (Brown, 2006). When there is weak or no support for longitudinal measurement invariance, it indicates that the ratings at the different time points cannot be justifiably compared since the scores can be assumed to be confounded by different measurement and scaling properties. Consequently, corresponding empirical information on the BSAS items’ measurement invariance is required to compare ratings at different time points. The significance of establishing support for longitudinal measurement invariance lies in its role in accurately tracking the developmental trajectory of SA symptoms over time. This is instrumental in assessing the effectiveness of clinical treatments over time and enhances the generalizability of findings derived from longitudinally collected BSAS data. In essence, longitudinal measurement invariance ensures the consistency of measurement and scaling properties, allowing for meaningful comparisons across different time-related assessment intervals.

Unlike test-retest reliability, longitudinal measurement invariance assesses whether observed scores reflect consistent levels of the underlying latent trait across time points. Test-retest reliability, in contrast, examines the stability of scores obtained from a measure over two or more time points, typically demonstrated by a high correlation (refer to The Standards for Educational and Psychological Testing, American Educational Research Association, 2014). Notably, neither longitudinal measurement invariance nor test-retest reliability has been systematically investigated for the BSAS, leaving a gap in the evidence for its psychometric properties.

Current study

Given the existing data limitations, characterized by a dearth of clear information on the factor structure of the Bergen Shopping Addiction Scale (BSAS), a lack of measurement invariance data, and an absence of test-retest reliability information, the primary objective of the current study was to use confirmatory factor analysis to systematically investigate the factor structure of the BSAS, assess measurement invariance, and determine the test-retest reliability of BSAS ratings. The study focused on a cohort of adults from the general community, spanning a two-year interval, with data collected at three distinct time points (i.e., 2021, 2022, and 2023). Importantly, the study refrained from formulating specific hypotheses, as its objectives were primarily exploratory in nature.

Study aims and significance

The primary objective of this study is to comprehensively analyze the factor structure, measurement invariance, and test-retest reliability of the BSAS across three sequential time points over a two-year period. By utilizing CFA, this research aims to validate the BSAS’s consistency and stability in measuring shopping addiction among adults in a community setting. Given the scale’s wide application in both clinical and research settings, establishing its longitudinal measurement invariance is crucial for ensuring that the observed scores are true reflections of the underlying construct over time and not influenced by differing measurement properties or scales at different assessment points. This investigation is significant as it addresses a notable gap in the literature concerning the long-term reliability of shopping addiction measurements, thus aiding in the accurate monitoring and treatment assessment of this behavioral addiction. The findings are expected to enhance the scale’s utility by confirming its applicability for longitudinal studies and repeated measurements in diverse populations, thereby contributing to better clinical practices and research methodologies in the field of psychological addiction studies.

Method

Participants

Participants were from the general community, forming a normative online convenience sample. Regarding individuals with usable scores, responses of 968 English-speaking adults were included at time 1, 462 at time 2, and 276 at time 3. In the current study, the attrition rate from time 1 to time 2 was 52.3% [(968 − 462)/968], and from time 1 to time 3 was 71.5% [(968 − 276)/968]. To detect attrition bias in the characteristics of the final sample, we used t-tests to compare the time 1 BSAS total scale scores of those who did and did not respond at time 2 and at time 3 (Miller & Wright, 1995). Supplementary Table S1 shows the descriptives for these variables and the results of the t-tests. As shown, respondents and nonrespondents differed in the BSAS total score for time 2, and not time 3. However, as the effect sizes for the differences at time 2 were small, it was interpreted that attrition bias had little effect on the scores collected in the study.
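The attrition percentages above follow directly from the reported sample sizes. The short Python sketch below reproduces the arithmetic; the function name `attrition_rate` is illustrative and not part of the study's materials.

```python
def attrition_rate(n_baseline: int, n_followup: int) -> float:
    """Return the percentage of baseline participants lost by follow-up."""
    if n_baseline <= 0:
        raise ValueError("n_baseline must be positive")
    if not 0 <= n_followup <= n_baseline:
        raise ValueError("n_followup must lie between 0 and n_baseline")
    return 100.0 * (n_baseline - n_followup) / n_baseline


# Sample sizes with usable scores, as reported in the text
N_TIME1, N_TIME2, N_TIME3 = 968, 462, 276

rate_t1_t2 = attrition_rate(N_TIME1, N_TIME2)
rate_t1_t3 = attrition_rate(N_TIME1, N_TIME3)

print(f"Time 1 to time 2: {rate_t1_t2:.1f}%")  # 52.3%
print(f"Time 1 to time 3: {rate_t1_t3:.1f}%")  # 71.5%
```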

Only the 276 participants who completed ratings at all time points were involved in this study. To compute the sample size requirements, Soper’s (2022) software was used. For the current study, with an expected effect size set at 0.3, a power level of 0.8, the number of latent variables at 3 (i.e., the three time points), and probability at 0.05, a minimum sample size of 119 is recommended. Our sample size (N = 276) exceeds this recommendation.

Supplementary Table S2 provides background information on the 276 participants involved in the study. As shown, their age ranged from 18 to 62 years (mean = 31.86 years; SD = 9.94 years) and included 196 men (71%; mean age = 31.92 years, SD = 10.84 years) and 75 women (27.2%; mean age = 32.12 years, SD = 10.84 years). Additionally, five individuals (1.8%) did not identify their gender. No significant age difference was found between men and women, t (269) = 0.1496, p = .882. Regarding sociodemographic background, about two-thirds of the participants (66.5%) reported being employed, and most reported having completed at least secondary education (97%). Racially, most of the participants identified themselves as “white” (69.2%), and slightly less than half of the participants (42.4%) indicated that they were involved in romantic relationships. Based on their IRT study, Zarate et al. (2022) proposed a score of 23 (out of a total scale score of 35) to distinguish between those at risk and those not at risk for shopping addiction. Based on this cut-off, the number of participants at risk at time 1 was 18 (6.5%).

Measures

All data were collected online. At the start of the study (time 1), participants provided demographic information, including age, gender, ethnicity, highest education level completed, employment status, and relationship status. They also completed ratings of the BSAS (Andreassen et al., 2015) at three time points, one year apart (in 2021, 2022, and 2023).

The BSAS has seven items with a time reference of the past year. An example item is: “I think about shopping/buying things all the time”. Items are responded to on a five-point scale ranging from very rarely (1) to very often (5), so higher scores indicate higher symptom severity. The internal reliability for the BSAS instrument was very good in the present study (McDonald’s ω = 0.89, 0.87, and 0.90 for time 1, time 2, and time 3, respectively).
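For readers unfamiliar with McDonald’s ω, the sketch below shows one common way to compute it for a one-factor model whose factor variance is fixed at one: the squared sum of the loadings divided by that quantity plus the sum of the error variances. The loadings and error variances used here are hypothetical placeholders, not estimates from this study, and the function name is illustrative.

```python
from typing import Sequence


def mcdonald_omega(loadings: Sequence[float], error_variances: Sequence[float]) -> float:
    """Omega for a one-factor model with the factor variance fixed at 1.

    omega = (sum of loadings)^2 / ((sum of loadings)^2 + sum of error variances)
    """
    if len(loadings) != len(error_variances):
        raise ValueError("loadings and error_variances must have the same length")
    if not loadings:
        raise ValueError("at least one item is required")
    common = sum(loadings) ** 2
    unique = sum(error_variances)
    return common / (common + unique)


# Hypothetical standardized values for seven items (illustration only)
example_loadings = [0.7] * 7
example_errors = [1 - 0.7 ** 2] * 7  # standardized error variance = 1 - loading^2

omega = mcdonald_omega(example_loadings, example_errors)
print(f"omega = {omega:.3f}")
```

With real data, the loadings and error variances would come from the fitted CFA model at each time point.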

Procedure

Upon approval from the Victoria University Human Research Ethics Committee (application no. HRE21-044), the study was advertised using non-electronic and electronic (i.e., email, social media) methods. Time 1 data were collected between August 2019 and August 2020. Participants were invited to register for the study via a Qualtrics link available on social media (i.e., Facebook, Instagram, Twitter), the University websites and digital forums (i.e., reddit.com). The link took them to the Plain Language Information Statement (PLIS). Those wishing to participate were directed to click a button to agree to informed consent. This was followed by questions seeking sociodemographic information and a number of questionnaires. Only the BSAS is of relevance to the present study. Participants completed the online survey using a computer at their chosen location. After completing the steps at time 1, participants were asked to voluntarily provide their email address to be included in prospective data collection wave(s) and to sign the study consent form digitally (box ticking). Twelve months later (between August 2021 and August 2022), those who consented received follow-up emails requesting their voluntary participation in the survey. This included an identical survey link (i.e., PLIS, email provision for the second wave, consent form and survey questions). In all, 462 participated in the second data collection wave. A comparable procedure between August 2022 and August 2023 was used for collecting wave 3 data. Because the survey included questionnaires addressing one’s level of distress, those who had a current untreated severe mental illness were instructed (also in the PLIS) not to participate, to avoid any unforeseen or indirect emotional impact.

Statistical procedures

The CFA models for this study were estimated using the Mplus software (Version 7; Muthén & Muthén, 2012). Although the BSAS scores are ordinal, they have five response categories and were therefore treated as continuous data (Rhemtulla et al., 2012). For the CFA analyses, robust maximum likelihood (MLR) was the preferred estimator, and accordingly, the MLRχ2 statistic was used to evaluate model fit. Nonetheless, to align with prior research that treated BSAS scores as ordinal and utilized corresponding estimators, this study also examined the models using the Weighted Least Squares Mean and Variance adjusted (WLSMV) estimator, with the WLSMVχ2 used for fit assessment. This approach allowed for comparison with previous studies. Because the MLRχ2 and WLSMVχ2 values can be inflated in large samples, model fit was also evaluated with the practical fit indices provided by Mplus: the Root Mean Squared Error of Approximation (RMSEA), Comparative Fit Index (CFI), and Tucker-Lewis Index (TLI). Thresholds were set according to Hu and Bentler’s (1998) criteria: RMSEA values ≤ 0.06 and CFI and TLI values ≥ 0.95 indicated a good fit. CFI and TLI values from 0.90 to < 0.95 were deemed acceptable. The study first assessed the fit of a one-factor BSAS model at each of the three time points. In these models, all items loaded onto a single latent factor without correlated error variances, and the latent factor variance was fixed at one for model identification.
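The fit thresholds stated above can be expressed as a small decision rule. The Python sketch below encodes only the cut-offs given in the text (CFI/TLI ≥ 0.95 good, 0.90 to < 0.95 acceptable; RMSEA ≤ 0.06 good). The function names and the example values are hypothetical and are not results from this study.

```python
def classify_incremental_fit(value: float) -> str:
    """Classify a CFI or TLI value using the cut-offs stated in the text."""
    if value >= 0.95:
        return "good"
    if value >= 0.90:
        return "acceptable"
    return "poor"


def rmsea_indicates_good_fit(value: float) -> bool:
    """Return True when RMSEA meets the <= 0.06 criterion for good fit."""
    return value <= 0.06


# Hypothetical fit values, for illustration only
example = {"CFI": 0.962, "TLI": 0.931, "RMSEA": 0.058}

summary = {
    "CFI": classify_incremental_fit(example["CFI"]),
    "TLI": classify_incremental_fit(example["TLI"]),
    "RMSEA_good": rmsea_indicates_good_fit(example["RMSEA"]),
}
print(summary)
```

Keeping the RMSEA check as a simple yes/no reflects that the text states only the good-fit cut-off for that index.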

An extended single-group CFA model that included the ratings at all time points was used to test longitudinal measurement invariance. Supplementary Fig. 1 shows our path diagram for evaluating longitudinal measurement invariance. The model combines the unidimensional factor models for time 1, time 2, and time 3, which are connected across time points through correlated error variances of like items and correlated latent factors. Using this model, we compared a series of increasingly constrained models to evaluate configural, metric, scalar, and uniqueness invariance, that is, whether the latent factor structure, factor loadings, item intercepts, and uniqueness variances of corresponding items were consistent across time points. Details of these steps are outlined in Supplementary Table S3. Model comparisons used both the χ2 difference test and changes in the approximate fit indices (CFI and RMSEA), with invariance rejected when the CFI decreased by more than 0.01 (ΔCFI > 0.01) and the RMSEA increased by more than 0.015 (ΔRMSEA > 0.015; Chen, 2007).
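The approximate-fit comparison between two nested invariance models can be sketched as follows. This is a minimal illustration that reads the Chen (2007) criteria as a CFI decrease of more than 0.01 and an RMSEA increase of more than 0.015; the class and function names and all fit values are hypothetical. The two flags are reported separately rather than combined, since how to weigh them is a judgment the analyst makes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FitIndices:
    cfi: float
    rmsea: float


def invariance_flags(less_constrained: FitIndices,
                     more_constrained: FitIndices,
                     cfi_tol: float = 0.01,
                     rmsea_tol: float = 0.015) -> dict:
    """Compare two nested models and flag changes that exceed the tolerances."""
    cfi_drop = less_constrained.cfi - more_constrained.cfi
    rmsea_rise = more_constrained.rmsea - less_constrained.rmsea
    return {
        "cfi_drop": cfi_drop,
        "rmsea_rise": rmsea_rise,
        "cfi_flag": cfi_drop > cfi_tol,
        "rmsea_flag": rmsea_rise > rmsea_tol,
    }


# Hypothetical fit values, for illustration only (not results from this study)
configural = FitIndices(cfi=0.950, rmsea=0.050)
metric = FitIndices(cfi=0.948, rmsea=0.049)
worse_model = FitIndices(cfi=0.930, rmsea=0.070)

metric_vs_configural = invariance_flags(configural, metric)
worse_vs_configural = invariance_flags(configural, worse_model)
print(metric_vs_configural["cfi_flag"], metric_vs_configural["rmsea_flag"])
print(worse_vs_configural["cfi_flag"], worse_vs_configural["rmsea_flag"])
```

In practice, each step (metric vs. configural, scalar vs. metric, uniqueness vs. scalar) would be compared in turn, alongside the χ2 difference test.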

Lastly, the test-retest reliability of the BSAS factors, or the temporal stability of the BSAS latent factor, was assessed by analyzing the intercorrelations of the latent factors across time 1, time 2, and time 3.

Data availability

The data used in the analysis is available on request from the corresponding author.

Results

Preliminary analyses

Initially, the mean and standard deviation scores for the seven items of the BSAS were computed. These are displayed in Supplementary Table S4. Following this, the fit of the one-factor BSAS model was examined at the three time points. Table 1 displays the results of these analyses. As shown for MLR extraction, based on Hu and Bentler’s (1998) recommendations, the CFI values showed either good (time 2) or acceptable (time 1 and time 3) fit. The TLI indicated adequate fit at time 2 and time 3, and poor fit at time 1. The RMSEA indicated good fit at time 2 and marginal fit at time 3. For time 1, it was just outside (0.106) the cut-off for marginal fit. For the WLSMV extraction, the CFI and TLI showed a good fit at all three time points, whereas the RMSEA showed poor fit at all time points. Thus, considering all the approximate fit indices (CFI, TLI, and RMSEA), for MLR extraction there was either a good or adequate fit for times 1 and 2. For WLSMV extraction, there was a misfit for all time points. Overall, we interpreted these findings as indicating that MLR extraction revealed a better fit for the BSAS one-factor model. Despite this, we examined how local misfit may have contributed to some signs of misfit for these MLR-based models. For all three time points, there was a significant discrepancy (p < .05) between the residuals for the relationship of only item 1 with item 2. All other residuals were not significant. Notwithstanding this local misfit, the findings can be interpreted as indicating sufficient fit for the one-factor BSAS model at all three time points. Supplementary Table S5 shows the factor loadings, intercepts, and error variances for the BSAS one-factor CFA model at time 1, time 2, and time 3.

Table 1 Fit values for the BSAS 1-factor CFA model at Time 1, Time 2, and Time 3 using MLR and WLSMV extraction

Longitudinal measurement invariance for the BSAS 1-factor CFA model across Time 1, Time 2, and Time 3

Given the better fit when MLR extraction was applied, we used MLR extraction in the measurement invariance analyses. Table 2 summarizes the results for testing longitudinal measurement invariance for the BSAS one-factor CFA model. As shown, for the configural invariance model (M1), the CFI and TLI indicate an acceptable fit, and the RMSEA indicates a good fit. Thus, there was support for the configural invariance model. The table also shows that, for both the difference in chi-square (∆χ2) and the approximate fit indices (∆CFI and ∆RMSEA), there was no difference between the full metric invariance model (M2) and the configural invariance model (M1), between the full scalar invariance model (M3) and the full metric invariance model (M2), or between the full error variance invariance model (M4) and the full scalar invariance model (M3). Thus, there was support for the full metric, scalar, and error variance invariance models, respectively, i.e., full longitudinal measurement invariance.

Table 2 Results of the test for longitudinal measurement invariance for the BSAS 1-factor CFA model across Time 1, Time 2, and Time 3 based on the ∆χ2

Test-retest (temporal stability) of the BSAS latent factor

Given support for longitudinal measurement invariance, we examined the temporal stability of the BSAS latent factor. The correlations of time 1 with time 2 and time 3 were 0.594 and 0.572, respectively, and the correlation of time 2 with time 3 was 0.480. All were significant (p < .001), with effect sizes that were large or, for time 2 with time 3, just below the conventional threshold of 0.50 for a large effect (Cohen, 1988), thereby indicating strong temporal stability. Also, a one-way repeated-measures ANOVA showed no difference in the latent mean scores [mean (standard deviation) for time 1 = 12.799 (5.619); time 2 = 12.645 (5.278); time 3 = 12.941 (5.846)] across the three time points, F (2, 271) = 0.332, p = 0.717.

Discussion

Summary of study findings

The present study provided new data on the factor structure and longitudinal measurement invariance of the BSAS items in adults across three time points spanning two years. Additionally, it provided new data on the temporal stability of the BSAS factor over this interval. The findings showed that there was support for (1) the initially proposed simple one-factor model; (2) configural invariance (same factor structure pattern), full metric (same factor loadings), full scalar (same response level), and full unique factor (same unique variances) invariance (i.e., strong support for longitudinal measurement invariance in adults for the BSAS items across the three time points spanning two years); and (3) temporal stability of the BSAS factor over two years (including equivalency for latent mean scores across the three time points).

Implications of our findings

While the support for the simple one-factor structure for the BSAS is not new, the present findings do provide more clarity on the BSAS factor structure. This is because, of the three studies published in this area, two did not support the simple one-factor model, but rather a one-factor model with covariance between the error variances of items one and two (Andreassen et al., 2015; Uzarska et al., 2023). The study by Zarate et al. (2022) supported the simple one-factor model. Although our findings were interpreted as providing sufficient support for the simple one-factor model, the support was considered only adequate. As shown for the MLR extraction, the CFI values indicated either good (time 2) or acceptable (time 1 and time 3) fit. The TLI indicated only adequate fit at time 2 and time 3, and poor fit at time 1. The RMSEA indicated good fit at time 2 and marginal fit at time 3, while for time 1 it was just outside the cut-off for marginal fit. In contrast, WLSMV extraction gave a more consistent picture for the CFI and TLI, which indicated a good fit at all three time points, although the RMSEA showed a poor fit at all time points. Our examination of local misfit indicated that at all three time points, there was a significant discrepancy between the residuals for the relationship of only item one with item two. All other residuals were not significant. Considering the present and past findings that required the inclusion of covariance between the error variances of items one and two (Andreassen et al., 2015; Uzarska et al., 2023), it can be speculated that these items may require closer scrutiny, with possible revisions and modifications if required.

In the context of the current study, the support for configural invariance indicates that the same overall factor structure (one factor in the current study) holds across the three time points. The support for metric invariance indicates that the strength of the associations of the items with the BSAS latent factors is the same for like items at all three time points. Scalar invariance indicates that individuals with the same latent trait score will have the same expected observed item scores at different time points. The support for error variance invariance indicates that the reliabilities of the BSAS items are the same for like items at all three time points. As no study to date has examined the longitudinal measurement invariance of the BSAS, our findings are novel in this area.
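The four invariance levels interpreted above can be written compactly in standard CFA notation. The symbols below (x, τ, λ, η, ε, θ) are conventional textbook notation adopted here as an assumption; they are not taken from the study's Mplus specification.

```latex
% x_{jt}: response to BSAS item j (j = 1, ..., 7) at time t (t = 1, 2, 3)
% \eta_t: latent SA factor at time t
\begin{align*}
x_{jt} &= \tau_{jt} + \lambda_{jt}\,\eta_{t} + \varepsilon_{jt},
  \qquad \operatorname{Var}(\varepsilon_{jt}) = \theta_{jt} \\[4pt]
\text{Configural:}\quad & \text{the same one-factor pattern holds at each } t \\
\text{Metric:}\quad & \lambda_{j1} = \lambda_{j2} = \lambda_{j3} \\
\text{Scalar:}\quad & \tau_{j1} = \tau_{j2} = \tau_{j3} \\
\text{Uniqueness:}\quad & \theta_{j1} = \theta_{j2} = \theta_{j3}
\end{align*}
```

Each level adds its equality constraints to those of the level before it, which is why the models are compared in sequence.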

The robust support for longitudinal measurement invariance in adults for the BSAS items suggests that the BSAS scores are not confounded by biases related to scaling and measurement issues and, therefore, can be justifiably compared over this interval. Thus, the BSAS items can be used to monitor the developmental changes of SA symptoms and accurately monitor clinical treatment effects over a reasonable length of time. However, this recommendation must be viewed with caveats in mind.

Strictly speaking, our findings apply to the SA symptoms included in the BSAS and not to SA symptoms in general. However, there are reasons to believe that the findings may extend to SA symptoms more generally. This is because the content of the items in the BSAS is based on the components model of addiction that is thought to capture core addiction symptoms (salience/preoccupation, mood modification, tolerance, withdrawal symptoms, conflict, and relapse; Griffiths, 2005). Additionally, a novel finding relates to the support for strong temporal stability for the latent factors across the three time points. This suggests that SA is stable over relatively long intervals. Thus, the study’s findings could be relevant to other measures of SA and possibly SA in general.

Limitations

Although the current study has delivered original and valuable information regarding temporal stability and longitudinal measurement invariance for the BSAS symptom ratings across two years, the findings and interpretations must be considered with several limitations in mind. First, SA ratings are influenced by other factors, such as age and gender (Andreassen et al., 2015). Not controlling for these variables in the present study may have confounded the findings. Second, as all participants in this study were from the general community and not selected randomly, our findings may be further confounded and limited in terms of generalization, including their application to those with the potential for clinical levels of SA. Third, all data used were collected using a self-rating questionnaire (i.e., the BSAS). It is therefore possible that the ratings were influenced by this method, and as such our results may be subject to confounding by common method variance. Fourth, our findings have been obtained from a single study, and therefore, replication is essential. Fifth, while we established sufficient power for the study, it may still be possible that the findings would have been different if the sample had been larger. Given these limitations, future research in this field is needed that addresses the limitations noted above.

Conclusions

In conclusion, the study indicates that BSAS scores over two years are not confounded by biases related to scaling and measurement issues and, therefore, can be justifiably used to monitor developmental changes of SA symptoms and clinical treatment effects over time. Despite the limitations mentioned, the findings in this study are novel, as this is the first study to examine the longitudinal measurement invariance for self-ratings of BSAS symptoms over an extended period of two years. As such, it can be expected to contribute to theory and to research and clinical practice in SA. We recommend that clinicians and researchers consider the findings and interpretations from the present study when integrating information on SA symptoms obtained longitudinally.