Background

Representing 30–40% of all non-Hodgkin lymphoma (NHL) cases, diffuse large B-cell lymphoma (DLBCL) is the most prevalent type of NHL [1]. According to the 2020 GLOBOCAN data, there were an estimated 544,352 new cases of and 259,793 deaths from NHL globally, accounting for 2.8% of all cancer diagnoses and 2.6% of all cancer deaths [2, 3]. Once developed, DLBCL progresses rapidly and affects tissues and organs such as the bone marrow, spleen, thymus, lymph nodes, and lymphatic vessels. The current standard of care for first-line DLBCL treatment is the R-CHOP (rituximab, cyclophosphamide, doxorubicin, vincristine, and prednisone) chemoimmunotherapy regimen and its modifications [4, 5]. Among patients receiving first-line treatment, about 30% to 40% will relapse and about 10% will be refractory [6]. Patients who do not respond to second-line therapy or who progress after autologous stem cell transplantation currently have very limited treatment options [7, 8]. Recently, innovative therapies such as antibody–drug conjugates [9, 10], CAR-T cell therapy [11, 12], and bispecific antibody therapy [13] have been developed to improve the outcomes of DLBCL patients. These innovations underscore the need to assess their value in terms of patient-reported outcomes. However, despite recent progress, there remains a notable evidence gap regarding health utility values in the Chinese DLBCL population.

Overall, DLBCL imposes an enormous burden on patients and society because of its diverse clinical manifestations, suboptimal treatment responses, and poor prognosis. A thorough assessment of its effects on patients' health-related quality of life (HRQoL) is therefore imperative.

Generic preference-based measures (GPBMs) are suitable for assessing HRQoL across different populations and health conditions [14,15,16]. They offer comprehensive insights into patient health and overall well-being. In addition, GPBM responses can be converted to utility scores, which are necessary components of quality-adjusted life years (QALYs) [17]. The EQ-5D-5L is a GPBM that measures HRQoL across five dimensions, each with five levels: mobility (MO), self-care (SC), usual activities (UA), pain/discomfort (P/D), and anxiety/depression (AD) [18]. Its possible responses therefore describe 3125 (5^5) distinct health states [19]. The EQ-5D-5L is one of the most popular GPBMs for evaluating health utility [20,21,22]. The reliability and validity of the Chinese versions of the EQ-5D-5L have been examined extensively, supporting its wide use in China [23]. The SF-6Dv2 is another frequently used preference-based instrument, derived from the Short-Form Health Survey (SF-36). The conceptual equivalence between the original English version and the Simplified Chinese version of the SF-6Dv2 has been established in previous studies [24,25,26]. The SF-6Dv2 has six dimensions: physical functioning (PF), role limitation (RL), social functioning (SF), pain (PA), mental health (MH), and vitality (VA). The PA dimension has six response levels and each of the other five dimensions has five, so the SF-6Dv2 descriptive system defines 18,750 (5^5 × 6) health states [25].
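The health-state counts above follow directly from multiplying the number of levels across dimensions. A minimal Python sketch of that arithmetic (the dictionary names are illustrative only, not part of either instrument's official materials):

```python
from math import prod

# EQ-5D-5L: five dimensions, each with 5 levels.
EQ5D5L_LEVELS = {"MO": 5, "SC": 5, "UA": 5, "P/D": 5, "AD": 5}
# SF-6Dv2: six dimensions; pain (PA) has 6 levels, all others have 5.
SF6DV2_LEVELS = {"PF": 5, "RL": 5, "SF": 5, "PA": 6, "MH": 5, "VA": 5}


def n_health_states(levels: dict[str, int]) -> int:
    """Number of distinct health states a descriptive system can describe."""
    return prod(levels.values())


print(n_health_states(EQ5D5L_LEVELS))  # 3125
print(n_health_states(SF6DV2_LEVELS))  # 18750
```

The larger SF-6Dv2 count comes entirely from its extra dimension and the sixth pain level.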

To the best of our knowledge, evidence on the comparative performance of the EQ-5D-5L and SF-6Dv2 in Chinese DLBCL patients is currently lacking. Therefore, the primary objective of this study was to describe the HRQoL of DLBCL patients in China using the EQ-5D-5L and SF-6Dv2 questionnaires. The secondary objective was to compare the measurement properties of the EQ-5D-5L and SF-6Dv2 among DLBCL patients in China.

Method

Study design and patients

DLBCL patients were recruited via the largest lymphoma patient organization in China. As of 2022, the organization had more than 100,000 registered members, of whom 50% were lymphoma patients and 50% were their family members. Organization staff disseminated the survey link through its internal network to members registered as DLBCL patients, family members of patients, or caregivers of patients. The survey methodology, execution, and data integrity requirements were defined by a survey committee comprising rare disease medical experts, leaders of rare disease patient foundations, and our research team.

To meet the integrity and quality requirements of data collection, the questionnaire was developed in collaboration with medical experts in DLBCL and blood diseases. The final version of the questionnaire was refined through two rounds of pilot testing with a small group of patients to ensure clarity and relevance. The survey was distributed digitally via a secure platform, which ensured structured data collection and minimized manual data entry errors.

To ensure that each entry in the dataset represented a unique individual, the survey committee excluded duplicate responses manually using Microsoft Excel. Specifically, the committee cross-referenced the IP addresses from which the surveys were submitted, respondents' registered domicile at the city level, the hospital where the diagnosis was made, disease status, treatment regimen, healthcare expenditures, and the total time taken to complete the survey. Duplicate responses identified by comparing these data points were excluded. The survey committee also conducted periodic data quality reviews and contacted participants for follow-up when necessary to ensure that the data collected were reliable and accurate. Only fully completed questionnaires were included, so there were no missing values in the analysis.
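The committee's duplicate check was manual. Purely as an illustration, the core idea of flagging submissions that coincide on the listed data points could be automated as below. The field names are hypothetical, exact matching on every field is a simplifying assumption, and the manual review also weighed survey completion time, which this sketch leaves out.

```python
from collections import defaultdict

# Hypothetical field names standing in for the data points listed in the text.
MATCH_FIELDS = ("ip_address", "city", "diagnosis_hospital",
                "disease_status", "treatment_regimen", "expenditure")


def flag_duplicates(records: list[dict]) -> list[list[int]]:
    """Return groups of record indices that agree on every matching field."""
    groups: dict[tuple, list[int]] = defaultdict(list)
    for i, rec in enumerate(records):
        key = tuple(rec.get(field) for field in MATCH_FIELDS)
        groups[key].append(i)
    return [idx for idx in groups.values() if len(idx) > 1]


def keep_first(records: list[dict]) -> list[dict]:
    """Drop every record in a duplicate group except the first submitted."""
    drop = {i for group in flag_duplicates(records) for i in group[1:]}
    return [rec for i, rec in enumerate(records) if i not in drop]
```

In practice, near-matches (for example, the same IP address with a slightly different expenditure) would still need human judgement, which is why a manual review remains appropriate.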

The main inclusion criteria were: 1. the patient was an adult over 18 years old; 2. the respondent had a diagnosis of DLBCL or was a family member or caregiver of a DLBCL patient; and 3. the respondent could describe the patient's disease status, quality of life, and other conditions. Before starting the survey, participants were advised to prepare specific documents to help them answer the questions accurately. This preparation ensured that participants had access to comprehensive information on the patient's health condition and treatment history, enabling them to meet the inclusion criteria. The main exclusion criteria were: 1. unwillingness to answer crucial questions in the survey; and 2. treatment received outside mainland China.

Data collection

Demographic and clinical variables

The basic demographic and clinical variables collected were: 1) demographic information: sex, age at the time of the study, ethnicity, and family registration; 2) socioeconomic status (SES): marital status, education level, employment status, health insurance coverage, personal income, and household income; and 3) clinical characteristics: treatment lines, whether on treatment, treatment efficacy assessment results, International Prognostic Index (IPI) score, double-hit/triple-hit lymphoma (DHL/THL), double-expressor lymphoma (DEL), non-germinal center B-cell (non-GCB) subtype, activated B-cell (ABC) subtype, TP53 mutation, MYD88 and/or CD79b mutation, Ann Arbor stage, and complications.

Treatment stages information

Based on the survey question “Which of the following is the patient's current status?”, patients were classified into five categories: (1) untreated, newly diagnosed patients; (2) newly diagnosed patients on treatment; (3) patients who had completed first-line treatment and were being monitored; (4) relapsed/refractory (RR) patients on treatment; and (5) RR patients who had completed at least one line of treatment after relapse and were being monitored. In the analysis, the first three categories were grouped as initial treatment patients and the last two as RR patients.

Based on the survey question “What was the efficacy assessment result of the treatment according to the doctor?”, patients were classified into four categories: (1) complete remission, (2) partial remission, (3) progressive disease, and (4) not yet evaluated. In this analysis, these categories are labeled CR, PR, PD, and Unknown, respectively.
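The two classification rules above amount to simple lookups. A minimal sketch, assuming the answers are stored as integer codes 1 to 5 and 1 to 4 in the order listed (the coding and function names are hypothetical, not taken from the survey platform):

```python
# Hypothetical integer codes for the two survey questions, in the order listed.
CURRENT_STATUS = {
    1: "untreated, newly diagnosed",
    2: "newly diagnosed, on treatment",
    3: "completed first-line treatment, monitored",
    4: "RR, on treatment",
    5: "RR, completed at least one line after relapse, monitored",
}
EFFICACY = {1: "CR", 2: "PR", 3: "PD", 4: "Unknown"}


def treatment_stage(status_code: int) -> str:
    """Collapse the five status categories into the two analysis groups."""
    if status_code not in CURRENT_STATUS:
        raise ValueError(f"unknown status code: {status_code}")
    return "initial" if status_code <= 3 else "RR"


def efficacy_label(code: int) -> str:
    """Map the efficacy answer to the label used in the analysis."""
    return EFFICACY[code]
```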

HRQoL measure instruments

Given the complex functional status of DLBCL patients, and particularly for patients unable to complete the survey on their own because of severe illness or poor digital literacy, this study allowed family members or caregivers to answer on behalf of patients [27,28,29,30,31]. The questionnaire used the self-reported versions of both the EQ-5D-5L and SF-6Dv2, and responses were converted to utility values using the Chinese-specific value sets [26, 32].

Statistical analysis

To ensure the robustness and validity of the statistical analysis, a Chow test [33, 34] was performed to evaluate the comparability of the data from the self-reported and proxy-reported groups and to determine whether the two groups' responses could be pooled in subsequent analyses [35, 36]. The null hypothesis of equal regression coefficients in the two groups could not be rejected, supporting the pooled analysis of self-reported and proxy-reported data. Detailed results of the Chow test are provided in the Appendix.
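The Chow test compares the residual sum of squares (RSS) of one pooled regression with the summed RSS of separate regressions fitted to each group. The text does not state which regression model was tested, so the sketch below assumes a generic design matrix with an intercept and the same k columns in both groups (here, self-reported vs proxy-reported responses). It is a NumPy illustration, not the Stata routine used in the study.

```python
import numpy as np


def _rss(X: np.ndarray, y: np.ndarray) -> float:
    """Residual sum of squares of an OLS fit of y on X (X includes a constant)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)


def chow_f(X1, y1, X2, y2):
    """Chow F statistic for equal coefficients across two groups.

    F = [(RSS_pooled - (RSS_1 + RSS_2)) / k] / [(RSS_1 + RSS_2) / (n1 + n2 - 2k)]
    Returns (F, k, n1 + n2 - 2k); compare F with the F(k, n1 + n2 - 2k)
    distribution, e.g. scipy.stats.f.sf(F, k, df2) gives the p-value.
    """
    k = X1.shape[1]
    rss_pooled = _rss(np.vstack([X1, X2]), np.concatenate([y1, y2]))
    rss_split = _rss(X1, y1) + _rss(X2, y2)
    df2 = len(y1) + len(y2) - 2 * k
    F = ((rss_pooled - rss_split) / k) / (rss_split / df2)
    return F, k, df2
```

A small F (large p-value) means a single set of coefficients fits both groups about as well as separate sets, which is the condition used to justify pooling.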

Descriptive statistics

Descriptive analyses were conducted for both initial treatment and RR patients. EQ-5D-5L and SF-6Dv2 utility scores were reported by treatment stage and by treatment efficacy assessment result. Categorical variables were reported as frequencies and percentages, and continuous variables as means and standard deviations.

Chi-squared tests were used to evaluate differences in demographic and SES characteristics across initial treatment (untreated, on treatment, and off treatment) and RR (on treatment and off treatment) patients, and a t-test was used to test the difference in age.

The proportions of patients reporting the best and worst levels of each EQ-5D-5L and SF-6Dv2 dimension, and the highest and lowest utility values, were examined to assess ceiling and floor effects. A ceiling or floor effect was considered present if more than 15% of respondents reported the highest or lowest utility score, respectively [37].
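The 15% rule can be expressed as a short check. A minimal sketch, assuming `best` and `worst` are supplied by the analyst (for utilities, 1 and the value set's lowest attainable score, which is not given here; for a single dimension, levels 1 and 5 or 6):

```python
def ceiling_floor(scores, best, worst, threshold=0.15, tol=1e-9):
    """Share of respondents at the best and worst values, and whether each
    share exceeds the threshold (15% by default)."""
    n = len(scores)
    p_ceiling = sum(abs(s - best) <= tol for s in scores) / n
    p_floor = sum(abs(s - worst) <= tol for s in scores) / n
    return {
        "ceiling": p_ceiling,
        "floor": p_floor,
        "ceiling_effect": p_ceiling > threshold,
        "floor_effect": p_floor > threshold,
    }
```

The tolerance guards against floating-point noise when utilities are computed rather than entered directly.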

Agreement between the EQ-5D-5L and SF-6Dv2

Agreement between EQ-5D-5L and SF-6Dv2 utility values indicates whether the two can be used interchangeably. Agreement across the range of utility values was depicted using a Bland–Altman (B-A) plot. The x-axis of the B-A plot represents the mean of the two utility scores, and the y-axis represents their difference. The mean difference is shown as a red horizontal line in the middle of the plot, and the upper and lower limits of agreement, the range within which 95% of the differences are expected to lie (mean difference ± 1.96 standard deviations of the differences), are shown as two black horizontal lines. Agreement is considered “good” if the mean difference is close to zero and more than 95% of the points lie within the limits [38].
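The quantities plotted on a B-A plot can be computed as follows. A minimal NumPy sketch, assuming paired EQ-5D-5L (`u1`) and SF-6Dv2 (`u2`) utilities for the same respondents; the function name and return keys are illustrative:

```python
import numpy as np


def bland_altman(u1, u2, z=1.96):
    """Mean difference, limits of agreement, and share of points within them."""
    u1 = np.asarray(u1, dtype=float)
    u2 = np.asarray(u2, dtype=float)
    diff = u1 - u2                  # y-axis of the B-A plot
    mean_pair = (u1 + u2) / 2       # x-axis of the B-A plot
    d_bar = diff.mean()
    sd = diff.std(ddof=1)           # sample SD of the differences
    lower, upper = d_bar - z * sd, d_bar + z * sd
    within = float(np.mean((diff >= lower) & (diff <= upper)))
    return {"x": mean_pair, "y": diff, "mean_diff": float(d_bar),
            "loa": (float(lower), float(upper)), "prop_within": within}
```

The `x` and `y` arrays can be passed to a scatter plot, with horizontal lines drawn at `mean_diff` and at both ends of `loa`.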

Convergent/Divergent validity

Convergent validity is supported when two measures that are expected to be related are strongly correlated. For instance, the mobility dimension of the EQ-5D-5L is expected to correlate significantly with the physical functioning dimension of the SF-6Dv2 [39, 40]. Pearson's correlation coefficient was used to assess the association between the utility scores (continuous data), and Spearman's rank correlation coefficients were used to assess the associations between EQ-5D-5L and SF-6Dv2 dimensions (ordinal data). Correlations were deemed strong, moderate, and weak when the coefficients were > 0.5, 0.3–0.5, and < 0.3, respectively [41].

Discriminant validity: known-group validity and GRM

Known-group validity, which reflects the ability of the EQ-5D-5L and SF-6Dv2 to discriminate between respondents with different clinical severity, was assessed using one-way ANOVA for variables with more than two groups and t-tests for binary variables. Specifically, groups stratified by treatment lines [42,43,44], whether on treatment [45], treatment efficacy assessment results [46], IPI score [47, 48], DHL/THL [49, 50], DEL [51, 52], non-GCB subtype, ABC subtype [49], TP53 mutation [51], MYD88 and/or CD79b mutation [53,54,55], Ann Arbor stage [56, 57], and complications were analyzed [58]. The clinical characteristics defining severity were selected based on evidence from previous studies and clinical practice suggesting an association between these characteristics and health status in patients with DLBCL. We hypothesized that lower utility scores would be associated with RR status, being on treatment, a PD efficacy assessment result, a higher-risk IPI score, DHL/THL, DEL, non-GCB/ABC subtype, TP53 mutation, MYD88 and/or CD79b mutation, a higher Ann Arbor stage, and complications.
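For readers unfamiliar with the test, the one-way ANOVA F statistic compares between-group to within-group variation in utility scores. A minimal NumPy sketch (illustrative only; the study used Stata):

```python
import numpy as np


def one_way_anova_f(*groups):
    """F statistic and degrees of freedom for a one-way ANOVA."""
    groups = [np.asarray(g, dtype=float) for g in groups]
    all_obs = np.concatenate(groups)
    grand_mean = all_obs.mean()
    k, n = len(groups), len(all_obs)
    # Variation of group means around the grand mean, weighted by group size.
    ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
    # Variation of observations around their own group mean.
    ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)
    df_between, df_within = k - 1, n - k
    F = (ss_between / df_between) / (ss_within / df_within)
    return float(F), df_between, df_within
```

With exactly two groups, this F equals the square of the pooled-variance (Student) t statistic, so the ANOVA and t-test analyses are consistent for binary variables.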

Item response theory (IRT) is a statistical framework commonly used in psychometric and educational measurement, and in recent years it has increasingly been applied in quality-of-life research and patient-reported outcome (PRO) measures [59,60,61]. IRT relates a person's ability or trait level to the probability of answering a question correctly; in the PRO analog, the ability or trait refers to health status. Previous studies found that EQ-5D-5L and SF-6Dv2 data met the key assumptions for applying IRT [62,63,64], and evidence supporting the application of IRT to the current data is provided in the Appendix. In this study, the graded response model (GRM), an extension of IRT to ordinal response variables, was used to evaluate the discrimination power of each item within each instrument [65].

For each item, the GRM estimates a discrimination parameter (a) and a set of difficulty parameters (b_k), one for each boundary between adjacent response categories, so an item with n response categories has n − 1 difficulty parameters [61]. In this study, the items correspond to the dimensions of each instrument, the response categories correspond to the levels of each item, and the difficulty parameters are expressed on the scale of underlying health. The discrimination parameter indicates how well the corresponding dimension differentiates between respondents at different levels of health [60]; a higher value indicates that the item differentiates respondents more effectively across the spectrum of health status [61]. The discrimination parameter, and the relationship between a person's ability and the probability of choosing a less impaired health state, can be visualized using the boundary characteristic curve (BCC) produced by the GRM. The BCC is a sigmoid-shaped curve, typically plotted with ability on the x-axis and the probability of choosing a less impaired health state on the y-axis. The discrimination parameter is represented by the slope of the curve, so items with higher discrimination have steeper curves. Items with discrimination parameters > 0.50 are typically considered able to discriminate adequately between respondents with different ability [61, 65]. The difficulty parameter, also known as the location parameter, indicates the health level at which there is a 50% probability of responding in the next higher category or above. It corresponds to the point on the ability scale where the curve is steepest, that is, where uncertainty about whether a respondent will endorse the category is greatest. It is widely accepted that difficulty parameters should fall within the interval [-3.00, 3.00] to ensure accurate and effective discrimination [59].
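The curves described above can be computed directly from a and b_k. The sketch below assumes the common logistic form of the GRM (no 1.7 scaling constant) and categories coded 1..K with higher codes meaning better health; it illustrates the model and does not reproduce the study's estimation.

```python
import numpy as np


def boundary_curves(theta, a, b):
    """BCCs: P(X >= k + 1 | theta) for each threshold b[k], k = 0..K-2.

    a: item discrimination (slope); b: increasing thresholds (locations).
    Returns an array of shape (len(theta), K - 1).
    """
    theta = np.atleast_1d(np.asarray(theta, dtype=float))[:, None]
    b = np.asarray(b, dtype=float)[None, :]
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))


def category_probs(theta, a, b):
    """P(X = k | theta) for k = 1..K, as differences of adjacent BCCs."""
    star = boundary_curves(theta, a, b)
    n = star.shape[0]
    upper = np.hstack([np.ones((n, 1)), star])   # P(X >= 1) = 1
    lower = np.hstack([star, np.zeros((n, 1))])  # P(X >= K + 1) = 0
    return upper - lower
```

At theta = b_k the corresponding BCC equals 0.5 and its slope is a/4, its maximum, which is why the location parameter sits where the curve is steepest and a larger a gives a steeper curve.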
To apply the GRM, the response levels of the EQ-5D-5L items and the five-level SF-6Dv2 items were assigned values from 5 to 1, and the levels of the six-level SF-6Dv2 item (PA) were assigned values from 6 to 1.
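A minimal sketch of this recoding, assuming (as in both instruments' questionnaires) that level 1 is the best response, so the best level receives the highest code; the function name is hypothetical:

```python
def recode_for_grm(level: int, n_levels: int) -> int:
    """Reverse-code a response level (1 = best) so higher codes mean better health.

    Five-level items map 1..5 to 5..1; the six-level PA item maps 1..6 to 6..1.
    """
    if not 1 <= level <= n_levels:
        raise ValueError(f"level {level} outside 1..{n_levels}")
    return n_levels + 1 - level
```

Reverse coding makes the BCCs increase with health, matching the "probability of choosing a less impaired health state" interpretation.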

Ordinary least squares (OLS) linear regression

A multivariable OLS regression was used to examine factors associated with EQ-5D-5L and SF-6Dv2 utility scores. Categorical independent variables were coded as dummy variables, and for each variable the level hypothesized to be associated with the highest utility was used as the reference category. The reference levels were initial treatment, off treatment, CR, IPI 0–1 (low risk), no DHL/THL, no DEL, no TP53 mutation, neither MYD88 nor CD79b mutation, Ann Arbor stage I, and no complications. Variables that were statistically significant in the one-way ANOVA were considered for inclusion in the OLS model. The significance level was set at 0.05. All statistical analyses were performed using Stata/MP 15.0 (StataCorp LLC, College Station, TX, USA).
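The dummy coding with reference categories can be illustrated with NumPy. This is a sketch of the coding scheme only, with hypothetical function names; the study fitted its models in Stata, and standard errors and CIs are omitted here.

```python
import numpy as np


def dummy_code(values, levels, reference):
    """One 0/1 column per non-reference level; the reference is all zeros."""
    if reference not in levels:
        raise ValueError(f"reference {reference!r} not in levels")
    cols = [lvl for lvl in levels if lvl != reference]
    X = np.array([[1.0 if v == lvl else 0.0 for lvl in cols] for v in values])
    return X, cols


def ols(y, *blocks):
    """OLS with an intercept; returns coefficients (intercept first)."""
    X = np.column_stack([np.ones(len(y)), *blocks])
    beta, *_ = np.linalg.lstsq(X, np.asarray(y, dtype=float), rcond=None)
    return beta
```

For example, `dummy_code(efficacy, ["CR", "PR", "PD", "Unknown"], reference="CR")` yields PR, PD, and Unknown columns, so each coefficient is the estimated utility difference relative to CR, holding the other covariates fixed.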

Results

Demographic and SES characteristics

In total, 582 valid responses were collected between September 9 and November 3, 2022. Table 1 shows the demographic and SES characteristics stratified by treatment stage. A total of 477 respondents were associated with initial treatment patients and 105 with RR patients. Of the valid responses, 294 were provided by the patients themselves and 288 by family members or other caregivers. Female patients slightly outnumbered male patients. The mean age was 48.95 years, and most patients were aged 31 to 60 years at the time of the study. Overall, 97.42% of patients were Han Chinese, 70% were urban residents, and 78.87% were married. More than half of the respondents had completed tertiary education. In addition, 20.27% of patients were unemployed at the time of the study. Nearly all were covered by social health insurance (SHI): 54.47% by urban employee basic medical insurance (UEBMI), 15.81% by urban resident basic medical insurance (URBMI), 18.04% by the new rural cooperative medical scheme (NRCMS), and 5.15% by urban resident cooperative medical insurance (URCMI). Nearly one-third (32.65%) of patients had supplemental insurance in addition to basic SHI, while only 1.2% lacked any insurance coverage. Furthermore, 21.99% of respondents reported no personal income in 2021. The mean annual household income across all patients was 114,250 yuan in 2021.

Table 1 Demographic and SES information of patients

Characteristics of EQ-5D-5L and SF-6Dv2 utility scores

Table 2 details the distribution of responses across the levels of each EQ-5D-5L and SF-6Dv2 dimension. For the EQ-5D-5L, 19.71% of initial treatment patients and 10.48% of RR patients reported full health (utility score = 1). EQ-5D-5L utility scores ranged from -0.333 to 1 among initial treatment patients and from -0.251 to 1 among RR patients. The mean (standard deviation [SD]) SF-6Dv2 score was 0.641 (0.220), and 1.72% of respondents, none of them RR patients, reported full health. SF-6Dv2 utility scores ranged from -0.276 to 1 among initial treatment patients and from -0.261 to 0.962 among RR patients. Overall, mean EQ-5D-5L and SF-6Dv2 scores were higher for initial treatment patients than for RR patients. The EQ-5D-5L showed a stronger ceiling effect, with 18.04% of all responses indicating full health.
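The overall EQ-5D-5L ceiling of 18.04% can be reconciled with the group-level percentages. The counts below are back-calculated by rounding and are not reported directly in the text:

```python
# Reported group sizes and shares reporting EQ-5D-5L full health.
n_initial, n_rr = 477, 105
share_full_initial, share_full_rr = 0.1971, 0.1048

# Back-calculated (rounded) counts of respondents at full health.
full_initial = round(share_full_initial * n_initial)  # 94
full_rr = round(share_full_rr * n_rr)                 # 11
overall = (full_initial + full_rr) / (n_initial + n_rr)
print(f"{overall:.2%}")  # 18.04%
```

Since 105 of 582 respondents (18.04%) is above the 15% criterion, the EQ-5D-5L meets the definition of a ceiling effect used in this study.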

Table 2 Characteristics of EQ-5D-5L and SF-6Dv2 responses

According to the EQ-5D-5L results in Table 2, the self-care dimension had the highest proportion of respondents reporting no problems (85.22%). Following self-care, the dimensions ranked from highest to lowest proportions of no problem responses were: usual activities, mobility, pain/discomfort, and anxiety/depression. Additionally, the proportions reporting no problems were higher among initial versus RR treatment patients across all EQ-5D-5L dimensions.

According to the SF-6Dv2 results in Table 2, the highest proportion of respondents reported no problems on the pain dimension (37.63%). Following pain, the dimensions ranked from highest to lowest proportions of no problem responses were: social functioning, mental health, role limitation, physical functioning, and vitality. Compared to initial treatment patients, RR patients more frequently reported no problems on the pain dimension, but less frequently reported no problems on the other SF-6Dv2 dimensions.

Table 3 shows the means, SDs, 95% confidence interval (CI) ranges, and standard errors (SEs) of the EQ-5D-5L and SF-6Dv2 utility scores stratified by treatment stage and treatment response assessment results. Untreated newly diagnosed patients reported a mean utility of 0.857. Among initial treatment patients on treatment, those assessed as CR, PR, and PD reported utilities of 0.785, 0.688, and 0.472, respectively. For initial treatment patients who completed treatment and were monitored, utilities for those assessed as CR, PR, and PD were 0.883, 0.845, and 0.852, respectively. Among RR patients on re-treatment, utilities for those assessed as CR, PR, and PD were 0.861, 0.658, and 0.796, respectively. For RR patients who completed re-treatment and were monitored, utilities for those assessed as CR, PR, and PD were 0.832, 0.876, and 0.805, respectively.

Table 3 EQ-5D-5L and SF-6Dv2 utility scores for health states

Agreement between the EQ-5D-5L and SF-6Dv2

The B-A plot (Fig. 1) showed modest agreement between EQ-5D-5L and SF-6Dv2 utility scores. The mean difference was 0.188 (95% CI 0.176 to 0.200). The limits of agreement (LoA) ranged from -0.101 to 0.476, with 93.81% of observations within this interval. Notably, patients in extremely poor health states exhibited greater inconsistency in utility scores between measures. For patients in better health, EQ-5D-5L scores were higher than SF-6Dv2 scores.
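As a consistency check, assuming the limits of agreement were computed as the mean difference ± 1.96 SD of the differences over n = 582 paired scores, the reported figures can be back-calculated; the SD and SE below are derived here, not reported:

```latex
\begin{aligned}
\text{LoA} &= \bar{d} \pm 1.96\, s_d \\
\bar{d} &= \frac{0.476 + (-0.101)}{2} \approx 0.188, \qquad
s_d = \frac{0.476 - (-0.101)}{2 \times 1.96} \approx 0.147 \\
\operatorname{SE}(\bar{d}) &= \frac{s_d}{\sqrt{n}} = \frac{0.147}{\sqrt{582}} \approx 0.0061 \\
\bar{d} \pm 1.96\,\operatorname{SE}(\bar{d}) &\approx 0.188 \pm 0.012 = (0.176,\ 0.200)
\end{aligned}
```

The recovered 95% CI matches the reported interval, so the mean difference, limits of agreement, and CI are mutually consistent.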

Fig. 1 B-A plot

Convergent/Divergent validity

The associations between the dimensions of EQ-5D-5L and SF-6Dv2 are presented in Table 4. Spearman’s correlation coefficients demonstrated that all EQ-5D-5L dimensions were significantly correlated with the SF-6Dv2 dimensions. Also, the association between EQ-5D-5L and SF-6Dv2 utility scores was statistically significant and strong (Pearson’s correlation coefficient = 0.787).

Table 4 Correlation coefficients of EQ-5D-5L and SF-6Dv2 dimensions and utility scores

Known-group validity: discrimination across different groups

The ANOVA and t-test results in Table 5 support the known-group validity of the EQ-5D-5L and SF-6Dv2. Both utility measures varied significantly across groups defined by most predetermined clinical characteristics, including treatment status, active treatment, treatment response, IPI score, DEL, TP53 mutation status, MYD88/CD79b mutation status, Ann Arbor stage, and complications. However, the SF-6Dv2 did not significantly discriminate between age and DHL/THL subgroups. Furthermore, both instruments exhibited limited discriminative ability across groups defined by non-GCB subtype, ABC subtype, and complications.

Table 5 ANOVA and t-test results of clinical characteristics

Graded response model

Table 6 presents the item discrimination and difficulty parameters for the EQ-5D-5L and SF-6Dv2 estimated using the GRM. All items showed high discrimination. The item location parameters of the two measures showed different patterns. The location parameters of the EQ-5D-5L were concentrated at poorer levels of health (most were negative), implying that the EQ-5D-5L was relatively poor at differentiating among individuals in better health, a finding consistent with its ceiling effect. In contrast, the location parameters of the SF-6Dv2 were more evenly distributed, allowing differentiation across a wider range of health levels. The item information functions and BCCs for the EQ-5D-5L and SF-6Dv2 dimensions are provided in the Appendix.

Table 6 GRM item parameter estimates

OLS regressions

Table 7 presents the results of the OLS regressions, reported as coefficient estimates (b) with 95% CIs. Lower EQ-5D-5L utility scores were associated with being on treatment (-0.095, 95% CI -0.142 to -0.048, p < 0.001), PD (-0.104, 95% CI -0.187 to -0.021, p < 0.05), MYD88 mutation (-0.093, 95% CI -0.175 to -0.01, p < 0.05), Ann Arbor stage IV (-0.076, 95% CI -0.142 to -0.001, p < 0.05), and complications (-0.035, 95% CI -0.07 to 0, p < 0.05).

Table 7 OLS regression of clinical characteristics on EQ-5D-5L and SF-6Dv2

Similarly, lower SF-6Dv2 utility scores were associated with being on treatment (-0.138, 95% CI -0.182 to -0.093, p < 0.001), PD (-0.127, 95% CI -0.205 to -0.049, p < 0.001), treatment efficacy not yet evaluated (-0.07, 95% CI -0.125 to -0.015, p < 0.05), MYD88 mutation (-0.082, 95% CI -0.16 to -0.005, p < 0.05), Ann Arbor stage IV (-0.093, 95% CI -0.155 to -0.032, p < 0.01), and complications (-0.054, 95% CI -0.087 to -0.021, p < 0.05).

Discussion

To our knowledge, this is the first study to compare the measurement properties of the EQ-5D-5L and SF-6Dv2 among DLBCL patients in China. It also provides novel evidence on the health utility values of DLBCL patients across different treatment statuses.

Based on the current findings, DLBCL causes a substantial loss of quality of life, with patients' utility values far below those of the general population. According to the literature, the mean population-norm utility values for the EQ-5D-5L and SF-6Dv2 are 0.946 and 0.827, respectively [19]. The EQ-5D-5L utility value of the general population in China is higher than that in the United States [66], but the utility value of DLBCL patients in China is similar to that of DLBCL patients in the US (alive and responding to treatment, 0.83) [19, 25, 67, 68]. DLBCL patients in China may therefore experience a disproportionately large HRQoL loss relative to the general population. In particular, RR patients and patients with PD were more vulnerable than other patients, suggesting greater unmet needs in their clinical management.

The EQ-5D-5L yielded systematically higher mean utility scores than the SF-6Dv2, in line with previous research in other cancer populations [19]. The results also revealed a ceiling effect for the EQ-5D-5L but not for the SF-6Dv2 in the DLBCL population (18.04% versus 1.72%), and no floor effect for either measure. Previous studies in the general population have also shown a greater ceiling effect for the EQ-5D-5L than for the SF-6Dv2 [69,70,71]. In cancer research, a study of Chinese patients with breast cancer reported an EQ-5D-5L ceiling effect of 28.6% [72]. The higher EQ-5D-5L ceiling effect observed in the current DLBCL sample is consistent with prior research in both general and disease-specific populations [69, 73, 74]. The difference in ceiling effects between the two instruments may be partly attributable to their different recall periods: the SF-6Dv2 frames its questions in terms of health "over the past 4 weeks," whereas the EQ-5D-5L refers to health "today." The longer recall period of the SF-6Dv2 may prompt respondents to consider minor impairments in HRQoL that could be overlooked within a shorter timeframe [75].

In addition, this study found a strong correlation between EQ-5D-5L and SF-6Dv2 utility values, and the Spearman correlation coefficients for all dimensions were significantly positive, similar to previous research results [71, 75]. This indicates that the EQ-5D-5L and SF-6Dv2 behave similarly in detecting the direction of changes in health utility among DLBCL patients. However, they may differ in the absolute HRQoL values they produce, possibly because the two instruments cover different dimensions. The SF-6Dv2's larger descriptive system (18,750 versus 3125 health states) also allows a more granular description of health status.

Both instruments showed robust discriminative ability across subgroups with different clinical characteristics. Although both distinguished between groups of differing severity, the ANOVA revealed some differences in the absolute EQ-5D-5L and SF-6Dv2 utility values across these subgroups. The SF-6Dv2 differentiated more known groups with statistically significant differences than the EQ-5D-5L, suggesting that its known-group validity may be slightly superior in this patient population, a finding that aligns with previous research [69, 74].

The inter-subgroup differences measured by EQ-5D-5L and SF-6Dv2 are relatively similar and much smaller than the absolute differences across the two instruments. For example, the utility values of EQ-5D-5L and SF-6Dv2 for patients undergoing initial treatment were higher by 0.072 and 0.067, respectively, compared to those for patients experiencing relapse. Similarly, the utility values of EQ-5D-5L and SF-6Dv2 for patients under follow-up observation were higher by 0.153 and 0.201, respectively, compared to those for patients currently in treatment. Furthermore, the utility values of EQ-5D-5L and SF-6Dv2 for patients in CR were higher by 0.183 and 0.216, respectively, compared to those for patients with PD. This suggests that while these two tools may differ in measuring the utility value of a particular health state, they are relatively consistent in reflecting changes or improvements brought about by interventions.

This study has several strengths. First, it included both newly treated patients and patients with relapsed or refractory disease, allowing analyses stratified by recurrence status. Second, it had a relatively large sample covering 31 provinces and autonomous regions in China, supporting good representativeness. Lastly, the use of the GRM provided a novel perspective on the discriminative capacities of the two instruments.

Several limitations of the study must be noted. First, the survey was conducted online, which may have skewed the sample toward younger patients, as older individuals tend to be less proficient with the internet. Second, three instruments were administered in a fixed order: EQ-5D-5L, QLQ-C30, and SF-6Dv2. This non-random sequence could have influenced the results; for instance, after answering the QLQ-C30 questions, patients might reflect more deeply on their illness and report poorer health when subsequently completing the SF-6Dv2.

Despite the measures taken to ensure accurate and reliable information, this study did not use objective measures to determine whether proxy reporters accurately reflected the patient's condition, which is a further limitation. Our conclusions should therefore be interpreted with caution.

Proxy assessments can be obtained by asking proxies either to estimate the patient's responses (proxy-patient perspective) or to provide their own view of the patient's HRQoL (proxy-proxy perspective) [27]. In practice, this survey very likely encompassed both perspectives. It is also plausible that in some cases the patients themselves described their own health states, with proxies merely helping to operate the smartphone. This is particularly relevant because cognitive impairment, which would necessitate more assessments from the proxy-proxy perspective, is uncommon in DLBCL.

Conclusions

In summary, DLBCL patients in China experience impaired health utility, which is especially low among patients with progressive and relapsed disease.

In addition, the results of this study demonstrate that both the EQ-5D-5L and SF-6Dv2 show a degree of validity and reliability in assessing the HRQoL of patients with DLBCL. However, the two instruments produce different measurements and therefore cannot be used interchangeably. These findings underscore the importance of selecting an appropriate instrument to accurately assess the HRQoL of DLBCL patients, a choice that should also depend on the study objectives.