Abstract
Background
The St. George’s Respiratory Questionnaire (SGRQ) and its idiopathic pulmonary fibrosis (IPF) version (SGRQ-I) are widely used to assess health-related quality of life in IPF. However, the psychometric properties of these questionnaires in the IPF population have not been thoroughly evaluated. This systematic review assessed and summarized the available evidence on the psychometric properties of the SGRQ and SGRQ-I in IPF patients.
Methods
Seven electronic databases were searched to identify relevant studies. The eligibility criteria included full-text studies focusing on the psychometric properties of the SGRQ and SGRQ-I in patients with IPF. The risk of bias, criteria for good measurement properties, and strength of evidence were assessed based on the COnsensus-based Standards for the selection of health Measurement INstruments (COSMIN). The inverse-variance heterogeneity (IVhet) model was used to pool results for construct validity.
Results
A total of 24 studies were included; 19 assessed the psychometric properties of the SGRQ, and seven evaluated the SGRQ-I. Structural validity was assessed in one study for the SGRQ-I. Most domains of the SGRQ and SGRQ-I, except for the symptom domain of the SGRQ, had acceptable internal consistency. Both questionnaires demonstrated adequate test–retest reliability, known-groups validity, and responsiveness. Regarding construct validity, there were moderate to strong correlations with respiratory-specific measures, yet weaker correlations were observed with pulmonary function tests. Interpretability analysis revealed minor floor and ceiling effects.
Conclusion
The SGRQ and SGRQ-I have the potential to capture essential dimensions of health-related quality of life (HRQoL) in IPF. However, further research should focus on ensuring that the items and domains of these questionnaires truly reflect the unique challenges and experiences faced by patients with IPF through content validity studies.
1 Introduction
Idiopathic pulmonary fibrosis (IPF) is an interstitial lung disease that causes irreversible and progressive lung scarring [1]. Lung transplantation is the sole curative treatment for IPF, although anti-fibrotic medication may slow disease progression in some patients [1]. Even with anti-fibrotic therapy, patients' health-related quality of life (HRQoL) is significantly impacted by the burdensome symptoms of breathlessness, cough, and fatigue [2]. Not surprisingly, patients with IPF identify HRQoL as one of their treatment priorities [3]. The St. George's Respiratory Questionnaire (SGRQ) is one of the most commonly used patient-reported outcome measures (PROMs) in IPF [4, 5]. However, the SGRQ was initially developed for patients with chronic obstructive pulmonary disease (COPD), and an IPF-specific version (SGRQ-I) was later developed [6]. Swigris et al. conducted a literature review on the SGRQ in patients with IPF [7]. They found that the SGRQ's psychometric properties were adequate and suggested that it may be a useful measure of HRQoL in patients with IPF. However, they did not assess the SGRQ's properties against the COnsensus-based Standards for the selection of health Measurement INstruments (COSMIN) guidelines. These guidelines provide a methodology for integrating the methodological rigor of studies on measurement properties with the quality of the PROM itself (its psychometric properties) [8]. This enables reviewers to draw better-supported conclusions about the quality of PROMs when selecting evidence-based measures for use in research and clinical practice [9]. Moreover, the previous review included studies only through 2013 and excluded the SGRQ-I because of the limited number of studies available at the time. Thus, the SGRQ and SGRQ-I remain widely used in IPF despite the lack of a comprehensive evaluation of their psychometric properties.
Thus, our study aimed to conduct a systematic review with meta-analysis, wherever possible, of the psychometric properties of both the SGRQ and SGRQ-I among IPF patients using the COSMIN guidelines. The psychometric properties encompass content validity, structural validity, internal consistency, test–retest reliability, criterion validity, construct validity, known-groups validity, responsiveness, and interpretability (floor and ceiling effects, minimal important difference). Our study is intended to provide evidence on the strengths and weaknesses of using these instruments in research and clinical practice.
2 Methods
2.1 Study design
This systematic review followed the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) statement [10] and the COSMIN guidelines for conducting a systematic review of PROMs [9]. The protocol for this study is registered in Open Science Framework (https://osf.io/kgtz9/) but has not been published in a peer-reviewed journal.
2.2 Study eligibility criteria
The inclusion criteria were as follows: (1) full-text studies, including unpublished doctoral dissertations and master's theses; (2) a study cohort in which patients with IPF aged 18 years and older made up a substantial portion, defined as more than 30%, consistent with the literature [11]; (3) studies addressing any psychometric property of the PROMs (development or evaluation studies); (4) any study design; and (5) studies conducted in any country. The exclusion criteria were (1) studies that used the PROMs only as an outcome measurement instrument, or validation studies in which the PROM was used to validate other instruments; (2) unpublished work such as conference abstracts and technical reports; and (3) studies published in non-English sources. The exclusion of unpublished studies and those published in languages other than English has been reported to have little effect on overall results [12].
2.3 Data sources
The following six electronic databases were searched: (1) PubMed, (2) Medline (via OVID), (3) CINAHL, (4) PsycInfo, (5) Web of Science, and (6) Scopus. In addition, to identify doctoral dissertations and master's theses, the ProQuest Dissertations and Theses Database was also searched. Searches covered the period from the inception of each database through June 2022. For all electronic database searches, we employed the general search filter proposed by the COSMIN guidelines [13]. Appendix A, Table A.1 shows the search strategy, which was adapted to the requirements of each database. The first and second authors (RM and YPM) conducted the database searches and article retrieval with oversight from three senior authors (KMK, TL, and GAK).
2.4 Study selection
All search results were stored using Mendeley reference manager software (version 1.19.6). Duplicate records were removed both electronically using Mendeley and manually. Studies were then selected in a two-stage process: all retrieved titles and abstracts were first screened against the inclusion/exclusion criteria, and articles not excluded at this stage advanced to full-text review. Studies meeting all inclusion criteria were considered for data extraction. Two authors (RM and YPM) checked the bibliographies of included studies and previous systematic reviews to identify any additional eligible studies. Study selection and data extraction were conducted independently by RM and YPM, who met after each stage to resolve any disagreements; when no consensus could be reached, they sought the opinion of one or more adjudicators (KMK, TL, or GAK).
2.5 Data abstraction
The data extracted from the studies included the population characteristics, recruitment period, language, sample size, age, gender, setting, and pulmonary function tests. All extracted data were entered into Microsoft Excel.
2.6 Risk of bias assessment (RoB)
The COSMIN risk of bias checklist was used to evaluate the methodological quality of studies on psychometric properties (Appendix A, Table A.2) [14]. Only checklist items relevant to the psychometric property examined in the included study were assessed, as not all measurement properties were evaluated in every article. The RoB of each study on a measurement property was assessed separately as very good, adequate, inadequate, and doubtful. The “worst score counts” principle was applied, meaning the lowest rating among any standard was used to determine the overall rating for each study on a measurement property [14]. This overall rating of the RoB of the studies was accounted for when categorizing the quality of the evidence of the PROM measurement property. The risk of bias was assessed independently by RM and YPM. They reviewed the assessments for agreement, and if there were unresolved disagreements, an adjudicator (KMK, TL, or GAK) was consulted.
2.7 Instrument measurement properties evaluation
The result of each study on a measurement property was rated according to the COSMIN updated criteria for good measurement properties as sufficient (+), insufficient (−), or indeterminate (?) [9] (Appendix A, Table A.2). Based on general COSMIN recommendations, consistent results were either statistically pooled in a meta-analysis, provided at least five effect sizes were available for that outcome, or qualitatively summarized. An overall rating was then assigned for each measurement property by consolidating the ratings of the individual studies: a sufficient (+) or insufficient (−) rating was given when 75% or more of the study results agreed, and an indeterminate (?) rating when fewer than 75% of the studies showed the same result. Where ratings were inconsistent, our conclusion was guided primarily by the most consistent results, and the evidence was downgraded for inconsistency (±).
2.8 Quantitative summarization
2.8.1 Calculation and pooling of effect sizes
The Fisher r-to-z transformation was applied to approximate the distribution of the correlation coefficients to a normal distribution and stabilize their variances [15,16,17]. F-ratios and unstandardized beta coefficients were first transformed to r and then to Fisher's z. The transformation is z = 0.5 × ln((1 + r)/(1 − r)), where r is the correlation coefficient (Pearson's r or an intraclass correlation coefficient, ICC), with approximate variance Var(z) = 1/(N − 3), where N is the total sample size. After pooling, results were back-transformed to r for ease of interpretation. The inverse variance heterogeneity (IVhet) model, recognized as more robust than the traditional random-effects model, was used to pool effect sizes for all outcomes [18]. A minimum of five studies per outcome was required for pooling [18]. In addition, influence analyses were conducted to assess the sensitivity of the overall results to each individual study, and an outlier analysis was performed by excluding effect sizes whose 95% confidence interval (CI) fell completely outside the pooled 95% CI.
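The transformation and pooling steps above can be sketched compactly. The snippet below is a minimal illustration, not the analysis code used in this review (which relied on MetaXL and Stata); the function names and example values are ours, and the DerSimonian-Laird estimator is shown as one common choice for tau-squared.

```python
import math

def r_to_z(r):
    """Fisher r-to-z: z = 0.5 * ln((1 + r) / (1 - r))."""
    return 0.5 * math.log((1 + r) / (1 - r))

def z_to_r(z):
    """Back-transform z to r for ease of interpretation."""
    return (math.exp(2 * z) - 1) / (math.exp(2 * z) + 1)

def ivhet_pool(rs, ns, crit=1.96):
    """Pool correlations with the IVhet model: inverse-variance weights
    for the point estimate, but a variance term inflated by tau-squared
    (estimated here by DerSimonian-Laird) for the confidence interval."""
    zs = [r_to_z(r) for r in rs]
    vs = [1.0 / (n - 3) for n in ns]              # Var(z) = 1 / (N - 3)
    ws = [1.0 / v for v in vs]
    w_sum = sum(ws)
    z_bar = sum(w * z for w, z in zip(ws, zs)) / w_sum
    q = sum(w * (z - z_bar) ** 2 for w, z in zip(ws, zs))  # Cochran's Q
    c = w_sum - sum(w ** 2 for w in ws) / w_sum
    tau2 = max(0.0, (q - (len(rs) - 1)) / c)      # between-study variance
    var = sum((w / w_sum) ** 2 * (v + tau2) for w, v in zip(ws, vs))
    half = crit * math.sqrt(var)
    return z_to_r(z_bar), z_to_r(z_bar - half), z_to_r(z_bar + half)
```

Because the back-transformation is monotonic, the pooled r always lies inside its back-transformed confidence limits.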
2.8.2 Stability and validity of outcomes
Heterogeneity was examined using the Q statistic, with a significance level of ≤ 0.10 indicating statistically significant heterogeneity [19]. Inconsistency was evaluated using I-squared (I²), with values < 25%, 25–50%, and > 50% indicating small, medium, and large between-study inconsistency, respectively [20]. Tau-squared, an absolute measure of between-study heterogeneity, was also computed. Small-study effects (including publication bias) were qualitatively evaluated using Doi plots and quantitatively assessed using the Luis Furuya-Kanamori (LFK) index [21, 22]. LFK index values within ±1, between ±1 and ±2, and beyond ±2 suggest no, minor, and major asymmetry, respectively.
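These statistics can be computed directly from the Fisher-z effect sizes and their sampling variances. The following is an illustrative sketch under the same notation as above (our own function, not the review's analysis code):

```python
def heterogeneity_stats(zs, vs):
    """Cochran's Q, I-squared (%), and DerSimonian-Laird tau-squared for
    Fisher-z effect sizes `zs` with sampling variances `vs`."""
    ws = [1.0 / v for v in vs]
    w_sum = sum(ws)
    z_bar = sum(w * z for w, z in zip(ws, zs)) / w_sum
    q = sum(w * (z - z_bar) ** 2 for w, z in zip(ws, zs))
    df = len(zs) - 1
    # I^2: share of total variability beyond what chance alone explains
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    # tau^2: method-of-moments estimate of absolute between-study variance
    c = w_sum - sum(w ** 2 for w in ws) / w_sum
    tau2 = max(0.0, (q - df) / c)
    return q, i2, tau2
```

With identical effects, Q, I², and tau² are all zero; widely scattered effects push I² past the 50% "large inconsistency" threshold.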
2.9 Qualitative summarization
In addition to quantitative analysis, study results were summarized qualitatively (lowest and highest values) for interpretability.
2.10 Strength of evidence
The strength of the evidence was graded using a modified Grading of Recommendations Assessment, Development, and Evaluation (GRADE) approach [8]. The modified GRADE approach considers (1) risk of bias, (2) inconsistency, (3) imprecision, and (4) indirectness (Appendix A, Table A.3). For indirectness, per the COSMIN guidelines, the level of evidence can be downgraded by one or two levels for "serious" or "very serious" indirectness when studies are performed in a population other than the population of interest; the evidence was not downgraded in this review because indirectness was not deemed serious [8]. The quality of the evidence was graded separately for each measurement property and each PROM (high, moderate, low, or very low).
2.11 Reporting evaluation
We utilized the COSMIN reporting guidelines to evaluate the level of transparent reporting in the included studies on the measurement properties of PROMs [23].
2.12 Summary of findings
The results for each measurement property of the SGRQ and SGRQ-I are presented in the Summary of Findings (SoF) table, along with a rating using symbols (+/−/?) and a grading system (high, moderate, low, very low) to indicate the quality of evidence. Based on the findings in the SoF table, the two instruments were placed into one of three categories per the COSMIN guidelines [8]. Category A comprises PROMs that may be recommended as the most appropriate for the target construct in patients with IPF, demonstrating adequate content validity. Category C encompasses PROMs that should not be recommended because of strong evidence of insufficient measurement properties. Category B consists of PROMs with the potential to be recommended but requiring further validation studies; any PROM not falling into category A or C is classified here. As for study selection, data abstraction, and bias assessment, the protocols for resolving disagreements were followed by RM and YPM. Effect sizes were calculated using Microsoft Excel 2013, and data were analyzed using MetaXL version 5.3 and Stata version 16.
2.13 Modifications to a priori protocol
We deviated from the original protocol by choosing not to email the corresponding authors of the studies to request missing data. This decision was influenced by the large volume of missing information, which made the approach burdensome and time-consuming given the available resources. Nonetheless, we thoroughly identified and documented all reporting issues using the COSMIN reporting checklist.
3 Results
3.1 Study selection
A PRISMA flow diagram displays the study selection process (Fig. 1). We included 24 studies: 19 assessed the psychometric properties of the SGRQ [6, 24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41] and seven assessed those of the SGRQ-I [6, 28, 42,43,44,45,46]. A list of excluded studies with the reasons for exclusion can be found in Appendix B, Table B.1.
3.2 Study and population characteristics
Study and patient characteristics are presented in Table 1. Across all studies, the mean age of the participants ranged from 52 to 81 years, the proportion of males ranged from 46.0 to 84.6%, and sample sizes ranged from 20 to 1061 patients. Disease severity was heterogeneous owing to differing inclusion and exclusion criteria, such as the exclusion of patients with comorbidities or malignancies, patients who had experienced deterioration or any new symptom(s) in the week before enrollment, or patients based on their pulmonary function tests (PFTs). In nearly half of the studies, the enrollment period predated 2014 (the anti-fibrotic era). The version of the PROM used was never fully reported, even though different versions exist with different recall periods and languages (e.g., American English vs. British English). Six studies reported how they handled missing data [6, 26, 36, 37, 45, 46]. The evaluation of studies according to the COSMIN reporting guidelines is presented in Appendix C, Table C.1.
3.3 Psychometric properties
a) Content validity
No study assessed the content validity of either PROM. However, one study briefly mentioned the degree to which the SGRQ content addressed quality-of-life aspects; the median rating was 8.5 on a scale of 1 (worst) to 10 (best) [25]. We did not rate content validity, as the information provided did not constitute a content validity analysis.
b) Structural validity
Only one study measured the structural validity of the SGRQ-I, using Rasch analysis (a one-parameter logistic model) [6]. The item-trait interaction was χ² = 10.4, p = 0.58, indicating a good fit to the Rasch model. We rated structural validity as 'indeterminate' owing to the absence of information on unidimensionality, local independence, and monotonicity. The quality of evidence for both SGRQs was judged as 'moderate' (Appendix C, Table C.10 and Table 2).
c) Internal consistency
Appendix C, Table C.2 and Table 2 summarize internal consistency for the SGRQ and SGRQ-I. Cronbach's alpha was reported in all studies. Except for one study [36], all studies met the threshold for a positive rating (i.e., ≥ 0.70), with values ranging from 0.91 to 0.94 for the SGRQ total score, 0.83 to 0.86 for the impact domain, and 0.84 to 0.86 for the activity domain; the reported Cronbach's alpha for the symptom domain was 0.66. A similar pattern was observed for the SGRQ-I, where the symptom domain showed low internal consistency (0.48 and 0.62) [6, 39]. All but two studies [27, 43] reported the internal consistency of the subscales, and Cronbach's alpha for the total score exceeded 0.70 in all studies. However, we qualitatively rated internal consistency as 'indeterminate' because of the lack of evidence for sufficient structural validity, as only one study, on the SGRQ-I, reported goodness-of-fit indices [6]. The quality of evidence on internal consistency for both PROMs was deemed 'moderate.' Inadequate ratings for internal consistency were typically attributed to the internal consistency statistic (most commonly Cronbach's alpha) being computed for the entire scale rather than for each subscale separately (Appendix C, Table C.2 and Table 2).
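Cronbach's alpha, the statistic behind these ratings, compares the sum of the item variances with the variance of the total score. A minimal sketch (the respondent scores below are hypothetical, not data from any included study):

```python
def cronbach_alpha(items):
    """Cronbach's alpha; `items` is a list of per-item score lists,
    each holding one score per respondent."""
    k = len(items)                       # number of items
    n = len(items[0])                    # number of respondents

    def variance(xs):                    # sample variance (n - 1 denominator)
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

    totals = [sum(item[j] for item in items) for j in range(n)]
    return (k / (k - 1)) * (1 - sum(variance(i) for i in items) / variance(totals))
```

Values at or above 0.70, the positive-rating threshold applied here, indicate that the items are sufficiently interrelated.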
d) Test–retest reliability (repeatability)
The ICC was used in all studies to assess repeatability, with two studies additionally using Bland–Altman plots, over periods ranging from 2 to 52 weeks [44, 45]. For the SGRQ-I, patient stability between test and retest was not assessed by any method. For the SGRQ, stability was assessed using different definitions, including change in forced vital capacity percentage predicted (FVC%) (≤ 2%), the Patient Global Impression of Change (PGI-C) (no change), the University of California San Diego Shortness of Breath Questionnaire (UCSD-SOBQ) (< 5 points), and FVC% (≤ 5%). Except for the symptom and impact domains of the SGRQ, all reported ICCs met the threshold for a positive rating (i.e., ≥ 0.70). The methodological quality of the reliability studies was rated "adequate" for both PROMs, but we rated test–retest reliability itself as 'insufficient' because of the low ICCs for the symptom and impact domains of the SGRQ and the lack of subscale ICCs for the SGRQ-I. For the SGRQ, the discordance in results led us to rate the quality of evidence on reliability as 'moderate'; for the SGRQ-I, it was deemed 'high' (Appendix C, Table C.3 and Table 2).
e) Criterion validity
Two studies assessed the criterion validity of the SGRQ-I, with the SGRQ used as the criterion [6, 45]. Yorke et al. used the Bland–Altman plot and the ICC and observed excellent agreement (ICC > 0.95) [6]. Prior et al. used the Bland–Altman plot and correlation and observed a high correlation (r > 0.75) [45]. Thus, we rated the criterion validity as "adequate" and 'sufficient' and the quality of evidence as 'high' (Table 2).
f) Construct validity
Construct (convergent) validity was the most frequently assessed psychometric property. In our systematic review, we classified each correlation with a comparator instrument as weak (r < 0.4), moderate (0.4 ≤ r ≤ 0.7), or strong (r > 0.7) [47]. Different measures were used to validate the SGRQ and SGRQ-I: PFTs; generic HRQoL questionnaires such as the 12-item and 36-item Short-Form Health Surveys (SF-12 and SF-36); respiratory-specific HRQoL PROMs such as the King's Brief Interstitial Lung Disease questionnaire (KBILD); and respiratory symptom questionnaires such as the Baseline Dyspnea Index (BDI). We hypothesized that the SGRQ and SGRQ-I would correlate more strongly with respiratory HRQoL and symptom-specific PROMs than with generic PROMs and PFTs. Construct validity was rated "adequate" for both PROMs and 'sufficient', as the direction and magnitude of the correlations matched our hypotheses at least 75% of the time. The quantitative synthesis is summarized in Appendix C, Table C.4 and Table 2.
Meta-analysis of the pooled results revealed the same pattern of correlations as the individual studies (Figs. 2, 3, 4, 5, 6, 7; Appendix C, Table C.5). For the PFTs and the 6-min walk distance (6MWD), we observed substantial heterogeneity (as measured by Cochran's Q), moderate to large inconsistency, and minor to major small-study effects. In contrast, for the BDI, no heterogeneity, no more than moderate inconsistency, and no small-study effects were observed. The inconsistency identified in the meta-analysis led us to rate the quality of evidence as 'moderate'.
Sensitivity analysis and outlier analysis
We identified the study by Abdelaziz et al. [24] as an outlier in the meta-analysis of the correlation between SGRQ domains and 6MWD. Excluding this study had two effects: it decreased heterogeneity and inconsistency among the remaining studies, and it improved the precision of the pooled correlation coefficients, enhancing the accuracy of the correlation estimates. The sensitivity and outlier analysis results are presented in Appendix C, Table C.6. Our sensitivity analysis also indicated that excluding the study by Lutogniewska et al. [32], in which patients with IPF comprised 35% of the total sample, did not significantly affect the overall findings (Appendix C, Table C.6).
g) Known-groups validity
The evidence for known-groups validity was strong. Studies that examined differences in HRQoL, as assessed by the SGRQs, across levels of disease severity measured by various approaches, such as the Composite Physiologic Index (CPI), the Gender-Age-Physiology (GAP) index, oxygen use, and PFTs, showed significant differences between groups. Thus, we rated known-groups validity as "adequate" and 'sufficient' and the quality of evidence as 'high' (Appendix C, Table C.7 and Table 2).
h) Responsiveness
Construct and sub-group approaches were used almost equally. Change in patient condition was assessed using different anchors, including the health transition question (SF2) of the SF-36, PFTs, 6MWD, and computed tomography (CT) scans, over periods ranging from 6 months to 1 year. As hypothesized, the correlations with change scores were similar in direction but lower in magnitude. Thus, we rated responsiveness as "adequate" and 'sufficient' and the quality of evidence as 'high' (Appendix C, Table C.8 and Table 2).
i) Interpretability
Floor and ceiling effects
For the SGRQ, three studies reported information on floor and ceiling effects: Furukawa et al. [26] reported that 9% of patients had scores ≤ 10, Matsuda et al. [33] found minor ceiling effects in the domains, and Nishiyama et al. [40] observed some ceiling effects. For the SGRQ-I, Akihiko et al. [43] found minor floor and ceiling effects, with 17.3% of patients scoring worst and 23.1% scoring best, while two studies found no floor or ceiling effects for the SGRQ-I total or domain scores [39, 41].
Minimal important difference (MID)
Appendix C, Table C.9 summarizes the MID for the SGRQ and SGRQ-I. Six studies provided MID estimates. For the SGRQ, MID estimates for the total score ranged from −4.4 to −8.1 points for improvement and from 3 to 10.9 points for deterioration. For the SGRQ-I, the corresponding values ranged from −0.7 to −5.5 points for improvement and from 1.3 to 7.6 points for deterioration. Three studies did not state whether the reported MID was for improvement or deterioration [27, 35, 45]. The MID was assessed over timeframes spanning 100 days to 12 months, using both anchor-based and distribution-based approaches. The anchors used included the SF2 question, PGI-C, PFTs, 6MWD, the Transition Dyspnea Index (TDI), the Medical Research Council (MRC) Dyspnea Scale, the UCSD-SOBQ, and Global Rating of Change Scales (GRCS). Distribution-based MID estimates were based on multiples of the standard deviation (0.5 SD and 1 SD) and on the standard error of measurement (SEM).
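The distribution-based approaches mentioned above depend only on the score distribution and a reliability coefficient. A hedged sketch with illustrative numbers (not values from the included studies):

```python
import math

def distribution_based_mids(sd, reliability):
    """Distribution-based MID estimates: 0.5 SD, 1 SD, and one standard
    error of measurement, SEM = SD * sqrt(1 - reliability), where the
    reliability coefficient may be, e.g., Cronbach's alpha or an ICC."""
    sem = sd * math.sqrt(1 - reliability)
    return {"0.5SD": 0.5 * sd, "1SD": sd, "1SEM": sem}
```

For example, a score SD of 16 with reliability 0.91 yields a half-SD estimate of 8 points and an SEM of 4.8 points.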
4 Discussion
To our knowledge, this is the first systematic review following the COSMIN guidelines to assess and summarize the psychometric properties of the SGRQ and SGRQ-I. Despite being commonly used instruments to measure HRQoL in IPF, both PROMs' content and structural validity remain poorly established among patients with IPF. In addition, the questionnaires did not consistently demonstrate acceptable reliability (internal consistency and repeatability), and some findings indicated ceiling effects. It should be emphasized that reliability is a characteristic of an instrument as used in a particular population, not of the instrument alone [48]. Indeed, all psychometric properties should be assessed within the target population to ensure that the instrument is fit for purpose, in accordance with FDA guidance [49]. Nevertheless, both PROMs demonstrated acceptable construct validity, responsiveness, and known-groups validity.
A major shortcoming was the lack of content validity for both PROMs in IPF. The SGRQ was developed initially for COPD and asthma patients [5]. It has been criticized for including items irrelevant to IPF, such as wheezing and episodes of chest discomfort [6]. Assessing the content validity of a PROM developed in a different population is critical to ensure that the instrument's content is relevant and meaningful to the new population. Different populations may have distinct experiences that can influence their understanding and interpretation of PRO items [48]. By systematically assessing the instrument's relevance and appropriateness in the new population, researchers can address potential biases or limitations associated with using an instrument developed in a different population [48]. The lack of studies assessing the content validity was also a limitation reported in previous systematic reviews of other PROMs [50,51,52]. Structural validity refers to the extent to which the measurement instrument scores adequately reflect the construct being measured [8]. Unfortunately, we did not find any study assessing the structural validity of the SGRQ among patients with IPF. For the SGRQ-I, structural validity was only assessed in one study [6]. In that study, we could not determine if it had sufficient structural validity due to missing important information.
Construct validity refers to the extent to which PROM scores correlate with other measures in accordance with theoretically derived hypotheses about the construct being assessed [8]. Pulmonary function tests, the 6MWD, and PROMs such as the BDI were the comparators most frequently reported against SGRQ scores. The SGRQ consistently showed weak correlations with PFTs, likely because the two measure different constructs: PFTs provide information on the mechanical function and capacity of the lungs, whereas the SGRQ assesses the impact of respiratory symptoms, activity limitations, and social and psychological factors on a patient's overall quality of life. It is worth noting that forced expiratory volume in one second percentage predicted (FEV1%) is typically used as a measure of lung function in obstructive lung diseases such as COPD or asthma [53], whereas IPF is a restrictive lung disease [54]; FEV1% may therefore be less relevant or informative in IPF. Differing time frames could be another reason for the weak correlation between PFTs and the SGRQ: PFTs provide a snapshot of lung function at the time of testing, while the SGRQ evaluates the impact of respiratory symptoms and limitations over a more extended period. Although PFTs and SGRQ scores may not correlate, the two assessments serve different purposes in evaluating respiratory disease. Conversely, the SGRQs are expected to correlate moderately to highly with respiratory-specific questionnaires such as the BDI. Both the BDI and the SGRQ measure aspects of dyspnea (shortness of breath) and its impact on a patient's life: the BDI focuses on the severity of breathlessness, the magnitude of functional impairment, and the level of the patient's activity, which overlap with SGRQ content.
The developers of the COSMIN checklist recommend that authors formulate clear hypotheses regarding the strength and magnitude of the correlations before analyses [9]. Except for two studies [25, 42], none specified the expected magnitude of correlations. Also, studies did not demonstrate the reliability and validity of the comparator measurements among IPF patients. Furthermore, some studies did not report the psychometrics of the comparator instruments, while others referred to the psychometrics of the comparator in another population, which is questionable [48]. When the comparator instrument has not been validated among the same population as the new instrument being evaluated, it becomes challenging to determine if any observed differences or similarities between the instruments are due to actual differences in the measured constructs or if they are artifacts of the instruments [48].
In our analysis, Abdelaziz et al. [24] was considered an outlier study. The authors utilized the Arabic version of the SGRQ and found a strong correlation between SGRQ scores and PFTs. However, it is essential to acknowledge certain limitations of their study, such as the absence of a published cross-cultural validation study of the Arabic version of SGRQ, the small sample size, and a relatively low FVC% in their study population compared to the populations included in our meta-analysis. These factors may have influenced the observed correlation between SGRQ scores and FVC%. Future research should incorporate statistical techniques for cross-cultural validation, such as differential item functioning and considering variations in disease severity when assessing construct validity [55].
Internal consistency refers to the degree to which items in a measurement instrument are interrelated [8]. Internal consistency received an 'indeterminate' rating because there was insufficient evidence of satisfactory structural validity, per the COSMIN guidelines [8]. Most researchers assessed internal consistency with Cronbach's alpha, as COSMIN suggests. The COSMIN guidelines also recommend rating the risk of bias higher when studies do not report Cronbach's alpha for each subscale, which was the case in two of the included studies [28, 44]. Consistent with a previous systematic review [7], we found that the symptom domain had low internal consistency, possibly because it includes symptoms unrelated to IPF, weakening the interrelatedness of its items. The symptom domain could also be considered formative, in which case measuring internal consistency may not be appropriate. In a formative model, the observed indicators (e.g., cough and shortness of breath) are considered causal indicators of the latent construct (the symptom domain) and are seen as distinct elements contributing to the overall construct [48]. Internal consistency as measured by Cronbach's alpha may therefore be unsuitable for formative models, because the indicators are not expected to be highly correlated [56, 57].
Test–retest reliability is crucial in PROMs because it allows researchers to determine whether the instrument consistently captures the same patient information over time [8]. High test–retest reliability indicates that the PROM is stable and produces consistent results [8]. Test–retest reliability was most commonly assessed using ICCs [58], which estimate the proportion of variance in the measurements attributable to true differences between subjects relative to the total variance [59]. Different ICC formulas are available, and the choice of formula has implications for the reliability assessment [58]; only two studies specified the model used to calculate the ICC [37, 42]. The two-way mixed effects model is recommended when using ICC formulas to assess absolute agreement in PROMs [58]. We emphasize the importance of documenting the chosen analytical formula or model to ensure transparency and allow other researchers to understand and replicate the reliability assessment. We also found that ICC values changed depending on how patient stability was defined by other measurements, and the authors did not explain how they selected the cut-offs used to define stability on the chosen anchors. Moreover, the same cut-off points yielded different results over different time intervals: ICCs measured at shorter intervals were higher than those measured over extended periods. The intensity and frequency of respiratory symptoms can fluctuate daily in individuals with IPF [60], and this daily variability poses challenges for accurately recalling and appraising symptom severity [60].
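For reference, an absolute-agreement, single-measurement ICC can be computed from the two-way ANOVA mean squares (ICC(A,1) in McGraw and Wong’s notation). The sketch below uses synthetic test–retest data, not data from the included studies:

```python
import numpy as np

def icc_agreement(scores: np.ndarray) -> float:
    """ICC for absolute agreement, single measurement, from a two-way
    ANOVA on an (n_subjects, k_occasions) score matrix."""
    n, k = scores.shape
    grand = scores.mean()
    row_means = scores.mean(axis=1)   # per-subject means
    col_means = scores.mean(axis=0)   # per-occasion means
    msr = k * ((row_means - grand) ** 2).sum() / (n - 1)   # between subjects
    msc = n * ((col_means - grand) ** 2).sum() / (k - 1)   # between occasions
    resid = scores - row_means[:, None] - col_means[None, :] + grand
    mse = (resid ** 2).sum() / ((n - 1) * (k - 1))         # residual
    return (msr - mse) / (msr + (k - 1) * mse + (k / n) * (msc - mse))

# Synthetic test-retest data: stable true scores plus small occasion noise
rng = np.random.default_rng(0)
true_scores = rng.uniform(20, 80, size=60)   # 60 hypothetical patients
scores = true_scores[:, None] + rng.normal(scale=3.0, size=(60, 2))
icc = icc_agreement(scores)                  # high when patients are stable
```

Because the formula mixes between-subject, between-occasion, and residual variance, reporting which ICC model was used is essential: different models applied to the same data can yield materially different coefficients.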
Additionally, response shift, that is, a change in an individual's evaluation of their health status over time due to various factors, such as adaptation to illness and changes in priorities, can affect an instrument’s ability to detect change over time [61]. Recent research suggests utilizing tools such as patient diaries or electronic symptom-tracking systems to address these challenges [60, 62]. These tools allow patients to record their symptoms and severity daily, providing a more comprehensive and accurate assessment of symptom variability. By collecting real-time data, these methods can capture day-to-day fluctuations and help overcome recall bias associated with subjective symptom assessments [60, 62].
Ceiling and floor effects are observed when a substantial percentage of participants (≥ 15%) attain the highest or lowest possible score or rating on a specific item or the entire questionnaire [63, 64]. Minor ceiling effects for the SGRQ and minor floor and ceiling effects for the SGRQ-I were observed. One possible reason is that the instruments may not have enough response options or variability to accurately capture the full range of patients’ experiences at each end of the scale [64]. If the response options are limited or the scale does not adequately capture the severity of symptoms or limitations, it can result in a cluster of scores at the upper or lower end of the scale, leading to ceiling or floor effects. Additionally, the design and wording of the questionnaire items themselves can influence the occurrence of ceiling or floor effects. If the items are not sufficiently comprehensive or relevant to the experiences of the target population, it may limit the ability of the instruments to detect subtle differences and result in a restricted range of scores [63]. Overall, addressing these issues through refinement of the instruments and incorporating more comprehensive and relevant items can help minimize ceiling and floor effects and improve the sensitivity of the measures in capturing the full spectrum of patients' quality of life.
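Operationally, the ≥ 15% criterion can be checked in a few lines. The sketch below assumes SGRQ-style scoring from 0 to 100 and uses invented scores:

```python
import numpy as np

def floor_ceiling(scores, min_score=0.0, max_score=100.0, threshold=0.15):
    """Flag floor/ceiling effects: the proportion of respondents at the
    scale's extremes, compared against the >= 15% criterion."""
    scores = np.asarray(scores, dtype=float)
    floor_prop = float(np.mean(scores == min_score))
    ceiling_prop = float(np.mean(scores == max_score))
    return {
        "floor_prop": floor_prop,
        "ceiling_prop": ceiling_prop,
        "floor_effect": floor_prop >= threshold,
        "ceiling_effect": ceiling_prop >= threshold,
    }

# Invented scores: 4 of 20 respondents (20%) at the floor, 1 at the ceiling
scores = [0, 0, 0, 0, 12, 18, 25, 31, 38, 44,
          52, 57, 63, 68, 74, 80, 85, 91, 96, 100]
result = floor_ceiling(scores)   # floor effect present, ceiling effect absent
```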
The MCID refers to the smallest change in the construct’s score that patients perceive as important [65]. The objective of establishing the MCID is to create a standardized metric for assessing treatment efficacy and disease progression. We observed different MCID values for the SGRQ and SGRQ-I. Multiple factors might have influenced this variability, such as inconsistency in the cut-offs assigned to different anchors, the use of different SD-based thresholds, and varying follow-up periods [66]. Even when the same methodology is applied, MCID values differ because they are context specific and influenced by patients’ baseline characteristics [67]. These factors render MCID values non-transferrable across patient groups, which ultimately impedes the clinical relevance of findings. Our review underscores the need for careful consideration when reporting and interpreting the MCID. Failure to acknowledge these limitations risks erroneously categorizing patients as non-responders despite improvement or, conversely, as responders when there has been no meaningful change [68].
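To make the source of this variability concrete, the sketch below contrasts a distribution-based estimate (half the baseline SD) with an anchor-based estimate (mean change among patients rated “minimally improved” on a hypothetical global-rating anchor). Both the anchor labels and the data are invented; because lower SGRQ scores indicate better HRQoL, improvement is a negative change.

```python
import numpy as np

def mcid_half_sd(baseline):
    """Distribution-based estimate: 0.5 x SD of baseline scores."""
    return 0.5 * float(np.std(np.asarray(baseline, dtype=float), ddof=1))

def mcid_anchor(change, anchor, label="minimally improved"):
    """Anchor-based estimate: mean score change among patients who chose
    the 'minimally improved' category on an external anchor."""
    change = np.asarray(change, dtype=float)
    anchor = np.asarray(anchor)
    return float(change[anchor == label].mean())

# Invented data: baseline SGRQ totals, change scores, and anchor ratings
baseline = [40.0, 50.0, 60.0]
change = [-10.0, -6.0, -2.0, 1.0, -8.0]
anchor = ["much improved", "minimally improved", "unchanged",
          "unchanged", "minimally improved"]

dist_mcid = mcid_half_sd(baseline)          # 0.5 x 10 = 5.0 points
anchor_mcid = mcid_anchor(change, anchor)   # mean of -6 and -8 = -7.0 points
```

Even in this toy example the two approaches give different magnitudes, and the anchor-based value shifts with the anchor categories and cut-offs chosen, which mirrors the heterogeneity we observed across studies.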
During our analysis using the COSMIN reporting guideline, a notable issue emerged: many studies did not report the amount of missing PROM data or the methods used to handle it. Investigating the factors contributing to missing data in PROMs is essential, as they greatly influence our understanding of a PROM’s characteristics and subsequent analyses; response burden, item irrelevance, comprehension difficulties, and questionnaire design problems can all contribute to missing data [69]. Accurately reporting the extent of missing data is equally important, as anticipated missingness can affect the selection of PRO instruments and the use of PROs as endpoints in clinical trials [70, 71]. Our analysis also revealed a significant gap in the reporting of the version and mode of administration of PROMs, which hampers a thorough evaluation of their psychometric properties. The mode of administration, such as interviewer-administered versus self-administered, can introduce variations in response patterns and influence participant responses [72]. In addition, the SGRQs have versions in multiple languages with varying recall periods, which may affect their psychometric properties. To ensure accurate understanding and interpretation of results, studies using the SGRQs should address this reporting gap and provide comprehensive information on the version and mode of administration.
Future research on the psychometric properties of the SGRQ and SGRQ-I should address the gaps and limitations noted above. First, further research is needed on the content validity of both PROMs among patients with IPF; the SGRQ was originally developed for COPD and asthma patients, and its relevance to IPF needs to be established through independent studies of content validity in this population. Second, more studies are required to assess the structural validity of the SGRQ and SGRQ-I in patients with IPF. Third, future research should aim to improve the reliability assessment of the questionnaires, particularly internal consistency and test–retest reliability: internal consistency should be analyzed separately for each subscale, and alternative measures may be considered for the symptom domain if it is treated as a formative model. Fourth, the construct validity of the SGRQ and SGRQ-I should be explored using appropriate comparator instruments validated in the IPF population, with clear hypotheses about the expected strength and direction of correlations formulated before analysis to guide interpretation; this would help establish relationships between the PROMs and other measures consistent with the theoretical concepts being measured. Finally, future studies should investigate the floor and ceiling effects of the SGRQ and SGRQ-I in IPF, as understanding their presence and extent is important for evaluating the instruments’ ability to capture the full range of HRQoL in this population. Addressing these gaps would strengthen the evidence on the psychometric properties of the SGRQ and SGRQ-I for use in IPF research and clinical practice, facilitating more accurate and meaningful assessments of HRQoL in this patient population.
Given these findings, we cannot recommend either the SGRQ or the SGRQ-I over the other in patients with IPF at this time. Both instruments have demonstrated strengths and weaknesses across domains of measurement validity and reliability, and further research and clinical validation are necessary to determine the most suitable instrument for assessing health-related quality of life in IPF.
Our systematic review has multiple strengths. First, the review’s comprehensiveness is evident in the exhaustive search across a wide range of databases. Second, the protocol for the review was pre-registered. Third, two reviewers independently conducted screening, data extraction, and quality assessment for 100% of the studies. Fourth, the review rigorously assessed the quality of studies and the psychometric properties of the included instruments following the COSMIN 2018 criteria, enhancing the credibility of the assessment. Finally, we conducted meta-analyses on construct validity for the total score and subdomains against various measures, providing a more comprehensive summary of the evidence on measurement properties. However, several potential limitations should be considered. First, despite our comprehensive search, some eligible studies may have been missed. Second, because the conclusions were derived from aggregate data, there is a possibility of ecological fallacy, specifically Simpson’s paradox: “A type of ecological fallacy in probability and statistics where a trend appears in several groups of data but vanishes or reverses when the groups are combined” [73]. Third, some measures received an inadequate rating for their psychometric properties despite being close to the COSMIN-established cut-off for a “sufficient” rating (e.g., a coefficient alpha of 0.69 rather than 0.70 or above); as such, we believe the current COSMIN criteria may underestimate the quality of an instrument’s psychometric properties. The same applies to the assessment of methodological quality, which employs the “worst score counts” rule and therefore downgrades a study even when there is only a single concern about its quality.
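Simpson’s paradox is easy to reproduce numerically. In this invented example, the association between two variables is positive within each of two subgroups but reverses when the subgroups are pooled:

```python
import numpy as np

rng = np.random.default_rng(1)

# Two subgroups with a positive x-y relationship within each...
x1 = rng.normal(0.0, 1.0, 200)
y1 = x1 + rng.normal(0.0, 0.5, 200)
x2 = rng.normal(4.0, 1.0, 200)
y2 = (x2 - 8.0) + rng.normal(0.0, 0.5, 200)   # ...but shifted far lower on y

r_within_1 = np.corrcoef(x1, y1)[0, 1]        # strongly positive
r_within_2 = np.corrcoef(x2, y2)[0, 1]        # strongly positive
r_pooled = np.corrcoef(np.r_[x1, x2], np.r_[y1, y2])[0, 1]  # negative
```

Aggregate-level correlations in a meta-analysis can likewise mask or invert subgroup-level relationships, which is why we flag this as a limitation of conclusions drawn from aggregate data.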
We also acknowledge that the diverse versions of the SGRQs, including reported and unreported variations in recall periods, modes of administration, and language, may have contributed to the heterogeneity observed in our meta-analysis and affected the certainty of our conclusions. Furthermore, we deviated slightly from the COSMIN criteria by assessing all psychometric properties despite the absence of any study on content validity. Finally, although some may argue that this review should be updated, given that the last search was conducted in June 2022, there is no firm consensus on when a review should be updated; the time and effort devoted to the current review were substantial, and it is highly unlikely that an update would change the overall direction of the findings.
5 Conclusions
In summary, it is crucial to acknowledge the limitations of employing the SGRQs in IPF, given the lack of evidence supporting their content validity and structural validity in this population and considering that they were not originally designed for these patients. However, the SGRQs demonstrated acceptable construct validity and responsiveness, supporting their usefulness as PROMs in IPF. According to COSMIN guidelines, our findings indicate that the SGRQs show potential for recommendation as suitable PROMs for IPF but require further validation studies before a conclusive recommendation can be made.
Data availability
Data are available upon reasonable request to the corresponding author.
References
Saito S, Alkhatib A, Kolls JK, Kondoh Y, Lasky JA. Pharmacotherapy and adjunctive treatment for idiopathic pulmonary fibrosis (IPF). J Thorac Dis. 2019;11(Suppl 14):S1740. https://doi.org/10.21037/JTD.2019.04.62.
Antoniou K, Kamekis A, Symvoulakis EK, Kokosi M, Swigris JJ. Burden of idiopathic pulmonary fibrosis on patients’ emotional well being and quality of life: a literature review. Curr Opin Pulm Med. 2020;26(5):457–63. https://doi.org/10.1097/MCP.0000000000000703.
Belkin A, Swigris JJ. Patient expectations and experiences in idiopathic pulmonary fibrosis: implications of patient surveys for improved care. Expert Rev Respir Med. 2014;8(2):173–8. https://doi.org/10.1586/17476348.2014.880056.
Jones PW, Quirk FH, Baveystock CM. The St George’s respiratory questionnaire. Respir Med. 1991;85:25–31. https://doi.org/10.1016/S0954-6111(06)80166-6.
Jones PW, Quirk FH, Baveystock CM, Littlejohns P. A self-complete measure of health status for chronic airflow limitation. The St. George’s respiratory questionnaire. Am Rev Respir Dis. 1992;145(6):1321–7. https://doi.org/10.1164/AJRCCM/145.6.1321.
Yorke J, Jones PW, Swigris JJ. Development and validity testing of an IPF-specific version of the St George’s respiratory questionnaire. Thorax. 2010;65(10):921–6. https://doi.org/10.1136/thx.2010.139121.
Swigris JJ, Esser D, Conoscenti CS, Brown KK. The psychometric properties of the St George’s respiratory questionnaire (SGRQ) in patients with idiopathic pulmonary fibrosis: a literature review. Health Qual Life Outcomes. 2014;12(1):1–14. https://doi.org/10.1186/s12955-014-0124-1.
Mokkink LB, et al. COSMIN methodology for systematic reviews of Patient-Reported Outcome Measures (PROMs). 2018.
Prinsen CAC, et al. COSMIN guideline for systematic reviews of patient-reported outcome measures. Qual Life Res. 2018;27(5):1147–57. https://doi.org/10.1007/S11136-018-1798-3.
Page MJ, et al. PRISMA 2020 explanation and elaboration: updated guidance and exemplars for reporting systematic reviews. BMJ. 2021;372:n160. https://doi.org/10.1136/bmj.n160.
Ward M, et al. Impact of 2019–2020 mega-fires on Australian fauna habitat. Nat Ecol Evol. 2020;4(10):1321–6. https://doi.org/10.1038/s41559-020-1251-1.
Nussbaumer-Streit B, et al. Excluding non-English publications from evidence-syntheses did not change conclusions: a meta-epidemiological study. J Clin Epidemiol. 2020;118:42–54. https://doi.org/10.1016/j.jclinepi.2019.10.011.
Terwee CB, Jansma EP, Riphagen II, De Vet HCW. Development of a methodological PubMed search filter for finding studies on measurement properties of measurement instruments. Qual Life Res. 2009;18(8):1115. https://doi.org/10.1007/S11136-009-9528-5.
Mokkink LB, et al. COSMIN risk of bias checklist for systematic reviews of patient-reported outcome measures. Qual Life Res. 2018;27(5):1171–9. https://doi.org/10.1007/s11136-017-1765-4.
Hakstian AR, Whalen TE. A k-sample significance test for independent alpha coefficients. Psychometrika. 1976;41(2):219–31. https://doi.org/10.1007/BF02291840.
Sánchez-Meca J, López-López JA, López-Pina JA. Some recommended statistical analytic practices when reliability generalization studies are conducted. Br J Math Stat Psychol. 2013;66(3):402–25. https://doi.org/10.1111/J.2044-8317.2012.02057.X.
Rosenthal R. Parametric measures of effect size. In: Cooper H, Hedges LV, Valentine JC, editors. The handbook of research synthesis. New York: Russell Sage Foundation; 1994. p. 231–44.
Doi SAR, Barendregt JJ, Khan S, Thalib L, Williams GM. Advances in the meta-analysis of heterogeneous clinical trials I: the inverse variance heterogeneity model. Contemp Clin Trials. 2015;45:130–8. https://doi.org/10.1016/j.cct.2015.05.009.
Cochran WG. The combination of estimates from different experiments. Biometrics. 1954;10(1):101. https://doi.org/10.2307/3001666.
Higgins JPT, Thompson SG, Deeks JJ, Altman DG. Measuring inconsistency in meta-analyses. BMJ. 2003. https://doi.org/10.1136/bmj.327.7414.557.
Furuya-Kanamori L, Barendregt JJ, Doi SAR. A new improved graphical and quantitative method for detecting bias in meta-analysis. Int J Evid Based Healthc. 2018;16(4):195–203. https://doi.org/10.1097/XEB.0000000000000141.
Furuya-Kanamori L, et al. P value–driven methods were underpowered to detect publication bias: analysis of cochrane review meta-analyses. J Clin Epidemiol. 2020;118:86–92. https://doi.org/10.1016/j.jclinepi.2019.11.011.
Gagnier JJ, Lai J, Mokkink LB, Terwee CB. COSMIN reporting guideline for studies on measurement properties of patient-reported outcome measures. Qual Life Res. 2021;30(8):2197–218. https://doi.org/10.1007/S11136-021-02822-4.
Abdelaziz A, Azab N, Elwahsh R, Elsheikh A. Comparison of two questionnaires measuring the health-related quality of life in patients with idiopathic interstitial pulmonary fibrosis. Egypt J Chest Dis Tuberc. 2022;71(1):108. https://doi.org/10.4103/ejcdt.ejcdt_23_17.
Chang JA, Curtis JR, Patrick DL, Raghu G. Assessment of health-related quality of life in patients with interstitial lung disease. Chest. 1999;116(5):1175–82. https://doi.org/10.1378/chest.116.5.1175.
Furukawa T, et al. The St. George’s respiratory questionnaire as a prognostic factor in IPF. Respir Res. 2017;18(1):18. https://doi.org/10.1186/s12931-017-0503-3.
Kang M, Veeraraghavan S, Martin GS, Kempker JA. An updated approach to determine minimal clinically important differences in idiopathic pulmonary fibrosis. ERJ Open Res. 2021;7(4):00142–2021. https://doi.org/10.1183/23120541.00142-2021.
Kim JW, Clark A, Birring SS, Atkins C, Whyte M, Wilson AM. Psychometric properties of patient reported outcome measures in idiopathic pulmonary fibrosis. Chron Respir Dis. 2021;18:147997312110339. https://doi.org/10.1177/14799731211033925.
Kreuter M, et al. Health related quality of life in patients with idiopathic pulmonary fibrosis in clinical practice: insights-IPF registry. Respir Res. 2017;18(1):139. https://doi.org/10.1186/s12931-017-0621-y.
Kreuter M, et al. The clinical course of idiopathic pulmonary fibrosis and its association to quality of life over time: longitudinal data from the INSIGHTS-IPF registry. Respir Res. 2019;20(1):59. https://doi.org/10.1186/s12931-019-1020-3.
Kreuter M, et al. Health-related quality of life and symptoms in patients with IPF treated with nintedanib: analyses of patient-reported outcomes from the INPULSIS® trials. Respir Res. 2020;21(1):36. https://doi.org/10.1186/s12931-020-1298-1.
Lutogniewska W, et al. Dyspnea and quality of life in patients referred for lung transplantation. Eur J Med Res. 2010;15(S2):76. https://doi.org/10.1186/2047-783X-15-S2-76.
Matsuda T, et al. Depression is significantly associated with the health status in patients with idiopathic pulmonary fibrosis. Intern Med. 2017;56(13):1637–44. https://doi.org/10.2169/internalmedicine.56.7019.
O’Brien EC, et al. Disease severity and quality of life in patients with idiopathic pulmonary fibrosis. Chest. 2020;157(5):1188–98. https://doi.org/10.1016/j.chest.2019.11.042.
Peng S, Li Z, Kang J, Hou X. Cross-sectional and longitudinal construct validity of the Saint George’s respiratory questionnaire in patients with IPF. Respirology. 2008;13(6):871–9. https://doi.org/10.1111/j.1440-1843.2008.01359.x.
Swigris JJ, et al. The SF-36 and SGRQ: validity and first look at minimum important differences in IPF. Respir Med. 2010;104(2):296–304. https://doi.org/10.1016/j.rmed.2009.09.006.
Swigris JJ, et al. Psychometric properties of the St George’s respiratory questionnaire in patients with idiopathic pulmonary fibrosis: insights from the INPULSIS trials. BMJ Open Respir Res. 2018;5(1):e000278. https://doi.org/10.1136/bmjresp-2018-000278.
Tzanakis N, Samiou M, Lambiri I, Antoniou K, Siafakas N, Bouros D. Evaluation of health-related quality-of-life and dyspnea scales in patients with idiopathic pulmonary fibrosis. Correlation with pulmonary function tests. Eur J Intern Med. 2005;16(2):105–12. https://doi.org/10.1016/j.ejim.2004.09.013.
Zimmermann CS, et al. Comparison of two questionnaires which measure the health-related quality of life of idiopathic pulmonary fibrosis patients. Braz J Med Biol Res. 2007;40(2):179–87. https://doi.org/10.1590/S0100-879X2007000200004.
Nishiyama O, et al. Health-related quality of life in patients with idiopathic pulmonary fibrosis. What is the main contributing factor? Respir Med. 2005;99(4):408–14. https://doi.org/10.1016/j.rmed.2004.09.005.
Swigris JJ, et al. Psychometric properties of the St George’s respiratory questionnaire in patients with idiopathic pulmonary fibrosis. Eur Respir J. 2017;49(1):1601788. https://doi.org/10.1183/13993003.01788-2016.
Aguiar WF, et al. Translation, cross-cultural adaptation, and measurement properties of the Brazilian-Portuguese version of the idiopathic pulmonary fibrosis-specific version of the Saint George’s respiratory questionnaire (SGRQ-I) for patients with interstitial lung disease. Braz J Phys Ther. 2021;25(6):794–802. https://doi.org/10.1016/j.bjpt.2021.06.008.
Akihiko S, et al. The association between health-related quality of life and disease progression in idiopathic pulmonary fibrosis: a prospective cohort study. Sarcoidosis Vasc Diffuse Lung Dis. 2017;34(3):226–35. https://doi.org/10.36141/svdld.v34i3.5214.
Capparelli I, et al. Translation to Spanish and validation of the specific Saint George’s questionnaire for idiopathic pulmonary fibrosis. Archivos de Bronconeumología. 2018;54(2):68–73. https://doi.org/10.1016/j.arbr.2017.09.011.
Prior TS, et al. Validation of the IPF-specific version of St. George’s respiratory questionnaire. Respir Res. 2019;20(1):199. https://doi.org/10.1186/s12931-019-1169-9.
Prior TS, Hoyer N, Hilberg O, Shaker SB, Davidsen JR, Bendstrup E. Responsiveness and minimal clinically important difference of SGRQ-I and K-BILD in idiopathic pulmonary fibrosis. Respir Res. 2020;21(1):91. https://doi.org/10.1186/s12931-020-01359-3.
Akoglu H. User’s guide to correlation coefficients. Turk J Emerg Med. 2018;18(3):91–3. https://doi.org/10.1016/j.tjem.2018.08.001.
De Vet HCW, Terwee CB, Mokkink LB, Knol DL. Measurement in medicine: a practical guide. Cambridge: Cambridge University Press; 2011.
Gnanasakthy A, Qin S, Norcross L. FDA guidance on selecting, developing, or modifying fit-for-purpose clinical outcome assessments: old wine in a new bottle? Patient - Patient-Centered Outcomes Res. 2023;16(1):3–5. https://doi.org/10.1007/s40271-022-00607-6.
Chiarotto A, Ostelo RW, Boers M, Terwee CB. A systematic review highlights the need to investigate the content validity of patient-reported outcome measures for physical functioning in patients with low back pain. J Clin Epidemiol. 2018;95:73–93. https://doi.org/10.1016/j.jclinepi.2017.11.005.
Vieira RJ, et al. Control of allergic rhinitis and asthma test: a systematic review of measurement properties and COSMIN analysis. Clin Transl Allergy. 2022;12(9):e12194. https://doi.org/10.1002/CLT2.12194.
Beelen LM, et al. Patient-reported outcome measures in lymphedema: a systematic review and COSMIN analysis. Ann Surg Oncol. 2021;28(3):1656. https://doi.org/10.1245/S10434-020-09346-0.
Jones PW. Outcomes and markers in the assessment of chronic obstructive pulmonary disease. Eur Respir J. 2006;27(4):822–32. https://doi.org/10.1183/09031936.06.00145104.
Cottin V, Wollin L, Fischer A, Quaresma M, Stowasser S, Harari S. Fibrosing interstitial lung diseases: knowns and unknowns. Eur Respir Rev. 2019;28(151):180100. https://doi.org/10.1183/16000617.0100-2018.
Frost MH, Reeve BB, Liepa AM, Stauffer JW, Hays RD, Sloan JA. What is sufficient evidence for the reliability and validity of patient-reported outcome measures? Value Health. 2007. https://doi.org/10.1111/J.1524-4733.2007.00272.X.
Streiner DL. Being inconsistent about consistency: when coefficient alpha does and doesn’t matter. J Pers Assess. 2003;80(3):217–22. https://doi.org/10.1207/S15327752JPA8003_01.
Stadler M, Sailer M, Fischer F. Knowledge as a formative construct: a good alpha is not always better. New Ideas Psychol. 2021;60:100832. https://doi.org/10.1016/J.NEWIDEAPSYCH.2020.100832.
Qin S, Nelson L, McLeod L, Eremenco S, Coons SJ. Assessing test–retest reliability of patient-reported outcome measures using intraclass correlation coefficients: recommendations for selecting and documenting the analytical formula. Qual Life Res. 2019;28(4):1029–33. https://doi.org/10.1007/S11136-018-2076-0.
Koo TK, Li MY. A guideline of selecting and reporting intraclass correlation coefficients for reliability research. J Chiropr Med. 2016;15(2):155–63. https://doi.org/10.1016/J.JCM.2016.02.012.
Bacci ED, O’Quinn S, Leidy NK, Murray L, Vernon M. Evaluation of a respiratory symptom diary for clinical studies of idiopathic pulmonary fibrosis. Respir Med. 2018;134:130–8. https://doi.org/10.1016/J.RMED.2017.11.011.
Vanier A, et al. Response shift in patient-reported outcomes: definition, theory, and a revised model. Qual Life Res. 2021;30(12):3309–22. https://doi.org/10.1007/s11136-021-02846-w.
Russell AM, et al. Daily home spirometry: an effective tool for detecting progression in idiopathic pulmonary fibrosis. Am J Respir Crit Care Med. 2016;194(8):989–97. https://doi.org/10.1164/RCCM.201511-2152OC.
Terwee CB, et al. Quality criteria were proposed for measurement properties of health status questionnaires. J Clin Epidemiol. 2007;60(1):34–42. https://doi.org/10.1016/j.jclinepi.2006.03.012.
Carlsson IK, Ekstrand E, Åström M, Stihl K, Arner M. Construct validity, floor and ceiling effects, data completeness and magnitude of change for the eight-item HAKIR questionnaire: a patient-reported outcome in the Swedish national healthcare quality registry for hand surgery. Hand Ther. 2021;26(1):3–16. https://doi.org/10.1177/1758998320966018.
Mokkink LB, et al. The COSMIN study reached international consensus on taxonomy, terminology, and definitions of measurement properties for health-related patient-reported outcomes. J Clin Epidemiol. 2010;63(7):737–45. https://doi.org/10.1016/j.jclinepi.2010.02.006.
Mouelhi Y, Jouve E, Castelli C, Gentile S. How is the minimal clinically important difference established in health-related quality of life instruments? Review of anchors and methods. Health Qual Life Outcomes. 2020;18(1):136. https://doi.org/10.1186/s12955-020-01344-w.
Wang Y-C, Hart DL, Stratford PW, Mioduski JE. Baseline dependency of minimal clinically important improvement. Phys Ther. 2011;91(5):675–88. https://doi.org/10.2522/ptj.20100229.
Franceschini M, Boffa A, Pignotti E, Andriolo L, Zaffagnini S, Filardo G. The minimal clinically important difference changes greatly based on the different calculation methods. Am J Sports Med. 2023;51(4):1067–73. https://doi.org/10.1177/03635465231152484.
Mercieca-Bebber R, Palmer MJ, Brundage M, Calvert M, Stockler MR, King MT. Design, implementation and reporting strategies to reduce the instance and impact of missing patient-reported outcome (PRO) data: a systematic review. BMJ Open. 2016;6(6):e010938. https://doi.org/10.1136/bmjopen-2015-010938.
Ayilara OF, Zhang L, Sajobi TT, Sawatzky R, Bohm E, Lix LM. Impact of missing data on bias and precision when estimating change in patient-reported outcomes from a clinical registry. Health Qual Life Outcomes. 2019. https://doi.org/10.1186/S12955-019-1181-2.
Sloan JA, Dueck AC, Erickson PA, Guess H, Revicki DA, Santanello NC. Analysis and interpretation of results based on patient-reported outcomes. Value Health. 2007. https://doi.org/10.1111/J.1524-4733.2007.00273.X.
Cook C. Mode of administration bias. J Manual Manip Ther. 2010;18(2):61–3. https://doi.org/10.1179/106698110X12640740712617.
Rücker G, Schumacher M. Simpson’s paradox visualized: the example of the rosiglitazone meta-analysis. BMC Med Res Methodol. 2008;8(1):34.
Haddaway NR, Page MJ, Pritchard CC, McGuinness LA. PRISMA2020: an R package and Shiny app for producing PRISMA 2020-compliant flow diagrams, with interactivity for optimised digital transparency and open synthesis. Campbell Syst Rev. 2022. https://doi.org/10.1002/CL2.1230.
Funding
The authors received no funding for this work.
Author information
Authors and Affiliations
Contributions
R.M. and G.K. conceptualized the study, developed the study protocol, ran the formal analysis, and wrote the original draft; R.M. and Y.M. conducted database searches and performed abstract and full-text screening, data abstraction, and all assessments for the included studies; R.M., Y.M., and G.K. prepared all figures. R.M., G.K., Y.M., K.K., T.L., T.R., R.S., S.W., and V.S. contributed to the interpretation of results and critically revised the manuscript for clarity and accuracy. All authors read and approved the final manuscript.
Corresponding author
Ethics declarations
Ethics approval and consent to participate
No ethical approval is required.
Informed consent
No informed consent is required.
Competing interests
The authors declare no competing interests.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary Information
Below is the link to the electronic supplementary material.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Mohamed, R., Mbous, Y.P.V., Kamal, K.M. et al. Evaluating the St. George’s respiratory questionnaire in idiopathic pulmonary fibrosis: a systematic review with meta-analysis of psychometric properties using COSMIN. Discov Soc Sci Health 4, 31 (2024). https://doi.org/10.1007/s44155-024-00091-3