Introduction

Chronic obstructive pulmonary disease (COPD) is a major global health burden, contributing to 3.2 million deaths worldwide each year1,2. It is characterized by persistent airway inflammation, which is usually progressive. Lung function decline, assessed by repeated measurements of forced expiratory volume in 1 s (FEV1), has been used as a fundamental parameter to define and evaluate the progression of COPD3. However, COPD is a heterogeneous and complex disease: patients with a similar degree of airflow obstruction show marked variation in symptoms and pathophysiological features. Its progression based on FEV1 is also highly heterogeneous4,5. With the widespread availability of chest computed tomography (CT) in clinical lung cancer screening, there is growing interest in imaging parameters, beyond this single lung function measure, that could identify patients at risk of COPD deterioration6,7,8.

The key pathological features of COPD are emphysema and small airway obstruction and destruction. Parametric response mapping (PRM), a CT voxel-based method applied to paired inspiratory and expiratory CT scans, quantifies these two features as CT-assessed emphysema and functional small airway disease (fSAD), hereafter PRMEmph (total percent emphysema) and PRMfSAD (total percent fSAD)9. Using these parameters, several studies have identified COPD subtypes on chest CT for assessing COPD progression9. However, most studies used cross-sectional designs to show the relationship between radiological variables and the presence or progression of COPD assessed using FEV110,11,12,13,14. Regarding the progression pattern of fSAD and emphysema, a cross-sectional PRM study indirectly showed progression from fSAD to emphysema, using a snapshot of a cohort of individual participants at various stages of the disease9. In a longitudinal study of patients with various stages of COPD, baseline PRMfSAD and PRMEmph were independently associated with an increase in PRMEmph over more than 5 years of follow-up in a multivariable model15. However, few studies have used longitudinal PRM analysis to examine whether fSAD and emphysema progress sequentially in patients with CT-based early COPD. We therefore conducted a causal inference analysis of longitudinal data to investigate disease progression using PRM-quantified parameters in patients with COPD.

Methods

Study population

We obtained data from 219 patients with COPD, defined as post-bronchodilator FEV1/forced vital capacity (FVC) < 70% without current asthma based on the treating pulmonologist's diagnosis, who had undergone two serial chest CT PRM scans (baseline and follow-up) in a retrospective cohort at Samsung Medical Center (a 1979-bed hospital in Seoul, South Korea) between January 2016 and April 2020. After patients with lung cancer were excluded (n = 82), 137 patients remained in the final study population. This retrospective cohort served as the derivation cohort, and the study was approved by the Samsung Medical Center Institutional Review Board (no. 2014–09-103). Independent data from the Korean Obstructive Lung Disease cohort, a multicenter cohort from 16 hospitals in the Republic of Korea enrolled between June 2005 and October 2015, were used as the validation cohort. The Korean Obstructive Lung Disease study protocol has been published previously16. Its inclusion criteria were: (1) age 40 years or older; (2) COPD, defined as post-bronchodilator FEV1/FVC < 0.7, with a smoking history of more than 10 pack-years; and (3) no history or radiographic evidence of tuberculosis, bronchiectasis, or other pulmonary disorders. This study was approved by the Institutional Review Board of Asan Medical Center (no. 2005–0010). The Institutional Review Boards of Samsung Medical Center and Asan Medical Center waived the need for informed consent because only de-identified patient information was used. This study was performed in compliance with the Declaration of Helsinki.

Chest CT acquisition and analysis

Two serial chest CTs, obtained at baseline and 2–3 years after baseline, were analyzed. Whole-lung volumetric multidetector CT was performed at full inspiration and normal expiration using a standardized protocol17; all patients were scanned on a CT scanner (Somatom Definition Flash, Siemens Healthcare). The following parameters were used for non-contrast scans: 110 mAs (effective) at 120 kV, 32 × 0.6 mm collimation, 0.7 pitch, 0.5 s rotation time, and 512 × 512 pixel matrix size. A B50f (medium-sharp) kernel was used for image reconstruction with a section thickness of 1 mm. CT images were quantified via PRM, a voxel-by-voxel co-registration of inspiratory and expiratory CT18. PRM uses two thresholds, expressed in Hounsfield units (HU): (1) −950 HU on the inspiratory scan, below which voxels represent emphysema; and (2) −856 HU on the expiratory scan, below which voxels represent gas trapping. Total percent emphysema (PRMEmph) was defined as the fraction of the lung with a CT attenuation value below −950 HU on co-registered inspiratory CT and below −856 HU on co-registered expiratory CT. Total percent fSAD (PRMfSAD) was defined as the fraction of the lung with a CT attenuation value above −950 HU on co-registered inspiratory CT and below −856 HU on co-registered expiratory CT.
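The two PRM definitions above amount to a per-voxel rule on paired inspiratory and expiratory HU values. The following Python function is a minimal sketch of that rule, not the software used in this study. It assumes that registration and lung segmentation have already been done, that the inputs are flat sequences of HU values for the same lung voxels, and that a voxel exactly at a threshold counts as "not below" it; the function name is hypothetical.

```python
from typing import Sequence, Tuple

# Thresholds stated in the text. Treating a voxel exactly at a threshold as
# "not below" it is an assumption of this sketch.
EMPH_THRESHOLD_HU = -950.0      # applied to the inspiratory scan
GAS_TRAP_THRESHOLD_HU = -856.0  # applied to the expiratory scan


def prm_percentages(insp_hu: Sequence[float],
                    exp_hu: Sequence[float]) -> Tuple[float, float]:
    """Return (PRMEmph, PRMfSAD) as percentages of all lung voxels.

    insp_hu[i] and exp_hu[i] must be the same voxel after inspiratory-to-
    expiratory co-registration; registration and segmentation are not shown.
    """
    if len(insp_hu) != len(exp_hu) or len(insp_hu) == 0:
        raise ValueError("need equal-length, non-empty voxel sequences")
    n_emph = n_fsad = 0
    for insp, exp in zip(insp_hu, exp_hu):
        if exp < GAS_TRAP_THRESHOLD_HU:      # gas trapping on expiration
            if insp < EMPH_THRESHOLD_HU:
                n_emph += 1                  # low attenuation on both scans
            else:
                n_fsad += 1                  # normal inspiration, trapped expiration
        # voxels without expiratory gas trapping count in neither class
    n = len(insp_hu)
    return 100.0 * n_emph / n, 100.0 * n_fsad / n


if __name__ == "__main__":
    # Four hypothetical co-registered voxels (inspiratory HU, expiratory HU).
    insp = [-980.0, -900.0, -800.0, -960.0]
    exp = [-900.0, -870.0, -700.0, -800.0]
    print(prm_percentages(insp, exp))  # (25.0, 25.0)
```

In the example, one voxel is emphysema, one is fSAD, and two fall in neither class, so each percentage is 25%.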

Statistical analysis

In the derivation cohort, male patients with PRMEmph < 10% on baseline CT were selected as the study group. We hypothesized that comparing interval changes in the emphysema and fSAD components on serial PRM CTs between the low and high PRMfSAD groups could reveal whether emphysema progression is driven by fSAD progression. To investigate the progression of PRMEmph after 2–3 years, we divided the study group into low and high PRMfSAD groups using a baseline PRMfSAD cut-off value of 15%. We chose this cut-off for both theoretical and practical reasons. Because we intended the low PRMfSAD group to represent a relatively healthy group, a small cut-off value was required. In the derivation cohort, the first quartile and median of baseline PRMfSAD were 11.5% and 20.2%, respectively. A cut-off value of 20% would have produced similar sample sizes in the low and high PRMfSAD groups, leaving too few candidate matches and resulting in unacceptable covariate balance after matching. We therefore set the cut-off between 10% and 20%; the lower bound ensured a sufficient number of patients in the low group, particularly in the validation cohort, in which the median PRMfSAD was 29.4%. Within this range, we selected the midpoint of 15% and performed sensitivity analyses using cut-off values of 10% and 20%. Because fSAD levels were not randomized, patients with low and high PRMfSAD on baseline CT may have differed in clinical characteristics before fSAD progression. Therefore, each patient with low PRMfSAD was matched to a patient with high PRMfSAD with similar baseline characteristics: age, sex, body mass index, smoking status, PRMEmph, and the time interval between baseline and follow-up CT. The CT interval was not used for matching in the validation cohort, a prospective cohort, because follow-up CT was performed at a fixed interval of three years.
Statistical analysis was performed using R software (version 4.1.1; R Foundation for Statistical Computing, www.cran.r-project.com). Optimal pair matching was performed using the optmatch R package.
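Optimal pair matching chooses 1:1 pairs so that the total covariate distance across all pairs is minimized, rather than greedily taking the nearest match for each patient in turn. The Python sketch below illustrates only this objective. It is not how optmatch works internally (optmatch solves the problem as a minimum-cost network flow). The standardized Euclidean distance is an assumption, since the exact distance specification is not given here. The function names and toy data are hypothetical, and the exact subset-based dynamic program is exponential, so it is suitable only for small examples.

```python
import statistics
from typing import Dict, List, Sequence, Tuple


def standardized_distances(treated: Sequence[Sequence[float]],
                           controls: Sequence[Sequence[float]]) -> List[List[float]]:
    """Euclidean distance on covariates scaled by their pooled standard deviation."""
    pooled = list(treated) + list(controls)
    sds = []
    for c in range(len(pooled[0])):
        sd = statistics.stdev([row[c] for row in pooled])
        sds.append(sd if sd > 0 else 1.0)  # constant covariate (e.g., sex when all male)
    return [[sum(((t[c] - u[c]) / s) ** 2 for c, s in enumerate(sds)) ** 0.5
             for u in controls]
            for t in treated]


def optimal_pair_match(dist: List[List[float]]) -> Tuple[List[Tuple[int, int]], float]:
    """1:1 matching that minimizes the total treated-control distance.

    Exact dynamic programming over subsets of used controls.
    """
    n_treated, n_controls = len(dist), len(dist[0])
    if n_treated > n_controls:
        raise ValueError("need at least as many controls as treated units")
    # best[mask] = (cost, chosen controls) for matching treated units 0..i-1
    # to exactly the controls whose bits are set in mask.
    best: Dict[int, Tuple[float, List[int]]] = {0: (0.0, [])}
    for i in range(n_treated):
        nxt: Dict[int, Tuple[float, List[int]]] = {}
        for mask, (cost, chosen) in best.items():
            for j in range(n_controls):
                if (mask >> j) & 1:
                    continue  # control j is already used
                key = mask | (1 << j)
                new_cost = cost + dist[i][j]
                if key not in nxt or new_cost < nxt[key][0]:
                    nxt[key] = (new_cost, chosen + [j])
        best = nxt
    total, chosen = min(best.values(), key=lambda entry: entry[0])
    return list(enumerate(chosen)), total


if __name__ == "__main__":
    # Hypothetical (age, BMI) rows: low-PRMfSAD patients vs. high-PRMfSAD candidates.
    low_group = [[60.0, 23.0], [70.0, 25.0]]
    high_group = [[71.0, 25.5], [59.0, 22.5], [80.0, 30.0]]
    pairs, cost = optimal_pair_match(standardized_distances(low_group, high_group))
    print(pairs)  # [(0, 1), (1, 0)]
```

With distances [[1, 2], [1, 10]], a greedy rule would pair the first patient with control 0 and leave a total of 11. The optimal assignment swaps the controls and achieves a total of 3.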

Given that matching successfully removed overt bias due to the measured characteristics, the matched pairs were treated as data from a paired randomized experiment. Changes in PRMEmph from baseline to follow-up CT were evaluated and compared within each matched pair. A conventional paired-sample t-test could be used but was not reliable for our data because the changes were not normally distributed. This non-normality was detected on a Q-Q plot and confirmed using a formal test, the Shapiro–Wilk test19. Instead, we used nonparametric randomization inference with the one-sided Wilcoxon signed-rank statistic and Huber's M-statistic, which are more robust and perform well for heavy-tailed distributions. Huber's M-statistic was computed using the sensitivitymw R package. Because randomization inference does not require the normality assumption on which the t-test relies, more robust inference could be obtained.
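Randomization inference for matched pairs treats the sign of each within-pair difference as a fair coin flip under the null hypothesis of no effect. The p-value is then the share of sign patterns whose statistic is at least as large as the observed one. The Python sketch below applies this to the Wilcoxon signed-rank statistic and to a simplified Huber-type M-statistic. It assumes that d holds one high-minus-low PRMfSAD change in PRMEmph per matched pair. The psi function here is clipped at ±3 and scaled by the median |d|, in the spirit of Rosenbaum's M-statistics as used in sensitivitymw, but the trim and scale are assumptions and this is not the package's code. Function names are hypothetical.

```python
import itertools
import random
import statistics
from typing import Callable, Sequence

Statistic = Callable[[Sequence[float]], float]


def signed_rank_statistic(diffs: Sequence[float]) -> float:
    """Wilcoxon signed-rank statistic: sum of ranks of |d| over pairs with d > 0.

    Zero differences are dropped and tied |d| receive their average rank.
    """
    nonzero = [d for d in diffs if d != 0]
    order = sorted(range(len(nonzero)), key=lambda i: abs(nonzero[i]))
    ranks = [0.0] * len(nonzero)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and abs(nonzero[order[end + 1]]) == abs(nonzero[order[start]]):
            end += 1
        for k in range(start, end + 1):
            ranks[order[k]] = (start + end) / 2 + 1  # average 1-based rank of the tie block
        start = end + 1
    return sum(r for r, d in zip(ranks, nonzero) if d > 0)


def huber_m_statistic(diffs: Sequence[float], trim: float = 3.0) -> float:
    """Sum of Huber's psi(d / s), with s = median |d| and psi clipped at +/- trim."""
    scale = statistics.median(abs(d) for d in diffs)  # unchanged by sign flips
    if scale == 0:
        raise ValueError("median absolute difference is zero")
    return sum(max(-trim, min(trim, d / scale)) for d in diffs)


def sign_flip_pvalue(diffs: Sequence[float], statistic: Statistic,
                     exact_max_n: int = 15, n_draws: int = 20000, seed: int = 1) -> float:
    """One-sided p-value for H0: no effect, by flipping the sign of each pair's difference."""
    observed = statistic(diffs)
    if len(diffs) <= exact_max_n:  # enumerate all 2**n sign patterns
        null = [statistic([s * d for s, d in zip(signs, diffs)])
                for signs in itertools.product((1, -1), repeat=len(diffs))]
        return sum(t >= observed - 1e-9 for t in null) / len(null)
    rng = random.Random(seed)  # otherwise approximate by Monte Carlo
    hits = sum(statistic([rng.choice((1, -1)) * d for d in diffs]) >= observed - 1e-9
               for _ in range(n_draws))
    return (hits + 1) / (n_draws + 1)
```

When all eight differences in a small example are positive, only one of the 2^8 sign patterns reaches the observed value, so both statistics give p = 1/256. In a real analysis the two statistics diverge, because Huber's psi limits the influence of the few large changes that make the distribution heavy-tailed.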

Finally, regression analyses of unmatched and matched patients were conducted using the same set of measured characteristics that was used for matching. To test the alternative hypothesis that emphysema could precede SAD, male patients with PRMfSAD < 20% on baseline CT were selected and divided into low and high PRMEmph groups using a PRMEmph cut-off value of 10%. We used the slightly less strict criterion of PRMfSAD < 20% for this initial selection to ensure a sufficient number of patients with high PRMEmph. We then applied matching to remove potential bias between the low and high PRMEmph groups, followed by randomization inference with robust test statistics, as in the primary analysis.

Results

Study population

Table 1 shows the baseline characteristics before and after matching in both cohorts. In the derivation cohort, 128 patients (93%) were male. Among them, 78 (61%) with PRMEmph < 10% on baseline CT were selected and divided into the low PRMfSAD (n = 26) and high PRMfSAD (n = 52) groups. No female patient in the derivation cohort met the criterion of PRMEmph < 10% on baseline CT; consequently, all patients in the derivation study group were male. Before matching, the low PRMfSAD group had a significantly lower baseline PRMEmph than the high PRMfSAD group. After matching, there were no significant differences in age, body mass index, smoking status, or CT interval between the two groups. Among the 218 patients in the validation cohort, 208 (95.4%) were male, of whom 80 (38%) had PRMEmph < 10% on baseline CT. They were divided into low PRMfSAD (n = 16) and high PRMfSAD (n = 64) groups. Before matching, the low PRMfSAD group had a higher body mass index and lower PRMEmph at baseline than the high PRMfSAD group. After matching, no significant differences were observed between the low and high PRMfSAD groups. After matching, the standardized differences in baseline characteristics improved to below 0.2 (Supplementary Fig. 1), a level generally considered acceptable20.
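The standardized difference used to judge balance is the difference in group means divided by a pooled standard deviation, so covariates on different scales can be compared against the same 0.2 threshold. The Python sketch below uses the common denominator sqrt((var_treated + var_control) / 2). Whether this study used that denominator, or fixed it at pre-matching values, is an assumption. The function name and example data are hypothetical.

```python
import math
import statistics
from typing import Optional, Sequence


def standardized_difference(treated: Sequence[float], control: Sequence[float],
                            pooled_sd: Optional[float] = None) -> float:
    """Difference in means divided by sqrt((var_treated + var_control) / 2).

    Pass pooled_sd (e.g., computed once before matching) to keep the
    denominator fixed when comparing balance before and after matching.
    """
    diff = statistics.mean(treated) - statistics.mean(control)
    if pooled_sd is None:
        pooled_sd = math.sqrt((statistics.variance(treated) + statistics.variance(control)) / 2)
    return diff / pooled_sd


if __name__ == "__main__":
    # Hypothetical baseline BMI values in a low and a high PRMfSAD group.
    low_bmi = [24.1, 25.3, 23.8, 26.0]
    high_bmi = [23.9, 24.8, 24.4, 25.1]
    smd = standardized_difference(low_bmi, high_bmi)
    print(round(smd, 3), abs(smd) < 0.2)
```

Keeping the denominator fixed across the before-matching and after-matching comparisons ensures that any change in the standardized difference reflects a change in means rather than a change in spread.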

Table 1 Baseline characteristics of the study participants before and after matching.

All patients in the low PRMfSAD groups were classified as Global Initiative for Chronic Obstructive Lung Disease (GOLD) grade 1 (23% in the derivation cohort and 31% in the validation cohort) or GOLD grade 2 (77% in the derivation cohort and 69% in the validation cohort). In the high PRMfSAD group, most matched patients were classified as GOLD grade 1 (11% in the derivation cohort and 19% in the validation cohort) or GOLD grade 2 (81% in the derivation cohort and 56% in the validation cohort). The GOLD grade distribution did not differ significantly between the two groups in either cohort (Table 1).

Comparison of changes in PRMEmph on follow-up CT between the low and high PRMfSAD groups in patients with low PRMEmph at baseline

Positive changes were defined as increases in PRMEmph on follow-up CT. In the derivation cohort, PRMEmph increased by an average of 0.11 and 1.43 percentage points in the low and high PRMfSAD groups, respectively. In the validation cohort, the corresponding increases were 0.58 and 2.09 percentage points. Figure 1 shows the differences in PRMEmph changes between the low and high PRMfSAD groups. In the low PRMfSAD group, PRMEmph changed little or not at all, whereas the high PRMfSAD group showed notable increases in PRMEmph with greater variability.

Figure 1

Changes in PRMEmph in the low and high PRMfSAD groups among patients with low PRMEmph in the derivation and validation cohorts. The line through each box is the median, and the top and bottom edges of each box represent the interquartile range. The whiskers represent the minimum and maximum values of the data. PRMEmph = total percent emphysema, PRMfSAD = total percent functional small airway disease.

Two-sided paired t-tests showed that the difference in the change in PRMEmph was significant in the derivation cohort (p = 0.027) but not in the validation cohort (p = 0.072). To account for the small sample size and the non-normal distribution of the follow-up emphysema data, we conducted randomization inference for hypothesis testing using the one-sided Wilcoxon signed-rank test and Huber's M-statistic. In the derivation cohort, the one-sided Wilcoxon signed-rank test yielded p = 0.052 with a 95% confidence interval (CI) of −0.004 to 2.24, and the Hodges–Lehmann estimate of the median difference was 0.92. In the validation cohort, the test yielded p = 0.130 with a 95% CI of −0.45 to 3.19, and the Hodges–Lehmann estimate was 1.56. Huber's M-statistic, a more powerful test for detecting a true difference between the two groups, yielded p = 0.016 with a 95% CI of 0.10 to 2.49 and a point estimate of 1.24 in the derivation cohort, and p = 0.038 with a 95% CI of −0.20 to 3.20 and a point estimate of 1.51 in the validation cohort.
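The Hodges–Lehmann estimate that accompanies the Wilcoxon signed-rank test is the median of all pairwise Walsh averages of the matched-pair differences. Because it is a median of averages, a single extreme pair has limited influence on it. The Python sketch below computes only the point estimate. The confidence intervals reported above come from inverting the test, a step that is omitted here. The function name and the example differences are hypothetical.

```python
import statistics
from typing import Sequence


def hodges_lehmann(diffs: Sequence[float]) -> float:
    """Hodges-Lehmann estimate for paired data: the median of all Walsh
    averages (d_i + d_j) / 2 with i <= j."""
    n = len(diffs)
    walsh = [(diffs[i] + diffs[j]) / 2 for i in range(n) for j in range(i, n)]
    return statistics.median(walsh)


if __name__ == "__main__":
    # Hypothetical high-minus-low changes in PRMEmph (percentage points) for five pairs.
    changes = [0.4, 1.1, -0.2, 2.3, 9.0]
    print(hodges_lehmann(changes), statistics.mean(changes))
```

Comparing the two printed values shows that the single large change (9.0) pulls the mean up much more than it pulls up the Hodges–Lehmann estimate.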

Sensitivity analyses using baseline PRMfSAD cut-off values of 10% and 20% yielded similar results (Supplementary Material), as did regression analysis (Supplementary Material).

Comparison of changes in PRMfSAD on follow-up CT between the low and high PRMEmph groups in patients with low PRMfSAD at baseline

To test the alternative hypothesis that emphysema could precede SAD, we compared changes in PRMfSAD between the low and high PRMEmph groups among patients with PRMfSAD < 20% on baseline chest CT. In the derivation cohort, ten matched pairs were identified using essentially the same matching method as in the primary analysis. Changes in PRMfSAD from baseline to follow-up did not differ significantly between the low and high PRMEmph groups: in two-sided tests, the Wilcoxon signed-rank test yielded p = 0.85 and Huber's M-statistic yielded p = 0.96. The 95% CI for the high-minus-low difference was −13.55 to 14.46, with a point estimate of 0.16. Therefore, these findings do not provide evidence that emphysema precedes small airway disease. In the validation cohort, 18 matched pairs were identified, and the paired comparisons gave similar results: the difference was not significant at the 0.05 level (Wilcoxon signed-rank test, p = 0.15; Huber's M-statistic, p = 0.11). The point estimate was −3.09 with a 95% CI of −7.04 to 0.86. Figure 2 compares the changes in PRMfSAD between the low and high PRMEmph groups in both cohorts.

Figure 2

Changes in PRMfSAD in the low and high PRMEmph groups among patients with low PRMfSAD in the derivation and validation cohorts. The line through each box is the median, and the top and bottom edges of each box represent the interquartile range. The whiskers represent the minimum and maximum values of the data. PRMEmph = total percent emphysema, PRMfSAD = total percent functional small airway disease.

During follow-up, PRMfSAD increased in both the low and high PRMEmph groups, with average increases of 7.03 (low PRMEmph) and 6.56 (high PRMEmph) percentage points in the derivation cohort and 5.51 (low PRMEmph) and 2.42 (high PRMEmph) percentage points in the validation cohort. The increases in the validation cohort were statistically significant on one-sample t-tests (p = 0.009 and 0.027 for the low and high PRMEmph groups, respectively). The increases in the derivation cohort were not significant, despite being larger in magnitude, likely owing to the small sample size.

Discussion

In our study, SAD measured as PRMfSAD preceded emphysema measured as PRMEmph in the progression of COPD in patients with early-stage emphysema, whereas no significant progression from PRMEmph to PRMfSAD was found in COPD patients with early-stage fSAD. This was observed during longitudinal follow-up of a retrospective cohort and was validated in a prospective multicenter cohort using several additional statistical tests.

Because the patients in our study underwent repeated PRM CT, we could perform a confirmatory analysis that directly examined the sequential relationship between emphysema and fSAD. Moreover, because most of the matched patients were classified as GOLD grade 1 or 2, we focused on COPD progression at an early stage. In patients with low baseline PRMEmph, the high PRMfSAD group showed a greater change in PRMEmph on repeat CT than the low PRMfSAD group in both the derivation and validation cohorts, suggesting that established fSAD may progress to emphysema. In contrast, in patients with low baseline PRMfSAD, the interval change in PRMfSAD on repeat CT did not differ significantly between the low and high PRMEmph groups. This suggests that, in patients with early-stage fSAD, the presence of emphysema does not necessarily lead to the development of SAD. Furthermore, the point estimates in the validation cohort were negative, indicating that patients with low PRMEmph tended to show a greater increase in PRMfSAD at follow-up than those with high PRMEmph. When we examined these changes further, an interesting and important finding was that PRMfSAD tended to increase during follow-up regardless of PRMEmph level. This may provide indirect evidence supporting the hypothesis that SAD could precede emphysema. In this regard, patients with low PRMfSAD and low PRMEmph on baseline CT may show a greater increase in PRMfSAD than in PRMEmph on follow-up CT, suggesting progression towards fSAD rather than emphysema.

Interest in SAD in patients with COPD has increased since Hogg et al. showed, using micro-CT of lung specimens obtained at lung transplantation, that narrowing and disappearance of terminal bronchioles precede emphysematous destruction in COPD21. Subsequently, several studies showed the importance of radiologically assessed SAD in predicting FEV1 change and accelerated COPD progression14,22. The Canadian Chronic Obstructive Lung Disease cohort showed that the total airway count quantified on chest CT was 17% lower in patients with even mild-to-moderate COPD than in at-risk participants22. A reduced total airway count was independently associated with longitudinal lung function decline over six years, indicating that SAD is associated with FEV1 change and worsens even in very early COPD, before gross changes in FEV1 occur. After PRM was introduced as an image analysis technique for quantifying small airway disease and emphysema, associations between PRM-derived CT biomarkers and FEV1 change were assessed14. A study of the COPDGene cohort, including patients with GOLD 1–4 and baseline PRM CT, showed that baseline PRMfSAD and PRMEmph were associated with FEV1 decline14. In particular, the association between PRMfSAD and FEV1 decline was evident in GOLD grade 0 participants14. These results suggest that quantitative CT biomarkers may provide information on COPD progression beyond that provided by pulmonary function tests alone.

Several studies have assessed the progression of SAD and emphysema using PRM analysis. In a cross-sectional study of 194 patients with COPD, fSAD developed more often than emphysema in patients with mild COPD, whereas in patients with severe-to-very-severe COPD, emphysema progressed and plateaued at 40–50% of the fSAD value9. Another study of repeat chest CT at 30 days or 1 year in the SPIROMICS cohort showed temporal, disease-specific changes in the lungs that PRM analysis detected beyond changes in pulmonary function, indicating the potential of PRM analysis as a radiologic biomarker for monitoring disease status23. A longitudinal study with two PRM CTs over a five-year follow-up showed that progression of PRMfSAD only, of emphysema only, and of both PRMfSAD and PRMEmph occurred in 17.6%, 48.4%, and 34% of COPD patients across all stages of disease severity, respectively. Additionally, multivariable analyses showed that baseline PRMfSAD on chest CT was independently associated with an increase in emphysema five years later after adjustment for age, gender, race, smoking status, baseline pulmonary function, and scanner type15. Our study extends these findings by demonstrating sequential progression from SAD to emphysema through longitudinal PRM analysis on CT combined with statistical causal inference.

Our study has several strengths. First, we conducted a longitudinal study using two serial PRM CTs, which are practically available in clinical settings. Second, we minimized the bias inherent in observational studies as far as possible by matching on measured baseline characteristics. Third, patients were strictly selected using the criterion of PRMEmph < 10%, because the objective of the study was to examine the progression of emphysema in the low and high PRMfSAD groups. Additionally, the results were adjusted for the measured confounders considered likely to influence the outcomes. Finally, Huber's M-statistic was used to increase the robustness of causal conclusions in a small matched observational study24.

Furthermore, matching methods have several key advantages over other approaches, such as regression models20. First, matching does not conflict with regression adjustment; the two methods are complementary and are best used in combination. Second, matching does not rely on extrapolation but focuses on the region of overlap between the treated and control groups, whereas regression models rely heavily on extrapolation when overlap is insufficient and have been shown to perform poorly in that setting. Finally, matching is usually combined with randomization-based tests, which enable sensitivity analyses for unmeasured (residual) confounding. Although matching or regression can remove overt bias due to measured covariates, there is no guarantee that unmeasured bias is absent. Assessing the potential impact of such bias can increase confidence in the findings and the robustness of the conclusions.
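One common form of the sensitivity analysis mentioned above uses Rosenbaum's sensitivity model. In this model, an unmeasured covariate may make the odds that one member of a matched pair, rather than the other, is the high-PRMfSAD patient differ by up to a factor Γ (Γ = 1 corresponds to no hidden bias). The Python sketch below returns the large-sample upper bound on the one-sided Wilcoxon signed-rank p-value for a given Γ. It assumes that there are no tied absolute differences and uses the normal approximation rather than the exact distribution. The function name, example data, and Γ values are hypothetical.

```python
import math
from typing import Sequence


def rosenbaum_upper_pvalue(diffs: Sequence[float], gamma: float) -> float:
    """Upper bound on the one-sided signed-rank p-value under hidden bias gamma.

    Under the bounding distribution, each rank enters the statistic
    independently with probability gamma / (1 + gamma).
    """
    if gamma < 1:
        raise ValueError("gamma must be >= 1")
    nonzero = sorted((d for d in diffs if d != 0), key=abs)  # assumes no tied |d|
    n = len(nonzero)
    t_obs = sum(rank for rank, d in enumerate(nonzero, start=1) if d > 0)
    p_plus = gamma / (1 + gamma)
    mean = p_plus * n * (n + 1) / 2
    var = p_plus * (1 - p_plus) * n * (n + 1) * (2 * n + 1) / 6
    z = (t_obs - mean) / math.sqrt(var)
    return 0.5 * math.erfc(z / math.sqrt(2))  # 1 - Phi(z)


if __name__ == "__main__":
    # Hypothetical high-minus-low paired differences with distinct |d|.
    diffs = [0.3, -0.5, 1.2, 2.0, 0.8, -0.1, 1.7, 2.6, 0.9, 1.4]
    for gamma in (1.0, 1.5, 2.0):
        print(gamma, round(rosenbaum_upper_pvalue(diffs, gamma), 4))
```

At Γ = 1 the bound reduces to the usual normal-approximation signed-rank p-value. The largest Γ at which the bound stays below 0.05 indicates how much hidden bias the conclusion could tolerate.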

This study has several limitations. First, most patients received treatment for COPD, which may have influenced the results. Second, fSAD measured by PRM is not a direct measure of airway wall thickness or destruction. However, rather than static information on airway wall thickness, dynamic metrics between the inspiratory and expiratory phases may provide a more functional measurement of the small airways. Another limitation is that inspiratory and expiratory scans were co-registered separately at each time point, rather than across time points. Co-registering scans across time points presents challenges, particularly owing to data loss and voxel-matching difficulties. Although longitudinal voxel-by-voxel matching is feasible for serial scans with a single inspiratory read-out, it becomes more complex with PRM, which involves both inspiratory and expiratory scans. We acknowledge this limitation but chose a more feasible methodology given the constraints of our study. Despite these limitations, our study of radiologic progression in patients with CT-based early COPD provides valuable insights. Third, the relatively small number of matched patients may limit the generalizability of our findings. Despite the small sample size, our main finding was statistically significant and was replicated in the validation cohort. The change in emphysema appeared more pronounced with traditional statistical tools, such as regression or two-sample t-tests, but these methods rely on assumptions such as normality or correct model specification. In contrast, our matching approach relied on minimal (and weaker) assumptions, which can lead to more robust conclusions; consequently, the matching-based results are conservative.
Fourth, because this study was conducted only among patients in the Republic of Korea, the results may not be generalizable to other regions. Moreover, male patients predominated in both cohorts, resulting in an insufficient sample size for a robust analysis of female patients with COPD. In particular, the validation cohort enrolled only COPD patients with a smoking history of more than 10 pack-years; thus, 97% of its participants were male, owing to the very low smoking rate among women (less than 5%)25,26. This sex imbalance limits the generalizability of our findings to women. Future research should include a more balanced sex distribution to ensure that the results apply to both sexes. Fifth, the interval between imaging assessments was relatively short. Although analysis of subsequent imaging data is ongoing, an interval of 2–3 years may not be sufficient to capture the full spectrum of emphysema progression. Future studies should use longer follow-up periods to track emphysema progression accurately. Finally, patients with pre-COPD were not included. Future studies should include patients with pre-COPD, or even healthy individuals, to better elucidate the full spectrum of COPD development and progression. This approach would enable a more detailed assessment of the initial changes in imaging biomarkers and improve our understanding of the natural history of COPD.

In conclusion, among patients with low PRMEmph, the high PRMfSAD group showed a significantly greater increase in the proportion of emphysema on follow-up CT than the low PRMfSAD group. Among patients with low PRMfSAD, changes in PRMfSAD on repeat CT did not differ significantly between patients with low and high PRMEmph, whereas PRMfSAD tended to increase during follow-up regardless of PRMEmph level. These findings suggest that longitudinal PRM analysis on CT may provide critical evidence that small airway disease precedes emphysema in patients with COPD.