Introduction

Gait is the primary means of mobility for most of the population and is a strong indicator of overall health1,2,3,4. The free-living 95th percentile stride velocity has been accepted as a primary endpoint by European regulators for patients with Duchenne Muscular Dystrophy (DMD) and has been correlated with existing clinical assessments such as the 6 min walk test (6MWT) and the 4-stair climb test (4SC), with the added relevance of monitoring ambulatory function in the real world5. An increase in gait speed over a one-year period is strongly correlated with significant reductions in both relative and absolute risk of death1. Studies have also indicated that a decline in gait speed is associated with reduced cognition, diminished vitality, and an increased risk of falls and other injurious events6. Due to its predictive power in a wide variety of health conditions, gait speed has come to be known as “the sixth vital sign” alongside respiratory rate, heart rate, blood pressure, temperature, and oxygen saturation7,8,9. Traditional measurements of physical performance, such as the 6MWT and the Short Physical Performance Battery (SPPB), in which walking duration is measured with a stopwatch, are typically performed in clinical or laboratory settings using functional assessment tests and clinical grading scales. Such tests and scales may be unreliable due to their episodic administration, the subjectivity of the rater, and the Hawthorne effect10,11. Additionally, in a clinical trial setting, traditional assessments might require multiple clinic visits, increasing patient burden and the risk of attrition12,13. Thus, there is a need to derive reliable and stable metrics of physical performance, such as the 95th percentile and mean of gait speed, in naturalistic environments as an alternative to traditional assessments14,15.
The emergence of digital health technologies (DHTs) has opened opportunities to quantify gait and physical activity and has enabled the continuous monitoring of patients in free-living settings15. Furthermore, DHTs have demonstrated the potential to reduce the need for multiple clinic visits and to lower patient burden by saving time16,17,18. One example of a DHT is the inertial sensor, which has been validated for reliability and can be worn comfortably, often with minimal visibility because it resembles familiar wearable items such as fitness trackers or watches15,19.

Validation of DHTs to support novel digital endpoints can succeed through stakeholder collaboration in pre-competitive initiatives that combine industry and academic sponsors, regulators, manufacturers, patients, and healthcare providers (HCPs)20. Regulatory bodies and stakeholders have been actively investigating and advising on the procedures for incorporating digital endpoints derived from DHTs into clinical use and on the criteria necessary for their acceptance as endpoints or biomarkers20,21,22,23,24,25. The necessary steps towards acceptance and adoption of novel digital endpoints include technical and analytical validation for accuracy and reliability, as well as patient experience, clinical validation, and clinical meaningfulness26,27,28,29,30. While the analytical validation of gait and activity endpoints is generally performed in a controlled in-lab environment, where traditional and reference methods can be measured simultaneously with the test methods for comparison, the validation of these endpoints during longer, naturalistic, free-living conditions has been more challenging7,31. Nonetheless, multiple DHT validation studies have provided evidence that gait and activity related metrics can be measured reliably in both pediatric and adult populations32,33,34,35,36,37,38,39,40,41,42. Scientific consortia such as Mobilise-D43,44, through the Innovative Medicines Initiative (IMI), have focused not only on a unified approach for the validation of mobility measures across patient populations45,46, but also on understanding the acceptability of DHTs for remote monitoring47 and on research into patient-centricity to shape development48.

We previously showed that a single lumbar-worn accelerometer can be used to reliably measure gait metrics during a walking task, and that gait measured during free-living conditions captures useful information that cannot be obtained in controlled settings (e.g., clinic or laboratory)15,49. In this study, entitled In-Clinic and Natural Gait Observations (I-CAN-GO), we provide further evidence on the reliability and performance of gait metrics derived by our in-house developed gait algorithm (SciKit Digital Health (SKDH) gait module)49 as a device-agnostic gait characterization tool during various naturalistic or simulated daily activity tasks using a different lumbar-worn accelerometer (ActiGraph CentrePoint watch). To quantify the sensitivity of gait metrics to naturalistic behavior, we (1) first validate that a single lumbar-worn accelerometer can distinguish between participants' self-perceived gait speeds and the effects of different floor surfaces on gait speed, (2) show that gait and activity during simulated free-living activities and longer bouts of walking can be reliably captured, suggesting accurate representation of gait metrics in free-living conditions, and (3) evaluate the comfort and wearability of the tested devices in different wear locations, an important consideration for using wearable sensors in patient populations for extended durations19,50.

Results

Different self-paced walking speeds can be accurately captured by a single lumbar-worn accelerometer

During in-lab assessments, participants were asked to walk three laps on an instrumented mat with three self-paced walking speed tasks (i.e., at their natural, fast, and slow walking speeds). During these walking tasks, participants wore an Ambulatory Parkinson’s Disease Monitoring (APDM) six-sensor set51,52, as well as a single accelerometer at their lumbar location. Gait metrics were derived from (1) GAITRite using ProtoKinetics’ Movement Analysis Software (PKMAS)53 (as reference), (2) the APDM six-sensor set method51 (as reference), and (3) the single lumbar-worn accelerometer using the SKDH gait algorithm49 (as comparator). Figure 1 depicts the comparison of gait speed between the reference and comparator devices/methods using Bland–Altman plots and correlations across the walking speeds. Although SKDH gait underestimated gait speed compared to APDM and GAITRite, as also reported in previous findings15, the Bland–Altman plots showed homogeneous mean bias (GAITRite – SKDH gait (m/s) = 0.15 (26%), 0.19 (19%), 0.29 (22%); APDM – SKDH gait (m/s) = 0.05 (8%), 0.10 (10%), 0.18 (14%) for slow, natural, and fast walking speeds, respectively). The variability of SKDH gait metrics was higher, and intraclass correlation coefficients (ICCs) ranged from poor to good across the gait metrics relative to APDM (range across speeds [0.59–0.84]) and GAITRite (range across speeds [0.36–0.55]), mainly due to the mean differences (see Supplementary Tables 1 and 2 for summaries).
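The mean bias and limits of agreement reported from a Bland–Altman analysis can be sketched as follows. This is a minimal illustration of the standard calculation (mean of the paired differences, plus or minus 1.96 sample standard deviations), not the study's analysis code; the function name `bland_altman` and the gait speed values are hypothetical.

```python
from statistics import mean, stdev


def bland_altman(reference, comparator):
    """Return (mean bias, lower limit, upper limit) for paired measurements."""
    if len(reference) != len(comparator):
        raise ValueError("paired inputs must have the same length")
    # Difference is reference minus comparator, matching the sign
    # convention "GAITRite - SKDH gait" used in the text.
    diffs = [r - c for r, c in zip(reference, comparator)]
    bias = mean(diffs)
    sd = stdev(diffs)  # sample standard deviation of the differences
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


# Hypothetical gait speeds in m/s (illustrative only, not study data).
reference_speed = [1.10, 1.25, 1.32, 1.18, 1.40]
comparator_speed = [1.00, 1.12, 1.20, 1.05, 1.28]

bias, lower, upper = bland_altman(reference_speed, comparator_speed)
print(f"bias = {bias:.3f} m/s, limits of agreement = [{lower:.3f}, {upper:.3f}]")
```

A positive bias here means the comparator underestimates the reference, which is the direction described above for SKDH gait.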

Figure 1

Gait speed validation for natural, fast, and slow in-lab walking tasks, comparing GAITRite (reference), the APDM six-sensor set (reference), and SKDH gait (comparator) applied to a single lumbar-worn accelerometer. a Bland–Altman plots: red lines represent the upper and lower limits of agreement and the blue line represents the mean difference, each with dotted lines for the 95% CI; the green line marks zero. b Correlations between reference and comparator methods, with the 95% CI shown as the shaded area. The dotted line represents the line of equality.

A mixed effects model was used to test whether gait metrics differed between the walking tasks and whether gait metrics can be reliably captured with a single accelerometer device across different walking speeds. The following model was tested for each gait metric: method (GAITRite/APDM/SKDH gait), task (slow/natural/fast), and their interaction were entered as independent variables (fixed effects), age was entered as a covariate, and participant as a random effect. Figure 2 shows gait speed, cadence, stride duration, and stride length across methods and tasks. A significant task effect was present for all gait metrics (gait speed F(152,2) = 604.85, p < 0.0001; cadence F(152,2) = 504.27, p < 0.0001; stride duration F(152,2) = 362.37, p < 0.0001; stride length F(152,2) = 418.03, p < 0.0001; all persisted after false discovery rate (FDR) correction for multiple comparisons); therefore, gait metrics derived by the different methods (GAITRite, APDM, SKDH gait) were able to differentiate between the walking speed tasks. Gait speed and stride length showed a significant method effect (gait speed F(152,2) = 44.73, p < 0.0001; stride length F(152,2) = 107.55, p < 0.0001; persisted after FDR correction); e.g., overall gait speed was estimated to be significantly higher by GAITRite than by APDM and SKDH gait (see Supplementary Table 3 for all comparisons), mainly driven by stride length (Fig. 2). Overall, temporal gait metrics such as cadence and stride duration showed excellent agreement and minimal bias across methods, and all methods were able to differentiate between walking speeds.
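The FDR correction step can be illustrated with a short sketch. The text does not state which FDR procedure was used, so the Benjamini–Hochberg step-up adjustment below is an assumption, and both the function name `benjamini_hochberg` and the p-values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, scaling each by m / rank and
    # taking a running minimum so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical raw p-values for four gait metrics (not study results).
raw = [0.01, 0.04, 0.03, 0.005]
for p, q in zip(raw, benjamini_hochberg(raw)):
    print(f"raw p = {p:.3f}, adjusted p = {q:.3f}")
```

An effect "persists" after correction when its adjusted p-value remains below the chosen significance level.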

Figure 2

Different walking speeds were captured by gait metrics derived from all methods. All gait metrics showed a significant task effect; gait speed and stride length differed significantly across methods, and these differences persisted after FDR correction. Gait speed and stride length were consistently underestimated compared to GAITRite. There were no interaction effects between tasks and methods, which suggests that the error did not vary significantly across walking tasks and devices.

The floor surface affects gait metrics

To assess whether gait differed across floor surfaces, we compared the gait parameters extracted from walks on carpet and tile floors. GAITRite was placed under the carpet to serve as a reference; however, GAITRite data could not be captured for the tile floor walk. Figure 3a shows the Bland–Altman plots and correlations for gait speed, reflecting homogeneous mean bias and good correlations across methods. As with the different walking speeds, we observed homogeneous mean bias across methods (GAITRite – SKDH gait (m/s) = 0.17 (17%) for the carpet walk; APDM – SKDH gait (m/s) = 0.08 (7%) and 0.09 (8%) for the carpet and tile floor walks, respectively). The mixed effects model showed a significant task effect for all gait metrics except stride length, and a significant method effect for all gait metrics (Fig. 3b). Gait speed was underestimated by SKDH gait compared to GAITRite and APDM for both floor surfaces. There was no interaction effect between tasks and methods. The changes in gait due to floor surface differences were captured by all methodologies; i.e., higher gait speed, higher cadence, and faster strides were observed during the tile walk compared to the carpet walk (see Supplementary Table 3).

Figure 3

Gait metrics validation for carpet and tile floor walking. Due to the nature of the task, GAITRite data could not be captured for tile floor walking. Characteristics across methodologies were similar to those observed in the different walking speed tasks. a Homogeneous bias between methods with moderate to excellent agreement and good correlations. b Mixed effects models showed that SKDH gait differed significantly from the other methods for all gait metrics, and showed a significant task effect for all metrics except stride length. SKDH gait underestimated gait speed compared to the other methods, and all methods captured the change in gait metrics between floor surfaces (i.e., gait speed was significantly higher for the tile walk than for the carpet walk).

Gait metrics during simulated activity blocks and longer walking can be derived accurately from a single lumbar-worn accelerometer

To simulate naturalistic walking behavior, participants were asked to perform two sets of simulated activities in the lab, each consisting of combinations of walking, sitting, and standing tasks and lasting around 7 min, as well as 20 min of free walking outside the lab (around the block) accompanied by a facilitator. The APDM six-sensor set and the lumbar-worn accelerometer were worn simultaneously by the participants to validate the gait metrics. Figure 4a shows the Bland–Altman plots and correlations between gait metrics derived from the APDM six-sensor set and SKDH gait from the lumbar sensor. Gait speed showed good correlation and agreement between the two methods, with mean bias of 0 and 0.04 m/s (3.2%) for the activity blocks and outside lab walk tasks, respectively. ICC values were 0.816 for the activity blocks and 0.898 for the outside lab walk task. Other gait metrics also showed excellent agreement (see Supplementary Tables 1 and 2).

Figure 4

Gait speed validation for simulated activity blocks and 20 min outside lab walk tasks. Participants performed two simulated activity blocks, each lasting around 7 min, as well as a 20 min outside lab walk, during which they were allowed to move and walk freely around the neighborhood. The gait metrics were derived by the APDM six-sensor set (reference) and SKDH gait applied to the lumbar-worn sensor (comparator). a Bland–Altman and correlation plots showed good agreement and correlation for gait speed between the APDM six-sensor set and the single lumbar-worn accelerometer. b Mean gait speed across devices and tasks. The mixed effects model showed a significant task effect but no method effect, with only slightly lower gait speed during activity blocks compared to the outside lab walk.

A mixed effects model was used to test whether gait metrics differed between the simulated activity blocks and outside lab walks and whether gait metrics can be reliably captured with a single accelerometer device across different walking speeds. The following model was tested for each gait metric: method (GAITRite/APDM/SKDH gait), task (activity blocks/outside lab walk), and their interaction were entered as independent variables (fixed effects), age was entered as a covariate, and participant as a random effect. Figure 4b depicts the gait speed and cadence across methods and tasks. Only the task effect was significant, and it was present for all gait metrics (gait speed F(57,1) = 6.97, p = 0.01; cadence F(57,1) = 4.2, p = 0.05; stride duration F(57,1) = 4.05, p = 0.05; stride length F(57,1) = 13.11, p = 0.0006). This indicated that the simulated activity blocks showed only slightly higher cadence and lower gait speed compared to the outside walk, due to the shorter, smaller steps taken during the simulated activity blocks.

In addition to gait metrics, we investigated the total number of steps detected by APDM, by SKDH gait applied to the lumbar-worn sensor, and by the wrist-worn sensor54 during the simulated activity blocks and outside lab walk tasks. Figure 5 depicts the Bland–Altman plots and correlations of the total number of detected steps during the 20 min outside lab walk task across the three devices. On average, mean (SD) = 2150 (299), 2076 (277), and 1894 (334) steps were detected during the 20 min walk with the wrist-worn sensor, the APDM six-sensor set, and SKDH gait applied to the lumbar-worn ActiGraph sensor, respectively. ICCs and correlations were excellent across methods for the 20 min walk task, whereas during the in-lab simulated activity blocks the ICCs were poor despite high correlations (Supplementary Table 2 and Supplementary Fig. 1).
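The combination of high correlation and poor ICC can arise when one method is offset from the other by a roughly constant amount. The sketch below illustrates this with an absolute-agreement ICC. The text does not state which ICC form was used, so ICC(2,1) (two-way random effects, absolute agreement, single measurement) is an assumption, and the function name `icc_2_1` and the step counts are hypothetical.

```python
def icc_2_1(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single measurement.

    ratings: list of rows, one per participant; each row holds one value
    per method (for example, [reference_steps, comparator_steps]).
    """
    n = len(ratings)       # participants
    k = len(ratings[0])    # methods
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]
    # Mean squares from the two-way ANOVA decomposition.
    ms_rows = k * sum((m - grand) ** 2 for m in row_means) / (n - 1)
    ms_cols = n * sum((m - grand) ** 2 for m in col_means) / (k - 1)
    ss_err = sum(
        (ratings[i][j] - row_means[i] - col_means[j] + grand) ** 2
        for i in range(n)
        for j in range(k)
    )
    ms_err = ss_err / ((n - 1) * (k - 1))
    return (ms_rows - ms_err) / (
        ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n
    )


# Hypothetical step counts: the second method reads a constant 60 steps
# higher, so the two columns are perfectly correlated but do not agree.
reference_steps = [300, 350, 400, 450]
offset_pairs = [[s, s + 60] for s in reference_steps]
identical_pairs = [[s, s] for s in reference_steps]

print(f"ICC with constant offset: {icc_2_1(offset_pairs):.3f}")
print(f"ICC with identical values: {icc_2_1(identical_pairs):.3f}")
```

Because the offset is large relative to the spread between participants, the absolute-agreement ICC drops well below 1 even though the correlation between the two columns is perfect.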

Figure 5

Comparison of the total number of steps detected across multiple methods during the 20 min outside walk task. The number of detected steps was compared between the APDM six-sensor set, the wrist-worn sensor, and SKDH gait applied to the lumbar-worn sensor. The wrist-worn sensor detected the highest number of steps; APDM and SKDH gait applied to the lumbar-worn sensor detected slightly fewer steps, which is expected because SKDH gait and APDM prioritize gait characterization over step counts. All methods were in good agreement and were highly correlated with each other.

Simulated in-lab activity and 20 min walking compared to at home 7 days of data

We compared the gait speed derived from the single lumbar-worn accelerometer with SKDH gait during in-lab tasks resembling naturalistic walking (i.e., simulated activities and the 20 min outside walk) with that derived from continuous at-home monitoring for around one week. When compared using a mixed effects model, there was a significant effect of task (F(37,2) = 12.3, p = 0.0001; F(37,2) = 12.8, p = 0.0001 for gait speed and 95th percentile of gait speed, respectively). The average at-home gait speed was significantly lower (paired t-test, t(18) = −4.72, p < 10⁻³; t(18) = −3.97, p < 10⁻³) and the 95th percentile of at-home gait speed was significantly higher (t(18) = 5.65, p < 10⁻⁴; t(18) = 3.61, p = 0.002) than the gait speed in both in-lab activities (Fig. 6a). Additionally, regression analyses were used to test whether the in-lab gait speeds predicted the at-home gait speed. The outside walk task gait speed showed a trend towards predicting the at-home gait speed (adjusted R² = 0.14, F(1,17) = 4.03, β = 0.33, p = 0.06), and gait speed during activity blocks predicted the at-home gait speed (adjusted R² = 0.33, F(1,17) = 9.7, β = 0.48, p = 0.006); however, the explained variance was only 14% and 33%, respectively (Fig. 6b). Similar behavior was observed for the relationships between gait speed during in-lab activities and the 95th percentile of gait speed at home (see Supplementary Fig. 2). When we separated the at-home gait speed by bout duration (short: 10–30 s, medium: 30–60 s, long: > 60 s), we observed that gait speed increased with bout duration, and gait speed derived from the medium and long at-home bouts was closer to the gait speed of the simulated activities and outside lab walk (Fig. 6c).
The paired t-tests showed that the at-home gait speed during short bouts was significantly slower than the gait speed during the in-lab activity blocks and 20 min walk tasks (t(18) = −5.19, p < 10⁻⁴; t(18) = −5.46, p < 10⁻⁴, respectively); gait speed during long bouts was significantly higher than gait speed during in-lab activity blocks (t(18) = 3.43, p = 0.003), suggesting that medium bouts were more reflective of the in-lab simulated activities, and that both medium and long bouts might be more relevant to the outside lab walk. The regression analysis showed that the explained variance was relatively higher (adjusted R² = 0.45, F(1,17) = 15.52, β = 0.79, p = 0.001, Fig. 6d) between activity blocks and at-home gait speed during medium duration bouts, and adding short and long bouts to the model did not change it significantly. The outside lab walk shared little variance with medium and long bouts (see Supplementary Fig. 2).
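The reported F statistics and adjusted R² values are linked by standard identities for a regression with one predictor, which the following sketch checks. The helper names are hypothetical. The sample size of 19 is inferred from the residual degrees of freedom (17 + 2), and taking the positive square root for Pearson's r assumes a positive slope, as the reported β values indicate.

```python
import math


def r2_from_f(f_stat, df_model, df_resid):
    """Recover R-squared from an overall regression F statistic."""
    return f_stat * df_model / (f_stat * df_model + df_resid)


def adjusted_r2(r2, n, n_predictors):
    """Adjust R-squared for sample size and number of predictors."""
    return 1 - (1 - r2) * (n - 1) / (n - n_predictors - 1)


# F statistics as reported in the text, each with (1, 17) degrees of freedom.
reported = [
    ("outside lab walk vs at-home", 4.03),
    ("activity blocks vs at-home", 9.7),
    ("activity blocks vs at-home medium bouts", 15.52),
]

for label, f_stat in reported:
    r2 = r2_from_f(f_stat, df_model=1, df_resid=17)
    adj = adjusted_r2(r2, n=19, n_predictors=1)
    r = math.sqrt(r2)  # positive root, assuming a positive slope
    print(f"{label}: R2 = {r2:.2f}, adjusted R2 = {adj:.2f}, r = {r:.2f}")
```

Under these assumptions the recomputed adjusted R² values round to 0.14, 0.33, and 0.45, matching the values reported in the text.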

Figure 6

Comparison of naturalistic gait speed at home and in-lab tasks. (a) Both the average and 95th percentile gait speed at home were significantly different from in-lab gait speed. (b) The linear regression model showed that the outside lab walk explained only 14% of the variance in at-home gait speed (b = 0.33, Pearson’s r = 0.44), and activity blocks explained a slightly higher 33% (b = 0.47, Pearson’s r = 0.6). (c) When broken down into increasing bout lengths, the at-home gait speed spans the in-lab gait speeds, and (d) at-home gait speed during medium bout durations (30–60 s bouts) explained the most variance in gait speed during activity blocks (45%; b = 0.79, Pearson’s r = 0.69). The dashed lines represent the line of equality.

Comfort and wearability of sensors

The comfort of wearing sensors was rated by participants both during in-lab visits and after 1 week of home monitoring. Overall, sensors were found acceptable and comfortable to wear; 88.2% of participants rated the lumbar sensor, and all participants rated the wrist sensor, as comfortable to wear at home (Fig. 7a; rating of 3 and above, where 0 indicated tolerable enough and 5 indicated very comfortable). Similarly, participants rated how long they would be willing to wear the sensors continuously; 72.2% and 85% of the participants indicated they would wear the lumbar and wrist sensors, respectively, for more than 1 week (Fig. 7b). The comfort of sensors during in-lab visits was also rated positively: for both visits, participants reported a positive experience with the sensors’ comfort (Fig. 7c).

Figure 7

Overall comfort assessment of wearing the sensors at home and in-lab, and total home wear time. Participants were asked to assess the comfort of wearing the sensors through multiple questions; (a) shows the results when participants were asked after 1 week of home monitoring “Please rate the overall comfort of the lumbar- and wrist- sensor at home where 0 means that they were only tolerable enough to participate in the study and 5 means that you were not at all bothered.”; (b) shows participants’ willingness to wear those devices for extended periods of time, where more than 70% and 85% responded they would wear the lumbar and wrist devices, respectively, for more than 1 week; (c) plots the overall comfort level of sensors during the 2 in-lab visits (question worded similarly to what is shown in Fig. 7a); and (d) shows the average wear time per day recorded by the sensors compared to the participants’ responses to the wear time question “What percentage of the time did you wear the wrist/lumbar sensors?” Both methods gave similar results; note that the participants were given the option to take the lumbar sensor off during the night.

For this study, during the at-home monitoring period, daily compliance was defined as 18 and 10 h of wear time for the wrist- and lumbar-worn sensors, respectively. The number of compliant days for the wrist-worn sensor was mean (SD) 5.79 (0.63), ranging from 4 to 7 days, and the number of compliant days for the lumbar-worn sensor was mean (SD) 5.7 (0.87), ranging from 3 to 7 days. Participants were also asked to report their average sensor wear time during at-home monitoring. The participant-reported wear time was concordant with the average percent wear time derived from the sensor data (Fig. 7d; on average, participants reported 97.4% and 72% wear time, compared to the 96.5% and 64.3% wear time derived from the wrist- and lumbar-worn sensor data, respectively). The lumbar sensor wear time is shorter than the wrist sensor wear time because the participants were given the option to take the lumbar-worn sensor off during the night.
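The compliance rule can be expressed as a simple per-day threshold check. This sketch assumes that a day counts as compliant when wear time is at least the stated threshold (the text gives the thresholds but not whether the comparison is inclusive); the names and the daily wear hours are hypothetical.

```python
# Daily wear-time thresholds in hours, as stated in the text.
WEAR_THRESHOLD_HOURS = {"wrist": 18, "lumbar": 10}


def compliant_days(daily_wear_hours, location):
    """Count days whose wear time meets the threshold for a wear location.

    Assumption: "compliant" means wear time >= threshold (inclusive).
    """
    threshold = WEAR_THRESHOLD_HOURS[location]
    return sum(1 for hours in daily_wear_hours if hours >= threshold)


# Hypothetical wear hours for one participant over 7 days (not study data).
lumbar_hours = [12.5, 9.0, 14.0, 10.0, 11.2, 8.5, 13.0]
wrist_hours = [23.5, 22.0, 17.5, 24.0, 19.0, 18.0, 21.0]

print("lumbar compliant days:", compliant_days(lumbar_hours, "lumbar"))
print("wrist compliant days:", compliant_days(wrist_hours, "wrist"))
```

Averaging this per-participant count across the cohort would give the kind of mean (SD) compliant-day summary reported above.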

Discussion

We designed a master protocol for technology evaluation against reference standards to allow for rapid evaluation of digital health technologies that assess physical activity and gait in adults aged 18 and older. In the first cohort of this protocol, we investigated the use of the ActiGraph CentrePoint Insight Watch in both the wrist-worn and lumbar positions during two in-lab assessments and a free-living period of approximately 7 days. We aimed to (1) further validate and evaluate the performance of a method for measuring gait that relies on a single lumbar-worn accelerometer running SKDH gait with respect to reference methods relying on six devices (APDM) and an instrumented walkway (GAITRite); (2) test the ability of our in-house built SKDH gait algorithm to distinguish between participants’ self-perceived gait speeds (fast walk, slow walk, natural walk); (3) test SKDH gait performance against APDM during a 20 min free walk as well as simulated activity blocks with 2 sitting, 2 standing, and 2 walking tests performed in random sequence, to better estimate performance during future unsupervised ambulatory monitoring; (4) examine the effect of different floor surfaces such as carpet and tile on walking performance; and (5) compare algorithm performance with previous studies that used different accelerometer devices to demonstrate that SKDH gait is agnostic to the specific accelerometer device. Our results show that SKDH gait performance is reliable and accurate, with homogeneous bias with respect to the reference methods, and provide further evidence for its use in free-living conditions with different sensors.

Gait has often been called the sixth vital sign9. Mobility and how well we walk have been shown to be correlated with long-term health outcomes and recovery from disease. Accurately assessing mobility is thus of paramount importance. In previous studies, we have assessed the validity of sensor-based methods for estimating gait speed in a controlled setting using the GENEActiv Activ Insights watch15,55. Technology changes frequently from both hardware and software perspectives, and being able to develop algorithms that are device-type agnostic is essential to the continued use of digital health technologies to improve measurement capabilities in clinical trials and clinical use56. The results of this study on both temporal and spatial metrics of gait, derived using SKDH gait with the ActiGraph CentrePoint Watch, show agreement with reference standards, as measured via ICC, similar to that in previous studies15. As in previous studies, temporal metrics are more accurate than spatial metrics, which rely on the physical inverted pendulum model for a single lumbar accelerometer. However, there remains good to excellent agreement between all measures of gait, concordant with previously published results obtained with a sensor from a different manufacturer worn in the same position15.

An important feature of devices that monitor gait and physical activity is the ability to detect walking at all paces, from slow to fast, as would happen during bouts of naturalistic walking. In this study, participants’ self-perception of three different walking speeds (fast, slow, natural speed) was clearly captured with accelerometers (Fig. 2). As the self-perceived walking speed increased, all methods consistently trended towards smaller (i.e., decrease in stride length) but longer (i.e., increase in stride duration) steps. Gait speed, as a byproduct of stride length and duration, might instead be the most promising gait metric, with a larger effect size, to detect differences between populations or a therapeutic effect. There was a significant device effect on most measures because overall gait speed was estimated to be significantly higher by GAITRite than by APDM and SKDH gait. This is likely due to the relatively higher accuracy of temporal vs spatial measures of gait using the SKDH gait model, also seen in previous studies with an earlier version of the same algorithm15. Stride duration does not differ between the ActiGraph CentrePoint device running SKDH gait and the reference measures, but there is a significant difference in stride length across the three devices. Overall, however, despite the device effect, the SKDH gait algorithm run on accelerometry data from the ActiGraph CentrePoint Insight Watch at the lumbar position was able to show clear differences between the three walking speeds of this cohort of participants in the lab. Furthermore, the error measured across tasks appears to be consistent for each device, which gives further confidence in measuring different walking paces at home, or a therapeutic effect, when the same device is used consistently across a study.
Additionally, the correlation appears to be strongest between the reference standards and SKDH gait during the slow walk task; given that most pathological diseases will likely adversely affect mobility from the norm rather than improve it, this may be promising for future clinical applications.

Accurately assessing mobility at home is challenging, as this may require specially instrumented houses or rooms, which is labor intensive; wearing multiple sensor sets continuously, which is burdensome; or installing other specialized equipment in people’s homes, which presents operational and privacy challenges45. This study incorporated longer duration walk activities and simulated activity blocks to replicate naturalistic behaviors while still allowing reference measurements. While they cannot completely replace validation in the home, these simulated blocks are longer and more consistent. During the 20 min walks and the simulated activity blocks, we observed higher correlations and lower bias between gait speed as measured by SKDH gait with a single accelerometer and the reference standards than during the walk tasks on the GAITRite. For step counts, however, simulated activity blocks had lower ICCs despite high correlations, which could be related to bias between devices. For the longer 20 min walk, the step counts had high ICC and high correlation across the devices, likely due to the longer duration of the task and the larger number of consecutive steps. Previously, we have published data, replicating other studies, showing that simulated walking on a GAITMat could not distinguish between younger and older participants in the lab, likely related to a performance effect15,57,58 of being observed during walking activities. The simulated activities of daily living (ADLs) and longer 20 min walking tasks newly incorporated in this master protocol have a greater diversity of walking bout durations and walking speeds compared to traditional shorter and episodic walking tasks (e.g. 4 m or 30 m walking tasks). The SKDH gait algorithm continues to provide good performance under these more diverse simulated walking conditions, which are closer to naturalistic walking bouts outside of the clinic setting.
In fact, the performance of SKDH gait algorithm by ICC comparisons is higher for the simulated and 20 min walk tasks compared to the instrumented gait mat walking tasks which are of shorter length (Fig. 6b). Thus, the algorithm performance increases with longer duration and bouts of walking, which are likely closer to the diversity of walking tasks in the real world. Furthermore, the simulated activity blocks and the 20 min walk task capture the full range of different short, medium, and long walking bouts seen in the at home gait data (Fig. 6c).

As part of the study procedures, we also sought to determine whether different types of floor surfaces affect gait parameters. Based on results from both the APDM system and the SKDH gait system, we detected a statistically significant floor surface effect on gait metrics including cadence (steps/min), gait speed (m/s), and stride duration (s). In general, participants walked faster on tile than on carpet, driven by stride duration, as stride length was comparable between carpet and tile floors. The ICC and correlation values between devices suggest good concordance between the test device and the reference standard. From a measurement perspective, this is unlikely to impact measurements of efficacy and treatment effect as long as the participant is not being assessed frequently on a multitude of different floor surfaces. Additionally, with the same algorithm and longer bouts of continuous monitoring over time, effects from floor surfaces such as carpet, hardwood, or tile will likely even out. However, as decentralized elements of trials continue to grow and at-home assessments such as the 6 min walk test increase, this has important implications for standardizing the room in which a participant repeatedly completes the assessment, so as not to add an additional variable to the 6 min walk distance.

This study was conducted during the height of the COVID-19 pandemic and, as a result, the demographics of this cohort skew towards younger participants (mean age 34 years, SD 9.06), which may affect the reports of participant acceptance of devices. However, because the measurements in this study concern the devices and are not made at a population level, this does not affect the validation data, but it may make the study less generalizable in the future when trying to age-match this cohort to other studies with patients. It is important to note that this study serves as a proof-of-principle study of the analytical validation of SKDH, using in-clinic tasks to reflect at-home gait ranges and variations. With the small sample size, we are not generalizing the gait metrics to a population level in this study but rather examining the validity of a tool and methods to be used in future clinical validation studies with different patient populations. Also, as this is a master protocol, population-level data could potentially be reported in the future as the sample size for the overall study increases over time.

Future directions include applying SKDH gait to other form factors that are relevant to patient populations, including chest accelerometry59, which may be of particular interest to patients with cardiovascular diseases, as many implanted devices as well as externally worn ECG patches also contain an accelerometer placed in the sternal position. Additionally, some technology companies have designed pendants worn around the neck that sit upon the chest, to which this algorithm may be applied. This is important because the lumbar position for gait assessment comes with some downsides compared to wrist-worn devices, as shown when participants were asked to consider the best type of form factor that does not intrude on day-to-day activities and comfort (Fig. 7). In total, 94% of participants were willing to wear the wrist-worn device for 1–4 weeks or longer than 4 weeks, compared to 72% of participants willing to wear the lumbar device for the same durations. No digital health technology study can accomplish its objectives if participants are unable or unwilling to wear the technology. Wearability is therefore an important consideration for the future; it may require different form factors, coupled with statistical analysis of the data to determine the amount of wear time required to assess mobility in a particular study and target patient population, in order to reduce participant burden.

While SKDH gait has been applied to adults as old as 85 years of age and to other older populations with heart failure60,61, osteoarthritis56, and Parkinson’s disease55,62,63, one population that has been less well studied with digital health technologies is children under the age of 18. A future direction is to apply the knowledge learnt in the validation of gait in adults to pediatric populations, as well as to test the validity of the gait algorithm against reference standards in populations with physical activity limitations in order to evaluate the potential impact on algorithm reliability and gait speed variability. This dataset, combined with other datasets collected from healthy control participants, may also be used to assess construct validity by comparing these other populations against age-matched healthy controls and to evaluate the sensitivity of certain gait endpoints, such as gait speed, in detecting differences induced by disease.

In current clinical trials, measures such as the 6MWT, 4SC, and the SPPB have been used as performance outcome assessments in the clinic to measure outcomes in interventional therapeutic trials. However, these assessments only provide a snapshot view of a participant’s functional ability and may depend on a multitude of factors including length of travel to clinical trial site, motivation, and performance effect15. Digital health technology offers an opportunity to measure patients in their own life and provide continuous assessment of their day-to-day real life fluctuations. The patient voice has greatly contributed to modern drug development; in multiple therapeutic areas ranging from heart failure to chronic obstructive pulmonary disease (COPD) to DMD5,64,65 there is a high unmet need for measures that accurately represent how a patient is doing at home. The demand for validated clinical outcome assessments that can demonstrate a clinically meaningful therapeutic response in clinical trials is only growing. This study illustrates an example of how to analytically validate these types of quantitative endpoints and select digital health technologies to give confidence for further clinical validation in patient populations.

Conclusion

The rise in sophisticated digital health technologies has enabled reliable and continuous measurements of gait and activity in free-living environments opening new doors in the clinical trial and healthcare spaces. This study serves as a proof-of-principle study around the analytical validation of the SKDH algorithm using in-lab tasks to simulate the variations of activity and walking patterns that would be found at home. Our results show that SKDH gait performance is reliable and accurate with respect to the reference methods and provide further evidence of its use with different sensors. Participants reported high comfort and compliance with the DHT in this study. In the future, these gait and activity measurement findings can be leveraged when performing clinical validation and assessments of digital health technologies in patient populations.

Methods

Subjects

Twenty healthy individuals aged 25–61 years (mean (SD) age = 33.95 (9.06) years, 13F; Table 1), with a race distribution of 13 white-identifying/7 Asian-identifying, mean (SD) height of 168.5 (7.08) cm, and weight of 70.11 (17.57) kg, signed informed consent prior to participation in this study10. Participants did not have any significant health problems as determined by a medically qualified study doctor during initial intake. The study was reviewed and approved by the Advarra IRB (protocol number: Pro00043100) and was conducted in accordance with relevant guidelines and regulations including the Declaration of Helsinki. Informed consent was obtained from all participants.

Table 1 Participant Demographics.

Experimental procedure

Each participant took part in two instrumented in-lab assessments about a week apart, with an approximately 7-day at-home portion in between. The in-lab portion was completed at the Pfizer Innovation Research Laboratory (PfIRe Lab) in Cambridge, Massachusetts. During both visits, participants wore a set of 6 inertial sensors (Opal, APDM Inc., Portland, Oregon), each consisting of a tri-axial accelerometer, gyroscope, and magnetometer sampling at 128 Hz, worn at the sternum, the lumbar (approximately L4 position), and bilaterally on the wrists and the feet. Two tri-axial accelerometers (ActiGraph CentrePoint Insight Watch, ActiGraph, Pensacola, Florida) were also worn during the in-lab visits, one on the non-dominant wrist and one at the lumbar (L4 position) using a belt. During scripted walking tasks, participants walked on an instrumented walkway (GAITRite®, CIR Systems Inc., Franklin, New Jersey) to capture foot pressure data.

Participants completed a battery of tasks. First, they were asked to walk at a self-selected “normal” walking speed, then at a self-selected “slow” walking speed, and then at a self-selected “fast” walking speed, each for 3 laps over the 7 m mat. They were then asked to walk at a “normal” walking speed, first on a thick carpet walkway and then on tile flooring, to examine the floor surface effect. The same 7 m mat was also used under the carpet but, given the nature of the floor surface, GAITRite® could not be deployed on the tile walk task.

Six stations were set up where participants would complete 2 walking tasks (tossing and retrieving a beanbag, transferring blocks between stations), 2 sitting tasks (reading a short story for a minute, reading a short passage aloud), and 2 standing tasks (rolling a die 3 times and summing the results, transferring a paper crown from hook to hook), with the station selected at random by the roll of a 6-sided die. The participant would continue this process until 7 min had passed and would repeat it a second time after a short break.

If they chose to and were deemed healthy enough to do so, participants also completed a 2 min elliptical task, which served as a non-walking active task for algorithm training. Lastly, accompanied by study staff, participants completed a 20 min walk outside of the lab space wearing the 6 APDM Opals and the 2 ActiGraph CentrePoint Insight watches at the non-dominant wrist and lumbar locations. They then completed a self-developed comfort and wearability assessment (Supplementary Material: Comfort and Wearability Questions) and were instructed on how to wear the two ActiGraph CentrePoint Insight watches at home while going about their normal daily routines to capture free-living data. Participants were permitted to remove the lumbar accelerometer during sleep but were asked to wear the non-dominant wrist accelerometer continuously at home. They then returned to the site approximately 7 days later and repeated the battery of in-lab testing and the comfort and wearability assessment described above. In addition, participants answered comfort and wearability questions related to the devices they wore during the at-home monitoring period and reported their estimated wear time of these devices.

Algorithms and endpoint derivation

Three separate methods were used for estimating gait metrics during the performance of gait tasks in the laboratory. Reference gait metrics were estimated from data collected on an instrumented walkway with a vendor-supplied proprietary algorithm (GAITRite Software version 4.8.8), with pressure data from the GAITRite® loaded into ProtoKinetics’ Movement Analysis Software (PKMAS Software Version 5.09C2a) for final endpoint derivation. Six inertial sensors (Opal, APDM) estimated gait metrics using proprietary algorithms (APDM Mobility Lab V2.0)51,52 during all in-lab tasks, including the scripted gait tasks, the activity blocks, and the 20-min walking activity. Gait metrics including step count were also derived from the lumbar-mounted ActiGraph tri-axial accelerometer using our open-source algorithm (SciKit Digital Health gait module, v0.9.7). Step counts were derived from the wrist-worn ActiGraph tri-axial accelerometer using the algorithm published by Bagui et al.54

Statistical analysis

The sample size was determined from the ICC between gait speed obtained from the sensor data and gait speed obtained from the instrumented walkway (GAITRite®, CIR Systems Inc., Franklin, New Jersey), with the null hypothesis value derived from the median of the lower bounds of the ICC for all gait metrics (8 spatial and temporal metrics) in a previous study15. To detect an ICC of 0.72 against a null ICC of 0.27 using a one-tailed test with alpha = 0.05 and power = 0.8, 17 participants were required; therefore, 20 participants were enrolled (anticipating 15% attrition).
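The sample size figures above can be checked with a short calculation. The sketch below uses the approximation of Walter, Eliasziw and Donner for testing a null ICC against an alternative ICC; the text does not state which formula was actually used, so this choice, the assumption of k = 2 measurements per participant (sensor and walkway), and the function names are illustrative only.

```python
import math
from statistics import NormalDist


def icc_sample_size(icc_null, icc_alt, k=2, alpha=0.05, power=0.8):
    """Approximate number of participants needed to detect icc_alt
    against icc_null with a one-tailed test.

    ASSUMPTION: Walter-Eliasziw-Donner approximation with k repeated
    measurements per participant; not confirmed as the study's method.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha)  # one-tailed critical value
    z_beta = NormalDist().inv_cdf(power)
    # Variance ratios corresponding to each ICC
    theta_null = icc_null / (1 - icc_null)
    theta_alt = icc_alt / (1 - icc_alt)
    c0 = (1 + k * theta_null) / (1 + k * theta_alt)
    n = 1 + (2 * (z_alpha + z_beta) ** 2 * k) / (math.log(c0) ** 2 * (k - 1))
    return math.ceil(n)


def inflate_for_attrition(n, attrition_pct):
    """Enrollment target so that n participants remain after attrition."""
    return math.ceil(n * 100 / (100 - attrition_pct))


required = icc_sample_size(icc_null=0.27, icc_alt=0.72)
print(required)                             # 17
print(inflate_for_attrition(required, 15))  # 20
```

Under these assumptions the approximation gives about 16.5 participants before rounding up, which is consistent with the 17 required and 20 enrolled participants reported above.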

Statistical analysis was performed in SAS version 9.4 and R version 4.2.2 (2022-10-31) with the following main packages: “lme4” for linear mixed-effects regression, “BlandAltmanLeh” for Bland–Altman plots, and “psych” for ICC. For each in-lab task, the gait metrics and other activity-related variables were computed as the median values across the steps. The gait metrics were averaged across the in-lab visits for simplicity, as the data had high test–retest reliability and similar results were obtained when each in-lab visit was tested separately (see Supplementary Table 5). Agreement of gait metrics between different methods was characterized with Bland–Altman plots, accuracy metrics such as mean (percent) error and mean absolute error, and ICC(2,1) (two-way random effects, absolute agreement, single measurement). Reliability of estimated gait speed was assessed according to the following benchmarks: ICC < 0.4 indicates poor, 0.4–0.59 fair, 0.6–0.74 good, and 0.75–1 excellent reliability66. Pearson’s correlation coefficient was computed to test the similarity between gait metrics.
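To make the agreement measures concrete, the following is a minimal sketch of ICC(2,1), the Bland–Altman bias with 95% limits of agreement, and the reliability benchmarks quoted above. The study itself used the R packages named in the text; this Python version is only an illustration, the function names are hypothetical, and the ICC is computed from the standard two-way ANOVA mean squares.

```python
from statistics import mean, stdev


def icc_2_1(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single measure.

    ratings: list of rows, one per participant; each row holds one value
    per method (for example [walkway, sensor]).
    """
    n, k = len(ratings), len(ratings[0])
    grand = mean([x for row in ratings for x in row])
    row_means = [mean(row) for row in ratings]
    col_means = [mean([row[j] for row in ratings]) for j in range(k)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ms_rows = ss_rows / (n - 1)                      # between participants
    ms_cols = ss_cols / (k - 1)                      # between methods
    ms_err = (ss_total - ss_rows - ss_cols) / ((n - 1) * (k - 1))
    return (ms_rows - ms_err) / (
        ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n
    )


def bland_altman(test, reference):
    """Return (bias, lower limit, upper limit) of agreement."""
    diffs = [t - r for t, r in zip(test, reference)]
    bias, sd = mean(diffs), stdev(diffs)
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


def classify_icc(icc):
    """Benchmarks from the text: poor, fair, good, excellent."""
    if icc < 0.4:
        return "poor"
    if icc < 0.6:
        return "fair"
    if icc < 0.75:
        return "good"
    return "excellent"
```

For two methods, `ratings` would hold one `[reference, test]` pair of gait speeds per participant, and `classify_icc(icc_2_1(ratings))` would return the corresponding reliability label.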

Linear mixed-effects models were used to test whether gait metrics differed between the walking tasks and whether gait metrics could be reliably captured with a single accelerometer device across different walking speeds as well as across simulated activities and outside-lab walk tasks. The following model was tested for each gait metric during the multiple-speed walking tasks: method (GAITRite/APDM/SKDH gait, categorical), task (slow/natural/fast, categorical), and their interaction were entered as independent variables (fixed effects), age (continuous) was entered as a covariate, and participant was entered as a random effect. For the comparison of simulated activity blocks and outside-lab walks, and for the carpet vs. tile walk tasks, the same model was used with method (APDM/SKDH gait and GAITRite/APDM/SKDH gait, respectively), task (activity block/outside lab and carpet/tile, respectively), and their interaction as independent variables (fixed effects), age as a covariate, and participant as a random effect. The estimated least square means (lsmeans) or mean differences with 95% CI, F- or t-values, model fit parameters, and degrees of freedom are reported in the supplementary materials as well as in the main figures when necessary. Uncorrected p-values are reported unless stated otherwise; where stated, correction for multiple comparisons was performed using the false discovery rate (FDR). Paired t-tests were performed within the in-lab tasks across methods, within a method across the in-lab tasks, and between visits. One participant’s slow walk data for one visit was excluded from analysis, as there was an error in data collection for that task.
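As an illustration of the multiple-comparison step, the sketch below adjusts a set of p-values with the Benjamini–Hochberg procedure. The text specifies only "false discovery rate (FDR)", so the choice of Benjamini–Hochberg (the most widely used FDR method) is an assumption, and the function name is hypothetical.

```python
def fdr_bh(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order.

    ASSUMPTION: the text does not name the FDR procedure; BH is used
    here as the most common choice.
    """
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Example with four hypothetical p-values
print(fdr_bh([0.01, 0.04, 0.03, 0.005]))
```

An adjusted p-value below the chosen threshold (for example 0.05) indicates that the corresponding comparison remains significant after FDR correction.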

The at-home metrics were summarized as follows. Non-compliant days (lumbar: less than 10 h of wear time; wrist: less than 18 h of wear time) and the first and last days of data monitoring (i.e., in-lab visit days) were excluded from the at-home analysis, the latter because participants performed an extensive amount of walking tasks during the in-lab visits. The metrics were then averaged or counted within days and averaged across days of wear. As suggested by multiple studies15,67,68,69,70, the average within the monitoring period was computed only if there were at least 4 compliant days. One participant’s lumbar at-home data was excluded as there were only 3 compliant days, and one participant’s wrist data was lost due to device malfunction. For the gait metrics, only gait bouts that were longer than 10 s and contained more than four detected gait cycles were included in the analysis to ensure robust gait characterization. The median across the steps within each gait bout was computed, then averaged across the gait bouts within each day; finally, the daily summaries were averaged across the monitoring window. For the at-home wear time computation only, non-compliant days were included in the summaries, and wear time was transformed into percentages so that it could be compared more precisely with the comfort and wearability questionnaires. The comfort and wearability questionnaire responses are summarized using counts and percentages. Two participants had missing data for the at-home comfort and wearability questionnaire for the lumbar device due to an error in the clinical data capture platform setup.
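The aggregation steps described above can be sketched as a small pipeline. The data layout (a chronological list of day records, each with `wear_hours` and a list of bouts carrying `duration_s`, `n_cycles`, and per-step `step_values`) and all names are hypothetical. Treating a day that exactly meets the wear-time threshold as compliant, and skipping compliant days with no qualifying bouts, are also assumptions not stated in the text.

```python
from statistics import mean, median

MIN_WEAR_HOURS = {"lumbar": 10, "wrist": 18}  # compliance thresholds
MIN_COMPLIANT_DAYS = 4
MIN_BOUT_SECONDS = 10   # bouts must be longer than this
MIN_GAIT_CYCLES = 4     # bouts must contain more cycles than this


def summarize_at_home(days, location="lumbar"):
    """Return the monitoring-period average of a gait metric, or None.

    days: chronological list of dicts with keys "wear_hours" and "bouts";
    each bout is a dict with "duration_s", "n_cycles", and "step_values".
    """
    # Exclude the first and last monitoring days (in-lab visit days)
    candidate_days = days[1:-1]
    compliant = [
        d for d in candidate_days
        if d["wear_hours"] >= MIN_WEAR_HOURS[location]
    ]
    if len(compliant) < MIN_COMPLIANT_DAYS:
        return None  # too few compliant days to summarize
    daily_means = []
    for day in compliant:
        # Median across steps within each qualifying bout
        bout_medians = [
            median(b["step_values"])
            for b in day["bouts"]
            if b["duration_s"] > MIN_BOUT_SECONDS
            and b["n_cycles"] > MIN_GAIT_CYCLES
        ]
        if bout_medians:
            # Average across bouts within the day
            daily_means.append(mean(bout_medians))
    # Average the daily summaries across the monitoring window
    return mean(daily_means) if daily_means else None
```

Calling `summarize_at_home(days, "wrist")` applies the 18 h wrist threshold instead of the 10 h lumbar threshold; a return value of `None` corresponds to a participant whose at-home data would be excluded for having fewer than 4 compliant days.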