Introduction

Wearable biosensors are non-invasive devices capable of acquiring, processing, and storing health-related data, including an individual’s physical activity, heart rate, and caloric expenditure. Biosensors have been integrated into various platforms, including watches, wristbands, and smartphones, thereby providing opportunities for monitoring health and predicting medical events outside of traditional clinical settings1,2. In turn, patients can opt to share biosensor-acquired data with their healthcare providers, thereby enhancing clinical treatment decisions and facilitating self-management of diseases3. In spine surgery, there is a growing trend towards digital and objective monitoring of physical activity in patients with degenerative spine disorders (DSD) using commercially available wearables4,5. Patient-reported outcome measures (PROMs), including the Oswestry Disability Index (ODI) and the Roland-Morris Disability Questionnaire (RMDQ), which measure disability and functional impairment specifically related to lower back pain, are the current gold standard in DSD patient assessment. However, PROMs collected at specific time points are limited in that they may not account for variability within a tested time frame and are inherently subjective6,7,8,9. In a review published in 2021, Zini et al.7 reported up to seven potential biases that arise during the collection and analysis of PROM questionnaires, including recall, language, timing, and data collection method biases. By using smartwatches, physicians can assess a patient’s daily ambulatory activity (measured in daily step counts), which, while not capturing the entirety and intensity of a person’s physical activity, may correlate with and provide a reliable surrogate for the patient’s real-life physical activity. Importantly, individuals suffering from DSD typically experience pain, discomfort, and neurological impairment that ultimately result in reduced mobility, including ambulation and real-life physical activity10.

Similarly, functional tests such as the Five-Times-Sit-to-Stand test or the 6-min Walking Test measure a person’s capacity, that is, their potential to perform a task in a controlled environment. However, functional tests are unable to determine performance, i.e., the actual physical activity carried out in daily life. This is because functional tests capture only a snapshot of a patient’s ability at a single point in time, while longitudinal activity monitoring provides a comprehensive profile of a patient’s daily functional performance, including variations over time. Thus, in recent years, wearable real-life physical activity monitoring has been introduced as a new tool to assess disability and outcomes in patients with DSD4,11. Objective monitoring of patients’ physical activity may complement the comprehensive assessment of patients with a new outcome dimension that avoids the subjectivity and bias of PROMs.

As the body of research exploring the potential of real-life activity monitoring in assessing patients with DSD continues to expand, there is a growing need to standardize data collection and interpretation and to validate this new measure. This is especially important when using a measure like longitudinal activity monitoring that continuously records activity (e.g., steps/day or even steps/hour), which contrasts with the time-point analysis of commonly used PROMs. In particular, we need to consider inter- and intra-individual variability in order to understand how long we must monitor patients’ physical activity to obtain reliable and high-quality data. Other factors to consider in interpreting objectively monitored data include the temporal factors that influence a patient’s daily routines and activity levels, such as whether physical activity data is measured during weekdays or weekends.

The study presented here uses a commercially available smartwatch (Apple Watch) to measure the preoperative daily physical activity of patients undergoing either cervical or thoracolumbar elective spine surgery. We evaluate the intra-individual variability with the aim of determining the number of monitoring days necessary to reliably estimate and predict the variance observed during a week of physical activity measurement. Finally, to inform how future studies can weigh step count data collected on different days, we examine how patients’ physical activity changed between days, and whether data collected on weekdays differed from that collected on weekends. Given the high cost and effort required for continuous daily monitoring, the goal of this study is to provide guidelines for the minimum number of monitoring days needed in future studies reporting real-life physical activity monitoring in patients with spine disorders.

Methods

Study design and patient inclusion criteria

Between October 2020 and May 2023, 100 patients were enrolled and provided with an Apple Watch (Model SE 1st generation or Series 5) to assess the usability, effectiveness, and reliability of using commercially available wearable devices to track the physical activity of patients diagnosed with DSD, encompassing both thoracolumbar and cervical pathologies (see Suppl. Table 1 for demographics of the study cohort). To be eligible for the study, participants had to be English-speaking adults who owned an iPhone and were scheduled for elective spine surgery with an attending neurosurgeon at Stanford University. Patients undergoing surgery for spine trauma, tumors, or infections were excluded in order to maintain a homogeneous cohort of patients with DSD. Patients were asked to wear the Apple Watch continuously for a minimum of 2 weeks before surgery. Data was collected prospectively according to a predefined study protocol. A full-time clinical trial coordinator was responsible for orienting patients to the study, providing detailed instructions, and ensuring adherence to the protocol. The coordinator met with patients in person to provide them with their Apple Watch devices, download the study-specific application, and familiarize them with use of the smartwatch. Compliance was monitored weekly by the coordinator, who contacted patients via phone or email if they were non-compliant. Patients were advised to continue with their usual daily activities and routines without any additional encouragement to change their behavior leading up to the surgery. A detailed patient flowchart is provided in Suppl. Figure 1. For the presented analysis, we used preoperative step count data from patients with either cervical or lumbar DSD who completed at least 2 weeks of preoperative physical activity monitoring.

Physical activity monitoring

The built-in accelerometers of Apple Watches have repeatedly demonstrated a high level of reliability and fidelity in tracking activity data12,13. To collect health data from the Apple Watch while maintaining compliance with HIPAA regulations and the study’s specific requirements, patients utilized the custom-designed and study-specific NeuroCoach application, which they downloaded onto their iPhones after consenting to participate in this study. To ensure the data’s reproducibility and accuracy, patients were instructed to wear the Apple Watch on their wrist continuously throughout the day and night, recharging it overnight as needed. The collected data included various health metrics such as step counts, distance traveled, flights climbed, standing time, and heart rate.

Statistical analysis

Individual median, 25th and 75th percentiles, and interquartile range (IQR) of daily step count were calculated for each patient over the 14 days before surgery. Following prior work and to increase generalizability, we only analyzed days on which participants wore their watches for at least 8 h14. We then adjusted patients’ step counts based on their watch wear time, using reference values from the Michigan Predictive Activity & Clinical Trajectories in Health (MIPACT) study (Suppl Cont 1). This adjustment is based on the observation that the n = 6454 participants in the MIPACT study presented higher daily step counts with increasing daily watch wear time14. Consecutive averages of preoperative days and comparisons in step count between weekdays and weekends are similarly reported. A Kolmogorov–Smirnov normality test indicated that our data deviate significantly from normality. As a result, we used a nonparametric Mann–Whitney U test to detect differences in patients’ median daily step counts between preoperative weekdays and weekends, and to evaluate the difference in average step count between the first and second preoperative week.
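The wear-time filter and the per-patient summary statistics described above can be sketched as follows. This is a minimal illustration in Python, not the study's own MATLAB, R, or SPSS code. The function name `summarize_steps`, the `(steps, wear_hours)` tuple layout, and the use of the inclusive quantile method are assumptions made for the example. The MIPACT-based wear-time adjustment is not reproduced because its reference values are not given in the text.

```python
from statistics import quantiles

MIN_WEAR_HOURS = 8.0  # minimum daily wear time stated in the text


def summarize_steps(days, min_wear_hours=MIN_WEAR_HOURS):
    """Summarize one patient's daily step counts.

    days: list of (steps, wear_hours) tuples, one per monitored day
          (hypothetical layout chosen for this sketch).
    Days with less than `min_wear_hours` of wear time are dropped first.
    """
    valid = [steps for steps, wear in days if wear >= min_wear_hours]
    if len(valid) < 2:
        raise ValueError("need at least two valid days to compute quartiles")
    # Assumption: inclusive quantile method; the study's method is not stated.
    q25, q50, q75 = quantiles(valid, n=4, method="inclusive")
    return {
        "n_valid_days": len(valid),
        "median": q50,
        "q25": q25,
        "q75": q75,
        "iqr": q75 - q25,
    }


if __name__ == "__main__":
    # Toy data: the last day is excluded because the watch was worn for only 3 h.
    example = [(1000, 10), (2000, 12), (3000, 9), (4000, 15), (5000, 16), (9000, 3)]
    print(summarize_steps(example))
```

Dropping low-wear days before summarizing keeps a day with little recorded wear time from being mistaken for a day with little activity.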

To determine whether the average steps/day differed significantly between preoperative days, we used a repeated-measures analysis of variance (ANOVA). Given that our data was found not to be normally distributed, a logarithmic transformation was applied to stabilize variance and normalize the distribution. After applying the logarithmic transformation, the data’s adherence to normality was confirmed using the Kolmogorov–Smirnov test, making the data more suitable for the ANOVA. The analysis of within-day effects for preoperative step counts included Mauchly’s Test of Sphericity (W), which assesses the assumption that the data’s pattern of variation is roughly the same for all of the preoperative days. If sphericity is met, it suggests that the data’s variability remains consistent across conditions. However, if sphericity is violated, it indicates that the variability in the data differs significantly between preoperative days. In the case that sphericity was violated, we applied the appropriate correction to our statistical analysis by interpreting the Greenhouse-Geisser (ε)-corrected F-statistic and its associated p-value. If a significant F level was detected, post-hoc tests were used to evaluate the observed differences between days. Next, in order to evaluate whether step counts differed between weekdays and weekends, we first computed the median, 25th and 75th percentiles, and the IQR of steps/day. A nonparametric Mann–Whitney U test was used to assess statistical significance between patients’ recorded step counts during weekdays versus weekends.
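The repeated-measures ANOVA and the Greenhouse-Geisser correction described above can be illustrated with a short sketch. It assumes a complete patients-by-days matrix with no missing days. The function names are hypothetical, and the code is not the SPSS procedure used in the study. It computes the F-statistic and ε only, and obtaining a p-value would additionally require an F-distribution routine from a statistics package.

```python
import math


def log_transform(matrix):
    """Natural-log transform of a patients x days matrix of positive step counts."""
    return [[math.log(x) for x in row] for row in matrix]


def rm_anova_f(matrix):
    """One-way repeated-measures ANOVA: returns (F, df_days, df_error).

    matrix[i][j] is the (transformed) step count of patient i on day j.
    """
    n, k = len(matrix), len(matrix[0])
    grand = sum(sum(row) for row in matrix) / (n * k)
    day_means = [sum(row[j] for row in matrix) / n for j in range(k)]
    patient_means = [sum(row) / k for row in matrix]
    ss_total = sum((x - grand) ** 2 for row in matrix for x in row)
    ss_days = n * sum((m - grand) ** 2 for m in day_means)
    ss_patients = k * sum((m - grand) ** 2 for m in patient_means)
    ss_error = ss_total - ss_days - ss_patients  # day-by-patient residual
    df_days, df_error = k - 1, (n - 1) * (k - 1)
    f_stat = (ss_days / df_days) / (ss_error / df_error)
    return f_stat, df_days, df_error


def greenhouse_geisser_epsilon(matrix):
    """Greenhouse-Geisser epsilon from the double-centered day covariance matrix."""
    n, k = len(matrix), len(matrix[0])
    day_means = [sum(row[j] for row in matrix) / n for j in range(k)]
    cov = [[sum((row[a] - day_means[a]) * (row[b] - day_means[b]) for row in matrix) / (n - 1)
            for b in range(k)] for a in range(k)]
    row_means = [sum(cov[a]) / k for a in range(k)]  # equal to column means (symmetric)
    grand = sum(row_means) / k
    centered = [[cov[a][b] - row_means[a] - row_means[b] + grand for b in range(k)]
                for a in range(k)]
    trace = sum(centered[a][a] for a in range(k))
    sum_sq = sum(v * v for row in centered for v in row)
    return trace ** 2 / ((k - 1) * sum_sq)


if __name__ == "__main__":
    toy = [[1, 3, 5], [2, 3, 4], [3, 3, 3]]  # toy values, not step counts
    f_stat, df1, df2 = rm_anova_f(toy)
    eps = greenhouse_geisser_epsilon(toy)
    # The corrected degrees of freedom are the uncorrected ones multiplied by epsilon.
    print(f_stat, eps * df1, eps * df2)
```

Under the correction the F-statistic itself is unchanged. Only the degrees of freedom used to look up the p-value shrink, by a factor ε between 1/(k − 1) and 1 for k days.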

To determine how many preoperative days are necessary to reliably represent patients’ physical activity, we computed the intra-class correlations (ICC) to assess the level of agreement, or consistency, between the step counts recorded on sequential preoperative days and the average step count over a defined period, such as a week or 2 weeks. In this model, patients were treated as a random effect, recognizing that each individual’s activity level could vary randomly from others in the cohort. The preoperative days were treated as fixed effects, under the assumption that the specific sequence of days leading up to surgery was of interest for the analysis rather than a random sample of days from a larger pool. The ‘rater’ in this context was the Apple Watch, which was used for tracking steps. Given that each patient wore the same type of device and that the Apple Watch’s step-counting mechanism is consistent across units, we did not treat the device as a source of random variability. Instead, we assumed that any variability in step counts would arise from patient behavior or day-to-day variations in physical activity rather than differences in measurement by the device itself. Thus, a two-way mixed-effects ICC model was utilized to gauge the consistency of daily measurements in approximating the average physical activity across the preoperative period. To do this, we first computed the ICC between a single day and the average of 7 consecutive preoperative days. Then, we averaged patients’ steps over preoperative days 1 and 2 and computed the ICC between the average of 1–2 preoperative days and the average of 7 days. We continued sequentially averaging preoperative days (i.e., 1–3, 1–4…1–6) and computing a single ICC value between each sequential average and the week average. In a separate analysis, we performed the same computation, except extending our ICC computations out to 14 preoperative days. We compared the ICC between each of these averages to determine how the reliability between preoperative day step counts changed throughout the 2 weeks before surgery.
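The sequential ICC procedure can be sketched as below. The text specifies a two-way mixed-effects consistency model but not whether single or average measures were reported, so the single-measure form, often written ICC(3,1), is an assumption here. The function names and the data layout (index 0 is the day before surgery) are also hypothetical.

```python
def icc_consistency(rows):
    """ICC(3,1): two-way mixed effects, consistency, single measurement.

    rows: one list per patient holding the k measurements being compared.
    """
    n, k = len(rows), len(rows[0])
    grand = sum(sum(r) for r in rows) / (n * k)
    row_means = [sum(r) / k for r in rows]
    col_means = [sum(r[j] for r in rows) / n for j in range(k)]
    ss_total = sum((x - grand) ** 2 for r in rows for x in r)
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)  # between patients
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)  # between measurements
    ss_err = ss_total - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_err = ss_err / ((n - 1) * (k - 1))
    return (ms_rows - ms_err) / (ms_rows + (k - 1) * ms_err)


def sequential_icc(steps, window=7):
    """ICC between the average of days 1..d and the `window`-day average.

    steps[i][d] is the step count of patient i on preoperative day d + 1,
    so steps[i][0] is the day before surgery (assumed layout).
    Returns a dict mapping d (1 .. window - 1) to the ICC value.
    """
    window_avg = [sum(p[:window]) / window for p in steps]
    result = {}
    for d in range(1, window):
        partial_avg = [sum(p[:d]) / d for p in steps]
        pairs = [[a, b] for a, b in zip(partial_avg, window_avg)]
        result[d] = icc_consistency(pairs)
    return result


if __name__ == "__main__":
    # Toy data for three patients over 7 days (not study data).
    toy = [
        [4000, 5200, 4800, 5100, 4500, 4900, 5000],
        [2000, 2600, 1900, 2400, 2200, 2100, 2500],
        [7000, 6400, 7300, 6900, 7100, 6600, 6800],
    ]
    for d, value in sequential_icc(toy).items():
        print(d, round(value, 3))
```

Calling `sequential_icc(steps, window=14)` would give the corresponding 14-day analysis.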

Next, we evaluated fluctuations in physical activity during the preoperative period by calculating each patient’s difference in average daily step count between the week before surgery and the penultimate preoperative week. To determine whether patients’ physical activity changed significantly between these two weeks, we used a nonparametric Mann–Whitney U test to evaluate whether the median difference in step counts deviated significantly from zero, the null value implying no change. A significantly negative median difference indicates a general decline in activity from the penultimate preoperative week to the week before surgery. Conversely, a positive median difference signifies an increase in physical activity. This calculation aimed to reveal whether there are significant changes in individual patients’ physical activity that cannot be captured by analyzing medians across all patients.
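The week-to-week difference can be computed as in the sketch below, which follows the sign convention of the paragraph above (week before surgery minus penultimate week, so a negative value means declining activity). The function name and the data layout (index 0 is the day before surgery) are assumptions. The significance test is deliberately left out and would come from a statistics package.

```python
from statistics import mean, median


def weekly_differences(steps):
    """Per-patient change in average daily steps between the two preoperative weeks.

    steps[i] holds 14 daily step counts for patient i, with index 0 being the
    day before surgery (assumed layout). Each returned value is
    mean(week before surgery) - mean(penultimate week).
    """
    diffs = []
    for patient in steps:
        if len(patient) < 14:
            raise ValueError("each patient needs 14 days of data")
        last_week = mean(patient[:7])
        penultimate_week = mean(patient[7:14])
        diffs.append(last_week - penultimate_week)
    return diffs


if __name__ == "__main__":
    # Toy data for three patients (not study data).
    toy = [
        [5000] * 7 + [6000] * 7,  # activity dropped before surgery
        [4000] * 7 + [4000] * 7,  # no change
        [3000] * 7 + [3500] * 7,  # activity dropped before surgery
    ]
    diffs = weekly_differences(toy)
    print(diffs, median(diffs))
```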

Last, a sequential linear regression analysis was used to determine how incorporating sequential preoperative physical activity data improves a model’s ability to predict the steps/day over fourteen consecutive days. In this analysis, each day’s data is added one at a time as a predictor to a linear model. The process starts with the first day as the sole predictor and progressively includes each subsequent day up to the 14th day, evaluating the impact of adding each day’s data on the model’s performance. As before, since our data is not normally distributed, a logarithmic transformation was applied to the data and adherence to normality was confirmed using the Kolmogorov–Smirnov test. The adjusted R² output of a sequential regression model indicates the proportion of the variance in the dependent variable (steps/day over fourteen consecutive days) that is explained by the predictor variables (in this case, sequential averages in preoperative data).
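A simplified sketch of the sequential regression is given below. It assumes a single predictor per model, namely the running average of the first d preoperative days, which is one reading of the "sequential averages" mentioned above. A model that enters each day as its own predictor would instead use p = d in the adjusted R² formula and a multiple-regression fit. The logarithmic transformation is omitted, and all function names are hypothetical.

```python
def r_squared(x, y):
    """Coefficient of determination of a simple linear regression of y on x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)


def adjusted_r_squared(r2, n, p):
    """Adjusted R^2 for n observations and p predictors."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)


def sequential_adjusted_r2(steps, window=14):
    """Adjusted R^2 of the d-day running average as predictor of the window average.

    steps[i][d] is the step count of patient i on preoperative day d + 1
    (assumed layout). Returns a dict mapping d (1 .. window) to adjusted R^2.
    """
    n = len(steps)
    target = [sum(p[:window]) / window for p in steps]
    result = {}
    for d in range(1, window + 1):
        predictor = [sum(p[:d]) / d for p in steps]
        result[d] = adjusted_r_squared(r_squared(predictor, target), n, 1)
    return result


if __name__ == "__main__":
    # Toy data for four patients over 3 days (not study data).
    toy = [
        [4000, 5200, 4800],
        [2000, 2600, 1900],
        [7000, 6400, 7300],
        [5500, 5000, 6100],
    ]
    for d, value in sequential_adjusted_r2(toy, window=3).items():
        print(d, round(value, 3))
```

The adjustment penalizes each added predictor, so adjusted R² rises only when an additional day explains more variance than would be expected by chance.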

For all statistical analyses performed, an alpha level of 0.05 was set to evaluate significance. Data was processed, structured, and analyzed using custom code written in MATLAB R2023a (The MathWorks, Natick, MA), R version 3.6.3 for Mac (R Core Team, 2020, RStudio: Integrated Development for R. RStudio, Inc., Boston, MA, http://www.rstudio.com/), and IBM SPSS Statistics for Mac (Version 29.0 Armonk, NY: IBM Corp).

Ethical consideration

This study received approval from the Stanford Institutional Review Board (IRB) under the reference number #52367 and was registered on ClinicalTrials.Gov under the title “Improving Spine Surgical Care with Real-Time Objective Patient Tracking Using the Apple Watch,” identified by the registration number NCT04379921. All methodological procedures were conducted in strict accordance with the relevant ethical guidelines and regulatory requirements. Furthermore, our research was financially supported through internal research grants from both Stanford and Stryker. All participating patients provided written informed consent before data was collected.

Results

Study sample

This study analyzed a total of 95 patients: 32.6% (n = 31) had cervical spondylosis and 67.4% (n = 64) had degenerative thoracolumbar disease (see Suppl. Figure 1 for patient flowchart). The average patient age was 61.5 ± 13.5 years, with an average BMI of 29.3 ± 5.6. Of the patients, 64.2% were female (n = 61) and 35.8% were male (n = 34). On average, patients wore the watch for 15.7 ± 4.2 h/day, on 87.1% of the days during the 14-day preoperative monitoring period.

Fig. 1

Violin Plot of Patients’ Step Counts during Preoperative Days. Median daily step counts, along with the 25th and 75th percentiles, and the Interquartile Range (IQR) for each preoperative day during a 2-week period.

Measuring differences in daily step count

Figure 1 shows the median, 25th and 75th percentiles, and the IQR of patients’ step counts per preoperative day. The study participants had a median step count of 4757.68 (4930.94) steps/day over the two preoperative weeks of monitoring. Patients had the highest step counts on Fridays, with a median of 5139.74 (5065.18), and were the least active on Sundays (4669.75 (4410.58)) (Fig. 2). The majority of patients had a median step count of ~ 5000 steps/day, with a small subset of patients exceeding that, specifically between Monday and Friday (weekdays). We found no significant difference in patients’ average step count on weekdays versus weekends (4847.16 (4568.92) vs. 4679.43 (4545.30), p = 0.93) (Table 1).

Fig. 2

Violin Plot of Patients’ Step Counts Monday–Sunday. Median daily step counts, along with the 25th and 75th percentiles, and the Interquartile Range (IQR) for each day of the week, from Monday to Sunday.

Table 1 Statistics of patients’ step counts Monday–Sunday. Percentiles for step counts on weekdays and weekends, including the 25th, 50th (median), and 75th percentiles, as well as the Interquartile Range (IQR). Mann–Whitney U test p values evaluate the difference in step counts between weekdays and weekends.

Given that the assumption of sphericity was violated (W = 0.033, df = 90, p < 0.006), indicating that the variance of step counts between preoperative days differs significantly, we used the Greenhouse-Geisser (ε) correction to interpret the ANOVA F-statistic. We found that there was no significant difference in the step counts across preoperative days within participants (F = 0.727, df = 8.41, p = 0.68). This indicates that averages in step counts across different preoperative days did not differ significantly and that any observed differences in step counts are due to random fluctuations rather than a systematic day-to-day effect.

In a separate analysis, we examined whether there was a trend in physical activity over a longer timeframe by comparing the average daily step counts between the first and second preoperative weeks for each patient. The individual differences in patients’ median daily step counts between the first and second preoperative weeks were found to significantly deviate from zero, with a median value of − 163.47 steps (Mann–Whitney U p-value: 0.0116), indicating a decrease in physical activity as the surgery date approached (Fig. 3). The distribution of step count differences is skewed towards negative values, reinforcing the trend of declining activity, and reveals variations in the change of physical activity between days in the first versus second preoperative week.

Fig. 3

Histogram of the difference in daily step counts between the first and second preoperative weeks. Frequency distribution shows the number of patients with specific differences in step count, ranging from a decrease of up to 4000 steps to an increase of over 5000 steps. The median difference is indicated by the red line at − 163.47 steps, with a p value of 0.0116.

Intra-class correlations (ICC) analysis

The ICC between the day before surgery (i.e., preoperative day 1) and the average of 7 preoperative days was 0.7291 (p < 0.001), which is interpreted to mean that 73% of the variance in the 7-day time frame is captured within one preoperative day (Fig. 4). Averaging the step counts of the 2 days before surgery increased the ICC to 0.9182 across 7 preoperative days (p < 0.001). In this case, comparing two preoperative days to the 7-day average was sufficient to reach an ICC value where > 90% of the variance was captured. However, this differed from a parallel analysis comparing the ICC values within a 14-day preoperative time period. In that analysis, 2 days of preoperative data accounted for 86.9% of the variance in the 14-day period. Only when four preoperative days were compared to the 14-day average did the ICC value indicate that > 90% of the variance was being captured. In both cases, averaging an increasing number of preoperative days led to steady increases in the computed ICC (Fig. 4). This is further supported by an ICC analysis that shows similar findings for both age groups, with constant cut-off values of 2 and 4 preoperative monitoring days to reach ICC values > 0.90 for the 7- and 14-day preoperative time periods, respectively (Suppl. Figure 2).

Fig. 4

ICC analysis over preoperative days. Graph illustrating the Intraclass correlation coefficient (ICC) values plotted on the y-axis against 7- or 14-preoperative days on the x-axis. The ICC values demonstrate the consistency in step counts across different preoperative days, with higher values indicating greater agreement in step counts for the corresponding days.

Sequential linear regression model

Based on a sequential regression analysis and the adjusted R² of each model (Table 2), when the day before surgery (i.e., preoperative day 1) was evaluated as a predictor of the 1-week step count average, the adjusted R² was 0.73 (p < 0.001). When three preoperative days (i.e., preoperative days 1, 2, and 3) were considered as predictors, the adjusted R² increased to 0.93 (p < 0.001), accounting for 93% of the variance in the model. Meanwhile, using three preoperative days as a predictor of a 2-week step count average resulted in an adjusted R² of 0.88 (p < 0.001). A fourth preoperative day was a necessary predictor to increase the adjusted R² to 0.92 (p < 0.001) and thus account for at least 92% of the variance in a 2-week period. In both cases, adding the 5th preoperative day and onwards as predictors accounted for > 90% of the variance, with improvements observed as additional days were added to the model.

Table 2 Sequential linear regression between consecutive preoperative days. Sequential linear regression analysis for different combinations of preoperative days as predictors of the 1- and 2-week step count average.

Discussion

In this study, we used a commercially available smartwatch to assess the activity patterns of patients before surgery for DSD. We specifically examined the number (and type) of days of step count monitoring required to estimate weekly average real-life physical activity. Our findings show that a minimum of 2 days of wearable step count data is sufficient to estimate > 90% of the variance in free-living patients’ physical activity over 1 week before surgery. However, the number of monitoring days required increases when accounting for a similar degree of the variance in physical activity over 2 weeks. While daily preoperative step counts remain relatively stable, longer activity monitoring is therefore necessary to account for the variance in step count over an increasing time frame. Moreover, the observed decline in individual patients’ physical activity as the surgery day approaches supports the need for longer physical activity monitoring in order to accurately capture individual fluctuations in activity levels. These findings carry substantial implications for study design, as they can guide the development of assessment protocols that are as short as possible to maximize patient compliance, but still long enough to provide critical information to researchers.

As there is increasing reliance on digital monitoring and continuous real-time functional monitoring for surgical patients, it is important to understand this variability so that we can determine how much data we need to collect and when. Data collection can be costly and difficult, requiring physician and patient effort as well as patient compliance. This is especially true when capturing patients’ real-life activity, also known as physical performance. The WHO’s International Classification of Functioning (ICF) defines two distinct categories of function: capacity and performance. While capacity represents the capability of a subject to complete a given task in a ‘standardized’ environment (e.g. 6-min-Walking Test), performance relates to what a person actually does in his or her ‘current’ (usual) environment (real-life physical activity, e.g. measured in steps/day)11,15. By definition, performance is therefore determined in a non-standardized environment, which highlights the need for uniform data capturing guidelines to reduce biases. Standardization of activity monitoring protocols and data collection would not only permit comparisons across diverse patient populations but also reveal the critical endpoints and metrics by which we evaluate data. Previous work in the general adult population without spine diseases has shown conflicting results regarding differences in daily physical activity variance, with some studies identifying significant differences in physical activity levels between different days of the week and across seasons16,17, while others have found no such disparities18. This inconsistency in the literature encouraged our own investigation into patients’ daily physical activity patterns. In our study, although patients’ activity levels showed some variations depending on the day of the week, the distinction between weekdays and weekends did not reach statistical significance. Moreover, the repeated-measures ANOVA did not reveal a statistical difference in the means of preoperative days that may have resulted from specific day-to-day trends. This observation has important clinical implications and may inform how measurement protocols are designed and implemented. Specifically, the fact that patients did not exhibit a significant difference in activity levels between weekdays and weekends may allow for flexibility when deciding the duration of data collection. It may be possible for researchers to tailor their recording protocols without the strict necessity of distinguishing between weekdays and weekends, which provides a more adaptable and generalizable approach to data collection. It is important to note, however, that patients undergoing surgery for DSD may differ from a broadly healthy adult population in that they may be more compliant with data collection due to the perceived association with their surgical procedure and care team. Thus, while our findings suggest that patients did not exhibit a significant difference in activity levels between weekdays and weekends, the resulting flexibility in data collection duration is context-dependent.

As depicted in Figs. 1 and 2, patients’ average daily step counts remained relatively stable over the 14-day period of observation. This suggests that, on average, patients did not experience a significant increase or decrease in physical activity from day to day. However, individual patients did show inter-day variance as well as differences in average step count between week one and two, as depicted in Fig. 3. This suggests that on an individual patient level, analysis of longitudinal activity data may be more valuable to assess trends compared to single point analysis.

Our ICC analyses served as an illustration of the minimum number of days necessary to confidently estimate weekly and bi-weekly activity. We found that a minimum of 2 days of wearable step count data is sufficient to estimate > 90% of the variance in free-living patients’ physical activity over 1 week before surgery and that the inclusion of additional days offered marginal improvements to the reliability of these estimates. Interestingly, when estimating the 2-week average, no fewer than 4 days of monitoring were necessary to capture a similar degree of variance. This suggests that, if the goal is to derive a singular mean or median step count value for patients during a specific time frame, it is necessary to consider the duration of monitoring to yield reasonable, dependable estimates. This again highlights that the full extent of data fluctuation may only become apparent with long-term trend analysis.

Finally, our sequential regression analysis provided a different perspective from our ICC calculations. While the ICC evaluated the consistency of daily step counts over time for an individual, the sequential regression determined the accuracy of a prediction model over a 1-week and 2-week preoperative period. As the adjusted R² values approached 1 with the inclusion of more days, our model’s ability to account for the total variance in preoperative step counts improved, confirming that our predictions become more accurate with each additional day of data. This complements our ICC findings by showing not just internal consistency, but also the predictive accuracy of our step count data over time. The regression analysis shows that the inclusion of 3 days was sufficient for an adjusted R² > 0.9 and that more than 4 days of monitoring provided little additional predictive power, both in the weekly and bi-weekly analysis. However, it is essential to interpret these results alongside our ICC analysis, which confirmed that no single day alone could sufficiently represent habitual activity. There is a crucial distinction between the two statistical analyses: while the regression analysis assesses the predictability of a model based on a few days of data, the ICC evaluates the consistency or reliability of those days as representative samples of a longer time span. Hence, when interpreting the ability to represent habitual activity, the ICC findings are particularly telling; they emphasize the inherent variability in daily activities, affirming that multiple days of monitoring are necessary to gain a true reflection of a patient’s typical activity pattern, beyond what can be predicted by a regression model.

Is it possible to extrapolate our results to other settings in which we aggregate longitudinal data to capture patients’ variability in physical activity? Our findings suggest that it is necessary to average longer monitoring periods to accurately estimate a patient’s physical activity over extended durations such as a month or a season. However, it is important to recognize that this approach can lead to a decrease in the granularity of data. This loss of detail may veil critical short-term variations in physical activity that are essential for fully understanding individual patient behaviors and effectively customizing treatment plans19. Therefore, future research and clinical practice should aim to balance monitoring that captures long-term differences with the need to preserve the granular detail that highlights trends and transient but significant changes in physical activity19.

Limitations

The primary limitation of this study stems from our relatively small sample size, which limits our ability to analyze DSD subgroups based on specific symptoms, thereby potentially limiting the generalizability of our results. Additionally, we did not account for seasonal factors or the employment status of patients, both of which may affect their daily activity patterns. Furthermore, our study concentrated on identifying the optimal number of days necessary to represent patients’ physical activity throughout 2 weeks, without incorporating longer time intervals. Studies have emphasized the importance of considering longer monitoring periods when aiming to accurately capture an individual’s habitual physical activity level. In their 2018 paper, Bergman et al.20 found that even with a 7-day monitoring period, there remains a considerable range of variability in accelerometer-tracked activity levels. This variability can potentially mask the effects of interventions or obscure meaningful associations between physical activity and health outcomes. Another limitation of our study is the primary focus on ambulatory activity, measured by step count. While ambulatory activity is a component of daily physical activity, it does not capture the entirety of a person’s physical activity. For instance, this measure does not account for the intensity, speed, or duration of ambulation, which are also important aspects of physical activity. Additionally, the study’s generalizability may be constrained by the socioeconomic status of iPhone users, which can influence activity levels and health outcomes. To mitigate this limitation, future endeavors may consider subsidizing or providing iPhones through grant-funded initiatives to standardize access across diverse socioeconomic groups. Such an approach could help equalize participation opportunities and ensure that findings are more reflective of the general population, irrespective of socioeconomic status.

Conclusion

Here, we report that a minimum of 2 days of wearable step count data is sufficient to estimate over 90% of the variance in free-living DSD patients’ weekly physical activity before surgery. There was no significant difference in patients’ step counts between weekdays and weekends. A minimum of 4 days was needed to explain over 90% of the variance in step counts over 2 weeks, and no single day alone accurately represented habitual activity. While preoperative step counts remain relatively stable, longer activity monitoring is necessary to account for the variance in step count over an increasing time frame. The full extent of data fluctuation may therefore only become apparent with long-term trend analysis. These findings have significant implications for study design, offering cost-effective assessment protocols and improving participant compliance.