Abstract
Consumer wearable technology use is widespread and there is a need to validate measures obtained in uncontrolled settings. Because no standard exists for the treatment of heart rate data during exercise, the effect of different approaches on reliability (Coefficient of Variation [CV], Intraclass Correlation Coefficient [ICC]) and validity (Mean Absolute Percent Error [MAPE], Lin’s Concordance Correlation Coefficient [CCC]) was determined in the Polar Verity Sense and OH1 during trail running. The Verity Sense met the reliability (CV < 5%, ICC > 0.7) and validity thresholds (MAPE < 5%, CCC > 0.9) in all cases. The OH1 met reliability thresholds in all cases except entire session average (ICC = 0.57). The OH1 met the validity MAPE threshold in all cases (3.3–4.1%), but not CCC (0.6–0.86). Despite various heart rate data processing methods, the approach may not affect reliability and validity interpretation provided adequate data points are obtained. It is also possible that a large volume of data will artificially inflate metrics.
Introduction
Heart rate (HR) is used as a physiological indicator of exercise intensity by athletes, coaches, and recreational exercisers1. Many exercise prescriptions are based on heart rate range, either as a percent of maximal2 or using a relative level such as with the Karvonen formula3. It becomes important then for individuals to accurately obtain heart rate during exercise and physical activity. Wearable technology has become nearly universally utilized4. These wearable devices return a variety of metrics including step count5, energy expenditure6, and heart rate7. Wearable devices have been used to provide metrics for many public health issues. For example, heart rate measurements can be incorporated into artificial pancreas systems to improve glycemic control, serving as a useful tool for managing diabetes8,9. Moreover, wearable devices can be used to track and monitor stress management10, obesity11, heart failure12, sleep disorders13, and cardiovascular disease14. Therefore, accurate wearable devices have the potential to improve the outcomes of a wide range of public health concerns. Investigating the reliability and validity of different wearable devices provides valuable information.
When considering the variable of heart rate during exercise, wearable technology investigations have used a variety of processing algorithms to evaluate the concurrent validity of wearable devices against criterion devices. Some studies have used a cross-sectional approach, obtaining a single HR measurement at specific intervals such as one measure every second7,15,16,17,18,19,20,21,22,23, 15 s24, 30 s25, or 60 s25,26,27,28,29,30. Other investigations have processed the heart rate data by taking an arithmetic mean over specific intervals, including 5-s epochs31,32,33,34, 10-s epochs35, the exercise stage during steady state activities of differing intensity36, or the entire bout37. It is unknown what effect differences in the data processing of heart rate may have on the ultimate decision of agreement, validity, and reliability in wearable devices.
Another unanswered question is what effect the exercise duration has on decisions of validity and reliability. Our previous work evaluated heart rate agreement and validity over the course of a two-mile (3.2 km) trail run (average duration was approximately 22 min), but reliability was not evaluated20. Determining the reliability of wearable devices is an issue that has been raised in several systematic reviews38,39,40, but continues to be understudied, perhaps because of the added time investment needed to measure reliability. Because the Consumer Technology Association (CTA) recommends a minimum of 5 min in duration when validating heart rate devices during exercise41, this has likely become the minimum default length of time for many investigators7,18,42. The consequences of differing exercise durations on decisions relating to validity and reliability of heart rate-based devices are, to our knowledge, unaddressed.
One difficulty is there are no universally accepted standards utilized for the processing of heart rate data. Various organizations have set forth recommendations41,43, but as evidenced by the variety of approaches highlighted above, investigators have yet to put these guidelines into practice. In 2018, the CTA published a report recommending that data processing be accomplished through the temporal averaging of the experimental and criterion devices and synced according to the sampling rate of the experimental device41. More recently, in 2021 a group of European universities started an initiative to develop and recommend best practices for validating heart rate measurements by consumer wearables (Towards Intelligent Health and Well-Being: Network of Physical Activity Assessment, or INTERLIVE)43. Like the CTA, the group recommended that the criterion measure be aligned with the experimental epoch. The group went a step further by recommending that the average measurement window be 5 s or fewer and that an automated synchronization process be implemented43.
To date, an unanswered question remains regarding what effect heart rate data processing has on decisions made with respect to wearable technology device agreement, equivalence44, reliability, and validity. It is hypothesized that data processing will affect whether wearable technology devices are considered valid and reliable according to predetermined thresholds. Additionally, there is a need to evaluate the effect of a minimal duration versus an entire exercise bout when performed in an outdoor setting. In this regard, we hypothesize that exercise duration should not affect decisions when heart rate is measured concurrently. Finally, as the experimental wearable devices utilized in the current investigation have not been determined to be valid or reliable in any use case, there is a need for this information to be reported. Toward this end, the three main purposes of the study were to (1) determine the effect of heart rate data processing on metrics used to make decisions regarding validity and reliability, (2) evaluate the effect of differing lengths of sampling duration on measures associated with heart rate validity, agreement, equivalence, and reliability, and (3) report the concurrent heart rate validity and reliability of the Polar Verity Sense and Polar OH1 during a trail running use case.
Results
Validity
When the entire duration of the trail run was considered, the Polar Verity Sense met the minimum threshold for validity under all data processing methods (see Table 1, Bland–Altman plots are provided in the Supplementary file Figs. S1–S7). When only the first 5 min of the trail run were considered, the Polar Verity Sense did not meet either of the predetermined validity thresholds for any of the data processing methods (see Table 2, Bland–Altman plots are provided in the Supplementary file Figs. S8–S14).
When the entire duration of the trail run was considered, the Polar OH1 met the minimum mean absolute percent error (MAPE) threshold for validity under all of the data processing methods but did not meet the minimum Lin’s Concordance threshold (see Table 3, Bland–Altman plots are provided in the Supplementary file Figs. S15–S21). When only the first 5 min of the trail run were considered, the Polar OH1 did not meet either of the predetermined validity thresholds for any of the data processing methods (see Table 4, Bland–Altman plots are provided in the Supplementary file Figs. S22–S28).
Equivalence
When the entire duration of the trail run was considered, the Polar Verity Sense did not meet the assumption of equivalence for any of the data processing methods (see Table 1, equivalence plots are provided in the Supplementary file Figs. S29–S35). The device did not meet the assumption when only the first 5 min of the trail run were considered (see Table 2, equivalence plots are provided in the Supplementary file Figs. S36–S42).
Similar to what was observed for the Polar Verity Sense, the OH1 did not meet the assumption of equivalence for any of the data processing methods when the entire trail run was considered, or when only the first 5 min of the run were considered (see Tables 3 and 4, equivalence plots are provided in the Supplementary file Figs. S43–S56).
Reliability
The Polar Verity Sense met the threshold for both absolute reliability (coefficient of variation, CV) and relative reliability (intraclass correlation coefficient, ICC) for all data processing methods when the entire duration of the trail run was considered (see Table 1). The same observations were noted when only the first 5 min of the trail run were considered (see Table 2).
The Polar OH1 met all thresholds for reliability over the course of the entire trail run except when considering the session average heart rate method (see Table 3). The session average did not meet the assumption for ICC. When only the first 5 min were considered, the Polar OH1 met the threshold for all reliability tests for all of the data processing methods (see Table 4).
Power and sample size determination
Trail running is an inherently dynamic exercise that produces a variable, rather than steady state, heart rate response. With this acknowledgement, we report the actual power derived from each of the data processing methods along with a calculated sample size (see Table 5). The aim is to provide subsequent researchers with information necessary to determine appropriate sample sizes for similar use cases.
Considering the Polar Verity Sense over the course of the entire trail run period, the actual power ranged from 0.8575 (15-s cross-sectional sampling) to 0.9158 (average heart rate across the entire session). Power analyses using these data revealed an appropriate total sample size to be four to five participants. When only the first 5 min of the trail run were considered, the actual power ranged from 0.8029 (30-s cross-sectional sampling) to 0.8886 (15-s cross-sectional sampling). Power analyses using these data revealed an appropriate total sample size to be five to seven participants.
When the Polar OH1 was considered over the entire trail run duration, the actual power ranged from 0.8004 (second-by-second cross-sectional sampling) to 0.8499 (1-min cross-sectional sampling). Power analyses using these data revealed an appropriate total sample size to be six to twelve participants. When only the first 5 min of the trail run were considered, the actual power ranged from 0.8045 (session average) to 0.8634 (10-s averages). Power analyses using these data revealed an appropriate total sample size to be six to nine participants.
Discussion
The three-fold purpose of this investigation was to (1) determine the effect of heart rate data processing methods on assumptions used to make validity and reliability decisions, (2) evaluate the effect of different lengths of sampling duration on measures associated with heart rate validity, agreement, equivalence, and reliability, and (3) report concurrent heart rate validity and reliability of the Polar Verity Sense and Polar OH1 during trail running. Differences in data processing methods did not affect the interpretation of the Polar Verity Sense heart rate data. The same observations were true for the Polar OH1, with the exception of the overall session average, which was not aligned with the remaining data processing methods. Considering the duration of data processing, utilizing only the first 5 min of the trail run affected agreement (increased bias and limits of agreement) and validity (increased MAPE and lower CCC) measurements for both devices but not equivalence or reliability metrics when evaluated against the entire duration of the run. Overall, these findings provide evidence that the Polar Verity Sense is both valid and reliable for heart rate measurements during a trail running use case. The utility of the Polar OH1 depends on how the heart rate data are processed.
To determine if utilizing different data processing methods would affect decisions related to the reliability and validity of the experimental wearable technology devices, a variety of methods were employed in the current study. The methods have been commonly used in the literature, and include a cross-sectional approach, evaluating a single measurement second-by-second7,15,16,17,18,19,20,21,22,23, every 15 s24, 30 s25, and 60 s25,26,27,28,29,30. We also evaluated the effect of smoothing heart rate data by taking an average over time, including 5-s epochs31,32,33,34, 10-s epochs35, and an average of the entire session37 as have been reported in the literature. Our findings reveal that the Polar Verity Sense was considered both reliable and valid over the duration of the entire trail run regardless of the data processing method used. Our findings of the Polar OH1 are mixed, with the average of the entire session not meeting the predetermined threshold for reliability (specifically the ICC). Additionally, the Polar OH1 did not meet the validity threshold for CCC using any of the data processing methods. It should be noted that the average of the entire session contained the fewest data points (17 versus 320 to 19,067 for the other methods), although evidence exists to suggest that an appropriate number of participants were tested and sufficient power was obtained. It is tempting to speculate that a small number of data points may not affect decisions on wearable devices that should be considered reliable and valid but may expose devices where the assumptions cannot be met. Further investigation into the consequences of these findings is warranted.
The Consumer Technology Association recommends a minimum duration of 5 min when validating heart rate devices during an exercise use case41. Because of this recommendation, 5 min may be the preferred length of time used for validation studies7,18,42. Since we previously recommended utilizing longer time periods in applied settings20, we wanted to determine what effect evaluating only the first 5 min of the trail run would have on common assumptions, contrasting them with the entire duration of the session. The Polar Verity Sense met the minimum thresholds for MAPE and CCC when the entire run was considered but neither threshold when only the first 5 min were considered. This case is peculiar, as concurrent device validity should theoretically be expected to meet the predetermined thresholds regardless of the duration employed (i.e., a valid heart rate device will report accurate measures regardless of terrain inclines or how variable the heart rate response is to exercise). These data raise questions of interest that warrant further investigation. The first question is associated with the quantity of data reported, namely whether more data reduce the influence of spurious readings from a device. Evidence from the current investigation suggests this may be the case, particularly the interpretation of the Polar OH1 data over the entire run when considering the session average against all other data processing methods. Another question centers on the frequency of such spurious readings, and whether they are more likely to occur at the outset of an exercise bout before a steady state is reached. While this potential explanation is intriguing, we previously reported no change in heart rate assumptions during the uphill portion (initial portion of a trail run) when compared to the downhill portion of a trail run (latter portion)20.
It is clear that while much research has focused on the concurrent validity of wearables during exercise15,18,31,36,45,46,47, a greater focus needs to be directed toward the consequences of varying duration and what effect this factor has on ultimate decisions related to device validity and reliability. Additionally, how exercise intensity is varied is important to future investigations. While trail running is an applied activity that is inherently variable, future studies employing consistent variations in intensity (such as high-intensity interval training) are warranted. Furthermore, conducting the same analyses in a wider array of steady state aerobic exercises (such as cycling, swimming, and running), and high-intensity anaerobic exercise would be useful to confirm whether those results are similar to the trail running use case in the current investigation.
The validity of the Polar OH1 has been reported for various use cases including treadmill and cycle exercise19,23, swimming21, and a variety of training modalities (biking, tennis, running, soccer, walking)35. With second-by-second data processing, the Polar OH1 was deemed to have acceptable validity during treadmill (MAPE between 0.2 and 1.9%) and cycle exercise (MAPE between 0.6 and 3.9%)23. Employing second-by-second data processing, the Polar OH1 was reported to have acceptable agreement during treadmill and spin bike activities (mean bias less than 1 bpm)19. Also utilizing second-by-second processing, the Polar OH1 was deemed to have acceptable validity through all ranges of front crawl swimming intensity (ICC between 0.72 and 0.96)21. Using 10-s smoothing, the Polar OH1 was considered to have good agreement, particularly for endurance sports (difference from criterion < 5%), as well as acceptable reliability (ICC = 0.99) although the protocol for determining reliability was not disclosed35. We add to the literature that the Polar OH1 may be considered both valid and reliable during trail runs longer than 5 min, with the exception of when the data processing is averaged over the course of the session.
The use of the Polar Verity Sense has been reported in a variety of applications, including during a 24-h ultramarathon48, obtaining physiological stress measures in patients on a workplace stress reduction program49, and in a proposal to monitor intensity adherence of a frame running program in children with cerebral palsy50. To our knowledge, the only published literature on the validity of the Polar Verity Sense is in abstract form from our laboratory group51,52,53, and the reliability of the device has not been established. We report for the first time that the Polar Verity Sense can be considered both valid and reliable during trail runs longer than 5 min.
This investigation is not without limitations. Our previous work has detailed how conducting research in applied settings with ambient light sources could affect wearable devices that rely on photoplethysmography (PPG)20. As the present investigation was conducted in an outdoor trail setting, ambient light must be considered a potential limiting factor. Another limitation could lie in the manner in which we evaluated concurrent reliability, utilizing two of the same device, one attached to each arm. While this approach has been used with footpod-based devices54, it has not been employed with PPG-based wearables. Thus, it is possible that differences in blood flow patterns between limbs could have affected reliability measures, making the devices appear unreliable when they were actually reliable. Another limitation is potentially found in the statistical measures used to determine the acceptability of the devices. While no common set of statistical tests is utilized to provide evidence of device acceptability, testing for equivalence has been proposed44. A common test of equivalence is the two one-sided test (TOST); unfortunately, appropriate TOST thresholds have not been established for wearable devices45. Given the data presented in the current investigation, the utility of the TOST for the determination of acceptability of wearable devices in an applied setting may be limited. This conclusion stems from the observation that equivalence was unacceptable regardless of whether the thresholds for reliability and validity were met. Further investigation into the appropriate use cases of the TOST in wearable device evaluation is warranted. Finally, a potential limitation could be that we did not test at least twenty participants, as recommended by the CTA41. In this regard, we have reported the actual power obtained from each of the data processing methods (Table 5) and provide evidence to suggest that an appropriate number of data points were obtained from enough participants.
The current investigation provides evidence that despite the numerous methods in which wearable device heart rate data are processed, the approach may have little effect on the interpretation of overall validity and reliability, provided an adequate number of data points are obtained from enough participants. If a device is truly valid and reliable, it will meet the minimum thresholds regardless of the number of observations obtained. On the other hand, it is possible that obtaining a large number of observations, such as through second-by-second processing, may artificially inflate the validity or reliability metrics by concealing spurious observations. Considering this possibility, it may be prudent for researchers to perform data processing with both a minimal number of data points (session average) and many data points (i.e., any of the other methods used in this investigation) to tease out their potential effects on decisions about reliability and validity. The data additionally seem to suggest that, for exercises of highly variable intensity such as trail running, durations longer than 5 min are warranted. With the evidence presented in this study, we conclude that the Polar Verity Sense is both valid and reliable during trail running.
Methods
Participants
Seventeen healthy participants (Female n = 7; Male n = 10; Transgender, Intersex, or Other n = 0) completed testing. Demographic characteristics: Age = 25 ± 9 years (mean ± standard deviation), height = 168 ± 9 cm, mass = 72 ± 14 kg. Participants were screened and deemed not to require medical clearance to complete exercise according to the American College of Sports Medicine preparticipation health screening recommendations55. Participants were deemed healthy if they had no cardiovascular, metabolic, or renal disease, and had no signs or symptoms suggestive of the diseases. Participants were excluded if they had known cardiovascular, metabolic, or renal disease or if they did not participate in regular exercise and had signs or symptoms associated with the diseases. A power analysis was conducted using our pilot data with the same wearable devices52, indicating the need for at least eleven participants (coefficient of determination r² = 0.57, correlation ρ effect size = 0.755, α = 0.05, power [1 − β] = 0.80)56. Prior to participation, individuals gave verbal consent and completed an approved informed consent document. The methods were performed in accordance with relevant guidelines and regulations and approved by Southern Utah University (#11-082022a) and the University of Nevada, Las Vegas (UNLV-2022-392).
Protocol
Participants were outfitted with heart rate sensing wearable devices and a secure Bluetooth connection was confirmed. In all instances, devices were affixed according to manufacturer recommendations. The criterion device was the Polar H10 (Polar Electro, Kempele, Finland) attached securely around the chest of the participant. The experimental devices were the Polar OH1 (Polar Electro, Kempele, Finland) and Polar Verity Sense (Polar Electro, Kempele, Finland), placed on both the right and left biceps. Two of the same models were used simultaneously so that concurrent reliability could be obtained54. All devices (H10, Verity Sense, OH1) were connected via Bluetooth to an iPad mini (Apple Inc., Cupertino, CA) with the PerformTek application (Valencell, Inc., Raleigh, NC) which provides second-by-second heart rate of all connected devices on a single csv file.
Participants were instructed to complete a self-paced, out-and-back run on the Thunderbird Gardens Lightning Switch trail in Cedar City, UT (see Fig. 1). Participants ran out on the trail for 10 min in a generally uphill direction and then returned to the trailhead. The mean running time was 21.2 ± 1.6 min (range = 19.5 to 24.3 min). Estimated maximal heart rate was calculated as 211 − (0.64 × age), a formula that is accurate for active individuals57. Expressing the highest heart rate obtained from the criterion device during the trail run as a percentage of the age-estimated maximal heart rate revealed the exercise bout to be of high intensity (mean = 94.5 ± 4.9%; range = 83.5 to 100.0%). The environmental conditions during testing included the following averages and ranges: temperature = 19.8 ± 4.5 °C (8.9 to 25 °C), humidity = 48.6 ± 20.6% (12 to 86%), windspeed = 14.3 ± 12.4 km h−1 (0 to 33.8 km h−1). The altitude was 1783 m at the trailhead, and the elevation change was 52.5 ± 11.1 m (36.6 to 72.8 m).
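The intensity calculation described above can be sketched in a few lines of Python. This is a minimal illustration and not the authors' own worksheet; the function names are hypothetical and the example inputs are invented for demonstration.

```python
def estimated_max_hr(age_years: float) -> float:
    """Age-estimated maximal heart rate in bpm: 211 - (0.64 x age)."""
    return 211.0 - 0.64 * age_years


def percent_of_max(peak_hr_bpm: float, age_years: float) -> float:
    """Peak heart rate as a percentage of the age-estimated maximum."""
    return 100.0 * peak_hr_bpm / estimated_max_hr(age_years)


# Illustrative values only (not study data): a 25-year-old peaking at 185 bpm.
print(round(estimated_max_hr(25), 1))
print(round(percent_of_max(185, 25), 1))
```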
Devices
Polar H10
The Polar H10 chest strap has been shown to be valid compared to electrocardiography58 and to have acceptable reliability59, although the use case specific to trail running has not been determined. The Polar H10 is an electrocardiogram-based heart rate sensor that was secured around the chest of the participant at the level of the xiphoid process. The device contains plastic electrodes on the underside of the strap that detect heart rate. The sensor materials include acrylonitrile butadiene styrene (ABS), ABS plus glass fiber (ABS + GF), polycarbonate, and stainless steel, while the strap material is composed of 38% polyamide, 29% polyurethane, 20% elastane, 13% polyester, and silicone prints. The Polar H10 has a sampling frequency of 1000 Hz. It was connected to an iPad mini via Bluetooth.
Polar Verity Sense
The Polar Verity Sense is a PPG device. It is an optical heart rate sensor designed to be worn on the upper arm. The sensor materials include ABS, ABS + GF, poly(methyl methacrylate) (PMMA), and steel use stainless (SUS) 316. The device was positioned with the sensor on the underside of the armband and firmly against the skin. The Polar Verity Sense has a sample rate of 135 Hz and was connected to an iPad mini via Bluetooth.
Polar OH1
The Polar OH1 is a PPG device. Like the Polar Verity Sense, it is an optical heart rate sensor designed to be worn on the upper arm. The sensor materials include ABS, ABS + GF, PMMA, and SUS 316. The device was positioned so that the sensor was on the underside of the armband and firmly against the skin. The Polar OH1 has a sample rate of 135 Hz. It was connected to an iPad mini via Bluetooth.
Data processing
There were no missing data from either of the experimental wearable technology devices or from the criterion device. Data were processed per methods commonly reported in the literature using cross-sectional (CS) and smoothing (or averaging, [AVG]) methods. For the CS approach, data were obtained at each timepoint noted. For the second-by-second method, data were obtained every second (60 readings per minute). For the 15-s cross-sectional method, data were obtained every 15 s (four times per minute: at 15 s, 30 s, 45 s, and 60 s). For the 30-s cross-sectional method, data were obtained every 30 s (two times per minute: at 30 s and 60 s). For the 60-s cross-sectional method, data were obtained every minute for the duration of the exercise period.
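As a rough illustration of the cross-sectional approach, the following Python sketch keeps one reading per interval from a second-by-second series. It is not the authors' processing script; the function name is hypothetical, and it assumes the first list element is the reading at second 1.

```python
from typing import List, Sequence


def cross_sectional(hr: Sequence[float], interval_s: int) -> List[float]:
    """Keep one reading every `interval_s` seconds from a 1 Hz series.

    Assumes hr[0] is the reading at second 1, so the kept readings fall at
    seconds interval_s, 2 * interval_s, ... (e.g. 15, 30, 45, 60 for 15 s).
    """
    if interval_s < 1:
        raise ValueError("interval_s must be a positive integer")
    return list(hr[interval_s - 1::interval_s])


# Synthetic one-minute series (not study data): 101, 102, ..., 160 bpm.
one_minute = [100 + s for s in range(1, 61)]
print(cross_sectional(one_minute, 15))      # readings at 15, 30, 45 and 60 s
print(len(cross_sectional(one_minute, 1)))  # second-by-second keeps every reading
```

The same slice applied to the criterion and experimental series yields paired readings at identical timepoints, which is what the validity metrics require.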
For the AVG approach, data were averaged across the particular timeframe. For the 5-s average method, the mean of the data was obtained in 5-s increments (12 times per minute: 0–5 s, 5–10 s, 10–15 s, 15–20 s, 20–25 s, 25–30 s, 30–35 s, 35–40 s, 40–45 s, 45–50 s, 50–55 s, 55–60 s). For the 10-s average method, the mean of the data was obtained in 10-s increments (six times per minute: 0–10 s, 10–20 s, 20–30 s, 30–40 s, 40–50 s, 50–60 s). For the 30-s average method, the mean of the data was obtained in 30-s increments (two times per minute: 0–30 s and 30–60 s). For the session average, the mean of the entire data set for each participant was utilized (one value per participant).
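The averaging approach can be sketched in the same way. This is a hedged illustration: the function names are hypothetical, a 1 Hz input series is assumed, and dropping a trailing partial epoch is an assumption, since the text does not state how incomplete epochs were handled.

```python
from statistics import fmean
from typing import List, Sequence


def epoch_average(hr: Sequence[float], epoch_s: int) -> List[float]:
    """Mean heart rate over consecutive, non-overlapping epochs.

    Assumes a 1 Hz series, so an epoch of `epoch_s` seconds spans `epoch_s`
    samples. A trailing partial epoch is dropped (an assumption).
    """
    if epoch_s < 1:
        raise ValueError("epoch_s must be a positive integer")
    n_full = len(hr) // epoch_s
    return [fmean(hr[i * epoch_s:(i + 1) * epoch_s]) for i in range(n_full)]


def session_average(hr: Sequence[float]) -> float:
    """Single mean over the whole session (one value per participant)."""
    return fmean(hr)


# Synthetic one-minute series (not study data): 100, 101, ..., 159 bpm.
one_minute = [100 + s for s in range(60)]
print(len(epoch_average(one_minute, 5)))   # number of 5-s epochs in one minute
print(epoch_average(one_minute, 30))       # two 30-s means
print(session_average(one_minute))
```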
Statistical analysis
Measures associated with validity that we reported included the mean absolute percent error, Lin’s Concordance Correlation Coefficient, and the mean absolute error. The equations for these metrics were input into an Excel spreadsheet (Microsoft Excel for Mac version 16.66.1, Redmond, WA). For validity thresholds we used a MAPE ≤ 5% (refs. 7,20) and a CCC ≥ 0.90 (ref. 20).
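For readers who prefer code to spreadsheet formulas, the three validity metrics can be sketched as below. This is an illustration under stated assumptions rather than the authors' Excel implementation: the function names are hypothetical, errors are taken relative to the criterion reading, and the CCC follows Lin's original definition with 1/n variances and covariance.

```python
from statistics import fmean
from typing import Sequence


def mae(criterion: Sequence[float], device: Sequence[float]) -> float:
    """Mean absolute error in bpm."""
    return fmean(abs(d - c) for c, d in zip(criterion, device))


def mape(criterion: Sequence[float], device: Sequence[float]) -> float:
    """Mean absolute percent error, relative to the criterion reading."""
    return 100.0 * fmean(abs(d - c) / c for c, d in zip(criterion, device))


def lins_ccc(criterion: Sequence[float], device: Sequence[float]) -> float:
    """Lin's concordance correlation coefficient (1/n moments).

    Undefined (division by zero) if both series are constant and equal.
    """
    n = len(criterion)
    mx, my = fmean(criterion), fmean(device)
    sxx = sum((x - mx) ** 2 for x in criterion) / n
    syy = sum((y - my) ** 2 for y in device) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(criterion, device)) / n
    return 2.0 * sxy / (sxx + syy + (mx - my) ** 2)


# Illustrative paired readings (not study data).
criterion = [100, 110, 120]
device = [102, 108, 123]
print(mae(criterion, device), mape(criterion, device), lins_ccc(criterion, device))
```

Unlike Pearson's r, the CCC is penalised by the squared difference in means in the denominator, so a device with a constant offset from the criterion scores below 1 even when the two series are perfectly correlated.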
Agreement was determined using the Bland–Altman analysis. Bland–Altman bias and limits of agreement were determined using the blandr analysis in jamovi (version 2.3.19.0)60. There are currently no thresholds established to denote acceptable agreement on the basis of the Bland–Altman analysis independent of other measures.
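The quantities reported by a Bland–Altman analysis can be illustrated with a short Python sketch. This is not the blandr implementation used in the study; the function name is hypothetical, the difference is taken as device minus criterion (an assumption), and the simple form shown here ignores the clustering of repeated readings within participants.

```python
from statistics import fmean, stdev
from typing import Sequence, Tuple


def bland_altman(criterion: Sequence[float],
                 device: Sequence[float]) -> Tuple[float, float, float]:
    """Return (bias, lower limit, upper limit) of agreement in bpm.

    Bias is the mean of (device - criterion); the limits are
    bias +/- 1.96 x the sample standard deviation of the differences.
    """
    diffs = [d - c for c, d in zip(criterion, device)]
    bias = fmean(diffs)
    sd = stdev(diffs)
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


# Illustrative paired readings (not study data).
criterion = [100, 110, 120, 130]
device = [102, 109, 123, 130]
bias, lower, upper = bland_altman(criterion, device)
print(bias, lower, upper)
```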
Equivalence was determined using the two one-sided test. Equivalence testing was determined using the TOSTER analysis in jamovi (version 2.3.19.0)60. If the confidence interval (CI) lies within the upper and lower estimate, the two means are considered equivalent61.
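The confidence-interval form of the equivalence decision can be sketched as follows. This is a simplified illustration and not the TOSTER procedure used in the study: it applies a large-sample normal approximation in place of the t distribution, treats the data as paired differences, and takes the equivalence bounds as caller-supplied values, since appropriate bounds for wearable devices have not been established. The function name and the ±1 bpm bounds in the example are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, fmean, stdev
from typing import Sequence, Tuple


def tost_equivalent(criterion: Sequence[float], device: Sequence[float],
                    low: float, high: float,
                    alpha: float = 0.05) -> Tuple[bool, Tuple[float, float]]:
    """Paired equivalence check via the (1 - 2 * alpha) confidence interval.

    The two means are declared equivalent when the interval for the mean
    difference (device - criterion) lies entirely inside (low, high).
    Normal approximation: reasonable only for large numbers of pairs.
    """
    diffs = [d - c for c, d in zip(criterion, device)]
    mean_diff = fmean(diffs)
    se = stdev(diffs) / sqrt(len(diffs))
    z = NormalDist().inv_cdf(1.0 - alpha)
    ci = (mean_diff - z * se, mean_diff + z * se)
    return (low < ci[0] and ci[1] < high), ci


# Synthetic data (not study data): device error cycles through -2..+2 bpm.
criterion = [120 + (i % 40) for i in range(200)]
device = [c + ((i % 5) - 2) for i, c in enumerate(criterion)]
equivalent, ci = tost_equivalent(criterion, device, low=-1.0, high=1.0)
print(equivalent, ci)
```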
Measures associated with reliability that we reported included the coefficient of variation and the intraclass correlation coefficient. The equation for CV was input into an Excel spreadsheet (Microsoft Excel for Mac version 16.66.1, Redmond, WA). Both the ICC and Cronbach’s α were determined using SPSS Statistics (IBM SPSS Statistics, version 28.0.1.0, Chicago, IL). For the outdoor trail setting we used a threshold of ≤ 10% for CV and ≥ 0.70 for ICC (ref. 62).
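A possible way to compute the two reliability measures from simultaneous left-arm and right-arm readings is sketched below. Both formulas are assumptions: the text does not give the CV equation or state which ICC form was requested in SPSS. The sketch uses the mean within-pair CV and the two-way, single-measure, consistency ICC, often written ICC(3,1); the function names are hypothetical.

```python
from math import sqrt
from statistics import fmean
from typing import Sequence


def mean_pairwise_cv(left: Sequence[float], right: Sequence[float]) -> float:
    """Mean within-pair coefficient of variation, in percent."""
    cvs = []
    for a, b in zip(left, right):
        pair_sd = abs(a - b) / sqrt(2)  # sample SD of two values
        cvs.append(100.0 * pair_sd / ((a + b) / 2))
    return fmean(cvs)


def icc_consistency(left: Sequence[float], right: Sequence[float]) -> float:
    """ICC(3,1): two-way model, single measure, consistency definition."""
    rows = list(zip(left, right))
    n, k = len(rows), 2
    grand = fmean(v for row in rows for v in row)
    ss_rows = k * sum((fmean(row) - grand) ** 2 for row in rows)
    ss_cols = n * sum((fmean(col) - grand) ** 2 for col in (left, right))
    ss_total = sum((v - grand) ** 2 for row in rows for v in row)
    ms_rows = ss_rows / (n - 1)
    ms_err = (ss_total - ss_rows - ss_cols) / ((n - 1) * (k - 1))
    return (ms_rows - ms_err) / (ms_rows + (k - 1) * ms_err)


# Illustrative paired readings (not study data).
left = [150, 160, 170, 180]
right = [151, 159, 172, 179]
print(mean_pairwise_cv(left, right), icc_consistency(left, right))
```

Because the consistency form ignores a constant offset between the two devices, an absolute-agreement form would give a lower value whenever one arm reads systematically higher than the other.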
SPSS Statistics (IBM SPSS Statistics, version 28.0.1.0, Chicago, IL) was used to determine Pearson’s Product Moment Correlation Coefficients. The r² value was then used in G*Power56 to determine actual power and sample sizes.
Data availability
The raw dataset generated during the current study is available in the Harvard Dataverse repository, https://doi.org/10.7910/DVN/0M49BY.
References
Karvonen, J. & Vuorimaa, T. Heart-rate and exercise intensity during sports activities—Practical application. Sports Med. 5, 303–311 (1988).
Franklin, B. A., Hodgson, J. & Buskirk, E. R. Relationship between percent maximal O2 uptake and percent maximal heart rate in women. Res. Q. Exerc. Sport 51, 616–624 (1980).
Roitman, J. L., Pavlisko, J. J., Schultz, G. W., Sheffer, D. B. & Hillman, G. Exercise prescription by heart rate and met methods. Phys. Sportsmed. 6, 98–102 (1978).
Liguori, G., Kennedy, D. J. & Navalta, J. W. Fitness wearables. ACSMs Health Fit J. 22, 6–8 (2018).
Navalta, J. W. et al. Wearable device validity in determining step count during hiking and trail running. J. Meas. Phys. Behav. 1, 86–93 (2018).
Wahl, Y., Duking, P., Droszez, A., Wahl, P. & Mester, J. Criterion-validity of commercially available physical activity tracker to estimate step count, covered distance and energy expenditure during sports conditions. Front. Physiol. 8, 725. https://doi.org/10.3389/fphys.2017.00725 (2017).
Navalta, J. W., Ramirez, G. G., Maxwell, C., Radzak, K. N. & McGinnis, G. R. Validity and reliability of three commercially available smart sports bras during treadmill walking and running. Sci. Rep. 10, 7397. https://doi.org/10.1038/s41598-020-64185-z (2020).
Hettiarachchi, C. et al. Integrating multiple inputs into an artificial pancreas system: Narrative literature review. JMIR Diabetes 7, e28861. https://doi.org/10.2196/28861 (2022).
Resalat, N. et al. Adaptive control of an artificial pancreas using model identification, adaptive postprandial insulin delivery, and heart rate and accelerometry as control inputs. J. Diabetes Sci. Technol. 13, 1044–1053. https://doi.org/10.1177/1932296819881467 (2019).
Hickey, B. A. et al. Smart devices and wearable technologies to detect and monitor mental health conditions and stress: A systematic review. Sensors https://doi.org/10.3390/s21103461 (2021).
Hu, R., van Velthoven, M. H. & Meinert, E. Perspectives of people who are overweight and obese on using wearable technology for weight management: Systematic review. JMIR Mhealth Uhealth 8, e12651. https://doi.org/10.2196/12651 (2020).
Singhal, A. & Cowie, M. R. The role of wearables in heart failure. Curr. Heart Fail. Rep. 17, 125–132. https://doi.org/10.1007/s11897-020-00467-x (2020).
Shelgikar, A. V., Anderson, P. F. & Stephens, M. R. Sleep tracking, wearable technology, and opportunities for research and clinical care. Chest 150, 732–743. https://doi.org/10.1016/j.chest.2016.04.016 (2016).
Bayoumy, K. et al. Smart wearable devices in cardiovascular care: Where we are and how to move forward. Nat. Rev. Cardiol. 18, 581–599. https://doi.org/10.1038/s41569-021-00522-7 (2021).
Jo, E., Lewis, K., Directo, D., Kim, M. J. & Dolezal, B. A. Validation of biofeedback wearables for photoplethysmographic heart rate tracking. J. Sports Sci. Med. 15, 540–547 (2016).
Parak, J., Uuskoski, M., Machek, J. & Korhonen, I. Estimating heart rate, energy expenditure, and physical performance with a wrist photoplethysmographic device during running. JMIR Mhealth Uhealth 5, e97. https://doi.org/10.2196/mhealth.7437 (2017).
Reddy, R. K. et al. Accuracy of wrist-worn activity monitors during common daily physical activities and types of structured exercise: Evaluation study. JMIR Mhealth Uhealth 6, e10338. https://doi.org/10.2196/10338 (2018).
Bunn, J. A., Wells, E., Manor, J. & Wenster, M. Evaluation of earbud and wristwatch heart rate monitors during aerobic and resistance training. Int. J. Exerc. Sci. 12, 374 (2019).
Hettiarachchi, I. T., Hanoun, S., Nahavandi, D. & Nahavandi, S. Validation of Polar OH1 optical heart rate sensor for moderate and high intensity physical activities. PLoS ONE 14, e0217288. https://doi.org/10.1371/journal.pone.0217288 (2019).
Navalta, J. W. et al. Concurrent heart rate validity of wearable technology devices during trail running. PLoS ONE 15, e0238569. https://doi.org/10.1371/journal.pone.0238569 (2020).
Olstad, B. H. & Zinner, C. Validation of the Polar OH1 and M600 optical heart rate sensors during front crawl swim training. PLoS ONE 15, e0231522. https://doi.org/10.1371/journal.pone.0231522 (2020).
Reece, J. D., Bunn, J. A., Choi, M. & Navalta, J. W. Assessing heart rate using consumer technology association standards. Technologies 9, 46. https://doi.org/10.3390/technologies9030046 (2021).
Muggeridge, D. J. et al. Measurement of heart rate using the Polar OH1 and Fitbit Charge 3 wearable devices in healthy adults during light, moderate, vigorous, and sprint-based exercise: Validation study. JMIR Mhealth Uhealth 9, e25313. https://doi.org/10.2196/25313 (2021).
Wallen, M. P., Gomersall, S. R., Keating, S. E., Wisloff, U. & Coombes, J. S. Accuracy of heart rate watches: Implications for weight management. PLoS ONE 11, e0154420. https://doi.org/10.1371/journal.pone.0154420 (2016).
Shumate, T. et al. Validity of the Polar Vantage M watch when measuring heart rate at different exercise intensities. PeerJ 9, e10893. https://doi.org/10.7717/peerj.10893 (2021).
Stove, M. P., Haucke, E., Nymann, M. L., Sigurdsson, T. & Larsen, B. T. Accuracy of the wearable activity tracker Garmin Forerunner 235 for the assessment of heart rate during rest and activity. J. Sports Sci. 37, 895–901. https://doi.org/10.1080/02640414.2018.1535563 (2019).
Montes, J., Tandy, R., Young, J., Lee, S.-P. & Navalta, J. A comparison of multiple wearable technology devices heart rate and step count measurements during free motion and treadmill based measurements. Int. J. Kinesiol. Sports Sci. 7, 30–39 (2019).
Shcherbina, A. et al. Accuracy in wrist-worn, sensor-based measurements of heart rate and energy expenditure in a diverse cohort. J. Pers. Med. https://doi.org/10.3390/jpm7020003 (2017).
Stahl, S. E., An, H. S., Dinkel, D. M., Noble, J. M. & Lee, J. M. How accurate are the wrist-based heart rate monitors during walking and running activities? Are they accurate enough? BMJ Open Sport Exerc. Med. 2, e000106. https://doi.org/10.1136/bmjsem-2015-000106 (2016).
Thiebaud, R. S. et al. Validity of wrist-worn consumer products to measure heart rate and energy expenditure. Digit. Health 4, 2055207618770322. https://doi.org/10.1177/2055207618770322 (2018).
Gillinov, S. et al. Variable accuracy of wearable heart rate monitors during aerobic exercise. Med. Sci. Sports Exerc. 49, 1697–1703 (2017).
Spierer, D. K., Rosen, Z., Litman, L. L. & Fujii, K. Validation of photoplethysmography as a method to detect heart rate during rest and exercise. J. Med. Eng. Technol. 39, 264–271. https://doi.org/10.3109/03091902.2015.1047536 (2015).
Khushhal, A. et al. Validity and reliability of the Apple Watch for measuring heart rate during exercise. Sports Med. Int. Open 1, E206–E211. https://doi.org/10.1055/s-0043-120195 (2017).
Sanudo, B., De Hoyo, M., Munoz-Lopez, A., Perry, J. & Abt, G. Pilot study assessing the influence of skin type on the heart rate measurements obtained by photoplethysmography with the Apple Watch. J. Med. Syst. 43, 195. https://doi.org/10.1007/s10916-019-1325-2 (2019).
Hermand, E., Cassirame, J., Ennequin, G. & Hue, O. Validation of a photoplethysmographic heart rate monitor: Polar OH1. Int. J. Sports Med. 40, 462–467. https://doi.org/10.1055/a-0875-4033 (2019).
Dooley, E. E., Golaszewski, N. M. & Bartholomew, J. B. Estimating accuracy at exercise intensities: A comparative study of self-monitoring heart rate and physical activity wearable devices. JMIR Mhealth Uhealth 5, e34. https://doi.org/10.2196/mhealth.7043 (2017).
Dondzila, C. J., Lewis, C. A., Lopez, J. R. & Parker, T. M. Congruent accuracy of wrist-worn activity trackers during controlled and free-living conditions. Int. J. Exerc. Sci. 11, 575–584 (2018).
Evenson, K. R., Goto, M. M. & Furberg, R. D. Systematic review of the validity and reliability of consumer-wearable activity trackers. Int. J. Behav. Nutr. Phys. Act 12, 159. https://doi.org/10.1186/s12966-015-0314-1 (2015).
Bunn, J. A., Navalta, J. W., Fountaine, C. J. & Reece, J. D. Current state of commercial wearable technology in physical activity monitoring 2015–2017. Int. J. Exerc. Sci. 11, 503–515 (2018).
Carrier, B., Barrios, B., Jolley, B. D. & Navalta, J. W. Validity and reliability of physiological data in applied settings measured by wearable technology: A rapid systematic review. Technologies 8, 70. https://doi.org/10.3390/technologies8040070 (2020).
Physical Activity Monitoring for Heart Rate. (Consumer Technology Association, 2018).
Bent, B., Goldstein, B. A., Kibbe, W. A. & Dunn, J. P. Investigating sources of inaccuracy in wearable optical heart rate sensors. NPJ Digit. Med. 3, 18. https://doi.org/10.1038/s41746-020-0226-6 (2020).
Muhlen, J. M. et al. Recommendations for determining the validity of consumer wearable heart rate devices: Expert statement and checklist of the INTERLIVE Network. Br. J. Sports Med. 55, 767–779. https://doi.org/10.1136/bjsports-2020-103148 (2021).
Welk, G. J. et al. Standardizing analytic methods and reporting in activity monitor validation studies. Med. Sci. Sports Exerc. 51, 1767–1780. https://doi.org/10.1249/MSS.0000000000001966 (2019).
Carrier, B. & Navalta, J. W. Data analysis processes and techniques for validation of wearable technology: An example. Topics Exerc. Sci. Kinesiol. 3, 10 (2022).
Chowdhury, S. S., Hyder, R., Bin Hafiz, M. S. & Haque, M. A. Real-time robust heart rate estimation from wrist-type PPG signals using multiple reference adaptive noise cancellation. IEEE J. Biomed. Health 22, 450–459. https://doi.org/10.1109/Jbhi.2016.2632201 (2018).
Montes, J., Young, J. C., Tandy, R. & Navalta, J. W. Reliability and validation of the Hexoskin wearable bio-collection device during walking conditions. Int. J. Exerc. Sci. 11, 806–816 (2018).
Takayama, F. & Mori, H. The relationship between 24 h ultramarathon performance and the “big three” strategies of training, nutrition, and pacing. Sports https://doi.org/10.3390/sports10100162 (2022).
Byun, K. et al. Investigating how auditory and visual stimuli promote recovery after stress with potential applications for workplace stress and burnout: Protocol for a randomized trial. Front. Psychol. 13, 897241. https://doi.org/10.3389/fpsyg.2022.897241 (2022).
Reedman, S. E. et al. Study protocol for Running for health (Run4Health CP): A multicentre, assessor-blinded randomised controlled trial of 12 weeks of two times weekly Frame Running training versus usual care to improve cardiovascular health risk factors in children and youth with cerebral palsy. BMJ Open 12, e057668. https://doi.org/10.1136/bmjopen-2021-057668 (2022).
Gil, D. et al. Validity of average heart rate and energy expenditure in Polar OH1 and Verity Sense while self-paced running. In Int J Exerc Sci: Conf Proc, vol. 14, 27 (2022).
Bodell, N. et al. Validity of average heart rate and energy expenditure in Polar OH1 and Verity Sense while self-paced walking. In Int J Exerc Sci: Conf Proc, vol. 14, 69 (2022).
Fullmer, W. B. et al. Validity of average heart rate and energy expenditure in Polar armband devices while self-paced biking. In Int J Exerc Sci: Conf Proc, vol. 14, 26 (2022).
Pinedo-Jauregi, A., Garcia-Tabar, I., Carrier, B., Navalta, J. W. & Camara, J. Reliability and validity of the Stryd Power Meter during different walking conditions. Gait Posture 92, 277–283. https://doi.org/10.1016/j.gaitpost.2021.11.041 (2022).
Riebe, D. et al. Updating ACSM’s recommendations for exercise preparticipation health screening. Med. Sci. Sports Exerc. 47, 2473–2479. https://doi.org/10.1249/MSS.0000000000000664 (2015).
Faul, F., Erdfelder, E., Lang, A. G. & Buchner, A. G*Power 3: A flexible statistical power analysis program for the social, behavioral, and biomedical sciences. Behav. Res. Methods 39, 175–191. https://doi.org/10.3758/bf03193146 (2007).
Nes, B. M., Janszky, I., Wisloff, U., Stoylen, A. & Karlsen, T. Age-predicted maximal heart rate in healthy subjects: The HUNT fitness study. Scand. J. Med. Sci. Sports 23, 697–704. https://doi.org/10.1111/j.1600-0838.2012.01445.x (2013).
Gilgen-Ammann, R., Schweizer, T. & Wyss, T. RR interval signal quality of a heart rate monitor and an ECG Holter at rest and during exercise. Eur. J. Appl. Physiol. 119, 1525–1532. https://doi.org/10.1007/s00421-019-04142-5 (2019).
Speer, K. E., Semple, S., Naumovski, N. & McKune, A. J. Measuring heart rate variability using commercially available devices in healthy children: A validity and reliability study. Eur. J. Investig. Health Psychol. Educ. 10, 390–404. https://doi.org/10.3390/ejihpe10010029 (2020).
The jamovi project v. 2.3 (2022).
Schuirmann, D. J. A comparison of the two one-sided tests procedure and the power approach for assessing the equivalence of average bioavailability. J. Pharmacokinet. Biopharm. 15, 657–680. https://doi.org/10.1007/BF01068419 (1987).
Navalta, J. W. et al. Reliability of trail walking and running tasks using the Stryd Power Meter. Int. J. Sports Med. 40, 498–502. https://doi.org/10.1055/a-0875-4068 (2019).
Author information
Contributions
Study conception and design: J.W.N., D.W.D., B.C., J.W.M., J.C., M.F., M.M.L., M.D. Data collection and reduction: J.W.N., D.W.D., E.M.M., B.C., N.G.B., J.W.M., J.C., M.F., M.M.L., M.D. Writing manuscript: J.W.N. Editing manuscript: D.W.D., E.M.M., B.C., N.G.B., J.W.M., J.C., M.F., M.M.L., M.D. All authors read and approved the final manuscript.
Ethics declarations
Competing interests
The authors declare no competing interests.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Navalta, J.W., Davis, D.W., Malek, E.M. et al. Heart rate processing algorithms and exercise duration on reliability and validity decisions in biceps-worn Polar Verity Sense and OH1 wearables. Sci Rep 13, 11736 (2023). https://doi.org/10.1038/s41598-023-38329-w