1 Introduction

Designing vibrotactile (VT) icons has been the subject of much research in recent decades. VT patterns can help convey abstract information (e.g., message urgency) or emotions, provide cues for navigation, or correct user movement in physical rehabilitation and skill-training scenarios [5]. To identify salient VT parameters or design a set of distinct vibrations for a use case, haptics researchers often ask users to rate or group VT patterns according to their similarity [3, 6, 7, 9]. For example, Park and Choi evaluated pairwise similarities among one carrier and seven amplitude-modulated sinusoidal vibrations and showed that the envelope waveform is a salient parameter for VT design [7]. Ternes and MacLean assessed the perceptual structure of 84 rhythmic VT patterns by asking 6 experienced users to group the patterns according to their similarity [9]. They found that the length of VT pulses, their evenness, and their frequency defined pattern similarity. Because in-lab perceptual similarity studies are time-consuming and expensive, researchers often use a small stimulus set and/or few users.

Online crowdsourcing platforms such as Amazon Mechanical Turk (MTurk) allow running large-scale perception studies at a fraction of the time and cost [4], but little is known about how the results differ from in-lab experiments in haptics. Haptic studies often need specialized and/or calibrated hardware and focused attention, both of which are lacking in crowdsourced settings. While smartphones increasingly have vibration actuators with improved rendering fidelity, the diversity of phone actuators [1], materials, internal make-up, and software can lead to differences in VT perception [8]. Despite these challenges, a few crowdsourced VT studies have been reported in recent years. The first study, by Schneider et al., compared lab and crowdsourced ratings for 10 VT signals rendered on a voice-coil actuator and Android phones [8]. Android ratings for duration and pleasantness were statistically equivalent to the voice-coil data for the majority of the patterns, but energy, speed, and roughness ratings showed mixed results. Demers et al. recently proposed a probabilistic method for efficiently sampling the VT engineering space using similarity data collected from Android users on MTurk [2]. These studies highlight the need for crowdsourcing in haptics, yet they do not assess the validity of crowdsourced data for perceptual similarity assessment tasks. Also, both studies focus on Android devices.

In this paper, we investigate two questions: 1) How comparable are VT similarity ratings obtained from crowdsourced and lab studies? and 2) How comparable are results from Android and iOS smartphones given their distinct hardware and software for producing VT stimuli? In contrast to typical VT similarity studies, we do not aim to identify the underlying VT perceptual parameters. Instead, we focus on testing the consistency of the data obtained from different device platforms and experiment settings.

We conducted online and lab experiments with iOS and Android devices to answer the above questions. Forty-eight participants rated the pairwise similarities of 14 VT patterns varying in rhythm and amplitude in four experiments (12 participants each). Pairwise similarities collected from the MTurk and lab studies were highly correlated (RQ1), but the iOS data showed a stronger correlation (\(r_s=0.90\), \(p<0.0001\)) than the Android data (\(r_s=0.68\), \(p<0.0001\)). Furthermore, the iOS and Android data were highly correlated with each other (RQ2). To provide qualitative insights into these correlations, we obtained the perceptual spaces of the four experiments using dimensionality reduction. Our results suggest that VT amplitude and even rhythms were perceived consistently across the four experiments, but the perception of uneven VT rhythms varied on Android devices. Recruiting iOS users was faster than recruiting Android users, which suggests that large-scale studies on iOS devices are viable. We discuss our findings and present future directions for haptic crowdsourcing.

2 Stimuli and Apparatus

We designed 14 VT patterns and iOS and Android applications for our studies.

VT Patterns. We used a subset of the VT patterns proposed by Ternes and MacLean [9]. They designed 21 rhythms consisting of 62.5 ms, 125 ms, and 375 ms pulses. They also modulated the VT frequency (200, 300 Hz) and amplitude (full, half) to obtain 84 patterns. Each rhythm was 500 ms long and repeated four times in a 2-second stimulus. They divided the rhythms into even and uneven groups depending on whether each part of the pattern feels regularly repeating (e.g., R1) or not (e.g., R21).

From the aforementioned patterns, we selected seven rhythms with different pulse lengths and evenness that are perceptually distinct in their reported perceptual space [9] (Fig. 1). We modulated the amplitude of each rhythm to be either the device's upper amplitude limit or half of it. Because the Android vibration API does not offer frequency modulation, we measured an Android phone's average vibration frequency for the patterns (\(\mu = 159.34\) Hz; Samsung Galaxy S10) and set a similar frequency for the iOS patterns (\(\mu = 149.87\) Hz).
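As an illustration of the stimulus structure described above, the following minimal sketch (in Python, for readability) builds the flat timing/amplitude arrays that Android's VibrationEffect.createWaveform(long[], int[], int) expects. The example rhythm and helper names are our own; the actual patterns in Fig. 1 may tile their pulses differently.

```python
# Minimal sketch: expand one 500-ms rhythm into the timing/amplitude
# arrays consumed by Android's VibrationEffect.createWaveform.
# The example rhythm is hypothetical; it only reuses pulse lengths
# from Ternes and MacLean's set (62.5, 125, 375 ms).

FULL, HALF = 255, 128      # Android waveform amplitudes range over 0-255
RHYTHM_MS = 500            # one rhythm spans 500 ms
REPEATS = 4                # repeated four times -> 2-second stimulus

# (pulse_ms, gap_ms) pairs tiling one 500-ms rhythm (hypothetical)
example_rhythm = [(125, 125), (125, 125)]

def to_waveform(rhythm, amplitude):
    """Return flat (timings, amplitudes) lists for one 2-s stimulus."""
    assert sum(p + g for p, g in rhythm) == RHYTHM_MS
    timings, amplitudes = [], []
    for _ in range(REPEATS):
        for pulse_ms, gap_ms in rhythm:
            timings += [pulse_ms, gap_ms]
            amplitudes += [amplitude, 0]   # amplitude 0 renders silence
    return timings, amplitudes

timings, amps = to_waveform(example_rhythm, HALF)
assert sum(timings) == REPEATS * RHYTHM_MS   # 2000 ms in total
```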

Fig. 1. Each row represents a 2-second VT pattern in our study. Gray and white denote vibration and silence. The red lines mark the 500-ms rhythms.

Fig. 2. Acceleration plots of exemplary VT patterns (R5 and R21) played by the iPhone 11 Pro Max and Samsung Galaxy S10, and their power spectra.

Android and iOS Applications. We designed functionally identical applications for iOS and Android with four sequential screens: initial, training, VT comparison, and submission. The initial screens provided the consent information and collected the participant's age, gender, nationality, and use of a phone cover. The user also rated their haptics expertise on a 5-point scale from no experience to expert. Next, the applications instructed the user to remove their phone case, hold the phone with their left hand, and use their right index finger to interact with it. The training screen showed 14 buttons randomly assigned to the 14 VT patterns. The user could press the buttons to feel the assigned vibrations, and the applications enabled proceeding to the VT comparison screen only after all the buttons had been pressed at least once. The VT comparison screen allowed users to evaluate the similarity of 92 pairs of VT patterns: 91 pairs covered every unique combination of the 14 patterns (\(_{14}C_{2}\)), and we added one pair comparing two identical vibrations (R1) as an attention test. The order of the 92 pairs was randomized per user. For each pair, the user had to play both vibrations before rating their similarity on a sliding scale from 0 (totally different) to 100 (the same). After the VT comparison screen, the user could provide free-form comments and submit the results by pressing a submit button. The demographic data, similarity ratings, and timestamps collected in the experiment were stored in an external Google Firestore database.
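A minimal sketch of the trial-generation logic described above follows, assuming the rhythm IDs of Fig. 1; the stimulus labels and the amplitude chosen for the attention pair are our placeholders, not the applications' actual implementation.

```python
import itertools
import random

# 14 stimuli: 7 rhythms (Fig. 1) x 2 amplitudes
STIMULI = [f"{r}-{a}"
           for r in ("R1", "R5", "R6", "R9", "R12", "R16", "R21")
           for a in ("full", "half")]

def make_trials(rng: random.Random):
    """91 unique pairs (14C2) plus one identical R1 pair as attention test."""
    pairs = list(itertools.combinations(STIMULI, 2))  # C(14, 2) = 91
    pairs.append(("R1-full", "R1-full"))  # attention test (amplitude assumed)
    rng.shuffle(pairs)                    # per-user random order
    return pairs

trials = make_trials(random.Random())
assert len(trials) == 92
```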

To reduce noise in the data, the applications returned to the training screen if the user spent more than 40 min on the training and main screens. Also, the applications did not proceed past the initial screens if the user's phone did not support the vibration API specifications of our study.

3 Similarity Rating Data Collection

We ran four between-subjects experiments with a total of 48 participants, crossing two platforms (iOS, Android) and two experimental settings (crowdsourced, lab).

Crowdsourced Experiment. We recruited 27 users (14 iOS, 13 Android) on Amazon Mechanical Turk. To be eligible, MTurk users must have completed 5000 or more tasks on the platform with a success rate of 98% or higher. The experiment took 10–20 min, and we offered US$3 as compensation. Data from 3 participants (2 iOS, 1 Android) who rated the attention-test pair below 70 were excluded, resulting in 12 participants per platform. The included participants were 24–70 years old (35.4 ± 12 years, 6F/6M) on iOS and 26–43 years old (33.4 ± 4.5 years, 3F/9M) on Android. The participants were from the US (10 iOS, 6 Android), India (5 Android), and Brazil (2 iOS, 1 Android). All the recorded device models were unique, except for the Galaxy S10+, iPhone XR, and iPhone 11 Pro Max, each of which was used twice.

Lab Experiment. We recruited 25 participants (13 iOS, 12 Android) at the Gwangju Institute of Science and Technology through e-mail and flyers posted on digital and physical bulletin boards. No participant reported a tactile disorder, and each participant read an instruction document, filled out a consent form, and completed the task with the same applications as the MTurk participants. The experiment took 30 min on average, and we offered US$10 as compensation. We excluded data from one iOS participant who failed the attention test, resulting in 12 participants per platform. The included users were 23–31 years old (26.2 ± 2.73 years, 6F/6M) on an iPhone 11 Pro Max and 20–27 years old (24.6 ± 1.75 years, 5F/7M) on a Galaxy S10. No phone case was used in the study. The participants listened to pink noise on headphones to mask any sounds generated by the VT patterns.

4 Results

Fig. 3. Stress values of non-metric MDS over 14 dimensions.

We converted the similarity ratings into dissimilarity values and averaged them over participants in each of the four conditions. Because the data are ordinal, we applied Spearman's rank correlation coefficient (Spearman's \(\rho\)) to compare the platforms and experimental settings, and we used non-metric multidimensional scaling (nMDS) to visualize the perceptual spaces. These methods have been used to compare perceptual spaces from pairwise similarity rating tasks [10].
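A minimal sketch of this comparison, assuming each condition's ratings are stored as a participants-by-pairs array on the 0–100 scale; the random arrays below are placeholders standing in for two conditions' data.

```python
import numpy as np
from scipy.stats import spearmanr

def mean_dissimilarity(ratings):
    """ratings: (participants, 91) similarity scores on a 0-100 scale.
    Convert to dissimilarities and average over participants."""
    return (100.0 - np.asarray(ratings, dtype=float)).mean(axis=0)

# Placeholder arrays standing in for two conditions (e.g., iOS MTurk vs. lab)
rng = np.random.default_rng(0)
ios_mturk = rng.uniform(0, 100, size=(12, 91))
ios_lab = rng.uniform(0, 100, size=(12, 91))

rho, p = spearmanr(mean_dissimilarity(ios_mturk),
                   mean_dissimilarity(ios_lab))
print(f"Spearman rho = {rho:.2f}, p = {p:.3g}")
```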

In the nMDS analysis, 3–4 dimensions adequately capture the perceptual dissimilarities of the VT stimuli, as shown by the stress plots (Fig. 3), but we plot the perceptual spaces in 2D for better visibility (Fig. 4). The perceptual spaces from the four experiments show similar trends. One clear trend is that VT rhythm is more salient than amplitude; the two amplitudes of the same rhythm are always close to each other. Also, the four perceptual spaces show similar VT clusters and configurations. Specifically, R1 and R16 consistently appear close to each other, whereas R5 and R6 are mapped far apart. Similarly, R9 appears equidistant from R1-R16 and R6, while R21 appears between R5 and R6. Only R12's location varies notably across conditions. Below, we analyze the VT similarities in relation to our research questions (RQ1, RQ2).
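The embedding and stress computation can be sketched as follows, filling the 91 averaged pair dissimilarities into a symmetric 14\(\times\)14 matrix. We use scikit-learn's MDS with metric=False as one possible implementation; its stress_ value is not identical to Kruskal's stress-1 in all versions, so this illustrates the analysis rather than reproducing Fig. 3 exactly.

```python
import numpy as np
from sklearn.manifold import MDS

def nmds_embedding(pair_dissims, n_stimuli=14, n_components=2, seed=0):
    """Embed 91 averaged pair dissimilarities (upper-triangular order)
    with non-metric MDS; returns the coordinates and the stress value."""
    D = np.zeros((n_stimuli, n_stimuli))
    D[np.triu_indices(n_stimuli, k=1)] = pair_dissims
    D = D + D.T                      # symmetric, zero diagonal
    mds = MDS(n_components=n_components, metric=False,
              dissimilarity="precomputed", random_state=seed)
    coords = mds.fit_transform(D)
    return coords, mds.stress_

# Stress over candidate dimensionalities (cf. Fig. 3), on placeholder data
pair_dissims = np.random.default_rng(1).uniform(10, 90, size=91)
for k in range(1, 5):
    _, stress = nmds_embedding(pair_dissims, n_components=k)
    print(f"{k}D stress: {stress:.3f}")
```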

Fig. 4. Perceptual spaces of the 14 vibrotactile patterns.

4.1 RQ1. How Comparable Are VT Similarity Ratings Obtained from Crowdsourced and Lab Studies?

For the iOS platform, the Spearman correlation between the averaged pairwise dissimilarities in the crowdsourced and lab datasets shows a very strong correspondence (\(r_s=0.90, p<0.0001\)). Also, the two perceptual spaces are nearly identical, with minor variations in the proximity of the VT stimuli. The only notable difference is that the distance between R9 and R12-R21 is larger in the lab experiment. For the Android platform, the correlation between the lab and crowdsourced dissimilarities is strong (\(r_s=0.68, p<0.0001\)). The two perceptual spaces show overall alignment with some inconsistencies; the relationships among R9, R12, and R21 vary across the two spaces. The Android MTurk participants perceived R12 and R21 to be similar to each other and different from R9, whereas the lab participants perceived R9 and R12 to be very similar and R21 to be dissimilar to the pair. The strong correlations between the lab and online settings and the conformity of the perceptual spaces, especially on iOS devices, suggest that MTurk similarity ratings can yield results comparable to lab data.

4.2 RQ2. How Comparable Are Results from Android and iOS Smartphones Given Their Distinct VT Hardware and Software?

To answer this question, we compare the Android and iOS data for each experimental setting (MTurk, lab). For the MTurk setting, the Spearman correlation between the iOS and Android data is strong (\(r_s=0.78, p<0.0001\)), and we observe comparable configurations in the two perceptual spaces. The main difference is that the Android participants found R12 and R21 to be closer to R5 and farther from R9 than the iOS participants did. Also, the iOS ratings place R12 and R21 closer to R6 than the Android ratings do. For the lab setting, the Spearman correlation between the two datasets is strong (\(r_s=0.65, p<0.0001\)). The perceptual spaces of iOS and Android show some differences. The Android users rated R9 and R12 as highly similar and regarded R21 as dissimilar to the pair. In contrast, the iOS users perceived R12 and R21 as notably similar and R9 as different from the two stimuli. Also, the Android lab data show a higher similarity between R1 and R16 than the iOS lab data.

5 Discussion and Conclusion

We investigated the correspondence between data obtained from lab and crowdsourced VT studies. Thus, instead of analyzing the underlying perceptual parameters of vibrations, we discuss which vibration parameters are robust under variations in device platform and experimental setting. In all four studies, rhythm dominates over amplitude in VT similarity evaluations, with participants rating patterns with the same rhythm as very similar despite the modulation of their amplitudes. The lengths of VT pulses influence the configuration of the perceptual space, with patterns with short pulses (e.g., R5) appearing distant from those with longer pulses (e.g., R6, R9). Also, the patterns with mixed pulse lengths (R12, R21) tend to appear between those with long and short pulses. Finally, even vibrations (e.g., R1, R16) cluster together in all perceptual spaces, whereas uneven vibrations lie away from this cluster.

iOS data from the crowdsourced and lab studies were better aligned than the Android data. The similarity of the iOS results could be attributed to the software, hardware, and build parity of iPhones. In contrast, the diversity of Android devices may have contributed to the lower correspondence between the Android perceptual spaces. Upon comparing the stimuli generated by our iOS and Android lab devices, we noticed that the Android device's VT fall-off time varied across rhythm patterns and was nearly twice as long as that of the iOS device for some patterns (e.g., R21). This variation may have caused the disparities in the Android lab perceptual space. These disparities were smaller in the crowdsourced Android data, possibly due to a washout effect from averaging over diverse devices. Surprisingly, the recruitment time for the iOS crowdsourced study was almost half that of the crowdsourced Android experiment. This difference could be due to the Android version requirements in our experiment. Nevertheless, our data suggest that crowdsourcing on iOS devices is viable on MTurk.

Our research has a number of limitations. First, we did not modulate the frequency of the stimuli due to Android's inability to render this parameter. Second, our studies have a small sample size due to the difficulty of recruiting lab participants during the COVID-19 pandemic. Third, we chose a subset of the stimuli from Ternes and MacLean's work [9], as the number of pairwise comparisons grows quadratically with additional patterns. Lastly, the age range of the participants varied across the experiments, which may have affected the results.

Our work shows promising results for crowdsourcing perceptual similarity studies on iOS and Android devices. We hope that future work further establishes the validity of data obtained from haptic crowdsourcing with different sets of VT stimuli and perceptual tasks.