
1 Introduction

The action of shaking a hollow container to gain insight into its contents is a natural reflex [6]. This way of interacting can be of great value in Virtual and Augmented Reality applications, e.g., for the manipulation of virtual instruments such as maracas or to gauge the number of incoming messages on a smartphone [11]. When interacting with a hollow container that houses one or multiple objects inside, we take advantage of multiple senses to identify various characteristics of the hidden objects, e.g., their shape, weight, size, and material, all of which affect the way they interact with the hollow object when shaken.

This paper introduces a multi-modal method for rendering multiple virtual moving objects inside a hollow container, combining audio and haptic feedback for applications in Virtual and Augmented Reality. Starting from an acceleration-based model of the moving objects' interaction with the hollow container, we generate the interactions to be rendered through the audio and/or haptic output systems. We call the proposed approach “Haptic Rattle” and performed a user study to assess it.

2 Related Work

The perceptual and interaction aspects of manipulating a hollow container filled with small moving objects have been investigated in the literature. Sekiguchi et al. [7] evaluated the perception of feeling a moving object inside a larger container using two solenoids. They assessed the users' ability to discriminate between four interaction models representing a box with an object inside, with only one model faithfully simulating such an interaction. Results showed that users were able to effectively discriminate each interaction model [8]. Tanaka et al. [10] assessed the ability of users to discriminate changes in the inner parameters of their model for rendering the presence of moving objects in a container, using voice-coil motors. Similarly, Yamamoto et al. [12] compared the perception of a real and a virtual model for solid and liquid content inside a container, finding no significant difference between the two. Along another line, the Gravity Grabber of Minamizawa et al. used two motors to actuate a belt onto the user's fingertip, providing the sensation of holding a glass with liquid inside [1]. Such an approach ignores the proprioceptive/kinesthetic aspects of holding a mass, but still offers a reliable feeling of weight and the ability to discriminate between objects with different levels of liquid [2]. Other works aim at improving the realism of simulating the presence of one or multiple objects inside a container using the combination of a voice-coil motor and impact actuators [4], the high-speed change of rotational inertia [9], a physics-based haptic vibration interface [11], or a library [5].

While effective, these works focus on rendering the motion of a single object inside a hollow container. On the other hand, Plaisier and Smeets [6] performed an experiment on the ability to perceive the numerosity of objects inside a container, using a tangible box filled with real spheres. Their experiment assessed how many spheres participants could accurately detect using combined audio-haptic or audio-only cues, considering between one and five spheres. Results showed that participants accurately detected up to three spheres but underestimated the numerosity when presented with four or five. Finally, results were significantly more accurate when participants had both audio and haptic feedback than when they were provided with audio feedback only.

With respect to the above-mentioned works, we present a simplified physics-based interaction model that can be easily evaluated at runtime for an arbitrary number of virtual hidden moving objects. To assess the viability of our rendering model, we carry out a user study inspired by the work of Plaisier and Smeets [6].

3 Methods

When interacting with a container filled with one or more moving objects, the manipulation of the hollow container provides various types of feedback, such as the feeling of the objects hitting or rolling on the inner surface of the container. If the container is rigid, this feedback has audio and haptic components that are characterized by many factors, such as the magnitude of the container's movement and the materials and size of the moving objects. The movement the user imparts on the container is essential to this perception, as it directly causes the interactions of the moving objects and thus the resulting audio-haptic sensations. To render such interactions, we propose an interactive acceleration-based model combined with multi-modal audio-haptic rendering.

3.1 Apparatus

As a proof of concept, we use a 3D-printed hollow cylinder whose dimensions allow comfortable grasping. It contains two haptic voice-coil actuators (HapCoil-One, Actronika, FR), one on each base, and an MPU6050 IMU in the middle, featuring a 3-axis gyroscope and a 3-axis accelerometer sampled at 1 kHz (see Fig. 1a). The voice coils are connected to a TPA3116D2 dual-channel amplifier, powered by a 5 V power supply, and a stereo 24-bit/96 kHz MOTU sound card.

Fig. 1. (a) Our cylindrical prop, housing an IMU and two voice-coil actuators. (b) Experimental setup for the manipulation of the device.

The actuators are used to simulate the interaction of virtual spheres inside the container, while the IMU registers the cylinder's accelerations when it is shaken by the user. The haptic actuators are inserted symmetrically inside the cylinder, to avoid favoring one handedness over the other. The IMU is mounted flat in the middle of the cylinder to minimize the impact of rotational movements.

3.2 Acceleration-Based Rendering Model

We chose a simple acceleration-based model to enable the fast simulation of an arbitrary number of moving objects inside the container. It registers the data from the IMU and then provides the acceleration parameters for the multi-modal rendering. The acceleration data is filtered using a low-pass filter with cutoff frequency \(\omega _0 = 60\) Hz and then integrated to estimate the movement velocity. When the acceleration exceeds a threshold \(a_t = 0.5\) m/s\(^2\), the magnitude and direction of the movement are transmitted to the rendering algorithm. The frequency \(\omega _0\) and the threshold \(a_t\) were tuned on empirical data from pilot experiments, so as to correctly detect when the user is shaking the cylinder.

In the world reference frame, the forces acting on the system are gravity, the inertia of each element, and the force applied by the user. The force applied to each sphere is induced by the acceleration imparted to the cylinder by the user's movement. Each sphere follows the simple equation \( \vec {F} + \vec {g} + \vec {i} = m\vec {a}\), with \(\vec {F}\) being the force induced by the acceleration the user applies to the cylinder, \(\vec {g}\) the gravitational term, \(\vec {i}\) the inertial component, m the mass, and \(\vec {a}\) the acceleration of the sphere. The gravitational and inertial interactions are handled through the physics engine of Unity 3D, to which we add the forces induced by the user's movements. In the cylinder's reference frame, the sphere follows the movement applied by the user with a small delay induced by inertia, which depends on the mass of the object. Inertia has an impact on the realism of the interaction [4]; however, the simulated spheres are small objects with a small mass, so the induced inertia is small as well. For this reason, and to reduce the complexity of the algorithm, we considered inertia negligible.
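To make this detection step concrete, the following is a minimal Python sketch of the filter-integrate-threshold loop, using the \(\omega _0 = 60\) Hz cutoff and \(a_t = 0.5\) m/s\(^2\) threshold given above. The first-order IIR filter, the gravity-compensated input, and all names are our illustrative assumptions, not the authors' Unity 3D implementation.

```python
import numpy as np

FS = 1000.0    # IMU sampling rate (Hz), as in Sect. 3.1
F_CUT = 60.0   # low-pass cutoff frequency (Hz), omega_0 in the text
A_T = 0.5      # shake-detection threshold (m/s^2), a_t in the text

# First-order IIR low-pass coefficient for cutoff F_CUT at rate FS.
DT = 1.0 / FS
ALPHA = DT / (DT + 1.0 / (2.0 * np.pi * F_CUT))

def process_imu_stream(accel_samples):
    """Filter raw accelerations, integrate them into a velocity
    estimate, and report shake events above the threshold.

    accel_samples: iterable of 3-vectors (m/s^2), gravity removed
    (an assumption on our part).
    Yields (velocity_estimate, shake_direction) whenever |a| > A_T.
    """
    a_filt = np.zeros(3)
    velocity = np.zeros(3)
    for a_raw in accel_samples:
        # Low-pass filter the raw acceleration sample.
        a_filt += ALPHA * (np.asarray(a_raw, dtype=float) - a_filt)
        # Integrate to obtain the movement velocity.
        velocity += a_filt * DT
        mag = np.linalg.norm(a_filt)
        if mag > A_T:
            # Magnitude and direction are passed on to the renderer.
            yield velocity.copy(), a_filt / mag
```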

Vibrations and sounds generated by the interaction of the spheres with the cylinder are highly dependent on the inner properties of the cylinder itself. The vibratory impact of a small object against a surface can be simplified as the combination of a sinusoidal signal and an exponential decay, inspired by the model of [3] (see Eq. (1)). The resulting low-frequency signal depends on a decay constant B and a frequency \(\omega \); both depend on the modulus of elasticity of the material and on the density and geometry of the object,

$$\begin{aligned} Q(t) = e^{-Bt}\sin (\omega t) \end{aligned}$$
(1)

If we consider objects with a regular shape, this equation allows tuning the interaction for any material by recording repeated impacts and fitting the sinusoidal waveform and decay to match the recording. This signal corresponds to the output feedback and only needs to be modulated by the object velocity \(\vec {\nu }\) before being displayed through the audio and/or haptic output systems. In our implementation, we used voice-coil actuators for haptic feedback and hi-fi headphones for audio feedback. The signal is displayed through two audio channels, which are independently controlled depending on the direction/location of the impact. In this way, when the user shakes the container, the system detects its movement and effectively simulates the presence of an arbitrary number of spheres moving inside. Figure 2 shows a diagram of the rendering process. The IMU acceleration data are converted into an estimated velocity vector \(\vec {\nu }\). \(f(\nu )\) is a linear function computing a gain factor for each sphere, based on the value of \(\nu \) and the configuration of the spheres inside the container. This factor is used to modulate the audio and haptic signals. The audio signal \(S_a(t)\) is then computed from a signal R(t) of recorded impact sounds of the desired objects, e.g., wooden spheres, with multiple frequencies and intensities of impact,

Fig. 2. Diagram of the rendering process: from the data sensed by the IMU to the feedback output.

$$\begin{aligned} S_a(t) = f(\nu )R(t) \end{aligned}$$
(2)

On the other hand, the parameters \((B,\omega )\) are chosen according to a wood material (\(B = 154\) s\(^{-1}\); \(\omega = 67\) rad/s) to compute Q(t) in Eq. (1), on which the haptic signal \(S_h(t)\) is based,

$$\begin{aligned} S_h(t) = f(\nu )Q(t) \end{aligned}$$
(3)

These signals are then duplicated according to the number of spheres n and displayed according to the feedback condition considered, with a randomized delay between two empirically chosen bounds (0 ms and 100 ms) to give a natural effect of collisions.
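As an illustration of Eqs. (1)–(3) and this duplication step, here is a hedged Python sketch. The `recorded_impact` array stands in for R(t), and the linear `gain` function is our simplified stand-in for \(f(\nu )\), whose exact dependence on the sphere configuration is not detailed above.

```python
import numpy as np

FS_OUT = 96000           # output sample rate (Hz), matching the sound card
B, OMEGA = 154.0, 67.0   # wood parameters from the text (s^-1, rad/s)

def impact_q(duration=0.1):
    """Eq. (1): decaying sinusoid modelling one low-frequency impact."""
    t = np.arange(0.0, duration, 1.0 / FS_OUT)
    return np.exp(-B * t) * np.sin(OMEGA * t)

def gain(nu, k=1.0):
    """Stand-in for f(nu): a plain linear gain on the speed estimate."""
    return k * np.linalg.norm(nu)

def render_impacts(nu, n_spheres, recorded_impact, rng=None):
    """Eqs. (2)-(3): modulate the audio/haptic templates by f(nu), then
    duplicate them n times with random 0-100 ms offsets (Sect. 3.2)."""
    rng = rng or np.random.default_rng()
    g = gain(nu)
    s_a_template = g * recorded_impact   # Eq. (2): S_a(t) = f(nu) R(t)
    s_h_template = g * impact_q()        # Eq. (3): S_h(t) = f(nu) Q(t)
    length = max(len(s_a_template), len(s_h_template)) + int(0.1 * FS_OUT)
    s_a, s_h = np.zeros(length), np.zeros(length)
    for _ in range(n_spheres):
        # Randomized delay between 0 and 100 ms for a natural effect.
        offset = int(rng.uniform(0.0, 0.1) * FS_OUT)
        s_a[offset:offset + len(s_a_template)] += s_a_template
        s_h[offset:offset + len(s_h_template)] += s_h_template
    return s_a, s_h
```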

4 User Study

We conducted a user study to assess the effectiveness of our approach in simulating the presence of an arbitrary number of virtual moving objects inside a hollow container. A video is available at https://youtu.be/cMXTvAOvQtc.

4.1 Population, Materials and Setup

Thirty participants volunteered for this experiment, aged between twenty and fifty-five years old. Among them were eight women and twenty-two men; five were left-handed and twenty-five right-handed.

The setup is the one described in Sect. 3.1 and shown in Fig. 1b. Users are asked to sit comfortably in front of a computer screen while manipulating the hollow cylinder. The screen shows the instructions of the experiment and enables the user to answer the related questions. Users also wear a pair of headphones, which provide audio feedback when this type of feedback is considered and mask external audio cues in the haptics-only conditions, e.g., those coming from the vibrating voice-coil actuators.

4.2 Experimental Conditions and Hypotheses

The design of our user study was inspired by that of Plaisier and Smeets [6]. Participants were asked to power grasp the cylinder with their dominant hand, as shown in Fig. 1b, shake the cylinder along its main axis (left-right movement) for five seconds, and answer the question: “How many spheres do you feel moving inside the container?”. As the cylinder was shaken, the rendering algorithm described in Sect. 3 provided the user with compelling audio and/or haptic feedback to simulate the presence of multiple spheres. We considered three feedback conditions: combined haptic and audio feedback (H+A), audio-only feedback (A), and haptics-only feedback (H). For each condition, the number of simulated spheres inside the hollow container varies from one to five. Each participant carried out 15 repetitions per feedback condition and number of spheres, yielding 15 (repetitions) \(\times \) 3 (feedback conditions) \(\times \) 5 (numbers of simulated spheres) = 225 trials, presented in randomized order. Following the results of Plaisier and Smeets [6] (see Sect. 2), our hypotheses (HP) are as follows:

  • HP1: Participants can effectively discriminate the presence of up to three spheres inside the container.

  • HP2: Participants underestimate the number of spheres when there are four or five spheres in the container.

  • HP3: Participants show a better recognition score with combined audio-haptic (H+A) feedback than with audio-only (A) or haptics-only (H) feedback.

Our objective is to achieve a performance as close as possible to the one humans achieve when interacting with a hollow container filled with real objects.
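For concreteness, a small sketch of how such a randomized trial list can be generated under the 15 \(\times \) 3 \(\times \) 5 design described above; the names and structure are illustrative assumptions, not the experiment's actual code.

```python
import itertools
import random

FEEDBACK_CONDITIONS = ["H+A", "A", "H"]
SPHERE_COUNTS = [1, 2, 3, 4, 5]
REPETITIONS = 15

def make_trial_list(seed=None):
    """Return one participant's 225 (condition, n_spheres) trials
    in randomized order."""
    rng = random.Random(seed)
    trials = [
        (cond, n)
        for cond, n in itertools.product(FEEDBACK_CONDITIONS, SPHERE_COUNTS)
        for _ in range(REPETITIONS)
    ]
    rng.shuffle(trials)
    return trials

assert len(make_trial_list()) == 225  # 15 x 3 x 5
```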

4.3 Results

We observed a learning effect over the first five trials for each participant and therefore removed the related answers, leaving seventy trials per condition. Two subjects were considered outliers and removed from the analysis, as they were unable to follow the experiment instructions and did not understand the experimental task. To avoid any bias, we did not provide them with additional information with respect to the other participants. Figure 3 shows the mean and standard deviation of the users' answers in discriminating the number of spheres inside the container, for each rendered number of spheres and each condition.

Fig. 3. Presented vs. reported numerosity of the spheres inside the container. Mean and standard deviation are plotted.

We evaluated the error between the reported numerosity and the actual rendered one, computed as the absolute value of their difference. We then carried out a statistical analysis to see whether there is a difference with respect to the feedback condition (H+A, A, H) or the number of rendered spheres (1, 2, 3, 4, 5). Boxplots of the reported number of spheres and of the error per condition and number of presented spheres are reported in Fig. 4. We used non-parametric tests because the user's response is not on a continuous scale. The Kruskal-Wallis rank sum test on the error across conditions shows a significant difference between at least two conditions (\(\chi ^2\)(2) = 33.279, \(p<0.001\)). Pairwise Wilcoxon rank sum tests with continuity correction confirm these results, with a significant difference between (H) and (A) (\(z=3.83\), \(p<0.001\)) and between (H) and (H+A) (\(z=5.62\), \(p<0.001\)), but no significant difference between (A) and (H+A) (\(z=1.83\), \(p>0.1\)).
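For reproducibility, this analysis can be sketched with SciPy as follows; `errors_by_condition` is a hypothetical mapping from condition to per-trial absolute errors, SciPy's Mann-Whitney U test plays the role of the pairwise Wilcoxon rank sum test, and a simple Bonferroni correction stands in for the (unspecified) p-value adjustment.

```python
from itertools import combinations
from scipy import stats

def analyze(errors_by_condition):
    """errors_by_condition: dict like {"H+A": [...], "A": [...], "H": [...]}
    of absolute errors |reported - presented| per trial (hypothetical input)."""
    names = list(errors_by_condition.keys())
    groups = list(errors_by_condition.values())

    # Omnibus test: Kruskal-Wallis rank sum test across conditions.
    h_stat, p_omnibus = stats.kruskal(*groups)
    print(f"Kruskal-Wallis: chi2({len(groups) - 1}) = {h_stat:.3f}, "
          f"p = {p_omnibus:.4f}")

    # Pairwise Mann-Whitney U tests with continuity correction,
    # Bonferroni-adjusted over the number of pairs.
    n_pairs = len(list(combinations(names, 2)))
    for a, b in combinations(names, 2):
        u, p = stats.mannwhitneyu(
            errors_by_condition[a], errors_by_condition[b],
            use_continuity=True, alternative="two-sided",
        )
        print(f"{a} vs {b}: U = {u:.1f}, p_adj = {min(p * n_pairs, 1.0):.4f}")
```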

Fig. 4. (Left) Presented vs. reported numerosity of the spheres inside the container, per condition. (Right) Presented numerosity vs. error, per condition.

5 Discussion and Conclusion

This paper introduces a multi-modal rendering approach for presenting an arbitrary number of virtual moving objects inside a hollow container, using voice-coil actuators and audio headphones to provide haptic and audio feedback about their interaction, respectively. We carried out a user study to assess the capability of human users to effectively discern the number of virtual spheres moving inside the hollow container, providing them with audio-haptic feedback (H+A), audio feedback only (A), or haptic feedback only (H). Results summarized in Figs. 3 and 4 show that, in general, subjects were quite good at estimating the number of spheres rendered inside the hollow container, demonstrating the effectiveness of the proposed rendering technique. This is especially true when 1 to 3 spheres were rendered, confirming the results of [6] and HP1. As also observed in [6], as the number of rendered spheres increases, users tend to underestimate it, confirming HP2 as well. Indeed, beyond 3 rendered spheres the curves shown in Fig. 3 flatten and never reach 5. This phenomenon is stronger for condition (H) and very similar in conditions (A) and (H+A). Also, the standard deviation increases with the number of rendered spheres. The statistical analysis reveals that performance under (H) is significantly worse than under the two other conditions, but no significant difference between (H+A) and (A) was found, partially disproving hypothesis HP3.

Asking the users about their strategies revealed a certain similarity. The strategies fall into two main categories: the first consists of repeatedly and energetically shaking the container and comparing the intensity of the feedback with previous trials; the second consists of counting the impacts on one end of the cylinder after a single energetic stroke. These two strategies can of course be mixed together and carried out one after the other.

After the experiment, participants shared their impressions. While they agreed that the task was difficult, some of them expressed that they were “impressed by the realism of the sensation”. One participant added that he was “surprised to actually feel the spheres moving in the middle of the cylinder”. Regarding the difficulty of the task, participants mostly referred to conditions (A) and (H). Finally, even though we did not measure a significant difference between (A) and (H+A), participants indicated condition (H+A) as noticeably easier, and (H+A) was overall the preferred condition.

As for future work, it would be interesting to consider the inertia of the simulated spheres, which adds to the realism of the interaction [4]. We could also study whether a continuous variation of the audio feedback frequency has an impact. Finally, it would be interesting to compare simulated and real interactions and to study the impact of the number of actuators inside the container.