1 Introduction

Manipulating 3D digital content through mid-air hand gestures is a new paradigm of human-computer interaction. In many applications, such as interactive and virtual product exhibitions in public spaces and medical image displays in operating rooms, the tasks may include scaling, translation, and rotation of 3D components. To ensure a natural mapping between controls and displays, mid-air hand gestures for different tasks have been selected based on metaphors of physical object operations [1] or on gestures from user-elicitation studies [2]. Even given a consensus gesture type for a specific task, the performance of gesture recognition and control can still be influenced by many factors, such as the moving speed and trajectory of the hands, the occlusion of fingers during hand pose changes and transitions, and individual differences in performing a specific gesture. Since the characteristics of diverse gestures pose different challenges, the usability factors of specific gestures must be identified through experiments. Therefore, the objective of this research is to study the usability factors of hand gestures for 3D digital content manipulation.

2 Literature Review

With the benefits of natural and intuitive interaction, and freedom from the sanitation problems of shared touch surfaces in public spaces, mid-air hand gestures have been applied to interactive navigation systems in museums and virtual museums [3, 4], medical and surgical imaging systems [5,6,7,8], large display interactions [9], interactive public displays [10], and 3D modelling [11,12,13,14]. Based on previous research, mid-air hand gestures can be analyzed in five gesture types: pointing, semaphoric, pantomimic, iconic, and manipulation [15]. Based on the number and trajectory of hands, mid-air gestures can be classified by one or two hands, linear or circular movements, and the degrees of freedom of the path (1D, 2D, or 3D) [16]. The gesture vocabulary depends on the context [17, 18]; for example, the preferred gestures for short-range human-computer interaction [19] and for TV control [20] were reported to be different. For mid-air 3D object manipulation, natural gestures were necessary for accurate scaling, translation, and rotation tasks [21]. When choosing an appropriate mid-air hand gesture, it is necessary to consider the mental models of the target users [22], reduce workload [23], and increase the robustness of gesture recognition [24]. Although design principles can be derived from the literature, the factors that influence the perceived usability of gestures for specific tasks should be identified through experiments.

3 Experiment

In order to investigate the usability factors of mid-air hand gestures for manipulating 3D virtual models, an experimental system was constructed by modifying the sample programs of the Intel RealSense 3D Camera with the Unity 3D Toolkit (Fig. 1). In a laboratory with controlled illumination, each participant stood at a fixed spot 240 cm in front of a 100-inch projection screen. During the experiment, a 3D virtual car model was projected on the screen. Each participant used the designated hand gestures described below to scale the car, translate the car seat, and rotate the car about its vertical axis.

Fig. 1. A 3D car model in the Unity 3D system

Among the diverse gesture types, "grab and move" and the bimanual "handlebar metaphor" were reported to be intuitive gestures for object manipulation tasks [1, 2, 25]. In addition, users often preferred gestures resembling physical manipulation for wall-sized displays [26]. Therefore, a two-hand gesture, grabbing while moving the hands apart or together, was assigned to enlarging/shrinking the 3D virtual car (Fig. 2). A one-hand gesture, grabbing while moving up/down, left/right, or forward/backward, was assigned to translating the car seat (Fig. 3). A two-hand gesture, grabbing while the hands move along the circumference of a horizontal circle (the handlebar metaphor), was assigned to rotating the car (Fig. 4). The characteristics of these gestures and the referenced literature are summarized in Table 1.
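To make the three mappings concrete, the sketch below shows one plausible way to convert per-frame palm positions into scale, rotation, and translation updates. It is a minimal illustration under assumed conventions (metre units, y-up coordinates, a tracker that reports palm positions per frame); it is not the code of the experimental system.

```python
# Sketch: mapping tracked palm positions to scale/rotate/translate updates.
# The palm positions (x, y, z in metres) are hypothetical tracker output.
import math

def scale_factor(left, right, initial_distance):
    """Two-hand scaling: the model scale follows the ratio of the current
    hand separation to the separation when the grab began."""
    return math.dist(left, right) / initial_distance

def handlebar_angle(left, right):
    """Two-hand 'handlebar' rotation about the vertical (y) axis: the yaw
    of the line connecting the two palms, measured in the x-z plane."""
    dx = right[0] - left[0]
    dz = right[2] - left[2]
    return math.atan2(dz, dx)  # radians

def translate_delta(palm, palm_at_grab, cd_ratio=1.0):
    """One-hand translation: displace the object by the palm's movement
    since the grab started, scaled by a control-display ratio."""
    return tuple(cd_ratio * (p - p0) for p, p0 in zip(palm, palm_at_grab))

# Example frame update for the rotation task: the object's yaw changes by
# the difference between the current and initial handlebar angles.
left0, right0 = (-0.15, 1.0, 0.40), (0.15, 1.0, 0.40)
left1, right1 = (-0.14, 1.0, 0.35), (0.14, 1.0, 0.45)
yaw_delta = handlebar_angle(left1, right1) - handlebar_angle(left0, right0)
print(f"yaw change: {math.degrees(yaw_delta):.1f} degrees")
```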

Fig. 2. Enlarging/shrinking the 3D virtual car

Fig. 3. Translating the car seat to the target position

Fig. 4. Rotating the car with respect to the vertical axis

Table 1. Designated hand gestures for experiment tasks

In the experiment, an Intel RealSense 3D Camera (F200) was used to extract the positions and movements of 22 joints of each hand skeleton. With the Intel RealSense SDK, basic static gestures, such as spread fingers and fist, could be recognized (Fig. 5). Spread fingers and fist correspond to the static poses of an open palm and a grab, respectively. The system was therefore expected to discriminate the transitions from open palm to grab, and vice versa.
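Since the SDK reports static poses per frame, discriminating the open-palm-to-grab transition requires a layer on top of the raw labels. The following sketch shows one assumed approach, a debounced two-state detector; the pose labels, frame counts, and event names are illustrative and are not part of the RealSense SDK.

```python
# Sketch: turning per-frame static pose labels into grab/release events,
# with a small debounce so brief misclassifications during the finger-
# occlusion phase of a transition do not fire spurious events.
OPEN, FIST, UNKNOWN = "spread", "fist", "unknown"

class GrabDetector:
    def __init__(self, hold_frames=3):
        self.stable_pose = UNKNOWN
        self.candidate = UNKNOWN
        self.count = 0
        self.hold_frames = hold_frames  # frames a pose must persist

    def update(self, pose):
        """Feed one per-frame pose label; return 'grab', 'release', or None."""
        if pose == self.candidate:
            self.count += 1
        else:
            self.candidate, self.count = pose, 1
        if self.count >= self.hold_frames and self.candidate != self.stable_pose:
            prev, self.stable_pose = self.stable_pose, self.candidate
            if prev == OPEN and self.stable_pose == FIST:
                return "grab"
            if prev == FIST and self.stable_pose == OPEN:
                return "release"
        return None

detector = GrabDetector()
frames = [OPEN, OPEN, OPEN, UNKNOWN, FIST, FIST, FIST, OPEN, OPEN, OPEN]
events = [e for e in (detector.update(f) for f in frames) if e]
print(events)  # ['grab', 'release']
```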

Fig. 5. The spread-fingers and fist gestures identified by the Intel RealSense SDK

The camera was placed between the participant and the screen. Its distance from the participant was adjusted according to the participant's arm length, and its height was adjusted to each participant's elbow height.

4 Results and Discussions

Seventeen students, 7 female and 10 male, were invited to participate in the experiment. They were enrolled in either the Ph.D. program in Design Science or the Master's program in Industrial Design, and ranged in age from 22 to 37 (mean: 26.12, standard deviation: 4.65). All participants had experience using 3D modelling software and smartphones with touch gestures. In the experiment, they were asked to apply the designated gestures to carry out the scaling, translation, and rotation tasks.

After completing each task, participants evaluated the gesture on a 7-point Likert scale, indicating their degree of agreement with statements on acceptance of performing the gesture in public, comfort, smoothness of operation, ease of understanding, ease of remembering, informative feedback, correctness of system response, appropriateness of the control-response ratio, and overall satisfaction (Table 2). The result of ANOVA indicated significant differences in user evaluation among these usability criteria. The gesture for scaling needed improvement in smoothness of operation, correctness of system response, control-response ratio, and overall satisfaction. The gesture for translation yielded its lowest score for smoothness of operation, and the gesture for rotation yielded its lowest score for appropriateness of the control-response ratio.
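For readers who want to reproduce this kind of comparison, the snippet below runs a one-way ANOVA over ratings of several usability criteria with scipy. The rating vectors are synthetic placeholders, not the study's data (the actual scores are in Table 2), and a repeated-measures ANOVA would be the stricter choice here, since every participant rated every criterion.

```python
# Sketch: one-way ANOVA comparing Likert ratings across usability criteria.
# All values below are hypothetical, for illustration only.
from scipy import stats

# Hypothetical 7-point ratings from 17 participants for three criteria.
smoothness = [3, 4, 2, 3, 5, 3, 4, 2, 3, 4, 3, 2, 4, 3, 3, 4, 2]
ease_of_understanding = [6, 7, 6, 5, 7, 6, 6, 7, 5, 6, 7, 6, 6, 5, 7, 6, 6]
overall_satisfaction = [4, 3, 4, 5, 3, 4, 4, 3, 5, 4, 3, 4, 4, 5, 3, 4, 4]

f_stat, p_value = stats.f_oneway(smoothness, ease_of_understanding,
                                 overall_satisfaction)
print(f"F = {f_stat:.2f}, p = {p_value:.4f}")
```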

Table 2. The evaluation of gestures and system performance for different tasks

Tables 3, 5, and 7 list the usability problems reported for the scaling, translation, and rotation tasks, respectively. The two-hand gestures for scaling and rotation caused more usability problems: failures of gesture recognition, lag in system response, a limited gesture detection range, an inappropriate control-response ratio, and fatigue. According to the participants' comments, quick movements or rapid pose changes of the two hands were the major causes of recognition failures. Evidently, the default detection range was not wide or deep enough for natural, linear motions of both hands, and the default control-response ratio needed to be adaptive for precise control.
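A common remedy for this control-response ratio problem is to make the control-display (CD) gain a function of hand speed, so that slow hand motion yields precision and fast motion yields range. The sketch below illustrates the idea; the gain curve, constants, and function names are assumptions for illustration, not the study's implementation.

```python
# Sketch: speed-adaptive control-display gain for one translation axis.
def adaptive_cd_gain(speed_m_per_s, g_min=0.5, g_max=3.0, v_ref=0.5):
    """Interpolate gain between g_min (precise, slow hand) and g_max
    (coarse, fast hand); v_ref is the speed at which the gain saturates."""
    t = min(speed_m_per_s / v_ref, 1.0)
    return g_min + t * (g_max - g_min)

def displacement(hand_delta_m, dt_s):
    """Map a per-frame hand displacement to an object displacement."""
    speed = abs(hand_delta_m) / dt_s
    return adaptive_cd_gain(speed) * hand_delta_m

# At 30 fps, a slow 2 mm step moves the object about 1.6 mm,
# while a fast 20 mm step moves it 60 mm.
print(displacement(0.002, 1 / 30), displacement(0.020, 1 / 30))
```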

Table 3. The usability problems of two-hand grabbing while moving apart/close for scaling
Table 4. Alternative gestures for scaling
Table 5. The usability problems of one-hand grabbing while moving for translation
Table 6. Alternative gestures for translation
Table 7. The usability problems of two-hand grabbing while moving relatively on the circumference of a horizontal circle for rotation

In addition, the participants were encouraged to propose user-defined gestures for each task. These alternative gestures are listed in Tables 4, 6, and 8. The alternatives for scaling (Table 4) included using one hand with a posture change (from open palm to grab or pinch) and using two hands to form two corners of a rectangular boundary and slightly adjust its size (Fig. 6); these gestures have the benefit of requiring a smaller movement range. The alternative for translation was to pinch instead of grab (Table 6). The alternatives for rotation included a steering-wheel metaphor, a one-hand circular movement, and a more complicated gesture in which the first hand stays still as the rotation axis while the second hand moves circularly around it (Table 8).
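To illustrate why the rectangle-boundary alternative needs so little movement range, the sketch below derives a scale factor from the diagonal between the two corner hands; the corner coordinates and the diagonal-based mapping are assumptions for illustration, not the participants' specification.

```python
# Sketch: two-hand "rectangle boundary" scaling (cf. Fig. 6), assuming the
# hands pin opposite corners and the model scale tracks the diagonal length.
import math

def rectangle_scale(corner_a, corner_b, initial_diagonal):
    """Scale factor from the diagonal of the rectangle whose opposite
    corners are the two (x, y) hand positions, in metres."""
    return math.dist(corner_a, corner_b) / initial_diagonal

# Hands start 0.30 m apart diagonally; nudging each corner outward by a
# couple of centimetres enlarges the model by 20%.
d0 = math.dist((0.0, 0.0), (0.24, 0.18))  # 0.30 m
print(rectangle_scale((-0.024, -0.018), (0.264, 0.198), d0))  # 1.2
```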

Table 8. Alternative gestures for rotation
Fig. 6. Alternative gestures for two-hand scaling (enlarging/shrinking)

5 Conclusion

In this research, the usability factors of mid-air hand gestures for 3D virtual model manipulation were identified. The results indicated that the width and depth of the detection range were the key factors for two-hand gestures with linear motions. For dynamic gestures with quick transitions between open and closed hand poses, robust gesture recognition was extremely important. Furthermore, even for a gesture with ergonomic postures, an inappropriate control-response ratio could cause fatigue due to the repetitive exertions required for precise control in 3D model manipulation tasks. These results can inform development teams working on vision-based mid-air hand gesture interfaces and serve as checklists for gesture evaluation.