
1 Introduction and Motivation

User interfaces are commonly evaluated in a laboratory setting, with the efficacy of an interface assessed based on user performance metrics such as task completion time and accuracy. In addition to these objective measures, a variety of survey instruments can be used to elicit subjective assessments related to user satisfaction, perceived interface difficulty, cognitive load and other factors. Eye movement and biometric measurements offer a promising approach to investigating the cognitive aspects of interface use, such as attention and situational awareness. However, eye movement tracking and biometric tools remain invasive and costly, and they offer mostly an indirect assessment of the strategies people employ in problem solving. An alternative approach, which is described in this paper, is to perform a close inspection and analysis of system-user interactions, based on the actions taken by users in pursuit of their goals.

In this paper we describe what we have learned via a detailed analysis of videos capturing users’ interactions with two different data interfaces: a visual one and a tabular one. This analysis revealed patterns of interaction and learning that can be used to improve interfaces and guide user training. When combined with objective performance measures, such as task completion time and accuracy, this analysis may offer an empirical basis for explaining performance gains, supporting reasoning about the advantages of one interface over another, and helping identify efficiency traps.

1.1 Background

The work presented here originated from an experimental side-by-side comparison of two alternative interfaces, AM2.0 [1] and Oracle SQL Developer [2] Reports (henceforth, OR), for working on data exploration and analysis tasks. Both interfaces present views of ternary relationships (i.e., associations of three different entities) and are intended to support reasoning about such associations to solve enterprise tasks. AM2.0 (see Fig. 1) is a visualization-based interface, while OR (see Fig. 2) uses the table-based representation of data traditionally employed in major enterprise systems such as Oracle, SAP, and PeopleSoft. In the laboratory user study described in [1], users worked with each of the two interfaces to solve tasks of varying difficulty. The analysis of user performance data, along with user responses to the post-usage questionnaire regarding their experiences, showed statistically significant differences in favor of AM2.0 in measurements of time, correctness, and user satisfaction [1]. As a part of this study, we recorded screen capture videos for ten out of forty-one participants working with both interfaces.

Fig. 1. A snapshot of AM2.0.

Fig. 2. A snapshot of Oracle SQL Developer Reports (OR).

We performed a detailed analysis of the ten videos with the purpose of gaining insight into how users go about solving the tasks with each interface. The analysis revealed certain patterns of interaction that would not be captured using other means of analysis. We present some of the results of this analysis here and argue that this kind of close analysis of interaction data captured in a video is useful for understanding the realities of interface use. Results from close observation of usage videos can suggest explanations for performance gains, inform fine-tuning of the design, and direct user guidance. In the context of this study, our observations of usage videos offered empirical justification for our assumptions regarding what makes the visual interface more effective.

2 Related Work

Prior studies have undertaken systematic reviews of visualization evaluation methods using different approaches. Lam et al. [3] categorize evaluation methods into seven scenarios based on evaluation goals and questions. Furthermore, they present the most common evaluation goals and outputs, evaluation questions, and common evaluation methods for each scenario. The work presented in our paper falls under the Evaluating Visual Data Analysis and Reasoning (VDAR) category, as our analysis examines how the visualization tool as a whole supports the analytics and reasoning process. Lam et al. suggest three types of techniques that can be used under the VDAR category (Case Studies, Laboratory Observation and Interviews, and Controlled Experiment) and provide some sample evaluation approaches under each technique. The detailed analysis of coded videos that we report on here adds to the example approaches presented in [3].

Yen and Bakken [4] reviewed and categorized health IT usability study methods into five stages of an integrated usability specification and evaluation framework based on a usability model and the stages of evaluation associated with the system development life cycle (SDLC). The paper categorized usability evaluation stages based on the number of involved components: user, tool, task, and environment. Our study fits the Stage 3 category, which evaluates interaction performance in the laboratory. According to Yen and Bakken, objective measures that researchers use at Stage 3 include system validity (accuracy and completeness), efficiency (speed and learnability) and user satisfaction. In addition to the time, accuracy and user satisfaction measures described in our earlier paper [1], this paper includes analysis of learnability (as adoption), but uses different methods than those listed in [4].

A systematic review of visualization evaluation strategies, data collection methods and other characteristics is presented by Merino et al. [5], who argue that an effective software visualization should boost not only time and correctness but also recollection, usability, engagement and other emotions.

In this paper, we demonstrate that observation and analysis of videos that capture how people used the interfaces can uncover aspects of interface learnability, different approaches to problem solving, and efficiencies associated with good design. We have not found the analyses presented here described in the literature so far, although Kang et al. [6] describe how they used analysis of videos and system logs to derive activity patterns and problem solving strategies employed by users of a visual analytics system called Jigsaw in a laboratory study.

The analyses presented here can be used to direct design efforts after pilot evaluations of interfaces in order to improve the designs. Such improvements may consist of eliminating unused redundant interface affordances, directing user attention to useful features they do not employ, reducing the number of affordances, and redesigning the interaction model.

3 Video Data Collection

The analyses presented here are based on the data collected in a laboratory side-by-side comparison of two alternative interfaces for working on data exploration and analysis tasks, AM2.0 [1] and Oracle SQL Developer [2] Reports (OR). The snapshots of these interfaces are shown in Fig. 1 and Fig. 2.

3.1 AM2.0 and OR Interfaces

AM2.0 is an interactive visualization (Fig. 1) in which the instances of three entities, Vendor, Material and Plant, are shown as labeled geometric primitives on the left and right sides of a circle (Vendor and Plant) and inside it along the vertical diameter line (Material). AM2.0 is a version of a node-link diagram, in which related items are connected via links; selection of an item causes the associated links and related items to be highlighted to create a pop-out effect. For example, Fig. 1 depicts selection of vendor node 290546 (left), which highlights the two materials supplied by this vendor (CKB6900, GCJ4300, center) and the plants (FQ29, GF24, LM51, right) to which each of the two materials is supplied by vendor 290546.

The node-link diagram in AM2.0 is supplemented with a search interface displayed above it, which allows the user to select items of one or more types on the diagram by entering values in the search fields.

OR (Fig. 2) is a typical table-based interface for displaying data. Each column represents a single entity, and each row presents an instance of an association. For example, the associations highlighted by AM2.0 in Fig. 1 and described above would be represented by the following three rows of data:

Vendor   Material   Plant
290546   CKB6900    FQ29
290546   CKB6900    GF24
290546   GCJ4300    LM51

The table can be searched, using a dialog box shown in Fig. 2. The search is performed across all rows and fields of the table, with the found items simultaneously highlighted in the selected color. The table can also be sorted by a column value and filtered to select only the rows with matching items.

Note that data presented on one screen of AM2.0 may occupy multiple pages in the tabular representation of OR. The node-link representation, along with the zooming capability of AM2.0, allows for a compact representation of multiple rows, in which no item of an entity is ever repeated. In contrast, the tabular representation includes an item as many times as there are relationships in which it participates.

3.2 Data Collection

Ten participants recruited from a graduate student population of a small business school in the Northeastern US were asked to perform nineteen tasks using AM2.0 and OR. Prior to performing these tasks, the participants were shown brief tutorials on using each interface and given two problems with solutions and explanations in a practice session. The two interfaces were supplied with data that was isomorphic in structure but used different labels; likewise, the task questions were similar, with the only difference being the item labels involved. Tasks ranged from very simple, like finding all plants that supply a specific material, to more difficult ones, which required more reasoning and multiple actions with the interface to determine the answer. An example of a more difficult task is finding a replacement for a vendor; this task requires identifying materials supplied by the vendor, and then finding a set of other vendors who supply these materials. The seven most difficult tasks appeared at the end of the sequence of nineteen tasks.

4 Video Data Analysis

Analyses of performance and accuracy measures from the side-by-side comparison of AM2.0 and OR are presented in [1]. The same paper also presents responses to a questionnaire that included learnability and engagement parameters assessed on a Likert scale, as well as open-ended questions regarding user preferences and suggestions from using the two interfaces. Here we report on the analysis of the screen capture videos of the ten anonymous participants who self-selected to perform the study in a laboratory setting with screen recording.

Our first step involved coding each video. The coding schema consisted of interface actions performed by users as they were solving each of the nineteen tasks using each of the two interfaces. We used the following coding procedure: for each task performed using each interface we first identified the steps required to answer the task question. This yielded 57 AM2.0 steps and 82 steps in OR. Then, for each step, we recorded all actions that the user took to finish it. Each action used for a task step was recorded once per task step, even if it was invoked multiple times. The list of action codes and action descriptions classified into categories is presented in Table 1. Action codes were created as different user actions were observed; upon finishing the coding, we derived the action categorization, which we describe next.
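To make this counting rule concrete, the following minimal Python sketch (not the tooling we actually used; the transcript file layout and column names are assumptions) tallies each action code once per user, task and step:

```python
# Minimal sketch of the counting rule from Sect. 4; the CSV layout is assumed.
import csv
from collections import defaultdict

def count_actions(transcript_path):
    """Count each action code once per (user, task, step)."""
    seen = set()                       # (user, task, step, action) combinations already counted
    counts = defaultdict(int)          # action code -> number of task steps in which it was used
    with open(transcript_path, newline="") as f:
        for row in csv.DictReader(f):  # columns assumed: user, task, step, action
            key = (row["user"], row["task"], row["step"], row["action"])
            if key not in seen:        # an action is recorded once per task step,
                seen.add(key)          # even if it was invoked multiple times
                counts[row["action"]] += 1
    return counts
```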

Table 1. Action codes and descriptions.

Table 1 starts with actions in the SF (Search and Filter) category. This category encompasses actions that enact search and filter capabilities implemented via controls that are separated from the representation of the data, visual or tabular. In AM2.0 such controls consist of the search interface that is displayed above the visualization (see top of Table 2). The AM2.0 search interface includes three input boxes for entering each of the three entity values, along with the Search and Clear buttons. In OR, the search and filter tools accessible outside of the table itself include a search dialog box (shown in Fig. 2) that is revealed via a key combination or a press of a mouse button.

Table 2. Action categories.

The second category, called SFV (Search and Filter, Visual), lists those search and filter actions that are executed while the cursor is positioned on the visual representation of the data itself. Notably, only the OR interface allows the user to invoke search and filter from within the table cells: a dialog box appears when a cell is selected with a mouse click. The AM2.0 interface does not implement similar capabilities.

The V category contains those actions that implement direct interaction with the visual representation of the data but do not invoke a search. Examples include selecting data items by clicking on them, moving between items, and others. Actions in this category occur while the user’s attention is focused directly on the data.

Category M (meta or mental actions) includes clearly observable actions taken to verify the correctness of the result, including any checks users performed after the answer was already entered.

The O category includes a single action: typing an answer into the answer box. It is separate from category V because it does not involve interaction with the data, though it is based on viewing the data.

The resulting categorization is based on the intent behind the action (search and filter, selection, mental actions) as well as on the locus of interaction: whether the user interacts with the data items directly or with user interface controls separated from the data.

5 Analytics

5.1 Range and Frequency of Actions

Figure 3 summarizes the use of specific actions for solving tasks. In it, each action is depicted with a percentage number (also visualized with a horizontal bar). The total from which the percentage is computed is the count of all actions taken by all users while working on all tasks, where, within each separate task step and user, an action is counted only once even if it was used multiple times (see Sect. 4 for the coding procedure description). Each bar thus depicts the percentage of the count of a specific action over the total count of all actions performed by all users on all tasks.
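As an illustration, the percentages shown in Fig. 3 could be derived from such per-step counts as in the brief sketch below; the code-to-category mapping is reproduced only partially, from the codes named in this paper.

```python
# Sketch of the Fig. 3 computation: each action's share of the total action count.
def action_percentages(counts):
    total = sum(counts.values())
    return {action: 100.0 * n / total for action, n in counts.items()}

# Grouping by category additionally needs the code -> category mapping of Table 1;
# the entries below are a partial reconstruction from the codes mentioned in the text.
CATEGORY_OF = {"P": "V", "K": "V", "B": "V", "C": "SF", "Cb": "SF", "Sb": "SF", "Se": "SF"}
```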

Fig. 3. Frequency of actions grouped by category for AM2.0 vs. OR.

From Fig. 3 we see that users executed fewer different actions with AM2.0 than with OR. Among all action categories, participants using AM2.0 relied on SF (Search and Filter) actions the most; the frequency of SF category actions is about three times that of actions in the direct interaction with the visualization (V) category.

Usage of OR shows a much broader spread of actions: users interacted directly with the data, including for search and filter (the V and SFV categories, respectively). Scrolling (L), highlighting multiple items using a mouse (SH), moving the mouse over the data of interest (W), moving between the data items using the keyboard arrow keys (Y), and seemingly aimless clicking on data items (A) are actions that users engaged in with OR, but not with AM2.0. Of these five actions, only two (scrolling and moving between items with the keyboard, L and Y) are available exclusively in OR.

Although AM2.0 has fewer affordances tied to the data items than OR, participants used less time and achieved greater accuracy with AM2.0 (see [1]) in all categories of question difficulty. Hence, the availability and use of multiple UI controls enacted by interacting directly with data items are not necessary for the efficient operation of an interface.

Data in the M category indicates that users reviewed and verified answers more when using OR than AM2.0. Verification is twice as prevalent with OR, which may indicate a lower confidence in the correctness of the final answer when working with OR than with AM2.0. The explanation may be that the visualization in AM2.0 displays the data on one screen, whereas the OR interface takes multiple screens, which users need to scroll through on occasion. The ability to view all information pertinent to the answer on one screen reduces the user’s need to perform additional verification of the correctness of the answer.

5.2 Adoption of Different Actions

The next series of figures visualizes how users adopt actions while working with the interfaces. Recall that each of the participants worked on nineteen tasks (also referred to as questions) using AM2.0 and then the OR interface. We categorized the questions by difficulty: questions 1–7 were simple, the next five (8–12) were of moderate difficulty, and the last seven (13–19) were higher difficulty questions.

Each of the figures in this subsection is drawn based on data from AM2.0 or OR usage. Each figure depicts curves associated with specific actions. The points on the curve indicate how many participants have used that action at the time of or before answering the given question. For example, from Fig. 4 we see that action P (the top curve) was used by five people while working on the first question, and all ten participants have used it by question number 15.
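For readers who wish to reproduce such curves from coded transcripts, a small sketch follows; the (user, question, action) event layout is an assumption, not the format we used.

```python
# Sketch of an adoption curve (Figs. 4-7): for each question q, count the participants
# who have used the given action at or before question q.
def adoption_curve(events, action, n_questions=19):
    """events: iterable of (user, question_number, action_code) tuples (assumed layout)."""
    first_use = {}                        # user -> earliest question at which the action appears
    for user, q, code in events:
        if code == action:
            first_use[user] = min(q, first_use.get(user, q))
    return [sum(1 for q0 in first_use.values() if q0 <= q)
            for q in range(1, n_questions + 1)]
```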

Fig. 4. Adoption of V category actions (direct interaction with the visualization) in AM2.0.

Note that all actions used, except for filtering in OR, were demonstrated in the tutorial shown to participants before they started working on tasks independently.

Figures 4 and 5 present adoption curves for actions in category V (direct interaction with the visualization, see Table 1). Figure 4 shows that out of the six category V actions available in AM2.0 only three were used:

Fig. 5. Adoption of V category actions (direct interaction with the visualization) in OR.

  • P – Copy/paste a value from the visualization into some other area.

  • K – Select item by clicking, and

  • B – Browse over items.

We can observe a notable increase in the use of K and B starting at question 14. As question 14 is one of the higher difficulty questions, we may assume that higher difficulty questions caused more users to select items by clicking on them and to browse over the data.

We do not observe a similar pattern in the use of OR (Fig. 5). The graph shows that scrolling (L) is one of the fastest adopted actions in OR. Clicking on a single data item (K) is not used by as many users as with AM2.0, because it yields no useful effect in OR. Instead, manual selection of multiple items (SH) is gradually adopted. Interestingly, none of the V category actions in OR are adopted by more than 80% of the users, and only one category V action of AM2.0 (P, copy/paste) is adopted by 100% of users.

Figure 6 illustrates how users of AM2.0 gradually learn to use more ways of interacting with the search interface. All actions in the SF category pertain to clearing and entering values in the search box (either by typing or pasting) and executing the search command (by button click or via the keyboard). We see that with time users learn to use the Search and Clear Fields buttons. Though 80% of users start off by clearing the content of a search field by overwriting or erasing it (C), the share of users of the Clear Fields button (Cb) increases from 20% to 60% by the end, presumably because the more difficult questions require multiple search boxes to be used, and hence cleared at once with a button. As in the case of V category actions with AM2.0 (Fig. 4), there also seems to be a pattern of increased use of different SF mechanisms after question 14, when working on higher difficulty questions. Two to three people used the partial search (STB, which highlights all items that start with the searched string) on the last two questions of higher difficulty (Fig. 7).

Fig. 6. Adoption of SF (Search and Filter) Functions in AM2.0.

Fig. 7. Adoption of SF, SFV (Search and Filter, and Search and Filter, Visual) Functions in OR.

When it comes to the Search and Filter actions in OR, the action adopted by all users is Find/Highlight (FH), which is executed via the ubiquitous Ctrl-F key combination. It is used from the very beginning by six out of ten participants and is gradually adopted by all. Most users also adopt sorting (T) very early on. One user discovered and used the partial filtering action (FITE) after answering a dozen questions.

To summarize: as participants become more familiar with the visualization and as the questions become more difficult, users gradually adopt different actions and use different strategies to find answers to task questions. As time progresses, more users engage with the visual part of the interface. However, the fact that most action curves stop growing before reaching the maximum number of participants suggests that users eventually settle on their chosen set of actions.

5.3 Comparison of Prevalence of Actions Between Interchangeable Actions in AM2.0

The next set of analyses examines individual users’ choices among actions with the same effect, in other words, among interchangeable actions.

Figure 8 visualizes the prevalence of different selection mechanisms in the use of AM2.0. Each bar represents one user and is subdivided into rectangular components depicting the number of actions found in the user’s transcript: the blue (bottom) rectangle represents selection via the search interface actions ending with a click on the Search button (Spb, Sb, Stb), the green (middle) rectangle represents similar actions ending with a press of the Enter key (Spe, Se), and the purple (top) rectangle represents the actions of selecting items by clicking on their visual representation (circle or rectangle) and browsing (K, B).
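The per-user aggregation behind Fig. 8 can be sketched as follows; the grouping of codes follows the description above, and the event layout is the same assumed form as in the earlier sketches.

```python
# Sketch of the per-user tally of interchangeable selection mechanisms (Fig. 8).
SELECTION_GROUPS = {
    "search_button": {"Spb", "Sb", "Stb"},  # search ending with a Search button click
    "enter_key":     {"Spe", "Se"},         # search ending with an Enter key press
    "click_browse":  {"K", "B"},            # click/browse on the visual representation
}

def selection_prevalence(events):
    """events: iterable of (user, question_number, action_code) tuples (assumed layout)."""
    per_user = {}                           # user -> {group name: action count}
    for user, _q, code in events:
        for group, codes in SELECTION_GROUPS.items():
            if code in codes:
                per_user.setdefault(user, {g: 0 for g in SELECTION_GROUPS})[group] += 1
    return per_user
```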

Fig. 8. Comparison between item selection enacted via the Search button (Spb, Sb, Stb), via Enter key press (Spe, Se), and via browse and click actions (K, B) in AM2.0.

The same figure also displays the accuracy measure (a higher value is better) as a rhombus and the completion time measure (lower is better) as a circle.

We observe that all but two users (User 1 and User 4) relied mostly on selection via the search interface ending with a Search button click. Eight participants have a count of (K, B) actions ranging from 5 to 8. There is no visible influence of reliance on the Search button versus the Enter key on total completion time.

The next figure (Fig. 9), constructed to examine preferences between clearing search fields with a click on the Clear Fields button (bottom rectangle) versus manual erasure in the field (top rectangle), shows that users tend to prefer erasing or overwriting the field content over using the button. Four out of ten users never used the button, and two users used it very little compared to manual erasure. Looking back at Fig. 6, we observe a slight increase in use of the button associated with more complex questions.

Fig. 9. Comparison between clear content actions using the Clear Fields button (Cb) or erasing/overwriting the content of the search field (C).

Finally, the graph in Fig. 10 shows that users overwhelmingly preferred using copy/paste (P, shown in blue, bottom rectangle) to manual entry (M, green, top rectangle). Notably, using mostly manual entry did not slow down User 3 compared to many others, who relied mostly on copy/paste.

Fig. 10. Comparison between using copy/paste (P) and manual keyboard entry (M) to enter the search parameter. (Color figure online)

6 Conclusions and Limitations

The screen capture data was collected from 10 randomly chosen participants of a laboratory study [1] comparing user performance using the Association Map (AM2.0) with performance using Oracle SQL Developer Reports (OR). We have demonstrated that the data collected in this way can be analyzed to assess a number of factors, for example:

  • What was the relative usefulness of each action for an individual user and overall?

  • What are the different action categories? Did users employ the action categories in the same or in different ways when using two different interfaces to solve problems?

  • For a set of interchangeable actions, what were the patterns of use and patterns of discovery of the alternatives?

  • Are there clear efficiency gains associated with different patterns of action use?

Specifically for the two interfaces that were examined and compared, we have several useful findings that could not have been obtained without the close examination of screen capture videos. For example, we found that the multiple search and filter features of the OR interface, enacted directly via the data items, did not help users complete the tasks faster or more accurately than when using AM2.0, which lacks such features. Furthermore, using AM2.0 incurred fewer user actions aimed at verifying answer correctness than the use of OR. This finding confirmed our intuition that the one-page node-link diagram provides a representation of ternary relationships that is more intuitively clear and easier to work with than the tabular representation. Lastly, users tend to gradually expand their repertoire of actions in both interfaces. In the case of AM2.0, there were no easily observable efficiency gains associated with a specific way of executing a search.

The observations made in these analyses, although useful in confirming hypothesized user behaviors or revealing insights, cannot be used for a definitive assessment of interface features. They offer insights that can be further investigated with a more thorough multi-method investigation. The summary results presented here are also influenced by some degree of subjectivity involved in the coding and categorization performed by the authors of the paper. Nevertheless, we believe that the types of analyses presented here can be used during the formative stage of interface evaluation to understand patterns of learning and use (or non-use) of different affordances. The learning patterns discovered in this way can be used to guide user training or to offer suggestions to users, pointing their attention to overlooked features that were found to be useful by others.