1 Introduction

In recent decades the demands of evaluating the usability of interactive web-based systems have produced several assessment procedures. Very often, during usability inspection, there is a tendency to overlook features of the users, aspects of the context and characteristics of the tasks. This tendency is partly explained by the lack of a model that unifies all of these aspects. Considering features of users is fundamental for the User Modeling community [1, 16]. Similarly, taking the context of use into consideration is extremely important for inferring reliable assessments of usability [3, 36]. Additionally, during the usability assessment process, accounting for the demands of the task executed is core for describing user experience [20]. Building a cohesive model is not trivial; however, we believe the construct of human mental workload (MWL) – often referred to as cognitive load – can significantly contribute to such a goal and inform interaction and web-design. MWL, with roots in Psychology, has been mainly applied within the fields of Ergonomics and Human Factors. Its assessment is key to measuring performance, which in turn is fundamental for describing user experience and engagement. A few studies have tried to employ the construct of MWL to explain usability [2, 24, 41, 46, 50]. Despite this interest, not much has yet been done to investigate their relationship empirically. The aim of this research is to empirically test the relationship between the subjective perception of usability and mental workload, as well as their impact on objective user performance, understood here as tangible, quantifiable facts (Fig. 1).

Fig. 1. Schematic overview of the empirical study

This paper is organised as follows. Firstly, notable definitions of usability and mental workload are provided, followed by an overview of the assessment techniques employed in Human-Computer Interaction (HCI). Related work is also presented, highlighting how the two constructs have been employed so far, distinctly and jointly. An experiment is subsequently designed in the context of human-web interaction, aimed at investigating the relationship between the perception of usability of three popular web-sites (Youtube, Wikipedia and Google) and the mental workload experienced by users after interacting with them. Results are presented and critically discussed, showing how these constructs interact and how they impact objective user performance. A summary concludes this paper pointing to future work and highlighting the contribution to knowledge.

2 Core Notions and Definitions

Widely employed in the broader field of HCI, usability and mental workload are two constructs from Ergonomics with no crystal-clear, generally applicable definitions. There is an acute debate on their assessment and measurement [4,5,6]. Although ill-defined, they remain extremely important for describing the user experience and improving interaction, interface and system design.

2.1 Definitions of Usability

The amount of literature covering definitions [21, 48], frameworks and methodologies for assessing usability is vast. The ISO (International Organisation for Standardisation) defines usability as ‘The extent to which a product can be used by specified users to achieve specified goals with effectiveness, efficiency, and satisfaction in a specified context of use’. Usability, according to Nielsen [38], is a method for improving ease-of-use in the design of interactive systems and technologies. It embraces other concepts such as efficiency, learnability and satisfaction. It is often associated with the functionalities of a product rather than being merely a feature of the user interface [39].

2.2 Measures of Usability

Often when selecting an appropriate procedure in the context of interaction and web-design, it is desirable to consider the effort and expense that will be incurred in collecting and analysing data. For this reason, designers have tended to adopt subjective usability assessment techniques for collecting feedback from users [21]. On one hand, self-reporting techniques can only be administered post-task, which affects their reliability for long tasks. Meta-cognitive limitations can also diminish the accuracy of reporting, and it is difficult to perform comparisons among raters on an absolute scale. On the other hand, these techniques appear to be the most sensitive and diagnostic [21]. Nielsen’s principles, thanks to their simplicity in terms of effort and time, are frequently employed to evaluate the usability of interfaces [38]. The evaluation is done iteratively by systematically finding usability problems in an interface and judging them according to the principles [39]. The main problem associated with these principles is that they mainly focus on the user interface, neglecting contextual factors, the cognitive state of the users and the underlying tasks.

The System Usability Scale [9] is a questionnaire that consists of ten questions (Table 9). It is a highly cited usability assessment method and it has been widely applied [7]. It is a very easy scale to administer, demonstrating reliability in distinguishing usable from unusable systems, even with small sample sizes [54]. Alternatives include the Computer System Usability Questionnaire (CSUQ), developed at IBM, and the Questionnaire for User Interface Satisfaction (QUIS), developed at the HCI lab at the University of Maryland. The former is a survey that consists of 19 questions on a seven-point Likert scale from ‘strongly disagree’ to ‘strongly agree’ [25]. The latter was designed to assess users’ satisfaction with aspects of a computer interface [49]. It includes a demographic questionnaire, a measure of system satisfaction along six scales, and a hierarchy of measures of nine specific interface factors. Each of these factors relates to a user’s satisfaction with that particular aspect of an interface, as well as to the factors that make up that facet, on a 9-point scale. Although it is more complex than other instruments, QUIS has shown high reliability across several interfaces [19]. Many other usability inspection methods and techniques have been proposed in the literature [21, 54].

2.3 Definitions of Mental Workload

Human Mental Workload (MWL) is an important design concept and it is fundamental for exploring the interaction of people with technological devices [29, 31, 32]. It has a long history in Psychology, with applications in Ergonomics, especially in the transportation industry [14, 20]. The principal reason for MWL assessment is to quantify the cognitive cost associated with performing a task, in order to predict operator or system performance [10]. However, it has been widely reported that both mental underload and overload can negatively influence performance [2, 50]. One of these studies proposed a technique to identify sub-areas of a web-site in which end-users manifested a higher mental workload during interaction, allowing designers to modify those critical regions. Similarly, [15] investigated how the design of query interfaces influences stress, workload and performance during information search. Here stress was measured by physiological signals and a subjective assessment technique – the Short Stress State Questionnaire. Mental workload was assessed using the NASATLX, and log data was used as an objective indicator of performance to characterise search behaviour.

4 Design of Experiments

A study involving human participants executing typical tasks over 3 popular web-sites (Youtube, Google, Wikipedia) was set up to investigate the relationship between the perception of usability, mental workload and objective performance. One self-assessment procedure for measuring usability and two for measuring mental workload were employed:

  • the System Usability Scale (SUS) [9]

  • the Nasa Task Load Index (NASATLX), developed at NASA [20]

  • the Workload Profile (WP) [52], based on Multiple Resource Theory [56, 57].

Five classes of the objective performance of participants on tasks were set:

  1. the task was not completed as the user gave up
  2. the execution of the task was terminated because the available time was over
  3. the task was completed and no answer was required by the user
  4. the task was completed, the user provided an answer, but it was wrong
  5. the task was completed and the user provided the correct answer.

These are sometimes conditionally dependent (Fig. 2). The experimental hypotheses are defined in Table 1 and illustrated in Fig. 3.

Fig. 2. Partial dependencies of classes of objective performance

Table 1. Research hypotheses
Fig. 3. Illustration of research hypotheses

4.1 Details of Experimental Subjective Self-reporting Techniques

The System Usability Scale is a subjective usability assessment instrument that uses a Likert scale, bounded in the range 1 to 5 [9]. Questions can be found in Table 9. Individual scores are not meaningful on their own. For odd questions (\(SUS_i\) with \(i=\{1|3|5|7|9\}\)), the score contribution is the scale position (\(SUS_i\)) minus 1. For even questions (\(SUS_i\) with \(i=\{2|4|6|8|10\}\)), the contribution is 5 minus the scale position. For comparison purposes, the SUS value is converted to the range \([0..100] \in \mathfrak {R}\) with \(i_1=\{1,3,5,7,9\}, \ i_2=\{2,4,6,8,10\}\)

$$SUS= 2.5 \cdot \Bigg [ \sum _{i_1} (SUS_i - 1) + \sum _{i_2} (5 - SUS_i) \Bigg ]$$
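As an illustration of the scoring rule above, the SUS computation can be sketched in a few lines of Python (a minimal sketch; the function and variable names are ours and hypothetical, not part of the original instrument):

```python
# Minimal sketch of the SUS scoring rule described above (hypothetical names).
def sus_score(responses):
    """responses: the 10 Likert ratings (1-5), ordered as SUS items 1..10."""
    assert len(responses) == 10
    odd = sum(responses[i] - 1 for i in range(0, 10, 2))   # items 1,3,5,7,9
    even = sum(5 - responses[i] for i in range(1, 10, 2))  # items 2,4,6,8,10
    return 2.5 * (odd + even)                              # final value in [0, 100]

# Example: a neutral respondent (all 3s) obtains a SUS of 50.
print(sus_score([3] * 10))  # 50.0
```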

The NASA Task Load Index instrument [20] belongs to the category of self-assessment measures. It has been validated in the aviation industry and other contexts in Ergonomics [20, 45], with several applications in many socio-technical domains. It is a combination of six factors believed to influence MWL (questions of Table 10). Each factor is quantified with a subjective judgement coupled with a weight computed via a paired comparison procedure. Subjects are required to decide, for each possible pair (binomial coefficient, \(\left( {\begin{array}{c}6\\ 2\end{array}}\right) = 15\)) of the 6 factors, ‘which of the two contributed the most to mental workload during the task’, such as ‘Mental or Temporal Demand?’, and so forth. The weights w are the number of times each dimension was selected, ranging from 0 (not relevant) to 5 (more important than any other attribute). The final MWL score is computed as a weighted average, considering the subjective rating of each attribute \(d_i\) and the corresponding weights \(w_i\):

$$ NASATLX : [0..100] \in \mathfrak {R}\qquad NASATLX = \Biggl ( \sum _{i=1}^{6} d_i \times w_i\Biggr ) \frac{1 }{15} $$
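For concreteness, this weighted aggregation can be sketched as follows (a minimal sketch with hypothetical ratings and weights; it is not the official NASA software):

```python
# Minimal sketch of the NASA-TLX aggregation described above (hypothetical names).
def nasa_tlx(ratings, weights):
    """ratings: the 6 subjective ratings in [0, 100];
    weights: how often each factor was chosen in the 15 pairwise comparisons."""
    assert len(ratings) == 6 and sum(weights) == 15
    return sum(d * w for d, w in zip(ratings, weights)) / 15.0  # value in [0, 100]

# Example: mental demand dominates the pairwise comparisons.
print(nasa_tlx([70, 0, 40, 50, 30, 60], [5, 0, 3, 3, 2, 2]))  # ~53.3
```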

The Workload Profile (WP) assessment procedure [52] is built upon the Multiple Resource Theory proposed in [56, 57]. In this theory, individuals are seen as having different capacities or ‘resources’ related to: \(\bullet \) stage of information processing – perceptual/central processing and response selection/execution; \(\bullet \) code of information processing – spatial/verbal; \(\bullet \) input – visual and auditory processing; \(\bullet \) output – manual and speech output. Each dimension is quantified through subjective rates (questions of Table 11): after task completion, subjects are required to rate the proportion of attentional resources used for performing a given task with a value in the range \(0..1 \in \mathfrak {R}\). A rating of 0 means that the task placed no demand, while 1 indicates that it required maximum attention. The aggregation strategy is a simple sum of the 8 rates \(d\) (averaged here, and scaled to \([0..100] \in \mathfrak {R}\) for comparison purposes):

$$ WP : [0..100] \in \mathfrak {R}\qquad WP=\frac{1}{8}\sum _{i=1}^{8} d_i \times 100 $$
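A corresponding sketch of the WP aggregation (with hypothetical ratings) is:

```python
# Minimal sketch of the Workload Profile aggregation described above.
def workload_profile(ratings):
    """ratings: 8 proportions in [0, 1], one per attentional-resource dimension."""
    assert len(ratings) == 8
    return sum(ratings) / 8.0 * 100.0  # averaged and rescaled to [0, 100]

print(workload_profile([0.4, 0.2, 0.6, 0.3, 0.5, 0.1, 0.7, 0.2]))  # 37.5
```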

4.2 Participants and Procedure

A sample of 46 people fluent in English volunteered to participate in the study after signing a consent form. Subjects were divided into 2 groups of 23 each, with no overlap between group A and group B. Participants could not interact with instructors during the tasks and did not require training. Ages ranged from 20 to 35 years; 24 females and 22 males were evenly distributed across the 2 groups (Total - Avg.: 28.6, Std.: 3.98; g.A - Avg.: 28.35, Std.: 4.22; g.B - Avg.: 28.85, Std.: 3.70), all with a daily Internet usage of at least 2 hours. Participants were required to execute a set of 9 information-seeking web-based tasks (Table 13) as naturally as they could, over 2 or 3 sessions of approximately 45–70 min each, on different non-consecutive days. Tasks differed in terms of difficulty, time-pressure, time-limits, interference, interruptions and demands on different psychological modalities. Two groups were created because the tasks were executed on web-based interfaces, sometimes altered at run-time through a CSS/HTML manipulation (as in Table 12). This manipulation was implemented, as part of a larger study [27, 28, 34], to enable A/B testing of web-interfaces (not included here). The interface alteration was not extreme, such as making content very hard to read; rather, the goal was to alter the original interface in order to manipulate task difficulty and usability independently. The order of the tasks administered was the same for all the participants. Computerised versions of the SUS (Table 9), the NASATLX (Table 10) and the WP (Table 11) instruments were administered immediately after task completion. Note that the NASA-TLX question related to ‘physical load’ was set to 0, as was its weight; consequently, the pairwise comparison procedure was shorter. Some volunteers did not execute all the tasks and the final dataset contains 405 cases.

5 Results

Table 2 contains the means and standard deviations of the usability and the mental workload scores for each task, depicted also in Fig. 4.

Table 2. Mental workload and usability - Groups A, B (G.A/G.B)
Fig. 4. Summary statistics by task

5.1 Testing Hypothesis 1 - Difference Between Usability and Mental Workload

From an initial analysis of Fig. 5, it seems clear that there is no correlation between the usability scores (SUS) and the mental workload scores (NASATLX, WP). This is statistically confirmed in Table 3 by the Pearson and Spearman correlation coefficients computed over the full dataset (Groups A, B). Pearson was chosen for exploring linear correlation, while Spearman was chosen for monotonic relationships that are not necessarily linear.
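For reference, both correlation analyses can be reproduced with standard statistical routines; the sketch below assumes a pandas DataFrame loaded from a hypothetical file with one row per case and hypothetical column names SUS, NASATLX and WP:

```python
# Illustrative sketch of the correlation analysis (file and column names hypothetical).
import pandas as pd
from scipy.stats import pearsonr, spearmanr

df = pd.read_csv("scores.csv")  # one row per case: columns SUS, NASATLX, WP
for mwl in ["NASATLX", "WP"]:
    r, p_r = pearsonr(df[mwl], df["SUS"])        # linear correlation
    rho, p_rho = spearmanr(df[mwl], df["SUS"])   # monotonic correlation
    print(f"{mwl} vs SUS: Pearson r={r:.2f} (p={p_r:.3f}), "
          f"Spearman rho={rho:.2f} (p={p_rho:.3f})")
```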

Fig. 5. Scatterplots of NASATLX, WP vs SUS.

Table 3. Correlation coefficients

Although the perception of usability does not seem to correlate at all with mental workload, a further investigation of their relationship was performed on the scores obtained for each task. Table 4 lists the correlations between the MWL scores (NASATLX, WP) and the usability scores (SUS), and Fig. 6 their densities. Generally, in the behavioural/social sciences, there may be a greater contribution from complicating factors, as in the case of subjective ratings. Hence, correlations above 0.5 are regarded as very high, those within [0.3–0.5] as medium/moderate and those within [0.1–0.3] as small (symmetrically for negative values) [12, p. 82]. For this analysis, only medium/high coefficients are considered. Yet, a clearer picture does not emerge and only a few tasks show some form of correlation between mental workload and usability. Figure 7 provides further details, aiming at extracting further information and possible interpretations of why workload scores were moderately/highly correlated with usability.

Table 4. Correlations MWL vs usability. Groups A and B
Fig. 6. Density plots of the correlations by task - Group A, B
Fig. 7. Details of tasks with moderate/high correlation

  • task 1/A and task 4/B: WP is moderately negatively correlated with SUS. This suggests that when the proportion of attentional resources taxed by a task is moderate and decreases, the perception of good usability increases. In other words, when web-interfaces and the tasks executed over them require a moderate use of the different stages and codes of information processing and of the input and output modalities (Sect. 4.1), the usability of those interfaces is increasingly perceived as positive.

  • task 9/A and task 9/B: the NASATLX is highly and positively correlated with SUS. This suggests that, even when time pressure is imposed upon tasks, causing an increment in the workload experienced, and the perception of performance decreases because the task answer is not found, the perception of usability is not affected if the task is pleasant and amusing (like task 9). In other words, even if the experienced workload increases but is not excessive, and even if the interface is slightly altered (task 9, group B), the perception of good usability is strengthened if tasks are enjoyable.

  • tasks 1/B, 4/B, 5/B, 7/B: the NASATLX is highly negatively correlated with SUS. This suggests that when the MWL experienced by users increases, perhaps because tasks are not straightforward, the perception of usability can be negatively affected even with a slight alteration of the interface.

The above interpretations do not aim to be exhaustive; they are our own interpretations, they cannot be generalised and they are confined to this study. To further strengthen the data analysis, an investigation of the correlation between the MWL and the usability scores was performed by considering users on an individual basis (Table 5 and Fig. 8).

Table 5. Correlation MWL-usability by user
Fig. 8. Density plots of the correlations by user

As in the previous analysis (by task), only medium and high correlation coefficients (\({>}0.3\)) are considered for deeper investigation. Additionally, because the results of Tables 3 and 4 were not able to systematically show common trends, the analysis on an individual basis was reinforced by considering only those users for which both a medium/high linear relationship (Pearson) and a monotonic relationship (Spearman) were detected between the two MWL scores (NASA, WP) and the usability scores (SUS). Table 5 highlights these users (1, 5, 11, 12, 21, 22, 27, 39, 40, 46). The objective was to look for the presence of any particular pattern of user behaviour or a complex deterministic structure. Figure 9 depicts the scatterplots associated with these users, with a straight linear regression line and a local smoothing regression line (Lowess algorithm [11]). The former type of regression is parametric and relies on the assumption of normality, while the latter is non-parametric and is aimed at supporting exploration and the identification of patterns, enhancing the ability to see a line of best fit over data that is not necessarily normally distributed. Outliers are not removed from the scatterplots: this decision is justified by the limited number of points – at most 9, coinciding with the number of tasks.
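Such per-user panels can be reproduced with standard tools; the sketch below assumes matplotlib, numpy and the Lowess implementation in statsmodels, with hypothetical variable names and example values:

```python
# Sketch of one per-user panel: scatter, straight regression line, Lowess smoother.
import numpy as np
import matplotlib.pyplot as plt
from statsmodels.nonparametric.smoothers_lowess import lowess

def plot_user(mwl, sus, xlabel, ax):
    ax.scatter(mwl, sus)
    slope, intercept = np.polyfit(mwl, sus, 1)          # parametric linear fit
    xs = np.linspace(min(mwl), max(mwl), 50)
    ax.plot(xs, intercept + slope * xs, label="linear")
    smooth = lowess(sus, mwl, frac=0.8)                 # non-parametric local fit
    ax.plot(smooth[:, 0], smooth[:, 1], label="lowess")
    ax.set_xlabel(xlabel); ax.set_ylabel("SUS"); ax.legend()

# Hypothetical example with at most 9 points per user (one per task), as in the study.
fig, ax = plt.subplots()
plot_user([35, 42, 50, 55, 60, 48, 52, 58, 45],
          [72, 70, 65, 60, 55, 68, 62, 58, 66], "NASATLX", ax)
plt.show()
```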

Fig. 9. Correlations MWL-usability for users with moderate/high Pearson and Spearman coefficients

No clear and consistent patterns emerge from Fig. 9. However, by analysing the mental workload scores (NASATLX and WP), it is possible to note that the 10 selected users all achieved, except for a few outliers, scores of optimal mental workload (on average between 20–72). In other words, these users did not perceive underload or overload while executing the nine tasks. From an analysis of the usability assessments, all these users produced scores higher than 40, indicating that no interface was perceived as completely unusable. This might indicate that when the mental workload experienced by users is within an optimal range, and usability is not bad, the combination of mental workload and usability in a joint model might not explain objective performance better than mental workload alone. In the other cases, where the correlation of mental workload and usability is almost non-existent, a joint model might better explain objective performance. The following section is devoted to testing this.

5.2 Testing Hypothesis 2 - Usability and Mental Workload Impact Performance More than Just Workload

From the previous analysis it appears that the perception of usability and the mental workload experienced by users are not related, except for a few cases in which mental workload was optimal and usability was not bad. Nonetheless, as previously reviewed, the literature suggests that these constructs are important for describing and exploring the user’s experience with an interactive system. For this reason, a further investigation of the impact of the perception of usability and mental workload on objective performance was conducted to test hypothesis 2 (Sect. 4). In this context, objective performance refers to objective indicators of the performance of the volunteers who participated in the user study, categorised into 5 classes (Sect. 4). During the experiment, the measurement of the objective performance of users was in some cases faulty. These cases were discarded and a new dataset with 390 valid cases was formed. The exploration of the impact of the perception of usability and mental workload on the 5 classes of objective performance was treated as a classification problem, employing supervised machine learning. In detail, 4 different classification methods were chosen to predict the objective performance classes, according to different types of learning (a minimal sketch follows the list):

  • information-based learning: decision trees (with Gini coefficient);

  • similarity-based learning: k-nearest neighbors;

  • probability-based learning: Naive Bayes;

  • error-based learning: support vector machine (with a radial kernel) [8, 23].
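The four learners above map onto standard implementations; the sketch below assumes scikit-learn (the paper does not state which implementation was used, so any parameter not named above is a library default):

```python
# Hypothetical instantiation of the four classifiers with scikit-learn.
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC

classifiers = {
    "decision tree": DecisionTreeClassifier(criterion="gini"),  # information-based
    "k-nearest neighbors": KNeighborsClassifier(),              # similarity-based
    "naive Bayes": GaussianNB(),                                 # probability-based
    "SVM (radial kernel)": SVC(kernel="rbf"),                    # error-based
}
```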

The distribution of the 5 classes is depicted in Fig. 10 and Table 6:

Clearly, the above frequencies are unbalanced. For this reason a new dataset was formed through oversampling, a technique to adjust class distributions and to correct for a bias in the original dataset, aimed at reducing the negative impact of class imbalance on model fitting. The minority classes were randomly sampled (with replacement) until they matched the size of the majority class (Table 6). The two mental workload indexes (NASA and WP) and the usability index (SUS) were treated as independent variables (features) and they were used both individually and in combination to form models aimed at predicting the 5 classes of objective performance (Fig. 11).
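The oversampling step can be sketched as follows, assuming a pandas DataFrame with a hypothetical 'performance' column holding the class label:

```python
# Sketch of random oversampling with replacement to balance the five classes.
import pandas as pd

def oversample(df, target="performance", random_state=42):
    n_max = df[target].value_counts().max()        # size of the majority class
    balanced = [grp.sample(n=n_max, replace=True, random_state=random_state)
                for _, grp in df.groupby(target)]  # resample each class to n_max
    return pd.concat(balanced).reset_index(drop=True)
```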

Fig. 10. Distribution of performance classes - original dataset
Table 6. Frequencies of classes
Fig. 11. Independent features and classification techniques

The independent features were normalised to the range \([0..1] \in \mathfrak {R}\) to facilitate the training of models, and 10-fold stratified cross-validation was adopted in the training phase. In other words, the oversampled dataset was divided into 10 folds and, in each fold, the original ratio of the distribution of the objective performance classes (Fig. 10, Table 6) was preserved. 9 folds were used for training and the remaining fold for testing against accuracy; this was repeated 10 times, changing the testing fold. This generated 10 models and produced 10 classification accuracies for each learning technique and for each combination of independent features (Fig. 12, Table 7). It is important to note that the training sets (a combination of 9 folds) and test sets (the remaining holdout fold) were always the same across the classification techniques and the different combinations of independent features (paired 10-fold CV). This is critical for a fair comparison of the different trained models using the same training/test sets.
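A sketch of this paired, stratified 10-fold procedure is given below, assuming scikit-learn and synthetic placeholder data (the real features would be the normalised NASATLX, WP and SUS scores):

```python
# Sketch of paired 10-fold stratified cross-validation: the same folds are reused
# for every feature subset and classifier, so per-fold accuracies can be compared.
import numpy as np
from sklearn.base import clone
from sklearn.metrics import accuracy_score
from sklearn.model_selection import StratifiedKFold
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.random((390, 3))     # hypothetical columns: NASATLX, WP, SUS (normalised)
y = rng.integers(0, 5, 390)  # hypothetical performance classes 0..4

def cv_accuracies(X, y, model, folds):
    """Per-fold test accuracy of a model on a fixed list of (train, test) splits."""
    accs = []
    for train_idx, test_idx in folds:
        fitted = clone(model).fit(X[train_idx], y[train_idx])
        accs.append(accuracy_score(y[test_idx], fitted.predict(X[test_idx])))
    return np.array(accs)

folds = list(StratifiedKFold(n_splits=10, shuffle=True, random_state=42).split(X, y))
acc_mwl = cv_accuracies(X[:, [0]], y, SVC(kernel="rbf"), folds)         # MWL only
acc_mwl_sus = cv_accuracies(X[:, [0, 2]], y, SVC(kernel="rbf"), folds)  # MWL + SUS
```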

Fig. 12. Independent features, classification technique, distribution of accuracies with 10-fold stratified cross validation

To test hypothesis 2, the 10-fold cross-validated paired Wilcoxon statistical test was chosen to compare two matched accuracy distributions and to assess whether their population mean ranks differ (it is a paired difference test) [58]. This test is a non-parametric alternative to the paired Student’s t-test, selected because the population of accuracies (obtained by testing each holdout fold) was assumed not to be normally distributed. Table 8 lists these tests for the individual models (containing only the mental workload feature) against the combined models (containing both the mental workload and the usability features). Except in one case (k-nearest neighbor, using the NASA-TLX as feature), the addition of the usability measure (SUS) to the mental workload feature (NASA or WP) statistically significantly increased the classification accuracy of the induced models, trained with the 4 selected classifiers. This suggests that mental workload and usability can be jointly employed to explain objective performance, an extremely important dimension of user experience.
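The corresponding significance test can be sketched as below, with hypothetical per-fold accuracies standing in for those summarised in Table 7:

```python
# Sketch of the paired Wilcoxon signed-rank test on two matched accuracy distributions.
from scipy.stats import wilcoxon

# Hypothetical per-fold accuracies (10 folds) of the MWL-only and MWL+SUS models.
acc_mwl = [0.41, 0.39, 0.44, 0.40, 0.42, 0.38, 0.43, 0.40, 0.41, 0.39]
acc_mwl_sus = [0.47, 0.45, 0.49, 0.46, 0.48, 0.44, 0.47, 0.46, 0.45, 0.48]

stat, p = wilcoxon(acc_mwl, acc_mwl_sus)  # paired, non-parametric difference test
print(f"Wilcoxon statistic = {stat}, p-value = {p:.4f}")
```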

Table 7. Ordered distributions of accuracies of trained models
Table 8. Wilcoxon test of distributions of accuracies with different independent features and learning classifiers
Table 9. System Usability Scale (SUS)
Table 10. The NASA Task Load Index (NASA-TLX)
Table 11. Workload Profile (WP)
Table 12. Run-time manipulation of web-interfaces
Table 13. Experimental tasks (M = manipulated; g = Group)

5.3 Summary of Findings

In summary, from empirical evidence, the two hypotheses can be accepted.

  • \(H_{1}\): Usability and mental workload are two uncorrelated constructs (as measured with the selected self-reporting techniques: SUS, NASA-TLX, WP).

They capture different variance in experimental tasks. This was tested by a correlation analysis (both parametric and non-parametric) which confirmed that the two constructs are not correlated. The obtained Pearson coefficients suggest that there is no linear correlation between usability (SUS scale) and mental workload (NASA-TLX and WP scales). The Spearman coefficients confirmed that there is no tendency for usability to either increase or decrease when mental workload increases. The large variation in correlations across different tasks and different individuals is interesting and worthy of future investigation.

  • \(H_{2}\): A unified model incorporating a usability and a MWL measure can better explain objective performance than MWL alone.

This was tested by inducing combined and individual models, using four supervised machine learning classification techniques, to predict the objective performance of users (five classes of performance). According to the Wilcoxon non-parametric test, the combined models were in most cases able to predict objective user performance significantly better than the individual models.

6 Conclusion

This study investigated the correlation between the perception of usability and the mental workload imposed by typical tasks executed over three popular web-sites: Youtube, Wikipedia and Google. Prominent definitions of usability and mental workload were presented, with a particular focus on the latter. This is because usability is a central notion in human-computer interaction, with a plethora of definitions and applications existing in the literature, whereas the construct of mental workload has a background in Ergonomics and Human Factors but is less frequently mentioned in HCI. A well-known subjective instrument for assessing usability – the System Usability Scale – and two subjective mental workload assessment procedures – the NASA Task Load Index and the Workload Profile – were employed in a user study involving 46 subjects. Empirical evidence suggests that there is no relationship between the perception of usability of a set of web-interfaces and the mental workload imposed on users by a set of tasks executed on them. In turn, this suggests that the two constructs describe two non-overlapping phenomena. The implication is that they could be jointly used to better describe objective indicators of user performance, a dimension of user experience. Future work will be devoted to replicating this study with a set of different interfaces and tasks and with different usability and mental workload assessment instruments. The contributions of this research are to offer a new perspective on the application of mental workload to traditional usability inspection methods, and a richer approach to explaining human-system interaction and supporting its design.