
1 Introduction

Evidence indicates that healthy eating is an effective way to slow the progression of diabetes mellitus [1]. In recent decades, a growing knowledgebase about the disease has been built and moved into the digital sphere. Online communities provide strong support for dietary behavior change, particularly for patients newly diagnosed with prediabetes or type II diabetes. However, for newly diagnosed older adults, this digital health information may not be easily accessible because of difficulty adapting to new technology. To reduce this barrier, it is expedient to design a smart system that adopts an appropriate technology to aid in diet change. A potentially useful technology for this problem is a healthy food recommender system. This is essentially a decision support tool aimed at optimizing an individual’s nutritional intake by providing recipe recommendations tailored to their particular health-related dietary needs.

Recommender systems have been found useful in domains such as e-commerce and scientific literature for solving the problem of information overload, leveraging big data analytics of user behavior, ratings, and user-defined tags [2,3,4]. In recent years, researchers in the food recommender system domain have focused on developing algorithms that incorporate health aspects into the system [5,6,7,8,9]. However, there is limited research on the user interface design and evaluation of a food recommender system, factors which are likely to have a direct impact on guiding user behavior. Moreover, the technology is not currently designed for older adults, users who may stand to gain substantially from the health benefits of diet change, but who may struggle to learn the system due to low computer proficiency and low self-efficacy in adopting technology into their daily life.

To begin filling the research gap around the design of a healthy food recommender system for older adult users, a prototype app with a choice-based user interface design was created. In this article, two critical user interface design variables were selected to be systematically examined in a 2 × 2 full factorial experiment: presenting the choice options in a vertical list-view layout or a side-by-side layout, and presenting the nutrition information in a text-list label (FDA Nutrition Facts Panel style) or in a symbolic, interpretive label (FSA Nutri-scores). The outcome variables were collected in a scenario-based human factors evaluation study. The usability and workload of using the system were evaluated with the USE questionnaire and the NASA-TLX questionnaire after participants performed a healthy food choice task in a breakfast cereal selection scenario.

The purpose of this research is to gain insight into the optimal user interface design for the proposed healthy food recommender system. Our intent is to fill the research gap surrounding ageing-centered design of a healthy food recommender system by testing potential design guidelines within the paradigm of a user-centric evaluation framework. The paper is organized as follows. We begin with a literature review of technology and older adults and of the two key design variables. We then briefly describe the development of a prototype recommender system, proceed to the methods of our pilot experiment to test the system, and finish with results from the experiment and concluding remarks.

2 Literature Review

2.1 Technology Acceptance Gap of Dietary Apps for Older Adults

Developing technology for older adults is a unique problem. Compared to younger adults, adults aged over 55 are less likely to engage with food-related technologies [10]. This lower likelihood of older adult consumers adopting mHealth apps can be explained by the Technology Acceptance Model (TAM) proposed by Davis (1985), within which a consumer’s attitude towards technology is mainly determined by Perceived Usefulness (PU) and Perceived Ease of Use (PEOU) [11]. From the perspective of PU, it appears that researchers have overestimated users’ enthusiasm for tracking their health using technology [12]. According to the 2013 Pew Study of the Quantified-Self movement, as many as 69% of adults track their own health-related measures such as weight, body circumstances, and food intakes. However, half of these do so by roughly keeping such information in their head, seldom using technology [13, 14].

Older adults may also have a lower PEOU for food tracking apps. Orso et al. (2019) developed a food-tracking app called ‘Salus’ and invited older adult users to evaluate its usability. Overall, they found positive results for perceived ease of use and satisfaction, but the learnability and error prevention of the app were relatively low, and older adults were not particularly engaged in using the app [15]. Guo et al. (2013) also reported that older adults were reluctant to use an mHealth app until they were forced to do so, for example, when it was “prescribed” by a doctor [16].

Despite their reluctance to embrace mHealth apps, older adults are particularly interested in using nutrition information. For example, Sanjari et al. (2017) performed a literature review of consumers’ attitudes and responses to front-of-package (FOP) nutrition labels from 1990 to 2016. They found that, compared to younger adults, older adults’ food choices are more influenced by the nutrition labels on packaged food during grocery shopping [17]. There is reason to believe that older adults’ perceived usefulness of nutrition information can remain intact even when the information is delivered through a digital platform. Ali et al. (2012) developed a digital nutrition education package for platforms using touch-screen technology (i.e. smartphones and tablets). The app was designed based on appropriate guidelines for older adults. In the context of this digital education package, they found that older adults accepted the content as useful for promoting a healthy lifestyle and the delivery mode as easy to use [18].

Focusing on perceived ease of use, the groundwork has been laid for promoting older adult friendly mHealth apps. Watkins et al. (2014) performed a heuristic evaluation of a small set of commercial dietary apps available in the Apple App Store in 2012. They derived 14 heuristics from Nielsen’s general usability heuristics [19], older adult-friendly design heuristics by Chisnell et al. [20], iOS-specific heuristics, and the usability issues from user feedback [21].

2.2 Search User Interface Design

An important choice point for the design of a food recommender system is the Search User Interface (SUI). The human user of such an mHealth app interacts with the Information Retrieval (IR) and Recommender System (RS) or the database through the SUI. SUI development has evolved from a query search on a command-line system to a keyword search on a graphical user interface. There has been less recent research in the SUI design area as the Google search engine UI has been established as the keyword search paradigm. The Google search engine UI has been found relatively easy to use and easy to learn for older adults and novice users as compared to other search engine UIs [22]. Despite this, some usability problems for older adults still occur due to confusion with the information architecture, which may be a new idea to some older adults [23].

Given Google’s dominance of the search engine market as the most frequently used landing homepage of internet browsers, modern SUI features are defined by and compared against its well-known layout and features, including input, control, informational, and personalizable features [24]. However, given the fast-paced development of new technologies, it is important that SUIs continue to evolve to accommodate new innovations. A prime example is the faceted search used on mobile devices to adapt to the limitation of smaller screen size [25, 26]. Faceted metadata search allows users to refine the search results with filters on the metadata that define the search space [27, 28]. Faceted systems can be more dynamic and have been found to improve the search experience for exploratory search tasks compared to keyword searches [29]. Faceted search could also facilitate the “Berrypicking” browsing model of Bates, which is close to searchers’ natural search behavior, by effectively collecting metadata of needed information from various resources [30].
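The filtering idea behind faceted metadata search can be illustrated with a minimal Python sketch. The recipe records, facet names, and helper functions below are hypothetical and are not taken from the prototype; the sketch only shows how each selected facet value narrows the result set and how the counts of remaining values could be shown to the user.

```python
from typing import Dict, List

# Hypothetical recipe metadata; names and facet values are illustrative only.
RECIPES: List[Dict[str, str]] = [
    {"name": "Oat porridge", "meal": "breakfast", "sugar": "low", "fiber": "high"},
    {"name": "Frosted flakes", "meal": "breakfast", "sugar": "high", "fiber": "low"},
    {"name": "Lentil soup", "meal": "lunch", "sugar": "low", "fiber": "high"},
]


def faceted_filter(items: List[Dict[str, str]], selected: Dict[str, str]) -> List[Dict[str, str]]:
    """Keep only the items whose metadata matches every selected facet value."""
    return [
        item for item in items
        if all(item.get(facet) == value for facet, value in selected.items())
    ]


def facet_counts(items: List[Dict[str, str]], facet: str) -> Dict[str, int]:
    """Count how many of the remaining items fall under each value of one facet."""
    counts: Dict[str, int] = {}
    for item in items:
        value = item.get(facet)
        if value is not None:
            counts[value] = counts.get(value, 0) + 1
    return counts


# Refine step by step, as a user would by tapping successive filters.
breakfast = faceted_filter(RECIPES, {"meal": "breakfast"})
low_sugar_breakfast = faceted_filter(breakfast, {"sugar": "low"})
```

Applying one facet at a time mirrors the incremental refinement that distinguishes faceted search from a single keyword query.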

2.3 Nutrition Information Format Design

A widely accepted and cost-effective method to nudge consumer dietary behavior is to provide nutrition information in the context of daily food selection. One of the primary vehicles is the nutrition label on packaged food, provided to support a consumer’s decision-making process. In the United States, the Food and Drug Administration (FDA) established a packaged food regulation policy through the Nutrition Labeling and Education Act of 1990, introducing the standardized nutrition label called the Nutrition Facts Panel (NFP). The Nutrition Facts Panel is a Back-of-Package (BOP) nutrition label, displaying information including serving size, number of servings, total energy, and a selection of nutrients such as energy from fat, total fat, saturated fat, trans fat, cholesterol, sodium, carbohydrates, dietary fiber, sugar, protein, vitamin A, vitamin C, vitamin D, calcium, and iron. In response to evidence-based academic research on the risks of trans fat, the FDA made a 2006 amendment requiring data on trans fat. In 2016, a revision was made to require a line on added sugar, bringing the NFP into alignment with the 2015–2020 Dietary Guidelines for Americans regarding limiting added sugar intake to less than 10% of daily calories [31].

In many European countries, a variety of symbolic labels, such as health claims, Guideline Daily Amounts (GDA), the traffic light system, and Nutri-scores, have been developed as Front-of-Package (FOP) labels [32]. The GDA, developed in the UK around 1998, is a list of the absolute amount per serving and the percentage of daily value for five key nutrients: calories, fat, saturated fat, sugar, and salt. It is a simplified nutrition facts panel with general guidelines for healthy adults and children. The traffic light system adds the color-coding of a traffic light signal to interpret healthiness, with the high-, medium-, and low-level content of those nutrients each corresponding to a color. Nutri-scores integrates the weighted sum of the GDA and the color-coding of the traffic light system to interpret healthiness on a 5-point letter grading scale.
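As a rough illustration of how an interpretive label collapses a numeric nutrient score into a 5-point letter grade with traffic-light colors, consider the Python sketch below. The cut-off values and color names are assumptions chosen for illustration, and the sketch further assumes that a lower numeric score indicates a healthier product; none of these details are taken from the prototype or from the study data.

```python
from bisect import bisect_left

# Assumed inclusive upper bounds for grades A to D; any higher score maps to E.
# These cut-offs are illustrative assumptions, not values from the study.
UPPER_BOUNDS = [-1, 2, 10, 18]
GRADES = "ABCDE"
COLORS = ["dark green", "light green", "yellow", "orange", "red"]


def letter_grade(score: float) -> str:
    """Map a numeric nutrient score to a letter grade (lower score = better)."""
    return GRADES[bisect_left(UPPER_BOUNDS, score)]


def label(score: float) -> str:
    """Return the grade together with its traffic-light style color."""
    index = bisect_left(UPPER_BOUNDS, score)
    return f"{GRADES[index]} ({COLORS[index]})"
```

The point of the sketch is the loss of detail: a single letter is easy to compare across products, but quantities such as serving size and sugar content are no longer visible.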

3 Prototype

In order to test a pair of UI design variables, the prototype of a smart food decision support system for an mHealth app was designed and developed using the User-Centered Design (UCD) process. The design project began with a thematic analysis of the literature and also included informal interviews with older adults from the local community. From this information, a use case was developed surrounding dietary planning as a part of informal caregiving for a spouse newly diagnosed with type II diabetes. The sketch of the design was drawn based on this use scenario, and a cognitive walkthrough was performed by the designer to develop a low-fidelity prototype, as shown in Fig. 1. This system integrates the food composition and nutrition database with the recipe recommender system to provide a decision support feature to users.

Fig. 1. Keyword search, browsing the recipe for the ingredient, and adding the recipe to the grocery list.

4 Methods

Two critical UI design variables, search result layout and nutrition information format, were addressed in the literature review and were examined in a user testing pilot experiment. For the search result layout, the designer proposed a choice-based UI design which facilitated faceted search by presenting two side-by-side alternatives at a time. For the purpose of comparison, a browsing-based UI with a vertical list-view layout was used as the baseline. For the nutrition information format, the designer adopted FSA Nutri-score labels [33, 34] for comparison with the FDA NFP label. A smart healthy food recommender system prototype was developed in the NodeJS environment as a web application to implement the subset of desired functionalities required to test the design variables. A within-subject test was employed across four user interfaces, varying the testing order to protect against confounding with a learning effect.

4.1 Participants

Thirty participants were recruited for this experiment from the greater Lafayette area in Indiana. Participants came from two populations. The first group consisted of 15 older adults aged 60 years and older who considered themselves capable of meal planning. These individuals are likely to represent at least a subset of those who would most benefit from the diet change required by a chronic health condition diagnosis and who would use a food recommendation system for this purpose.

The second group was recruited from college and graduate students aged from 20–35 who were attending Purdue University. These individuals are more likely to be early adopters of technology to enhance personal health management in their daily lives. Data from this second group permits us to draw some inference about how ageing impacts the design of the application.

4.2 Procedures

All subjects were asked to fill out a preliminary questionnaire and then follow a scripted use scenario using the web-based application on a computer. A small subset of participants who volunteered to do so completed the computer-based task on a desktop equipped with an eye tracker, located in a lab in the Discovery Learning Resource Center (DLRC) at Purdue University. The remaining participants completed the computer task on a laptop without an eye tracker in a designated meeting room or public area at the recruitment location. For each UI in the computer task, participants filled out a usability questionnaire about their perceived ease of use and perceived ease of learning for the system.

Preliminary Questionnaire.

The preliminary questionnaire consisted of basic demographic information, questions adapted from the Computer Proficiency Questionnaire [35], and Health Literacy Questionnaire [36]. Demographics included age and gender. The Computer Proficiency Questionnaire is designed to assess an individual’s experience using computers and provides an overall score as well as three sub-domain scores. This information allows us to control for an individual’s experience using a computer. The Health Literacy Questionnaire is a nutrition facts label comprehension test for evaluating an individual’s knowledge level in health and nutrition and serves as a proxy to control for the participant’s experience in making food choices based on nutrition information.

Computer Task.

For the study, participants sat at a table with a laptop, with the experimenter close at hand for assistance. The experimenter first explained and demonstrated the standard operating procedures of the system to the participant. The participant was then permitted to freely try out the various features of the system until they indicated they were confident in how to use it. The experimenter informed the participant that their task was to use the web-based application to choose a breakfast cereal for an informal care recipient (e.g., a spouse). The experimental task was to choose a cereal out of six (or six pairs of) alternatives, within the context that the participant was planning a meal for a care recipient who was newly diagnosed with type II diabetes. In this scenario, the care recipient strongly wants to have breakfast cereal in their meal plan, and the consulting dietitian has suggested an allowable amount of cereal for breakfast and recommended using the food recommendation system to assist in meal planning. The participants were encouraged to think aloud while they performed the task to give the experimenter further insight into how they were interacting with the system.

Each participant encountered four UI alternatives of the recommendation system. These were the combinations of browsing-based or choice-based user interface and text-list or symbolized nutrition information. The order of the four versions was randomized to avoid confounding the results with a learning effect. After finishing each version of the recommendation system, participants completed the USE questionnaire for measuring Perceived Usefulness (PU), Perceived Ease Of Use (PEOU), Perceived Ease Of Learning (PEOL), and Satisfaction, and a NASA-TLX subjective workload questionnaire for measuring mental workload. The experiment concluded with a short interview where participants were asked for their comments and suggestions to improve the system design.
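One way to generate a randomized presentation order for the four UI versions is sketched below in Python. The article does not describe how the randomization was implemented, so the seeding scheme, the function name, and the condition labels are assumptions made for illustration.

```python
import itertools
import random
from typing import List, Tuple

LAYOUTS = ["list-view", "side-by-side"]
LABELS = ["NFP", "Nutri-score"]

# The 2 x 2 full factorial design yields four UI versions.
CONDITIONS: List[Tuple[str, str]] = list(itertools.product(LAYOUTS, LABELS))


def presentation_order(participant_id: int, seed: int = 2020) -> List[Tuple[str, str]]:
    """Return a reproducible random ordering of the four UI versions.

    The seed value and the per-participant seeding are hypothetical choices
    that make each participant's order repeatable for record keeping.
    """
    rng = random.Random(f"{seed}-{participant_id}")
    order = CONDITIONS.copy()
    rng.shuffle(order)
    return order


orders = {pid: presentation_order(pid) for pid in range(1, 31)}
```

With only four conditions there are 24 possible orders, so a fully counterbalanced scheme would be an alternative to simple randomization when the sample is small.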

4.3 Measures and Analysis

The self-rated metrics of subjective workload and perceived usability were the quantitative measures collected. The subjective workload was measured by the NASA-TLX questionnaire, with the weighted total score as the primary output metric. The perceived usability was measured by four instruments with 7-point Likert scales from the USE questionnaire. The output metrics from those four instruments were the total score of PU, the total score of PEOU, the total score of PEOL, and the total score of Satisfaction.
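For readers unfamiliar with the weighted NASA-TLX total, the Python sketch below follows the standard scoring procedure: six subscale ratings are combined using weights obtained from 15 pairwise comparisons, so the weights sum to 15. The example ratings and weights are invented for illustration and are not data from this study.

```python
from typing import Dict

SUBSCALES = ("mental", "physical", "temporal", "performance", "effort", "frustration")


def weighted_tlx(ratings: Dict[str, float], weights: Dict[str, int]) -> float:
    """Compute the weighted NASA-TLX total from subscale ratings and weights.

    ratings: one rating per subscale (commonly on a 0-100 scale).
    weights: number of times each subscale was chosen in the 15 pairwise
             comparisons, so the weights must sum to 15.
    """
    if set(ratings) != set(SUBSCALES) or set(weights) != set(SUBSCALES):
        raise ValueError("ratings and weights must cover all six subscales")
    if sum(weights.values()) != 15:
        raise ValueError("pairwise-comparison weights must sum to 15")
    return sum(ratings[s] * weights[s] for s in SUBSCALES) / 15


# Invented example values, for illustration only.
example_ratings = {"mental": 60, "physical": 10, "temporal": 30,
                   "performance": 20, "effort": 50, "frustration": 40}
example_weights = {"mental": 5, "physical": 0, "temporal": 2,
                   "performance": 3, "effort": 4, "frustration": 1}
example_score = weighted_tlx(example_ratings, example_weights)
```

A subscale with weight 0, such as physical demand in this example, contributes nothing to the total regardless of its rating.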

A General Linear Model was used for analysis. The models’ main effects were the two design variables estimated as fixed effects. A random effect for subject was used to capture the within subject correlation. All possible interactions between fixed effects were estimated. Significance was set at 5% and all data analyses were performed in Minitab® statistical software. A secondary analysis was performed using the within subject Z-score adjustment to increase the test power given the high degree of variability between subjects [37]. For each test, an individual’s outcomes (e.g. workload rating) were standardized using that individual’s mean and standard deviation (e.g. from their four workload ratings).
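The within-subject standardization can be sketched in a few lines of Python. The article does not state whether the sample or population standard deviation was used, so the sample standard deviation is assumed here; returning zeros when a participant gave four identical ratings is likewise an assumption, and the example ratings are invented.

```python
from statistics import mean, stdev
from typing import List


def within_subject_z(scores: List[float]) -> List[float]:
    """Standardize one participant's ratings using their own mean and SD.

    scores: that participant's ratings of a single outcome across the four
            UI versions (for example, four workload ratings).
    """
    m = mean(scores)
    s = stdev(scores)  # sample SD; an assumption, see the note above
    if s == 0:
        # Identical ratings carry no preference information.
        return [0.0] * len(scores)
    return [(x - m) / s for x in scores]


# Invented workload ratings for one participant across the four UIs.
z = within_subject_z([40, 55, 30, 35])
```

After this transformation every participant has mean 0 and unit standard deviation across the four UIs, which removes between-subject differences in how the rating scale is used while keeping each participant's relative ordering of the designs.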

Qualitative data were collected in the form of notes during the experiment as the participants practiced thinking aloud and from the brief interview at the conclusion of the experiment. These data were first transcribed and then printed out on paper so that they could be marked up. The experimenter read each transcript several times, each time marking important themes. After marking all transcripts, these themes were collected from all participants and the differences and similarities were analyzed for basic tendencies.

4.4 Hypotheses

The goal of the experiment was to examine two UI design variables for a relationship with participants’ perceived usability and subjective workload ratings. We tested two key hypotheses:

H1: The search result layout is a significant predictor of the self-rated metrics (weighted total of NASA-TLX, PU, PEOU, PEOL, Satisfaction).

H2: The nutrition information format is a significant predictor of the self-rated metrics (weighted total of NASA-TLX, PU, PEOU, PEOL, Satisfaction).

5 Results

5.1 Quantitative Results

General linear models with a random effect for subject were fit to assess the relationship between the design variables (search result layout and nutrition information format) and the outcome variables of workload and perceived usability (PU, PEOU, PEOL, Satisfaction). The main results are given in Table 1. For the outcome of subjective workload, the nutrition information format was a statistically significant predictor of the total score from NASA-TLX (P < 0.001). The main effect of search result layout was not found to be statistically significant (P = 0.294). Nutrition information format was a statistically significant predictor of PEOU scores (P = 0.008) and also of PEOL scores (P = 0.01). Neither the search result layout main effect (P = 0.167 for PEOU and P = 0.208 for PEOL) nor the two-way interaction (P = 0.387 for PEOU and P = 0.803 for PEOL) was statistically significant for either the PEOU score or the PEOL score. The FSA Nutri-scores nutrition information format had the lowest mean workload score and the highest mean PEOU and mean PEOL. For the models with PU and Satisfaction as the outcome variable, neither the main effects nor the interaction were significant.

Table 1. ANOVA table of GLMs

Notably, for each of the general linear models, variability in the outcome was dominated by the between-subject variability. To reduce this impact and thereby increase the power of the tests, we used a within-subject z-score adjustment approach. This allowed us to get a better feel for whether or not subjects tended to prefer a particular design variable level even if there was no consensus on how the designs ought to be rated numerically. We repeated the General Linear Model analysis on the transformed data; results are given in Table 2.

Table 2. ANOVA table of GLMs using z-scores standardization

The within-subject standardization of the outcome variables had the intended effect of reducing the between-subject variance such that the random effect was no longer statistically significant. The main effect of nutrition information format remained significant for workload (P < 0.001), PEOU (P = 0.002), and PEOL (P = 0.032), and was now found to be statistically significant for the outcome of Satisfaction (P = 0.029). The two-way interaction of the design variables was now statistically significant in the PEOU model (P = 0.004) and in the PU model (P = 0.049). The optimal combination of design factors for Perceived Ease of Use was the list-view search result layout and the FSA Nutri-score nutrition information format.

5.2 Qualitative Results

For the purpose of better explaining the meaning of quantitative results, several sources of qualitative data were collected. These include experimenter notes derived from the participant practice of thinking aloud during the experiment and the short interview about participant impressions of and suggestions for the food recommender system.

The qualitative results support the quantitative results in that there was no consensus on which search result layout was preferred. It appears that an individual’s preference for the search result layout may be influenced by their search strategy. For example, three students and two younger older adults (early 60s), who were all familiar with using computers, specifically mentioned that they preferred the vertical list view over the horizontal side-by-side view as it requires fewer steps to do an exhaustive search of all alternatives. On the other hand, one student and three older adults showed a strong preference for the side-by-side view layout as it made pairwise comparisons straightforward. One of the older adults noted that they preferred the list-view layout for the exhaustive search, but also appreciated the horizontal presentation of the alternatives for the ease of comparison.

The qualitative results regarding nutrition information format also aligned with the quantitative results, as most participants preferred the Nutri-scores labels. However, four students and two older adults raised some important concerns about the Nutri-score label. First, they pointed out correctly that the label was not specifically designed for patients with type II diabetes and so might not be appropriate for decision making when caregiving for such a care recipient. Second, they pointed out that information such as serving size and sugar content is absent from the Nutri-score label, making it difficult to control daily intake values. Along the same lines, two older adult participants mentioned that they were unsure in deciding between NFP and Nutri-scores when they were asked about their preference. They both thought they should choose the NFP for its detailed information, but they both desired a simple, easy-to-use label for decision making.

Another theme that arose from the notes is that it takes a little time to learn how to use and to trust the Nutri-score labels. Most of the participants saw the Nutri-scores label for the first time during the experiment and requested some explanation about the context of the label and how to interpret it. Once participants learned that the rating was created by government authorized experts and became familiar with its interpretation, they were willing to use it. Participants did not fully trust the label until they felt that it conformed with their expectations. Only a minority of participants trusted the label implicitly without going through this process.

To our surprise, most participants needed some instruction in how to interpret the Nutri-score rating system. One student and one older adult reported a comprehension issue, noting that the Nutri-score label was in the reverse order, in terms of quality, from what they found intuitive. Nutri-score assigns letter grades on an A to E scale with traffic light color coding, A being the best and E being the worst. The current design drew the scale such that it starts at “A” on the left-hand side. These two participants were most familiar with thinking of mathematical scales (increasing numerical order) and so experienced difficulty at first in interpreting the label.

Given the exploratory nature of the experiment, several participants gave suggestions for improvement of the design. The majority of participants preferred the vertical list-view of search results, but two older adults suggested a horizontal list-view layout, which follows their natural reading behavior. Given the different searching strategies, the most compelling suggestion came from a student who recommended a faceted search with a dynamic layout design alternative. Search results would first be presented in a list-view layout on the main page from which the user could pick multiple alternatives before advancing to a secondary page. On this secondary page, the selected alternatives would be shown in a side-by-side layout to aid in pairwise comparison decision making.
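The two-stage flow suggested by the student can be sketched as follows in Python. The cereal names and function names are hypothetical placeholders; the sketch only shows how a shortlist picked from the list view could be expanded into the side-by-side comparison screens of the second page.

```python
from itertools import combinations
from typing import List, Sequence, Tuple


def shortlist(results: Sequence[str], picked: Sequence[int]) -> List[str]:
    """Stage 1: the user ticks several alternatives in the list-view layout."""
    return [results[i] for i in picked]


def pairwise_screens(shortlisted: Sequence[str]) -> List[Tuple[str, str]]:
    """Stage 2: every pair of shortlisted items becomes one side-by-side screen."""
    return list(combinations(shortlisted, 2))


# Hypothetical search results; six alternatives as in the experimental task.
search_results = ["Cereal A", "Cereal B", "Cereal C", "Cereal D", "Cereal E", "Cereal F"]
chosen = shortlist(search_results, [0, 2, 5])
screens = pairwise_screens(chosen)
```

Because the number of pairs grows quadratically with the shortlist size (three items give three screens, six would give fifteen), limiting the shortlist would likely matter for keeping the second stage manageable.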

6 Discussion

In our pilot study, we examined the relationship of search result display and nutrition information format with workload and ease of use outcomes in the context of a food recommender system. We collected quantitative data using subjective questionnaires, which is the most frequently used usability evaluation method in empirical UI design studies for older adults [38]. The variability in our outcome was dominated by between-subject variability. We took several steps to try to reduce the impact of this, such as allowing participants to check their previous answers as they evaluated subsequent UI designs and encouraging them to consider the first UI they encountered as the baseline UI during the experiment. Additionally, we performed a secondary analysis with transformed outcomes using the within-subject z-standardization approach. In order to supplement the quantitative analyses, we collected qualitative data. We consider each of these in turn.

We originally expected to find that older adults would consider the choice-based UI to be easier to use. Quantitatively, there was not a significant difference between the choice-based and list-view browsing-based UIs, but a divide between the two UIs was borne out in the qualitative data. We can think of at least two reasons that may have led to this contrary finding. First, it may simply be that we have an underpowered study, and given a sufficiently sized and representative sample of older adults we might find the choice-based UI was preferred. While our sample size was certainly small, we think the second reason is more compelling. Our qualitative data made it clear that at least two search strategies were being employed. If an individual was seeking the optimal choice through an exhaustive search of the options, the list-based UI was preferred. However, if participants were more focused on picking a quality (but not globally optimal) choice using the satisficing rule, the choice-based UI was a natural preference. Which search strategy is most likely to be used and which UI is most natural to first-time users will likely depend, at least in part, on the most common application designs dominating user experience, such as Google’s search engine UI.

In terms of quantitative results, we prefer the standardized outcomes over the raw data. We think it is easier for the novice user to make relative comparisons between different UI designs than to make an unbiased judgement on an absolute scale. The question being answered on the standardized scale is more a matter of which UI the user prefers as opposed to the overall quality of the UI design. Although this is the question we are after in this study, it does reduce the generalizability of the findings overall. For example, it may be clear that a design is preferred, but that does not establish that the design is good (i.e. it may be the best of four poor designs).

Regarding limitations, the fact that we performed an unplanned secondary analysis on a small sample is likely to inflate the Type I error of the statistical tests. This reduces the strength of the quantitative findings. However, given the agreement from the qualitative data, we have at least convinced ourselves of some potentially interesting improvements that can be made to the UI design of a food recommender system. In particular, it was noted that the Nutri-score label was generally preferred for ease of decision making, but that some additional information from the NFP would still be required to support diet management for a care recipient with a chronic condition. Additionally, the multiple search strategies suggest that a fully faceted search UI, beginning with a list of alternatives to select from and ending in the choice-based UI, is likely to be more useful than either UI by itself.

Our measures were subjective, but it is possible to have objective data in this same setting. In particular it is possible to measure human task performance or eye tracking data to better assess the impact of these and other UI design decisions [24, 39].

We leave these explorations for future work.

7 Conclusion

In this article, we describe a pilot experiment using a food recommender application prototype developed using the ageing-centered design process. Such an app could be used to support healthy diet change in individuals facing a chronic health condition or disease. As the knowledgebase surrounding diet-specific strategies to combat various conditions and diseases grows, such an app could be tailored to support the individual’s care needs. We identified two critical design variables from the literature for this context: search result layout and nutrition information format. We expected the choice-based UI design would be preferred. The study results suggested that users may be split on the list vs choice-based UI and that an optimal design may actually be a combination of the two (a list to choose a subset of alternatives, which are fed into the choice-based second stage for deeper comparison). We also found a general desire for the simplicity of the interpretive Nutri-score label, but also a noted need to maintain certain elements from the NFP to support proper food intake management. Design is an iterative process, and the insights from this step suggest the next step should be towards the development of the faceted search feature.