1 Introduction

Quantitative data-driven user personas (DDPs) provide an alternative to qualitatively created personas (QCPs). DDPs can represent user populations in ways that are statistically valid, replicable via algorithms and verifiable by statistical metrics [1, 2]. The proliferation of DDPs is driven by the rise of “personified big data” [3] from social media and online analytics platforms that provides new opportunities to generate personas describing digital user populations.

Moreover, data science algorithms and machine learning libraries have made it possible to automate persona creation processes [4, 5] and to automatically update the personas when the underlying user data changes [6]. Based on these advantages, Human-Computer Interaction (HCI) scholars have proposed many types of profiles and layouts for DDPs [7,8,9,10], with varying complexity and informational content. The general goal is to increase quantitatively reliable information in personas.

Nonetheless, the multitude of layouts and templates for DDPs has resulted in two challenges. (1) There is no general template for DDPs, meaning that researchers and practitioners are uncertain of what information to include when using quantitative methods and online user data for the creation of DDPs. (2) It is not well known what the boundaries of DDPs are relative to QCPs. QCPs are based on social constructivism and human meaning-making [11] and on the understanding that human persona creators infer from other humans (the users) when creating the persona. It has been postulated that persona creation is an immersive practice that in itself enhances understanding of the users. In turn, DDPs might be limited in their ability to capture human nuances and understand meanings of social importance, as the persona creation takes place via probabilistic calculations that humans have little or no interaction with.

Thus, there is a need for research that critically examines the boundaries of the practice of DDPs and the information included in such personas. Figure 1 illustrates this concern with an example of a statistically valid but potentially non-useful persona. Previous research on DDPs fails to deliver a critical analysis such as this, focusing primarily on evaluating DDPs using technical accuracy metrics [1]. While Nielsen et al. [12] have analyzed the templates of user personas developed by Danish companies, such a review has not been conducted specifically for DDPs, which, as we argue, require a dedicated analysis of their own.

Fig. 1. DDP based on quantitatively identified patterns of user behavior [13]

Understanding DDPs is important because in-depth information about user motivations and pain points may not be readily available when relying solely on quantitative methods. This is because machine learning methods rely on probabilistic learning rather than a true understanding of human nature, and thus have limited ability to detect human pain points, needs, and wants, as well as goals of individuals [14]. Algorithms are unable to capture tacit information or to understand why a person acts the way he or she does. This limitation might form a fundamental obstacle for the value and usefulness of DDPs, as personas traditionally rely on their ability to convey human-centric information. For HCI, it is a principle of primary importance that personas appear as realistic profiles, in contrast to otherwise cold and unempathetic “target groups” (descriptions that cannot evoke empathy), thus enhancing stakeholders’ focus on end user needs [15]. Personas typically contain demographic information, as well as user goals and motivations [12]. The principle of rounded personas [16] calls for the persona to contain all the necessary information for stakeholders using the personas.

Thus, it is important to identify and discuss the boundaries of the information design of DDPs for research and practice. For this, a review of layouts and information of DDPs is needed. To this end, this analysis focuses specifically on DDPs, in particular on their layouts and information designs. Using systematic review methods, we locate and retrieve 31 DDP templates from prior research. We analyze the information in these templates using an extended version of the categories by Nielsen et al. [12].

Relevant studies containing persona profiles were identified, and the content of the persona profiles was extracted to answer the following research questions (RQs):

  • RQ1: What information do quantitative personas typically contain?

  • RQ2: What patterns can be found in quantitative persona layouts?

  • RQ3: How are purely quantitative personas different from qualitative or mixed quantitative-qualitative ones?

Our results indicate gaps in information design for DDPs, demonstrating the limitations of purely quantitative methods to generate rounded personas that serve stakeholders’ information needs in a holistic way. To remedy these gaps, we outline potential avenues for the use of algorithms, both independently and in collaboration with humans, to generate more holistic, more rounded DDPs than the current state of the art provides. As such, we contribute by combining algorithms and machine learning techniques with online user data and human judgment in order to create user personas that retain the benefits of quantitative data but also contain the type of information needed to understand the human beings behind the profile.

2 Related Literature

Persona templates tend to influence each other, and very few authors have looked to research for inspiration. Anvari et al. [17] have looked to cognitive psychology and learning for inspiration on what to include in the persona description. Nielsen [18] takes inspiration from film script writing.

Looking at the literature concerning what to include in the descriptions, there are some variations. Bornet and Brangier [19] describe in their study of the literature how three categories define the persona: (a) identity of the persona, (b) attitude towards the product or service, and (c) context of usage. Their study is built upon nine texts written between 2001 and 2009. Floyd et al. [20] report on 13 papers written between 1999 and 2006 and differentiate between seven kinds of personas that have different characteristics. Some types refer to authors of persona literature and advocacy, others to variations in use context. The types vary in how detailed they are, according to how much and what data they are built upon and the purpose of creation.

Nielsen et al. [12] analyzed 12 templates from 2006–2013. The study shows that the attitude towards the product and the context of use are often intertwined; thus the information can be divided into two main areas: (a) personality, which includes various information about demographics and personality traits, and (b) information related to the specific area to design for, such as technology use, a-day-in-the-life, product goals, and behavioral information. Apart from this, some researchers suggest adding business information such as market size and brand relationships [21,22,23]. Finally, a few researchers suggest indicating differences that can affect the persona, such as differences between international markets [21, 22, 24] and different behavior according to disabilities [25].

Looking at recommendations for personas based on the design team’s assumptions, the literature recommends a limited amount of information, such as name and demographics, behaviors and beliefs, and needs and goals [26].

What the qualitative data-based personas and the assumption-based personas have in common is that the suggestions are not based on research across disciplines and large numbers of cases but on individual experiences and single case studies.

When it comes to application areas, there is almost no area where personas have not been applied: digital services [27], learning [28], and health care [29] are among the most common areas, and target groups include children [30], adults, and users with special needs [31], using both mobile devices [32] and web services.

Previous research has shown that DDPs can take many forms and shapes. For example, Aoyama [7] used conjoint analysis to create DDPs for software embedded in digital consumer products. Holden et al. [9] developed “biopsychosocial” DDPs of elderly patients with heart failure using quantitative survey data. DDPs have also been applied in fashion [8], ecommerce [33], news [34], and many other domains. The diversity in persona information design thus appears to originate, on one hand, from the specificity of the methods applied (with the intuition that the outputs of different methods enable different information to be used for persona development) and, on the other hand, from the varying information needs of persona users, which inarguably affect the goals of the persona development endeavor. As a consequence, the field is characterized by a diversity of proposed design templates for DDPs. This diversity reflects the increasing relevance of DDPs for researchers and practitioners in user-centric industries.

Moreover, the design of DDPs has been explored both empirically, using experimental designs, and conceptually, by crafting research agendas that entail open questions about what is considered an “optimal” persona template. For example, Hill et al. [35] experimented with two persona designs: one that includes multiple pictures (consisting of both males and females) for a given persona and another that has only one picture. Using a controlled laboratory study with eye-tracking measurement, they found that the use of multiple pictures may represent an appropriate technique to expand the persona users’ understanding of the persona as a gender-free (or “multi-gender”) user segment rather than evoking gender stereotypes [35].

Similarly, Salminen et al. [34] experimented with two persona profiles: one with lifestyle photos and the other with a single portrait picture. Contrary to Hill et al. [35], their findings indicated that the use of multiple photos can distract and confuse the persona users, possibly because they are more used to the conventional persona template that includes only one photo [34]. Nonetheless, neither Hill et al. [35] nor Salminen et al. [34] found that multiple photos would decrease user engagement with the persona.

In another experimental study, Salminen et al. [36] presented 38 professionals with two alternate layouts: one that used a numbers-oriented information presentation style and another that used a text-oriented style. They found that the numbers-oriented template was perceived as significantly more useful by analysts but significantly less complete by both marketers and analysts [36]. The visual engagement with the persona profiles was found not to vary significantly between the templates [36].

Conceptually, persona information design in the context of DDPs has been raised as one of the prominent research areas [14, 37]. For example, Anvari et al. [38] discuss the use of personality traits in personas: it is unclear how well such traits that require subject-matter expertise and human analysis could be automatically added to DDPs.

3 Methodology

The persona layouts analyzed in this research were retrieved using systematic review methods. Two academic databases (Google Scholar and ACM Digital Library) were consulted for initial identification of articles. Identical literature searches were carried out for both databases in June 2019. The search phrases were devised with references to DDPs (“quantitative personas”, “data-driven personas”, “procedural personas”) in addition to specific methodologies (“automatic persona generation”, personas + cluster analysis | clustering | conjoint analysis | factor analysis | latent semantic analysis | matrix factorization | principal component analysis).
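The search design described above can be illustrated with a minimal sketch that enumerates the query strings. The phrases are taken from the text; the quoting style and the way “personas” is paired with each method term are assumptions, since the exact query syntax entered into each database is not reported.

```python
# Phrases listed in the text as direct references to DDPs.
DDP_PHRASES = ["quantitative personas", "data-driven personas", "procedural personas"]

# Methodology-specific phrase listed in the text.
METHOD_PHRASES = ["automatic persona generation"]

# Method terms that the text pairs with the word "personas".
METHOD_TERMS = [
    "cluster analysis",
    "clustering",
    "conjoint analysis",
    "factor analysis",
    "latent semantic analysis",
    "matrix factorization",
    "principal component analysis",
]


def build_queries():
    """Return one query string per phrase (the quoting style is an assumption)."""
    queries = [f'"{phrase}"' for phrase in DDP_PHRASES + METHOD_PHRASES]
    queries += [f'personas "{term}"' for term in METHOD_TERMS]
    return queries


for query in build_queries():
    print(query)
```

Running identical query lists against both databases, as the text describes, keeps the two searches comparable.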

Snowball sampling was also applied [39] to identify additional DDP articles. In total, the searches yielded 138 unique articles, which were first assessed by reading the titles and abstracts and, subsequently, reviewing the full texts. The criteria for including an article in the final sample were:

  • full research article (no short articles, books or theses)

  • published in peer-reviewed journal or conference

  • written in the English language

  • empirical paper that develops personas using quantitative data

After a full text review, 49 (35.5%) articles remained. For the purposes of this research, we further excluded articles that did not attach graphical representations of their final personas (i.e., persona layouts). At this stage, 30 final articles remained, and their persona layouts were extracted for further analysis. Data from each paper’s persona layout(s) was recorded using a standardized data extraction form [40] with sub-categories built on the previous work of Nielsen et al. [12] (see Table 1). In addition, the methodology used by each study (i.e., whether the paper used statistical and/or numerical techniques such as k-means cluster analysis, solely or in combination with qualitative methods such as ethnography) was also recorded.
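The screening funnel reported above can be checked with a few lines of arithmetic. This is only an illustrative sketch: the stage labels are paraphrased, and the helper name `share` is hypothetical.

```python
def share(part, whole):
    """Percentage of `whole` represented by `part`, rounded to one decimal."""
    return round(100 * part / whole, 1)


# Counts as reported in the text; labels are paraphrased.
stages = [
    ("unique articles identified", 138),
    ("remaining after full-text review", 49),
    ("with a graphical persona layout", 30),
]

identified = stages[0][1]
for label, count in stages:
    print(f"{label}: {count} ({share(count, identified)}% of identified)")
```

The second stage reproduces the 35.5% reported in the text (49 of 138).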

Table 1. Information extracted from each persona layout, with examples

Furthermore, the categories were analyzed within the contexts of the authors’ methods and goals in their respective papers. This included three papers that contained illegible layouts (i.e., too small or blurry), but nonetheless offered sufficient details in their text regarding the individual components of the persona layouts.

The following section presents the findings. Appendix 1 shows the recorded data.

4 Findings

4.1 Levels of Information Richness

The persona layouts varied in “richness,” which we define as containing multifaceted, well-rounded information regarding the persona. We quantitatively calculated the richness of personas by tallying the total pieces of information (i.e., subcategories present) within each persona layout. The most complex persona layout contained information for 14 subcategories [47], while the least complex contained only 4 [8, 13]. The mean number of subcategories was 8.83, while the standard deviation was 2.57.
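A minimal sketch of this tallying procedure is shown below. The three layouts and their subcategory names are hypothetical placeholders, not data from the reviewed papers, and the use of the sample standard deviation is an assumption, as the text does not state which variant was computed.

```python
from statistics import mean, stdev

# Hypothetical extraction records (not data from the reviewed papers):
# each layout maps to the set of subcategories found in it.
layouts = {
    "layout_a": {"name", "age", "gender", "photo"},
    "layout_b": {"name", "age", "gender", "occupation", "photo",
                 "product goals", "scenarios", "daily life context"},
    "layout_c": {"name", "age", "gender", "occupation", "photo", "quotes",
                 "personality", "product goals", "scenarios",
                 "daily life context", "work related issues", "charts"},
}


def richness(subcategories):
    """Richness = number of distinct subcategories present in a layout."""
    return len(subcategories)


scores = {layout_id: richness(subcats) for layout_id, subcats in layouts.items()}
values = list(scores.values())

print(scores)
print("mean:", round(mean(values), 2))
print("sample SD:", round(stdev(values), 2))
```

Counting distinct subcategories treats every piece of information as equally weighted, which matches the tally described in the text but says nothing about how detailed each piece is.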

Based on the descriptive statistics, the persona layouts were divided into three levels of richness: “simple” (4 to 7 subcategories), “moderate” (8 to 10 subcategories), and “high” (11 to 14 subcategories) (see Table 2 for examples). We selected the number of subcategories for the levels after examining the entire dataset and identifying the natural ‘breakpoints’ in the number of subcategories. Half of the persona layouts (50%) fell under the “moderate” category, with the remainder falling relatively evenly between either “simple” (26.7%) or “high” (23.3%) richness.
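The thresholds above amount to a simple binning rule, sketched below. The per-level counts (8, 15, and 7 of 30) are obtained by summing the N values reported in Section 4.2; the function and variable names are hypothetical.

```python
def richness_level(n_subcategories):
    """Map a subcategory count to the richness level defined in the text."""
    if 4 <= n_subcategories <= 7:
        return "simple"
    if 8 <= n_subcategories <= 10:
        return "moderate"
    if 11 <= n_subcategories <= 14:
        return "high"
    raise ValueError("count outside the observed range of 4 to 14")


# Articles per level, summed from the N values reported in Section 4.2.
level_counts = {"simple": 8, "moderate": 15, "high": 7}
total = sum(level_counts.values())

shares = {level: round(100 * n / total, 1) for level, n in level_counts.items()}
print(total, shares)
```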

Table 2. Examples of each category of persona layouts with varying richness

The graphical complexity and information richness of the personas do not necessarily correlate. For example, one persona layout [13], while an interesting graphical way of presenting personas, was questionable in its informativeness for end users; such extreme cases of abstraction were thus categorized under “simple” style despite their graphical complexity (see Table 2).

Persona layouts falling under the most “simple” information style, as exemplified by the layout from Dupree et al. [42] in Fig. 2, contained sparse information limited to bullet points detailing common behaviors. The persona is not identified with characteristics to make it human, like a name, or demographic and psychographic information; instead, it may only be labeled with a general epithet, such as “Lazy Experts” (close to what Floyd et al. [20] term as user archetypes).

Fig. 2. Information category distribution by methodology

Most “simple” persona layouts could be regarded as “skeleton personas” [43] that can be further enriched with details once time, costs, or limited data are removed as barriers. Persona layouts falling under the “moderate” information style, the most common category, reflect what such an upgrade in resources can result in. As exemplified by Kanno et al. [49], personas in this category are enriched with human-like elements, such as a full name, age, gender, and details on leisure activities and temperament. In many cases, a photo of a real person is enclosed. The persona layout also contains a short narrative (or in some cases, detailed bullet points) about the persona’s daily life scenarios and design-related goals.

Finally, persona layouts falling under the “high” information category are enriched with the most details (see “C” in Table 2). They extend beyond “moderate” information layouts through the inclusion of quotes, graphical representations, and categorization of the persona’s information. In short, persona layouts in this category contain not only more comprehensive information on demographic and psychographic details, but also categorize details in direct relation to the authors’ objectives. For example, Tempelman-Kluit and Pearce [41] categorize specific details under library usage and frustrations, which are in direct line with the authors’ topics of inquiry. Graphical symbols illustrate what relevant devices or subscriptions the persona has (the authors’ point of interest). This contrasts with the personas in the “moderate” category (“B” in Table 2), which usually contain only a short narrative with details that are not necessarily arranged into meaningful categories. As such, persona layouts in the “high” category go beyond mere personification and become mediums of analysis, as users can view these layouts to quickly discern between relevant information from various categories.

4.2 Mixed Methods Vs. Purely Quantitative Methods

Most articles (56.7%, N = 17) adopted solely quantitative methods, while 43.3% (N = 13) of the papers adopted mixed methods (i.e., used both qualitative and quantitative methods). Among the articles that adopted quantitative methods only, 29.4% (N = 5) fell under the “simple,” 47.1% (N = 8) under the “moderate,” and 23.5% (N = 4) under the “high” information styles. Among the articles that adopted mixed methods, 23.1% (N = 3) fell under the “simple,” 53.8% (N = 7) under the “moderate,” and 23.1% (N = 3) under the “high” information styles.
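The percentages in this breakdown follow directly from the reported N values, as the following sketch of the cross-tabulation shows. The counts come from the text; the level labels follow Section 4.1, and the dictionary layout and function name are illustrative choices.

```python
# Article counts per methodology and richness level, as reported in the text.
counts = {
    "quantitative only": {"simple": 5, "moderate": 8, "high": 4},
    "mixed methods": {"simple": 3, "moderate": 7, "high": 3},
}


def row_percentages(row):
    """Share of each richness level within one methodology, in percent."""
    total = sum(row.values())
    return {level: round(100 * n / total, 1) for level, n in row.items()}


n_articles = sum(sum(row.values()) for row in counts.values())

for method, row in counts.items():
    method_total = sum(row.values())
    method_share = round(100 * method_total / n_articles, 1)
    print(f"{method}: N = {method_total} ({method_share}%)", row_percentages(row))
```

With only 17 and 13 articles per group, a shift of a single article changes a within-group share by roughly six to eight percentage points, so small differences between the groups should be read with caution.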

As such, quantitative articles fell relatively more often under the “simple” category (29.4% versus 23.1% for mixed method studies). Beyond this, no major differences in richness could be observed between papers adopting quantitative methods solely and those combining them with qualitative methods. These findings are similar to those of Nielsen et al. [12], who found that companies lacking a formal quantitative data collection protocol nonetheless produced final personas that were as lengthy, and with descriptions just as thorough, as those of companies with data.

When analyzing the number of information pieces in personas from papers adopting mixed methods, 42.1% fell under the background, 40.5% under the design-related, 0.8% under the business and marketing related, 11.6% under the graphics, and 5.0% under the miscellaneous categories. A similar distribution was observed for persona layouts from papers adopting only quantitative methods: 42.3% of details fell under the background, 38.7% under the design-related, 0.9% under the business and marketing related, 11.7% under the graphics, and 6.3% under the miscellaneous categories (see Fig. 2).

Table 3 displays examples of how solely quantitative versus mixed method approaches differ in the presentation of persona layouts. Each of the example layouts (quantitative from Goodman-Deane et al. [50] and mixed method from Tu et al. [45]) details mostly background and design-related information. Nonetheless, the solely quantitative approach [50] results in a chart-like presentation of the details, with “scores” directly representing the quantitative data from the survey. On the other hand, the mixed method approach [45] results in more narrative-like, contextual descriptions.

Table 3. Purely quantitative versus mixed method persona

Furthermore, the former only captures “work related issues,” “daily life context,” and “product related issues” as subcategories within the design-related category, while the latter captures these in addition to the “product goals,” “scenarios,” and “a day in the life” subcategories. The former also does not capture any personality and psychographic information within the background category, while the latter infuses many of these details into narrative form. Thus, we surmise that the type of information collected for persona development (i.e., quantitative vs. qualitative) may carry over to the actual design of the persona, with numerical details such as graphs, scores, metrics, and tables being more common in purely quantitative personas and text-focused, narrative-like descriptions more prevalent in mixed method personas.

However, most layouts (especially in the “high” information richness category) combine both information styles, with some numerical cues and some textual information. The degree of text vs. numbers in data-driven personas is an open research question, with some previous research showing that the persona developers’ choices can affect the persona perceptions of users [36].

5 Discussion

While Nielsen et al. [12] found that most persona layouts from Danish organizations had a “strict distinction” between personas and scenarios, our analysis found that personas were generally intertwined with descriptions of present scenarios. Many of the richest personas (i.e. falling under the “high” information category) had narratives infused into the persona layouts’ descriptions to give them a more human-like quality. Interestingly, Nielsen et al. [12] reported that companies found this style “difficult to use (…) for the design of future solutions and as a result cancelled using the method [of intertwining]” (p. 6). This suggests the importance of considering the layout and categorization of information in a persona in conjunction with the researchers’ and/or practitioners’ needs. From our analysis, the persona layout from dos Santos et al. [44] is an example of how pertinent scenarios can be embedded into personas in a manner that remains relevant to stakeholders (see Fig. 3).

Fig. 3. Example of scenarios embedded into personas [44]

Furthermore, while Nielsen et al. [12] found a “noteworthy difference” in the “lack of information on income, urbanicity, and lifestyle” (p. 5) in their analysis, we did not find this to be the case. Moreover, Nielsen et al. also found that market segments were rarely captured in persona templates from Danish companies, which we also did not find to be the case. This may be attributed to a greater diversity in contexts observed in our set of persona layouts. As we only included studies that were data-driven and did not exclude by geographical region, such demographic details were intentionally included in many of the persona layouts. For example, studies developing personas for e-health devices found it pertinent to capture lifestyle and subpopulation distribution percentages [9], while studies conducted in market research and business contexts captured relevant income, lifestyle, and urbanicity data of potential customers [2, 45].

Nielsen et al.’s [12] finding that Danish persona descriptions lacked business and marketing related information was also reflected in our analysis of international, data-driven persona layouts. We also found it to be the case that “even though it is stated in several interviews that personas are used as a strategic tool and in marketing, the descriptions do not reflect this” (p. 6). Specifically, information pertaining to competitors, business objectives, and brand relationship was not captured in any of the persona layouts we analyzed. Rather, information relevant to businesses was mostly captured indirectly in the design-related categories, in relation to the personas’ product goals, work-related issues, and daily life or work context. This may suggest that the persona layouts were designed to envision common product usage scenarios among customers rather than to explicitly illustrate and correspond to business objectives.

Moreover, the persona layouts in our analysis also reflected Nielsen et al.’s [12] observation that researchers developed “different ways of fostering identification (…) the use of keywords, headlines, and quotes give a quick understanding of the kernel of the persona description” (p. 6). Our own analysis found a variety of visual cues across persona layouts, ranging from color coding and icons to visual scales and even data charts [47] (see Fig. 4 for an example).

Fig. 4. Example of visual identification aids in personas [47]

The field is in dire need of empirical user studies. Nascent work shows promise in applying methodologically diverse methods such as eye-tracking, think-aloud, and observer notes [34, 35], and in examining multiple aspects of persona design, such as photos and text vs. numbers [36, 51, 52]. Yet, there is a lack of systematic research that would incrementally advance the design practice of personas into a more optimal state. Currently, some of the empirical findings are conflicting (such as those by Hill et al. [35] and Salminen et al. [34] regarding the use of multiple photos). We surmise that this is due to variations in the implementation of persona templates: both small and large variations can affect user experiences in crucial ways. In other words, the persona templates tested by different researchers “look and feel” different and thus are perceived and engaged with in different ways. The only way, it thus appears, to produce consistent research insights that are generalizable across the nuanced implementations of DDPs is to include more design variations in these user studies. This would, consequently, require large-scale data collection, potentially calling for more scalable approaches such as persona crowd experiments. There are already existing examples of using crowdsourcing for data collection in persona studies [51, 52]; however, more efforts are needed.

6 Conclusion

An empirical analysis shows that DDP layouts draw heavily from both quantitative and qualitative data. Some persona information (e.g., lifestyle, personality) is difficult to obtain using purely quantitative methods, thus requiring qualitative insights to realistically portray this information. We also find that graphical complexity and information richness do not necessarily correlate. The range of information categories is wide: the most information-rich persona template contains 14 subcategories, 3.5 times as many as the 4 in the least information-rich template. Furthermore, the choice of the persona development methodology may carry over to the information design of DDPs, with quantitative data typically presented as scores, metrics, or tables and qualitative data as text-rich narratives. We did not find one “general template” for DDPs; this cannot be defined easily, if at all, due to the variety of the outputs of different methods as well as differences in the information needs of the persona users.