In the years since their first introduction (ca. 1950s), videogames have only increased in popularity. In education, videogames are already widely applied as tools to support students in learning (cf. Boyle et al., 2016; Ifenthaler et al., 2012; Young et al., 2012). In contrast, less research has been done on the use of videogames as summative assessment environments, even though administering (high-stakes) summative assessments through games has several advantages.

First, videogames can be used to administer standardized assessments that provide richer data about candidate ability in comparison to traditional standardized assessments (e.g., multiple-choice tests; Schwartz & Arena, 2013; Shaffer & Gee, 2012; Shute & Rahimi, 2021). Second, assessment through videogames gives considerable freedom in recreating real-life criterion situations, which allows for authentic, situated assessment even when this is not feasible in the real working environment (Bell et al., 2008; Dörner et al., 2016; Fonteneau et al., 2020; Harteveld, 2011; Kirriemuir & McFarlane, 2004; Michael & Chen, 2006). Third, videogames can offer candidates a more enjoyable test experience by providing an engaging environment where they are given a high degree of autonomy (Boyle et al., 2012; Jones, 1998; Mavridis & Tsiatsos, 2017). Finally, videogames allow for assessment through in-game behaviors (i.e., stealth assessment), which aims to make assessment less salient to candidates so that they retain their engagement (Shute & Ke, 2012; Shute et al., 2009).

The benefits above highlight why videogames are viable assessment environments, irrespective of the specific level of cognitive achievement (e.g., those depicted in Bloom’s revised taxonomy; Krathwohl, 2002). Moreover, the possibility of immersing candidates in complex, situated contexts makes them especially interesting for higher-order learning outcomes such as problem solving and critical thinking (Dede, 2009; Shute & Ke, 2012). Therefore, videogames may provide a solution to the validity threats associated with traditional high-stakes performance assessments: an assessment type used to evaluate competencies through a construct-relevant task in the context for which it is intended (Lane & Stone, 2006; Messick, 1994; Stecher, 2010), often for the purpose of vocational certification.

The first validity threat associated with high-stakes performance assessments is the prevalence of test anxiety among candidates (Lane & Stone, 2006; Messick, 1994; Stecher, 2010), which has been shown to be negatively correlated with test performance (von der Embse et al., 2018; von der Embse & Witmer, 2014). Although some debate exists about the causal relationship between the two (Jerrim, 2022; von der Embse et al., 2018), it is apparent that candidates who experience test anxiety are unfairly disadvantaged in high-stakes assessment contexts.

The second threat is caused by the need for high-stakes performance assessments to be both standardized, to ensure objectivity and fairness (AERA et al., 2014; Kane, 2006), and to include a construct-relevant task (e.g., writing an essay, participating in a roleplay; Lane & Stone, 2006; Messick, 1994). While neither requirement rules out adaptivity (e.g., adaptive testing and open-ended assessments), the combination often restricts assessments to a linear performance task that is not adapted to candidate ability level. The potential mismatch between task difficulty and the ability level of candidates poses two disadvantages. First, the mismatch can frustrate candidates, which negatively affects their test performance (Wainer, 2000). Second, candidates likely receive fewer tasks that align with their ability level, which negatively affects test reliability and efficiency (Burr et al., 2023). High-stakes performance assessments would thus benefit from adaptive testing that is personalized and appropriately difficult, allowing candidates to be challenged enough to retain engagement (Burr et al., 2023; Malone & Lepper, 1987; Van Eck, 2006) while enabling assessors to determine efficiently and reliably whether the candidate is at the required level (Burr et al., 2023; Davey, 2011). Additionally, adaptive testing allows for more personalized (end-of-assessment) feedback that could further boost candidate performance (Burr et al., 2023; Martin & Lazendic, 2018).

The third threat identified in high-stakes performance assessment is a lack of assessment authenticity. Logically, assessment would best be administered in the authentic context (i.e., the workplace in the case of professional competencies). This leads to a high degree of fidelity: how closely the assessment environment mirrors reality (Alessi, 1988, as cited in Gulikers et al., 2004). Unfortunately, this is not attainable for competencies that are dangerous or unethical to carry out (Bell et al., 2008; Williams-Bell et al., 2015). Another concern is that workplace assessments are largely dependent on the specific workplace in which they are carried out. This would lead to considerable variation between candidates in testing conditions, as well as in the construct relevance of the tasks on which they are evaluated (Baartman & Gulikers, 2017). Because authenticity of the physical context and of the task are two dimensions required for mobilizing the competencies of interest (Gulikers et al., 2004), there is a need to achieve authenticity in other ways. Authenticity is also related to transfer: applying what is learned to new contexts. The higher the alignment between assessment and reality, the more likely it is that competence transfers to professional practice.

The fourth threat identified is inconsistency between raters in scoring candidate performance. Traditional high-stakes performance assessments are often accompanied by rubrics to evaluate candidate performance; however, inconsistencies in how rubrics are interpreted and used lead to construct-irrelevant variance (Lane & Stone, 2006; Wools et al., 2010). In this study, the aim is to investigate whether ‘serious games’ (SGs)—those “used for purposes other than mere entertainment” (Susi et al., 2007, p. 1)—provide a viable solution to this and the other limitations posed by traditional high-stakes performance assessments.

The most important characteristic of games is that they are played with a clear goal in mind. Many games have a predetermined goal, but other games allow players to define their own objectives (Charsky, 2010; Prensky, 2001). Goals are given structure by the provision of rules, choices, and feedback (Lameras et al., 2017). First, rules direct players towards the goal by placing restrictions on gameplay (Charsky, 2010). Second, choices enable players to make decisions, for example to choose between different strategies to attain the goal (Charsky, 2010). The extent to which rules restrict gameplay is also closely related to the choices players have in the game (Charsky, 2010). Thus, rules and choices seem to lie at two ends of a continuum that determines the linearity of a game. Linearity is defined as the extent to which players are given freedom of gameplay (Kim & Shute, 2015; Rouse, 2004). The third characteristic, feedback, is a well-studied topic in the field of education. In education, the main purpose of feedback is to help students gain insight into their learning and bring their understanding to the level of the learning goals (Hattie & Timperley, 2007; Shute, 2008; van der Kleij et al., 2012). In games, feedback is used in a similar way to guide players towards the goal, as well as to facilitate interactivity (Prensky, 2001). Feedback in games is provided in many modalities and gives players information about how they are progressing and where they stand with regard to the goal, for instance whether their actions have brought them closer to the goal or moved them further away. Games are made up of a collection of game mechanics that define the game and determine how it is played (Rouse, 2004; Schell, 2015). In other words, game mechanics are how the defining features of games are translated into gameplay. To illustrate, game mechanics that provide feedback to players can include hints, gaining or losing lives, progress bars, dashboards, currencies, and/or progress trees (Lameras et al., 2017).

When designing a game-based performance assessment, determining the information that should be collected about candidates to inform competence, and designing the tasks that fulfill this information need, are matters that should be considered carefully for each professional competency. One way to do so is through the use of the evidence-centered design (ECD) framework (cf. Mislevy & Riconscente, 2006). The ECD framework is a systematic approach to test development that relies on evidentiary arguments to move from a candidate's behavior on a task to inferences about candidate ability. It is beyond the scope of the current study to examine the design of game content in relation to the target professional competencies. In this systematic literature review, the aim is to determine which game mechanics could help overcome the validity threats associated with high-stakes performance assessments and are suitable for use in such assessments.

Previous research on game design has been conducted for instructional SGs (e.g., dos Santos & Fraternali, 2016; Gunter et al., 2008). For SGs used in high-stakes performance assessments, the potential effect of game mechanics on the validity of inferences should be considered in particular. For instance, choices in game design can affect correlations between in-game behavior and player ability (Kim & Shute, 2015). Moreover, game mechanics exist that are likely to introduce construct-irrelevant variance when used in high-stakes performance assessments. To illustrate, when direct feedback about performance (e.g., points, lives, feedback messages) is given to players, at least part of the variance in test scores would be explained by the type and amount of feedback a candidate has received.

Establishing design principles for SGs for high-stakes performance assessment is important for several reasons. First, such an overview allows future developers of such assessments to make more informed choices regarding game design. Second, combining and organizing the insights gained from the available empirical evidence advances the knowledge framework around the implementation of high-stakes performance assessment through games. Reviews on the use of games exist for learning (e.g., Boyle et al., 2016; Connolly et al., 2012; Young et al., 2012) or are targeted at specific professional domains (e.g., Gao et al., 2019; Gorbanev et al., 2018; Graafland et al., 2012; Wang et al., 2016). Nevertheless, a research gap remains, as no systematic literature review is known that addresses the high-stakes performance assessment of professional competencies. To this end, this study begins with identifying the available literature on SGs targeted at professional competencies; then extracts the implemented game mechanics that could help to overcome the validity threats associated with high-stakes performance assessment; and finally synthesizes game design principles for game-based performance assessment in high-stakes contexts.

The scope of the current review is limited to professional competencies specifically catered to a vocation (e.g., construction hazard recognition). More generic professional competencies (e.g., programming) are not taken into consideration, as the context in which they are used can also fall outside of secondary vocational and higher education. Additionally, there is a growing body of literature that recognizes the potential of in-game behavior as a source of information about ability level in the context of game-based learning (e.g., Chen et al., 2020; Kim & Shute, 2015; Shute et al., 2009; Wang et al., 2015; Westera et al., 2014). As the relationship between in-game behavior and candidate ability is of equal importance in assessment, the scope of the current review includes SGs that focus not only on assessment, but also teaching and training of professional competencies.

Method

The following section describes the procedure followed in conducting the current systematic literature review. First, a description of the inclusion criteria and search terms is given. This is followed by a description of the selection process and data extraction, together with an evaluation of the objectivity of the inclusion and quality criteria. Then, the search and selection results are presented, and two further categorizations of the included studies are operationalized: the type of competency and how a successful SG is defined.

Procedure

Following the guidelines described in Systematic Reviews in the Social Sciences (Petticrew & Roberts, 2005), the protocol below gives a description of and the rationale behind the review, along with a description of how studies were identified, analyzed, and synthesized.

Databases and search terms

The databases that include most publications from the field of educational measurement (Education Resources Information Center (ERIC), PsycInfo, Scopus, and Web of Science) were consulted for the literature search using the following search terms:

  • Serious game: (serious gam* or game-based assess* or game-based learn* or game-based train*) and

  • Quality measure: (perform* or valid* or effect* or affect*)

Inclusion criteria and selection process

The initial search results were narrowed down by selecting only publications that were published in English and in a scientific, peer-reviewed journal. To be included, studies were required to report on the empirical research results of a study that (1) focused on a digital SG used for teaching, training, or assessment of one or more professional competencies specific to a work setting, (2) was conducted in secondary vocational education, higher education, or vocational settings, and (3) included a measure to assess the dependent variable related to the quality of the SG. Studies were excluded when the focus was on simulations; while simulations have an overlapping role with SGs in the acquisition of professional competencies, the two represent distinct types of digital environments.

All results from the databases were exported to Endnote X9 (The EndNote Team, 2013) for screening. The selection process was conducted in three rounds. First, duplicates and alternative document types (e.g., editorials, conference proceedings, letters) were removed, and the publications were screened based on titles and abstracts; publications were removed when the title or abstract mentioned features of the study that were mutually exclusive with the inclusion criteria (e.g., primary school, rehabilitation, systematic literature review). Second, the titles and abstracts of the remaining results were screened again; when the title or abstract lacked information, the full article was inspected. To illustrate, some titles and abstracts did not mention the target population, whether the game was digital, or whether the professional competency was specific to a work setting. Finally, full-text articles were screened for full compliance with the inclusion criteria, and data was extracted from those publications.

The objectivity of the inclusion criteria was determined by blinded double classification on two occasions. On the first occasion, after the removal of duplicates and alternative document types, 30 randomly selected publications were independently double-classified by an expert in the field of educational measurement based on the title and abstract. An agreement rate of 93% with a Cohen’s Kappa coefficient of .81 translated to near-perfect inter-rater reliability (Landis & Koch, 1977). On the second occasion, a random selection of 32 publications considered for data extraction was blindly double-classified based on the full text by a master's student in educational measurement, which resulted in an agreement rate of 97% with a near-perfect Cohen’s Kappa coefficient (.94; Landis & Koch, 1977).
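To make the reliability computation concrete, the sketch below shows how percent agreement and Cohen’s Kappa can be computed for two raters; the include/exclude labels are hypothetical placeholders, not the actual screening decisions from this review.

```python
from collections import Counter

def percent_agreement(rater_a, rater_b):
    # Proportion of publications on which the two raters give the same label
    return sum(a == b for a, b in zip(rater_a, rater_b)) / len(rater_a)

def cohens_kappa(rater_a, rater_b):
    # Observed agreement corrected for the agreement expected by chance,
    # based on each rater's marginal label frequencies
    n = len(rater_a)
    p_o = percent_agreement(rater_a, rater_b)
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    labels = set(rater_a) | set(rater_b)
    p_e = sum((counts_a[label] / n) * (counts_b[label] / n) for label in labels)
    return (p_o - p_e) / (1 - p_e)

# Hypothetical screening decisions for 30 double-classified publications
rater_a = ["include"] * 10 + ["exclude"] * 20
rater_b = ["include"] * 9 + ["exclude"] * 21
print(percent_agreement(rater_a, rater_b), cohens_kappa(rater_a, rater_b))
```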

To assess the comprehensiveness of the systematic review and identify additional relevant studies, snowballing was conducted by backward and forward reference searching in Web of Science. For publications not available on Web of Science, snowballing was done in Scopus.

Data extraction

For the publications included, data was extracted systematically by means of a data extraction form (Supplementary Information SI1). The data extraction form includes: (1) general information, (2) details on the professional competency and research design, (3) serious game (SG) specifics, and (4) a quality checklist.

The quality checklist contains 12 closed questions with three response options: the criterion is met (1), the criterion is met partly (.5), and the criterion is not met (0). Studies that scored 7 or below were considered to be of poor quality and were excluded. Studies that scored between 7.5 and 9.5 were considered to be of medium quality, while studies with scores of 10 or above were considered to be of good quality (denoted with an asterisk in the data selection table; Supplementary Information SI2). These categories were determined by piloting the study quality checklist on two publications that met the inclusion criteria: one considered to be of poor quality and one considered to be of good quality. The scores obtained by those studies were set as the lower and upper threshold, respectively.
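As a minimal illustration of this scoring rule, the sketch below sums hypothetical checklist scores and maps the total onto the three quality categories; only the 0/.5/1 response options and the category thresholds follow the description above.

```python
def classify_study_quality(item_scores):
    # 12 closed questions, each scored 1 (met), .5 (partly met), or 0 (not met)
    assert len(item_scores) == 12
    assert all(score in (0, 0.5, 1) for score in item_scores)
    total = sum(item_scores)
    if total <= 7:
        return total, "poor (excluded)"
    if total <= 9.5:
        return total, "medium"
    return total, "good"  # scores of 10 or above

# Hypothetical checklist scores for one publication
print(classify_study_quality([1, 1, 0.5, 1, 1, 0.5, 1, 1, 1, 0.5, 1, 1]))  # (10.5, 'good')
```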

As this systematic literature review is focused on the extraction of game mechanics to inform game design principles, all articles included in the review needed to obtain a score of at least .5 on the criterion that the game is described in sufficient detail. When publications explicitly referred to external sources for additional information, information from those sources was included in the data extraction form as well.

Blinded double coding to determine the reliability of the quality criteria for inclusion was done by the same raters described above. A total of 24 randomly selected publications from the final review were included, with varying overlap between the three raters. The assigned scores were translated to the corresponding class (i.e., poor, medium, and good) to calculate the agreement rate. The rates ranged between 82% and 93%, corresponding to Cohen’s Kappa coefficients ranging from substantial to near perfect (.66–.88; Landis & Koch, 1977; Table 1).

Table 1 Results of reliability assessment of quality criteria by blinded double coding

Search and selection results

In the PRISMA flow diagram of the publication selection process (Fig. 1; Moher et al., 2009), the two rounds in which titles and abstracts were screened for eligibility are combined. The databases were consulted on the 21st of December 2020 and yielded a total of 6,128 publications. After the removal of duplicates, 3,160 publications were left. On the basis of the inclusion criteria, another 2,981 publications were excluded from the review, leaving 179 publications for full-text examination and data extraction. During the examination of the full-text articles, 129 studies were excluded due to insufficient quality (n = 42), lack of a detailed game description (n = 6), unavailability of the article (n = 5), not classifying the application as a game (n = 10), and an overall mismatch with the inclusion criteria (n = 66). In total, 50 publications were included. Snowballing was conducted in November of 2021 and resulted in the inclusion of six additional studies. In total, 56 publications were included in the final review.

Fig. 1

PRISMA flow diagram of inclusion of the systematic literature review. PRISMA preferred reporting items for systematic reviews and meta-analyses
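The selection counts reported above can be reconciled arithmetically; the short sketch below simply re-derives the totals from the numbers given in the text.

```python
# Reconciling the selection counts reported in the text
identified = 6128                      # records yielded by the database search
after_duplicates = 3160                # left after removing duplicates
excluded_on_title_abstract = 2981      # excluded based on the inclusion criteria
full_text_assessed = after_duplicates - excluded_on_title_abstract   # 179

full_text_exclusions = {
    "insufficient quality": 42,
    "no detailed game description": 6,
    "article unavailable": 5,
    "not classified as a game": 10,
    "mismatch with inclusion criteria": 66,
}                                      # 129 in total
included_from_search = full_text_assessed - sum(full_text_exclusions.values())  # 50
included_total = included_from_search + 6   # plus six studies found via snowballing
print(full_text_assessed, included_from_search, included_total)  # 179 50 56
```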

Categorization of selected studies

Competency types

Professional competencies are acquired and assessed in different ways. Given the variety of professional competencies, there is no universal game design that is likely to be beneficial across the board (Wouters et al., 2009). Other researchers (e.g., Young et al., 2012) even suggest that game design principles should not be generalized across games, contexts or competencies. While more content-related game design principles likely need to be defined per context, this review is conducted with the idea that generic game design principles exist that can be successfully used in multiple contexts. In that sense, the aim is to provide a starting point from where more context-specific SGs can be designed, for example through the use of ECD.

The review is organized according to the type of professional competency that is evaluated rather than the content of the SG under investigation, as this provides an idea of what researchers expect to train or assess within the SG. Different distinctions between competencies can be made. For example, Wouters et al. (2009) distinguish between cognitive, motor, affective, and communicative competencies. Moreover, Harteveld (2011) distinguishes between knowledge, skills, and attitudes. These taxonomies served as a basis to inductively categorize the targeted professional competencies into knowledge, motor skills, and cognitive skills.

The knowledge category includes studies that focus on, for instance, declarative knowledge (i.e., fact-based knowledge) or procedural knowledge (i.e., knowledge of how to do something), such as the procedural steps involved in cardiopulmonary resuscitation (CPR). The motor skills category refers to motor behaviors (i.e., movements); for CPR, an example would be compression depth. The cognitive skills category encompasses skills such as reasoning, planning, and decision making, for example studies that focus on the recognition of situations that require CPR.

Successful SGs

The scope of this systematic literature review is limited to SGs that are shown to be successful in the teaching, training, or assessment of professional competencies. As research methodologies differ between studies, there is a need to define what characterizes a successful SG. When an SG was used for teaching or training, it was deemed successful when a significant improvement in the targeted professional competency was found (e.g., through an external, validated measure of the competency). Some studies compared an active control group and an experimental group that additionally received an SG (e.g., Boada et al., 2015; Dankbaar et al., 2016; Graafland et al., 2017; see Supplementary Information SI2 for a full account): an SG was not deemed successful in the current results when these two groups showed comparable results. When an SG was used for assessment, it was deemed successful when (1) research results showed a significant relationship between the SG and a validated measure of the targeted competency, or (2) the SG was shown to accurately distinguish between different competency levels.

Results

The studies included in the review are discussed in two ways. First, descriptives of the included studies are given in terms of the degree to which games were successful in teaching, training, or assessment of professional competencies, the professional domains, and the competency types. Then, the game mechanics associated with the potential solutions to the validity threats in traditional performance assessment are presented.

Descriptives of the included studies

The final review includes 56 studies, published between 2006 and 2020 (consult Supplementary Information SI2 for a more detailed overview). No noteworthy differences were found between the SGs that aimed to teach, train, and assess professional competencies. Therefore, the results for the SGs included in the review are presented collectively.

Serious games with successful results

Divided by the type of professional competency evaluated, 84%, 83%, and 100% of studies reported research results showing the SG was successful for cognitive skills, knowledge, and motor skills, respectively (Table 2). Of the studies included in the systematic review, three found mixed effects of the SG under investigation between competency types (i.e., Luu et al., 2020; Phungoen et al., 2020; Tan et al., 2017).

Table 2 The proportion of studies that reported on serious games successful in teaching, training, or assessing professional competencies per competency type

Professional domains and competency types

The studies included in the review can be divided over seven professional domains (Table 3). These are further separated into professional competencies (see Supplementary Information SI2 for a full account). Examples include history taking (Alyami et al., 2019), crisis management (Steinrücke et al., 2020) and cultural understanding (Brown et al., 2018). Furthermore, the studies included in the review can be divided into three competency types: cognitive skills (n = 21), knowledge (n = 31), and motor skills (n = 4). An important note is that some studies evaluate the SG on more than one competency type, thus the sum of these categories is greater than the total number of studies included.

Table 3 Studies included in the review divided over type of competency and professional domain

Game mechanics

The following section discusses the inclusion of game mechanics—all design choices within the game—for the SGs discussed in the studies included in the review. Following the aim of the current paper, the game mechanics discussed are selected for having the potential to (1) mitigate the validity threats associated with traditional performance assessments, and (2) be appropriate for implementation in a game-based performance assessment.

Authenticity

Authenticity in the SGs is divided into two dimensions: authenticity of the physical context and of the task. First, an example of a physical context that was not representative of the real working environment was found for each of the three competency types (Table 4). Regarding the SGs targeted at cognitive skills, this was the case for Effic’ Asthme (Fonteneau et al., 2020). In this SG, the target population—medical students—would normally manage pediatric asthma exacerbations in a hospital setting. The game environment used is, however, the virtual bedroom of a child. Regarding the SGs targeted at knowledge, Alyami et al. (2019) implemented the game Metaphoria to teach history taking content to medical students. Here, the game environment is inside a pyramid within a fantasy world. The final SG using a game environment that does not resemble the real working environment, within the motor skills competency type, was studied by Jalink et al. (2014). In this SG, laparoscopic skills are trained by having players perform tasks in an underground mining environment.

Table 4 The degree of authenticity of the physical context

Second, of the studies for which task authenticity could be determined, all but four included an authentic task for the professional competency targeted (Table 5). Examples of a task that was not authentic were found for all three competency types. Two SGs that targeted cognitive skills did not include an authentic task (Brown et al., 2018; Chee et al., 2019) as a result of implementing role reversals. Within these SGs, players took on a reversed role, and thus the task was not authentic to the task in the real working environment. One SG targeting knowledge did not include an authentic task (Alyami et al., 2019). In Metaphoria, the task for players is to interpret visual metaphors in relation to symptoms, whereas the target professional competency was history taking content. Finally, in the SG studied by Drummond et al. (2017), targeting motor skills, the professional competency under investigation was not represented authentically within the game, as navigation was done through point-and-click.

Table 5 The degree of task authenticity in the serious game

Unobtrusive data collection

For all three competency types, studies were found that use in-game data to make inferences about player ability (Table 6). While other studies did mention the collection of in-game behaviors, the results were limited to those that assessed the appropriateness of using the data in the assessment of competencies.

Table 6 Unobtrusive data collection

Different measures of in-game behaviors were found. First, 12 SGs determine competency by comparing player performance to some predetermined target, sometimes also translated into a score. In the game VERITAS (Veracity Education and Reactance Instruction through Technology and Applied Skills; Miller et al., 2019), for instance, players are assessed on whether they accurately judge whether the statement given by a character in the game is true or false. Second, seven SGs use time spent (i.e., completion time or playing time) as a measure of performance. For example, in the SG Wii Laparoscopy (Jalink et al., 2014), completion time is used to assess performance. This in-game performance metric showed a high correlation with performance on a validated measure of laparoscopic skills, but it should be noted that time penalties were included for mistakes made during the task. Finally, the use of log data was found in one SG targeted at cognitive skills (Steinrücke et al., 2020). In the Dilemma Game, in-game measures collected during gameplay were found to have promising relationships with competency levels.
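To illustrate the two most common types of in-game measures described above, the sketch below shows a target-based accuracy score and a penalized completion time; the penalty of 10 seconds per mistake and the example values are hypothetical and not taken from the cited studies.

```python
def target_based_score(observed, target):
    # Share of in-game responses matching a predetermined target,
    # e.g. correct true/false judgements in a VERITAS-style task
    return sum(o == t for o, t in zip(observed, target)) / len(target)

def penalized_completion_time(raw_time_s, n_mistakes, penalty_s=10.0):
    # Completion time plus a fixed penalty per mistake (lower is better);
    # the penalty size is a hypothetical illustration
    return raw_time_s + n_mistakes * penalty_s

# Hypothetical gameplay data for one candidate
print(target_based_score([True, False, True, True], [True, False, False, True]))  # 0.75
print(penalized_completion_time(raw_time_s=312.0, n_mistakes=3))                  # 342.0
```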

Adaptivity

In SGs, the difficulty level can be adapted in two ways: independent of the actions of players or dependent on the actions of players (Table 7). Whereas SGs that varied in difficulty level were found for professional competencies related to both knowledge and motor skills, none were found for professional competencies related to cognitive skills. Three SGs were found that adjusted the difficulty level based on player actions; however, none of them adjusted the difficulty level downward based on player actions. Three studies evaluated SGs where the difficulty level was varied independently of player actions. Regarding the SGs targeted at knowledge, players either received fixed assignments (Boada et al., 2015) or were able to set the difficulty level prior to gameplay (Taillandier & Adam, 2018). The SG studied by Asadipour et al. (2017), targeting motor skills, increased challenge by building up the flying speed during the game as well as through the random generation of coins, but this was independent of player ability. Two SGs targeted at knowledge did mention difficulty levels, but not how they were adjusted. The SG Metaphoria (Alyami et al., 2019) included three difficulty levels. The SG Sustainability Challenge (Dib & Adamo-Villani, 2014) became more challenging as players progressed to higher levels, but it is not clear when or how this was done.

Table 7 Adaptivity incorporated within the serious games
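As a minimal sketch of action-dependent difficulty adjustment of the kind found here, the snippet below raises the difficulty level after a streak of successful tasks; in line with the reviewed SGs it never adjusts downward, and the streak length and number of levels are hypothetical choices, not taken from any of the cited games.

```python
def adjust_difficulty(level, streak, max_level=3, streak_needed=3):
    # Raise the difficulty level after a streak of successful tasks;
    # the level is never lowered, mirroring the reviewed SGs
    if streak >= streak_needed and level < max_level:
        return level + 1
    return level

level, streak = 1, 0
for success in [True, True, True, False, True, True, True]:  # hypothetical task outcomes
    streak = streak + 1 if success else 0
    level = adjust_difficulty(level, streak)
print(level)  # ends at level 3 after two streaks of three successes
```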

Test anxiety

As described earlier, games are able to provide a more enjoyable testing experience by providing an engaging environment with a high degree of autonomy. Therefore, the ways in which the game characteristics (feedback, rules, and choices) are expressed in the studies included in the review are discussed below. To avoid confusion with the linearity of assessment, the expression freedom of gameplay is used to describe the interaction between rules and choices.

First, seven examples were found where players are given feedback unrelated to performance (Table 8). Some ways feedback was given included a dashboard (Perini et al., 2018), remaining resources (Calderón et al., 2018; Taillandier & Adam, 2018), remaining time (Calderón et al., 2018; Dankbaar et al., 2017a, 2017b; Mohan et al., 2014), or remaining tasks (Jalink et al., 2014).

Table 8 Feedback unrelated to performance given in the serious games

Second, all but two of the studies included in the review involve game mechanics that give some freedom of gameplay (Table 9). For cognitive skills and knowledge, game mechanics included the choice between multiple options (n = 14 for both), the inclusion of interactive elements (n = 8 for both), and the possibility of free exploration (n = 5 and n = 8, respectively). Two examples of customization were found: Dib and Adamo-Villani (2014) gave players the choice of avatar, whereas Alyami et al. (2019) allowed for a custom name. For the SGs that target motor skills, freedom of gameplay was given through control over the movements. For three out of four SGs in this category, special controllers were developed to give players authentic control over the movements in the game. This was not the case for Drummond et al. (2017), as their game did not explicitly train CPR; however, the researchers did assess its effect on motor skills.

Table 9 Freedom of gameplay

Discussion

Included studies

The final review included 56 studies. Of these, many reported positive results. This suggests that SGs are often successful in teaching, training, or assessing professional competencies, but it could also point to a publication bias towards positive results. As reviews similar to the current one (e.g., Connolly et al., 2012; Randel et al., 1992; Vansickle, 1986; Wouters et al., 2009) draw on similar databases, it is difficult to establish which explanation holds. Some studies found mixed results for different competency types, suggesting that different approaches are warranted. Therefore, game mechanics in SGs for different competency types are discussed separately.

The review included few studies on SGs targeting motor skills compared to those targeting cognitive skills and knowledge. The low number of SGs for motor skills could be due to the need for specialized equipment to create an SG targeting motor skills. For example, Wii Laparoscopy (Jalink et al., 2014) is played using controllers that are specifically designed for the game. Not only does this require an extra investment, it also affects the ease of large-scale implementation. There is no indication that motor skills cannot be assessed through SGs: four out of five studies have shown positive effects, both in learning effectiveness and in assessment accuracy. Despite this, the benefits may only outweigh the added costs in situations where it is unfeasible to perform the professional competency in the real working environment.

Authenticity

Focusing on game mechanics for the authenticity of the physical context and the task, the results indicate that SGs are able to provide both. It should be noted that, while SGs are able to simulate the physical context and task with high fidelity, authenticity remains a matter of perception (Gulikers et al., 2008). The review focused only on those SGs that were successful when compared to validated measures of the targeted professional competency. Since these measures are considered to be accurate proxies for workplace performance, the transfer to the real working environment is likely to have been made. For all three competency types, examples were found for SGs that did not include an authentic physical context or authentic task, while still mobilizing competencies of interest. Even though the number of SGs in these categories is quite small, it does indicate that it is possible to assess professional competencies without an authentic environment or task.

Unobtrusive data collection

The in-game measures most often used in the included SGs are those that indicate how well a player did in comparison to some standard or target. This suggests that SGs are able to elicit behavior in players that is dependent on their ability level in the target professional competency. Since the accuracy measures varied depending on the professional competency, an investigation is warranted to determine which in-game measures are indicative of ability per situation. Evidentiary frameworks such as the ECD framework can provide guidance in determining which data could be used to make inferences about candidate ability. Despite the promising results, more research should be done on the informational value of log data before claims can be made.

Adaptivity

Some examples of studies were found where the difficulty of the SG was adaptive. In particular, some promising relationships between in-game behaviors and ability level were found. In traditional (high-stakes) testing, adaptivity has already been implemented successfully (Martin & Lazendic, 2018; Straetmans & Eggen, 2007). There are, however, professional competencies for which ability levels cannot be differentiated: one is either able to perform them or not. For such competencies, adaptivity does not have an added benefit. In contrast, for professional competencies where it is possible to differentiate ability levels, adaptivity should be considered.

Feedback

Considering the appropriateness of game mechanics for high-stakes assessment, the feedback considered in the current review was limited to progress feedback. This adds a fourth type of feedback to those already recognized for assessment: knowledge of correct response, elaborated feedback, and delayed knowledge of results (van der Kleij et al., 2012). Although the small number of SGs that incorporated progress feedback affects the generalizability of this finding, it does indicate that feedback about progress may be the most appropriate solution.

Freedom of gameplay

A variety of game mechanics implemented in the SGs included in the review provide freedom of gameplay. While some studies did not elaborate on the choices given in the game, common ways players are given freedom are through choice options, interactive elements, and freedom to explore. These game mechanics were found in various studies, which raises the possibility that these findings can be generalized to new SGs targeted at assessing professional competencies. Other game mechanics related to freedom of gameplay were also found, albeit less frequently; further research should shed light on their generalizability. Moreover, the freedom of gameplay provided to the player plays a substantial role in shaping overall player experience and behavior (Kim & Shute, 2015; Kirginas & Gouscos, 2017). Therefore, future research should shed further light on whether different game mechanics influence players in different ways.

Limitations

Although the current systematic literature review provides a useful overview of the game design principles for game-based performance assessment of professional competencies, some limitations are identified.

First, the review covered a substantial number of studies from the healthcare domain. This may be because the medical field consists of many higher-order, standardized tasks that may be particularly suitable for SGs. Although the large contribution of studies from the healthcare domain could limit the generalizability to other domains, the results of this systematic review were quite uniform; no indication was found that SGs in healthcare employed different game mechanics. Moreover, there is a growing popularity of SGs in healthcare education (Wang et al., 2016), resulting in a higher number of available studies compared to other professional domains. It is advisable to regard the current results as a starting point for game design principles for game-based performance assessment. Further research into the generalizability of game design principles across professional domains is warranted.

The second limitation is true for all systematic literature reviews: it is a cross section of the literature and may not present the full picture. The inclusion of studies is dependent on what is available in the search databases, what is accessible, and what keywords are included in the literature. Likely due to this limitation, only studies published from 2006 are included in the review, while the use of SGs dates back much further (Randel et al., 1992; Vansickle, 1986). To minimize the omission of relevant literature, snowballing was conducted on the final selection of studies. This method allowed for including related and potentially relevant studies. In total, six additional publications were included through this method out of the 2,370 considered.

After snowballing, an assessment of why these additionally included studies were not found through the original search resulted in various insights. First, three studies used the term (educational) video game in their publication on SGs (Duque et al., 2008; Jalink et al., 2014; Mohan et al., 2017). Including this term in the original search would have resulted in too many hits outside the scope of the current review. Second, Moreno-Ger et al. (2010) used the term simulation to describe the application, but refer to the application as game-like. As simulations fall outside the scope of the current review, the absence of this study in the initial search cannot be attributed to a gap in the search terms. Third, the publication by Blanié et al. (2020) was probably not found due to a mismatch in search terms related to the quality measure. Additional search terms such as impact or improve could have been included. As only one additional study was found that presented this issue, it is unlikely to have had a great effect on the outcome of the review. Finally, it is unclear why the study by Fonteneau et al. (2020) was not found through the initial search, as it matched the search terms used in the current review. Perhaps this omission can be ascribed to the search databases queried.

Finally, many of the studies included in the review compare SGs to other, non-digital or digital, alternatives in terms of learning. These types of studies often include many confounding variables (Cook, 2005), because the compared interventions differ in more ways than one. These differences can affect the results in different ways: positively, negatively, or through an interaction with other features.

Suggestions for future research

Besides providing interesting insights, the current review also has implications for research. First, the review identified SGs successful in teaching, training, or assessment that did not authentically represent the physical context or task, although too few examples were found in this review to generalize this finding. Second, while some studies were found in which the difficulty of the SG was adaptive, more studies should be conducted on the implementation of adaptivity within SGs; in particular, on how in-game behavior can be used to match the difficulty level to the ability level of candidates. Third, fantasy is included in many games (Charsky, 2010; Prensky, 2001) and is regarded as one of the reasons for playing them (Boyle et al., 2016). By including fantasy elements in game-based performance assessments, assessment can become even more engaging and enjoyable, and candidates can become even less aware of being assessed. For learning, it has been suggested that fantasy should be closely connected to the learning content (Gunter et al., 2008; Malone, 1981), but further research might explore whether this holds for SGs used for the (high-stakes) assessment of professional competencies. Furthermore, while fantasy elements may blur the direct link between the SG and the professional practice, in-game behavior may still have a clear relationship with professional competencies (Kim & Shute, 2015; Simons et al., 2021). More research into the effect of authenticity on the measurement validity of SGs in assessing professional competencies is warranted.

Implications for practice

Based on the results of the review, four recommendations can be made for practice. First, regardless of the competency type: design the SG in such a way that both the task and the context are authentic. The results have shown that SGs are able to provide a representation of the physical context and task that is authentic to the professional competency under investigation. Thus, in situations where the physical context or assessment task is difficult to represent in a traditional performance assessment, SGs can provide a solution. At the same time, implementing non-authentic (fantasy) contexts and tasks should be investigated further before being applied in high-stakes performance assessment.

Second, ensure that in-game behavior within the SG is collected. This review has synthesized additional evidence for the potential of in-game behavior as a source of information about ability level. That being said, the in-game behavior that can be used to inform ability level is dependent on both the professional competency of interest and the game design. While no generalized design principles regarding the collection of gameplay data can be given, evidentiary frameworks (e.g., ECD) can be used to determine which in-game behavior can be used to infer ability level. This is ultimately connected to the implementation of adaptivity. While a limited number of SGs were found that implemented adaptivity, the potential to unobtrusively collect data about ability level underscores a missed opportunity for the wider implementation of adaptivity in SGs. Taken together with the successful implementation of adaptive testing in traditional high-stakes assessments (Martin & Lazendic, 2018; Straetmans & Eggen, 2007), a third recommendation would be to implement adaptivity where appropriate.

Finally, this review gives an overview of the game mechanics for high-stakes game-based performance assessment with little risk of affecting validity. To provide freedom of gameplay in SGs targeted at cognitive skills and knowledge, include free exploration, interactive elements, and choice options. For motor skills, giving control over movements is a perhaps straightforward game design principle. Furthermore, feedback in SGs for high-stakes performance assessments can be given through progress feedback, which differs from traditional types of feedback in education (van der Kleij et al., 2012) but has the potential to satisfy feedback as a game mechanic. These recommendations, intended for game developers, may prove useful in designing future SGs for the (high-stakes) assessment of professional competencies.