1 Introduction

Spurred by recent technological developments, the application and specific characteristics of virtual reality (VR) have received increasing interest in practice and research (e.g., Gamberini 2000; Radianti et al. 2020; Riva 2009; Schultheis et al. 2002; Straatmann et al. 2022). It is argued that VR has a significant potential to deeply change the representation, perception, and consumption of digital content (Donalek et al. 2014; Meyer et al. 2019; Mikropoulos and Natsis 2011). In contrast to other existing information and communication technologies, VR creates an immersive and multisensory virtual experience in an artificial simulation for users due to its specific technological setup (Allcoat and Mühlenen 2018; Makransky et al. 2020; Mikropoulos and Natsis 2011).

This new kind of virtual experience provided by VR is influenced by the fact that the user can interact in VR “as if it were the real world” (Botella et al. 2017, p. 1). Specifically, VR allows two-way interaction between the user and the virtual environment as well as interaction with objects and persons in the virtual environment (Sveistrup 2004). Movements in the virtual environment can be performed in a motorically analogous way to actions in physical reality. Even though movements in VR are, in many cases, mediated by controllers, leading to less tactile and kinesthetic feedback than that gained from movements in reality, there is less mismatch than in computer variants where the “subjects' interface with the desktop […] is often not motorically analogous to the action being simulated” (Smith 2019, p. 1215). Given the large potential for analogous motoric movements in VR, users can become part of immersive naturalistic situations, exploring and actively influencing them instead of being restricted to a passive observer role (Alfalah 2018). Taking advantage of the given interaction possibilities and the creation of artificial but simultaneously naturalistic simulations, VR is attracting the attention of practitioners and researchers in the context of education, teaching, and learning (Albus et al. 2021; Alfalah 2018; Kamińska et al. 2019; Makransky and Petersen 2021; Markowitz et al. 2018; Mikropoulos and Natsis 2011; Radianti et al. 2020).

In this context, VR provides a promising learning technology for action learning as it offers the possibility to practice, self-perform, and learn relevant motor skills and specific actions. Remembering these actions plays an important role in the acquisition of behavior, particularly in the sense of procedural knowledge, and represents the first step toward ultimately performing them in the long run (Anderson 1996). Explicit recall of actions is closely linked to the memory for actions which is part of episodic memory, and hence, by implication, of declarative memory too (Zimmer and Cohen 2001). One important finding in the field of memory for action is the so-called enactment effect (Cohen 1981; Engelkamp and Krumnacker 1980; Saltz and Donnenwerth-Nolan 1981). The basic idea of the enactment effect is that enacted actions are better recalled than actions which have not been enacted (Engelkamp and Cohen 1991). Thus, learning by enactment should lead to better memory performances than, for example, observational learning (Steffens 2007).

The empirical investigation of the enactment effect in VR is still in its infancy, although VR’s potential as an embodied medium (Riva 2008) with largely analogous interaction possibilities for learning by enactment is quite high. Initial theoretical assumptions that the enactment effect should also occur in VR are made by Smith (2019) in his review of VR in episodic memory research. The present study takes up these considerations and addresses two specific research questions. As a first research question, the study aims to empirically investigate the enactment effect in VR by comparing different conditions of learning action lists in VR. In a second research question, the study deals with the tactile and kinesthetic limitations of interactions in VR, triggered by using controllers, and investigates to what extent the reduction of sensory information influences the ability to recall action lists performed in VR by contrasting learning by enactment in VR with learning by enactment in physical reality.

In summary, the study aims to make two important contributions in terms of expanding our knowledge about the functioning of VR as a new learning medium. First, to gain more insights into the currently underexplored role of interaction in the context of learning actions (Leder et al. 2019; Makransky and Petersen 2021; Wilson 2013) and especially the enactment effect in VR (Smith 2019), this study investigates the enactment effect by contrasting different learning conditions in VR based on the classical laboratory experiments dealing with this effect (Cohen 1981; Engelkamp and Krumnacker 1980; Saltz and Donnenwerth-Nolan 1981). Second, to deepen our knowledge and address the ongoing discussion about similarities and differences between VR and physical reality (Wilson 2013), this study examines memory performance of actions in VR and physical reality. Taken together, as VR is postulated as a new learning aid offering special opportunities for learning by enactment (Radianti et al. 2020), the study contributes to the investigation of the potential of VR in the training and education sector. An understanding of these aspects has important implications for researchers in the field of VR, and in particular, for educators and content creators.

1.1 Enactment effect

Every day, people perform activities and actions which they have learned during their lives. Occasionally, people need to learn new and specific actions and activities, such as driving a forklift truck or carrying out new work processes (Stülpnagel et al. 2016). In the 1980s, researchers such as Cohen (1981), Engelkamp and Krumnacker (1980), and Saltz and Donnenwerth-Nolan (1981) first began to examine the memorization of these self-performed actions. Their goal was to investigate the memory effectiveness of different conditions such as learning by enactment, observational learning, and learning by reading to draw conclusions about underlying cognitive processes involved in recalling actions. Typically, in experiments investigating the memory for action, the memory performance of the above-mentioned different conditions was compared after simple action phrases (e.g., “Bounce the ball.”) were presented to participants (e.g., Cohen 1981, 1983; Cohen et al. 1987; Cohen and Bean 1983; Engelkamp and Dehn 2000; Engelkamp and Krumnacker 1980; Engelkamp and Zimmer 1997; Steffens 2007). In particular, the earlier experiments of Engelkamp and Krumnacker (1980), Cohen (1981), and Saltz and Donnenwerth-Nolan (1981) discovered the well-known enactment effect, which, simply put, is that actions that are enacted are better recalled than those which are not (Engelkamp and Cohen 1991). Following this finding, learning by enactment is superior to other conditions, such as observational learning or learning by reading, in the context of learning short action phrases. There are different approaches to explaining the enactment effect. One of them refers to the multimodal theory of memory (Engelkamp 1998), which states that an action that is performed by the learner not only addresses the visual-imaginal and verbal-memory system, as in observational learning, but also involves the motor system. Hence, besides being semantically encoded in one system (including language and the processing of auditory material) and visually in another one (including images and the processing of visual and spatial material), as hypothesized by dual coding theory (Clark and Paivio 1991; Paivio 1990), information can also be encoded in the motor system, according to the multimodal theory. This enhanced enactment encoding in the memory for action establishes additional memory markers which should make it easier to recall an action (Engelkamp 1998).

Previous research comparing learning by enactment with learning by reading has shown that the enactment effect is robust (Cohen 1981; Engelkamp 1998). More controversial, however, is the superiority of learning by enactment over observational learning (e.g., Cohen 1981; Cohen et al. 1987; Cohen and Bean 1983; Engelkamp et al. 2003; Steffens 2007). Current findings tend to show that both forms of learning lead to similar memory performance which may be due to different efficiencies of memory organization (Steffens et al. 2015), the activation of mirror neurons during observational learning (Stamenov and Gallese 2002), or the relational processing itself (Steffens et al. 2015).

In light of these inconclusive results regarding the superiority of learning by enactment compared to observational learning, it appears even more important to investigate the extent to which the enactment effect appears in VR (Smith 2019). The relevance of the topic is reflected in the increasing use of VR for learning of actions, for which it is particularly suitable as it enables users to learn actions also in virtual situations “which are either inaccessible (in time or space) or problematic (dangerous or unethical)” (Jensen and Konradsen 2018, p. 1517). Hence, we investigate the enactment effect in VR in our first research question:

Research Question 1 (RQ1): To what extent does the enactment effect appear across different conditions for action learning in VR?

1.2 Learning by enactment: a comparison between reality and virtual reality

In the public and scientific discussion about the increasing use of VR in education and training, the question of the effect of VR on learning outcomes plays a central role (Wu et al. 2020). This question requires a closer examination of the similarities and differences of cognitive processes under conditions of VR and physical reality, respectively, to better understand the added value of VR as a new learning medium. With a focus on memory effects and brain processes, a few studies have provided first indications that cognitive processes in VR show a strong concordance with cognitive processes in reality. Kisker et al. (2019b) concluded from their investigations of the mnemonic mechanism that “[t]he encoding mechanism in VR might closely resemble real-life mnemonic processing (…)” (p. 1). Similarly, Sauzéon et al. (2012), who studied episodic memory assessments in VR, concluded that the memory processes involved are like those in physical reality. It is assumed that “VR evokes lifelike responses at both behavioral and psychophysiological […]” (Kisker et al. 2019a, p. 1), as well as at the affective level (Gorini et al. 2010), leading to the assumption that encoding effects similar to those in physical reality should be found in VR.

However, despite this assumed strong correspondence between VR and physical reality, there are also some differences especially regarding interaction in VR and real contexts (Smith 2019; Wilson 2013). The “tactile and kinaesthetic feedback which normally accompanies movement” (Wilson 2013, p. 184) can, in many cases, only be simulated in VR by using specific data gloves. As an alternative to today’s cost-intensive data gloves, users predominantly use hand controllers for interaction in VR which, however, do not enable a tactile experience via the sense of touch. Thus, although motor or sensorimotor activities can be performed with the help of controllers (e.g., grasping objects), neither the weight nor the texture of the surface of VR objects can be felt by users. This might be a significant limitation especially if learning processes contain a motor or sensorimotor component. Thus, if certain sensory information is either only conveyed via controllers or completely missing in VR, the potential impact on the processing of the action-related stimuli can lead to limitations in memory performance for actions and, ultimately, to the enactment effect being less pronounced than in physical reality.

Taken together, research studies point to the potential similarity of memory processes in the real world and VR (Kisker et al. 2019b; Wilson 2013). However, it is important to investigate whether the kinesthetic and tactile limitations of VR attenuate the memory performance resulting from enactment compared to that resulting from enactment in the real world. Hence, our second research question is:

Research Question 2 (RQ2): To what extent does the memory performance after learning by enactment in reality correspond to memory performance after learning by enactment in VR?

2 Methods

2.1 Design and participants

In the research field of memory for action, different realizations of memory experiments exist, depending on the study conceptualization and experimental design (Cohen 1981, 1983; Cohen et al. 1987; Cohen and Bean 1983; Engelkamp et al. 2003; Engelkamp and Dehn 2000; Engelkamp and Krumnacker 1980). However, all the mentioned studies follow the same basic logic, insofar as they contrast the effect of different conditions on the memory performance of simple action phrases such as “ring the bell”.

To investigate the enactment effect in VR (RQ1) and the difference between VR and physical reality concerning learning by enactment (RQ2), we conducted a VR experiment using a one-factor between-subjects design with four groups, namely learning by reading in VR, observational learning in VR, learning by enactment in VR, and learning by enactment in physical reality. While RQ1 contrasts learning by reading in VR, observational learning in VR, and learning by enactment in VR in terms of their memory performance, RQ2 compares the memory performance of learning by enactment in VR and learning by enactment in physical reality. Correctly remembered action phrases in the (a) immediate recalls and (b) final recall were measured as dependent variables. Both are considered characteristic indicators that allow conclusions to be drawn with regard to memory performance (Cohen 1981).

The experiment was carried out with N = 112 participants. All participants were screened with an anamnesis sheet for psychological and neurological disorders and for normal or corrected-to-normal vision. The participants received course credits. Based on their previous experience in the use of 3D-games/applications and in the use of VR, participants were assigned to one of the four conditions by parallelization. Due to the parallelization, the general previous experience was approximately the same in each condition. In total, most of the participants had no (38%) to rare (once a year, 29%) experiences of using 3D-games/applications, while 30% of the sample had never used VR and 51% had rare (once a year) experiences with VR. The sample size was nearly consistent across all four groups (group 1: n = 30; group 2: n = 26; group 3: n = 28; group 4: n = 28) and varied slightly due to parallelization in the group assignment. Female participants comprised 71% of the sample, and the mean age of the sample was 23.89 years (SD = 4.51). Participants of the four groups did not differ significantly with respect to gender [χ²(3) = 1.29, p > 0.05] or age [F(3,111) = 0.730, p > 0.05].

2.2 Materials

Stimuli for the experiment were two lists of 15 action phrases (Appendix 1). These action phrases were either selected and slightly adapted from a list developed by Cohen (1981) or newly formulated so that they could be implemented in both VR and physical reality. Only action phrases that consisted of at least one object (noun) and one possible interaction (verb) were selected. Action phrases such as “cross your finger” were therefore avoided, as they are difficult to perform in VR due to the use of controllers.

2.3 VR environment and setup

In all three VR conditions, participants were seated on a chair in the middle of the laboratory and equipped with an HTC Vive Pro head-mounted display which allows a 3D-360° view. Regardless of whether controllers were needed or not in the VR conditions, all participants held the HTC Vive controllers in their hands. The virtual environment itself was built in Unity 3D (Version 2018.1.4f1) and closely resembled the actual room in which the experiment was conducted, but with a reduced setup to avoid additional distraction. Specifically, existing equipment in physical reality, which was unrelated to the experiment and specifically to the learning process, was not built in VR. In physical reality, participants sat at a white table which was positioned in a way to minimize additional distractions. In the virtual situation, participants sat in front of a white desk on which a computer monitor was placed.

2.4 Procedure

Before the experiment started, the participants were informed about the general aim which was to investigate memory findings in VR, the course of the study including the announced recall memory tests, and the possibility that they could abort the study at any time. The subsequent anamnesis sheet ensured that only persons who had not taken medication, had no epilepsy or hallucinations, and were not especially susceptible to motion sickness took part in the study. Participants gave their informed written consent to participate in the study and agreed to the recording of audio files.

In all three VR conditions, participants sat virtually in front of a computer monitor which was placed on a white desk. Depending on the condition, the action phrases were then presented in different ways. In group 1, which was the learning by reading group, the action phrases appeared sequentially in written form on the computer monitor (see Fig. 1a). Group 2 watched a video on the computer monitor in VR showing another person performing the actions (observational learning; see Fig. 1b). In group 3, namely the learning by enactment group in VR, the participants performed the action phrases in VR themselves. The objects appeared on the desk in front of the participant (see Fig. 1c). To ensure that participants understood how to handle the controllers and that no technical problems occurred, a test situation was created beforehand in which the participants could familiarize themselves with the controllers (see Fig. 2). For this purpose, the participants could practice gripping and throwing objects which were different from the objects presented later in the lists. In group 4, the participants performed the action phrases in reality (learning by enactment in reality; see Fig. 1d). They sat at a real white table in the laboratory. The researchers placed the objects on the table according to the respective action phrase. In each of these four groups, an additional acoustic action instruction (e.g., “Bounce the ball.”) was played either through the integrated headphones in the HTC Vive (groups 1–3) or through loudspeakers (group 4).

Fig. 1 Experimental setup of the four groups. Note: An additional acoustic action instruction was played either through the integrated headphones in the HTC Vive (a–c) or through loudspeakers (d)

Fig. 2 Experimental procedure of the learning by enactment in VR group. Note: (1) Participant wearing HTC Vive Pro head-mounted display, (2) task description presented in VR, (3) test situation to familiarize with the handling of the controllers, (4) different action phrases, (5) call for final recall

Close to the experimental setting of one of the original experiments (Cohen 1981), an item-presentation rate of 5 s/item, a list length of 15 action phrases/list, a recall time of 90 s/set for the immediate recall, and a final recall after 10 min of distraction were chosen. The recall time was raised from the 80 s used in Cohen’s (1981) experiment to 90 s in the present experiment because the instruction time was included.

In all four experimental groups, two lists, each consisting of 15 action phrases, were presented to the participants. The first list was followed by an immediate recall (time: 90 s) in which the participants were asked to recall as many action phrases as possible. Irrespective of whether the participants needed the whole time allowed for recall or not, the second list of action phrases was presented directly after the 90 s had elapsed. The second immediate recall took place immediately after the presentation of the second list. In groups 1 to 3, it was followed by a ten-minute distraction in the form of an online questionnaire about technology acceptance and a standardized conversation with the researcher. In group 4, the participants instead played a short VR game during this time, which was unrelated to the learning task, and afterward completed the same online questionnaire. The rationale for this was that, consistent with the cover story, all participants should experience VR, so that group 4 (learning by enactment in physical reality) did not assume that they might be in a control condition, which might have added confounding effects. After the distraction, the final recall took place, in which the participants were asked to recall as many items of both lists as possible (time: 90 s).

The pre-experimental instructions informed the participants about both the immediate and final recall. To avoid interrupting the VR experience for the recall, all answers in the recall phases were recorded via recording devices.

2.5 Measures

As dependent variables, the correctly recalled action phrases in both the immediate and final condition were measured. The two immediate recalls were added up into one due to their comparability. In line with the scoring procedure used in Cohen’s (1981) study, correct partial answers such as one-word responses were also scored as correct. In both the immediate and final recall, 30 correct answers could be given. Incorrect answers were not counted. In the online questionnaire which was mainly used for distraction, the strength of cybersickness in VR was measured as a control variable using the simulator sickness questionnaire (Kennedy et al. 1993). Participants in the three VR-groups did not differ significantly concerning the strength of cybersickness symptoms [F(2,82) = 0.741, p > 0.05].
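The scoring rule described above can be illustrated with a short Python sketch. It is not the coding procedure used in the study: the matching rule (a response is credited if all of its content words occur in a target phrase that has not yet been credited), the stop-word list, and the function names are assumptions introduced here purely for illustration.

```python
import string

# Hypothetical stop-word list; not taken from the study.
STOP_WORDS = {"the", "a", "an", "your"}


def content_words(phrase):
    """Lower-case a phrase, strip punctuation, and drop stop words."""
    cleaned = phrase.lower().translate(str.maketrans("", "", string.punctuation))
    return {word for word in cleaned.split() if word not in STOP_WORDS}


def score_recall(responses, targets):
    """Count target phrases matched by at least one (possibly partial) response.

    Each target can be credited only once; incorrect responses add nothing.
    """
    remaining = [content_words(target) for target in targets]
    score = 0
    for response in responses:
        words = content_words(response)
        if not words:
            continue
        for index, target_words in enumerate(remaining):
            if words <= target_words:  # partial answers count as correct
                score += 1
                del remaining[index]
                break
    return score


def immediate_score(list1_score, list2_score):
    """Sum the two immediate recalls into one score (maximum 30)."""
    return list1_score + list2_score


targets = ["Bounce the ball.", "Ring the bell."]
responses = ["ball", "ring the bell", "open the door", "ball"]
example_score = score_recall(responses, targets)
```

In this toy example the one-word response "ball" is credited once, the repeated "ball" and the unrelated "open the door" are ignored, and the resulting score is 2.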

2.6 Analyses

The audio data were first transcribed and then coded and analyzed in IBM SPSS Statistics Version 25. To investigate the enactment effect in VR (RQ1), two separate ANOVAs were calculated for the immediate and final recall. Subsequently, planned contrasts were used to provide information about differences between the groups. To compare the role of learning by enactment in VR versus learning by enactment in physical reality on the memory performance (RQ2), two t-tests were conducted, one for the immediate and the other for the final recall.
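For readers unfamiliar with the omnibus test, the one-way between-subjects ANOVA can be sketched in a few lines of Python. The analyses in this study were run in SPSS; the function below is only a minimal illustration of the underlying computation, and the scores in `toy_scores` are invented numbers, not data from the experiment.

```python
from statistics import mean


def one_way_anova(groups):
    """Return (F, df_between, df_within) for a list of independent samples."""
    all_values = [x for group in groups for x in group]
    grand_mean = mean(all_values)
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of the individual scores around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_value = (ss_between / df_between) / (ss_within / df_within)
    return f_value, df_between, df_within


# Invented toy data for three groups (NOT the study's recall scores).
toy_scores = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
f_value, df_b, df_w = one_way_anova(toy_scores)
print(f"F({df_b}, {df_w}) = {f_value:.2f}")  # prints: F(2, 6) = 13.00
```

The F value is the ratio of the between-group mean square to the within-group mean square; the p value would additionally require the F distribution, which is omitted here to keep the sketch free of external dependencies.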

3 Results

The requirements for the ANOVA and t-test were fully met. The correctly recalled action phrases in both the immediate and final recall were normally distributed for all groups, as assessed by the Shapiro–Wilk test. Homogeneity of variances was assessed using Levene’s test, which showed that equal variances could be assumed across groups. No outliers were identified.

Regarding RQ1, there was a significant effect of conditions for action learning in VR on levels of memory performance in the immediate recall [F(3, 108) = 15.175, p < 0.001]. Planned contrasts revealed that learning by enactment in VR (MEL = 19.04, SD = 3.58) and observational learning in VR (MOL = 17.96, SD = 3.41) led to better memory performance than learning by reading in VR [MRL = 13.87, SD = 4.11, t(108) = −5.68, p < 0.001, η² = 0.278]. No significant difference between learning by enactment in VR and observational learning in VR was observed [t(108) = 1.10, p = 0.273] (Fig. 3).
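The planned contrasts can be approximated from the reported summary statistics alone. The Python sketch below is a reconstruction under stated assumptions, not the authors' SPSS procedure: it assumes that the error term is pooled over all four groups (consistent with the 108 error degrees of freedom), that the first contrast weights reading against the average of the other two VR conditions, and that the group sizes follow the order given in the participants section. The standard deviation for enactment in physical reality is the one reported for RQ2 in this section.

```python
from math import sqrt

# Immediate-recall summary statistics as reported in the text: (n, mean, SD).
groups = {
    "reading_vr": (30, 13.87, 4.11),
    "observation_vr": (26, 17.96, 3.41),
    "enactment_vr": (28, 19.04, 3.58),
    "enactment_real": (28, 19.54, 3.10),
}


def pooled_mse(stats):
    """Pooled within-group variance (ANOVA error term) and its df."""
    df_error = sum(n - 1 for n, _, _ in stats.values())
    ss_error = sum((n - 1) * sd ** 2 for n, _, sd in stats.values())
    return ss_error / df_error, df_error


def contrast_t(stats, weights):
    """t statistic for a linear contrast of group means (weights sum to 0)."""
    mse, df_error = pooled_mse(stats)
    estimate = sum(w * stats[name][1] for name, w in weights.items())
    se = sqrt(mse * sum(w ** 2 / stats[name][0] for name, w in weights.items()))
    return estimate / se, df_error


# Assumed weights: reading vs. the mean of observation and enactment in VR.
t_reading, df_reading = contrast_t(
    groups, {"reading_vr": 1, "observation_vr": -0.5, "enactment_vr": -0.5}
)
# Assumed weights: enactment in VR vs. observation in VR.
t_enact_obs, df_enact_obs = contrast_t(
    groups, {"enactment_vr": 1, "observation_vr": -1}
)
print(f"t({df_reading}) = {t_reading:.2f}, t({df_enact_obs}) = {t_enact_obs:.2f}")
```

Under these assumptions the two statistics come out at roughly −5.67 and 1.11, close to the reported t(108) = −5.68 and t(108) = 1.10; the small deviations are to be expected because the input means and standard deviations are rounded.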

Fig. 3 Memory performances in the immediate recall. Note: Number of correctly recalled action phrases per group in the immediate recall. The error bar depicts the 95% confidence interval (CI). Significant differences are marked (*p < .05)

A similar pattern of results was found in the final recall, where a significant effect of conditions for action learning in VR on levels of memory performance was revealed [F(3,108) = 14.904, p < 0.001]. Planned contrasts demonstrated that the memory performance was significantly better in the enactment group in VR (MEL = 14.54, SD = 4.13) and the observational learning group in VR (MOL = 12.69, SD = 3.3) than in the learning by reading group in VR [MRL = 9.30, SD = 3.43, t(108) = −5.44, p < 0.001, η² = 0.277]. Again, there was no significant difference between learning by enactment in VR and observational learning in VR [t(108) = 1.94, p = 0.056] (Fig. 4).

Fig. 4 Memory performances in the final recall. Note: Number of correctly recalled action phrases per group in the final recall. The error bar depicts the 95% confidence interval (CI). Significant differences are marked (*p < .05)

Regarding RQ2, which focused on the memory performance of learning by enactment in VR and in physical reality, the results show that both conditions lead to comparable memory performance. No significant differences between learning by enactment in VR (M = 19.04, SD = 3.57) and learning by enactment in reality (M = 19.54, SD = 3.1) were found for the immediate recall (t(54) = −0.559, p = 0.578, d = 0.15, 95% CI [−2.29, 1.29]). Furthermore, there was no significant difference in the final recall (t(54) = −0.074, p = 0.941, d = 0.02, 95% CI [−2.00, 1.86]) between learning by enactment in VR (M = 14.54, SD = 4.13) and learning by enactment in reality (M = 14.61, SD = 2.97).
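The RQ2 comparison can be retraced from the reported means and standard deviations with a pooled-variance (Student's) t-test. The following Python sketch is an illustration, not the original SPSS analysis; it assumes n = 28 in both enactment groups (as stated in the participants section) and equal variances, and the function name is introduced here for illustration only.

```python
from math import sqrt


def pooled_t_test(n1, m1, sd1, n2, m2, sd2):
    """Student's t, degrees of freedom, and Cohen's d from summary statistics."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    se = sqrt(pooled_var * (1 / n1 + 1 / n2))
    t = (m1 - m2) / se
    d = (m1 - m2) / sqrt(pooled_var)  # standardized mean difference
    return t, df, d


# Enactment in VR vs. enactment in physical reality, values as reported.
t_imm, df_imm, d_imm = pooled_t_test(28, 19.04, 3.57, 28, 19.54, 3.10)
t_fin, df_fin, d_fin = pooled_t_test(28, 14.54, 4.13, 28, 14.61, 2.97)
print(f"immediate: t({df_imm}) = {t_imm:.3f}, |d| = {abs(d_imm):.2f}")
print(f"final:     t({df_fin}) = {t_fin:.3f}, |d| = {abs(d_fin):.2f}")
```

With these inputs the sketch yields values that agree with the reported statistics to within rounding (about −0.56 and −0.07 for t, and 0.15 and 0.02 for the absolute effect sizes). Confidence intervals are omitted because they require the critical value of the t distribution, which the Python standard library does not provide.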

4 Discussion

VR is attracting the interest of practitioners and researchers in the field of education and learning as it promises a new learning experience in a naturalistic simulation (Alfalah 2018; Hamilton et al. 2020; Makransky and Petersen 2021; Radianti et al. 2020). Due to the technological advantages of VR, learning can become a multisensory and interactive experience, enabling learning by enactment. Learning by enactment in virtual environments can offer many opportunities (Smith 2019), especially when specific actions have to be practiced, performed, or learned. The underlying idea of learning by enactment relates to the enactment effect (Cohen 1981; Engelkamp and Cohen 1991; Engelkamp and Krumnacker 1980; Saltz and Donnenwerth-Nolan 1981). As little is known about the extent to which the enactment effect appears in VR, the present study sheds new light on this highly relevant research area.

The first research question examined whether the enactment effect occurs in VR by comparing the effect of learning by enactment, observational learning, and learning by reading on memory performance. The results show that learning by enactment in VR is superior to learning by reading in VR regarding memory performance. This finding advances existing knowledge about the enactment effect in physical reality (e.g., Cohen 1981; Engelkamp 1998; Steffens 2007) by showing similar patterns in VR. Regarding the development of procedural knowledge and, in particular, memory performance as an important prerequisite for the subsequent execution of actions (Anderson 1996), this research result also aligns well with previous VR research. Alongside a few studies that have shown that learning in VR generates better memory performance than the passive consumption of information (Allcoat and Mühlenen 2018; Chittaro and Buttussi 2015; Webster 2016), a recent meta-analysis by Wu et al. (2020) reported a stronger effect of head-mounted displays over more passive receptive lectures in terms of both knowledge and skill development. Wu et al. (2020) concluded that active learning in VR is significantly more engaging for learners than passive learning. As Leder et al. (2019) postulated, the most relevant factor for explaining these differences may lie in the interactive component. Although these studies did not explicitly investigate the enactment effect in VR, they can provide supportive indications that interactive learning in VR is beneficial in terms of memory performance. In doing so, the current results offer a possible explanation for the findings of Wu et al. (2020) and Leder et al. (2019).

Furthermore, the present results show that learning by enactment does not significantly outperform observational learning in VR. This second finding aligns with the findings of Steffens et al. (2015), who postulated that an advantage of enactment encoding compared to observational learning in the context of short action phrases depends on various factors. Specifically, different aspects of the study design (e.g., recall or recognition; between-subject vs. within-subject design; the number of study-test cycles; list length; action sequences—for an overview, see Steffens et al. 2015) were discussed as influencing factors. Additionally, previous research considered whether different efficiencies of memory organization (Steffens et al. 2015), the activation of mirror neurons during observational learning (Stamenov and Gallese 2002), or the processing of information itself (Steffens et al. 2015) influence the memory performance in the two modes of learning.

Another possible explanation for the non-significant differences between learning by enactment and observational learning could be the overload of the receptive channels, as addressed by Sweller et al. (2011) in the cognitive load theory. In the learning by enactment condition, subjects had to listen, simultaneously take in visual stimuli, and act themselves. The possible increase in cognitive load due to high interactivity may inhibit learning, resulting in worse memory performance in VR (Frederiksen et al. 2020; Makransky et al. 2020). The advantage of enactment in VR may be canceled out by an increased cognitive load. In the video condition, on the other hand, participants could visually and auditorily focus on the learning material. This may have allowed them to process information more deeply on these two channels and avoid cognitive overload, leading to similar memory performances.

Besides possible methodological aspects and the stronger impact of cognitive load during learning by enactment in VR, it is conceivable that the actions used as stimuli in the present study might have been too short and simple. Hence, it can be expected that when actions are more complex (Mikropoulos and Natsis 2011), novel (Jensen and Konradsen 2018), need to be practiced several times (Jensen and Konradsen 2018), or have a certain dynamic (Allcoat and Mühlenen 2018), the contrast between the two conditions may become greater in the sense that learning by enactment leads to higher memory performance.

To date, no conclusive explanation has been offered in the ongoing discussion of why some studies find a superiority effect of learning by enactment over observational learning, while others do not (Steffens et al. 2015). Adding to the current research, our results show that the non-significant differences between learning by enactment and observational learning (e.g., Cohen 1981; Cohen et al. 1987; Cohen and Bean 1983; Engelkamp et al. 2003; Steffens 2007) can also be found in VR. Future research could take these results as a starting point to further investigate the explanatory approaches raised by Steffens et al. (2015), the influence of interactivity in VR on memory performance with reference to cognitive load theory (Sweller et al. 2011), or the memory performance of differentially complex and dynamic tasks, as well as novel and repetition-requiring tasks. VR might be particularly suited for future research as it can provide highly controllable, yet realistic environments.

Regarding the second research question, which addresses the direct comparison of learning by enactment in VR and in physical reality, no significant difference was found. Acting in VR therefore appears to correspond closely enough to interacting in physical reality to produce similar memory results. Accordingly, even if there are differences between these two realities (Wilson 2013), the difference in receipt of sensory information does not seem to be crucial for the encoding of short action phrases. However, considering recent research results, the difference in receipt of sensory information might play a much more important role in the learning of fine motor skills, which requires tactile feedback (Allcoat and Mühlenen 2018). For example, van der Meijden and Schijven (2009) were able to show how important haptic feedback is in VR training, especially in the initial phase of acquiring psychomotor skills. It is conceivable that actions that require more fine motor skills and haptics also require more sensory information.

The non-significant difference between VR and physical reality in the memory performance for short action phrases joins existing research (Leder et al. 2019; Makransky et al. 2020; Moreno and Mayer 2002; Zhou et al. 2018) in concluding that learning in VR is as adequate as learning in reality (Wu et al. 2020). The comparably good learning effectiveness of VR and physical reality regarding the memory performance for action lists thus represents an important prerequisite for the practical use of VR in the field of education and training. In situations where real-life training is location-bound, time-dependent, dangerous, expensive, or complex (Alqahtani et al. 2017; Freina and Ott 2005; Jensen and Konradsen 2018; Mikropoulos and Natsis 2011), VR can bring additional strengths to bear.

Combining the results for both research questions, it can be concluded that, if the same patterns regarding the enactment effect appear in both physical reality (Cohen 1981) and virtual reality, similar cognitive processes can also be inferred (Smith 2019). Acting in VR could, therefore, lead to enhanced enactment encoding in a similar way as in physical reality. Even if differences regarding interaction under the conditions of VR and physical reality (such as sensorimotor processes mediated by controllers) exist, the results suggest that similar encoding processes take place when action phrases are memorized. Follow-up studies, including neuroimaging and the investigation of mnemonic brain activity, could extend the present findings.

4.1 Implications for future research

The results and their interpretation point to a variety of future research avenues embedded in the overall context of action learning in VR. Based on previous research findings (e.g., Hamilton et al. 2020; Makransky et al. 2019, 2020; Smith 2019; Wu et al. 2020) and model developments (Makransky and Petersen 2021) in the field of learning in VR, an extended input–process–output model of action learning in VR (IPO-ALVR) is proposed (see Fig. 5). This model provides an overview of different factors that might play a role in the context of action learning in VR. In particular, the proposed model extends existing model developments by including task aspects, individual characteristics, and research design questions. According to the model, influencing factors in the field of action learning in VR can be divided into task-related (e.g., complexity of actions; Mikropoulos and Natsis 2011), technology-related (e.g., level of interactivity; Leder et al. 2019), individual-related (e.g., learning stage; Wu et al. 2020), and design-related aspects (e.g., control treatments; Wu et al. 2020). These aspects have either a direct effect on the learning outcomes or an indirect effect via cognitive and affective factors (Makransky and Petersen 2021). While research provides initial support for some of the assumed relationships, the aspects and their direct or indirect (moderating or mediating) effects on each other still need more research to provide a complete picture.

Fig. 5 Input–process–output model of action learning in virtual reality (IPO-ALVR). Note: The input–process–output model is based on previous model developments (Makransky and Petersen 2021) and research findings (Hamilton et al. 2020; Makransky et al. 2019, 2020; Smith 2019; Wu et al. 2020)

Considering the present results on the enactment effect in VR and the focus on the memory performance of action lists, four further fields of research emerge. In the context of task-related aspects, the following key research question emerges: To what extent does memory performance change when the length of the actions, their complexity, and the number of actions vary? Will the contrasts between learning by enactment and observational learning become more pronounced as actions become longer, more complex, and more numerous? Or will observational learning even offer clear benefits, since increasing complexity leads to a higher cognitive load followed by reduced memory performance?

The present study did not investigate the extent to which affective factors, such as motivation or enjoyment, and cognitive factors, such as cognitive load or presence, are triggered by different conditions of learning actions (see Fig. 5). It has been shown that the opportunity to interact in VR brings more fun and enjoyment to learners than passive observation or reading (Makransky et al. 2019; Makransky and Lilleholt 2018). Positive short-term effects of interaction in VR on memory performance, compared with observation, were not found in the present study, but previous research has demonstrated that intrinsic motivation or enjoyment has long-term effects on learning (Makransky and Lilleholt 2018). Additionally, cognitive factors such as presence and cognitive load, as well as the problem of cybersickness, should be considered in future research, as it has already been shown that “participants of HMD-based immersive learning also experienced higher cognitive load and motion sickness than those of non-immersive learning” (Wu et al. 2020, p. 1993).

The present study has focused on memory performance as a cognitive learning outcome. To capture the full picture and contribute to the understanding of the use of VR in the training and education sector, future research could extend the general study design by addressing the following question: To what extent do different conditions of learning actions in VR show an influence, beyond memory performance, on procedural learning outcomes and, in particular, on the transfer of knowledge?

Finally, embedding the findings in the larger, more global learning context requires the consideration of media comparison research (Parong and Mayer 2018). The present study primarily compared conditions of learning action lists within VR and, apart from learning by enactment, did not include other conditions in physical reality. Examinations of different conditions in VR and the same conditions in physical reality (e.g., watching a video on a PC, observing someone else perform actions) could further enrich learning research (Martin et al. 2020) and provide a comprehensive understanding of the use of VR in the learning context.

4.2 Practical implications

In addition to the implications for future research, the study has valuable practical implications. First, it shows that learning in VR is not inferior to learning in reality with respect to the memory for action phrases. Accordingly, this study supports the targeted use of VR in educational settings, especially in the context of learning by enactment, as the potential of VR from a learning perspective has been demonstrated. In particular, in situations where real-life training is problematic, dangerous, expensive, complex, difficult to access, or location-bound (e.g., Alqahtani et al. 2017; Bakar et al. 2021; Freina and Ott 2005; Grabowski and Jankowski 2015; Hirt et al. 2019; Jensen and Konradsen 2018; Mikropoulos and Natsis 2011; Soós et al. 2019), VR offers additional benefits and promising possibilities to practice and learn situation-specific actions. However, the use of VR learning applications cannot be a simple yes/no consideration (Wu et al. 2020), as a variety of factors influence action learning in VR (see input–process–output model in Fig. 5).

Second, the study illustrates that costs and benefits must be evaluated when designing learning content within VR. Although both conditions (learning by enactment in VR and observational learning in VR) lead to similar memory performance, programming the former simulation, which includes the possibility of interaction in VR, is far more time- and cost-intensive during initial development (Wilson 2013). For learning content designers, this means that a simulation in VR is not always necessary to achieve certain learning goals: sometimes, a video may be sufficient.

Even though the present study shows that there are no significant differences between learning by enactment in VR and observational learning in VR, it is important to keep the advantages of learning by enactment in mind. Learning by enactment enables active learning, whereas videos allow passive learning only (Allcoat and Mühlenen 2018). This interactivity, combined with agency as supported by VR, has positive effects on learning, as learners actively take control of the pace (Makransky et al. 2020). Closely related to these positive effects is the fact that learning by enactment promotes self-determined learning, in which learners can select their own focus and are not constrained to predetermined learning paths. Moreover, people prefer active experiences to merely observing passively, and learning by enactment offers more fun and enjoyment than passive observation or reading (Makransky et al. 2019; Makransky and Lilleholt 2018).

5 Limitations

Despite the study’s theoretical and practical contributions, certain methodological limitations should be addressed in future research.

First, all data were collected within an experiment from a student sample. Both the experimental setting and the sample limit the generalizability and representativeness of the results. Future research could use more diverse samples, covering different user settings (e.g., training and educational programs) and learning stages (Wu et al. 2020), to overcome this sampling issue. Second, the experimental setup in VR was arranged for right-handed individuals, in the sense that objects for interaction were closer to the right hand. Even though most of the sample consisted of right-handed individuals (n = 104), it should be ensured in the future that left-handed individuals are not disadvantaged in VR learning applications. Third, the nature of the distraction differed between groups 1–3 and group 4 due to the need to maintain the cover story. In future research, study designs could be set up so that the same distraction is employed for all groups while the cover story is still maintained, further minimizing potential bias. Fourth, it should be kept in mind that the outcome variable measured was memory performance and not behavior. Therefore, the results can only be generalized to the context of learning to a limited extent, since memory performance does not equate to demonstrated behavior. Future research could thus also focus on behavior as a procedural outcome.

6 Conclusion

VR has been named a promising approach for learning purposes (Alfalah 2018; Kamińska et al. 2019; Markowitz et al. 2018; Mikropoulos and Natsis 2011; Radianti et al. 2020). As an interactive technology, VR empowers learners to actively engage in learning, show interactive behavior, and perform physical activities and actions (Makransky et al. 2020). If learners are offered the opportunity to learn by enactment, they become more than passive observers. By investigating the enactment effect in VR, the present study indicates that similar encoding processes occur in VR as in physical reality when individuals memorize action phrases. Furthermore, the results demonstrate that VR and physical reality have comparably good learning effectiveness in terms of memory performance for short actions, supporting the assumption that VR is effective and useful in the learning and education context, particularly with regard to the learning of actions. These findings support the strong potential of VR, particularly with regard to learning by enactment, and encourage more applications in research and educational practice. Moreover, the derived input–process–output model (IPO-ALVR) and its underlying cognitive processes offer important avenues for future research aiming at the optimization of learning by enactment in VR.