Introduction

Computational thinking (CT) is often defined as a thought process that involves using computational approaches to solve problems (Cuny et al., 2010; Shute et al., 2017; Wing, 2011). It has gained increasing attention from scholars and educators over the past decade (Hsu et al., 2018; Liu & Jeong, 2022; Madariaga et al., 2023; Turchi et al., 2019; Zhao & Shute, 2019) and is acknowledged by many as a fundamental and essential competency for everyone to learn, along with reading, writing, and arithmetic (Atmatzidou & Demetriadis, 2016; Tikva & Tambouris, 2022; Wing, 2006; Zhao & Shute, 2019). As such, more systematic actions are being taken to scale CT in K-12 settings (Grover et al., 2017; Madariaga et al., 2023).

In the past decade, various computer-based platforms have been developed to support student engagement with CT, such as block-based programming environments (e.g., Scratch, Zhang & Nouri, 2019), scalable game design (Basawapatna et al., 2010), educational robotics learning (Atmatzidou & Demetriadis, 2016; Madariaga et al., 2023), and game-based learning (Liu & Jeong, 2022; Zhao & Shute, 2019). Extensive research has been conducted to validate the ways that CT is integrated into curricula through each of these platforms.

Compared to other computer-based platforms, game-based learning (GBL) is regarded as a highly engaging platform for students to enhance CT competency because it plays a crucial role in mitigating negative attitudes toward learning CT (Sun et al., 2020). Students can become fully immersed in gameplay because GBL provides an interactive environment. Moreover, since existing conceptual frameworks define CT skills as forms of problem solving involving cognitive and behavioral processes (Grover & Pea, 2013), the inherent nature of games connects directly to problem-solving activity (Dondlinger, 2007; Liu & Jeong, 2022). Furthermore, in the context of CT, empirical evidence has shown the potential of using GBL to develop students’ cognitive and non-cognitive outcomes such as conceptual understanding, skill development, and attitude change (Hooshyar et al., 2020; Hsu et al., 2018; Liu & Jeong, 2022; Zhao & Shute, 2019) by engaging students in a more immersive game-based problem-solving experience (Israel-Fishelson & Hershkovitz, 2019).

However, depending on the definitions used, different research studies may use different methods to operationalize, observe, and measure the development of students’ CT competency in K-12 settings (e.g., Atmatzidou & Demetriadis, 2016; Liu & Jeong, 2022; Turchi et al., 2019; Zhao & Shute, 2019). Few studies establish an effective method that teachers may use in classroom activities as part of curriculum requirements (Araujo et al., 2019; Chen et al., 2017), which could help take CT to scale more systematically in K-12 settings across the United States (Acevedo-Borrega et al., 2022). Furthermore, because CT is fundamental to computer science, numerous research studies focus predominantly on computer programming concepts and skills (e.g., introductory computing concepts, Lai, 2021). However, CT skills are inclusive of, but not limited to, computer programming. Although these studies take some steps to reduce the complexities of programming-language syntax (e.g., block-based environments, Zhao & Shute, 2019), they draw more attention to learning computing concepts and skills than to developing CT competency, which students can use across disciplines. Consequently, it is vital to conduct more research on CT competency, which refers to the skills necessary for all learning fields and for solving problems systematically in daily life (Tsai et al., 2021), particularly in the field of GBL.

Therefore, the current study describes the implementation of game-based learning to develop middle school students’ CT competency, highlighting a learning process of creative problem solving with less emphasis on programming. Considering the deficiencies in prior GBL studies, namely that (a) limited attention has been paid to self-efficacy toward CT competency, (b) the relation between learning and engagement is unclear, and (c) the influence of individual differences on CT outcomes is inconclusive, we focus on the descriptive influence of this educational approach on the development of students’ CT competency, self-efficacy toward CT, and game-based engagement. Furthermore, we explore how these outcomes are moderated by students’ individual characteristics. Finally, we make recommendations for teachers’ classroom implementations as well as future scholarly work for researchers in the field of using GBL to nurture CT competency.

Literature review

Computational thinking in K-12 settings

The concept of CT is not new; it was originally introduced by Seymour Papert (1980) to represent the relation between programming and thinking skills. The essence of his argument is constructing a concrete reference for an abstract mathematical concept through computational means (Lodi & Martini, 2021). In recent years, Jeannette Wing (2006) brought CT back to public attention by characterizing it as “solving problems, designing systems and understanding human behavior that draws on concepts fundamental to computing” (p. 33). Wing’s seminal work laid the foundation of CT as thought processes that comprise problem identification and solution formulation, which can be effectively performed by using computational methods (Cuny et al., 2010). Wing’s definition (2006), as well as the extended definition proposed by Cuny et al. (2010), paved the way for CT to be viewed as a basic competency that benefits everyone in daily life, not just computer scientists (Hooshyar et al., 2020; Zhang & Nouri, 2019).

Although its importance is widely understood, there is no universal consensus on the definition of CT (Shute et al., 2017; Turchi et al., 2019; Zhang & Nouri, 2019). Correspondingly, different aspects of CT are operationalized, observed, and quantified in different research investigations, depending upon the terminology used and the environment in which teaching and learning take place. So far, there are three major categories of operational definitions in the literature on computer-based approaches to teaching and learning CT. In the first category, CT is operationalized as introductory computing concepts (i.e., concepts and skills for coding), such as basic sequence, conditional structure, and loop structure (Ayman et al., 2018; Eagle & Barnes, 2009; Esper et al., 2014). These approaches (e.g., CodeSpells, MiniColon, EleMental) attempt to simulate a culture of computer science by engaging students in completing tasks with text-based, industry-standard languages (Esper et al., 2014). For instance, in the Game2Learn project, Chaffin et al. (2009) investigated the impact of a game called EleMental on students’ learning and attitudes toward the game. When playing EleMental, students need to write code to learn recursion through depth-first search of a binary tree.

The second category highlights the paramount role of creative problem solving, with emphasis on the use of computer science across disciplines (Wing, 2006, 2011). In these studies, the operational definition of CT tends to draw on the concepts and skills fundamental to computer science (i.e., concepts and skills for programming), such as abstraction, algorithms, decomposition, testing, and debugging (Grover & Pea, 2013; Liu & Jeong, 2022; Zhang & Nouri, 2019; Zhao & Shute, 2019). The primary learning activity in these studies is solving programming-related problems by dragging and dropping various blocks (e.g., Scratch, Alice). For example, Werner et al. (2012) reported the assessment of CT in 311 middle school students using a Fairy Performance Assessment tool. They used the Alice programming environment, developed at Carnegie Mellon University, to engage middle school students in CT. Alice is a 3D gaming environment using drag-and-drop programming that is closely related to modern imperative programming languages (e.g., Java).

The third cluster still underscores that CT is a thinking process of creative problem solving but with less emphasis on programming (Barr & Stephenson, 2011). Specifically, the International Society for Technology in Education (ISTE) and the Computer Science Teachers Association (CSTA) jointly provide an operational definition of CT that is associated with, but not limited to, the following characteristics: (1) formulating problems that can be solved with the use of computing devices and tools; (2) organizing and analyzing data logically; (3) representing data abstractly; (4) automating solutions through algorithmic thinking; (5) solving problems efficiently and effectively; and (6) applying the problem-solving process to a wide range of situations and domains (ISTE & CSTA, 2011).

Based on this third operational definition, CSTA, along with industry leaders such as Amazon, Google, and Apple, designed and developed the CSTA K-12 Computer Science Standards (revised version, 2017) to reflect the competencies and skills students need to be computationally literate creators. These standards are the most widely cited by state representatives as a guide for developing their state computer science standards. In 2018, twenty states reported adopting computer science standards, and more states will likely adopt these or similar standards in the next five years. Given its widespread use among computer science teachers and researchers (Chen et al., 2017), we adopted the third definition to operationalize CT competency in the current study, developing an approach that teachers can implement in their classroom practices. Because of this direct relation to curricula through CSTA, our approach has the potential to systematically take CT to scale in K-12 classrooms across the United States.

Developing computational thinking through game-based learning

Game-based learning (GBL) in education refers to the integration of instructional content and learning objectives into the structure of digital games, which work as pedagogical tools to enhance the learning experience (Gee, 2003; Prensky, 2002). It is considered an effective platform that could provide students with experiential learning opportunities by engaging them in meaningful problem-solving processes in an interactive contextual simulation (Gee, 2003; Ke, 2016; Prensky, 2002). Many studies have provided empirical evidence to show that GBL can bring about both cognitive and non-cognitive benefits, including metacognition, knowledge acquisition, skill development, motivation for content learning, and attitude change (Gee, 2003; Ke & Clark, 2020; Pan & Ke, 2023; Pan et al., 2022; ter Vrugte & de Jong, 2017).

Multiple researchers have employed GBL to develop students’ CT, and many reported that GBL was associated with significant increases in the acquisition of knowledge, skills, and abilities corresponding to CT (Liu & Jeong, 2022; Zhao & Shute, 2019). For instance, Esper et al. (2014) recruited 55 4th grade students to play a 3D immersive video game called CodeSpells, which was designed to engage students in entry-level computing concepts using Java code. They reported that after 1 hour of gameplay per week across 8 weeks, students were able to understand and write basic Java code. In addition, Zhao and Shute (2019) designed and developed a block-based programming game, Penguin Go, to foster middle school students’ CT skills. In their study, CT skills were measured by programming skills, which included algorithmic thinking, conditional logic, and debugging. After playing the game for 110 minutes, the 43 8th graders demonstrated increases in their measured CT skills, but not in their attitudes toward computer science. Inspired by this study, Liu and Jeong (2022) investigated the effects of learning supports embedded in Penguin Go on 79 4th-6th grade students’ CT knowledge acquisition in near and far transfer tasks. The learning support was delivered as prompts in the form of worked-out examples. Though there were no significant differences in CT performance between the treatment group (i.e., the game with learning supports) and the control group (i.e., the game with no learning supports), students’ CT skills improved significantly on the near transfer test but not on the far transfer test. As a result, Liu and Jeong (2022) highlighted the importance of helping students transfer implicit knowledge into explicit knowledge in a complex GBL environment.

The ways that these educational games are designed reflect the evolution of CT competency. In the early stage, the purpose of such educational games (e.g., Wu’s Castle, CodeCombat, and CodeSpells) was to teach programming, with a particular focus on coding skills. Therefore, students were required to understand syntax to complete the game challenges. When researchers recognized a misalignment between coding and CT competency (Kazimoglu et al., 2012), they adopted block-based programming languages in place of text-based programming languages to reduce extraneous cognitive load for novice coders. These games aimed to enhance students’ problem-solving skills, which are closely associated with the fundamental skills of programming (e.g., Penguin Go), such as abstraction, algorithms, decomposition, testing, and debugging (Grover & Pea, 2013; Liu & Jeong, 2022; Zhang & Nouri, 2019; Zhao & Shute, 2019). This shift makes a low-threshold, high-ceiling experience possible for students who have no prior programming knowledge.

However, assessing only the concepts and skills related to programming may not capture the whole picture of CT development (Hooshyar et al., 2020; Zhang & Nouri, 2019). To address this gap, the current study investigated the effects of GBL on students’ CT competency, which they can use across disciplinary boundaries in accordance with curriculum requirements.

Minecraft as a game-based learning tool

Minecraft is a sandbox construction game that allows players to explore, build, and survive in a virtual landscape (Brand & Kinash, 2013; Thompson, 2016). Minecraft Education was developed as a learning platform for children by adding many features that make it more useful and appropriate in K-12 settings (Nkadimeng & Ankiewicz, 2022). Therefore, many researchers believe that Minecraft Education holds affordances for learning both concrete and abstract content (Andersen & Rustad, 2022; Nkadimeng & Ankiewicz, 2022). Specifically, Minecraft Education provides opportunities for crafting and decomposing items, which students may not experience in regular classroom practices. For instance, students can make various items through a crafting table based on the game rules (see Fig. 1a), and a material reducer (i.e., a resource recycler) breaks items down into the raw materials used to build them (see Fig. 1b).

Fig. 1 Screenshots of a Crafting Table and a Resource Recycler in Minecraft

Extant research studies show the potential uses of Minecraft in supporting education for multiple disciplines, including mathematics, history, language, science, engineering, and computer science (Al-Washmi et al., 2014; Karsenti & Bugmann, 2017; Marnewick & Chetty, 2021; Nebel et al., 2017; Nkadimeng & Ankiewicz, 2022; Zorn et al., 2013). For example, Nkadimeng and Ankiewicz (2022) explored students' perceptions of using Minecraft Education for learning the topic of atomic structure in the science coursework at a junior high school. Twenty 8th grade students participated in five 1-h gameplay sessions in Minecraft Education. The authors reported that Minecraft holds some affordances for making atomic structure less abstract for students. Their findings also indicated that students were motivated and challenged to think critically while solving game-based tasks collaboratively. Additionally, a few studies show the benefits of using Minecraft to promote students’ skills in computer programming. For instance, Zorn et al. (2013) developed a block-based programming language (i.e., CodeBlocks) to navigate a virtual robot in Minecraft. They found that participants with no prior programming skills demonstrated improved perceptions of programming after playing the game.

Although multiple research studies have explored the affordances of Minecraft for education, research that uses Minecraft to promote students’ CT competency in K-12 settings is limited. More research is needed to examine whether students’ CT competency can be enhanced through interactions in Minecraft, such as crafting and decomposing items.

Computational thinking self-efficacy

According to Bandura (1986), self-efficacy is characterized as people’s judgements of their capabilities to plan and carry out the actions necessary to achieve specific types of performances. It is widely acknowledged to be related to general academic success and to performance in learning particular abilities (Bergey et al., 2015; Pajares, 1996). Self-efficacy influences the decisions made regarding which activities to engage in, the amount of effort put forth, and perseverance in dealing with challenges to solve problems and complete tasks (Bandura, 1994). As a result, self-efficacy is normally considered an important factor that influences students’ performance in computer-based learning contexts (Moos & Azevedo, 2017).

Multiple attempts have been made to explore the relation between self-efficacy and programming competencies in the process of coding (Askar & Davenport, 2009; Chiu & Tsai, 2014; Durak, 2018; Komarraju & Nadler, 2013). For example, Durak (2018) conducted a quasi-experimental study to investigate the effects of digital story design activities in teaching programming concepts. Sixty-two 5th grade students were enrolled in this 10-week learning process. The results indicated that the programming self-efficacy of students in the experimental group changed significantly. Similarly, Ramalingam et al. (2004) investigated the effects of mental models and self-efficacy on programming performance and found that individuals’ mental models and self-efficacy significantly predicted their programming performance. In an annual STEM outreach program, Feldhausen et al. (2018) developed two distinct interventions—“Saving the Martian” and “Mighty Micro Controllers”—to foster students’ CT self-efficacy. Saving the Martian targeted 5th and 6th grade students by introducing CT through unplugged activities, while Mighty Micro Controllers focused on 7th to 9th grade students by teaching CT through programming. They employed a CT self-efficacy survey consisting of the constructs of “problem solving” and “writing computer programs” to assess gains. They found significant gains in students’ CT self-efficacy only for the “writing computer programs” construct, not for “problem solving”.

In summary, most of these past studies related self-efficacy to teaching and learning computer programming competencies; little attention has been paid to self-efficacy toward CT competency, which centers on a learning process of creative problem solving with less emphasis on programming. As such, the present study adds to the literature by examining how students’ CT self-efficacy changes after playing a game designed to build CT competency, which students can then use across different subjects.

Game-based engagement

Engagement refers to an array of mindful, goal-oriented behaviors and endeavors performed to demonstrate a learner’s participation in cognitively demanding task-oriented activities (Ke et al., 2016; Wiebe et al., 2014). It is believed that, compared to an unengaged learner, an engaged learner will demonstrate more positive attitudes toward the learning process (Meece et al., 1988), stronger persistence on mastering new knowledge and skills (Israel-Fishelson & Hershkovitz, 2019), better efficiency of completing learning tasks (Moon & Ke, 2020), as well as more active and deeper learning with subject matter (Rowe et al., 2011). Correspondingly, engagement lays the foundation for an optimal gaming experience and supports cognitive learning processes in GBL (Ke et al., 2016; Rowe et al., 2011).

When considering engagement from the vantage point of human–computer interaction, prior research studies organized individual engagement into pragmatic and hedonic qualities of the user experience (Hassenzahl et al., 2010; Wiebe et al., 2014). The pragmatic facet refers to the utility and usefulness of a computer-based system for a learning activity (Hassenzahl et al., 2010; Shneiderman, 2010). The hedonic facet is characterized as users’ enjoyment and fun while using the system. This hedonic facet is normally associated with the perceived aesthetics of the computer-based environment and the dimension of play in game-based environments (Wiebe et al., 2014). Besides the pragmatic and hedonic qualities, game flow is considered as a crucial construct that helps understand the individual experience in game-based environments and as an approach to interpreting the user engagement when they are engaged in solving game-based problems (Boyle et al., 2011; Wiebe et al., 2014). It refers to a player’s state of full immersion in the game they are playing and is based on Csikszentmihalyi’s flow theory (1975).

As it relates to GBL, engagement is typically conceptualized as an affective state, cognitive involvement, or content processing (Ke et al., 2016; Klimmt et al., 2007; Klopfer et al., 2009; Rotgans & Schmidt, 2011). Affective engagement is motivated by a need for success and the right amount of challenge (Klimmt et al., 2007). Cognitive engagement takes place when players engage in general cognitive tasks or thinking (Klopfer et al., 2009; Rotgans & Schmidt, 2011), including goal-setting, self-control, or involvement in learning. Content engagement occurs when players build up positive value systems, which motivate them to digest and remember the information, knowledge, and skills related to the subject (Ke et al., 2016). According to Ke et al. (2016), while affective and cognitive engagement are positively related to learning processes, these two types of engagement are negatively correlated with content engagement.

However, the relation between learning and engagement in game-based environments is still inconclusive (Ke et al., 2016; Rowe et al., 2011). Some research studies suggest that there is a positive relation between learning outcomes and increased engagement in GBL (Israel-Fishelson & Hershkovitz, 2019; Moon & Ke, 2020; Rowe et al., 2011). On the contrary, multiple researchers find that students may engage in completing a task (e.g., winning the game, curiosity about the narrative) in a game-based environment but not necessarily engage with the instructional content (Whitton, 2007). Consequently, some explanations have been proposed for such contradictory observations. For instance, games’ unique features, such as fantasy, narrative, reward, and sensory stimuli, may play crucial roles in making the play experience fun but have limited influence on content learning (Ke et al., 2016). Therefore, more research studies are warranted to investigate the relation between subject-matter learning and engagement in game-based contexts.

The influence of individual differences

Individual differences, such as gender, age, and prior gaming experience, are acknowledged to be moderating variables that influence students’ CT learning outcomes (Atmatzidou & Demetriadis, 2016; Durak & Saritepeci, 2018; Hill et al., 2010). Some attempts have been made to investigate the relation between individual differences and learning processes of CT concepts and skills (Atmatzidou & Demetriadis, 2016; Durak & Saritepeci, 2018; Tsai et al., 2021). Nevertheless, the existing research is inconclusive, suggesting that the influence of individual differences on CT outcomes is complex.

Some studies have shown that gender influences CT skills development (Román-González et al., 2017). For example, when Román-González and colleagues (2017) attempted to provide a new instrument for measuring CT in 5th–10th grade students, the level of CT skills differed by gender, with a low-medium effect (Cohen’s d = .31). On the contrary, some studies have demonstrated that there is no relation between CT skills and gender (Atmatzidou & Demetriadis, 2016; Tsai et al., 2021; Werner et al., 2012). In a study of educational robotics with 164 middle school students, Atmatzidou and Demetriadis (2016) reported that students eventually improved their CT skills to the same level, regardless of their age and gender. However, they suggested that girls tended to need more time to reach the same skill level as boys on the problem decomposition subscale of CT competency. Recently, Tsai et al. (2021) recruited 388 junior high school students in Taiwan to develop and validate a self-reported CT scale. They revealed that overall CT disposition did not significantly differ by gender, but that boys reported higher levels of self-efficacy than girls on the problem decomposition subscale.

The assumption that there may be a relation between grade level and CT skill level is supported by the fact that CT skills are closely related to cognitive development (Grover & Pea, 2013). Román-González and colleagues (2017) found a positive correlation between grade levels and CT skill levels. Their findings also suggested that the differences in CT skills between males and females increased with grade level. However, Durak and Saritepeci (2018) reported a negative association between age (education level) and CT skill level.

Furthermore, it is widely hypothesized that prior gaming experience plays a vital role in influencing cognitive learning processes (Habgood & Ainsworth, 2011; Nkadimeng & Ankiewicz, 2022). However, few research studies have provided strong evidence to support this proposition. On the contrary, previous research has reported only a small correlation between prior gaming experience and learning performance (Pan et al., 2023; Smith et al., 2020).

Overall, although there is rapidly increasing interest in CT teaching and learning in K-12 settings, the literature on how students’ CT learning outcomes are moderated by individual differences is still relatively sparse (Atmatzidou & Demetriadis, 2016; Tsai et al., 2021), particularly in complex GBL environments. The current study partially addresses this need by providing evidence on how students’ individual differences, particularly gender, age (grade), and prior gaming experience, are descriptively associated with the development of students’ CT competency in a GBL environment.

Purpose of the study

This initial exploratory study aims to investigate how GBL can improve students’ computational thinking competency, self-efficacy toward computational thinking, and game-based engagement. Additionally, we explore individual differences (i.e., gender, grade, and prior gaming experience) and examine how those variables influence students’ performance in the learning processes. Specifically, the current study addresses four research questions:

1. Did middle school students improve their computational thinking competency after playing the game?

2. Did middle school students improve their computational thinking self-efficacy after playing the game?

3. How did middle school students perceive their game-based engagement when solving game-based problems corresponding to computational thinking?

4. Did students’ computational thinking competency, computational thinking self-efficacy, and game-based engagement vary depending on their gender, grade, or prior gaming experience?

Methodology

Research design

To address the research questions, we adopted a one-group pretest-posttest research design to investigate the effects of a game-based learning system on students’ development of CT competency, CT self-efficacy, and game-based engagement. Furthermore, we explored how these outcomes were moderated by individual characteristics.

Participants

We employed a purposeful convenience sampling strategy (Creswell & Poth, 2016) to recruit middle school students who were interested in using games to foster their CT competency at a summer camp held at a non-profit community center in Texas in summer 2022. Among the 31 recruited participants, 16 (52%) were male and 15 (48%) were female. Most students (29, 94%) identified as Hispanic. Twelve students were in 6th grade, 9 were in 7th grade, and 10 were in 8th grade.

Intervention design: Factory Craft

In the Minecraft gaming environment, developer teams can modify the environment and mechanics of the game. These modifications are typically known as “Mods” and are given specific titles to reflect how they change gameplay. We adapted Mineplex’s Minecraft mod, called Lumberjack Tycoon, and developed a node-based planning application (i.e., Minecraft Factory Planner: MFP), both of which were integrated with a learning management system (i.e., Canvas). We integrated these three components to design and develop a game-based learning system called Factory Craft that teachers could use to monitor students’ learning (cf. Luo et al., 2021). Factory Craft is a first-person, three-dimensional, multi-level role-playing game that aims to enhance CT competency for middle school students. The content focuses on three CT concepts: Data and Analysis, Algorithms and Programming, and Impacts of Computing. The primary goal of the game is to construct objects (e.g., contraptions) efficiently and successfully to protect a hypothetical planet from the damage caused by an upcoming meteor shower. Contraptions are a new Minecraft game element that our team developed to automate certain mining and construction tasks that a player would normally need to perform manually. These contraptions help students focus on design and engineering rather than spending large amounts of time collecting and refining materials in the game. Students take the role of engineers who help build sustainable, intricate, and automated factories on a planet named Tiny Town. In the present study, we focus on three game levels in which students must fulfill orders by crafting and refining resources (see Fig. 2).

Fig. 2 Screenshots of the Learning Environment in the Selected Three Game Levels

We used the Computer Science Teachers Association (CSTA) K-12 Computer Science Standards (2017) to guide the process of designing and developing each game level. The tasks in the selected three game levels target the concept of Data and Analysis in the CSTA standards (2017) for 6th-9th grade students, which includes using encoding schemes (2-DA-07), collecting data using computational tools (2-DA-08), and building and refining computational models (2-DA-09). Table 1 displays detailed information about how we aligned game objectives, learning objectives, CSTA standards, and CT components. For example, in Lesson 2, students are required to build various models of contraptions to produce objects and explore ways to build objects more efficiently with minimal waste to address community needs. Students need to extract the essence of a complex system through data collection, analysis, pattern recognition, and modeling, which reflects abstraction, one of the CT learning processes. More specifically, students must precisely define the components, such as inputs, crafters, and outputs, needed to construct distinct contraptions. In the MFP, students drag and drop the relevant materials as inputs and the corresponding objects as outputs from the dashboard to build the contraption, which can then be used in Minecraft. Afterward, students identify the algorithms governing the contraption model automation. For instance, to build a model for a chest contraption that can produce 10 chests per hour, the contraption model should take 20 logs per hour as input to produce 80 planks, which can subsequently be used to craft the 10 chests (see Fig. 3).
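To make the contraption modeling concrete, the following is a minimal sketch, not the MFP’s actual implementation, of how a target output rate can be expanded into the hourly raw materials a contraption requires. The recipe ratios (one log yields four planks; eight planks craft one chest) follow standard Minecraft crafting rules, and the function and variable names are illustrative only.

```python
# Minimal sketch (not the MFP's code) of expanding a target output rate into
# the hourly raw-material inputs a contraption needs, given crafting recipes.

# Each recipe maps an output item to the ingredients consumed per crafted unit.
RECIPES = {
    "plank": {"log": 0.25},  # 1 log yields 4 planks, so 0.25 log per plank
    "chest": {"plank": 8},   # 8 planks craft 1 chest
}

def hourly_inputs(item, rate_per_hour, totals=None):
    """Recursively expand an output rate into the raw-material rates it requires."""
    totals = {} if totals is None else totals
    recipe = RECIPES.get(item)
    if recipe is None:  # raw material: record the required rate directly
        totals[item] = totals.get(item, 0) + rate_per_hour
        return totals
    for ingredient, qty_per_unit in recipe.items():
        hourly_inputs(ingredient, qty_per_unit * rate_per_hour, totals)
    return totals

# Example from the text: 10 chests/hour require 80 planks/hour, i.e., 20 logs/hour.
print(hourly_inputs("chest", 10))  # {'log': 20.0}
```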

Table 1 Samples of associations between game objectives, learning objectives, CSTA standards, and CT facets in the selected three game levels
Fig. 3 An Example of Building a Model of a Chest Contraption in the MFP

Procedure

The study included six 50-minute sessions as depicted in Fig. 4. In Session 1, the participants completed a demographic survey, a pretest assessment of CT competency, and a CT self-efficacy survey. In Sessions 2–5, the participants played the assigned game levels individually for 40 minutes and then completed a game-based engagement survey in approximately 5–10 minutes. In Session 6, the participants completed a posttest assessment of CT competency and the CT self-efficacy survey. The assigned game levels included a Tutorial on playing Minecraft, Lesson 1: Data Collection, Lesson 2: Data Encoding, and Lesson 3: Data Sorting. The total time spent on gameplay was approximately 160 minutes. Three researchers facilitated the gameplay for students.

Fig. 4 The Gameplay and Measurement Timeline

Instruments

A demographic questionnaire was administered to collect: (1) background information (i.e., gender, grade, ethnicity); and (2) prior gaming experience with Minecraft. To better identify participants’ level of expertise in playing Minecraft, we asked participants how long they had played Minecraft and how many hours per week they typically spent playing it. Participants with limited Minecraft experience were those who reported they had not yet played the game or had played only a little (i.e., for no more than 1 year) and no longer played it; participants with sufficient Minecraft experience were those who reported playing frequently or extensively, with at least 1 year of gameplay.
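The grouping rule can be summarized in the following sketch; the field names are illustrative, and the rule is simplified in that it does not encode the hours-per-week question used to judge frequent or extensive play.

```python
# Sketch of the prior-experience coding rule (illustrative field names only).
def minecraft_experience_group(years_played, still_plays):
    """Return 'limited' or 'sufficient' prior Minecraft experience."""
    if years_played == 0 or (years_played <= 1 and not still_plays):
        return "limited"   # never played, or played briefly and stopped
    return "sufficient"    # at least a year of ongoing or frequent play

print(minecraft_experience_group(0, False))  # limited
print(minecraft_experience_group(3, True))   # sufficient
```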

CT competency was measured using test items released from the National Assessment of Educational Progress (NAEP). The six selected items were aligned with the CSTA standards (2017) for middle school students (see Table 2). Items were formatted as multiple-choice, fill-in-the-blank, and drag-and-drop (Fig. 5). Four items had multiple sub-questions embedded within the overarching question. In total, the CT competency measure included 27 test items and sub-items. Items were scored dichotomously, and scores represent the sum of questions answered correctly, with a maximum score of 27. The internal reliability for the study test items reached satisfactory to good levels on both the pretest (Cronbach’s alpha = .77) and posttest (Cronbach’s alpha = .80).
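A brief sketch of the scoring and reliability computations is shown below; the response matrix is random placeholder data (not the study’s data), so the alpha value it prints is meaningless and is included only to illustrate the calculation.

```python
import numpy as np

# Hypothetical response matrix: 31 students x 27 dichotomously scored items.
rng = np.random.default_rng(0)
responses = rng.integers(0, 2, size=(31, 27)).astype(float)

# Each student's CT competency score is the number of correct answers (max 27).
total_scores = responses.sum(axis=1)

def cronbach_alpha(items):
    """Cronbach's alpha: k/(k-1) * (1 - sum of item variances / variance of totals)."""
    k = items.shape[1]
    item_variances = items.var(axis=0, ddof=1)
    total_variance = items.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_variances.sum() / total_variance)

print(total_scores[:5], round(cronbach_alpha(responses), 2))
```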

Table 2 Alignment between CSTA standards and test items selected from NAEP
Fig. 5 An Example of a Multiple-Choice Test Item Adopted from the Nation’s Report Card Questions Tool (NAEP)

We adopted the Computational Thinking Self-Efficacy Survey developed and validated by Weese and Feldhausen (2017), aiming to measure students’ self-efficacy in problem solving pertinent to algorithms, problem decomposition, parallelization, data, control flow, and programming. The original survey consisted of 23 items in total. We excluded items related specifically to programming (14 items). As such, the scores of the Computational Thinking Self-Efficacy Survey represent the average across 9 items (e.g., “When solving a problem, I break the problem into smaller parts.”). Students responded to each item on a five-point Likert scale, ranging from “Strongly Disagree” to “Strongly Agree”. The internal reliability of the test items achieved satisfactory levels on both the pretest (Cronbach’s alpha = .76) and posttest (Cronbach’s alpha = .78).

In addition to the self-efficacy and CT competency measures, students completed the User Engagement Scale (Wiebe et al., 2014) each day after playing the game. The survey includes 28 items across four subscales: focused attention, perceived usability, aesthetics, and satisfaction. Focused attention is based on flow theory and includes concentration, absorption, and temporal dissociation. Perceived usability focuses on affective (frustration) and cognitive (effort) aspects of gameplay. Aesthetics focuses on the game’s visual appearance. Satisfaction measures the extent to which the experience was fun and interesting. Scores represent the average across the items assigned to each subscale. Students responded to each item using a five-point Likert scale, ranging from “Strongly Disagree” to “Strongly Agree”. The internal reliability of the subscales across the repeated administrations ranged from .87 to .94, indicating good to excellent levels of internal consistency.

Data analysis

In our analysis, we compared the pretest and posttest outcome scores measured by the CT competency assessment and the Computational Thinking Self-Efficacy Survey. We used either the Wilcoxon Signed-Rank test or a paired samples t-test to compare differences (in medians or means, respectively), depending on whether the assumptions of normality and homogeneity of variances were violated. When either assumption was violated, we opted to use the Wilcoxon Signed-Rank test. We used a one-way repeated measures analysis of variance (ANOVA) to examine changes across sessions in game-based engagement based on the subscales of the User Engagement Scale. The assumption of sphericity was assessed using Mauchly’s test, and the Greenhouse–Geisser correction was applied when the assumption was violated. In addition, we performed mixed factorial repeated measures ANOVAs with Bonferroni-adjusted post hoc tests to compare group score differences with respect to gender, grade, and prior gaming experience.
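For readers who wish to reproduce a comparable workflow, the following is a condensed sketch of the test-selection logic and the repeated measures ANOVA using scipy and pingouin; the data frame, column names, and the simplified normality-only check are assumptions for illustration, not the study’s actual analysis code.

```python
import numpy as np
from scipy import stats
import pingouin as pg  # used here only to sketch the repeated measures ANOVA

def compare_pre_post(pre, post, alpha=0.05):
    """Paired t-test if paired differences look normal, else Wilcoxon signed-rank.

    Simplified stand-in for the decision rule described above (the study also
    considered homogeneity of variances before choosing the test).
    """
    diffs = np.asarray(post) - np.asarray(pre)
    if stats.shapiro(diffs).pvalue > alpha:   # normality check on the differences
        return "paired t-test", stats.ttest_rel(pre, post)
    return "Wilcoxon signed-rank", stats.wilcoxon(pre, post)

# Engagement across the four sessions: one-way repeated measures ANOVA, with a
# Greenhouse-Geisser correction reported when Mauchly's test indicates a
# sphericity violation. `df_long` is an assumed long-format frame with columns
# id, lesson, and usability.
# aov = pg.rm_anova(data=df_long, dv="usability", within="lesson",
#                   subject="id", correction=True)
```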

Results

CT competency

Table 3 displays the results of the Wilcoxon Signed-Rank test and paired samples t-test on pretest–posttest outcome scores measured by the NAEP test items. The overall difference in students’ computational thinking competency was not statistically significant, z = -1.59, p = .111. The median CT score was 17.0 in the pretest and 19.0 in the posttest. General increases were observed for students across the subgroups, but the increases were only statistically significant for male students, z = -2.05, p = .041, r = .37.

Table 3 Results on pretest–posttest measures for knowledge acquisition

CT self-efficacy

Table 4 displays the results of the Wilcoxon Signed-Rank test and paired sample t-test on pretest–posttest outcome scores for CT self-efficacy. We observed that students’ self-efficacy toward CT significantly increased after the gameplay experience, z = −2.95, p = .004, r = .41. The median CT self-efficacy score was 3.2 in the pretest and 4.1 in the posttest. There were statistically significant increases in self-efficacy for students with limited Minecraft experience, z = −2.09, p = .037, r = .45, and students entering Grade 8, z = −2.37, p = .018, r = .63. We also observed that female students increased in CT self-efficacy, t(11) = −2.53, p = .028, d = .58; whereas male students did not increase significantly, t(13) = −2.05, p = .061, d = .78.

Table 4 Results on pretest–posttest measures for CT self-efficacy

Game-based engagement

Descriptive statistics for engagement across subscales are presented in Table 5. Generally, among the four subscales, students tended to rate Aesthetics and Satisfaction with the game higher than the other two subscales.

Table 5 Means, standard deviations, and repeated measures ANOVA in game-based engagement

A one-way repeated measures ANOVA indicated that mean Focused Attention did not differ significantly across the four lessons (i.e., time points), F(3, 54) = 0.48, p = .697, η2 = .03. One-way repeated measures ANOVAs with Greenhouse–Geisser corrections indicated no significant differences across the four time points for Aesthetics, F(1.878, 33.803) = 1.58, p = .222, η2 = .08, or Satisfaction, F(1.969, 35.436) = 1.60, p = .216, η2 = .08, but a significant difference for Perceived Usability, F(2.034, 36.621) = 3.34, p = .046, η2 = .16. Follow-up Bonferroni-adjusted post hoc tests revealed a significant change in Perceived Usability between the Tutorial and Lesson 2 (p = .025). Figure 6 depicts the trends in students’ perceived usability. Overall, students reported consistently decreasing perceived usability from the Tutorial through Lesson 2, with the mean score falling from 3.8 to 2.9; the mean score then increased slightly from Lesson 2 to Lesson 3, reaching 3.0.

Fig. 6 Mean Perceived Usability Across Lessons on the Game-based Engagement Survey (n = 19)

Multiple mixed factorial repeated measures ANOVAs were performed to investigate whether there were interaction effects between the group variables (i.e., gender, grade, and prior gaming experience) and time points for the four engagement subscales. There was a significant interaction between lesson (i.e., time point) and gender for Perceived Usability, F(2.127, 36.158) = 5.47, p = .007, η2 = .24. Following up on the interaction, we found no significant overall difference between male and female groups in Perceived Usability, F(1, 17) = 0.70, p = .414, η2 = .04. However, there was a significant difference between male and female students at Lesson 1 (p = .044) and a marginally significant difference at Lesson 3 (p = .050). Figure 7 illustrates the mean Perceived Usability for male and female students across the four lessons. Female students’ mean scores decreased steadily from the Tutorial to Lesson 2 but increased from Lesson 2 to Lesson 3. Male students’ mean scores remained nearly the same from the Tutorial to Lesson 1 but then decreased from Lesson 1 to Lesson 3. We did not observe any other significant interactions between group variables and time points, nor any main effects of these variables, in the other engagement subscales.

Fig. 7 Mean Perceived Usability for Male and Female Students (n = 19)

Discussion

As an initial exploratory study aimed at gathering preliminary data to assess feasibility before potentially expanding the study, the present study adopted a one-group pretest–posttest design to investigate how game-based learning could influence middle school students’ CT competency, CT self-efficacy, and game-based engagement. Additionally, we examined how these outcomes were moderated by students’ individual differences (i.e., gender, grade, and prior gaming experience). More detailed discussions related to each research question are presented below.

RQ 1. computational thinking competency

This study found no statistically significant differences in students’ overall CT competency as measured by NAEP released items. One possible reason why the in-game learning experience did not produce better learning performance is that it was difficult for students to transfer the CT knowledge obtained in gameplay to real-world applications. This finding is consistent with previous findings that students were likely to acquire implicit knowledge in GBL environments (e.g., Liu & Jeong, 2022; ter Vrugte & de Jong, 2017). More effort should be made to assist students in transferring implicit knowledge into explicit knowledge that they can use to solve problems in the real world (Pan & Ke, 2023; ter Vrugte & de Jong, 2017; Tikva & Tambouris, 2022), especially when far transfer test items are used as a post-game assessment (Liu & Jeong, 2022). Second, it is important to note that the exposure to the intervention was likely not sufficient for students to master all the CT knowledge. Three sessions of 40 minutes per game level constituted a relatively short exposure, and longer gameplay might be required for significant gains on such a transfer task. This finding aligns with arguments made in prior research suggesting that a short intervention may not adequately capture the cognitive development of CT learning (Brennan & Resnick, 2012; Guggemos, 2021), particularly in CT practices (Nouri et al., 2020; Wohl et al., 2015).

RQ 2. computational thinking self-efficacy

The results showed that, after playing the game, students’ overall CT self-efficacy was statistically significantly higher than before the game. This is consistent with previous research reporting that GBL supports self-efficacy gains (Hu et al., 2022; Pan et al., 2022). Similarly, as other studies utilizing unplugged activities or computer-based learning tools have revealed positive self-efficacy gains (Askar & Davenport, 2009; Chiu & Tsai, 2014; Durak, 2018; Komarraju & Nadler, 2013), the study’s findings provide confirming evidence that GBL may positively impact students’ self-efficacy toward CT. Therefore, this study further contributes to the literature by adding that learning CT in a GBL environment may also positively influence students’ CT self-efficacy.

However, this finding partially contradicts the study (Feldhausen et al., 2018) from which we adopted the CT self-efficacy survey. Several factors may explain why they found significant gains in students’ CT self-efficacy for the construct of “writing computer programs” but not for the construct of “problem solving”. First, Feldhausen et al. (2018) focused more on learning CT through programming, whereas we emphasized mastering broader CT competency through game-based problem solving. Second, compared to our study, they had a relatively longer and more complex intervention design. Finally, the original survey consists of 23 items, including the constructs of “problem solving” and “writing computer programs”, whereas we excluded the “writing computer programs” items from the CT self-efficacy survey.

In summary, these findings offer empirical support for the claim of the importance of students’ self-efficacy in situations involving GBL (Moos & Azevedo, 2017). We suggest that more empirical studies should be conducted to investigate the relation between CT self-efficacy and CT attainment in complex GBL environments.

RQ 3. game-based engagement

Overall, the results suggest that students in the present study had relatively high and positive engagement during the gameplay across the four game sessions. Their perceptions of game-based engagement were consistently reflected by hedonic, pragmatic, and flow qualities of the gameplay (Hassenzahl et al., 2010; Shneiderman & Plaisant, 2010; Wiebe et al., 2014), which were measured by four subscales in the study, including focused attention, perceived usability, aesthetics, and satisfaction.

Specifically, we found that students generally perceived enjoyment and fun during gameplay because they were satisfied with the visual elements of the interface in the game world and enjoyed their playing experience in a GBL environment. This demonstrates that our participants found Minecraft’s aesthetics appealing. It also suggests that participants enjoyed using the game’s crafting, mining, sorting, and organizing features. These findings support prior studies indicating that the aesthetics of the game world and the game mechanics in Minecraft have the potential to attract students’ interest and enthusiasm, further motivating them to engage with the platform (Nkadimeng & Ankiewicz, 2022). Likewise, participants consistently reported a positive game flow experience across all four game lessons. Furthermore, the results showed that students reported statistically significant decreases in perceived usability across lessons in general. The findings suggest that students initially experienced a sense of ease and confidence upon starting the game, likely due to the tutorial’s focus on introducing fundamental game mechanics that were easily graspable. However, as the lessons progressed, their frustration or cognitive effort increased in response to the escalating level of difficulty, particularly when they needed to master domain-specific knowledge related to CT. Furthermore, between the Tutorial and Lesson 2, we noticed a considerable shift in perceived usability. These disparities could be explained by multiple factors. First, since this was the students’ first experience using the MFP, they had to learn its rules from scratch. Second, the MFP’s mechanics are distinct from those in Minecraft: students complete the assignments in the MFP by dragging and dropping blocks from the dashboard onto the canvas to create a model. Exploring the effects of integrating the underlying logic and sequence modeling behind the MFP with intelligent tutoring systems would be an interesting and promising direction for our future research.

RQ 4. individual characteristics moderating

For CT competency, a statistically significant pre–post gain was observed for boys but not for girls, with a medium effect size (r = .37). This gender difference is consistent with previous research, which revealed a statistically significant difference in CT scores in favor of the male group (Román-González et al., 2017). Regarding the finding that male students showed better gains in CT competency than female students after gameplay, one possible explanation is that students, particularly girls, need sufficient time to fully develop CT skills. This is in line with prior research (Atmatzidou & Demetriadis, 2016) reporting that girls require more time to practice CT skills to reach the same skill level as boys in educational robotics. Even so, we note that the gain for boys, while statistically significant, was an increase from a score of 16.3 to 17.9. This means that, on average, boys improved by one to two questions overall, which might not be practically meaningful despite its statistical significance.

For CT self-efficacy, our findings indicated that students’ CT self-efficacy was moderated by their gender, age, and prior Minecraft gaming experience, suggesting that CT self-efficacy is likely to be moderated by multiple individual differences. For instance, girls showed a significant positive change in overall CT self-efficacy, which contradicted our expectation. We assumed that students who outperformed on CT competency would also demonstrate higher self-efficacy toward CT, in line with self-efficacy theory (Bandura, 1986); in our study, however, boys performed better than girls on the CT competency test. One possible explanation is that our CT competency instrument assesses learning outcomes, while the CT self-efficacy survey used in the current study also captures thinking or learning processes, which are not easily observed (Tsai et al., 2021). Future work should verify whether a gap exists between these two types of assessment with respect to gender differences when both are used to measure the development of CT.

In addition, students with limited prior gaming experience in Minecraft reported higher CT self-efficacy after playing the game. One possible reason for this observation is that the inherent nature of GBL is solving problems (Dondlinger, 2007; Pan & Ke, 2023), which is a core part of CT competency (Shute et al., 2017). Thus, when learning about the game (e.g., identifying game mechanics) is integrated into the process of developing students’ CT competency (i.e., a process of problem solving), students’ self-efficacy toward CT, as evaluated by their problem-solving skills, may increase simultaneously. Furthermore, there was a statistically significant change in CT self-efficacy in favor of the 8th grade students. This finding might provide some evidence that the development of CT is associated with the cognitive development and maturity of the subjects, since CT is characterized as a problem-solving competency (Román-González et al., 2017).

When analyzing game-based engagement, we saw an intriguing difference in perceived usability between boys and girls at Lesson 1 and Lesson 3. Girls tended to report decreased perceived usability from the Tutorial to Lesson 1, whereas boys reported a similar level of usability in the Tutorial and Lesson 1, indicating that girls’ frustration or cognitive demands rose in Lesson 1. Such a phenomenon might be brought on by the absence of essential game-based CT knowledge, such as the rules for making different materials, which are important for completing the tasks. In the study, more than half of the girls (60%) had never played Minecraft or had played it only a little, compared with 44% of the boys. The complicated learning environment may be overwhelming for students with no prior gaming experience in Minecraft, even though they were taught the basics of the game mechanics in the Tutorial.

In Lesson 3, boys reported a consistently decreasing perception of usability, whereas girls reported increased perceived usability. This indicates that girls perceived less frustration or cognitive demand than boys on the game tasks in Lesson 3, which required them to organize chests by sorting a variety of materials based on similarities and differences. The relations between affective engagement, cognitive engagement, and content engagement may contribute to this phenomenon. Ke et al. (2016) contend that while affective and cognitive engagement are positively associated with learning processes, these two types of engagement are negatively correlated with content engagement, which is typically associated with domain-specific learning processes. Our findings are consistent with this proposition. We observed that in Lesson 3, male students encountered the challenging tasks of organizing chests, which relied heavily on domain-specific learning processes (e.g., pattern recognition) and in turn dampened affective and cognitive engagement. That might explain why, compared to female students, boys perceived significantly higher frustration and cognitive demands when performing the chest-organizing tasks in Lesson 3.

Limitations and future studies

Several limitations of the study should be taken into account when interpreting the results. First, our sample size was small, and we used a purposeful sampling strategy, which has inherent biases, to recruit participants who attended a summer camp at a non-profit community center. The majority of our participants were students of Hispanic descent. Thus, the results may not generalize to the student population at large or to learning that takes place in regular classrooms. Future studies can be replicated in school settings with whole classes to further examine the differences between GBL that takes place in and out of school. Second, the current study used a descriptive design without a control group. Therefore, we suggest that future studies be enhanced by increasing the duration of gameplay and conducting experimental studies with a control group to measure CT competency with large and diverse samples. Third, we acknowledge that due to resource constraints in this initial study (e.g., we did not employ an unobtrusive approach that could measure participants’ engagement without interrupting game flow), there might be issues of potential response inaccuracies, especially concerning self-report data from middle school students. We suggest utilizing unobtrusive analytics or qualitative approaches that can triangulate self-reports with behavioral metrics in future studies. In addition, we did not formally assess the construct validity of the CT competency instrument via factor analysis. Therefore, in our future work with an expanded CT instrument, we will incorporate tests of validity. Lastly, we strongly recommend that future studies focus on investigating students’ in-game actions to better understand how students’ learning processes of CT competency could be enhanced during problem solving in GBL.

Conclusion and implications

This preliminary study showed a significant impact of gameplay on students’ CT self-efficacy, but not on their CT competency or game-based engagement. The findings showed that, compared to age (grade) and prior gaming experience, gender tended to play a more important role in moderating middle school students’ CT competency, CT self-efficacy, and game-based engagement. Meanwhile, we found that boys had significantly better test performance on CT competency, whereas girls had significantly higher overall CT self-efficacy as measured by problem-solving skills. In addition, the findings indicated that students’ CT self-efficacy was also influenced by their age and prior gaming experience.

Based on the study results, we formulated some recommendations for practical classroom implementation by teachers and for future scholarly work by researchers in the field. First, we propose that GBL serves as a viable alternative to some traditional paper-and-pencil activities for enhancing middle school students’ CT competency. Specifically, we suggest that teachers leverage GBL to boost CT self-efficacy, particularly among female students. Moreover, we recommend that teachers offer additional time for female students who have not fully grasped the requisite CT concepts and skills, allowing them to re-play the same game level or play an isomorphic game level as supplementary practice both in and out of school. Additionally, when the game mechanics are complex, teachers need to take additional actions to maximize learning engagement with the subject matter rather than with the game rules and actions. This may involve providing more tutorial levels to help students become familiar with fundamental game operations. Lastly, we propose that researchers design learning supports aimed at assisting students in transferring and transforming implicit knowledge learned through gameplay into explicit knowledge.