Introduction

In recent years, artificial intelligence (AI) has been widely applied in many areas including education (Luckin et al., 2016), with commensurate increases being seen in AI education (AIED) research and applications. AIED adaptive learning and evaluation applications are now being used to improve educational effectiveness and efficiency (Chassignol et al., 2018; Kurshan, 2016), evaluate teaching effect, adjust teaching and problem-solving strategies in real-time (Shute & Psotka, 1996), and provide a better understanding of student knowledge acquisition (VanLehn et al., 2007; Beal et al., 2010).

Intelligent tutoring system (ITS) research is multidisciplinary, involving AI, pedagogy, psychology, and other related disciplines (e.g., Craig et al., 2004; Graesser et al., 2012; Hu & Cooper, 2014; Luckin et al., 2016). Sleeman and Brown’s (1982) early book Intelligent Tutoring Systems brought together different research fields, with the contributors coming from AI, cognitive science, and education (Luckin et al., 2016). Luckin et al. (2016) defined ITSs as computerized learning environments that incorporated computational models from the cognitive sciences, learning sciences, computational linguistics, AI, mathematics, and other fields. Graesser, Conley, et al. (2011) and Graesser, Mcnamara, et al. (2011) commented that ITSs often incorporated pedagogical, psychological, and other cognitive learning theories into computational models. Conati (2002) noted that ITS research focused on advances in AI, cognitive science, and education to improve computer-supported education, and Ahuja and Sille (2014) described ITS research as an intersection of computer science, cognitive psychology, and educational research.

As early as the 1970s, AI systems were being used to provide individual, adaptive instruction (Conati et al., 2002a, 2002b; Luckin et al., 2016). The first ITS, SCHOLAR, was developed by Carbonell in 1970 (Dargue & Biddle, 2014). It was followed by several influential ITSs, such as BIP, designed at Stanford University in 1977 (Wescourt et al., 1977), WUMPUS, developed at MIT in 1977 (Xu et al., 2009), SOPHIE (Sleeman & Brown, 1982), DEBUGGY (Sleeman & Brown, 1982), and AutoTutor (Graesser et al., 2005). The teaching effectiveness of these ITSs has been found to nearly parallel that of human teachers (VanLehn, 2011) and to enhance the performance of both teachers and students (Spector et al., 2014). ITSs are therefore regarded as a driving force for the future of human education (Luckin et al., 2016).

Conceptually, there is no clear boundary between ITSs and related concepts such as computer-assisted learning (CAL), computer-assisted training (CAT), and computer-assisted instruction (CAI) (Sleeman & Brown, 1982; VanLehn, 2006; Anderson et al., 1990). However, Conati (2009) claimed that the key difference between ITSs and CAI was that the solutions provided by ITSs were generated in real-time from student input, rather than having to be predefined, and Wenger (1987) and VanLehn (2011) commented that while CAI, CAL, and CAT were based on responses, ITSs were based on steps. In this study, CAI-related concepts have been excluded.

ITS research has been multidisciplinary, with computer science and computer technologies as the key research associations. The core of the research has been the development of student models (Desmarais & Baker, 2012), which digitize student abilities and give students access to personalized instruction (Conati & Kardan, 2013) matched to their aptitude (Graesser et al., 2012; Hu & Cooper, 2014; Vandewaetere et al., 2011). The benefits of these ITS student models and other components are that they promote personalized learning, provide real-time learning analysis, use self-adaptive content, and designate targeted practice (Liu, 2003). Corbett and Anderson (1994) first proposed a knowledge tracing (KT) model based on a hidden Markov model, which identifies changing cognitive states during knowledge acquisition by analyzing student data and predicts future performance.

Another crucial research field has been ITS applications (D’Mello et al., 2007) such as AutoTutor, including comparative studies that assessed effectiveness by comparing ITS learning effects with those of other forms of instruction (VanLehn et al., 2007). Some studies found AutoTutor to be as effective as a human tutor in computer-mediated conversations (Graesser et al., 2003; Person et al., 2001), and others identified the specific factors related to the gains in deep-level comprehension (Graesser et al., 2009; Baker et al., 2010). These early ITS studies attracted greater research interest and led to the development of other ITS applications, such as Coh-Metrix and Cognitive Tutor (Graesser et al., 2014; Graesser, Conley, et al., 2011; Graesser, Mcnamara, et al., 2011; McNamara et al., 2010, 2014; Pane et al., 2014).

Scholars in psychology provided theoretical tools that were of great significance to this kind of research (Arroyo et al., 2009; D’Mello et al., 2007). For example, Kort et al. (2002) proposed a comprehensive four-quadrant model that explicitly linked learning and affective states, and Anderson proposed the adaptive control of thought (ACT) cognitive theory (Anderson, 1980, 1983), which became the theoretical basis for the popular Cognitive Tutor ITS. Others have used ITSs to identify the different influences that different affective and cognitive states have on learning effects (Craig et al., 2004; Koedinger & Corbett, 2006; Woolf, 2009; Arroyo et al., 2009; Lester et al., 2013; D’Mello & Graesser, 2011).

Pedagogical researchers have focused on designing teaching strategies to achieve better teaching effects (Kolodner, 2002; Luckin et al., 2016; VanLehn, 2006). For example, Graesser et al. (2005) reviewed the pedagogical strategies that embedded constructivist approaches in ITS instruction, finding that the learning effect was negatively correlated with boredom and positively correlated with flow, and also explored the relationships between the emotional state and the learning process. Other pedagogical aspects have been examined to improve the effectiveness and efficiency of education, such as game-based learning strategies (Lester et al., 2013; Tsay et al., 2018; Santhanam et al., 2016), collaborative learning environments (Chi et al., 2001), and intelligent narrative technologies (McCoy et al., 2011; Yu & Riedl, 2012).

Given all this interest over several decades, there have been a number of previous ITS research reviews, such as systematic reviews focused on ITS composition, current research foci, and current trends (Almasri et al., 2019; Akbulut & Cardak, 2012; Baker et al., 2008; Schmidhuber, 2015), and reviews focused on technology/evaluation methods in different ITSs (Desmarais & Baker, 2012; Elham et al., 2018; Conati, 2002a, 2002b). In a recent review, Elham et al. (2018) concluded that the most frequent AI techniques applied to ITSs were action condition rule-based, Bayesian networks, and data mining. Other reviews have given suggestions on specific teaching mechanisms, such as strategic decision making based on student emotions (Sharma et al., 2014; Conati, 2002). The most promising ITS research trends have been identified as portable devices (Elham et al., 2018; Ahuja & Sille, 2014) and collaborative learning (Isotami & Misogici, 2008), with others redefining the purpose of using ITSs as learning devices that are unable to be effective without human guidance rather than systems to improve effectiveness and achievement rates (VanLehn, 2011; Woolf, Lane, et al., 2013; Woolf, Chad Lane, et al., 2013). Another common ITS research review type has been meta-analyses based on quantitative systematic reviews, most of which have compared the learning effects of different ITS systems, the teaching methods, or the learning conditions (VanLehn, 2011; Graham et al., 2015; Steenbergen-Hu & Cooper, 2014; Ma et al., 2014; Kulik, 2015).

The review conducted in this paper employed a scientometric method, which uses bibliometrics such as citation analyses to evaluate the scientific research activities that guide science policies (Egghe, 2005). Nalimov and Mulchenko (1971) coined the term “scientometrics” in the 1960s to describe the growth, structure, interrelationships, and productivity of scientific research (Hood & Wilson, 2001). Scientometrics has since been primarily employed to analyze research literature based on attributes of the research itself, such as the number of publications, keywords, or other dynamic indicators such as citation information. Compared with meta-analyses or systematic reviews, which usually require detailed coding or the weighting of research content based on human judgment, the scientometric method automatically calculates and presents the information based on the publication attributes; that is, it is an analysis method based on data and algorithms.

Scientometrics can provide a quantitative and systematic review of all aspects of ITS research, such as the publication and disciplinary states, the particular research issues, the intellectual structures, and the emerging trends. While previous research has discussed the multidisciplinary composition of ITS research, it has not provided a discipline construction path (Craig et al., 2004; Koedinger et al., 2012; Woolf, 2009). Therefore, this review will reveal the path of the multidisciplinary integration of ITSs and the proportions of the composition. The proportions that make up ITS research and the path of its formation are essential for understanding the characteristics of ITSs as a subdiscipline and for identifying its discipline formation stage. Only by analyzing the characteristics and composition of the subdiscipline, the evolution of the knowledge sources, and the intellectual base can researchers be better guided concerning the research methods or skills they need to master. Further, clarifying the current stage of the subdiscipline can help researchers identify more promising research directions and adopt more suitable strategies.

Therefore, based on the following research questions (RQs), this research aimed to reveal the history, current status, and trends in ITS research from a scientometric and multidisciplinary perspective:

  1. RQ1: What is the current status of ITS research? What are the main contributing countries/regions and the major journals, authors, and institutions?

  2. RQ2: What are the ITS knowledge sources and how have they developed?

  3. RQ3: What have been the most popular ITS research foci?

  4. RQ4: What are the chronological ITS research stages and the intellectual bases in each of these stages?

  5. RQ5: What are the emerging ITS research trends?

Data and methods

Data collection

The relevant research article data for this study were extracted from the Science Citation Index Expanded (SCIE) and the Social Science Citation Index (SSCI) in the Web of Science (WoS) Core Collection database from 1963 to 2020. Due to deficiencies in the originality, completeness, and impact of review articles, conference papers, proceedings, and other document types, only research articles were included in this study. Review articles were excluded because they were not original research and their inclusion would possibly lead to over-citation and overrepresentation, especially for highly cited papers (Miranda & Garcia-Carpintero, 2018), which could have affected the veracity of the results.

The 1173 bibliographic records (research articles) and 12,992 associated references were collected on May 21, 2020, using the advanced search service offered by the WoS Core Collection. A query formula that included field tags, Boolean operators, parentheses, and query sets was created to retrieve the desired literature, as follows: TS = (“intelligent tutoring systems” OR “Intelligent Computer-Aided Instruction” OR “Intelligent Computer-Assisted Instruction” OR “Artificial Intelligence in Education” OR “adaptive educational system” OR “adaptive learning systems” OR “constraint-based tutors” OR “Cognitive Tutor” OR “AutoTutor” OR “SQL-tutor” OR “assistments” OR “elm-art” OR “iweaver” OR “DeepTutor” OR “Coh-Metrix” OR “Electronix Tutor” OR “Student modeling” OR “Knowledge Tracing”). This retrieval formula included concepts, theories and methods, and applications to ensure precision in the ITS research retrieval results.
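The query formula can also be assembled programmatically, which makes the search easier to reproduce or extend. The following is a minimal Python sketch: the term list is copied from the formula above, while the helper name `build_ts_query` is hypothetical and the output is only a string to paste into the WoS advanced search, not a call to any WoS interface.

```python
# Search terms copied from the query formula in the text.
TERMS = [
    "intelligent tutoring systems",
    "Intelligent Computer-Aided Instruction",
    "Intelligent Computer-Assisted Instruction",
    "Artificial Intelligence in Education",
    "adaptive educational system",
    "adaptive learning systems",
    "constraint-based tutors",
    "Cognitive Tutor",
    "AutoTutor",
    "SQL-tutor",
    "assistments",
    "elm-art",
    "iweaver",
    "DeepTutor",
    "Coh-Metrix",
    "Electronix Tutor",
    "Student modeling",
    "Knowledge Tracing",
]


def build_ts_query(terms):
    """Join quoted phrases with OR inside a topic (TS) field tag."""
    quoted = ['"{}"'.format(term) for term in terms]
    return "TS = (" + " OR ".join(quoted) + ")"


query = build_ts_query(TERMS)
print(query)
```

Keeping the terms in a list makes it straightforward to add or drop an application name and regenerate the formula without mismatching quotes or parentheses.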

Data analysis

After the data collection, several descriptive statistical analyses were performed to identify the WoS categories, the specific country/region, journal, author, institutional, research area publication numbers, and the ITS research field citation distribution. To explore the information behind the publications, a series of co-occurrence analyses (co-word and co-citations) were also conducted.

Co-occurrence (co-word) analyses were conducted on the keywords. The keyword network in scientific literature can be used to explore the correlations between different research studies to reveal the potential research issues and the intellectual structure in a certain research field (Chen et al., 2019). This method has been widely used in bibliometric studies (Wu et al., 2009) and was also employed to construct and view the bibliometric maps.
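As an illustration of what a co-word analysis counts, the sketch below tallies how often each pair of keywords appears in the same article. It is a simplified Python example under stated assumptions: the function name `keyword_cooccurrence` and the three keyword lists are hypothetical, and the bibliometric tools used in the study apply further normalization and layout steps that are not reproduced here.

```python
from collections import Counter
from itertools import combinations


def keyword_cooccurrence(keyword_lists):
    """Count how often each unordered keyword pair shares an article."""
    pair_counts = Counter()
    for keywords in keyword_lists:
        # Normalize case and drop duplicates within one article.
        unique = sorted({k.strip().lower() for k in keywords})
        for pair in combinations(unique, 2):
            pair_counts[pair] += 1
    return pair_counts


# Hypothetical author keywords for three articles (illustrative only).
articles = [
    ["Student modeling", "Knowledge tracing", "Bayesian networks"],
    ["Knowledge tracing", "Student modeling"],
    ["AutoTutor", "Student modeling"],
]

counts = keyword_cooccurrence(articles)
for (first, second), n in counts.most_common(3):
    print(first, "|", second, "|", n)
```

The resulting pair counts are the edge weights of a keyword network, in which frequently co-occurring keywords end up closely linked.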

This paper harnessed the strengths of each of the above-mentioned tools to enhance the presentation of the study results. Table 1 shows the corresponding relationships between the research questions, data analysis methods, tools, and research content.

Table 1 Relationship between research questions, methodology, and analysis tools

Results and discussion

Publication analysis

RQ 1: Three-stage publication growth

Figure 1 shows the number of publications in each year. A total of 1173 articles (publications) were collected. The earliest paper appeared in 1963 (Cavanagh, 1963), but besides this paper, few relevant papers were published in the 1960s and 1970s. However, the general ITS technological development during the early years was driven by AI technology.

Fig. 1 ITS topic papers collected from SCIE and SSCI, 1963–2020. Note. The top edge of the yellow rectangle represents the mean number of publications at each stage

More ITS-related publications appeared in the 1980s, and since then the number of publications has kept growing, with fluctuations. Overall, ITS-related publications grew in three stages. During the first stage (1985–1998), the number of ITS studies rose from 7 in 1989 to 32 in 1998, with a mean of 14. In the second stage (1999–2006), after a brief decline in 1999, the number of publications increased to a peak in 2006, with a mean of 32.6. A sharp decline followed in 2007, before a rebound in 2008. In the third stage (2007–2019), the mean was 52.4, and the number of publications reached an all-time peak of 77 in 2013. The mean numbers of publications in the three stages (14, 32.6, and 52.4, respectively) clearly show the three-stage growth pattern.

There were several noteworthy AI technology events that increased public interest in AI technology and led to technological breakthroughs. As the public attention began to attract research interest, there was a significant increase in the number of publications.

Lighthill’s report (1973) revealed that there was a gap between AI technological expectations and reality, which resulted in a reduction in funding in some countries for the first ebb since the Dartmouth Artificial Intelligence Conference in 1956. In 1986, the development of the back propagation (BP) neural network algorithm (Rumelhart et al., 1986) incited new interest in ITS research. In 1997, Deep Blue defeated the world chess champion, which was a milestone in AI development and awakened public and research interest in related ITS applications, with the number of publications climbing to 37 in 1998. However, a year later, only 11 papers were published as chess robots were not seen as directly relevant to ITSs. In 2006, Hinton’s deep belief network (Hinton et al., 2006), which was a breakthrough in neural network algorithms, spurred renewed interest, with the number of publications increasing to 59, and in 2013, the deep learning algorithm that finally achieved speech and visual recognition success, resulted in the era of perceptual intelligence (Li, 2018), which resulted in publications reaching an all-time peak.

These landmark events each triggered ITS publications, which gradually led to the fluctuating growth in the number of publications over the years. The relationship between AI technology and ITSs reflected an obvious spillover effect, as cutting-edge AI theory and technological research started to lead to ITS education application developments. As AI research increased, interest in ITSs increased, with each research field influencing the other, as reflected in the fluctuating number of publications. From the scientometric view, the stepped and fluctuating growth in ITS research (Fig. 1) did not conform to the Price literature exponential growth curve (Tague et al., 1981), which indicated that ITS research was still in its initial discipline formation stage.

Since the neural network algorithmic breakthroughs in 2006, algorithms such as Deep Learning, CNN (Convolutional Neural Networks), and GAN (Generative Adversarial Networks) have gradually become the most exciting AI areas. In recent years, CNN, a deep learning algorithm that specializes in recognizing images, facial expressions, voice, and even text, has begun to be more widely applied to education through ITS research on optimized model designs and parameter values (Liu et al., 2020; Conati & Maclaren, 2009; Craig et al., 2004) and through computer perception devices that automatically monitor student emotions while they are learning (Aleven et al., 2006; Graesser & D’Mello, 2012).

In terms of theoretical constructs in psychology, Matz (1981) developed one of the first detailed psychological models to explore why students had certain misconceptions while using ITSs, which provided a cornerstone for building flexible diagnostic systems, and Du Boulay et al. (2007) proposed meta-cognitive scaffolding to increase learner motivation and engagement. As early as 1993, the ACT-R theory, which integrated cognitive science and learning theory, was proposed; it subsequently became a theoretical framework to guide system design (Anderson et al., 1995) for ITSs such as the Cognitive Tutor developed by Carnegie.

Research issues

RQ 3: Co-occurrence networks of keywords

Because keywords are refined research content, popular ITS research foci can be identified using keyword co-occurrence analysis (“Mapping of science by combined co-citation and word analysis. Structural aspects,” Journal of the American Society for Information Science, 42(4), 233–266, 1991). Co-citation analyses identify the consistency of the concepts and references and their associations with the research field (Anwar et al., 2019).

In this section, the intellectual structure evolution is demonstrated through a visualization of the intellectual base and the research fronts (Chen & Guan, 2011). The co-citation analysis was based on the 12,992 references in the 1173 sample publications.
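To make the notion of a co-citation link concrete, the following Python sketch counts how often two references appear together in the reference lists of citing articles and normalizes each count by the cosine measure, c_ij / sqrt(c_i * c_j). This is a sketch under assumptions: the cosine normalization is one common choice and is not stated in the text as the setting used here, and the function name `cocitation_strengths` and the reference identifiers are hypothetical.

```python
from collections import Counter
from itertools import combinations
from math import sqrt


def cocitation_strengths(reference_lists):
    """Return the cosine-normalized co-citation strength of each reference pair."""
    cited = Counter()      # number of citing articles per reference
    together = Counter()   # number of citing articles per reference pair
    for refs in reference_lists:
        unique = sorted(set(refs))
        cited.update(unique)
        together.update(combinations(unique, 2))
    return {
        (a, b): n / sqrt(cited[a] * cited[b])
        for (a, b), n in together.items()
    }


# Hypothetical reference lists of three citing articles (illustrative only).
citing = [
    ["Anderson1995", "Corbett1994", "VanLehn2011"],
    ["Anderson1995", "Corbett1994"],
    ["Anderson1995", "VanLehn2011"],
]

strengths = cocitation_strengths(citing)
for pair, weight in sorted(strengths.items()):
    print(pair, round(weight, 3))
```

Each pair with a nonzero strength becomes a link in the co-citation network, and the references themselves are the nodes.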

RQ 4: Chronological research stages

The co-cited references and their relationships are the nodes and links in the co-citation network, which are clustered based on the co-citation associations (Madani & Weber, 2016). In CiteSpace, the co-citation network clusters are labeled by the Log-Likelihood Ratio (LLR) algorithm (Chen et al., 2010), which reveals the main research specialties of the nodes in each cluster (Chen, 2017; Hu, 2017). Each cluster has a specific ID, such as #0 or #1, as shown in Fig. 6. The ID numbers are in descending order based on the size of the cluster (Li & Chen, 2016), and the color bar below Fig. 6 changes from gray to red to represent the transition from 1963 to 2020.

Fig. 6 Co-cited reference clusters with cluster ID

The silhouette score in Table 4 is an indicator of cluster homogeneity; when the silhouette score is greater than 0.7, the clustering results are considered highly reliable (Li & Chen, 2016). The Mean Year is the average publishing year of the literature in the cluster, which is used to detect the ITS intellectual base evolution. The top terms are detected by the LLR algorithm from the keywords in the co-cited references. The top terms in Table 4 are the first two terms in each cluster that were statistically significant.
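For readers unfamiliar with the silhouette score, the sketch below computes it from its standard definition, s(i) = (b - a) / max(a, b), where a is the mean distance from item i to the other members of its cluster and b is the smallest mean distance to the members of any other cluster. It is an illustrative Python example only: it assumes items are points with Euclidean distances and at least two clusters, whereas CiteSpace derives the score from the co-citation network, and the function name and sample points are hypothetical.

```python
from math import dist


def silhouette_scores(points, labels):
    """Return the silhouette value s(i) = (b - a) / max(a, b) for each point."""
    clusters = {}
    for idx, label in enumerate(labels):
        clusters.setdefault(label, []).append(idx)

    scores = []
    for i, label in enumerate(labels):
        own = [j for j in clusters[label] if j != i]
        if not own:
            scores.append(0.0)  # convention for single-member clusters
            continue
        # a: mean distance to the other members of the same cluster
        a = sum(dist(points[i], points[j]) for j in own) / len(own)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(dist(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != label
        )
        scores.append((b - a) / max(a, b))
    return scores


# Hypothetical 2-D points forming two well-separated clusters.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
labels = [0, 0, 1, 1]
scores = silhouette_scores(points, labels)
mean_score = sum(scores) / len(scores)
print(round(mean_score, 3))
```

Values near 1 mean that items sit much closer to their own cluster than to any other, which is why a threshold such as 0.7 is read as a sign of homogeneous clusters.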

Table 4 Information of the co-cited references clusters

From the cluster evolution shown in Fig. 6, the co-cited reference clusters were divided into three stages in Table 4, which are shown in different background colors.

Clusters #7, #11, #9 and #4 constituted the early stage (mean years 1986–1992). Using the LLR tag semantic analysis, it can be seen that the early research was mainly focused on the ITS applications associated with knowledge representation and CAI, such as adaptive systems and business management gaming simulations, and the development of cutting-edge computer technologies, such as unsupervised learning and neural networks. This early stage research indicated that ITS research originated from an education base and was established through the associations of education and computer technologies, with the core themes being related to how the education and computing technologies could be combined.

Clusters #5, #8, #4, #6, #23, #13, #10, #18, #16 and #0 constituted the second stage (mean years 1997–2006), with the research themes being an extension of the earlier stage. Due to the spillover effect of the continuing technological progress, many ITS applications such as AutoTutor and Cognitive Tutor began to emerge. Consequently, the research directions began to focus on the integration of pedagogy, cognitive psychology, and linguistics theories into ITS applications (Anderson et al., 1990; Johnson & Richel, 2000). The 2006 publications were related to AI technology breakthroughs (Hinton et al., 2006) and the application of deep learning technologies to ITSs (Conati, 2009; Desmarais & Naceur, 2013; Koedinger et al., 2012). Research also strengthened and deepened in various subfields, as shown in the several co-citation reference cluster branches, such as #0 human–computer interactions, #5 probabilistic models, #8 computer-mediated communication, #10 Cognitive tutors, #18 meta-cognitive skills, #23 semantic web-based educational systems, #13 latent semantic analyses, #4 STEM learning, #16 data mining, and #6 computer-supported collaborative learning. Of these, human–computer interactions was the largest research cluster in this stage.

The appearance of these multiple research branches in the second stage was directly related to AI technological breakthroughs, which opened up the field to different types of ITSs. For example, the progress made in the development of probability models improved the construction of student models, and the multidisciplinary pedagogy, cognitive psychology, and linguistics theory developments deepened the ITS application research possibilities.

Two research branches appeared as the intellectual bases in the third stage (#1, #3, #14, #15, #2). Cluster #1 was associated with developments in computational linguistics and was an extension of #6, #23, and semantic analysis. The references in cluster #1 were mainly associated with ITS text analysis applications such as Coh-Metrix. The second branch (#3, #14, #15, #2) was an extension associated with the combination of computer technology and pedagogy in ITSs, such as problem-centered instruction and STEM. Some of the themes that appeared in clusters #4 and #5 reappeared in clusters #2, #3, and #14, which suggested that STEM learning and model construction were common ITS research concerns. The two research branches in the third stage, therefore, appeared to be related to computer technology spillover effects, as they represented progress in areas such as the associations between NLP (Natural Language Processing) and social science and ITS text analysis applications such as Coh-Metrix; that is, the research in this stage deepened the research directions identified in the previous stages.

The scientometric analyses of the ITS research revealed that the field was in the multidisciplinary stage of discipline formation described in Shneider’s theory (2009). Shneider proposed that discipline formation passes through four stages: the initial conceptualization stage, the multidisciplinary stage, the expansion stage, and the final stage of decay (Shneider, 2009). At this stage, ITS researchers were applying methods from other disciplines to deepen the application-oriented research. Several studies (Choi & Clin, 2006; Garnder, 1987; Núñez et al., 2019) claimed that multidisciplinary science was a weaker version of interdisciplinary research, which creates its own theoretical, conceptual, and methodological identity (Núñez et al., 2019). However, it could also be argued that ITS research is currently multidisciplinary as specific ITS communication systems or specific theories have not yet been developed. Overall, the main characteristics of the studies in this period were originality and creativity, which tended to indicate that to extend effectively into new research areas, researchers needed to tolerate high risk when choosing their tasks (Shneider, 2009).

RQ 4: Significant references in the co-citation networks

Frequently co-cited references imply advanced ideas and developments in a given research field (Anwar et al., 2019). Table 5 shows the top 10 most frequently co-cited references and gives a brief description of the articles. Articles with high betweenness centrality scores generally indicate a fundamental transition in the research knowledge domain paradigm (Chen & Guan, 2011) and have a significant influence on the co-citation structure (Li & Chen, 2016). In CiteSpace, betweenness centrality is identified algorithmically and represented using magenta circles, which allows researchers to understand the changes in a domain’s knowledge structures over time (Chen, 2005).

Table 5 Top 15 co-cited references

The article with the highest betweenness centrality (centrality = 0.32) was “Cognitive tutors: Lessons learned” (Anderson et al., 1995), which presented several empirical studies on ITS learning effects, the development of procedures when cognitive models were incorporated into tutoring systems, and the development of individual instruction. The article also reviewed the use of KT and Bayesian technologies to assess the probabilities that students had learned the principles in the cognitive model. This research article belonged to cluster #5 (probabilistic models, adaptive testing) at the connection point of #9, #5, and #8 in Fig. 6. Therefore, this study was a notable junction in the intellectual base, marking the transition from single-chain development to multiple research subfields.
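The idea behind betweenness centrality can be sketched in a few lines: a node scores highly when many shortest paths between other nodes pass through it. The Python example below implements Brandes' algorithm for an undirected, unweighted graph and normalizes by the number of node pairs, so scores lie between 0 and 1. It is a sketch under assumptions: the toy network and node names are hypothetical, and the exact variant and normalization used by CiteSpace are not given in the text.

```python
from collections import deque


def betweenness_centrality(adjacency):
    """Normalized betweenness for an undirected, unweighted graph (Brandes)."""
    nodes = list(adjacency)
    centrality = {v: 0.0 for v in nodes}
    for source in nodes:
        # Breadth-first search that counts shortest paths from the source.
        order = []
        predecessors = {v: [] for v in nodes}
        paths = {v: 0 for v in nodes}
        distance = {v: -1 for v in nodes}
        paths[source] = 1
        distance[source] = 0
        queue = deque([source])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adjacency[v]:
                if distance[w] < 0:
                    distance[w] = distance[v] + 1
                    queue.append(w)
                if distance[w] == distance[v] + 1:
                    paths[w] += paths[v]
                    predecessors[w].append(v)
        # Back-propagate pair dependencies, farthest nodes first.
        dependency = {v: 0.0 for v in nodes}
        for w in reversed(order):
            for v in predecessors[w]:
                dependency[v] += paths[v] / paths[w] * (1 + dependency[w])
            if w != source:
                centrality[w] += dependency[w]
    n = len(nodes)
    # Every unordered pair is visited twice, and (n-1)(n-2)/2 pairs exclude a node.
    scale = 1.0 / ((n - 1) * (n - 2)) if n > 2 else 0.0
    return {v: c * scale for v, c in centrality.items()}


# Hypothetical co-citation network: one reference bridges two small groups.
edges = [("R1", "R2"), ("R1", "bridge"), ("R2", "bridge"),
         ("bridge", "R3"), ("bridge", "R4"), ("R3", "R4")]
adjacency = {}
for a, b in edges:
    adjacency.setdefault(a, set()).add(b)
    adjacency.setdefault(b, set()).add(a)

scores = betweenness_centrality(adjacency)
print({node: round(score, 3) for node, score in sorted(scores.items())})
```

In this toy network only the bridging reference lies on shortest paths between the two groups, which mirrors the role described for a reference that connects several clusters.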

RQ 5: Emerging ITS trends

Emerging research‐front concepts were identified using CiteSpace’s Burst detection of the article citations (Chen, 2006), and were specifically detected using the Kleinberg algorithm (Feng et al., 2015). Co-cited references with high Burst values indicate that the number of co-citations of an article suddenly increased sharply (Chen, 2012; Hou et al., 2018). The citation reference themes with high Burst values in a specific period, therefore, reflect the research trends in specific domains for the next few years (Hou et al., 2018).
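A simplified version of Kleinberg's burst model helps to show what a citation burst is. In the two-state batch form sketched below, each time step is either in a baseline state with citation rate p0 or in a burst state with rate p1 = s * p0; entering the burst state costs gamma * ln(n), and the state sequence with the lowest total cost is found by dynamic programming. This Python example is a sketch under assumptions: the function name, the yearly counts, and the parameter values are hypothetical, it requires an overall citation rate strictly between 0 and 1, and it does not reproduce CiteSpace's implementation or its burst strength values.

```python
from math import comb, log


def two_state_bursts(relevant, totals, s=2.0, gamma=1.0):
    """Label each time step 0 (baseline) or 1 (burst) by minimum total cost."""
    n = len(relevant)
    p0 = sum(relevant) / sum(totals)   # baseline citation rate
    p1 = min(s * p0, 0.9999)           # elevated rate in the burst state
    rates = (p0, p1)
    up_cost = gamma * log(n)           # cost of entering the burst state

    def fit_cost(state, r, d):
        # Negative log-likelihood of r hits out of d under the state's rate.
        p = rates[state]
        return -(log(comb(d, r)) + r * log(p) + (d - r) * log(1 - p))

    # Viterbi: best[j] is the cheapest cost of any path ending in state j.
    best = [0.0, float("inf")]         # the automaton starts in state 0
    back = []
    for r, d in zip(relevant, totals):
        step, choice = [], []
        for j in (0, 1):
            options = [best[i] + (up_cost if j > i else 0.0) for i in (0, 1)]
            i_min = 0 if options[0] <= options[1] else 1
            step.append(options[i_min] + fit_cost(j, r, d))
            choice.append(i_min)
        best = step
        back.append(choice)

    # Trace the cheapest path backwards from the final time step.
    state = 0 if best[0] <= best[1] else 1
    states = []
    for choice in reversed(back):
        states.append(state)
        state = choice[state]
    return states[::-1]


# Hypothetical yearly counts: citations to one reference out of all citations.
cited = [1, 1, 1, 10, 10, 1, 1]
all_citations = [20] * 7
print(two_state_bursts(cited, all_citations))
```

The years labeled 1 form the burst interval; a larger `gamma` makes the model more reluctant to declare a burst, while a larger `s` requires a sharper rise.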

Figure 7 shows the co-cited reference areas with high Burst values, most of which were located between 2019 and the first half of 2020, with the red dot representing those references with high burst values. The image in the larger red circle is an enlarged drawing of the area within the smaller red circle, and the thickened lines in the larger red circle represent the citation relationships between the co-cited references with high Burst values from 2019 to 2020. The latest co-cited reference clusters with high Burst values were mainly in clusters #1 and #2, the themes of which reflect the emerging ITS research trends.

Fig. 7 Evolution of co-cited references with high Burst values

Cluster #1 was mainly associated with Coh-Metrix research (Graesser et al., 2014; Graesser, Conley, et al., 2011; Graesser, Mcnamara, et al., 2011; McNamara et al., 2010, 2014). Graesser et al.’s (2014) review, “Coh-Metrix measures text characteristics at multiple levels of language and discourse,” summarized how five factors (narrativity, syntactic simplicity, word concreteness, referential cohesion, and deep/causal cohesion) accounted for text variations, and also reported on analyses that augmented Coh-Metrix. McNamara et al. (2010) investigated the validity of Coh-Metrix as a measure of cohesion and coherence in texts, finding that the Coh-Metrix cohesion indexes were able to significantly distinguish the high- versus low-cohesion text versions. This research revealed that Coh-Metrix was one of the most popular research applications in recent years.

In cluster #2, the top terms were “problem-centered instruction” and “STEM.” Three research articles were prominent in this cluster after all the reviews among the frequently co-cited references were excluded.

The first identified paper was the “Effectiveness of Cognitive Tutor Algebra I at Scale” (Pane et al., 2014) (burst = 4.15), which reported on a two-year study of Cognitive Tutor Algebra I (CTAI) to compare the learning effects of personalized, mastery learning, blended learning, and CTAI alone. It was found that there was no noticeable learning effect in the first year, but positive effects were found in the second year. A second prominent paper was “Deep knowledge tracing” (Piech et al., 2015) (burst = 4.05), which introduced Recurrent Neural Networks (RNNs) to assess large-scale online teaching environments and student modeling. The RNN model family was found to capture the complex representations of student knowledge more easily than earlier models and was able to substantially improve student performance predictions. The third paper was “Stupid Tutoring Systems, Intelligent Humans” (Baker, 2016) (burst = 4.02), which examined the importance of human intelligence in ITS research from a critical perspective, proposed a new teaching paradigm based on educational data mining and learning analytics methods, and emphasized the value of human wisdom when developing ITS applications.

Of the top five references with the strongest citation bursts in 2019–2020 in clusters #1 and #2, there were three meta-analytic reviews: “Intelligent Tutoring Systems and Learning Outcomes: A Meta-Analysis” by Ma et al. (2014) (burst = 10.16); the “Effectiveness of Intelligent Tutoring Systems: A Meta-Analytic Review” by Kulik (2015) (burst = 9.1); and “A meta-analysis of the effectiveness of intelligent tutoring systems on college students’ academic learning” by Steenbergen-Hu and Cooper (2014) (burst = 9.1). These references analyzed the effectiveness of the ITS-related learning effect evaluations in different environments.

Kulik (2015) conducted a meta-analysis of 50 controlled intelligent computer tutoring system evaluations and found that the intelligent tutoring had raised test scores by 0.66 standard deviations over the conventional level. Ma et al. (2014) examined 107 findings from 73 separate reports, finding that the average ITS effect resulted in an improvement in test scores of 0.43 standard deviations, but also found that the improvements depended to a great extent on whether they were measured using locally developed or standardized tests. The meta-analytic review by Steenbergen-Hu and Cooper (2014) analyzed 35 ITS effectiveness evaluations in colleges, finding that the ITS applications resulted in increases in the overall test scores by approximately 0.35 standard deviations, but that the type of control group strongly influenced the evaluation results.
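The effect sizes reported above are expressed in standard deviation units. As a reminder of how such a value is obtained for a single evaluation, the sketch below computes a standardized mean difference (Cohen's d with a pooled standard deviation) for two groups. It is an illustrative Python example only: the scores are hypothetical and not taken from any cited study, and the cited meta-analyses may have used other estimators, such as Glass's or Hedges' variants.

```python
from math import sqrt
from statistics import mean, variance


def standardized_mean_difference(treatment, control):
    """Cohen's d: mean difference divided by the pooled standard deviation."""
    n_t, n_c = len(treatment), len(control)
    pooled_var = (
        (n_t - 1) * variance(treatment) + (n_c - 1) * variance(control)
    ) / (n_t + n_c - 2)
    return (mean(treatment) - mean(control)) / sqrt(pooled_var)


# Hypothetical test scores (illustrative only, not data from any cited study).
its_group = [72, 75, 78, 81, 84]
control_group = [70, 73, 76, 79, 82]
d = standardized_mean_difference(its_group, control_group)
print(round(d, 2))
```

A value of d = 0.43, for example, means that the average student in the ITS group scored 0.43 pooled standard deviations above the average student in the comparison group.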

The other two references with the strongest citation bursts were a research article and a book. The article “Coh-Metrix: Providing Multilevel Analyses of Text Characteristics” (Graesser et al., 2011) (burst = 13.09) identified “word concreteness,” “syntactic simplicity,” “referential cohesion,” “causal cohesion,” and “narrativity” as the main factors accounting for the most variance in texts across grade levels and text categories using Coh-Metrix. The book “Automated Evaluation of Text and Discourse with Coh-Metrix” (McNamara et al., 2014) (burst = 12.37) provided a comprehensive introduction to Coh-Metrix from both theoretical and practical perspectives, and commented that the development of Coh-Metrix had resulted in a new paradigm that integrated language, corpus analyses, computational linguistics, education, and cognitive science research.

The top ten co-cited references with high burst values from 2019 and the first half of 2020 are listed in Table 6. Because reviews dominated this list (7 reviews, 2 articles, and 1 book), the proportion of original research was relatively small, which shows that scholars in this field paid more attention to reviews in this period. Generally speaking, reviews analyze previous studies rather than produce new knowledge, which seems to indicate that there have been fewer original research breakthroughs in recent years, or that any breakthroughs have not received widespread attention. This was further supported by a comparison with Table 5, in which the top co-cited references over the whole history of the field were research articles. Liu and Hu (2018) believed that as there had been wide-ranging discussions on educational ITS developments, there was a great deal of high-level research repetition but few innovative breakthroughs. This was possibly because the ITS field has lacked a set of adaptive theoretical frameworks and basic guiding theories or assumptions, which should not be broad and descriptive but should directly reflect the characteristics and essence of ITS-associated learning and education (Liu & Hu, 2018). Thus, we tentatively infer that there have been few obvious breakthroughs in ITSs in recent years and that there may be a high level of research repetition.

Table 6 Top 10 co-cited references with high burst value in 2019–2020.5

Conclusion

Publication growth patterns and top contributors

In response to RQ 1, it was revealed that yearly ITS publications had grown with fluctuations and that the number of citations had grown exponentially, which indicated that the ITS field was still in its initial developmental stage based on Price's exponential growth curve for scientific literature. However, high developmental potential was evident, which suggests that ITSs and ITS research can be expected to flourish in the coming years.
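Exponential growth of the kind described above can be checked by fitting a straight line to the logarithm of the yearly counts. The following is a minimal sketch of that idea, not the procedure used in this study; the function name and the yearly counts are hypothetical and serve only to make the example runnable.

```python
from math import exp, log


def fit_exponential(years, counts):
    """Fit counts ~ a * exp(b * (year - years[0])) by least squares on log(counts).

    Returns (a, b), where b is the estimated yearly growth rate.
    All counts must be positive.
    """
    xs = [year - years[0] for year in years]
    ys = [log(count) for count in counts]
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = s_xy / s_xx
    a = exp(y_bar - b * x_bar)
    return a, b


# Hypothetical yearly citation counts, not taken from the study's data.
years = [2010, 2011, 2012, 2013, 2014, 2015]
citations = [120, 150, 195, 240, 310, 390]
a, b = fit_exponential(years, citations)
print(f"growth rate per year: {b:.3f}, doubling time: {log(2) / b:.1f} years")
```

A roughly constant positive growth rate over the whole period is consistent with the exponential phase of Price's curve, whereas a rate that declines over time would suggest that the field is approaching the later, slower phases.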

The milestones in the development of AI appear to have led to the three-stage development of ITS research. ITSs were first proposed in 1963; after the Lighthill report (1973) was published, research into AI declined, and research into ITSs declined commensurately until 1985, as shown in Fig. 1. The development of the backpropagation (BP) algorithm made the training of large-scale neural networks possible, which resulted in increased interest in AI research and renewed interest in ITSs. However, AI research experienced a second downturn in the late 1980s and early 1990s before recovering. When Deep Blue defeated the world chess champion in 1997, ITS research increased. When Hinton’s Deep Belief Net made a breakthrough in AI algorithms in 2006, there was an explosion in ITS research publications.

The most productive country/region for ITS research has been the USA, the most productive author has been Graesser, the top journal in terms of the number of publications has been Computers & Education, and the University of Memphis has contributed the most ITS research studies.

The multidisciplinary integration of ITSs

To answer RQ 2, a new viewpoint was introduced to explain the subdiscipline composition. It was found that computer science, education research, psychology, and engineering have been the main ITS research knowledge sources, with computer science taking the dominant position. Due to their application-oriented characteristics, ITSs have had a unique discipline integration feature, which means that ITS research has tended to harness the latest achievements from other disciplines (Shneider, 2009). Given the continuing rapid technological developments, several ITS research subdivisions emerged that were associated with educational theory, cognitive psychology, and linguistics. This study traced the path of multidisciplinary integration of ITS research across the social and natural science fields: the social science side of ITS research developed along with the evolution of natural science and technology.

The category analyses also clearly revealed ITS’s multidisciplinary integration path. While ITSs originated from an educational discipline, since 1986, computer science research has become its most important knowledge base; however, from 2007, social science-based publications have exceeded natural science-based publications.

The most popular research issues

The top keywords by co-occurrence frequency (interactive learning environments, student modeling, teaching/learning strategies, machine learning, human–computer interface, and Coh-Metrix) reflected the most popular issues in ITS research.

Intellectual base and emerging research trends of ITSs

The intellectual base evolution was analyzed by examining the co-cited references, the clusters for which were divided into three chronological research stages: the first stage mainly focused on ITS applications; the second stage led to various subfields, namely human–computer interaction, probabilistic models, computer-mediated communication, cognitive tutors/meta-cognitive skills, semantic web-based educational systems/latent semantic analysis, STEM learning, data mining, and computer-supported collaborative learning; and the third stage gave rise to two main research subfields, namely computational linguistics and the combination of computer technology and pedagogy.

The latest co-cited reference clusters containing references with high burst values were clusters #1 and #2, the themes of which reflected the emerging ITS research trends. Cluster #1 was mainly focused on Coh-Metrix, indicating that text analysis/NLP and other research around Coh-Metrix was the current research trend. Cluster #2 was focused on problem-centered instruction and STEM learning, which have been traditional ITS research concerns. Model construction and critical thinking for the development of ITS applications were also found to be important recent research themes.

From a scientometric viewpoint, ITS research is currently multidisciplinary, and researchers should therefore be willing to implement new ideas and to tolerate high risk when choosing research tasks (Shneider, 2009).

Implication

Here we wish to raise a question for future research: what has been the effect of ITSs on education? To answer this question, we need to consider the effects and functions that ITSs are seeking to achieve, the extent to which technology can be applied, and the initial desire for ITSs from an educational viewpoint. This question comes from the vision of education. It does not need to be answered immediately, and there may be more questions to be addressed, especially given that the global COVID-19 pandemic has brought great challenges to education. Many students around the world are still taking classes at home. What new requirements do these circumstances place on ITSs? We raise these questions in the hope of inspiring researchers and educators to explore more undeveloped research areas and to have more in-depth discussions.

Limitations and future research

This article had some limitations, as the data were collected only from the SCIE/SSCI indices in the WoS database. Although the WoS database contains a large volume of high-impact research articles, some valuable research has also been published in books or collected in other indices. Diversified data sources should therefore be considered in the future. This study also excluded reviews from the data collection. Future studies could examine the trends in review research, including systematic reviews, literature reviews, and meta-analyses.