Introduction

Instruction could be better tailored to students, and thus be more effective, if students’ mental processes were known and reliably associated with the outcomes (the products) of different learning tasks to determine ‘student profiles’ (Vettori et al., 2022). Some of these processes involve metacognition (Cromley & Kunze, 2020; Veenman et al., 2006) and metacognitive skills, which have proven to be important predictors of academic performance (Ohtani & Hisasaka, 2018), particularly in mathematics (Desoete & De Craene, 2019; Muncer et al., 2022).

One of these metacognitive skills, comprehension monitoring, has been shown to contribute significantly to human learning (Guo, 2022; Huang et al., 2022). It has been extensively studied in the field of reading comprehension, both as a factor related to comprehension outcomes (Baker & Beall, 2014; Guzman et al., 2018; Kolić-Vehovec & Bajšanski, 2006; Soto et al., 2022) and in relation to mental mechanisms. For instance, Van den Broek and Helder (2017) studied the monitoring processes aimed at establishing coherence in a reader’s mental representation of a text, considering the influence of readers’ standards of coherence and their prior knowledge. Otero (2009) analysed the way readers monitor their understanding to ask information-seeking questions; in a simple model, he proposed that question-asking is a type of regulation step taken after detecting a comprehension obstacle in a text. More recent work on text comprehension monitoring has focused on readers’ processing behaviour variables. In different studies using eye-tracking (Connor et al., 2015) or computerized environments (van Moort et al., 2018), gaze location and duration were examined at both word and sentence level. The results showed (a) an increase in gaze duration when an anomaly is detected, compared to reading consistent information (the ‘inconsistency effect’; Kim et al., 2018), and (b) more time spent rereading an inconsistent text by participants with good regulatory skills, due to their attempts to re-establish coherence (Currie et al., 2021; Zargar et al., 2020). Similar processing behaviours were observed when reading in a foreign language, although comprehension monitoring was found to be more difficult, mainly due to word-processing and word-to-text integration problems (Mulder et al., 2021).

Comprehension monitoring has also been analysed in mathematics education (Başokçu & Güzel, 2022; Callan & Cleary, 2019; Cohors-Fresenborg et al., 2010; Van Haneghan & Baker, 1989), in particular with the use of eye trackers (Strohmaier et al., 2020), intelligent tutors (Del Olmo-Muñoz et al., 2022), or electronic supporting systems (Ge, 2013). As in reading comprehension, differences in the number and time of eye fixations, their loci, and their sequence, as well as in the number of eye regressions, were consistently related to differences between poor and good performers in tasks such as problem-solving (Strohmaier et al., 2019) or arithmetic (Huebner & LeFevre, 2018).

However, few of the above studies in mathematics education tried to relate the students’ processing behaviours to their mental processes in terms of psychological models. Therefore, most of these results are difficult to integrate into current psychological knowledge about learning, and particularly about learning mathematics.

The present study

The main objectives of the present study were:

(1) To identify university student monitoring profiles in a specific monitoring task, from the task products elaborated and from interviews with the participants, and then to characterize these profiles by differences in the students’ mental processes involved. The monitoring task proposed was judging the comprehensibility of algebraic worked-out examples and reporting in writing any error or inconsistency detected.

(2) To characterize the student monitoring profiles by the specific mental processing involved, operationalized through behavioural variables in the light of a psychological model for inconsistency detection. This implies relating the students’ behavioural variables to the products they elaborate in the task (the written reports) with high reliability.

In mathematics education, worked-out examples are frequently used in instruction. Worked-out examples are, basically, fully solved problems offered to facilitate novice students’ encoding and abstraction of problem schemata (i.e. typified relations between statement information and mathematical procedures) with low cognitive load and effort (van Gog et al., 2011). Several processes linking external (linguistic) and internal representations are necessary to completely solve an algebraic problem (Bosse et al., 2020; Polya, 1945). In essence:

(1) Comprehension of the problematic situation described in the statement. This implies the elaboration of a suitable situation model, usually involving the solver’s previous knowledge.

(2) Translation of plain language (the statement) into mathematical language, resulting in algebraic equations or arithmetical operations. This step corresponds to the elaboration of a problem model based on the situation model (Kintsch & Greeno, 1985).

(3) Transformations of the mathematical language, within the problem model, in order to achieve a solution.

(4) Interpretation of the mathematical solution in the context of the situation model, i.e. giving meaning to the solution (a kind of inverse translation).

In a worked-out example, the solver’s knowledge that should be activated and the mathematical transformations required to achieve a solution are made explicit. Yet students are still expected to check their comprehension of the translation process, as well as to interpret the solution.

Embedding deliberate inconsistencies in the information provided is one of the most widely used procedures to assess monitoring (Winograd & Johnston, 1982). In the present study, the translation process was deliberately altered by embedding a specific inconsistency, a ‘reversal error’ (Soneira et al., 2018), in one equation. The reversal error consists of inadequately interchanging two variables in an equation. It is caused by an incorrect ‘word order matching’, or by an interpretation of the algebraic expressions as labels instead of operations (Clement et al., 1981, p. 288). Because of the embedded reversal error, the solution obtained was also erroneous and did not make sense, in direct disagreement with a correct interpretation. Therefore, both the translation and the interpretation processes were affected by the embedded error.

Erroneous worked-out examples have been used successfully in instruction to improve students’ problem-solving skills (Chen et al., 2019; Zhao & Acosta-Tello, 2016).

To collect data on students’ monitoring ability, the software ‘Read and Answer’ (Vidal-Abarca et al., 2011) was used. Online measures to assess metacognition seem to be more accurate than offline measures (Veenman & van Cleef, 2019), but eye trackers are still expensive and difficult to use in educational settings. ‘Read and Answer’ records the student’s processing behaviour online, as well as the results of the proposed task. The students’ mental activity while performing the task was inferred from their behavioural online data by means of a theoretical model described below. Finally, a taxonomy of students’ monitoring profiles was proposed and characterized in detail.

A psychological model for inconsistency detection

Vosniadou et al. (1988, p. 28) proposed a metacognitive mechanism to explain how inconsistencies in provided information are detected and reported. Based on this mechanism and on Kintsch’s (2018) construction-integration model for reading comprehension, Otero and Kintsch (1992) elaborated a computer simulation which properly explained students’ success or failure in detecting inconsistencies in plain texts. According to Vosniadou et al. (1988, p. 28) and to Otero and Kintsch (1992), the requisites for detecting an inconsistency are:

(1) The inconsistent propositions (units of meaning) must be read (or listened to), encoded, and represented in working memory in the construction phase.

(2) These representations must be compared to one another. This implies the simultaneous processing of the inconsistent propositions in the same working-memory cycle.

(3) Both inconsistent propositions must remain activated at the end of the processing cycle. If one of the two inconsistent propositions were inhibited by the other in the integration phase, the reader would not feel a break in coherence, as obtained by Otero and Kintsch (1992).

Once an inconsistency has been detected, the reader can initiate regulatory actions in order to re-establish coherence. Among these actions, rereading is frequent, to confirm the presence of the incoherence (Zargar et al., 2020). The use of ‘fix-up’ strategies, i.e. adding personal information in the construction phase to resolve the contradiction (Otero, 2002), is also frequent.

Giving meaning to mathematical language has been shown to recruit different brain resources than giving meaning to natural language in expert mathematicians (Amalric & Dehaene, 2019), much as reading in a second/foreign language does. Hence, previous results obtained with this theoretical basis in tasks involving only plain language might not be applicable to mathematical tasks.

In the case of worked-out algebra-problem examples, plain language (the statement) and algebra (the resolution) must be processed and coherently related to achieve comprehension. To check for coherence, and thus to build and compare different propositions (units of meaning), students must translate one language into the other (the translation and interpretation processes). These processes increase the number of units entering working memory (w.m.), and thus the probability of overload also increases. To reduce the cognitive load, students can (re)parse the information into smaller chunks entering the w.m.; this, in turn, increases the number of processing cycles, and establishing global coherence among many units may imply extra effort.

According to the psychological model that has been described, possible mistakes that could be found when students check the comprehensibility of a worked-out example with embedded errors, as in the present study, are the following:

(1) Poor elaboration of the situation model, caused, for instance, by a limited or inadequate activation of the reader’s knowledge. In this case, some necessary propositions are not represented in working memory.

(2) Incorrect or non-existent checking of the translation process. In this case, some important propositions from the statement are not compared in working memory with specific propositions from the mathematical resolution.

(3) An incorrect or missing interpretation process. This may be because some propositions from the mathematical result are not built, or because these propositions are not compared with specific propositions in the situation model.

Methods

Research design

A two-stage mixed-methods study was conducted. First, a detailed case analysis was made for each participant: the task product and the online data about processing behaviour were individually analysed and categorized, and a short interview was conducted with each participant to increase the reliability of the data. Second, all individual data were pooled to perform statistical analyses searching for student profiles.

Participants

Ninety-four Spanish postgraduate students (30 women and 64 men, 24–34 years old) participated in this study. They were all graduates in different engineering specialities and enrolled in a master’s degree program. They were chosen for the strong mathematical command and problem-solving experience that engineers usually show; therefore, their potential obstacles in a monitoring-for-understanding task should be mainly due to failures in monitoring competence. No probabilistic sampling procedure was conducted; the sample was selected by accessibility (a convenience sample).

The students (all adults) were asked for their consent, and participation in the study was voluntary. One student left the experimental session because she felt uncomfortable working out the proposed task on the computer.

Materials and task

The experimental worked-out example was segmented into information units as shown in Table 1 (the original statement in Spanish has been translated into English). A well-known ‘reversal error’ was embedded in a main equation (R1 in Table 1). Consequently, units S1 and R1 became inconsistent (as R1 should be the translation of S1 into algebra). The rest of the resolution did not include additional mistakes, so the wrong numerical solution (the Sol unit in Table 1) was caused solely by the wrong equation R1.

Table 1 Textual and algebraic information units as they appeared in the task

The task explicitly proposed to students was: ‘Please judge the comprehensibility of the worked-out example, the statement and the resolution, and report in writing everything that you consider incorrect, meaningless, inconsistent, or unintelligible. Be precise in explaining the reason for each obstacle and locating it in the corresponding information units’. In this way, we tried to avoid a ‘searching-for-errors task’, because it could lead students to an unrealistic monitoring activity. This task seems more appropriate for future teachers and is slightly different from that used in other studies with erroneous worked examples, where students are asked to find the errors as part of their instruction.

Instruments for data collection

The software ‘Read and Answer’ was used to present the worked-out example to the participants, and to collect online data about their processing behaviour and the product of the task (a written report). ‘Read and Answer’ shows visual information (text, for instance) on a computer screen and asks the user to perform a task related to this information. The user must process the information provided to elaborate the task outcome (the ‘product’). The text (or other written language, such as algebra) must be divided into information segments (the segmentation used in the present study is shown in Table 1). Initially, the segments are shown masked and are therefore unreadable. Clicking on a segment makes it readable, but only one segment can be read at a time (see Fig. 1). The student can read and reread the segments in any order and as many times as they need.

Fig. 1 Main screen in ‘Read and Answer’: each information unit is masked until the student clicks on it. Note: In Spanish, the letters A and P stand for student (‘alumno’) and teacher (‘profesor’), respectively.

The software also allows the student to access a second screen by clicking on a specific button on the main screen. In this second screen, the task is stated, and the answer has to be written in a specific area. The participant can move from one screen to the other at their own pace and modify their written answer as many times as they wish. When the overall task is finished, the student can press a button to quit (at the bottom in Fig. 1).

‘Read and Answer’ records the complete time-ordered sequence of users’ actions and the time devoted to each one: unmasking information units (main screen), reading the task or (re)writing the answer (second screen), and pressing buttons. From these primary data, other indicators can be elaborated, such as the accumulated time in processing each unit. Therefore, the task outcome and the processing behaviour are registered together: deletions, additions, substitutions, or rearrangements in partial and final elaborations, as well as the actions made while processing the information provided.
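As an illustration of how such indicators can be derived from the primary log, the following minimal sketch aggregates a hypothetical ‘Read and Answer’-style event sequence into per-segment accumulated reading times and a time-ordered reading sequence. The event format (action, segment, duration) and the segment labels are assumptions made here for illustration; the actual export format of the software is not described in this paper.

```python
# Illustrative only: the 'Read and Answer' export format is not specified here,
# so this sketch assumes a chronological list of (action, target, duration_ms).
from collections import defaultdict

events = [
    ("unmask", "S1", 4200),
    ("unmask", "R1", 6100),
    ("write_answer", "-", 15300),
    ("unmask", "S1", 2500),
    ("unmask", "R1", 8900),
    ("quit", "-", 0),
]

def accumulated_time_per_segment(events):
    """Sum reading time over all visits to each information unit."""
    totals = defaultdict(int)
    for action, target, duration in events:
        if action == "unmask":            # only reading visits count
            totals[target] += duration
    return dict(totals)

def reading_sequence(events):
    """Time-ordered list of visited segments (the processing path)."""
    return [target for action, target, _ in events if action == "unmask"]

print(accumulated_time_per_segment(events))  # {'S1': 6700, 'R1': 15000}
print(reading_sequence(events))              # ['S1', 'R1', 'S1', 'R1']
```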

A brief individual interview was conducted aimed at increasing the reliability of the researchers’ interpretation of the online data, as in Otero et al.’s (1992) study. In the interview, the embedded error was shown to each participant, and the following questions were asked when appropriate: Did you detect this specific error? Did you report it in writing? If you detected the error but did not report it, why? In which step of the processing did you detect the error? If there were some participants with inappropriate regulation (i.e. detecting the error but not reporting it because of improper causal attribution), the interview could help identify them.

Procedure

Data collection took place in a computer classroom with 40 personal computers during a normal classroom session. Each participant performed the task individually. First, detailed instructions were read aloud and explained to the participants by one of the authors. Next, a practice example was used to train the participants in the use of the ‘Read and Answer’ software and to clarify the task. Throughout this practice, help was provided to the students and their doubts were clarified. Then, students addressed the experimental worked-out example at their own pace without additional help. The task typically lasted 20–25 min.

As they finished, the students went to an adjoining classroom to be individually interviewed by the other two researchers. The interview typically lasted 2–5 min.

Variables and data analysis

Product and behavioural variables were defined and measured. First, as mentioned before, all the writing actions a reader made to elaborate their report on the comprehension obstacles found were registered. The dichotomous ‘product’ variable was defined as the explicit mention or oversight of the inconsistency between the ‘target’ segments S1 and R1: ‘Embedded Inconsistency Successfully Reported’ (Yes/No). This variable was defined from each student’s written report and interview only, without using behavioural data.

Second, the behavioural variables were defined based on the reading sequence and the reading times, independently of the written reports and interviews. Data regarding the processing of plain language or algebra were analysed separately.

In order to differentiate normal from difficult processing, the normal reading paces in text and in algebra were computed individually. The average time per word and the average time per algebraic symbol (letters and quantities, operations, equals signs, parentheses), as well as the corresponding standard deviations, were computed. These included all the reading visits that exceeded 100 ms/word, a minimum time for accessing meaning. For a particular reader j, any value in the interval (Mean_j ± Standard deviation_j) was considered their normal reading pace. A value greater than the upper limit when reading a segment (a ‘slowdown’ onwards) was considered a possible reading comprehension obstacle. Special attention was paid to the presence (yes/no) of ‘slowdowns’ in S1, R1, Sol, or R5.
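The slowdown criterion can be made concrete with a short sketch, assuming per-visit reading times and segment lengths are available. The 100 ms/word floor and the mean-plus-one-SD threshold follow the description above, but the data structures and variable names are illustrative; in the study, paces were computed separately for text and for algebra segments.

```python
import statistics

def slowdowns(visit_times_ms, unit_lengths, min_ms_per_word=100):
    """
    visit_times_ms: dict segment -> list of reading times (ms), one per visit
    unit_lengths:   dict segment -> number of words (or algebraic symbols)
    Returns the set of segments showing at least one 'slowdown', i.e. a visit
    whose time per word exceeds the reader's mean + 1 SD (one modality only).
    """
    # Per-visit pace (ms/word), keeping only visits above the 100 ms/word floor
    paces = []
    for seg, times in visit_times_ms.items():
        for t in times:
            pace = t / unit_lengths[seg]
            if pace > min_ms_per_word:
                paces.append((seg, pace))

    values = [p for _, p in paces]
    mean, sd = statistics.mean(values), statistics.stdev(values)
    upper = mean + sd                      # upper limit of the normal pace

    return {seg for seg, pace in paces if pace > upper}
```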

The accumulated time per word or per symbol on each segment was also considered as an indicator of the effort made in processing certain segments compared to others. The processing effort expended by a student on a particular segment a was defined as the (numerical) value resulting from dividing the accumulated time reading a by the time needed to read a once at a normal pace (using the mean value).
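In compact notation (introduced here only for clarity, not used by the authors), the processing effort of reader j on segment a can be written as

```latex
E_j(a) \;=\; \frac{T^{\mathrm{acc}}_j(a)}{n_a\,\bar{t}_j}
```

where T^acc_j(a) is the accumulated time reader j spent on segment a, n_a is the number of words (or algebraic symbols) in a, and t̄_j is the reader’s mean normal time per word (or symbol).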

Third, the presence (yes/no) of reading sequences including a consecutive reading of the non-consecutive segments R1 and S1 was singled out. These linking sequences (‘links’ onwards) should facilitate the simultaneous processing of R1 and S1 and, therefore, the monitoring of the translation process. The same was done for the sequences linking Sol (or R5) with the S1 or S2 segments. These reading sequences should facilitate the interpretation of the numerical result in the light of the situation model.
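A link of this kind can be detected directly on the reading sequence, as in the following sketch; the function and segment names are illustrative assumptions, not the authors’ code.

```python
def has_link(sequence, pair=("R1", "S1")):
    """
    True if the reading sequence contains a consecutive visit to the two
    (non-adjacent) target segments, in either order, e.g. ... R1, S1 ...
    """
    targets = set(pair)
    for a, b in zip(sequence, sequence[1:]):
        if a != b and {a, b} == targets:
            return True
    return False

# Translation link: R1 read immediately before or after S1
print(has_link(["S1", "S2", "R1", "S1", "R2"]))                 # True
# Interpretation link: Sol (or R5) read right next to S1 or S2
print(has_link(["R5", "Sol", "S2", "R3"], pair=("Sol", "S2")))  # True
```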

Numerical data recorded by the software ‘Read and Answer’ were first analysed with Excel™ and then with SPSS 24™.

A participant’s responses to the questions in the individual interview were added to their data.

Predictions

A reader could detect the inconsistency embedded in R1 using one of the two following strategies:

(a) Directly monitoring the algebraic translation (direct detection). If the textual segment S1 is correctly translated into algebra, a correct equation R1′: 29 • T = S is built, clashing with R1. Alternatively, meaning can be given to R1: if R1 is translated back into plain language, the new sentence S1′, ‘In a school there are twenty-nine times as many teachers (T) as students (S)’, is inconsistent with S1 (a numerical illustration is given after this list).

(b) First making a correct interpretation of the numerical solution as illogical (Sol segment, or perhaps R5) and then returning to carefully monitor the algebraic translation (indirect detection).
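The two routes can be illustrated numerically with the quantities quoted in the participants’ reports (600 people in total, twenty-nine times as many students as teachers, erroneous equation 29 S = T, correct equation 29 T = S). The exact wording of Table 1 is not reproduced here; the sketch only shows why the embedded equation yields the absurd solution that indirect detectors noticed.

```python
# Minimal sketch of the two detection routes, reconstructed from the
# participants' quotes; not the original Table 1 material.
from sympy import symbols, Eq, solve

T, S = symbols("T S", positive=True)
total = Eq(T + S, 600)

# Embedded reversal error: the roles of T and S are interchanged
wrong = solve([Eq(29 * S, T), total], [T, S])   # {T: 580, S: 20}
# Correct translation of the statement
right = solve([Eq(29 * T, S), total], [T, S])   # {T: 20, S: 580}

# Direct detection: the translated equation 29*T = S clashes with R1 (29*S = T).
# Indirect detection: 580 teachers in a 600-person school is interpreted as
# illogical, prompting a careful second check of the translation.
print(wrong, right)
```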

According to the theoretical model presented above, when the embedded inconsistency was correctly reported by a student, whether it resulted from a direct or an indirect detection (the variable ‘Embedded Inconsistency Successfully Reported’ = Yes), he or she should show the following (summarized in the sketch after this list):

(1) A normal reading speed when reading S1 and R1. This ensures that both segments were processed.

(2) A reading sequence that included a ‘link’ between R1 and S1 (the variable ‘Presence of a Translation Link’ = Yes), associated with the simultaneous processing of S1 and R1.

(3) A slowdown in R1 or in S1, due to integration difficulties when their meanings are compared to each other (the variable ‘Presence of a slowdown in S1 or in R1’ = Yes). This slowdown could occur in either of the inconsistent segments S1 or R1, as students could read in any order, not necessarily in a linear-sequential way; however, it was especially expected to occur in R1 (the erroneous equation). Moreover, the slowdown was expected to be observed in R1 or in S1 mainly when the simultaneous processing of the incompatible ideas took place.

(4) An increase in the processing effort in the inconsistent segments S1 and R1 compared to the effort in non-conflicting segments. After detecting the inconsistency, readers’ regulatory measures will probably involve more rereading of the inconsistent information than of the non-conflicting sentences. Rereading implies more reading time accumulated on certain information segments and, thus, more processing effort.

Additional expectations for the case in which the embedded inconsistency was indirectly detected, after monitoring the interpretation of the illogical solution, were deduced from the theoretical model:

(5) A reading slowdown in the Sol segment (or in R5) and a reading sequence linking Sol (or R5) with the S1 or S2 segments were expected to appear. Rereading could cause the processing effort in Sol (and/or R5) to increase compared to instances of direct detection.
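For reference, predictions (1)–(5) can be restated compactly as expected behavioural signatures for the two successful-detection routes; the flag names below are illustrative and only mirror the variables defined in the ‘Variables and data analysis’ section.

```python
# Compact restatement of predictions (1)-(5); illustrative names only.
EXPECTED_SIGNATURES = {
    "direct detection": {
        "normal_pace_on_S1_and_R1": True,    # (1) both target segments processed
        "R1_S1_link_present": True,          # (2) simultaneous processing of S1 and R1
        "slowdown_in_R1_or_S1": True,        # (3) inconsistency effect, mainly in R1
        "extra_effort_on_S1_R1": True,       # (4) rereading of the conflicting segments
        "slowdown_and_link_on_Sol_R5": False,
    },
    "indirect detection": {
        "normal_pace_on_S1_and_R1": True,
        "R1_S1_link_present": True,
        "slowdown_in_R1_or_S1": True,
        "extra_effort_on_S1_R1": True,
        "slowdown_and_link_on_Sol_R5": True,  # (5) interpretation route taken first
    },
}
```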

Unsuccessful monitoring may be observed when a student lacks monitoring skills, or when he or she performs a shallow processing of the example. Two main student profiles could appear: non-detectors, i.e. students who mentioned neither the embedded error in R1 nor the absurd numerical solution, and students who interpreted the numerical solution as illogical but were not able to detect the embedded error because of poor monitoring of the translation process. The psychological model makes no predictions for failed monitoring, but it was expected that some of the five requirements above would not be met.

Results

Student monitoring profiles according to their written reports

Considering all the participants in the study, 46 (49.5%) did not mention the embedded error and showed three main types of outcomes. (a) Most reported their agreement with the solving procedure. For instance, participant MR wrote: ‘The problem is clear, and the resolution is correct.’ (b) A small group reported spurious inconsistencies or errors, frequently caused by personal mistakes, as is the case of student ANH: ‘The resolution of the problem is wrong. The correct resolution would be the following: we have 600 people in total, and we consider groups of 29 persons. If there were 29 students, we would have 1 teacher. Then, I do a subtraction in which I remove 29 teachers leaving only students: 600–29 = 571. Dividing 571/29 I get the number of teachers that the school has’. (c) Some unspecifically reported that something should be wrong as the solution was absurd. Participant BU is an example: ‘The problem definition states that there are twenty-nine times as many students as teachers. 580 teachers are not logical when the total amount is 600. In addition, I see the problem definition clear and comprehensible’.

Subgroups (a) and (b) correctly monitored neither the algebraic translation nor the interpretation of the numerical solution; they were considered as one student profile showing ‘Poor monitoring’. Subgroup (c) correctly interpreted the solution as absurd, although they did not properly monitor the algebraic translation; they were considered to show ‘Blocked monitoring’.

The remaining 47 participants (50.5%) successfully reported the embedded error. Two subgroups were distinguished according to their written partial (if any) and final reports: (d) indirect detectors and (e) direct detectors, both defined in the ‘Predictions’ section. An example of an indirect detector is SO, who wrote: ‘there should be an error somewhere because the result is nonsense’ (visit #63), [and later] (visit #129): ‘(…) The error starts from the first equation, since it should be 29 T = S. This could confuse the student, who will surely have difficulty in finding this error. The essential analysis of coherence of the numerical result shows us the error and leads us to review in detail the resolution procedure which seemed initially so consistent’. An example of a direct detector, JU, concisely reported: ‘The problem resolution is wrong because 29 S = T means 29 teachers per student. The solution should be: 29 T = S; T + S = 600 30 T = 600 T = 600/30’.

Direct detectors showed a ‘Competent monitoring’. Indirect detectors had the appropriate monitoring skill but showed a ‘Delayed monitoring’.

Interviews

The participants’ answers in the interview generally confirmed the category inferred from their written reports. Only 2 participants declared that they had detected the embedded error but had not reported it explicitly; both attributed the error to an irrelevant letter-switching mistake. These students were classified as Spurious-error monitors.

Table 2 shows the number and percentage of participants in each profile after being classified based on their written reports and the answers given in the individual interviews.

Table 2 Distribution of student monitoring profiles according to their written report and the individual interviews

General processing data

On average, each information segment in the worked-out example was visited 6.5 times. Globally, 43.2% of the processing time was devoted to textual information, and 56.8% to algebra. The mean reading rate in the sample was 121.9 words/min (SD = 43.9) for text segments, and 139.4 symbols/min (SD = 43.8) for algebraic expressions.

Behavioural patterns for the student profiles

The behavioural variables computed for the different profiles of monitors in Table 2 are shown in Table 3.

Table 3 Mean values of behavioural variables for the different student monitoring profiles (standard deviations in parentheses)

To test the model-based predictions for successful detection of the embedded inconsistency, a binary logistic regression was computed for the criterion variable ‘Embedded Inconsistency Successfully Reported’ (Yes/No), taking the following behavioural variables as predictors: presence (yes/no) of slowdowns in the S1 and R1 segments; presence (yes/no) of reading sequences linking R1 and S1; presence (yes/no) of a slowdown in S1 or in R1 inside a sequence linking R1 and S1; and processing effort in S1 and in R1 (numerical).
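For readers who prefer a scriptable equivalent, the following sketch shows how such a regression could be set up in Python. It is illustrative only: the original analysis was run in SPSS 24 with a forward-stepwise procedure (not reproduced here), and the file and column names below are hypothetical.

```python
# Illustrative sketch only; not the authors' analysis code.
import pandas as pd
import statsmodels.api as sm

df = pd.read_csv("read_and_answer_indicators.csv")  # hypothetical file name

predictors = [
    "slowdown_S1", "slowdown_R1",      # presence of slowdowns (0/1)
    "link_R1_S1",                      # presence of an R1-S1 reading link (0/1)
    "slowdown_R1_in_link",             # slowdown in S1/R1 inside an R1-S1 link (0/1)
    "effort_S1", "effort_R1",          # processing effort (numerical)
]

X = sm.add_constant(df[predictors])
y = df["inconsistency_reported"]       # 0/1 criterion variable

model = sm.Logit(y, X).fit(disp=False)
print(model.summary())

# Percentage of cases correctly classified at the usual 0.5 cut-off
correct = ((model.predict(X) >= 0.5).astype(int) == y).mean()
print(f"Correctly classified: {correct:.1%}")
```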

Using the forward stepwise procedure, only two variables were retained as significant predictors with unique contributions: the presence (yes/no) of a slowdown in R1 within some R1–S1 link (which implies the presence of such a link), and the processing effort in S1. A total of 97.8% of the cases were correctly classified (97.8% of non-reporters and 97.9% of reporters). Table 4 shows the main data.

Table 4 Main data of the binary logistic regression for the variable ‘embedded inconsistency successfully reported’

A second binary logistic regression was computed for the criterion variable ‘Numerical Solution Reported as Illogical’ (Yes/No). This variable differentiates direct from indirect detectors, and also non-detectors from interpretation-only students. According to the predictions based on the model, the following behavioural variables were considered possible predictors: the presence of slowdowns in Sol (or R5), the presence of a reading sequence linking Sol (or R5) and S1 (or S2), the presence of a slowdown in Sol (or R5) inside a sequence linking Sol/R5 and S1/S2, and the processing effort in Sol and in R5.

A total percentage of 94.6% of the cases was correctly classified (95.2% of non-interpreters and 93.3% of interpreters) with only two predictors. Table 5 shows the corresponding data.

Table 5 Main data of the binary logistic regression for ‘numerical solution reported as illogical’

As shown in Table 3, most indirect detectors (15 out of 17), but very few direct detectors (2 out of 30) showed a slowdown in Sol (or in R5) inside a Sol/R5-S1/S2 reading link. The processing efforts in Sol and R5 segments were higher in indirect detectors (MIdSol = 8.8, SD = 4.0; MIdR5 = 10.9, SD = 8.8) than in direct detectors (MDdSol = 3.7, SD = 2.6; MDdR5 = 4.6; SD = 3.8).

Both student profiles also differed in their processing paths. Indirect detectors, but not direct detectors, showed a two-stage processing, before and after detecting the solution as illogical. Consequently, indirect detectors made more reading visits (MIdvis = 83.5; SD = 35.2) than direct detectors (MDdvis = 47.9; SD = 22.8). Another consequence was a different processing of R1. Direct detectors read R1 from the beginning at a slow reading pace (M = 1.1; SD = 1.0; 0 being the mean value and 1.0 the threshold for slowdowns). Before detecting the absurd numerical solution, the indirect detectors processed R1 faster than the algebra segments on average (MAlg = 1.1; SD = 1.9; MR1 = 0.8; SD = 1.1). However, in the second stage, when they were ‘seeking an error’, the algebra segments were read faster than before (MAlg = 0.8; SD = 1.0) while R1 was read more slowly (MR1 = 1.9; SD = 1.6).

Discussion

First, empirical data collected with the software ‘Read and Answer’, defined from a psychological mechanism for inconsistency detection, were able to characterize four different student profiles for the specific monitoring task conducted. Among the successful monitors, some detectors first interpreted the numerical solution as nonsense and then initiated a second reading phase of careful monitoring focused on the algebraic translation process, in a kind of ‘error-seeking task’. They showed a ‘Delayed monitoring’ behaviour and followed an indirect path to detection. In contrast, other detectors showed a ‘Competent monitoring’, performing a direct monitoring of the algebraic translation, and reported the embedded error quickly without needing to interpret the absurd solution.

Among the unsuccessful monitors, there was a group of non-detectors who were able to correctly monitor neither the algebraic translation nor the interpretation of the solution. Their profile corresponded to a ‘Poor monitoring’ competence. Another group of unsuccessful students were able to interpret the solution as absurd but, in their subsequent error-seeking reprocessing, could not detect the embedded error; they showed a ‘Blocked monitoring’, with many fruitless reading visits until they gave up.

These four profiles were explained by behavioural variables related to mental processing with high reliability, as a high percentage (92.5%) of individual cases were correctly classified by two related binary logistic regression analyses. In agreement with the psychological model on which the study is based, the significant predictors with unique contributions in these regressions were ‘the presence of a slowdown in R1 in some R1–S1 reading link’, ‘the value of the processing effort in S1’, ‘the presence of a slowdown in Sol in some Sol/R5–S1/S2 link’, and ‘the value of the processing effort in Sol’. The first predictor was also reliably related to monitoring success in Sanjosé et al. (2022). Relevant for teaching, the most probable processing cause of monitoring failure was the absence of deliberate links between relevant information segments needed to build a coherent mental representation of the worked-out example.

Some of the main outcomes reported in the eye-gaze literature on metacognitive monitoring have also been observed in the present study using the alternative measures collected by the ‘Read and Answer’ software. In agreement with other studies on the importance of the allocation and sequence of eye fixations, the duration of eye fixations, and backward jumps to revisit previously fixated information in order to succeed in a task (Strohmaier et al., 2019; Zargar et al., 2020), the present study showed that, compared to unsuccessful monitors, successful monitors put more relative processing effort into crucial information such as S1 and R1 (allocation and fixation time), repeatedly linked (by means of backward jumps) the inconsistent segments R1 and S1, and experienced slowdowns in R1 (long fixations on crucial information) when R1 and S1 were linked (inconsistency effect).

However, the method used in the present study to determine the existence of an inconsistency effect, i.e. an anomalous increase in reading time, differed from that of other studies. Students read the example at their own pace, segment (i.e. sentence) by segment, and frequent jumping back was observed in their processing sequences. Therefore, the procedure used elsewhere to compare the reading time on the target word/sentence with that on the target−1 (or perhaps the target+1) sentence (Kim et al., 2018) was difficult to apply. Instead, a strict criterion was used to define a ‘slowdown’ as an indicator of reading obstacles: a time per word higher than the mean value plus one SD, according to each individual reader’s data.

Using this method, the registered slowdowns on the critical segment (R1) had reading values of about 400–450 ms per symbol, comparable with the threshold of 500 ms per fixation on crucial arithmetic information used by Wu et al. (2021) to distinguish skilled from unskilled children in problem-solving tasks.

Finally, the percentage of competent monitoring of the worked-out example among graduate engineers was low, suggesting that developing cognitive abilities does not imply developing metacognitive abilities. However, metacognitive abilities have a significant influence on the learning of mathematics (Desoete & De Craene, 2019). This points to the need to deliberately develop metacognitive skills in mathematics education by proposing, for instance, specific tasks with incorrect worked-out examples.

In mathematics, overconfidence in performance seems to be the main cause of poor monitoring (Baars et al., 2017). In the present study, overconfidence in monitoring ability could explain the failed monitoring of some non-detectors, but not of others. Students in the blocked-monitoring profile were not able to detect the error in R1 although they were actively seeking it. These students were misled by the embedded reversal error despite the mathematics education they had received.

In the present study, there was no evidence that the participants paid little attention to the task, as the total time devoted to reading the example was high (about 6 times the time needed to read it once). In addition, the students’ mean reading speed for text segments (122 words/min) was clearly lower than the reading speed of educated adults for normal texts, 200–400 words/min (Rayner et al., 2016), and the value for algebra (129 symbols/min) was also slower than the value (250 symbols/min) obtained by Andrà et al. (2015) in a study of students’ fixation times when reading algebraic expressions, and by Jansen et al. (2007) with experts. Regardless, more studies are needed to reliably determine normal reading speeds in similar monitoring tasks.

Conclusions

Data on students’ processing behaviour in a mathematical monitoring task were collected online and analysed using the software ‘Read and Answer’, with low complexity and cost. Behavioural variables were defined from a psychological model to account for the monitors’ mental processing. These behavioural variables were able to significantly predict the monitoring outcomes and allowed the identification of differences in the mental processes of students with the same observable, successful or unsuccessful, task products. As a result, a taxonomy of four student profiles for studying algebraic worked-out examples was proposed and described using both the task products and the processing behaviour data.

Pending further replication or extension studies, this taxonomy could help teachers to better focus their help on their students. In normal circumstances, teachers can only access the task products (the observable outcomes), at best, and it is then difficult for them to elaborate an accurate hypothesis to explain why a student fails. The present study emphasizes the need for teachers to access more accurate student processing data if effective instructional models capable of addressing individual differences are to be advanced.

Limitations

Some limitations must be pointed out, and future research will be devoted to overcoming them. First, the participants’ problem-solving competence could be tested in subsequent studies so that new data could be interpreted with more confidence. Second, future studies could examine a wider range of possibilities involving different academic levels, a greater variety of worked-out examples, and diverse associated tasks.