1 Introduction

Tasks can be understood both more narrowly as specific opportunities to learn (OTLs, McDonnell, 1995) and more broadly as classroom tasks creating a context for thinking about and learning mathematics (Tekkumru-Kisa, 2020). As such, they are prompts specifying the expected products to be created, strategies to be performed, and resources to be used by students (Doyle, 1983; Stein et al., 1996). Tasks have long been a matter of interest in educational effectiveness research, mostly conceptualized as a part of the learning opportunities provided by the teacher and used in different ways with varying scope by the students (Hiebert & Stigler, 2023). Based on theoretical reflections, high-potential tasks can thus be considered to have a positive impact both on the quality of instruction and on cognitive and affective student outcomes. However, strong empirical evidence to support this claim is still missing (for an exception, see Baumert et al., 2010). A few studies have found empirical evidence for the link between task features and student learning gains (English, 2011; Sullivan & Mornane, 2014) and student motivation (Heinle et al., 2022). The picture for the relation between tasks and teaching quality is less clear, with several studies pointing to relations between task potential and teaching quality (Herbert & Schweig, 2021; Hill & Charalambous, 2012; Matsumura et al., 2002, 2006) while others have found no (significant) effects (Joyce et al., 2018).

In light of the ongoing discussion on theorizing and measuring teaching quality from a mathematics educational perspective, classroom tasks are of interest as they are not only an important tool to shape the students’ experience in learning mathematics but also a means for the students to engage with new information, operations, and strategies (Borko et al., 2008; Doyle, 1983). In order to better understand the role of tasks in relation to the quality of mathematics instruction, the present study aims to identify relevant characteristics of task potential and subsequently to contrast three different interpretations of task potential within models of educational effectiveness: understanding task potential as part of teaching quality, as part of teacher competence, and as an independent construct predicting teaching quality. In this paper, we first present theoretical underpinnings and possible consequences following the assumptions of either of the three perspectives. Subsequently, analyses based on data from the Teacher Education and Development-Validate Study (TEDS-Validate) are presented and their implications regarding the claims for each of the three different interpretations of task potential are discussed.

2 The role of tasks in educational effectiveness research

According to Stein et al. (1996), the role of classroom tasks for mathematics teaching and learning is twofold, in that they “determine not only what substance [the students] learn but also how they come to think about, develop, use, and make sense of mathematics” (p. 459). With regard to the mathematical substance, those tasks encouraging students to link different content are particularly interesting as they help to promote conceptual understanding and enable the students to experience mathematics as a coherent system (Hiebert & Grouws, 2007; Richland et al., 2012). In terms of tasks as prompts for doing mathematics, there have been numerous approaches to identifying high-potential tasks. Most of them focus mainly on different aspects of the tasks’ potential aiming to promote students’ mathematical learning and understanding. In some cases, tasks are classified holistically according to their cognitive demand level, largely following Bloom et al. (1956) as well as similar, yet subject-specific taxonomies (e.g., Stein et al., 1996). In contrast, there are many studies and frameworks that focus on individual task features in order to investigate the potential of tasks, such as in terms of fostering single or multiple mathematical competencies as an important goal of mathematics education (sensu Niss, 2015; cf., Brunner et al., 2019; Drüke-Noe, 2014; Turner et al., 2015). Other ambitious projects show how combining different criteria of task potential allows for a broader view of the underlying construct (Herbert & Schweig, 2021; Neubrand et al., 2013; Yeo et al., 2022).

However, the consensus is that the inherent potential of tasks is usually very different from how the task is enacted by the teacher and students in the classroom (Boston & Smith, 2011; Hatch & Clark, 2021; 2014; Yeo et al., 2022). This differentiation is depicted in more detail in the mathematical task framework by Stein et al. (1996) illustrating specific phases of task implementation and task enactment. First, there is a special focus on the teacher’s knowledge, skills, attitudes, and goals, based on which the tasks as represented in curricular/instructional materials are adapted and structured and thus appear as tasks as set up by the teacher in the classroom. Depending on students’ dispositions and prevailing norms and standards in the classroom, the tasks are then enacted by the students in the classroom, ultimately translating into student learning.

Considering more comprehensive theoretical models of teaching and learning mathematics prevalent in educational effectiveness research (Praetorius & Charalambous, 2023; Kyriakides et al., 2023), as well as the widely discussed duality of intended and enacted curriculum (for an overview, see Thompson & Huntley, 2014), the question emerges where to place the potential of tasks in relation to the different influencing factors related to teacher, instruction, and context. The following paragraphs will shed light on some approaches in this regard.

2.1 Tasks as indicators of teaching quality

Teaching quality is widely understood as interactions between students and teachers that support student learning (i.e., successful or effective teaching) while considering relevant norms and values of the given educational context (good teaching, Fenstermacher & Richardson, 2005). Watson and Ohtani (2015) view tasks as the bedrock of classroom life, emphasizing the way the mathematical tasks with which the students engage shape both their learning and their experience of the nature of mathematics. As such, teachers make decisions with regard to their goals, the content, and the planned activities throughout the lesson by selecting or creating the tasks for the students to work on (König et al., 2020). At the theoretical level, this important role of tasks is reflected in offer-use models, in which the quality of the teaching materials (e.g., instructional tasks) is conceptualized as part of the teaching quality (e.g., Helmke, 2014; for an overview, see Vieluf et al., 2020).

In accordance with this assumption of task potential as a part of teaching quality, Neubrand (2002) was able to show that an examination of the tasks used during the lessons allows for inferences to be made about the teaching taking place, drawing on data from the lesson videos as part of the Third International Mathematics and Science Study (TIMSS-Video). In German-speaking countries, cognitively challenging tasks are seen as part of the cognitive activation dimension within the Three Basic Dimensions of teaching quality (Praetorius et al., 2018; Vieluf & Klieme, 2023) and thus interpreted in various studies as a direct indicator of teaching quality, such as the Cognitive Activation in the Classroom study (COACTIV, Baumert et al., 2010; Neubrand et al., 2013) as well as the Teaching and Learning International Survey (TALIS-Video, Herbert & Schweig, 2021).

2.2 Tasks as indicators of teacher competence

With regard to teacher competence, the current discourse in research on mathematics education often refers to Shulman’s (1986) different types of knowledge, namely content knowledge (CK), pedagogical content knowledge (PCK), and general pedagogical knowledge (GPK). Ball et al. (2008) have further elaborated on the subject-specific facets with a particular focus on mathematics and mathematical knowledge for teaching (MKT). They distinguish three types of CK (common content knowledge, specialized knowledge and knowledge at the mathematical horizon) as well as three types of PCK (knowledge of content and students, knowledge of content and teaching and knowledge of curriculum). In recent years, the concept of teacher noticing as a process of perception, interpretation, and (in some conceptualizations) decision-making has added a situation-specific, more proximal component to the discourse around how teachers’ competencies shape their instructional practice (PID, Blömeke et al., 2015; König et al., 2022; van Es & Sherin, 2021). There is some evidence that, while subject-specific facets of teacher knowledge (CK, PCK) alone do not account sufficiently for variation in teaching quality or student achievement, the inclusion of PID as mediating variable can help uncover effects on both the quality of instruction and student achievement (Blömeke et al., 2022; Kersting et al., 2012; for a different finding see Baumert et al., 2010).

As stated previously, the selection of tasks for a lesson is one of the most important decisions a teacher makes with regard to students’ opportunities to learn and engage with mathematics (Boston & Smith, 2011). These decisions are made either in advance of the lesson during lesson planning or in the moment as a form of adaptation or improvisation. It can be hypothesized that different aspects of teachers’ knowledge as parts of their lesson planning competencies are thus manifested in the careful selection and creation of adequate tasks for the students to work on (Cevikbas et al., 2023; Hammer & Ufer, 2023; Yinger, 1980). At the same time, teachers’ skills in terms of noticing are likely to be essential for their ability to deviate from the previously made plan in a meaningful way by, for instance, adapting or creating new mathematical tasks in the moment. This teaching practice is also referred to as disciplined improvisation and can be observed especially among those considered as expert teachers (Hatch & Clark, 2021).

2.3 Tasks as independent predictors for teaching quality

A third perspective on the potential of tasks in the complex frameworks surrounding the teaching and learning of mathematics is that tasks take on an independent role beyond teacher competence and teaching quality. This idea is illustrated in the socio-didactical tetrahedron introduced by Rezat and Sträßer (2012). Therein, they conceptualize classroom artifacts (including instructional tasks) as a fourth corner point of the well-known didactical triangle, traditionally comprising content, student, and teacher. The addition of artifacts to the model is used to describe how tasks not only interact with each of the other three elements but can also shape the interactions among them. For instance, teachers use a sequence of tasks to structure and break down new mathematical content, while tasks allow the students to actively engage with this content.

On an empirical level, this approach is applied in studies such as by Joyce et al. (2018), who investigated the relationship between the quality of instructional tasks and contextual factors related to teacher, class, and school, on the one hand, and the quality of instruction measured by different instruments, on the other. However, they found no significant correlation between the quality of the tasks and either the teachers’ competence or the scores from the lesson observations.

2.4 Research questions

As became clear upon reviewing the existing literature on tasks in educational effectiveness research, there is no clear consensus on which role the potential of mathematical tasks plays in relation to other key variables regarding the teacher or instruction. Thus, despite elaborate theoretical approaches to illustrate the interplay of teacher, task, students, and context, further research is needed to support the use of measures of task potential as indicators for either teaching quality or teacher competence. The present study aims to contribute to bridging this gap by examining the relations between teaching quality and the potential of the tasks employed during the same lesson, accounting for the influence of teachers’ subject-specific competencies.

These considerations are essential to validating interpretations of the relations between the aforementioned constructs. Theoretical work suggests such relations; however, finding empirical evidence may be difficult due to the diversity of the constructs and their measurement, as well as the generally high complexity of teaching and learning processes (Doyle, 1986). In addition, the aim is to contribute to the discussion about the role of tasks for the quality of mathematics teaching. The analysis of the present data as well as the interpretation of the results are therefore performed in the light of the following research questions:

2.4.1 (RQ1) Is task potential an indicator of teaching quality?

If task potential can be regarded as an indicator of teaching quality, the two should correlate at an effect size close to reliability (i.e., moderate to high). In that case, the rational task analysis and the classroom observations would both measure the same underlying concept (teaching quality).

2.4.2 (RQ2) Is task potential an indicator of teacher competence?

As outlined previously, two facets of teacher competence in particular appear relevant with regard to task selection and implementation. Assuming the deliberate selection and implementation of tasks can indicate the teachers’ lesson planning competence (Hammer & Ufer, 2023), they may correlate with their mathematical content knowledge (MCK) and/or mathematical pedagogical content knowledge (MPCK), whereas high correlations to the teachers’ skills of mathematical perception, interpretation, and decision-making (MPID) would suggest the interpretation of task potential as an indicator for teacher noticing and disciplined improvisation. In case of a positive effect of both task potential and teacher competence on the quality of instruction, there should be no notable interaction effects between the measures of task potential and teacher competence.

2.4.3 (RQ3) Is task potential an independent predictor of teaching quality?

Lower correlations with teaching quality as well as low or no correlations with teacher competence would suggest that task potential is an independent predictor of teaching quality. In particular, the occurrence of interaction effects between teacher competence and the potential of the tasks would point in this direction.

3 Methods

For detailed information on the study design as well as the teacher competence and teaching quality measures (e.g., item examples, scaling), we refer readers to the electronic supplementary materials in Blömeke et al. (2022). In addition, an appendix to this paper provides further explanations on the classification instrument that was used to conduct the rational task analysis (see Appendix A).

3.1 Study design and sample

Data were collected within the study TEDS-Validate, which is a successor of the international Teacher Education and Development Study in Mathematics (TEDS-M, Tatto et al., 2008). TEDS-M focused on both the training and the competencies of teachers in different countries, while various national follow-up studies based on the design and key findings of TEDS-M included further constructs from research on the teaching and learning of mathematics. As one such example, the study TEDS-Validate was carried out in the German federal states of Hesse, Saxony, and Thuringia, aiming to investigate the presumed relations between teachers’ professional competencies, teaching quality, and students’ learning gains. A total of 113 in-service teachers participated in the study, including a web-based evaluation of their dispositions (both cognitive and affective-motivational) as well as situation-specific skills (Kaiser & König, 2020). During this session, teachers were allowed to pause the test. In addition, lesson observations were carried out in which two raters assessed the quality of instruction in vivo across different dimensions (Schlesinger & Jentsch, 2018). These data were subsequently linked to the students’ learning gains (for the results and discussion of the full effect chain, see Blömeke et al., 2022).

Within the context of this study, we sampled 2490 tasks from 31 teachers (grades 5 to 10, median = 30 tasks per lesson), each teaching their own mathematics class. 63% of the teachers were female, and half of the participants taught at a school from the academic track. Teachers’ median age was 46 years, ranging from 26 to 64. On average, teachers had 17 years of on-the-job experience (ranging from one to 40 years). Regarding data collection, tasks were collected within two double lessons per teacher (mostly 2 × 90 min). The teaching quality was also assessed during these same two lessons. As the teachers formed a convenience sample, self-selection bias is likely, but also almost unavoidable for studies of this type.

In order to subsequently link the task potential with the characteristics of the teachers, it is necessary to presume a certain degree of freedom on the part of the teachers when selecting the tasks. As the lesson planning processes were not evaluated in the course of the study, it cannot be ruled out that participating teachers may, for instance, have relied heavily on a particular textbook or planned the lessons with colleagues and thus also drawn on those colleagues’ knowledge and beliefs. However, in Germany, there are generally no guidelines for schools and teachers with regard to the selection of certain textbooks or materials. Ultimately, all teachers were able to choose the different tasks for their lessons freely and independently. We therefore argue that we can indeed establish a connection between the tasks’ potential and the teachers’ competencies.

3.2 Measures

3.2.1 Task potential

Following prior work on mathematical tasks, we understand them as prompts asking the students to create a product as the result of dealing with a specified mathematical situation. In studying tasks and their potential, one often neglected issue is dealing with different lengths of tasks (Stein et al., 1996). In the context of this study, different (sub-)tasks are distinguished according to the underlying definition if the products to be created or the corresponding mathematical situations differ sufficiently from one another, resulting in a comparatively fine-grained perspective on classroom activities. As the solution paths available to the students at a certain grade level are understood to be inextricably linked to the tasks (Doyle, 1983), they as well are part of the respective units of analysis.

In order to assess the potential of the tasks, a rational task analysis (see Resnick, 1975) was conducted using the classification instrument developed in the context of the study TEDS-Validate (Adleff et al., 2023). Building on prior work for each of the criteria, the tasks were examined with regard to the following aspects (for details, see Appendix A):

  • Interconnectedness (IC) of different content areas stemming from the German national curricula for secondary mathematics education (numbers, measurement, space and shape, functions and algebra, and data and probability, see KMK, 2004),

  • their potential for fostering general, process-oriented mathematical competencies (see Niss, 2015; Niss & Højgaard, 2019; Turner et al., 2015), namely modelling (MM), problem solving (PS), reasoning and argument (R&A), use of representations (REP), use of symbols and operations (S&O), and communication (COM),

  • and the predominant knowledge facet (KF; factual, procedural, conceptual, or metacognitive) and cognitive activity (CA; remember/reproduce—understand/apply—analyze/evaluate—create) required to solve the task, both adapted from Anderson and Krathwohl (2001).

For the measure of interconnectedness, the content areas encountered in the task were counted, whereas the other criteria were assessed by trained raters on four-point scales. At least 20% of the sample were double-coded with an overall satisfactory to very good agreement for the different categories (0.66 ≤ Cohen’s κ ≤ 0.97, for further details, see Appendix A).
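The agreement statistic reported above can be illustrated with a minimal sketch of Cohen’s κ for two raters. The sketch assumes the unweighted variant of κ and a four-point scale coded 0 to 3; both are assumptions made for illustration, as the coding details are documented in Appendix A rather than here. The function name and the example codes are hypothetical and are not taken from the study data.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two raters coding the same tasks."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("Both raters must code the same non-empty set of tasks.")
    n = len(rater_a)
    # Observed agreement: share of tasks that received identical codes.
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement, based on each rater's marginal code frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_chance = sum(freq_a[code] * freq_b[code] for code in freq_a) / n**2
    if p_chance == 1.0:
        raise ValueError("Kappa is undefined when chance agreement equals 1.")
    return (p_observed - p_chance) / (1 - p_chance)


# Hypothetical codes (0-3) for ten double-coded tasks in one category.
codes_rater_1 = [0, 1, 1, 2, 3, 0, 2, 1, 0, 3]
codes_rater_2 = [0, 1, 2, 2, 3, 0, 2, 1, 1, 3]
kappa = cohens_kappa(codes_rater_1, codes_rater_2)
print(f"Cohen's kappa = {kappa:.2f}")
```

Because κ corrects the raw agreement rate for the agreement expected by chance, it is lower than the simple share of identically coded tasks whenever some codes are used much more often than others.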

To obtain an overall measure of task potential for each lesson, the maximum value attained by at least one task implemented during the lesson was determined for each of the task characteristics. In contrast to other aggregate measures such as absolute and relative frequencies, this approach prevented any distorting influence of the different numbers of tasks per lesson. Subsequently, a threshold value distinguishing tasks of high and low potential was defined for each of the categories in such a way that the two groups of lessons resulting from the division based on the occurrence of high-potential tasks were largely equal in size. That is, if at least one of the tasks discussed in a lesson scored at or beyond the designated threshold, the lesson was assigned a high potential in that category (1: high potential, lesson contains at least one task scoring at or beyond the threshold; 0: low potential, lesson contains only tasks that do not reach the threshold). This procedure facilitates the statistical analysis and ensures comparability between lessons with different numbers of tasks.
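The aggregation described in this section can be sketched as follows for a single task characteristic. The sketch takes the maximum task score per lesson, picks the cut-off that yields the most evenly sized groups of lessons, and then assigns the binary lesson score. The rule used to pick the cut-off is one possible reading of "largely equal in size" and is an assumption, as are the function names, the 0 to 3 coding, and the example data.

```python
def lesson_maxima(task_scores_by_lesson):
    """Highest score reached by any task in each lesson."""
    return {lesson: max(scores) for lesson, scores in task_scores_by_lesson.items()}


def choose_threshold(maxima, candidates):
    """Cut-off that splits the lessons into the most equally sized groups."""
    n_lessons = len(maxima)

    def imbalance(threshold):
        n_high = sum(value >= threshold for value in maxima.values())
        return abs(n_high - (n_lessons - n_high))

    return min(candidates, key=imbalance)


def dichotomize(maxima, threshold):
    """1 = at least one task at or beyond the threshold, 0 = none."""
    return {lesson: int(value >= threshold) for lesson, value in maxima.items()}


# Hypothetical task scores (0-3) for one characteristic in four lessons.
scores = {
    "lesson_1": [0, 1, 1],
    "lesson_2": [0, 0, 3, 1],
    "lesson_3": [2],
    "lesson_4": [0, 0, 0, 0, 1],
}
maxima = lesson_maxima(scores)
threshold = choose_threshold(maxima, candidates=(1, 2, 3))
high_potential = dichotomize(maxima, threshold)
print(threshold, high_potential)
```

Note that the lesson with five tasks and the lesson with a single task are treated alike: only the highest-scoring task matters, which is why the number of tasks per lesson does not distort the measure.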

3.2.2 Teaching quality

Teaching quality was assessed via a standardized observation instrument employing high-inference in vivo (live) rating with trained observers (Schlesinger et al., 2018). The instrument covered the Three Basic Dimensions (Praetorius et al., 2018), that is, classroom management (three items, e.g., “time on task”), student support (four items, e.g., “dealing with heterogeneity”), and cognitive activation (seven items, e.g., “challenging questions and problems”), as well as more subject-specific aspects of teaching quality (mathematics educational structuring, seven items, e.g., “mathematical correctness”; details in Schlesinger et al., 2018). All items were typically rated four times per double lesson (i.e., every 22.5 minutes) on four-point rating scales. Given the small sample size in our study, these four dimensions were aggregated to a weighted composite score as demonstrated by Blömeke et al. (2022). However, additional correlation analysis (see Appendix B) shows that the reported findings for the composite score are largely equivalent to those of mathematics educational structuring.

3.2.3 Teacher competence

Mathematics pedagogical content knowledge (MPCK) was assessed with the original test constructed for TEDS-M (Tatto et al., 2012). The test consists of 28 items that are mainly presented in a multiple-choice format and cover both curricular knowledge and knowledge on how to teach mathematics (Blömeke et al., 2010). Rasch analysis led to one-dimensional MPCK test scores with satisfactory internal consistency (Cronbach’s α = 0.79). Teachers’ skills in perception, interpretation, and decision-making were captured with a video-based instrument that was developed within a follow-up study of TEDS-M and involved three short scripted video clips (video length range 2.5 to 4 min, see Kaiser et al., 2015, for more information). The clips showed typical classroom situations in grades 8–10, and teachers were asked to perceive (P), interpret (I), and make decisions (D) in regard to the situations in the video clips. The instrument involved both a mathematics-didactical test (MPID) and a pedagogical test (PPID). Blömeke and colleagues (2022) were able to find an effect specifically of MPID on both teaching quality and student achievement in the data from TEDS-Validate and a previous study. Furthermore, recognizing and adapting the potential of tasks for teaching and learning is highly dependent on the respective subject and content. Thus, we only use the MPID results (32 items, of which approx. 50% each are multiple choice and constructed response). Rasch analysis was applied, which resulted in satisfactory reliability (α = 0.72) for a one-dimensional MPID test score. Yang et al. (2019) discuss first evidence for two-dimensional scores, in which P items are empirically separated from I and D items.

3.3 Statistical analysis

First, we present basic descriptives for all categories of task potential. Second, bivariate correlations for all variables in this study are calculated. Third, we perform multiple regression analyses, in which teaching quality is regressed on task potential and teacher competence, with separate models for MPCK and MPID. In doing so, we include both main and interaction effects and take the nested structure of our data (two lessons per teacher) into account by estimating clustered standard errors. We apply Cohen’s classification (Cohen, 1992), in which correlations with absolute values of at least 0.30 are regarded as moderate effects, and absolute values of at least 0.50 are considered large. Similarly, standardized mean differences (Cohen’s d) > 0.50 are considered moderate, and those > 0.80 are regarded as large effects. Note that because the variables are standardized, regression coefficients can be directly interpreted as effect sizes. All analyses presented in this paper were performed in IBM SPSS 29 involving the Complex Samples software.
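To make the second step and the classification rule concrete, the following sketch computes a Pearson correlation between a binary lesson-level indicator of task potential and a teaching quality score, and labels the result using the cut-offs stated above. It is an illustration only: the analyses in this paper were run in SPSS, the function names and the data are hypothetical, and the label "below moderate" is our shorthand for values under 0.30.

```python
from math import sqrt
from statistics import mean


def pearson_r(x, y):
    """Pearson correlation; with a binary x this is the point-biserial r."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("Need two equally long sequences with at least 2 values.")
    mean_x, mean_y = mean(x), mean(y)
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    if s_xx == 0 or s_yy == 0:
        raise ValueError("Correlation is undefined for a constant variable.")
    return s_xy / sqrt(s_xx * s_yy)


def classify_correlation(r):
    """Label |r| using the cut-offs applied in this paper (0.30 and 0.50)."""
    size = abs(r)
    if size >= 0.50:
        return "large"
    if size >= 0.30:
        return "moderate"
    return "below moderate"


# Hypothetical data: 1 = lesson contains a high-potential task, 0 = it does not.
high_potential = [1, 1, 1, 0, 0, 0]
teaching_quality = [3.0, 2.5, 3.5, 2.0, 3.0, 2.5]
r = pearson_r(high_potential, teaching_quality)
print(f"r = {r:.2f} ({classify_correlation(r)})")
```

The sign of the correlation is ignored when assigning the label, so a negative association of the same magnitude receives the same classification.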

4 Results

4.1 Occurrence of high-potential tasks

Table 1 presents how high-potential tasks are distributed across lessons. We see that our data are zero-inflated, such that the number of lessons with no occurrence of a high-potential task ranges between about a third (S&O) and more than two-thirds (IC, COM, KF) of the total number of lessons. Thus, we collapsed our count measure to a binary score for further analysis, where – for each of the task characteristics in turn – lessons employing high-potential tasks (HP) are compared to those in which teachers do not make use of high-potential tasks (LP).

Table 1 Occurrences of high-potential tasks (n = 60 lessons)

4.2 Bivariate correlations

Table 2 presents bivariate correlations between measures of teaching quality, teacher competence, and task potential. We see that teaching quality is associated moderately positively with IC, PS, and REP, but also negatively with R&A and S&O. This suggests that lessons employing at least one high-potential task regarding IC, PS, and REP were also rated higher on teaching quality. MPCK is not related to task potential for most indicators of task potential. However, there are moderate positive effects for PS and CA, as well as a small effect for COM, which indicate that lessons emphasizing problem-solving activities, more challenging cognitive activities, and mathematical communication were taught by teachers with higher levels of MPCK. For teachers’ situation-specific skills (MPID) we find very similar results, in the sense that the same measures are connected to higher levels of MPID. Here, we observe another substantive positive correlation for IC. In contrast, MM, R&A, and S&O are only marginally related to teaching quality or teachers’ competence, including negative associations with teaching quality for R&A and S&O. This suggests that particularly MM and COM do not act as indicators or predictors of teaching quality in our study. However, although we do observe several positive associations with teaching quality or teacher competence, these are usually of moderate effect size at best (see, e.g., the correlation between MPCK and MPID in Table 2).

In regard to intercorrelations of task characteristics, we find that IC and REP, as well as PS and CA, are closely related. This means that in lessons in which representations of mathematical objects are promoted, teachers also focus on the connection between various mathematical ideas. At the same time, problem-solving activities and higher-order cognitive activities are associated to a large extent. Finally, we also note that there is a moderate correlation between the occurrence of tasks with a high potential for mathematical modelling and those that require a high amount of mathematical communication in class, as well as a moderate correlation between KF and CA.

Table 2 Bivariate correlations (n = 60 lessons)

As the results for the teaching quality composite score could be difficult to interpret, we provide more detailed correlation analysis for the original four dimensions used in TEDS-Validate (Appendix B). This shows that results for teaching quality presented here reflect in particular those of mathematics educational structuring, rather than any of the other three teaching quality dimensions (i.e., the Three Basic Dimensions). This suggests that the composite score is more likely to reflect subject-specific aspects of teaching quality. The correlations with the other dimensions, in contrast, are largely of marginal effect sizes.

4.3 Multiple linear regression

Tables 3 and 4 present the results from multiple linear regressions, where teaching quality has been regressed on task potential and MPCK or MPID, respectively. We have estimated a separate model for each measure of task potential and also include the interaction between teacher competence and task potential. We note that (1) when adjusting for task potential, there is no effect of teacher competence on teaching quality and (2) we observe no interaction effects. This means that teacher competence and task potential add to teaching quality independently. With regard to task potential, both sets of models provide similar findings. They show that IC, PS, and REP are associated with higher teaching quality after adjusting for teacher competence. As a final note, we see that the association between MPID and teaching quality drops considerably in those models where IC, PS, and REP are included (compared to the bivariate associations in Table 2). This suggests that the effect of MPID on teaching quality could be mediated by the corresponding categories of task potential.
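For orientation, the structure of each of these models can be written out as follows. The notation is ours and is introduced only for illustration: TQ denotes the teaching quality composite for lesson i of teacher j, TP the binary indicator for one task characteristic in that lesson, and TC the teacher's competence score (MPCK in Table 3, MPID in Table 4).

```latex
\[
\mathrm{TQ}_{ij} \;=\; \beta_0
  \;+\; \beta_1\,\mathrm{TP}_{ij}
  \;+\; \beta_2\,\mathrm{TC}_{j}
  \;+\; \beta_3\,\bigl(\mathrm{TP}_{ij}\times\mathrm{TC}_{j}\bigr)
  \;+\; \varepsilon_{ij}
\]
```

In this notation, finding (1) corresponds to an estimate of \beta_2 close to zero and finding (2) to an estimate of \beta_3 close to zero, while the reported effects of IC, PS, and REP correspond to \beta_1. Clustering by teacher affects only the standard errors of these coefficients, not the coefficients themselves.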

Table 3 Multiple linear regression of teaching quality on MPCK and task potential (n = 60 lessons)
Table 4 Multiple linear regression of teaching quality on MPID and task potential (n = 60 lessons)

5 Discussion

5.1 Task potential in relation to teaching quality and teacher competence

The present study aims to contribute to an understanding of the interplay between teacher competence, task characteristics, and teaching quality in mathematics education. The focus is on the role of tasks, specifically on examining whether tasks are (RQ1) an indicator of teaching quality, (RQ2) an indicator of teacher competence, or (RQ3) an independent predictor of teaching quality. To this end, we jointly analyze three sources of data from the study TEDS-Validate, namely the participating teachers’ scores for MPCK and MPID, classroom observations as measures of teaching quality, as well as a rational task analysis of all tasks implemented throughout the respective lessons.

With regard to the facets of teacher competence considered in the analysis, we found no direct relations between MPCK and teaching quality and only weak relations between MPID and teaching quality in our sample, which reflects trends seen in prior research on the effects of teacher knowledge and skills (Blömeke et al., 2022; Kersting et al., 2012). What is more, there is no effect for either MPCK or MPID on teaching quality when adjusting for different aspects of task potential. Considering the different aspects of task potential included in the present study, bivariate correlations suggest that teaching quality is mostly related to interconnectedness of content, problem solving, and use of representations, whereas teachers with higher scores for MPCK and MPID seemingly used more tasks with a focus on problem solving and higher-order cognitive activities.

As outlined in the previous sections of the paper, if task potential is indeed to be seen as a direct indicator of the quality of instruction (RQ1), high correlations would be expected between the two measures. However, for the sample from the TEDS-Validate study, we found varying but mostly low or even negative correlations between teaching quality and the different aspects of task potential. The highest correlations with the quality of instruction (interconnectedness, problem solving, and use of representations) are still below 0.50, suggesting that task potential and teaching quality as operationalized for the purpose of this study are unlikely to be representations of the same construct. This conclusion is in line with the insights provided by Joyce et al. (2018), who found only small correlations between the quality of assignments and teaching quality. Regarding the negative correlations, it should be noted that reasoning and argumentation occurred very seldom across the entire data set (Adleff et al., 2023) and thus the negative correlation may be a statistical artifact. The negative correlations with the use of symbols and operations suggest that tasks involving complex symbolic language and procedures do not necessarily contribute to a high rating of teaching quality on the observation instrument used in this study. Furthermore, the negative correlations may stem from negative correlations with other indicators of task quality, which in turn are positively correlated with teaching quality, namely interconnectedness and use of representations.

Regarding the question of task potential being an indicator of teacher competence (RQ2), we similarly found low to non-existent correlations between the measures of task potential and both MPCK and MPID. Despite the absence of interaction effects between task potential and teacher competencies, it cannot be ruled out that interaction effects remain undetected due to the small effect sizes and the relatively small sample size at the lesson level. In contrast to other studies (Baumert et al., 2010), the sample examined in this paper showed weak or no correlation between teachers’ competencies and the other constructs under consideration. Thus, no substantiated conclusions can be drawn about whether and to what extent aspects of task potential are more closely attributable to MPCK (in the sense of lesson planning competence, see Cevikbas et al., 2023) or to MPID (in the sense of disciplined improvisation, see Hatch & Clark, 2021).

Overall, our analysis suggests that task potential should be regarded as an independent construct (RQ3) in line with the framework of the socio-didactical tetrahedron (Rezat & Sträßer, 2012). Upon detailed examination of the different aspects of task potential, there appears to be a rather fragmented picture. Based on the present data, interconnectedness, problem solving, and the use of representations can be understood as predictors of teaching quality, as evidenced by up to moderate effect sizes in both the bivariate correlations and the multiple linear regressions. The fact that the small effect of MPID on teaching quality disappears in the models including any of these three task characteristics further suggests that they act as mediators between teacher competence and teaching quality.

While the other task characteristics exhibit some correlations among them, and in case of knowledge facet and cognitive activities also with teachers’ knowledge and skills, no correlations with teaching quality were found. One possible explanation in this context is that some of the considered characteristics indeed have no influence on the quality of instruction in the respective lesson. In the case of mathematical modelling, for instance, one might argue that mathematics lessons can be equally perceived as high-quality both in the realm of pure mathematics as well as by incorporating real-world context. Furthermore, references to the real world may be more obvious for some mathematical content areas than for others. It is also possible, however, that the categories in question shed light on aspects of teaching quality that are more difficult to assess by means of classroom observations in general and the observation protocol used in this study in particular. One explanation for the knowledge facet and the cognitive activities may be that both are associated more with the deep structure of teaching, whereas surface structures are often easier to gather in classroom observations (Oser & Baeriswyl, 2001; Schlesinger et al., 2018).

5.2 Limitations

In light of the implications of the results presented above, it is essential to discuss the theoretical and methodological limitations of the study. As mentioned before, we draw on a convenience sample of teachers who volunteered to participate in the time-consuming data collection consisting of both the competence tests and the lesson observations, suggesting a possible positive self-selection bias (Blömeke et al., 2022). Furthermore, while a large task sample could be collected, the nested structure as well as the need to aggregate the data resulted in a relatively small sample size at the lesson level. This, in combination with the weak correlations found between the measures of teacher competence and teaching quality, reduces the chance of detecting potential interaction effects.

Another limitation concerning the interpretation of the results resides in the underlying measures and instruments. While well-established measures of both teacher knowledge (Blömeke et al., 2010; Tatto et al., 2012) and situation-specific skills (Kaiser et al., 2015) were used, methodological challenges arise particularly for teaching quality and task potential, which will be briefly discussed below. There has been an active debate about the what (e.g., good vs. effective teaching, Berliner, 2005; teacher quality vs. teaching quality, Bellens et al., 2019; Seidel & Shavelson, 2007; cultural differences, Fischer et al., 2019) and the how (e.g., different perspectives, Bellens et al., 2019; Fauth et al., 2020; different instruments, Charalambous & Praetorius, 2018) of measuring instructional quality, especially in the last two decades. The present study uses data from live ratings with an instrument that includes both generic and subject-specific aspects (Schlesinger et al., 2018). All of this entails specific implicit assumptions and corresponding blind spots and it is reasonable to expect that a different underlying framework of teaching quality may lead to different results (Praetorius & Charalambous, 2023). In this case, for instance, the limited visual and auditory range of the live observers in the classroom allowed only for reduced perception of students’ individual and group work as crucial phases for the realization of task potential, which may have led to a lower fit between the analyses of the tasks and the instruction as a whole.

Regarding the analysis of task potential, the model by Stein et al. (1996) illustrates how the object of analysis is transformed during the process of task implementation. Therefore, by conducting a rational task analysis based solely on the task as set by the teacher and its inherent potential, we cannot account for the co-constructive implementation of the tasks by teachers and students during instruction. However, studies have shown that the intended potential of the tasks is rarely realized as such in the classroom context (Boston & Smith, 2011; Stein et al., 1996; Sullivan et al., 2009) and it is suggested that including, for example, student work as a realization of task potential in the classroom could lead to a more accurate prediction of teaching quality (see Joyce et al., 2018). Thus, while the use of tasks as an indicator of teaching quality brings many advantages over, for example, lesson observations in terms of minimizing effort and intervention in teaching practice, getting as close as possible to the actual implementation of the tasks when measuring their quality (e.g., by explicitly including teachers’ and students’ voices) may be necessary to ensure that the results can be interpreted in the intended way. In further studies, it could also be beneficial to look in isolation at either the process of task selection and adaptation by monitoring the teacher’s lesson preparation, or at the way the tasks are implemented in the classroom by setting the same or comparable tasks for all the lessons studied.

5.3 Directions for future research

Looking back on the overall aim of the paper to investigate the role of task potential in relation to teaching quality and teacher competence, the first important insight emerging from the analyses is that the different characteristics foundational to the assumed construct of task potential do not seem to play a uniform role. While fostering mathematical competencies, constructing inner-mathematical connections, and promoting higher-order thinking skills all represent important goals of mathematics education, their occurrence in the tasks used throughout the lesson does not predict the quality of instruction in the same way. Similarly, not all criteria are related to either the teachers’ knowledge or situation-specific skills. It is clear that more research is needed on the way different characteristics of tasks interact with one another as well as with other factors in models of educational effectiveness research. In future analyses, it may be interesting to distinguish aspects of task potential as a normative construct, possibly related more closely to teacher competence, from those aspects forming an outcome-oriented construct of task potential associated with higher teaching quality, similar to the notion of (normatively) good and effective teaching (Berliner, 2005).

While in this study, for instance, the relatively small effects of task characteristics on teaching quality may be due to the focus on the inherent potential of the task as set up by the teacher (Joyce et al., 2018; Stein et al., 1996), another possible explanation could lie in the holistic view of teaching quality in this study. Herbert and Schweig (2021) have shown their newly developed construct of the lesson artifacts’ potential for cognitive activation (including tasks, but also, for instance, lesson plans and visual materials) to be moderately related to video-based assessment of the lessons’ potential for cognitive activation. This suggests that task potential may be better suited to predicting some dimensions of teaching quality than others, which could be another interesting strand for further research. It seems worthwhile to understand the potential of tasks as a separate perspective on teaching quality that can complement rather than replace other perspectives, such as those of students, teachers, and observers, in order to paint a more comprehensive picture of the complex reality of teaching. Further research is necessary to identify how the different perspectives can be aligned in a beneficial way.

Finally, the analyses in this paper could not clearly illuminate the relationships between teacher competencies and task potential. It is well established that the selection of tasks cannot be attributed only to teachers’ knowledge, skills, and beliefs, but also mirrors curricula and thus the schools’ and states’ perspective on high-quality mathematics education on the one hand, and the broader teaching context and available resources on the other (Joyce et al., 2018; Remillard, 1999). Further research is therefore needed, both on the selection of high-potential tasks – including, beyond teachers’ knowledge and skills, other variables at the state, school, or classroom level to better understand the factors that determine which tasks find their way into mathematics lessons – and on the way teachers adapt given tasks to fit their goals for specific learning groups. In this regard, research and teacher professional development should focus in particular on those characteristics of tasks that are related to the quality of teaching (in the case of this study, for instance, the interconnectedness of different mathematical content and the opportunities for problem solving) and to the desired student outcomes.