Introduction

The use of number lines as a representation of numbers has been recommended as an effective tool for students to engage with and develop mathematical ideas (Education Endowment Foundation, 2017; 2020). There is strong evidence for its use with low-achieving students, as number lines allow for the representation of all real numbers, supporting the acquisition of basic to advanced mathematical concepts (Fuchs et al., 2021). Number lines can also support the acquisition of mathematical operations and are used in contexts such as temperature, time, and graphs. In the last few decades, numerous studies within the fields of psychology and education have tapped children’s mental representation of numbers using the number line estimation task. Performance on the number line task has been associated with mathematical reasoning concurrently and longitudinally (Ellis et al., 2021). However, there is increasing evidence that the association between some mathematical skills and mathematical reasoning might differ based on children’s mathematical reasoning ability (Fuchs et al., 2020; Hajovsky et al., 2020). These studies have explored the association of more complex mathematical skills (e.g., calculation fluency) with mathematical reasoning, and have found differing effects. While there is evidence that supports a differential association between some basic numerical skills (e.g., approximate number system, counting, number relations) and performance in more complex mathematical tasks (Bartelet et al., 2014; Devlin et al., 2022; Purpura & Logan, 2015), this has not been studied with mathematical reasoning as an outcome. Thus, it seems possible that a more basic numerical skill, such as number line estimation, could have a differential relationship with a complex mathematical skill such as mathematical reasoning.
Understanding individual differences in this association is relevant for several reasons: i) it can aid in clarifying the role of number line estimation in mathematical reasoning, ii) it can help better understand the processes involved in mathematical reasoning, and iii) it can guide the targeted use of educational interventions. Targeting interventions in this way safeguards the appropriate allocation of limited economic and human resources by maximizing the cost–benefit relationship of educational interventions. Considering that curricula worldwide include mathematical reasoning as a main component of mathematics learning (Australian Curriculum, Assessment and Reporting Authority, 2018; Department for Education, 2014; Ministry of Education, 2021; National Council for Teachers of Mathematics, 2000), this study aims to determine if accuracy on the number line task (0–10 and 0–100) differentially predicts concurrent mathematical reasoning across the spectrum of mathematical reasoning performance.

Mathematical reasoning

Mathematical reasoning is emphasized as being a critical feature of mathematics learning (Herbert & Williams, 2023; Lindmeier et al., 2018; Olsson & Granberg, 2022), and it is a key concept in curricula worldwide (Australian Curriculum, Assessment and Reporting Authority, 2018; Department for Education, 2014; Ministry of Education, 2021; National Council for Teachers of Mathematics, 2000). This construct has been defined by the OECD (2018) as involving “evaluating situations, selecting strategies, drawing logical conclusions, developing and describing solutions, and recognizing how those solutions can be applied” (p.14). A variety of frameworks have been proposed (e.g., Ball & Bass, 2003; Jeannotte & Kieran, 2017; Lithner, 2008; Yackel & Hanna, 2003), highlighting the variability of interpretations of this concept. Lindmeier et al. (2018) suggest that mathematical reasoning, at its most basic, can be considered as “reasoning in mathematical situations” (p. 315). The authors propose that even though mathematical reasoning is usually conceptualized as requiring the use of complex knowledge and processes, there is still a form of mathematical reasoning that can take place before children are formally taught mathematics. An example of this is provided by Sumpter (2014), who analyses examples of individual and collective mathematical reasoning in the preschool years. Though the term mathematical reasoning is conceptualised in many different tasks, it is usually studied in the context of problem solving. This also applies to the current study. Among other tasks, children had to recognize the total number of items that result from combining two sets, or from taking away items from a set, and to understand information presented in graphical format to answer questions. Mathematical reasoning has been shown to longitudinally predict performance on high-stakes mathematics assessments (Nunes et al., 2007), above and beyond arithmetic skills (Nunes et al., 2012).

The number line estimation task

Number line estimation is usually operationalized by number line tasks. Several variations of this task have been proposed. In the research literature, and in educational settings, the most frequently used version of the task is the ‘number to position bounded’ task with Arabic numbers. In this task, a horizontal line is presented with labeled endpoints (bounded) which indicate the range of the number line (e.g., 0–10, 0–100, 0–1000). Participants are presented with a series of Arabic numbers which they place on the line based on their value (number to position). Differences have been found between versions of the task regarding what they index (Reinert et al., 2019) and how they relate to mathematical achievement (Schneider et al., 2018). The focus of this study is on the number to position, bounded number line task with Arabic numbers as it is the most frequently used version of the task.

Various explanations regarding what underlies performance on the number line estimation task have been put forward (see Dackermann et al., 2015; Yuan et al., 2020; for an extensive overview). An analysis of these theoretical explanations is beyond the scope of this study. However, considering what the number line task represents is important to interpret its relationship to other mathematical skills. Initial proposals suggested that the number line estimation task mainly indexes the mental representation of magnitude (Siegler et al., 2011). According to this account, performance on the number line reflects the way numbers are represented mentally. An alternative account advocates that performance on the bounded number line task indexes the use of proportion judgment skills (Cohen & Blanc-Goldhammer, 2011). Based on this account, performance on the number line reflects the strategies used to relate the number to be estimated with the numerical range of the number line. Other accounts have proposed ordinal processing (Podwysocki et al., 2019), familiarity with numbers (Ebersbach et al., 2008, 2015), place value understanding (Moeller et al., 2009) and spatial mapping (Sella et al., 2017) as underlying performance on the number line estimation task. Regardless of what underlies performance on the number line task, its relevance and relationship with other mathematical skills do not seem to be debated.

Number line estimation and its relationship with mathematical reasoning

Several studies have focused on the concurrent and longitudinal relationship between performance on the number line task and other mathematical skills across the lifespan. A meta-analysis conducted by Schneider et al. (2018) found an overall moderate correlation (r = 0.443) between number line estimation and mathematical achievement (i.e., combination of all mathematical outcomes included in the meta-analysis). This association was shown to differ by age, and to be stronger for later than concurrent mathematical achievement. Neither the numerical range of the number line nor the type of measure used to index mathematical achievement was a significant moderator of the relationship between number line estimation and mathematical achievement. However, this study did not specifically differentiate mathematical reasoning as an outcome measure.

Ellis et al. (2021) replicated this study based on seven samples, differentiating mathematical reasoning as an outcome measure. The authors found an overall relationship between number line estimation and mathematical reasoning (r = -0.36), which was moderated by age and number range. Additionally, Zhu et al. (2017) found that performance on the 0–100 number line task predicted 12% (β = -0.370) and 9% (β = -0.318) of unique variance of mathematical reasoning, in grades 2 and 4, respectively, after accounting for age, non-verbal reasoning, attention and working memory. Similarly, Ouyang et al. (2021) found that performance on a similar number line task (0–100) during the second preschool year was predictive of mathematical reasoning in the third year of preschool (β = -0.250), even after accounting for the effect of several covariates (β = -0.275) (but see Clarke et al., 2020 for different results). These studies provide evidence of a relationship between number line estimation and mathematical reasoning. However, it is relevant to further analyse if and how the number range on the number line task moderates its relationship with mathematical reasoning. Importantly, none of these studies have examined whether the relationship between number line estimation and mathematical reasoning might differ depending on children’s mathematical reasoning ability, even though previous evidence suggests this is the case for other early developing numerical abilities.

The effects of number range on the relationship between number line estimation and mathematical reasoning

Ellis et al. (2021) found that the relationship between number line estimation and mathematical reasoning held across different number ranges (i.e., r = -0.40, r = -0.51, r = -0.55 for the 0–20, 0–100 and 0–1000 number ranges, respectively), though they found that the relationship with mathematical reasoning was significantly stronger for the 0–1000 compared to the 0–20 number range. The authors explain that these findings could arise from limitations of the included samples that led to confounds between number range and presentation medium, as within the included studies different number ranges were presented in different formats (i.e., 0–20 in paper and pencil, while 0–100 and 0–1000 on a computer). However, they also suggest that if some number ranges become too simple for children to estimate, this might decrease the relationship between performance on the number line estimation task and mathematical reasoning. Additionally, as proposed by Moeller et al. (2009) and Ebersbach et al. (2008, 2015), children might engage differently with the number line estimation task depending on its number range. This could lead to a difference in the mathematical skills being used to perform the number line task. It is possible that children might rely on different mathematical skills when dealing with highly known as compared to lesser-known number ranges. For example, children may use ordinality skills to solve the number line task with a highly known number range (e.g., 0–10), while more complex skills such as place value understanding or proportional reasoning might be needed to accurately solve the number line task with lesser-known number ranges (e.g., 0–100). However, it is also possible that other differences might depend on the characteristics of the number range itself. For example, place value understanding might become more relevant within the 0–100 or 0–1000 number ranges, as compared to the 0–10 number range.
Finally, it is also possible that as children’s mathematical skillset diversifies, the strategies they choose to use to solve the number line task change. Though most of the explanations proposed regarding the skills underlying performance on the number line estimation task somewhat account for changes in proficiency throughout time (i.e., level of linearity of the mental representation of magnitude, number of benchmarks used for estimation), limited attention has been paid to the possibility that different number ranges in the number line estimation task might be reflecting different underlying mathematical skills, and that these differences might be further impacted by a variety of factors such as age, familiarity with numbers, and other mathematical skills.

The effects of the level of mathematical reasoning on the relationship between number line estimation and mathematical reasoning

Previous findings indicate that the predictive capacity of some basic numerical skills to explain mathematical skills might vary across differing levels of performance. For example, Bartelet et al. (2014) found that counting only significantly predicted fact retrieval at the 84th percentile, though differences between the 16th, 50th and 84th quantiles were not statistically significant. Devlin et al. (2022) studied the capacity of different subdomains of early numeracy (i.e., numbers, number relations and number operations) in pre-kindergarten, kindergarten and first grade to predict mathematical achievement (i.e., an assessment battery tapping several mathematical skills) a year later. They found that number operations were significantly predictive of mathematical achievement at the 50th and 80th quantiles, but not the 20th quantile, for all three grades. Number relations significantly predicted mathematical achievement across all quantiles assessed for the pre-kindergarten cohort, but it was only significantly predictive at the 20th quantile for the kindergarten and first grade cohorts. Finally, numbers significantly predicted mathematical achievement at the 20th and 50th quantiles in pre-kindergarten, at the 50th and 80th quantiles in kindergarten and only at the 50th quantile in first grade. This highlights how different subdomains of early numeracy might differently impact mathematical achievement depending on the children’s level of mathematical achievement, but also on their age. However, statistical comparisons across quantiles are not reported. Furthermore, Li et al. (2018) found that approximate number system performance in fourth grade significantly predicted concurrent strategy choice on a double-digit multiplication task for median and high quantiles (50th, 75th and 85th), but not for lower performance (15th and 25th), though statistical comparisons between quantiles were not reported.
Li et al. (2018) offer several possible interpretations for their findings. They initially suggest that children with poorer strategy choice also show poorer approximate number system performance; this would mean that they have to rely on more general cognitive skills such as executive functions as a compensatory mechanism. Additionally, the authors suggest that this could also be interpreted from the mutual inclusion/exclusion theory. Within this framework they suggest that by mutual inclusion, children with high performance in strategy choice also exhibit high approximate number system performance. High approximate number system performance together with high performance on the central executive (as reflected by their findings) would lead to high levels of problem solving. However, it is unclear based on these interpretations how strategy choice is impacted by the approximate number system. Relatedly, Purpura and Logan (2015) found that approximate number system performance was a significant predictor of future early math competence for the 25th, but not the 50th or 75th percentiles when concurrent early math competence and math vocabulary were accounted for. However, comparisons across quantiles are not reported. The authors suggest that these findings are expected given that lower performance on early math competence would involve solving items with small quantities, for which reliance on the approximate number system would be higher than for more advanced items, which include larger quantities and require making connections between them.

This phenomenon has also been observed in the predictive capacity of some mathematical skills to explain performance on mathematical reasoning. For example, Fuchs et al. (2020) found, with a sample of second graders, that even though calculation fluency was a significant predictor of mathematical reasoning across the performance distribution, its predictive strength was higher for the 80th compared to the 20th and 50th percentile (with there being no significant difference between the 20th and 50th percentiles). The authors propose two possible explanations for these findings. On the one hand, they suggest that children who exhibited weaker mathematical reasoning also probably have limited calculation skills, which could explain a weaker relationship between both skills for the lower and intermediate quantiles as compared to the higher quantile. This explanation resembles the one proposed by Li et al. (2018) in reference to the approximate number system performance not being predictive of strategy choice in multiplication for the lower quantiles. Alternatively, Fuchs et al. (2020) propose that children who showed low and intermediate mathematical reasoning performance tend to approach word problems by adding the numbers presented in the problems, without engaging with mathematical reasoning. This would lead to low accuracy on measures of mathematical reasoning (which in this case include subtraction problems for half of the items), while calculation accuracy might remain high. However, it is relevant to consider that, though not included in the comparison across quantiles, the estimate for calculation fluency was the lowest for the 90th percentile. Without further explanations, this would not be consistent with the authors’ interpretations of their findings.
Relatedly, Hajovsky et al. (2020) found that within a sample of kindergarten through grade 2 children, calculation fluency was negatively related to concurrent mathematical reasoning at the 10th percentile, but positively related at the 50th and 90th percentiles. The authors do not specifically interpret these findings but highlight the implications of accounting for non-linear relationships between cognitive and academic skills when having to prioritise which skills to include in the context of neuropsychological assessments with children. Finally, a similar phenomenon has been shown for the relationship between general cognitive processes and mathematical skills (Dong et al., 2020; Hajovsky et al., 2020), as well as mathematics self-concept and mathematical skills (Susperreguy et al., 2018). To the best of our knowledge, this has not been studied for the relationship between number line estimation and mathematical reasoning.

Performance on the number line estimation task could have a different role across differing levels of mathematical reasoning. The reasons behind this difference and its interpretations would depend on what performance on the number line represents. For example, under the assumption that performance on the number line estimation task mainly reflects the mental representation of magnitudes (Siegler et al., 2011), then one could speculate that the accuracy of the mental representation of magnitudes would possibly have a stronger effect for the initial acquisition of mathematical reasoning skills, but that once a certain minimum level of performance has been acquired, representations of magnitude would become less important. This would entail a stronger relationship between number line estimation and mathematical reasoning for lower than higher levels of mathematical reasoning and would be consistent with previous findings showing that performance of the approximate number system is significantly predictive of early math competence only for lower levels of performance (Purpura & Logan, 2015).

Alternatively, under the assumption that completing the number line task is a proportional judgment task (Cohen & Blanc-Goldhammer, 2011), therefore entailing proportion understanding and the ability to spatially reflect these proportions, then one could have different expectations for the relationship between the number line task and mathematical reasoning. In this case, understanding of proportions between numbers could reflect complex numerical knowledge, and would therefore only become necessary when solving problems that require this complex knowledge. If this were the case, one could expect performance on the number line task to be more strongly related to mathematical reasoning for higher than lower ability levels. This would resemble the findings of Fuchs et al. (2020), in which calculation fluency was more strongly related to mathematical reasoning in the 80th percentile as compared to the 20th or 50th percentile (though this relationship was the lowest for the 90th percentile). Other interpretations would ensue when considering place value understanding, familiarity with numbers, ordinality or spatial mapping as underlying performance on the number line estimation task.

In summary, previous studies support a relationship between the number line estimation task and mathematical reasoning. However, determining if, and how, the relationship between performance on the number line task and mathematical reasoning differs across levels of performance is key. This could allow us to better understand the role that number line estimation plays in the development of mathematical reasoning, as well as to create differentiated and targeted educational interventions if required.

The present study

This study has two aims. The first aim is to replicate the findings from previous studies that show that higher accuracy on the number line estimation task is significantly associated with higher levels of mathematical reasoning. We use linear regression to fulfill this aim. Linear regression assumes homogeneity of the association between number line estimation and mathematical reasoning. However, it is possible that there is heterogeneity in whether and how the number line estimation task predicts mathematical reasoning. Therefore, the second aim is exploratory. We want to determine if the relationship between number line estimation and mathematical reasoning holds across several points of the performance distribution. We use a conditional quantile regression approach to fulfill this aim. This approach has the benefit that it estimates the relationship between the predictors of interest (0–10 and 0–100 number line) and the outcome (mathematical reasoning) at different points of the outcome’s distribution (Petscher & Logan, 2014).

Materials and methods

Participants

The initial sample was composed of 347 children in the second year of kindergarten in Singapore (year before starting primary school). This sample represents all children who participated in a cross-sectional study, a smaller cohort of which was followed up as part of a longitudinal study reported elsewhere (Bull et al., 2021). Participants attended a total of 21 preschools which were selected from the government-supported, non-profit, and for-profit providers that are typical of the early childhood service options available to the local population. Due to children’s absence from preschool on testing days, data for one or more variables was missing for a small number of participants. Therefore, data was analyzed for missingness. Little’s (1988) Missing Completely at Random (MCAR) test was not significant, indicating that data was likely missing completely at random (χ2 = 29.986, df = 20, p = 0.166). As a result, listwise deletion was used and 23 observations were deleted from the dataset. The final sample comprised 324 participants (162 girls; mean age = 74.1 months, SD = 4.0).

According to parents’ reports, the sample consisted of 63.0% Chinese children, 21.0% Malay children, 9.3% Indian children and 5.2% children from other ethnicities. Ethnicity for five children was not reported. Comparison with census data indicated this sample had an under-representation of Chinese children (census = 74.2%) and an over-representation of Malay children (census = 13.3%; Department of Statistics Singapore, 2013). Information regarding parents’ highest educational level achieved was reported for 228 children: 8.3% did not have high school qualifications, 17.3% had high school qualifications, 22.2% had post-secondary qualifications and 22.6% had a university undergraduate or postgraduate qualification. Census data reports 31.2% without high school qualification, 18.8% high school qualification, 22.7% post-secondary qualification and 27.3% university qualification (Department of Statistics Singapore, 2013).

Procedure

The study received ethical approval from Nanyang Technological University. Prior to children’s participation, parents provided written consent. All children were tested in English, which is the language of instruction, within their preschools and were given a small gift for their participation. Testing time was approximately 1.5 h divided into two to three sessions, conducted on different days. Task order was randomized, with the restriction that the two number line tasks had to be completed on different days.

Measures

Vocabulary and non-verbal reasoning skills were assessed to control for general cognitive functioning. Non-verbal reasoning was assessed through Raven’s Progressive Matrices test and vocabulary was assessed through the Peabody Picture Vocabulary Test. In both cases, assessments were conducted following standard procedures. Gender was classified as male/female according to parent reports.

General cognitive skills

Vocabulary

The Peabody Picture Vocabulary Test III (Dunn & Dunn, 1997) assesses children’s receptive vocabulary. After listening to a word, the child selects one out of four images that best represent the meaning of the word they heard. The assessment was conducted following standard procedures. Accuracy is indexed as the number of correct responses.

Non-verbal reasoning

Raven’s Standard Progressive Matrices Test (Raven et al., 2003) assesses children’s non-verbal reasoning. Children are presented with a matrix which is missing a section and they must choose which of the options provided corresponds to the missing section. The assessment was conducted following standard procedures. Accuracy is indexed as the number of correct responses.

Mathematical skills

Number Line Estimation (0–10)

Children were presented with a number between one and nine, and a horizontal line with zero at the left and ten at the right. Children were asked to determine where the number should go on the horizontal line. Children initially completed one practice trial (five) to confirm they had understood the task instructions and were then assessed with all numbers between one and nine. Each number was presented twice (Siegler & Booth, 2004), and an average position was calculated. Children received no accuracy-based feedback throughout the task (including practice trials). Accuracy was indexed by percent absolute error: (|estimated number − target number| / scale of number line) × 100. Spearman-Brown split half reliability coefficient for this sample was 0.950. This number range was chosen as it was aligned with curriculum expectations (Ministry of Education, 2013), and would therefore represent a well-known number range.
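The percent absolute error index described above can be illustrated with a minimal Python sketch. The function names and the example responses are hypothetical, and averaging the error across trials to obtain one score per child is an assumption made for illustration; the text only states that accuracy was indexed by percent absolute error.

```python
def percent_absolute_error(estimate, target, scale):
    """Percent absolute error: |estimate - target| / scale * 100."""
    if scale <= 0:
        raise ValueError("scale must be positive")
    return abs(estimate - target) / scale * 100


def mean_pae(trials, scale):
    """Average percent absolute error over (estimate, target) pairs.

    Averaging across trials is an assumption made for this sketch.
    """
    errors = [percent_absolute_error(est, tgt, scale) for est, tgt in trials]
    return sum(errors) / len(errors)


# Hypothetical responses on the 0-10 line: (estimated position, target number).
example_trials = [(2.5, 2), (6.0, 7), (9.5, 9)]
example_score = mean_pae(example_trials, scale=10)
```

Lower values indicate more accurate estimation, which is why a negative coefficient for this index corresponds to a positive association between accuracy and the outcome.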

Number Line Estimation (0–100)

Children were presented with a number between one and 99, and a horizontal line with zero at the left and 100 at the right. Children were asked to determine where the number should go on the horizontal line. Children initially completed one practice trial (fifty) to confirm that they had understood the task and were then assessed with an additional 26 numbers (three, four, six, eight, 12, 14, 17, 18, 21, 24, 25, 29, 33, 39, 42, 48, 52, 57, 61, 64, 72, 79, 81, 84, 90, and 96). Numbers on the lower end of the number line were oversampled following Siegler and Booth (2004). Children received no accuracy-based feedback throughout the task (including practice trials). Accuracy was indexed by percent absolute error. Spearman-Brown split half reliability coefficient for this sample was 0.917. This number range was chosen to extend beyond well-known number ranges and curriculum expectations (Ministry of Education, 2013).
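For readers unfamiliar with the split-half reliability coefficients reported for both number line tasks, the following Python sketch shows the standard Spearman-Brown correction, 2r / (1 + r), applied to the correlation between two half-test scores. How the items were split into halves is not stated here, so the split and the helper names are assumptions for illustration only.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equally long lists of scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def spearman_brown(r_half):
    """Step the half-test correlation up to full-test reliability."""
    return 2 * r_half / (1 + r_half)


def implied_half_correlation(r_full):
    """Invert the correction: half-test correlation implied by r_full."""
    return r_full / (2 - r_full)
```

Under this formula, a corrected coefficient of 0.917 corresponds to a correlation of roughly 0.85 between the two halves, and 0.950 to roughly 0.90.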

Mathematical Reasoning

The Math Reasoning subtest from the Wechsler Individual Achievement Test II (WIAT-II, Wechsler, 2005) assesses the application of numerical and mathematical concepts in practical situations and the ability to solve verbally presented word problems aided by visual cues. Children might be required to answer items such as: “Abdul had three marbles in one hand and two marbles in the other. How many marbles did Abdul have in total?” which in some cases would be accompanied by pictures showing the marbles in Abdul’s hands. Other items might involve being able to answer questions based on information provided by a calendar or a ruler. Following standard procedures, testing was discontinued after six consecutive errors. Accuracy is indexed by number of correct responses.

As part of the original study the following measures were also assessed: Bracken School Readiness Test (colours, letters, numbers, sizes and shapes), non-symbolic comparison, non-symbolic to symbolic estimation, the Numerical Operations subtest from the WIAT-II, and a parent questionnaire regarding expectations for learning. We had initially included performance on the Numerical Operations subtest as a dependent variable in this study, but later concluded that given the characteristics of the task (i.e., first items assessing number identification and counting, and following items assessing calculation) differences across quantiles in this task were more likely to reflect differences between skills than differences in performance across the same skill. Therefore, those analyses are not included in the study. None of the other variables were included as they were not aligned with the aims of the study.

Results

To address the aims of this study, we conducted two separate linear and quantile regressions with mathematical reasoning as dependent variable, one for each number range (i.e., 0–10, 0–100) as predictor (see Footnote 1). All analyses were conducted in R (R Core Team, 2019). The linear regression was aimed at establishing if, as in previous studies, there was a mean association between performance on the number line task and mathematical reasoning. The stats package in R was used to conduct these analyses. Including the quantile regressions had the purpose of determining if this association varied across the distribution of mathematical reasoning. The conditional quantile regression approach applied in this study allows the data from the whole sample to be used to analyze the relationship between the predictors and outcome variables at different points of the outcome’s distribution (Petscher & Logan, 2014). This approach prevents the loss of statistical power that would arise if independent linear regressions were conducted with separate subgroups based on their level of performance. Wenz (2019) points out that the quantile regression approach analyses the effects of the predictors in the ‘conditional’ distribution of the outcome variable, which is conditional on all the predictors included. Even though coefficient estimates from quantile regression are interpreted similarly to those of linear regression, this effect is not on the outcome variable itself, but on the conditional distribution of that variable. As more predictors are included in the regression, it is more likely that the conditional distribution of the outcome will differ from the unconditional distribution. We focused on the 10th, 25th, 50th, 75th and 90th percentiles for these analyses.
We specifically chose the 10th and 25th percentiles as these are frequently used in the research literature as upper benchmarks to indicate mathematics learning difficulties and low mathematical achievement, respectively (Nelson & Powell, 2018). The 50th percentile was chosen to represent median performance, and the 75th and 90th percentiles were chosen to mirror the lower cut-offs at the higher end of the distribution. Model estimates between quantiles were compared using Analysis of Variance (ANOVA). The quantreg package in R (Koenker, 2020) was used to conduct these analyses.
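To make the idea behind quantile regression more concrete, the sketch below illustrates the asymmetric "check" loss that quantile regression minimizes, using the simplest possible case of a model with no predictors. It is not the quantreg procedure used in the analyses; the function names and the scores are hypothetical, and the exhaustive search over observed values is chosen only for readability.

```python
def check_loss(residual, tau):
    """Check (pinball) loss: tau * r for r >= 0, (tau - 1) * r for r < 0."""
    return residual * (tau - (1 if residual < 0 else 0))


def total_check_loss(y, prediction, tau):
    """Sum of check losses when every case receives the same prediction."""
    return sum(check_loss(value - prediction, tau) for value in y)


def best_constant(y, tau):
    """Intercept-only quantile fit: the observed value with the lowest loss."""
    return min(sorted(set(y)), key=lambda c: total_check_loss(y, c, tau))


# Hypothetical outcome scores, for illustration only.
scores = list(range(1, 12))
low, median, high = (best_constant(scores, t) for t in (0.1, 0.5, 0.9))
```

With these scores the fitted constants are 2, 6 and 10 for the 10th, 50th and 90th percentiles. Under-predictions and over-predictions are weighted differently depending on the chosen percentile, which is what allows the full sample to inform the estimate at each point of the distribution rather than only the cases near that point.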

Descriptive statistics are presented in Table 1 with accompanying boxplots (Fig. 1). Pearson correlations between continuous variables included in the regression models are presented in Table 2. As normative data from Singaporean samples are not available for the measures used, we report raw scores. Before fitting the linear and quantile regressions, we calculated z-scores for all continuous variables. We first present the results of the linear regressions and then those of the quantile regressions (see Footnote 2).
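As a brief illustration of the standardization step, the sketch below converts hypothetical raw scores to z-scores. It assumes the sample standard deviation (n − 1 denominator) is used, which the text does not specify. The values and the function name are illustrative only.

```python
from statistics import mean, stdev


def z_scores(values):
    """Standardize values to mean 0 and sample standard deviation 1."""
    m = mean(values)
    s = stdev(values)  # assumption: sample SD with n - 1 in the denominator
    return [(v - m) / s for v in values]


# Hypothetical raw scores (not data from this study).
raw = [12, 15, 9, 20, 14]
standardized = z_scores(raw)
print([round(z, 3) for z in standardized])
```

With all continuous variables on this scale, a regression coefficient can be read as the expected change in the outcome, in standard deviation units, per one standard deviation change in the predictor.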

Table 1 Descriptive Statistics
Fig. 1 Boxplots for all Continuous Variables Included

Table 2 Correlations between All Continuous Variables Included

Number line’s association with mathematical reasoning: linear regressions

To account for the effect of demographic characteristics and general cognitive skills on mathematical reasoning, Model 1 includes age, gender, non-verbal reasoning, and vocabulary. These were all significant predictors of mathematical reasoning. To study the association of performance on the 0–10 number line task (Model 2) we added this independent variable to those present in Model 1. Performance on the 0–10 number line was significantly associated with mathematical reasoning and explained an additional 2.9% of unique variance (Fchange (1, 318) = 14.441, p < 0.001).

To study the association of performance on the 0–100 number line task (Model 3) we added this independent variable to those present in Model 1. Performance on the 0–100 number line showed a significant association with mathematical reasoning and explained an additional 11.5% of unique variance (Fchange (1, 318) = 62.74, p < 0.001). Full results for these models are presented in Table 3 and conditional scatterplots for number line estimation are presented in Figs. 2 and 3 for Models 2 and 3, respectively (see Supplementary Figures 1 and 2 for conditional scatterplots of all continuous predictors included in Models 2 and 3).
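The F-change statistics reported for Models 2 and 3 test whether adding one predictor to Model 1 increases the explained variance. A minimal sketch of the standard incremental F test for nested linear models follows. The R² values passed to the function are hypothetical, because the full-model R² values appear in Table 3 rather than in the text. Only the degrees of freedom (1, 318) are taken from the reported results.

```python
def f_change(r2_reduced, r2_full, n_added, df_residual):
    """Incremental F statistic for adding n_added predictors to a nested model.

    F = ((R2_full - R2_reduced) / n_added) / ((1 - R2_full) / df_residual)
    """
    delta_r2 = r2_full - r2_reduced
    return (delta_r2 / n_added) / ((1.0 - r2_full) / df_residual)


# Hypothetical R-squared values, chosen for illustration only.
example_f = f_change(r2_reduced=0.30, r2_full=0.35, n_added=1, df_residual=318)
print(round(example_f, 3))
```

For a fixed residual degrees of freedom, a larger increment in R² or a larger full-model R² both raise the statistic, which is consistent with the larger F-change reported for the 0–100 than for the 0–10 number line.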

Table 3 Results of the Linear Regression Models
Fig. 2 Conditional Scatterplot for the Effect of 0–10 Number Line on Mathematical Reasoning. The plot reflects the relationship between 0–10 number line and mathematical reasoning after accounting for the remaining predictors included in Model 2

Fig. 3 Conditional Scatterplot for the Effect of 0–100 Number Line on Mathematical Reasoning. The plot reflects the relationship between 0–100 number line and mathematical reasoning after accounting for the remaining predictors included in Model 3

Model 3 shows better model fit than Model 2 (ΔAIC = 44), and more variance in mathematical reasoning was explained by the 0–100 than by the 0–10 number line (11.5% vs. 2.9%, respectively). We tested for performance differences between the two number line tasks with a Wilcoxon signed rank test, as the normality assumption was not met. Children were significantly more accurate in the 0–10 than in the 0–100 number line task (p < 0.001). We then assessed whether the association between performance on the number line task and mathematical reasoning differed according to the number range, using Zou’s (2007) confidence interval method as implemented in the cocor package in R (Diedenhofen & Musch, 2015). The correlation between mathematical reasoning and the 0–100 number line was significantly stronger than the correlation between mathematical reasoning and the 0–10 number line (r = -0.51 vs. r = -0.29; 95% CI for the difference = [0.104, 0.336]). These findings suggest a stronger association with mathematical reasoning for the 0–100 than for the 0–10 number line. Finally, we assessed whether Model 2 was redundant in the presence of Model 3 by using the Cox test and the Davidson-MacKinnon J test for comparing non-nested models, available in the lmtest package in R (Zeileis & Hothorn, 2002). Each model explained some variance in mathematical reasoning that was not accounted for by the other (p < 0.05 for all comparisons). Therefore, we retained both Models 2 and 3 for the quantile regressions.
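The comparison of the two correlations can be sketched as follows. This is an illustrative Python re-implementation of Zou’s (2007) interval for two dependent correlations that share one variable (here, mathematical reasoning), not the cocor code used in the study. The two correlations with mathematical reasoning are those reported above. The correlation between the two number line tasks (r23 = 0.40) and the sample size (n = 300) are hypothetical placeholders, so the output should not be expected to reproduce the reported interval.

```python
import math
from statistics import NormalDist


def fisher_ci(r, n, level=0.95):
    """Confidence interval for one correlation via Fisher's z transformation."""
    z_crit = NormalDist().inv_cdf(0.5 + level / 2)
    z = math.atanh(r)
    se = 1.0 / math.sqrt(n - 3)
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)


def corr_of_overlapping(r12, r13, r23):
    """Approximate correlation between r12 and r13 (shared variable 1)."""
    num = (r23 - 0.5 * r12 * r13) * (1 - r12**2 - r13**2 - r23**2) + r23**3
    den = (1 - r12**2) * (1 - r13**2)
    return num / den


def zou_ci_overlapping(r12, r13, r23, n, level=0.95):
    """Zou-style interval for r12 - r13 when both involve the same variable."""
    l1, u1 = fisher_ci(r12, n, level)
    l2, u2 = fisher_ci(r13, n, level)
    c = corr_of_overlapping(r12, r13, r23)
    diff = r12 - r13
    lower = diff - math.sqrt(
        (r12 - l1) ** 2 + (u2 - r13) ** 2 - 2 * c * (r12 - l1) * (u2 - r13)
    )
    upper = diff + math.sqrt(
        (u1 - r12) ** 2 + (r13 - l2) ** 2 - 2 * c * (u1 - r12) * (r13 - l2)
    )
    return lower, upper


r_reasoning_nl10 = -0.29   # reported: reasoning with 0-10 number line
r_reasoning_nl100 = -0.51  # reported: reasoning with 0-100 number line
r_nl10_nl100 = 0.40        # HYPOTHETICAL: not reported in the text
n_children = 300           # HYPOTHETICAL: sample size not stated in this section

diff = r_reasoning_nl10 - r_reasoning_nl100
lower, upper = zou_ci_overlapping(
    r_reasoning_nl10, r_reasoning_nl100, r_nl10_nl100, n_children
)
print(round(diff, 2), round(lower, 3), round(upper, 3))
```

An interval for the difference that excludes zero indicates that the two correlations differ at the chosen confidence level, which is how the reported interval is interpreted in the text.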

Number line’s association with mathematical reasoning: quantile regressions

We retained Models 2 and 3 for the quantile regressions (see Table 4). In Model 2, performance on the 0–10 number line showed a significant association with mathematical reasoning across all quantiles assessed. Slope comparisons between quantiles did not show a statistically significant difference (F (4, 1616) = 0.507, p = 0.730). Non-verbal reasoning and vocabulary were also significantly associated with mathematical reasoning across all quantiles. Slope comparisons showed statistically significant differences across quantiles for vocabulary only (F (4, 1616) = 3.787, p = 0.005). Post-hoc multiple comparisons (after Bonferroni adjustment) showed that the association between vocabulary and mathematical reasoning was significantly stronger at the 25th than at the 90th percentile (Δβ = 0.260, p = 0.028). Figure 4 is the quantile coefficients plot for this model, which represents the slope coefficients for each predictor at each quantile examined, accounting for all other predictors included in the model.
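The Bonferroni adjustment applied to the post-hoc comparisons can be sketched as below. The sketch assumes that all pairwise comparisons among the five quantiles were tested, which the text does not state explicitly, and the raw p-values are hypothetical.

```python
from itertools import combinations

quantiles = [0.10, 0.25, 0.50, 0.75, 0.90]

# Assumption: every pair of quantiles is compared once.
pairs = list(combinations(quantiles, 2))


def bonferroni(p_values):
    """Multiply each p-value by the number of tests, capping at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# Hypothetical unadjusted p-values, one per pairwise comparison.
raw_p = [0.004, 0.03, 0.20, 0.45, 0.01, 0.08, 0.60, 0.15, 0.33, 0.90]
adjusted_p = bonferroni(raw_p)
print(len(pairs), [round(p, 3) for p in adjusted_p])
```

Under this assumption, a pairwise difference is flagged only if its unadjusted p-value is below 0.05 divided by the number of comparisons, which explains how an overall test across quantiles can be significant while no single pairwise comparison is.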

Table 4 Results from Quantile Regressions
Fig. 4 Quantile Coefficients Plot with 0–10 Number Line as Predictor. The dots represent the slope coefficients (on the y axis) for each quantile selected (on the x axis), with a line connecting them. The shaded bands represent the confidence intervals for the quantile regression coefficients at each quantile. The coefficient from the ordinary least squares regression is plotted as a solid horizontal line with its confidence intervals (dashed lines). When zero is within the range of the y axis, it is represented by a solid horizontal line

To isolate the effect of the number line from that of the other predictors, we also conducted a quantile regression including only the 0–10 number line as predictor. In this model, the 0–10 number line showed a significant association with mathematical reasoning at all quantiles included (p < 0.05). The overall slope comparison showed a statistically significant difference between quantiles (F (4, 1616) = 2.818, p = 0.024). However, none of the post-hoc multiple comparisons (after Bonferroni adjustment) was statistically significant. Full results for this model appear in Supplementary Table 1.

In Model 3, performance on the 0–100 number line showed a significant association with mathematical reasoning across all quantiles assessed. Slope comparisons between quantiles did not show a statistically significant difference (F (4, 1616) = 1.309, p = 0.264). Non-verbal reasoning and vocabulary were also significantly associated with mathematical reasoning across all quantiles, and slope comparisons for these predictors did not show statistically significant differences across quantiles. Figure 5 is the quantile coefficients plot for this model, which represents the slope coefficients for each predictor at each quantile examined, accounting for all other predictors included in the model.

Fig. 5 Quantile Coefficients Plot for 0–100 Number Line as Predictor. The dots represent the slope coefficients (on the y axis) for each quantile selected (on the x axis), with a line connecting them. The shaded bands represent the confidence intervals for the quantile regression coefficients at each quantile. The coefficient from the ordinary least squares regression is plotted as a solid horizontal line with its confidence intervals (dashed lines). When zero is within the range of the y axis, it is represented by a solid horizontal line

As with the 0–10 number line, we also conducted quantile regression only including the 0–100 number line in the model. Performance on the 0–100 number line showed a significant association with mathematical reasoning across all quantiles assessed. Slope comparisons between quantiles did not show a statistically significant difference (F (4, 1616) = 1.983, p = 0.095). Full results for this model appear in Supplementary Table 1.

Discussion

The first aim of this study was to replicate findings from previous research showing that performance on the number line estimation task is significantly associated with mathematical reasoning. We replicated these results: children who were more accurate in the number line tasks tended to perform better in mathematical reasoning for both number ranges. The regression estimate for the 0–100 number line (β = -0.377) in this study is similar to that of Zhu et al. (2017) with second graders (β = -0.370), but slightly larger in magnitude than that found by Ouyang et al. (2021) with preschool children (β = -0.275). Performance on the 0–10 number line explained 2.9% of unique variance in mathematical reasoning, while performance on the 0–100 number line explained 11.5% of unique variance. These findings suggest that, at a group level, our sample behaves similarly to those of previous studies.

The second aim was to determine whether the relationship between number line estimation and mathematical reasoning holds across several points of the ability distribution. We found that, for both number ranges, performance on the number line task showed a significant association with mathematical reasoning across all levels of performance. Even though the association tended to be stronger at lower than at higher quantiles, we did not find statistically significant differences between quantiles. This is consistent with previous findings from Bartelet et al. (2014), who found that performance on a number comparison task and on an estimation task was associated with fact retrieval across the ability distribution.

These findings add to the evidence suggesting that performance on the number line estimation task is strongly associated with concurrent mathematical reasoning, and they provide initial evidence that this is the case across all levels of ability. However, it is also possible that the homogeneity found in the relationship between number line estimation and mathematical reasoning is characteristic of younger children, and that when the demands of mathematical reasoning become more complex, further heterogeneity would arise. This would be consistent with some of Devlin et al.’s (2022) findings, as number relations was significantly predictive of mathematical achievement across all quantiles when assessed in pre-kindergarten but was only predictive at the 20th quantile in kindergarten and first grade. Future studies should expand to higher age ranges to explore whether the relationship between performance on the number line estimation task and mathematical reasoning remains homogeneous in older children.

In the Introduction we proposed that differential relationships between performance on the number line task and mathematical reasoning might contribute to our understanding of what underlies performance on the number line estimation task and allow a closer approximation to understanding how this might support mathematical reasoning. However, as significant differences between quantiles were not found, it is difficult to draw conclusions regarding what drives this relationship.

The number range of the number line task seems to play a role in the association between performance on the number line task and mathematical achievement, at least at the group level. Findings from the linear regression showed that, at the group level, the association between number line estimation and mathematical reasoning was significantly stronger for the 0–100 than for the 0–10 number line task. These findings are consistent with the results of Ellis et al. (2021), who found that while both the 0–20 and 0–1000 number line tasks significantly predicted applied problems, the association was significantly stronger for the 0–1000 than for the 0–20 number range. However, this is not consistent with Schneider et al. (2018), who found no effect of number range on the relationship between performance on the number line and mathematical achievement. Though the estimates for the 0–100 number line seem higher than those of the 0–10 number line (0.140 < Δβ < 0.240) across all quantiles, we cannot know whether these differences are statistically significant, as to the best of our knowledge they cannot be formally compared.

Several reasons could explain the differences found between number ranges in the linear regression. For example, children could show more consistent performance on the number line estimation task with lower compared to higher number ranges. This would lead to overall less variability in performance on the number line estimation task with lower number ranges. This difference in variability across both number ranges is also reflected in the current sample (see Fig. 1), but it does not seem to be the only possible reason behind differences between number ranges. Relatedly, as suggested by Ellis et al. (2021), it is possible that when a number range becomes too easy, its relationship to mathematical ability weakens. This has practical implications when using the number line estimation task in educational, research and clinical settings, as choosing a number range that is too easy for children might hinder the task’s predictive capacity. Aligning the number range chosen to curriculum expectations is common practice (Link et al., 2014). However, our findings show that the number range that was aligned to curriculum expectations (i.e., 0–10) appears less predictive than the one that extended beyond them. This highlights the relevance of the number range chosen for the number line estimation task.

Alternatively, it is possible that, as mentioned in the Introduction, performance on the number line task indexes different skills depending on the number range being considered. If children are engaging with the 0–10 number line differently than with the 0–100 number line, then the differences between number ranges would reflect distinct underlying mathematical skills and their relationship with mathematical reasoning. Though this is a possibility, we would consider it more likely if variations in predictive strength between the number ranges were also accompanied by differences in the patterns found in the quantile regression. This was not the case in this sample, as the patterns of association between the number line estimation task and mathematical reasoning across the mathematical reasoning distribution were similar for both number ranges. These results suggest that the skills underlying performance on the number line task are significantly related to mathematical reasoning, though the number range included in the task could play an important role. What this role might be remains unclear.

Findings from the linear and quantile regressions also highlighted the relevance of vocabulary and non-verbal reasoning for mathematical reasoning. These cognitive skills were significantly associated with mathematical reasoning across the ability distribution. Similar to performance on the number line task, the association of non-verbal reasoning and vocabulary with mathematical reasoning tended to decrease as mathematical reasoning performance increased. However, significant differences between quantiles were only observed for vocabulary between the 25th and 90th quantiles when performance on the 0–10 number line task was included in the model. Our findings regarding the role of vocabulary are not consistent with Fuchs et al.’s (2020) findings, which showed that the association of language comprehension with mathematical reasoning was stronger at higher levels of mathematical reasoning. However, our findings are consistent with those of Fuchs et al. (2020) in that non-verbal reasoning was a significant predictor of mathematical reasoning across all quantiles, with no significant differences between them. Overall, these findings highlight the relevance of considering both mathematical and non-mathematical skills when analyzing children’s mathematical reasoning.

Implications

The fact that performance on the number line task predicted mathematical reasoning across its distribution highlights the potential of the number line estimation task as an assessment tool that provides relevant information for all children regardless of their level of ability. However, the number range chosen for the number line might affect this potential. This has practical implications when using the number line estimation task in educational, research and clinical settings, as choosing a number range that is too easy for children might hinder the task’s predictive capacity. Aligning the number range chosen to curriculum expectations is common practice (Link et al., 2014). However, our findings suggest that the number range that was aligned to curriculum expectations (i.e., 0–10) appears less predictive than the one that extended beyond these expectations. This highlights the relevance of selecting a number range that is slightly challenging. Finally, our findings serve as a reminder that, at least where mathematical reasoning is concerned, it is crucial to consider the non-mathematical demands involved. We remind the reader that the cross-sectional nature of this study does not allow causal relations to be established between performance on the number line estimation task and mathematical reasoning. Thus, these implications should be read in the light of this limitation.

Limitations

One of the limitations of this study, as previously highlighted, is that conditional quantile regression considers the specified quantiles on the conditional distribution of the outcome variable. This means that the quantiles identified do not reflect the observed (unconditioned) performance distribution on the outcome variable, but its performance distribution dependent on (conditional on) the predictors included in the model (Wenz, 2019). Therefore, caution should be taken when practical implications are drawn from these analyses, as the percentiles identified based on quantile regression analyses might not completely overlap with those identified based on performance on mathematical reasoning alone. However, the findings from the models that only included performance on the number line task were consistent with those of the full models, strengthening the conclusions that can be drawn from the analyses presented. Relatedly, the identified quantiles are based on the performance of this sample. It is difficult to know to what extent the performance of this sample could be compared to that of other populations, as previous evidence suggests differences in the development of mathematical competence between Singaporean children and some international counterparts (Yao et al., 2017).

Additionally, we briefly present in the Introduction different theoretical positions that account for what performance on the number line task represents. However, this study cannot provide evidence either for or against any of these accounts. The lack of consensus in the field in this regard limits the possibility of fully interpreting the findings of this study in the context of what might underlie performance on the number line estimation task and how this relates to mathematical reasoning.

A final limitation of this study is its cross-sectional nature. Understanding how and why performance on the number line task relates to concurrent mathematical reasoning is a necessary first step. Practical educational implications, however, can only be drawn when we also understand how it relates to future mathematical reasoning and the acquisition of mathematical reasoning skills across all levels of ability.

Conclusion

The current study showed that performance on the 0–10 and 0–100 number line tasks is related to concurrent mathematical reasoning. This study also showed that even though, based on the curriculum, children included in this sample were only expected to know and manipulate numbers up to 10, performance on the 0–100 number line appears to be a stronger predictor of mathematical reasoning than performance on the 0–10 number line. The study did not find evidence that the relationship between performance on the number line tasks and mathematical reasoning varies across levels of mathematical reasoning, which highlights the potential of the number line estimation task for all children regardless of their level of ability. Finally, these findings add to previous evidence suggesting that the relationship between number line estimation and mathematical reasoning might differ depending on the number range being estimated.