
3.1 TIMSS Data and Sample Characteristics

We used the TIMSS grade eight public-use data from 1995 through 2015 to establish how inequalities in education outcomes changed over this period, and to assess whether education systems have managed to increase the performance of disadvantaged students. TIMSS has been conducted every four years since 1995 to monitor trends in the mathematics and science performance of students across education systems. Every participating education system provides a representative sample of students using a two-stage random sample design: typically, a sample of schools is drawn at the first stage, and one or more intact classes of students from each sampled school are selected at the second stage (LaRoche et al. 2016). Although most features have remained constant across the different TIMSS cycles, there have also been several significant changes in sample design, country participation, and questionnaire administration.

First, the target population has changed slightly. The first cycle of TIMSS in 1995 identified three target populations; one of them was students enrolled in the two adjacent grades that maximized coverage of 13-year-olds (Foy et al. 1996). At the time of testing, most of these students were in grade seven or grade eight. This practice was refined for the 1999 cycle of TIMSS, so that only grade eight students were assessed. To maintain comparability, our trend analyses therefore include only grade eight students from the 1995 assessment for most education systems, in line with the practice outlined in the TIMSS 1999 international mathematics report (Mullis et al. 2000) and the TIMSS 2015 international results in mathematics (Mullis et al. 2016, appendix A.1, at http://timssandpirls.bc.edu/timss2015/international-results/timss-2015/mathematics/appendices/) (Footnote 1). Norway was the only exception, because Norway included only grade six and grade seven students in its 1995 sample. However, according to the TIMSS 2015 report, the sample of upper-grade students (grade seven) in Norway in 1995 was comparable to that in 2015 (see Mullis et al. 2016, appendix A.1). Therefore, in the case of Norway, we kept the sample of grade seven students in 1995 for trend comparison (Gonzalez and Miles 2001) (Footnote 2).

Second, although many education systems have participated in TIMSS over the last 20 years, not every education system participated in each cycle. To analyze trends over the entire 20 years of TIMSS data, we therefore limited our analysis to those education systems that participated in the first cycle in 1995, the most recent cycle in 2015, and at least one other intermediate administration cycle. This produced a potential sample of 18 education systems (Footnote 3).

However, according to the 2015 TIMSS international results in mathematics (see Mullis et al. 2016, appendix A.1), data from many education systems' earlier cycles cannot be used for trend analysis to 2015, primarily because of improved translations or increased population coverage. For example, the data for Australia in 1999, Kuwait in 1995 and 2007, Canada in 1995 and 1999, Israel in 1995, 1999, 2003, and 2007, Slovenia in 1999, and Thailand in 1995 were not considered comparable to the 2015 data. Four education systems (Canada, Israel, Kuwait, and Thailand) therefore had to be excluded from the analyses because their 1995 data could not be used for trend analyses.

In addition, given that our primary focus is SES-related information, we excluded England from our study because it did not have data on parental education in 1995, 1999, and 2007. In total, our analytic sample is limited to the following 13 education systems (Table 3.1):

  • Australia, Hong Kong, Hungary, Islamic Republic of Iran, Lithuania, Republic of Korea, Russian Federation, Singapore, Slovenia, and the United States (education systems that participated in all six cycles); and

  • New Zealand (which participated in 1995, 1999, 2003, 2011, and 2015), Norway (which participated in 1995, 2003, 2007, 2011, and 2015), and Sweden (which participated in 1995, 2003, 2007, 2011, and 2015).

Table 3.1 Samples for each education system in each TIMSS assessment year

Finally, for trend analysis, several adjustments were made to follow the approach used by Mullis et al. (2016). First, IEA policy requires that the average age of tested students should not fall below 13.5 years (for grade eight) at the time of testing (see Mullis et al. 2016, appendix C.10); New Zealand therefore assessed students in grade nine in multiple cycles. The results for grade nine students in 1995, 1999, 2011, and 2015 are deemed comparable to those for the grade eight students who participated in 2003 in New Zealand. Second, although Slovenia assessed grade eight students in 1995, their results are not deemed comparable to those from other cycles; data for grade seven students in 1995 are therefore used for trend analysis. Third, in Lithuania, the results for students assessed in Polish or Russian in 2015 are not deemed comparable to previous cycles; the trend results therefore include only students assessed in Lithuanian.

3.2 Construction of a Proxy Measure for Socioeconomic Status

To address the research questions, we first needed to construct a comparable proxy measure for socioeconomic status across the different TIMSS administration cycles. The TIMSS home educational resources (HER) index measures important aspects of SES, but it is not applicable for trend comparisons across all cycles for several reasons.

First and foremost, the HER index was constructed using different measurement methods in different cycles. In 1995 and 1999, the HER index was a simple combination of several background variables, including the number of books at home, the number of home possessions, and parents' education, which were combined into three levels: high, medium, and low. For example, students at the high level were those with more than 100 books in the home, all three educational possessions (computer, study desk, and dictionary), and at least one college-educated parent. This index made interpretation easy since each category had its own corresponding characteristics. Since 2011, however, the HER index has been constructed using IRT scaling methodology (Martin et al. 2011), which allows for the analysis of more fine-grained differences in home educational resources between students and enables forward comparability for future administrations even if the components of the index change. The current form of the HER is, however, not comparable to the earlier index. Moreover, no HER index was constructed at all in 2003 and 2007.

Second, the components of the HER index changed because the home possession items that students are asked about in the student questionnaire have changed over time (Table 3.2). For example, an internet connection at home was not part of the questionnaire before 2007, but is now an important component of the current HER scale. The only items common across all cycles are a computer and a study desk. Moreover, in 2015, the question about having a computer at home was split into two variables: one asking whether a student owns a computer or tablet at home and another asking whether a student shares a computer or tablet with others in the home.

Table 3.2 Home possession items by TIMSS cycle

It was clear that a consistent measure of SES that can be applied across all TIMSS cycles would be of immense value to researchers who wish to use TIMSS for trend analyses. We therefore developed a modified version of the HER index to address this issue. Our SES measure, which we here term SES*, does not represent the full SES construct as usually defined by parental education, family income, and parental occupation (Footnote 4). While the construction of such an index serves a specific purpose in this study, we believe that the SES* index proposed here is sufficiently closely related to the later IRT-scaled HER versions to yield relevant and valid results. The index can thus also be applied in future studies that require an SES measure spanning multiple administrations (Footnote 5).

3.2.1 Components of the SES* Measure

Our SES* measure, which, as mentioned in the introduction, is a modified version of the HER index, is constructed from three components that are common across the six cycles of TIMSS: (1) the number of books at home, (2) the number of home possessions, and (3) the highest level of education of either parent.

Number of Books at Home

This information is derived from the student questionnaire item asking how many books students have at home. There are five categories, coded (0) to (4): (0) 0 to 10 books; (1) 11 to 25 books; (2) 26 to 100 books; (3) 101 to 200 books; and (4) more than 200 books.

Number of Home Possessions

This information comes from questions asking students whether they have each of a list of items at home. Since only two items (computer and study desk) are common across all cycles, the total number of home possessions ranges from 0 to 2. One caveat needs to be mentioned for 2015: the question about having a computer at home was split into two variables, one asking whether a student owns a computer or tablet at home and the other asking whether a student shares a computer or tablet with others at home. We coded a positive response to either of these questions as a "1". With this coding, the correlations between the computer/tablet variable and the other SES* components in 2015 were comparable to those found in 2011 (computer alone). We therefore believe that the addition of tablets in 2015 did not substantially change the construct being measured, and that the SES* index remains consistent over time.

Highest Level of Education of Either Parent

This is a derived variable constructed from both the father’s and mother’s highest educational levels. The categories of the source variables were grouped into five levels in line with the 1995 survey, coded as follows: (0) less than lower secondary; (1) completed lower secondary; (2) completed upper secondary; (3) postsecondary nontertiary education; and (4) completed university or higher. “I don’t know” responses were treated as missing.
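The coding of the three components can be sketched as follows. This is an illustrative Python/pandas fragment, not the authors' code (the published analyses were run in Stata), and all column names (books_category, own_computer_or_tablet, shared_computer_or_tablet, study_desk, father_ed, mother_ed) are hypothetical stand-ins for the actual TIMSS variable names, which differ by cycle.

```python
import pandas as pd

def code_ses_components(df: pd.DataFrame) -> pd.DataFrame:
    """Illustrative coding of the three SES* components (hypothetical names)."""
    out = df.copy()

    # (1) Number of books at home: assumed to arrive as questionnaire
    #     categories 1-5; recode to 0-4.
    out["books"] = out["books_category"] - 1

    # (2) Number of home possessions (0-2): computer (in 2015, an owned or
    #     shared computer or tablet) and study desk, each coded 1 = yes.
    has_computer = ((out["own_computer_or_tablet"] == 1) |
                    (out["shared_computer_or_tablet"] == 1)).astype(int)
    out["possessions"] = has_computer + (out["study_desk"] == 1).astype(int)

    # (3) Highest education of either parent (0-4); "I don't know" is assumed
    #     to have been set to missing (NaN) already, so max() returns NaN
    #     only when both parents are missing.
    out["parent_ed"] = out[["father_ed", "mother_ed"]].max(axis=1)

    return out
```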

3.2.2 Multiple Imputation of Missing Values

The main components of the SES* index have different degrees of missingness. Of specific concern is parental education, which on average has around 20% missing values, depending on administration year and education system. Since dropping such a large part of the sample would undermine the generalizability of the findings, especially because students with missing values tended to come from lower ability levels, multiple imputation was used for all missing values of the SES* index components. Instead of imputing the "highest level of parental education" variable directly, we imputed father's and mother's education separately, compared them after imputation, and then generated the highest level of parental education for the SES* index. We imputed the missing values of the SES* index variables five times using multiple imputation by chained equations before constructing the SES* index. Imputation using chained equations is known for its flexibility in handling different types of variables (for example, binary, categorical, and continuous; Hughes et al. 2014), and our variables of interest are mostly categorical. The imputation uses the observed values for a given individual and the observed relations in the data for other participants (Schafer and Graham 2002).
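The imputation step might be sketched roughly as follows. This is not the authors' implementation: it uses the chained-equations routine in statsmodels with its default predictive mean matching, omits the sampling weights that the actual model incorporated, and relies on hypothetical column names.

```python
import pandas as pd
from statsmodels.imputation.mice import MICEData

def impute_ses_components(year_df: pd.DataFrame, n_imputations: int = 5) -> list:
    """Create five imputed copies of one cycle's data.

    year_df is assumed to hold the SES* component variables (books,
    possessions, father_ed, mother_ed), the achievement plausible values,
    and auxiliary variables such as language spoken at home, all numeric,
    with NaN marking missing values.
    """
    mice = MICEData(year_df)            # chained equations (PMM by default)
    imputed = []
    for _ in range(n_imputations):
        mice.update_all(n_iter=10)      # cycle through the chained models
        completed = mice.data.copy()
        # Derive highest parental education after imputation, as in the text.
        completed["parent_ed"] = completed[["father_ed", "mother_ed"]].max(axis=1)
        imputed.append(completed)
    return imputed
```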

In addition, since the TIMSS data include multiple education systems across multiple years, we decided to impute the missing data for each year first and only then combine the years into a single database. The advantage of this approach is that we made maximal use of the information available for a given year, since the questionnaires have been modified over time and the relevant variables therefore differ by year. In the imputation model, we included all analytic variables used in our final analysis, the other common home possession items available for all education systems in each year, the plausible values of the achievement scores, and other related variables (such as language spoken at home). After imputation, the correlations between these variables in each year were compared between the original dataset and the imputed dataset, and the results suggested that the imputation preserved the overall relationships among the variables very well. The student sampling weight was taken into account in the imputation model, following a case study of multiple imputation for missing data in TIMSS (Bouhlila and Sellaouti 2013).
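The correlation check described above can be illustrated with a minimal sketch, assuming hypothetical column names; it simply contrasts pairwise correlations computed before and after imputation for one year's data.

```python
import pandas as pd

def compare_correlations(original: pd.DataFrame, imputed: pd.DataFrame,
                         cols: list) -> pd.DataFrame:
    """Difference between pairwise correlations after and before imputation.

    The pre-imputation matrix uses pairwise-complete observations; values
    near zero indicate that the imputation preserved the relationships
    among the SES* components and related variables.
    """
    r_before = original[cols].corr()   # pairwise deletion of missing values
    r_after = imputed[cols].corr()
    return (r_after - r_before).round(3)
```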

3.2.3 The SES* Index

After imputation, we constructed the SES* index, ranging from 0 to 10 points, by assigning numerical values to each category of each of the three components (Table 3.3). We applied this to the 13 education systems’ data for the 2011 and 2015 cycles, and found that this index has a relatively high correlation with the HER scale (2011: r = 0.87; 2015: r = 0.84). We also compared the variance in mathematics performance explained by the SES* index and by the HER scale in 2011 and 2015 for these 13 education systems. In 2011, the SES* index explained 23.7% of the variance in mathematics, while the HER index explained 23.6% of the variance. In 2015, the SES* index explained 17.8% of the variance in mathematics, while the HER index explained 19.1% of the variance. This suggests that the proposed SES* index is highly correlated with the current HER scale and explains a similar amount of the variance in students’ achievement.

Table 3.3 SES* index construction
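Assuming that Table 3.3 assigns the component codes directly as points (books 0-4, possessions 0-2, parental education 0-4, summing to 0-10), the index construction and the weighted correlation with the HER scale could be sketched as follows. This is an illustration in Python, not the authors' Stata code, and the column names are hypothetical.

```python
import numpy as np
import pandas as pd

def build_ses_star(df: pd.DataFrame) -> pd.Series:
    # Assumed point assignment: the 0-10 index is the simple sum of the
    # component codes (books 0-4, possessions 0-2, parental education 0-4).
    return df["books"] + df["possessions"] + df["parent_ed"]

def weighted_corr(x, y, w) -> float:
    """Weighted Pearson correlation, e.g. between SES* and the IRT-scaled
    HER score in 2011 or 2015, using the student sampling weights."""
    x, y, w = np.asarray(x, float), np.asarray(y, float), np.asarray(w, float)
    mx, my = np.average(x, weights=w), np.average(y, weights=w)
    cov = np.average((x - mx) * (y - my), weights=w)
    sx = np.sqrt(np.average((x - mx) ** 2, weights=w))
    sy = np.sqrt(np.average((y - my) ** 2, weights=w))
    return cov / (sx * sy)
```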

The overall weighted distribution of the index and the corresponding average mathematics scores for all participating education systems in 1995 and 2015 suggest that the distribution of the index is somewhat left skewed (Figs. 3.1 and 3.2). One possible explanation might be that many education systems in our analytic sample have an overall high level of SES*. More importantly, the results clearly show that each additional point on the SES* index is associated with a higher average mathematics score. In 1995, the TIMSS achievement score was scaled to have an international average of 500 and a standard deviation of 100 points for participating countries. On average, the difference in mathematics scores between students with the lowest SES* (0 points) and the highest SES* (10 points) is around 150 points, or 1.5 times the standard deviation of TIMSS scores. Furthermore, the positive relationship between the SES* index and mathematics scores holds not only overall but also within each education system individually.
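The quantities shown in Figs. 3.1 and 3.2 (the weighted percentage of students and the weighted average mathematics score at each SES* point) can be computed along the following lines; the column names are hypothetical, and a full analysis would repeat the calculation for all five plausible values.

```python
import numpy as np
import pandas as pd

def distribution_by_ses(df: pd.DataFrame, score: str = "math_pv1",
                        weight: str = "student_weight") -> pd.DataFrame:
    """Weighted percentage of students and weighted mean score at each
    SES* index point, for one plausible value."""
    total_weight = df[weight].sum()

    def summarize(group):
        w = group[weight]
        return pd.Series({
            "pct_students": 100 * w.sum() / total_weight,
            "mean_score": np.average(group[score], weights=w),
        })

    return df.groupby("ses_star").apply(summarize)
```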

Fig. 3.1 Weighted percentage of students and average mathematics score by SES* index, 1995 (Note: In 1995, 50,332 students in the 13 selected education systems were included in the analysis)

Fig. 3.2 Weighted percentage of students and average mathematics score by SES* index, 2015 (Note: In 2015, 76,159 students in the 13 selected education systems were included in the analysis)

3.2.4 Defining High- and Low-SES* Groups

To calculate the achievement gap between students with high- and low-SES* backgrounds over time, we first needed to define the criterion or cut-off points corresponding to high- and low-SES* backgrounds. Among the different approaches for establishing cut-offs, the main choices are either (a) using common cut-offs across educational systems and years, or (b) defining education system-specific low-SES* versus high-SES* groups based on the distribution of the SES* index for a given year.

Common Cut-Offs

Given the weighted distribution of the sum-score SES* index for all students in all participating education systems across all 20 years, we found that an index value of three corresponded to about the 21st percentile of all students, whereas a value of eight points corresponded to the 81st percentile (see Table 3.4). As a first test, we applied these cut-off points to all students. Students with three or fewer points were defined as the low-SES* group and those with eight or more points were defined as the high-SES* group. As can be expected, this approach led to very unbalanced groups when the results were examined by education system. For example, in Australia in 1995, only 10% of students would have been placed into the low-SES* group, while 26% would have been in the high-SES* group. By contrast, in Iran, about 76% of students would have been placed in the low-SES* group, with only 1% in the high-SES* group (Table 3.4).

Table 3.4 Common cut-offs by overall distribution of SES* index (cumulative proportion)

Thus, common cut-offs tend to generate unbalanced groups in certain education systems because they do not take individual education systems' specific situations into account. While these may be the actual percentages of high- and low-SES* students across education systems, SES* is a relative concept when viewed within an education system: what is perceived as high or low SES* is society dependent. It is this perception that matters, because what is perceived to be real is real in its consequences. Therefore, we decided to establish education system-specific cut-offs for each year. Given each education system's distribution of SES* in each year, we used quartiles as cut-offs; students in the bottom quartile were considered low SES*, while students in the top quartile were considered high SES* (see the Appendix for a sensitivity analysis using quintiles versus quartiles and additional information). This approach generated better grouping results because it takes the local context into consideration.

Another challenge was how to establish exact 25th or 75th percentile cut-offs using an index with only 11 possible values. Considering the cumulative proportions of students at each SES* point in Australia in 1995 (Table 3.5), we found that students with eight points on the index corresponded to the 73rd percentile, while students with nine points corresponded to the 86th percentile. Establishing the bottom quartile was similarly difficult, since four points corresponded to the 15th percentile, while five points corresponded to the 27th percentile.

Table 3.5 Weighted distribution of SES* index for Australia, 1995

To address this issue, we decided to randomly split the students at the SES* index point straddling the 25th (or 75th) percentile and combine a random subsample of them with the adjacent group, so that the bottom and top categories each contained exactly 25% of students. Again using Australia in 1995 as our example, to obtain the bottom quartile, we needed another 10% of students in addition to those having 0 to 4 points on the SES* index. Therefore, we randomly selected a subsample of the Australian students who participated in 1995 and who scored five SES* index points, creating a bottom SES* category comprising 25% of students (another way to consider this is that if 27% of students have five or fewer index points, the sample contains 2% more students than needed for the bottom quartile, so 2% of students, in absolute terms, have to be randomly excluded from the five-point subsample). Applying the same strategy to every individual education system and year guaranteed that the bottom- and top-quartile SES* groups always represented exactly 25% of students from a given education system in any given year.
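A sketch of this boundary-splitting procedure for the bottom quartile of one education system in one year is given below; the exact mechanics (random number stream, tie handling, column names) are assumptions rather than the authors' procedure, and the top quartile would be handled symmetrically.

```python
import numpy as np
import pandas as pd

def flag_bottom_quartile(df: pd.DataFrame, ses: str = "ses_star",
                         weight: str = "student_weight",
                         seed: int = 1) -> pd.Series:
    """Flag 25% of weighted students in one education system and year as
    low SES*, randomly splitting students at the boundary index point
    (exact to within one student's weight)."""
    rng = np.random.default_rng(seed)
    total_w = df[weight].sum()

    # Weighted proportion of students strictly below each SES* point.
    cum_below = (df.groupby(ses)[weight].sum().sort_index().cumsum()
                   .shift(fill_value=0.0) / total_w)

    # Boundary point: the highest SES* value with less than 25% below it.
    boundary = cum_below[cum_below <= 0.25].index.max()

    flag = pd.Series(False, index=df.index)
    flag.loc[df[ses] < boundary] = True

    # Randomly order boundary-point students and admit them until the
    # remaining weighted share needed to reach 25% is filled.
    at_boundary = df.index[df[ses] == boundary]
    order = rng.permutation(at_boundary)
    need_w = (0.25 - cum_below[boundary]) * total_w
    admitted = df.loc[order, weight].cumsum() <= need_w
    flag.loc[order] = admitted.to_numpy()
    return flag
```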

3.3 Analytic Approach

3.3.1 Plausible Values and Imputed Datasets

One significant analytic challenge underlying this work was how to simultaneously use the existing five plausible values of the achievement scores while incorporating the results of the multiple imputation procedure for the missing values of the SES* background variables. One approach might be to conduct nested multiple imputation, in which the plausible-value imputation is nested within the background-variable imputation (Weirich et al. 2014). However, that would have required an extra step back to the item responses, and the imputation model would depend heavily on the final analytic model, meaning that other studies using this SES* index would have to create their own models. More importantly, the TIMSS & PIRLS International Study Center has clearly stated that principal components of a large number of student background variables were included as conditioning variables to improve the reliability of the estimated student proficiency scores (Foy and Yin 2016). It is therefore reasonable to believe that the components of our SES* index, which are very important student background variables, were included in the TIMSS conditioning models for proficiency estimation. Accordingly, we used the existing plausible values of the achievement scores in TIMSS, together with other relevant variables, to impute the missing values in the SES* component variables, resulting in five imputed datasets.

After imputation, one possibility for using the imputed SES* variable was to average the SES* values across the five imputed datasets and thus generate a single SES* index score for each student. To validate this approach, we randomly selected 10% of cases in each country, replaced the existing value of parental education with "missing", imputed the pseudo-missing values using the same imputation model, and then compared the imputed values with the actual values. The validation results were not satisfactory: a simple average of the five imputed values produced a distribution quite different from that of the actual values because it ignored the variance between the imputed values. Therefore, we decided not to average the imputed SES* values but to treat the five imputed SES* values as plausible values (Kevin Macdonald, personal communication, 10 March 2018) and conduct the analyses with the PV module in Stata 14 software (Macdonald 2008). This approach allowed us to simultaneously use the five plausible values of the TIMSS achievement scores and the five imputed values of the SES* index in the analyses for this report.
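Treating the imputed SES* values as plausible values means pairing imputation m with achievement plausible value m, running each analysis five times, and pooling the results. A minimal sketch of the pooling step, in the spirit of Rubin's rules as applied by plausible-value routines, is shown below; the actual analyses used the PV module in Stata 14, which additionally uses the TIMSS replicate weights to obtain the sampling variances.

```python
import numpy as np

def rubin_combine(estimates, variances):
    """Pool a point estimate and its standard error over the five paired
    analyses (imputation m analysed with plausible value m)."""
    estimates = np.asarray(estimates, dtype=float)
    variances = np.asarray(variances, dtype=float)
    m = len(estimates)
    point = estimates.mean()                 # pooled point estimate
    within = variances.mean()                # average sampling variance
    between = estimates.var(ddof=1)          # variance across the analyses
    total_variance = within + (1 + 1 / m) * between
    return point, np.sqrt(total_variance)
```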

3.3.2 Measuring Educational Inequality

Ferreira and Gignoux (2011) described methods for measuring both inequality in achievement (which they saw as being expressed simply by the degree of variability in the outcome measure) and inequality in opportunity (for which they proposed, as a meaningful summary statistic, the share of variance explained by an OLS regression of students' test scores on a vector of individual circumstances). Another approach was used by Green et al. (2015) in an application using international adult skills surveys. Their measure was a "social origins gradient", representing the predicted point difference in scores for an individual when the education level of his or her parent(s) is increased from the bottom category to the top category (for example, from "less than high school" to "college education").

We opted for yet another approach, one that we believe is better suited to trend analysis of educational inequality. To answer the first research question, "How has the inequality of education outcomes due to family socioeconomic status changed for different education systems between 1995 and 2015?", we calculated the achievement gap between students in the low- and high-SES* groups over time, in terms of the average TIMSS achievement score. The larger the gap, the larger the role of SES* in determining educational outcomes.

In addition, we examined whether the changes in the achievement gap between high- and low-SES* students across years were statistically significant. Since these calculations are computationally quite demanding, we provided significance tests for changes in achievement gaps only between the following pairs of years: (1) 1995 versus 2003, (2) 2003 versus 2015, and (3) 1995 versus 2015. For example, to investigate whether the change in the gap between 1995 and 2003 was statistically significant, we estimated the following regression model:

$$ \widehat{Y_i}={\beta}_0+{\beta}_1\left({SES}^{\ast}\right)+{\beta}_2\left({Year}_j\right)+{\beta}_3\left({SES}^{\ast}\times {Year}_j\right) $$

Where \( \widehat{Y_i} \) is the predicted achievement score (mathematics or science) for student i in a given education system; β0 is the mean achievement score for low-SES* students in a given education system in 1995; β1 is the mean score difference between low- and high-SES* students in a given education system in 1995; and β2 is the coefficient for a categorical variable indicating the year of assessment. Because the reference group is 1995, this coefficient is the mean score difference between students who participated in 2003 and those who participated in 1995, after controlling for the other predictors. Finally, β3 is the coefficient for the interaction between SES* and assessment year. It reflects how much the achievement gap between low- and high-SES* students in 2003 differs from the achievement gap in 1995; the p-value for β3 therefore indicates whether the achievement gap in 2003 is statistically different from the achievement gap in 1995. Following the same logic, we conducted similar comparisons of the achievement gaps between 2003 and 2015, and between 1995 and 2015.
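For a single plausible-value/imputation pair, the model could be fit as in the following sketch. The column names (high_ses, year_2003, student_weight, math_pv1) are hypothetical; the published analysis was run in Stata with the PV module and the TIMSS jackknife replicate weights.

```python
import pandas as pd
import statsmodels.formula.api as smf

def gap_change_model(df: pd.DataFrame, pv: str = "math_pv1"):
    """Fit the interaction model for one plausible-value/imputation pair,
    using one education system's data restricted to 1995 and 2003 and to
    the low- and high-SES* groups. high_ses is a 0/1 indicator and
    year_2003 a 0/1 dummy (reference year 1995), so the coefficient on
    high_ses:year_2003 corresponds to beta_3, the change in the gap."""
    model = smf.wls(f"{pv} ~ high_ses + year_2003 + high_ses:year_2003",
                    data=df, weights=df["student_weight"])
    return model.fit()
```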

While trends in the SES* achievement gap are important, they can hide other important changes over time. For example, the gap may remain unchanged because neither group has changed over time, or because both the lower and upper groups have changed in the same direction over time. Because gaps can close or widen for different reasons, it is also important to examine how the most disadvantaged students are doing over time, as proposed by our second research question, "To what extent have education systems managed to increase the academic performance of disadvantaged students between 1995 and 2015?" To address this, we analyzed the trend in performance among low-SES* students in each education system from 1995 to 2015. Specifically, we tracked the percentage of low-SES* students who performed at or above the TIMSS international intermediate benchmark (that is, 475 points) in each education system over time, as illustrated in the sketch below.
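The benchmark tracking reduces to a weighted proportion per education system and year. A minimal sketch for one plausible-value/imputation pair follows, with hypothetical column names (low_ses, math_pv1, student_weight); the reported figures pool this over the five pairs.

```python
import numpy as np
import pandas as pd

def pct_at_intermediate_benchmark(df: pd.DataFrame, pv: str = "math_pv1",
                                  weight: str = "student_weight",
                                  benchmark: float = 475.0) -> float:
    """Weighted percentage of low-SES* students at or above the TIMSS
    intermediate benchmark for one education system, year, and
    plausible-value/imputation pair."""
    low = df[df["low_ses"]]                       # low_ses: boolean flag
    reached = (low[pv] >= benchmark).astype(float)
    return 100 * np.average(reached, weights=low[weight])
```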

3.3.3 Country-Level Indicators in the Educational Systems and the Macroeconomic Context

To better understand our findings in the larger context in which education systems operate, we obtained macroeconomic and other indicators from the TIMSS encyclopedias, as well as data for 1995 to 2015 from external sources. The external sources we consulted include the World Bank, the UNESCO Institute for Statistics, the CIA's World Factbook, the OECD Income Distribution Database, the World Inequality Database, and local education agencies (see Table 3.6). We used these data to interpret our findings against changes in the social context of each education system over the 20 years of TIMSS.

Table 3.6 Sources for country-level economic indicators