6.1 Introduction

In this chapter, we address our second research question: what is the relationship between different types of teacher quality and instructional metrics, and student achievement? Our study extends the work of Blomeke et al. (2016), who analyzed the TIMSS 2011 grade four data to explore the relationship between teacher quality variables and student outcomes. The analysis undertaken by Blomeke et al. (2016) incorporated constructs similar to those explored in Chap. 5 (teacher experience, self-efficacy, and educational preparation), and included receipt of professional development and teacher instructional practices. However, Blomeke et al. (2016) identified only a modest relationship between instructional quality and student mathematics achievement, and their analysis was limited to grade four data. In this chapter, we explore the extent to which opportunity to learn variables (time spent on teaching mathematics and content coverage) mediate the influence of instructional quality, using both grade four and grade eight TIMSS data.

6.2 Data

For the analyses in this chapter, we used the 2011 grade four and grade eight student data, and their mathematics teachers’ data, from the IEA’s Trends in International Mathematics and Science Study (TIMSS). TIMSS is an international assessment of student mathematics and science achievement at grade four and grade eight. A three-stage unequal probability sampling design was used to select the sample within each education system. For TIMSS 2011 at grade four, this resulted in an international sample from 58 education systems comprising 297,150 grade four students in 14,215 classrooms in 10,422 schools, and 14,576 mathematics teachers. At grade eight, 50 education systems participated in TIMSS 2011, resulting in a sample comprising 287,395 grade eight students in 11,688 classrooms in 9203 schools, and 13,190 mathematics teachers.

Several participating education systems did not include the questions on mathematics standards in their country curriculum questionnaires, or the questions on teacher quality or instructional quality in their teacher questionnaires. We therefore excluded these countries entirely from our analyses, as well as any benchmarking participants. In many countries, several teachers taught mathematics in more than one classroom, and several classrooms had more than one mathematics teacher. For the classrooms with more than one mathematics teacher, data from one teacher was selected at random for the analyses. Our final sample for grade four thus included 45 education systems comprising 233,583 grade four students nested in 11,153 classrooms in 8268 schools, and 11,083 mathematics teachers. Our final sample for grade eight included 40 education systems comprising 228,107 grade eight students in 9002 classrooms in 7353 schools, and 8887 mathematics teachers.

For most schools, only one classroom was selected, and thus there is no way to distinguish the classroom-level and school-level variances in student outcomes. We therefore opted to use a two-level model (i.e., students nested within classrooms) for each education system in this study, and did not undertake school-level analyses.

Readers should note that, in this chapter, student gender and several other variables are coded differently in order to match the operationalization of Blomeke et al. (2016), and hence different variable names may be employed.

6.3 Measures

Teacher instructional alignment with national curricular expectations (Alignment), time spent on teaching mathematics (Mathtime), and teacher experience (Exp) were operationalized as described in Chap. 3. The TIMSS 2011 mathematics score was used as a measure of student achievement. Several additional measures were included in Blomeke et al.’s (2016) grade four study and are reproduced here for both grades four and eight, including teacher participation in professional development and a latent measure of instructional quality. In order to most closely mimic the Blomeke et al. (2016) study, we made only minor changes to the operationalization of student covariates and self-reported preparation to teach mathematics. As we discuss later, in several instances there was fairly low internal reliability in the constructs used, and so we modified the model design accordingly.

6.3.1 Professional Development

We considered three aspects of teachers’ participation in professional development. The first was professional development in mathematics (PDM), which measured whether a teacher participated in professional development activities associated with mathematics content, pedagogy/instruction, and curriculum. For each activity, participation was coded as 1, and no participation was coded as 0. The average score for the three activities was used to indicate a teacher’s level of participation in professional development for mathematics teaching.

The second aspect was professional development in mathematics instruction (PDS), which measured whether a teacher participated in professional development activities associated with integrating information technology into mathematics, mathematics assessment, and addressing individual students’ needs. Each activity was scored as before. The average score for these three activities was used to indicate a teacher’s participation in mathematics instruction professional development.

The third aspect was professional development in collaboration (COL), assessed using five items: (1) “I am content with my profession as a teacher”, (2) “I am satisfied with being a teacher at this school”, (3) “I had more enthusiasm when I began teaching than I have now”, (4) “I do important work as a teacher”, and (5) “I plan to continue as a teacher for as long as I can”. Response options were “agree a lot”, “agree a little”, “disagree a little”, and “disagree a lot”, coded as 1, 2/3, 1/3, and 0, respectively. The average score of these five items was used to indicate a teacher’s participation in professional development in collaboration. Internal consistency coefficients (Cronbach’s alpha) varied across countries, from 0.36 to 0.63 for grade four mathematics teachers, and from 0.21 to 0.64 for grade eight mathematics teachers. These low coefficients suggest that the three aspects of professional development may not measure similar constructs.
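The scoring rules above can be sketched as follows; the function and variable names are illustrative, not the TIMSS source labels:

```python
# Sketch of the professional-development scoring described above.
# Names are illustrative, not the TIMSS source labels.

COL_CODES = {
    "agree a lot": 1.0,
    "agree a little": 2 / 3,
    "disagree a little": 1 / 3,
    "disagree a lot": 0.0,
}

def pd_score(participated):
    """Average of 0/1 participation flags (PDM or PDS: three activities)."""
    flags = [1 if p else 0 for p in participated]
    return sum(flags) / len(flags)

def col_score(responses):
    """Average of the five recoded collaboration items."""
    coded = [COL_CODES[r] for r in responses]
    return sum(coded) / len(coded)

# A teacher who attended content and pedagogy PD but not curriculum PD:
print(round(pd_score([True, True, False]), 3))  # 0.667
```

Each indicator thus falls on a 0–1 scale, which keeps the three aspects directly comparable when used as separate predictors later in the chapter.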

6.3.2 Teacher Education

Teachers’ teaching experience was measured in years (Year). A teacher’s highest level of formal education was coded as 1 if it was ISCED Level 5A or higher, and 0 otherwise. A teacher’s major was coded as 1 if it was mathematics or mathematics education, and 0 otherwise.

6.3.3 Preparation to Teach Math

Teachers were asked to indicate “how well prepared do you feel you are to teach the following mathematics topics”. Response categories were “not applicable”, “very well prepared”, “somewhat prepared”, and “not well prepared”. If a topic was indicated as “not applicable”, the response was recoded as missing data, and the responses of “very well prepared”, “somewhat prepared” and “not well prepared” were coded as 1, 0.5, and 0, respectively. At grade four, we assessed this using eight topics related to number (NUM), seven topics related to geometric shapes and measures (GEO), and three topics related to data display (DAT). At grade eight, we assessed this using five topics related to number (NUM), five topics related to algebra (ALG), six topics related to geometry (GEO), and three topics related to data and chance (DAT). For each mathematics domain at each grade, the average score across the topics was used as the measure of a teacher’s preparedness to teach in that domain. The internal consistency coefficients varied across countries, from 0.61 to 0.91 at grade four, and from 0.52 to 0.96 at grade eight.
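As a sketch, the recoding and domain averaging described above might look like this (names are illustrative; `None` stands in for missing data):

```python
# Sketch of the preparedness recoding: "not applicable" becomes missing,
# and the domain score is the mean of the remaining recoded topics.

PREP_CODES = {
    "very well prepared": 1.0,
    "somewhat prepared": 0.5,
    "not well prepared": 0.0,
    "not applicable": None,  # treated as missing data
}

def domain_preparedness(responses):
    """Mean over non-missing topic responses; None if all are missing."""
    coded = [PREP_CODES[r] for r in responses]
    valid = [c for c in coded if c is not None]
    return sum(valid) / len(valid) if valid else None

# Example: three data-display topics, one marked not applicable.
score = domain_preparedness(
    ["very well prepared", "not applicable", "somewhat prepared"])
# score == 0.75
```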

6.3.4 Instructional Quality

We used three constructs to assess teachers’ instructional quality. Teachers were asked “How often do you do the following in teaching this class?” The first construct concerned the clarity of instruction (CI), which was measured with two items: (1) “summarize what students should have learned from the lesson”, and (2) “use questioning to elicit reasons and explanations”. The second construct concerned cognitive activation (CA), which was measured by: (1) “relate the lesson to students’ daily lives”, and (2) “bring interesting materials to class”. The third construct was supportive climate (SC), which was measured by: (1) “encourage all students to improve their performance”, and (2) “praise students for good effort”. The response categories were “every or almost every lesson”, “about half the lessons”, “some lessons”, and “never”, which were coded as 1, 2/3, 1/3, and 0, respectively. The average score of the items was used for each construct. The internal consistency coefficients of instructional quality varied across countries, from 0.20 to 0.76 at grade four, and from 0.21 to 0.73 at grade eight.
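The internal consistency figures reported throughout this section are Cronbach’s alpha values. A minimal, dependency-free version of the computation (assuming complete data and equal-length item vectors) is:

```python
# Sketch of Cronbach's alpha for a construct's items, as used to gauge
# the internal consistency figures reported in this chapter.

def cronbach_alpha(items):
    """items: list of equal-length score lists, one per item (no missing)."""
    k = len(items)
    n = len(items[0])

    def var(xs):  # population variance
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / len(xs)

    sum_item_vars = sum(var(it) for it in items)
    totals = [sum(it[i] for it in items) for i in range(n)]
    return k / (k - 1) * (1 - sum_item_vars / var(totals))

# Two perfectly consistent items yield alpha = 1.0:
alpha = cronbach_alpha([[1, 0, 1, 0], [1, 0, 1, 0]])  # 1.0
```

With only two items per construct, as for CI, CA, and SC here, alpha is effectively a function of the single inter-item correlation, which helps explain the low values observed in some countries.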

6.3.5 Student-Level Covariates

Student gender (Gender) and number of books at home (Book) were used as student-level covariates. For student gender, girls were coded as 1, and boys were coded as 0. For the number of books at home, “none or few” was coded as 0, “one shelf” as 0.25, “one bookcase” as 0.5, “two bookcases” as 0.75, and “three or more bookcases” as 1.
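A minimal sketch of this covariate coding (function and category labels are illustrative):

```python
# Sketch of the student-level covariate coding described above.

BOOK_CODES = {
    "none or few": 0.0,
    "one shelf": 0.25,
    "one bookcase": 0.5,
    "two bookcases": 0.75,
    "three or more bookcases": 1.0,
}

def code_student(gender, books):
    """Return (Gender, Book): girl = 1, boy = 0; books on a 0-1 index."""
    return (1 if gender == "girl" else 0, BOOK_CODES[books])

print(code_student("girl", "two bookcases"))  # (1, 0.75)
```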

6.4 Analysis

The direct effects of instructional alignment and time on student mathematics achievement, and the indirect effect of instructional quality through instructional alignment and time were examined using multilevel structural equation modeling (see Fig. 6.1).

Fig. 6.1
figure 1

Model of hypothesized relationships at (a) grade four and (b) grade eight. Notes PD = professional development; PDM = professional development in mathematics, PDS = professional development in mathematics instruction, COL = professional development in collaboration, PRE = teacher’s preparedness to teach mathematics, NUM = preparedness to teach number, GEO = preparedness to teach shapes and measures, DAT = preparedness to teach data and chance, INQ = instructional quality, CI = clarity of instruction, CA = cognitive activation, SC = supportive climate, Major = major in mathematics, Degree = ISCED level, Exp = experience, Alignment = alignment with national content expectations, Mathtime = time spent on teaching mathematics, Gender (female = 1, male = 0), Book (1–0 index of books in the home), Performance = TIMSS math score

We first examined the factor structures of professional development, preparedness to teach mathematics, and instructional quality at each grade, country-by-country. We used a multiple group approach to examine the configural invariance of the factors across countries, that is, whether the same latent construct was represented by the same indicators in each country. Next, we tested metric invariance (i.e., whether the same indicator showed the same factor loading on the latent construct across countries) for each latent construct at each grade. Note that the results are only directly comparable across countries when there is measurement invariance across countries.
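The metric invariance check rests on a chi-square difference (likelihood ratio) test between the constrained and configural models. A sketch using SciPy follows; it shows the plain, unscaled difference test only, whereas a scaled version would be needed with robust estimators:

```python
# Sketch of the chi-square difference test used to compare the
# metric-invariant model against the configural base model.
from scipy.stats import chi2

def delta_chisq_test(d_chisq, d_df):
    """p-value for a chi-square difference of d_chisq on d_df df."""
    return chi2.sf(d_chisq, d_df)

# Values from the grade four preparedness-to-teach comparison
# reported in Sect. 6.5.2: delta chi-square = 224.19 on 88 df.
p = delta_chisq_test(224.19, 88)
print(p < 0.001)  # True
```

A small p-value leads to rejecting equal loadings across countries, which is what motivated the country-by-country modeling reported later.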

Second, we fitted a two-level model without instructional alignment and time, country-by-country, at each grade. Student-level covariates were centered about their grand means.
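Grand-mean centering simply subtracts the overall mean from each student’s value; a minimal, unweighted sketch (the actual analyses incorporated sampling weights):

```python
# Minimal sketch of grand-mean centering a student-level covariate
# (unweighted for simplicity; the analyses in this chapter used
# sampling weights).

def grand_mean_center(values):
    """Subtract the overall (grand) mean from every value."""
    grand_mean = sum(values) / len(values)
    return [v - grand_mean for v in values]

# Book index values for three students:
print(grand_mean_center([0.0, 0.5, 1.0]))  # [-0.5, 0.0, 0.5]
```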

Finally, we constructed the hypothesized model for each grade and each country to examine the direct effects of instructional alignment and time, as well as the indirect effect of instructional quality through instructional alignment and time. The model was also fitted country-by-country.

We conducted factor analyses of professional development, preparedness to teach, and instructional quality with teachers as the analysis units, and used teacher sampling weights in the analyses. For the multilevel structural equation model, the nested data structure (i.e., students nested within classrooms) was taken into consideration, robust standard errors were computed, and student and teacher sampling weights were used in the analyses. Model fit was evaluated using a likelihood ratio test and model fit indices.

6.5 Results

6.5.1 Descriptive Statistics

Average mathematics achievement scores for each education system were between 245.94 and 606.27 at grade four, and between 332.76 and 611.10 at grade eight. The intraclass correlation coefficients (ICCs) for student mathematics achievement were between 0.07 and 0.56 at grade four, and between 0.09 and 0.79 at grade eight, indicating a need to explore both individual- and classroom-level effects on student mathematics achievement.
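The ICCs reported here follow from the two-level variance decomposition: the classroom-level share of the total variance. As a sketch, with hypothetical variance components:

```python
# Sketch of the intraclass correlation coefficient from a two-level
# variance decomposition: ICC = between-classroom variance / total.

def icc(between_var, within_var):
    """Share of total outcome variance lying between classrooms."""
    return between_var / (between_var + within_var)

# E.g., a between-classroom variance of 1600 and a within-classroom
# variance of 6400 (hypothetical values) give an ICC of 0.2.
print(icc(1600.0, 6400.0))  # 0.2
```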

Instructional alignment in mathematics was between 0.34 and 0.66 at grade four, and between 0.26 and 0.84 at grade eight. In other words, on average, grade four mathematics teachers covered about 34–66% of their country’s grade-level mathematics standards, and grade eight mathematics teachers covered about 26–84%. Grade four teachers in Poland reported the poorest alignment with national mathematics standards, while grade four teachers in Australia reported the greatest percentage alignment. Similarly, at grade eight, mathematics teachers in Bahrain reported the poorest alignment with national standards, while teachers in Japan reported the greatest percentage alignment. On average, grade four teachers taught mathematics for 186–429 min per week, and grade eight teachers for 161–316 min per week. At grade four, mathematics teachers in Denmark spent the least time on mathematics, and teachers in Portugal the most. At grade eight, mathematics teachers in Japan spent the least time, and mathematics teachers in Lebanon the most. We noted that some countries that scored highly for instructional alignment did not report similarly high average instructional time on mathematics, and vice versa. Grade four teachers in Poland showed the least alignment with national mathematics standards, but their average time spent on mathematics was similar to the international average. Grade eight teachers in Japan taught about 84% of their national grade-level mathematics standards, yet spent the least reported time on mathematics teaching. This suggests that instructional alignment and time spent teaching mathematics may play different roles in student achievement.

6.5.2 Measures of Professional Development, Preparedness to Teach, and Instructional Quality

The factor structures of professional development, preparedness to teach mathematics, and instructional quality were first examined country-by-country for each grade.

The one-factor model of professional development was built with three indicators: professional development in mathematics (PDM), professional development in mathematics instruction (PDS), and professional development in collaboration (COL). This model failed to converge in 15 countries (Australia, Croatia, Finland, Georgia, Germany, Honduras, Kuwait, New Zealand, Saudi Arabia, Slovak Republic, Slovenia, Sweden, United Arab Emirates, United States, and Yemen) at grade four, and in 14 countries (Australia, Chile, Ghana, Hong Kong, Iran, Japan, Korea, Lithuania, Morocco, Norway, Slovenia, Thailand, Tunisia, and Ukraine) at grade eight. These convergence problems were caused by the low correlations of COL with the other two indicators in these countries. Even in countries with no convergence problems, the factor loadings of COL on the latent construct were <0.2 or even negative. Together, this indicates that COL measured a different construct from that measured by PDM and PDS. To address these convergence problems, we used the three variables as separate predictors in a revised model instead of constructing a latent construct of professional development (Fig. 6.2).

Fig. 6.2
figure 2

Model of revised relationships for (a) grade four, and (b) grade eight. Notes PD = professional development; PDM = professional development in mathematics, PDS = professional development in mathematics instruction, COL = professional development in collaboration, PRE = teacher’s preparedness to teach mathematics, NUM = preparedness to teach number, GEO = preparedness to teach shapes and measures, DAT = preparedness to teach data and chance, INQ = instructional quality, CI = clarity of instruction, CA = cognitive activation, SC = supportive climate, Major = major in mathematics, Degree = ISCED level, Exp = experience, Alignment = alignment with national content expectations, Mathtime = time spent on teaching mathematics, Gender (Female = 1, Male = 0), Book (1–0 index of books in the home), Performance = TIMSS math score

The one-factor model of preparedness to teach was constructed from three indicators at grade four: preparedness to teach number (NUM), shapes and measures (GEO), and data display (DAT). At grade eight there were four indicators: preparedness to teach number (NUM), algebra (ALG), geometry (GEO), and data and chance (DAT). The model converged in all countries at both grades. A multiple-group approach was used to build the factor model for all countries at each grade. To assess metric invariance, the factor loadings of the indicators were constrained to be equal across countries, and a likelihood ratio test (LRT) was used to examine whether the metric-invariant model differed significantly from the base model. At grade four, we found that \( \Delta \chi^{2} = 224.19, \Delta df = 88 \), which was statistically significant (p < 0.001). At grade eight, we found that \( \Delta \chi^{2} = 259.77, \Delta df = 117 \), which was also statistically significant (p < 0.001). In other words, the factor loadings of the indicators of preparedness to teach differed significantly across countries at both grades.

The one-factor model of instructional quality was constructed from three indicators at both grades: clarity of instruction (CI), cognitive activation (CA), and supportive climate (SC). The one-factor model did not converge for the grade four data from Northern Ireland, or for the grade eight data from Italy, indicating that CI, CA, and SC measure different constructs in these two sets of data. Multiple-group models were applied to the remaining countries at each grade to examine metric invariance. At grade four, we found that \( \Delta \chi^{2} = 218.79, \Delta df = 86 \), which was statistically significant (p < 0.001). At grade eight, we found that \( \Delta \chi^{2} = 168.95, \Delta df = 76 \), which was also statistically significant (p < 0.001). The factor loadings of the indicators of instructional quality thus differed significantly across countries at both grades.

6.5.3 Effect of Teacher Quality and Instructional Quality on Student Mathematics Achievement

As the factor analyses suggested that PDM, PDS, and COL measured different latent constructs in many countries, the three indicators were used directly as predictors in the models. As the factor loadings of preparedness to teach and instructional quality differed across countries, the models to explore the effects of teacher quality and instructional quality were built country-by-country rather than in a multiple-group fashion.

The model did not converge for the grade four data from Australia, Malta, the Netherlands, Romania, or England, or for the grade eight data from the Syrian Arab Republic or Tunisia (see Footnote 1; Tables 6.1 and 6.2). In most countries, the chi-square statistics were statistically significant, which is common with large sample sizes. In countries with no convergence problems, root mean square errors of approximation (RMSEAs) were generally <0.02, standardized root mean square residuals (SRMRs) for the within and between levels were <0.08, and the comparative fit index (CFI) and Tucker-Lewis index (TLI) were >0.80.

Table 6.1 Model fit of the grade four country-by-country models of the effects of teacher and instructional quality
Table 6.2 Model fit of the grade eight country-by-country models of the effects of teacher and instructional quality

Standardized coefficients for the effects of instructional quality and teacher quality on student mathematics achievement, for the countries with no convergence problems, are shown in Tables 6.3 and 6.4. At grade four, at least one of the professional development indicators showed significant effects on instructional quality in 16 countries. At grade eight, at least one indicator of professional development showed a significant and positive effect on instructional quality in 24 out of the 38 countries. In these countries, the effects of the professional development indicators varied from 0.2 to 0.4: a teacher’s score for professional development activities in mathematics, mathematics instruction, and collaboration was directly and positively linked to instructional quality. However, these three professional development indicators showed weak relationships with student mathematics achievement at both grades four and eight. Their direct effects on student mathematics achievement were significantly positive in only eight out of 40 countries at grade four, and seven out of 38 countries at grade eight. In most countries, teachers’ participation in professional development activities did not have any significant direct effect on student mathematics achievement.

Table 6.3 Effects of teacher quality and instructional quality at grade four
Table 6.4 Effects of teacher quality and instructional quality at grade eight

The effects of preparedness to teach on instructional quality were significant in 15 countries at grade four, and in 14 countries at grade eight, with effect sizes ranging from 0.2 to 0.5. The better prepared teachers felt to teach mathematics topics, the better their instructional quality. In contrast, the direct effects of preparedness to teach on student mathematics achievement were non-significant in most countries, being significant and positive in only two out of 40 countries at grade four, and nine out of 38 countries at grade eight.

The three teacher education background indicators (experience, degree, and major) affected instructional quality significantly and positively in nine out of 40 countries at grade four, and in three out of 38 countries at grade eight. In comparison, their direct effects on student mathematics achievement were significant in more countries: significantly positive in 13 countries at grade four, and in 14 countries at grade eight. In these countries, teachers with more experience, a higher degree, and a major in mathematics had more positive effects on student mathematics achievement. However, in many countries, teachers’ experience, degree, and major did not have any significant impact on their instructional quality.

The direct effects of instructional quality on student mathematics achievement were significant and positive in only two out of 40 countries at grade four, and in seven out of 38 countries at grade eight. Consistent with these generally non-significant effects, the indirect effects of teacher quality indicators through instructional quality were also non-significant in most countries. Significant indirect effects were found in only two countries at grade four, and in five countries at grade eight. In Korea, professional development in collaboration and preparedness to teach showed significant and positive indirect effects through instructional quality on student mathematics achievement at grade eight. When the number of professional development activities in mathematics instruction undertaken by a grade four teacher in Lithuania increased, their instructional quality and the average student mathematics achievement level in their classroom also increased. In Oman, the effects of professional development in mathematics and degree were mediated by instructional quality at grade four, and the effects of professional development in mathematics instruction, preparedness to teach, and degree were mediated by instructional quality at grade eight. At grade eight, the effects of professional development in collaboration were mediated by instructional quality in Romania, the effects of professional development in mathematics instruction were mediated by instructional quality in Saudi Arabia, and the effects of preparedness to teach were mediated by instructional quality in Macedonia.

6.5.4 Effect of Instructional Alignment and Teaching Time on Student Mathematics Achievement

Building on the models outlined in Sects. 6.3 and 6.4, the final model added instructional alignment and classroom time spent on teaching mathematics. For each country, we examined both the direct and mediating effects of instructional alignment and time spent on teaching mathematics on student mathematics achievement. These models were built only for countries where no convergence problems were identified in the previous models. All models converged (Tables 6.5 and 6.6). As in the previous models, the chi-square statistics were statistically significant in most countries. RMSEAs were generally <0.02, SRMRs for the within and between levels were <0.08, and CFI and TLI were >0.80. However, the model did not fit the grade eight data for Georgia well.

Table 6.5 Model fit of the grade four country-by-country models of the effects of instructional alignment and time on task
Table 6.6 Model fit of the grade eight country-by-country models of the effects of instructional alignment and time on task

The effects of instructional alignment and time spent teaching mathematics on student achievement were examined after controlling for teacher and student characteristics (Tables 6.7 and 6.8). Teachers’ instructional quality showed significant and positive effects on teachers’ instructional alignment at grade four in Croatia, Iran, New Zealand, and Yemen, and at grade eight in Indonesia, Saudi Arabia, Ukraine, and the United States. In these countries, teachers with higher instructional quality also reported better alignment with national mathematics standards. Teachers’ instructional alignment showed significantly positive effects on student mathematics achievement at grade four in Denmark, Georgia, and Germany, and at grade eight in England, Indonesia, Italy, Lebanon, New Zealand, and Norway. Greater alignment between teachers’ instruction and national mathematics standards was directly and positively associated with higher student achievement scores. Taking the effects of instructional quality on instructional alignment together with the effects of instructional quality on student mathematics achievement, instructional alignment showed a significant mediating effect on the relationship between instructional quality and student mathematics achievement only at grade eight in Indonesia.

Table 6.7 Effects of instructional alignment and time at grade four
Table 6.8 Effects of instructional alignment and time in grade eight

Teachers’ instructional quality showed significant and positive effects on their instructional time on mathematics at grade four in Croatia, Hungary, New Zealand, and Saudi Arabia, and at grade eight in the Palestinian National Authority and Thailand. Teachers with higher instructional quality spent more time on mathematics. Teachers’ instructional time on mathematics showed significantly positive effects on student mathematics outcomes at grade four in Bahrain, Iran, and the United Arab Emirates, and at grade eight in Chile, Chinese Taipei, Japan, Jordan, Morocco, New Zealand, Thailand, and Ukraine. In these countries, the more time teachers spent on mathematics, the higher their students’ mathematics achievement. Taking the effects of instructional quality and instructional time together, instructional time on mathematics showed a significant mediating effect on the relationship between instructional quality and student mathematics achievement only in the grade eight data for Thailand.

6.6 Discussion

In the original Blomeke et al. (2016) study, the researchers found that, although the latent construct of instructional quality was influenced by professional development, experience, and sense of preparedness, instructional quality was only weakly related to student outcomes at grade four. We adapted the model developed by Blomeke et al. (2016) and applied it to the same cycle of TIMSS data (2011) to explore in greater detail the direct, indirect, and mediating effects of teacher effectiveness measures. The aim was to test whether the inclusion of opportunity to learn variables would strengthen the overall model and influence the statistical impact of instructional quality. The direct effects of instructional alignment and time on student mathematics achievement, and their mediating effects on the relationship between instructional quality and student mathematics achievement, were found to be positive and significant in only a small number of countries. As noted in Chap. 5, instructional alignment and time spent on teaching mathematics had a limited and inconsistent relationship to student outcomes, even after including additional teacher-related variables, such as receipt of professional development and the latent construct of instructional quality, which our model treats as capturing pedagogical quality, as distinct from opportunity to learn.

The findings here are consistent with those in Chap. 5, but are in stark contrast with the results of previous analyses using PISA data (see Jerrim et al. 2017; Schmidt et al. 2015). These disparate outcomes could be the result of the different designs of the two studies. TIMSS samples intact classrooms, whereas PISA samples 15-year-olds across different classrooms and grades. As a result, it is difficult to distinguish school or within-school/between-classroom effects in TIMSS studies. In addition, the measure of content coverage differs between the two sets of studies: the Schmidt et al. (2015) work (using PISA) measured curricular intensity as opposed to alignment with national standards (the TIMSS measure). Further, the measure of opportunity to learn in TIMSS is teacher reported, while the comparable PISA variable is student reported. All of these factors could account for the differing ability of the present analysis to replicate the results from PISA studies, and raise the possibility that the impact of time spent on teaching mathematics or instructional content is influenced by study design.

A final note relates to the issue of international comparability. The varying relationships among indicators of teacher quality across education systems (measured by factor loadings and structural equation model structure) reinforce the challenge of merging data from multiple countries into a general global model of teacher effects on student learning. Our results suggest that the interrelationship of teacher factors to one another, and to student mathematics learning, is conditioned by cultural and national policy contexts, and that additional measures need to be included to identify the source of these differences.