Seven ways to make improving teacher evaluation worth the work

Getting teacher evaluation right has the potential to drive significant improvements. The trouble is that so few places—so far—have gotten it right.

To the surprise of few, a recent working paper from the Annenberg Institute at Brown University found almost no positive impact from the teacher evaluation reforms that occurred as a result of a major push by the U.S. Department of Education under the Obama Administration. Those findings have been getting a lot of attention, as some consider the paper to have provided conclusive evidence that these teacher evaluation reforms were ill-advised from the start, while others assert that the commitment on the part of states and districts was half-hearted all along (some may say they were arm-twisted by the feds), and therefore doomed to produce few positive outcomes.

Will this paper be the death knell for rigorous and meaningful teacher evaluation systems? It's too soon to say, but it's worth pointing out that the paper never asserts that these systems can't improve outcomes for both teachers and students, only that it appears remarkably hard to do so. In fact, the authors (Joshua Bleiberg, Eric Brunner, Erica Harbatkin, Matt Kraft, and Matthew Springer) fully acknowledge that some states and school districts did achieve great results; there just weren't many of them. 307

One thing is for sure: many of the states and districts in the study weren't paying close attention to the key principles a number of research studies established as necessary to produce positive outcomes. It's likely that these places where teacher evaluation reforms failed did not adhere to all of these research-based principles, either choosing to overlook them entirely or making compromises that effectively neutralized their capacity to do good.

These key principles remain as relevant now as ever, and we spotlight seven of them here.

1. Measure what matters, looking at multiple and frequent measures of teacher performance.

In 2018, NCTQ highlighted four large school districts and two states where evaluation reforms had led to improvement in teacher quality, the positive results confirmed by the Annenberg working paper. Something all six systems had in common was that each annually evaluated all teachers using both objective and subjective measures, as opposed to the widespread practice by states and districts of exempting large numbers of teachers from yearly evaluation, only using subjective measures (such as teacher observation scores), or not giving significant weight to student learning.

What does this look like in practice?

Require evaluation of all teachers, including experienced or tenured teachers, each year. Annual evaluations provide all teachers with the regular feedback they need to improve, and result in the data needed to make informed personnel decisions (e.g., teacher leadership roles).
Incorporate multiple measures of teacher performance 308 — including objective measures — into each evaluation to support accuracy and stability of scores over time. These measures can include student growth measures from standardized assessments, classroom observations using a clearly defined rubric, and student surveys (which studies have found to be highly reliable). 309 There is also research to support including other student outcomes, such as attendance, as means of measuring teacher performance. 310 Student learning objectives (SLOs) are another option; however, research suggests they must be standardized across classrooms and require extensive training and oversight to be reliable. 311

Figure: Combining multiple measures of teacher performance improves predictive power and reliability of evaluation scores

Evaluation scores that combine multiple measures, including student achievement gains, classroom observations, and student surveys, have higher predictive power and reliability than any single measure alone. Source: Figure 1: Combining Strengths, from Kane, T. (2012). Capturing the Dimensions of Effective Teaching. Education Next. https://www.educationnext.org/capturing-the-dimens. .

Iterate on the system by monitoring outcomes and incorporating teacher and administrator feedback. The initial timing, components, or weighting of those components may need to be adjusted over time to ensure the evaluation system is working for administrators and teachers. Two places that have found sustained success with their systems (Washington, D.C. and Tennessee) both made changes after implementation based on educator feedback, such as decreasing the percentage of a teacher's evaluation that is based on student test scores. 312
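
As a rough illustration of how multiple measures might be combined into one evaluation score, the Python sketch below standardizes each measure and takes a weighted average. The weights, measure names, and sample scores are all hypothetical; they are not drawn from any of the systems discussed here, and a real system would set and revisit its weights with educator input.

```python
from statistics import mean, pstdev

# Hypothetical weights for illustration only; not any district's actual formula.
WEIGHTS = {"student_growth": 0.35, "observation": 0.50, "student_survey": 0.15}


def standardize(values):
    """Convert raw scores to z-scores so measures on different scales are comparable."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def composite_scores(measures, weights=WEIGHTS):
    """Return one weighted composite per teacher.

    measures maps a measure name to a list of raw scores,
    one per teacher, in the same teacher order for every measure.
    """
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    z = {name: standardize(measures[name]) for name in weights}
    n_teachers = len(next(iter(z.values())))
    return [
        sum(weights[name] * z[name][i] for name in weights)
        for i in range(n_teachers)
    ]


# Made-up scores for three teachers.
measures = {
    "student_growth": [0.10, -0.05, 0.20],
    "observation": [3.2, 2.8, 3.6],
    "student_survey": [4.1, 3.9, 4.4],
}
scores = composite_scores(measures)
```

Keeping the weights in one place makes it easier to adjust them later, for example lowering the share based on student test scores after educator feedback.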

Figure: Common practices of teacher evaluation systems that saw improvements in teacher quality and student outcomes. Source: Making a Difference: Six Places Where Teacher Evaluation Systems are Getting Results

2. Pay careful attention to who is doing the evaluating.

Studies over the last decade have shown that teachers both perceive evaluations to be more meaningful and see greater improvement in their practice when those doing the evaluating have been trained on the observation rubric, have more experience in and knowledge of the setting where teachers are being observed, and are familiar with the content their evaluees are teaching. If an evaluation system is going to provide the feedback teachers trust and need to improve, school and district leaders should:

Make sure evaluators and/or observers get comprehensive training on the observation rubric. Research from the U.S. Department of Education's Institute of Education Sciences (IES) and the Bill & Melinda Gates Foundation's Measures of Effective Teaching (MET) study found that teacher observation scores were more reliable and student learning improved when teachers were observed by evaluators who had been trained on the observation rubric. A study of teacher evaluations in Chicago also found that implementing a new, robust teacher evaluation system only corresponded with improvements in student outcomes when observers received "extensive" training and support on the new rubric. 313
Prioritize evaluators with more experience in the school and setting. A new study that looked at 4,800 teachers matched with 350 evaluators at over 100 schools found that teachers rated the feedback they received as more impactful when the evaluator providing it had more experience and longer tenure at their school. 314
Pair teachers with evaluators who know their subject area. Surveys have also found that teachers are more likely to perceive feedback as valuable and to improve their instructional practice when their evaluators had relevant content-area expertise. 315 Likewise, principals report feeling less effective as evaluators when they don't have subject-specific knowledge relevant for the teacher they are evaluating. 316
Consider using high-performing peer observers. School leaders are not the only ones who can observe and provide feedback to teachers; teachers also respond well to observations and feedback from their peers, particularly those with more experience and expertise in their grade or subject. 317 More support for the idea of incorporating peer observations and feedback into evaluation systems comes from another recent study of about 100 teachers that paired high-performing teachers with low-performing teachers, finding that the students of the low-performers saw greater academic gains when their teacher was paired with a high-performing mentor. 318

3. Consider using video observations.

Two of the oft-noted difficulties in conducting multiple classroom observations with each teacher are 1) the time commitment it requires from observers and 2) the strain it can place on in-school professional relationships. Luckily, research that examined the observations, feedback, and attitudes towards that feedback for over 400 teachers found that video observations could be a solution. When teachers recorded videos of their lessons and later watched and reflected upon these videos with their observers, it helped to alleviate time constraints for administrators and supported more effective feedback discussions, with more positive perceptions of the feedback and process. It also was associated with improved retention for the teachers who used the video observations! 319

As more research conducted during the pandemic gets published, we should be seeing additional guidance about the effective and ineffective ways to use video in evaluation.

4. Address bias in the system head-on, iterating to make improvements.

Teacher evaluations, particularly observation scores, may be subject to racial and gender biases. An important way to mitigate bias is through including multiple measures of performance in teachers' summative evaluation scores, but these biases can still exist and can result in unjust outcomes, particularly for teachers of color and for teachers of students of color.

New research out of Chicago at first glance does not seem hopeful; a study found significant bias against Black teachers in observations and that the bias was explained by which students these Black teachers are more likely to teach: students from low socioeconomic levels, with lower levels of achievement in reading, and with higher frequency of misconduct. 320 But the upside is that it suggests a way to mitigate these biases. In addition to including other measures of teacher quality to offset potential bias in observation scores, the researchers recommend that education leaders in districts with this kind of problem statistically adjust observation scores to account for student characteristics, just as a teacher's contributions to student learning are adjusted in a value-added measure.
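
To make the idea of a statistical adjustment concrete, here is a minimal Python sketch that removes the part of each observation score that is linearly predicted by one classroom characteristic. It is a deliberate simplification: the study's actual adjustment method is not described here, a real adjustment would account for several student characteristics at once, and the covariate name and all numbers below are hypothetical.

```python
from statistics import mean


def adjust_observation_scores(scores, covariate):
    """Remove the part of each score linearly predicted by a classroom covariate.

    scores:    raw observation scores, one per teacher.
    covariate: one classroom characteristic per teacher (hypothetical here:
               the share of students from low-income households).
    Returns scores re-centered as if every teacher taught an average classroom.
    """
    x_bar, y_bar = mean(covariate), mean(scores)
    sxx = sum((x - x_bar) ** 2 for x in covariate)
    if sxx == 0:
        return list(scores)  # no variation in the covariate, nothing to adjust
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(covariate, scores))
    slope = sxy / sxx  # ordinary least squares slope of score on covariate
    return [y - slope * (x - x_bar) for x, y in zip(covariate, scores)]


# Made-up data: raw scores fall as the low-income share rises.
raw = [3.4, 3.1, 2.8, 2.5]
low_income_share = [0.2, 0.4, 0.6, 0.8]
adjusted = adjust_observation_scores(raw, low_income_share)
```

In this toy data the raw scores are a perfectly linear function of the covariate, so the adjustment removes the entire gap and every teacher ends up at the overall average. Real data would leave differences between teachers after the adjustment.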

Another, less complex way to mitigate bias in observation scores is to require multiple observations by different observers for each teacher, which can make them a more reliable and accurate measure. 321

5. Tie results of observations and evaluations directly to each teacher's own customized professional development.

In the six exemplar teacher evaluation systems NCTQ analyzed, each tied the professional development a teacher should pursue to her evaluation results, as opposed to giving teachers open-ended choices. This finding is further supported by a meta-analysis of the effectiveness of performance pay systems that found that performance pay programs that are paired with professional development result in significantly higher student gains than those that are not. 322

6. Pay great teachers more. A lot more.

For a teacher evaluation system to have a major impact on teacher quality and student learning, it needs to significantly reward high-performing teachers, encouraging them to continue teaching. That is a clear takeaway from a recent meta-analysis of over 40 research studies, which showed positive effects on student achievement, particularly in math, when individual teachers were eligible for performance-based bonuses. Importantly, more significant gains for students were found when the annual incentives for high-performing teachers were above 7.5% of their base pay (or, nationally, on average $5,000 a year). 323 Other research also suggests that 7% of teachers' base pay would be the minimum to be effective, whereas the most effective performance pay bonus should be 14%. 324
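
A district could check its own incentives against the 7.5% threshold with simple arithmetic, sketched below in Python. The salary and bonus figures are hypothetical; only the 7.5% threshold comes from the research cited above.

```python
# Threshold cited in the text: incentives above 7.5% of base pay.
MIN_SHARE_OF_BASE_PAY = 0.075


def bonus_share(base_pay, bonus):
    """Return the bonus as a fraction of base pay."""
    return bonus / base_pay


def clears_threshold(base_pay, bonus, threshold=MIN_SHARE_OF_BASE_PAY):
    """True when the bonus is above the given share of base pay."""
    return bonus_share(base_pay, bonus) > threshold


# Hypothetical teacher with a $60,000 base salary.
small_bonus_ok = clears_threshold(60_000, 3_000)  # 5% of base pay
large_bonus_ok = clears_threshold(60_000, 5_000)  # about 8.3% of base pay
```

Here the $3,000 bonus falls short of the threshold while the $5,000 bonus clears it.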

Why does a significant monetary incentive for teachers make a difference for students? A review of 120 studies on teacher attrition found that not only were more robust teacher evaluation systems associated with better retention of high-performing teachers, but participation in a performance pay system could decrease the probability of teachers (all teachers, not just high-performing ones) leaving the classroom by 24%, or nearly 15% in high-need schools. 325

Unfortunately, NCTQ research has found few school districts have adopted performance pay incentives for teachers, with even fewer of these districts making the incentives above the threshold research suggests is needed to make an impact.

7. Use teacher evaluation data to provide support for low-performing teachers and, if necessary, to inform decisions about layoffs and dismissal.

Teacher evaluation reforms support improvements in teacher quality in part because they lead to less effective teachers exiting the profession 326 or dissuade people who are likely to be less effective from entering the profession. 327 While the first priority of identifying struggling teachers must be to provide them with targeted professional development and support for improvement, prioritizing student learning means that consistently low-performing teachers should be the first to be considered when layoffs are necessary and should ultimately be exited from the profession.

A study that examined teacher evaluation data and retention for over 20,000 teachers over five years in Chicago found that the implementation of their more rigorous teacher evaluation system increased the likelihood of exiting low-performing teachers by 50%. Additionally, the new hires who replaced these teachers were more effective on average, improving the overall quality of the teacher workforce. 328 Other evidence that incoming teachers are more effective than those exited after implementation of rigorous teacher evaluation has been reported by two different studies of teachers in Washington, D.C. 329

Conclusion

When properly designed and, even more importantly, when implemented with fidelity, a good evaluation system should be able to strengthen the teacher workforce by helping all teachers become more effective, motivating effective teachers to stay in the classroom, and informing decisions about who to exit from the classroom. Some research has found that a meaningful evaluation system can even attract individuals to the profession who might not have otherwise considered teaching, 330 perhaps because of a more visible commitment on the part of a district to supporting and rewarding great teachers. Without a good system in place, it is difficult for administrators to access the data needed to tackle problems of educational inequities, primarily the tendency of districts to assign their more effective, qualified teachers to more advantaged students.

Ultimately, for a teacher evaluation system to have the positive outcomes for teachers and students that research and real examples prove are possible, it must adhere to these evidence-based practices, involve both teachers and administrators from the start, be evaluated frequently for efficacy, be tested for biases, and be iterated on as necessary. It's far from easy, and requires both significant funding and dedicated district and school leadership, but the potential impact it can have on students and teachers is worth the investment.