Last Friday, we saw again how messaging matters as much as substance with the release of two controversial reports. First, The New York Times published an article on a new National Bureau of Economic Research study on the long-term effects of high value-added teachers on their students. As I wrote last week, the results were exciting: in the most in-depth research of its kind, the authors demonstrate that highly effective elementary and middle school teachers, as measured by high value-added scores, have a long-lasting positive effect on their students’ lives beyond test scores, including lower rates of teenage pregnancy, higher rates of college attendance, and greater adult earnings.
But the quotation that became the reader’s takeaway was tucked halfway through the article. After a discussion of the costs of keeping a minimally effective teacher, one of the authors, John N. Friedman, remarks, “the message is to fire people sooner rather than later.” His co-author, Raj Chetty, goes further: “Of course there are going to be mistakes—teachers who get fired who do not deserve to get fired.” Cringe. As states and districts overhaul their teacher evaluation systems to be more rigorous and data-driven so that they can sort out the bad teachers, teachers feel like they are under attack. Instead of using this research to get teachers on board with value-added scoring by demonstrating its validity in predicting long-term student success, the authors’ boneheaded quotations gave teachers more reason to hunker down.
The NBER authors should have taken a page from the other report released on Friday, the Gates Foundation’s new MET project findings. In the Policy and Practice brief, the MET project provides clear, actionable guidance for developing fair and reliable teacher evaluation systems. The central insight from the brief is the importance of multiple measures: combining value-added scores, student feedback surveys, and classroom observations maximizes each measure’s strengths to create a more reliable predictor of teacher effectiveness.
Used independently, each measure has its tradeoffs. A teacher’s value-added score is the single best predictor of future teaching effectiveness, but because scores can fluctuate from year to year, it is not the most consistent measure. As Professor Chetty notes above, using value-added scores to decide which teachers to fire will result in mistakes. Student feedback surveys, on the other hand, are reliable across a teacher’s different classrooms because they are based on many perspectives developed over the course of a year, but they have only moderate predictive value.
Classroom observations can provide targeted feedback to teachers, but they require multiple observations and different observers to produce consistent results. In fact, the MET project found that a single observation by a single observer was less reliable than the teacher’s value-added score, although reliability increases with multiple observations. Observation is not the most predictive measure, but the MET project highlights its importance as a developmental tool. While the NBER study advocates “deselecting” the bottom 5% of teachers, the MET project rightly puts a premium on being fair to teachers and providing them with useful feedback.
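The reliability point is, at bottom, a matter of averaging out noise. As a rough illustration (with made-up numbers, not MET data), suppose each observation score is a teacher’s true effectiveness plus independent noise from the particular lesson and observer. Averaging several observations then tracks true effectiveness more closely than any single one:

```python
import random

random.seed(0)

# Hypothetical model, for illustration only: each teacher has a "true"
# effectiveness, and every classroom observation sees that truth obscured
# by independent observer/lesson noise.
n_teachers = 1000
true_effect = [random.gauss(0, 1) for _ in range(n_teachers)]

def observe(truth, noise_sd=1.5):
    """One classroom observation: truth plus independent noise."""
    return truth + random.gauss(0, noise_sd)

def correlation(xs, ys):
    """Pearson correlation between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

# Score each teacher once, and then as the average of four observations.
single = [observe(t) for t in true_effect]
averaged = [sum(observe(t) for _ in range(4)) / 4 for t in true_effect]

r1 = correlation(true_effect, single)
r4 = correlation(true_effect, averaged)
print(f"single observation vs. truth: r = {r1:.2f}")
print(f"average of four vs. truth:    r = {r4:.2f}")
```

Under these assumed noise levels the averaged score correlates noticeably better with true effectiveness than the single observation, which is the same logic behind requiring multiple observations and multiple observers.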
By using all three measures in their teacher evaluation systems, school districts will have the predictive power of value-added scores, the reliability of student feedback surveys, and the diagnostic ability of classroom observations to inform their human capital decisions. When it comes to increasing the effectiveness of the teacher workforce, school districts should first give an ineffective teacher a chance, and the necessary supports, to improve. If the teacher does not improve, the district should fire her. But if a teacher can be fired, or believes that she could be, because of a statistical error, the impact on the quality of the teaching workforce could be disastrous. Why would a bright young professional choose a career where she could be the mistake?