On Tuesday the Measures of Effective Teaching (MET) Project released its third and final series of reports. The media has reported the main finding: that we can measure and predict effective teaching. And, because the MET Project randomly assigned students to teachers, we can say that this relationship is causal: teachers with high value-added scores in one year caused student achievement to rise in the following year.
But the reports also include a host of interesting detail and helpful suggestions to districts and states. Among them:
- Current state tests can be used to identify effective teachers. Assessments that require higher-order thinking skills are likely to be better at differentiating teachers, but even the current low-level tests that states are using are valuable in identifying effective teachers. Importantly, the same teachers who raised student achievement on low-level state tests also raised student achievement on more cognitively challenging, open-response assessments.
- Both teacher observations and student surveys, the two other measures considered in the Project, are predictive of future student achievement. However, once they are combined with a teacher’s value-added score, they no longer add any predictive value. Instead, they add year-to-year stability (what researchers call “reliability”) to teacher ratings. They also provide more detailed and timely feedback to help teachers improve their practice. (If you want to parse this further, Jay P. Greene is sort of right on this but Marty West has a more nuanced take.)
- The MET Project tested high-quality observation rubrics that are widely considered some of the best in the education field, each of which had been tested and validated in other settings. States and districts attempting to create their own are not likely to have results as strong.
- Similarly, the MET Project used only high-quality, rigorously evaluated student surveys created by the Tripod Project. It will be difficult for states and districts developing their own student surveys to see similarly strong results.
- There are also a number of steps that states and districts can take to maximize the reliability of teacher observations:
  - Observers should be trained on the observation protocol and tested on their accuracy before conducting meaningful observations.
  - Two observations are better than one. The results from two 45-minute observations were substantially more reliable than only one 45-minute observation. MET tested reliability for up to four evaluations. Each additional observation increased reliability, but the largest gain was from moving from one to two observations.
  - Two pairs of eyes are better than one. Using two different observers increased reliability significantly more than having the same person observe two lessons.
  - Different combinations of the number of lessons and observers can work equally well. For example, having two observers each watch a 45-minute lesson had as much reliability as a principal watching a 45-minute lesson and three peer teachers each watching a 15-minute lesson.
  - Districts should focus on rank, not rating. Although principals tended to give their own teachers slightly inflated ratings, their rankings were very similar to those of outside observers.
  - Districts don’t need to surprise teachers (through unannounced visits). Although teachers tended to earn higher ratings when they were told in advance that they would be observed, rankings of teachers were very similar regardless of whether the observation was announced or unannounced. Giving teachers notice of an observation may help reduce stress and increase their belief in the fairness of the observation system.
  - Video can save time and not harm reliability. The MET Project videotaped teacher lessons and let observers watch the videos on their own time. This saves precious time in the workday. Given the evidence that teachers do not need to be surprised, they could even be asked to videotape themselves and provide the tape to principal or peer observers.
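For readers who want intuition for why the jump from one observation to two matters most, here is a rough sketch using the textbook Spearman-Brown formula, which estimates the reliability of an average of several comparable measurements. This is not the MET Project's own method, and the single-observation reliability of 0.40 is a made-up illustrative number, not a MET estimate. The formula also treats every observation as interchangeable, so it cannot capture the added benefit of using a second observer.

```python
def spearman_brown(single_reliability: float, n_observations: int) -> float:
    """Reliability of the average of n comparable observations (Spearman-Brown)."""
    if not 0.0 <= single_reliability <= 1.0:
        raise ValueError("single_reliability must be between 0 and 1")
    if n_observations < 1:
        raise ValueError("n_observations must be at least 1")
    r = single_reliability
    return n_observations * r / (1 + (n_observations - 1) * r)


# Hypothetical reliability of one observation; NOT a figure from the MET reports.
SINGLE_OBSERVATION_RELIABILITY = 0.40

# Reliability of the averaged rating for one through four observations.
reliabilities = [
    spearman_brown(SINGLE_OBSERVATION_RELIABILITY, n) for n in range(1, 5)
]

# Gain in reliability from each additional observation.
gains = [later - earlier for earlier, later in zip(reliabilities, reliabilities[1:])]

for n, reliability in enumerate(reliabilities, start=1):
    print(f"{n} observation(s): reliability = {reliability:.2f}")
for n, gain in enumerate(gains, start=2):
    print(f"adding observation {n}: gain = {gain:.2f}")
```

Under these assumptions each extra observation helps, but by less each time, which matches the pattern the report describes.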
Tuesday’s release also included a long paper devoted specifically to how states should think about weighting different measures into one overall teacher evaluation rating. The researchers looked only at test-based value-added scores, student surveys, and teacher observations. Among their findings:
- In terms of predicting which teachers would be more effective at raising student achievement in future years, prior-year value-added scores were the best of the three measures. The study did not look at other measures of student growth such as Student Learning Objectives or whole-school growth, but it’s likely those measures have lower predictive power than an individual teacher’s value-added results.
- To balance predictive power and year-to-year stability, states should weight state tests between 33 and 50 percent of a teacher’s evaluation. More than that reduces reliability, but less than that reduces predictive power.
- Importantly, the 33-50 percent recommendation is for student growth as measured by state tests. Many states have required teacher evaluations to be based on a broad “student growth” factor that includes growth on the state test and other measures of student growth, such as Student Learning Objectives or other locally developed measures. The MET recommendation applies only to the state-test portion of that factor. Instead of waiting for objective evidence on how their new teacher evaluation systems are playing out, states like Maryland and DC are pre-emptively lowering the weighting for test-based student growth. The MET results suggest those changes, while politically appealing, will cause the evaluation ratings to lose predictive power.
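To make the weighting question concrete, here is a minimal sketch of how a district might combine the three measures into one composite. It is an illustration, not the MET researchers' procedure: the function names, the choice to standardize each measure first, the equal split of the remaining weight between observations and surveys, and the four teachers' scores are all assumptions made for this example.

```python
from statistics import mean, pstdev


def standardize(values):
    """Convert raw scores to z-scores so measures on different scales can be mixed."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        raise ValueError("cannot standardize a measure with no variation")
    return [(v - mu) / sigma for v in values]


def composite_scores(value_added, observations, surveys, test_weight=0.5):
    """Weighted composite per teacher.

    test_weight is the share given to state-test value-added. The remainder is
    split equally between observations and surveys (an assumption of this sketch).
    """
    if not 0.0 <= test_weight <= 1.0:
        raise ValueError("test_weight must be between 0 and 1")
    if not (len(value_added) == len(observations) == len(surveys)):
        raise ValueError("all three measures need one score per teacher")
    other_weight = (1.0 - test_weight) / 2
    z_va = standardize(value_added)
    z_obs = standardize(observations)
    z_survey = standardize(surveys)
    return [
        test_weight * va + other_weight * obs + other_weight * survey
        for va, obs, survey in zip(z_va, z_obs, z_survey)
    ]


# Hypothetical scores for four teachers; not MET data.
value_added = [0.10, -0.05, 0.20, -0.15]
observations = [2.8, 3.1, 3.4, 2.5]
surveys = [3.9, 4.2, 4.0, 3.5]

# Compare the two ends of the 33-50 percent range discussed above.
for weight in (0.33, 0.50):
    scores = composite_scores(value_added, observations, surveys, test_weight=weight)
    print(weight, [round(s, 2) for s in scores])
```

Changing `test_weight` shifts how much of each teacher's composite comes from state-test growth; the code says nothing about which weight is best, which is the empirical question the MET paper addresses.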
Finally, there’s a debate about just how many different variables teacher value-added models should include. For example, many states and districts have added statistical controls for things like student race/ethnicity, poverty, or other demographic factors. Doing so assumes, for example, that low-achieving black students should be held to different expectations than low-achieving white students. The paper concluded that there was no particular statistical rationale for controlling for student demographics.
The MET Project’s formal work is now over, but researchers will have access to all of the data, videotapes of teachers’ lessons, and observation ratings to dig in as they choose. Find more information about the MET Project, including all of its reports and an FAQ here.
Photo Credit: Eagle Country Online




