A few comments following on Elena’s post below:
Education policy people already know this, but for those Q&E readers who are more policy generalists, it’s worth emphasizing what a tremendously big deal this Gates teacher study is going to be. The scale and ambition of it are without precedent, and its results will influence the way people think about teacher effectiveness for years to come. It’s the result of a decade-plus-long policy conversation that went something like:
“Research based on newly-available standardized test data for huge populations of individual students and teachers shows that individual teachers make a huge difference in how much students learn, much more than we previously thought, and that effectiveness varies a lot among teachers who have very similar credentials!”
“So we should start training, evaluating, paying, promoting, and otherwise thinking about teachers using that effectiveness data, which would be a whole lot different than the way we do those things now!”
“But wait, test scores don’t tell us everything! The statistics aren’t perfect and not everything teachers do is captured in the results! We need multiple measures of teacher quality, to see the whole picture.”
“Okay, let’s do that!”
In other words, there’s an element of bluff-calling here–people who frankly would rather teachers weren’t seriously evaluated at all have been hiding behind the lack of good research and legitimate observational instruments to block changes to the existing, largely evaluation-free teacher system. Now that Gates has assembled everyone who’s anyone to participate in this huge study while federal and state policies are moving more school districts to adopt multi-variate teacher evaluation systems, those excuses will hold considerably less water.
It’s also worth remembering as we interpret the results of the study that the optimal level of correlation between multiple measures of teacher effectiveness is not 1. That’s the point of them: each perspective adds unique information not picked up by the others. The fact that the previous year’s value-added results predict the next year’s results strongly but not perfectly isn’t a bug, it’s a feature. Of course a teacher’s effectiveness will vary given different students and circumstances. A video observation ought to tell us things that standardized test scores and student surveys don’t. And so on.
That leads to the interesting question of what the ideal level of correlation is and which measure is “controlling” in thinking about measurement validity. I don’t think those are questions that can be resolved empirically, unless one measure is clearly a big outlier. The ideal level is probably, I don’t know, 0.7, something like that? No correlation would suggest that at least one of the instruments is seriously flawed, while total correlation would be redundant and implausible. And indeed the initial results seem to be in the neighborhood of the sweet spot.
Also, this is worth quoting in full:
Some of the classrooms in our study did focus on test preparation. In many classrooms students reported that “We spend a lot of time in this class practicing for the state test,” or “Getting ready for the state test takes a lot of time in our class.” However, the teachers in such classrooms rarely show the highest value-added on state tests. On the contrary, the type of teaching that leads to gains on the state tests corresponds with better performance on cognitively challenging tasks and tasks that require deeper conceptual understanding, such as writing.


{ 3 comments }
I don’t believe VAMs will be thrown away so quickly. The economists are moving quickly to more complex and current statistical techniques (this is a good thing). Having said that, the model Gates reported would not have stood up to peer review had they submitted the report through traditional routes. The model does not meet the accepted criteria; they even try to say one test is statistically significant by adding a p value (not a commonly accepted practice for this type of analysis).
I meant to say that they started with the subject where VAMs are LEAST UNRELIABLE.
Read the study! If that’s all they got, VAMs are headed to the ash heap of history faster than predicted. They started with the subject where VAMs are obviously the least reliable, elementary math, and still couldn’t put together more than a PR campaign to contradict their own lack of success.
I can’t wait till lawyers use the actual findings of the Gates report on cross-examination to reverse the firings of teachers with VAMs.
Elena had this backward. “Early findings are pretty much about Measure 1–whether a teacher’s value-added on student achievement tests is a good predictor of the later performance of students—and a little about Measure 4—how well students can judge teacher effectiveness.”
The thrust of the study was about #4, as was reported in the NYT.
Even the last numbers-only report from Brookings reported how much the fluctuation was. No, they did not offer anything new about “the predictive value on ACTUAL teacher effectiveness,” just how much the subsequent guesses fluctuate from the original ones. (emphasis mine)
Reread Gates’s actual words in the context of Elena’s statement “How predictive are they? Pretty strong.” But according to MET: “In every grade and subject, a teacher’s past track record of value-added is among the strongest predictors of their students’ achievement gains in other classes and academic years.” Well, duh. But they are just talking numbers, not giving new evidence that the numbers posted for teachers who failed to meet their growth targets show evidence of ineffectiveness.
That’s where the rubber meets the legal road.
“Volatility is not so large as to undercut the usefulness of value-added as an indicator (imperfect, but still informative) of future performance.” To be “AN” indicator, yes. To withstand legal scrutiny, no way! Even Brookings reported how much volatility they found. Why was the actual finding left on the cutting room floor?
But here’s where Elena hit it out of the park! Yes, “Expect to see the quote ‘value-added is among the strongest predictors’ everywhere.”
This is straight out of Karl Rove’s playbook. If you don’t have the facts, argue the law; if you don’t have the law, shout a good story headline.
Then they offer ONE PIECE OF EVIDENCE that teachers with high value-added really teach better. That is one finding. It is relevant to assigning incentives at the top. Congratulations for that $45 million piece of evidence.
Finally, do teachers have a larger effect on achievement in math than in ELA? The MET study says yes, but they also acknowledge the strength of the body of evidence to the contrary, and only offer a speculative counter.
They don’t claim to have found evidence that VAMs aren’t less valid for ELA, or high school subjects. Guess what? They won’t.
They call for better tests, but can they find evidence to refute the common-sense assessment that their “reforms,” as has been the pattern in the past, will encourage more primitive and ineffective test prep?
Guess what? They won’t.
It’s easy to see where they are going. They want a $300 million meta-analysis to replace human judgement. Meta-analyses are fine, but they won’t replace the rest of the social sciences and scholarship. Neither will the Gates effort to pile more garbage-in, garbage-out numbers, disconnected from reality, into some humongous dashboard.
They didn’t even report the low-income rate of their student survey sample. Will they analyse videotapes with the same disregard for context?