The intrepid EdWeek bloggers Alyson Klein and Stephen Sawchuk caught a passage in President Obama’s latest budget proposal that would require states to “develop a definition of ‘effective teacher’ that is based in significant part on student learning, and to put in place a system that links the academic achievement and growth of students to their teachers and school leaders.” This would complete a gradual progression: from requiring a student-teacher data link in any education database that receives federal funding (the 2007 America COMPETES Act), to the State Fiscal Stabilization Fund’s assurance around teacher evaluation metrics and results, to the Race to the Top grant’s exclusion of any state with a teacher-student firewall and its insistence that teacher evaluations include student growth as a “significant” component.
This should not be seen as an entirely new policy, but one that’s been developing for at least the last three years.
The research behind using growth scores (the academic gains between two years of standardized tests) as a teacher evaluation metric is young but improving rapidly. In December, Dan Goldhaber and Michael Hansen released a paper for the Center on Reinventing Public Education that looked at what would happen if teacher tenure decisions were based entirely on value-added scores. That is, after teachers had a couple of years of experience and were up for lifetime tenure, could a district accurately cut off the bottom 25 percent of teachers and improve its workforce?
The answer is a qualified yes.
First, the qualifications. The authors focused their paper on value-added measures for teachers who could be accurately linked to students with at least one year of prior math and reading test scores. That limited the pool to almost 20,000 4th and 5th grade teachers observed for up to 11 years for a total of 63,000 observations. *For part of the analysis, they limited the pool even further, to teachers with at least five years of experience in the district, in order to observe the impact of two years of value-added data three years later (the first year after they’ve earned tenure). Enforcing tenure rules under these value-added limitations would obviously affect only a small subset of teachers: it would not affect any teacher teaching a grade lower than 4th or higher than 8th, and it would also exclude any teacher without obvious responsibilities for reading and math results.
These are important qualifications, but, if you can get past them, there are some interesting results. For one, the authors have some nifty graphs showing the growth of teacher effectiveness. As in prior studies, teachers make dramatic gains in effectiveness their first few years, but make only very modest improvements the rest of their career.
Two, differences in teacher effectiveness are wide and persistent across time. This is important: it means that teachers with 3 years of experience show the same variation as those with 25. You are just as likely to find a great teacher with 3 years of experience as a terrible one with 25 years on the job, and vice versa. Other than the first few years, experience does not matter one iota.
Three, and this is the most important one, a teacher’s prior-year value-added score is a better predictor of this year’s students’ success than any other teacher factor. That includes experience, whether the teacher holds a Master’s degree, their licensure score, the college they went to, and whether they were fully licensed. The value-added model was better than any of these other measures that are commonly used in selection and salary decisions.
Four, rejecting tenure for teachers in the lowest quartile of teacher effectiveness scores after three years on the job would improve a district’s overall teacher quality. In other words, this would not be a random exercise, but one that had some meaningful impact on students’ lives.
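To make that fourth point concrete, here is a minimal simulation sketch. It is not the authors' model and uses none of their data: the function name, the normally distributed "true" effects, and the noise level are all illustrative assumptions. It simply shows that cutting the bottom quartile on a noisy score can still raise the average quality of the teachers who remain, while also cutting some teachers who are actually above average.

```python
import random
import statistics


def simulate_deselection(n_teachers=10_000, noise_sd=1.0, cut_fraction=0.25, seed=0):
    """Toy model: deselect the lowest-scoring teachers on a noisy measure.

    All numbers here are hypothetical assumptions, not figures from the paper.
    """
    rng = random.Random(seed)

    # Assumed "true" effectiveness, in standard-deviation units.
    true_effect = [rng.gauss(0.0, 1.0) for _ in range(n_teachers)]

    # Observed value-added score = true effect + measurement noise (assumed).
    estimate = [t + rng.gauss(0.0, noise_sd) for t in true_effect]

    # Rank teachers by the noisy estimate, lowest first.
    order = sorted(range(n_teachers), key=lambda i: estimate[i])
    n_cut = int(n_teachers * cut_fraction)
    cut, kept = order[:n_cut], order[n_cut:]

    mean_all = statistics.fmean(true_effect)
    mean_kept = statistics.fmean(true_effect[i] for i in kept)

    # Share of deselected teachers whose true effect is above the overall mean.
    share_cut_above_average = sum(
        1 for i in cut if true_effect[i] > mean_all
    ) / n_cut

    return {
        "mean_all": mean_all,
        "mean_kept": mean_kept,
        "share_cut_above_average": share_cut_above_average,
    }


if __name__ == "__main__":
    for name, value in simulate_deselection().items():
        print(f"{name}: {value:.3f}")
```

Raising `noise_sd` in this toy model shrinks the gain among retained teachers and raises the share of above-average teachers who get cut.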
The paper comes with a handful of caveats–that the actual improvements could be relatively small statistically, that value-added measures still have questions before being ready for high-stakes personnel decisions, and that any change would affect teacher behavior in ways we cannot predict–but it concludes by reminding readers that “the results presented here indicate teacher effect estimates are far superior to observable teacher variables as predictors of student achievement, suggesting that these estimates are a reasonable metric to use as a factor in making substantive personnel decisions.”
*Update for clarity






So is Dan Willingham (http://tinyurl.com/yzqbfoa) entirely wrong?
“Obviously, teachers have little incentive to teach any topic that is not tested, or indeed, anything that will not be tested that year; why lay groundwork for improving next year’s scores? If you thought No Child Left Behind led to an overemphasis on testing, wait for the test-prep frenzy that follows linking salaries to test scores.
“Another problem: not everything is in the teacher’s hands. Rowdy kids are harder to teach than well-behaved kids. And it’s easier to teach your class if your principal (and parents) are helpful and supportive. Several studies have shown that teacher evaluations based on test scores are unstable. About 25 percent of teachers pegged as terrific or terrible get the opposite designation the next year.
“The logic underlying this approach is suspect. It assumes that teachers know what to do but just aren’t doing it or that they will figure out what to do once the pressure is on. It’s the equivalent of the frustrated parent shouting ‘I don’t care how you do it – just bring home better math grades!’ No Child Left Behind should have taught us that improving student achievement doesn’t happen simply by mandating it.”
When will you edsector folks hire an actual teacher, at least to check your less than informed opinions of how schools work and what they might, or might not, need?
Chad has never taught, have you, Chad? You simply write about stuff you have no direct experience with, making your conclusions and/or positions dubious at best.
You cannot beat the poverty out of the equation by threatening teachers; teachers don’t cause the out of school factors that account for far more of our troubles than teacher unions or “bad” teachers.
You policy folks, with your dearth of teaching experience, are the ones who should be held to the fire for promoting/devaluing ideas based on the thinnest and weakest foundations available.
Talk to teachers. Talk to students. Then help us with what really matters–ending poverty and the outcomes poverty induces.
Chad, I’m with TFT. These guys have made their own cottage industry out of cranking out pro-choice and pro-charter publications for years. They have framed it as public school choice, and public charters, but suffice it to say their vision of Reinvention rests on options, not actually changing the existing system. That said, I know Goldhaber’s work. And he’s tremendously prolific. As an economist, he’s pretty legit, but he really doesn’t understand how schools work, or how school reform works. And yes, I know he spent some time on the school board in Alexandria. Doesn’t change my opinion.
Chad, they are an arm of the charter/privatization movement. I did more than read their name. Go ahead, look into it.
A system that defines an effective teacher by linking students’ academic performance to their teachers is just not the right way to treat teachers.
Unfortunately, your post repeats a pattern I’ve seen all too often. You wait until the last paragraph for the caveats. In particular this honker: “value-added measures still have questions before being ready for high-stakes personnel decisions.” That needs to be front and center. The problem with all this is that these ways of measuring are being codified into law, and yet the measurement hasn’t been perfected enough to be ready for prime time.
There are already too many policy makers (and think tankers) who see value-added mechanisms as the brass ring in this conversation. But folks now want to require it before it’s been shown effective. I’ve been engaged in conversations around VAM for more than a few years now. And many of the same issues that were identified by the experts (I’m thinking of the RAND paper from 2004 by McCaffrey, Lockwood and others) are still there. There’s simply not been enough progress in figuring out how to clear out the “noise” (as Melody puts it). And this is after a number of years of work by some of the top minds in the field.
It might be better than anything else, but to think we can mandate this method across the country is ridiculous. Not only is the technology so far unproven, it’s also *really* expensive. And there are only about 10 people in the country who know how to do it with any level of credibility. Do we really want some HR person to have the ability to click a few buttons and determine some teacher’s tenure? No offense to HR personnel, but we’re talking about pretty high-level econometric calculations here.
My back-o-the-hand calculations, looking at how far VAM research has moved in the past 6-8 years, suggest we won’t have a solid model until roughly 2015. And that’s if we invest heavily in research and development. Oh, and then we need at least 3 years’ worth of data.
And it would really help if that teacher stayed in the same school for those three years. And the same grade. And it would help if they didn’t receive any professional development over that time – and if s/he does, it should be identical to that received by all the other teachers in that school/grade. ’Cause if s/he takes one additional course, that’s one additional calculation to put into the formula. And it’s already one helluva formula (see p 9-10 of the linked report).
From a parental point of view, anything that does away with the fraud of the state and national certifications of ineffective teachers and focuses on how well the teachers teach and the students learn from the beginning to the end of each year is awesome!! The certifications are an absolute joke, speaking from experience…
TFT: The Center on Reinventing Public Education is a non-profit research organization affiliated with the University of Washington Bothell. You shouldn’t draw conclusions based on their name. And the researchers themselves are very highly regarded in the field.
Melody: You’re absolutely right, we still haven’t found a perfect measure, and the authors acknowledge many important caveats along the way. Still, it’s saying something that this “noisy” measure is better than everything we have been using.
I think you missed the most important qualification, which is that the prediction of teacher value-added in subsequent years based on early years is relatively “noisy”, and that’s leaving aside the issue of whether standardized test results are the only thing we care about.
“… the multi-year correlations in teacher effects are modest by some standards… Further, the observed fade in the predictive ability of VAMs at increasing time intervals weakens the effect of any policy intervention based on these VAMs with time.”
Essentially, the authors are saying that the ability of their models to predict future performance is better than what we have now (degrees etc.), but it’s still not very good.
Observe Fig 4 (last page). For “de-selected” teachers, the modal teacher performance is only slightly below average. It’s a very crude filter. Yes, you are more likely to catch the lower performing teachers, on average, but you also catch many good ones.
The “study” was done by the Center on Reinventing Public Education. I wonder if they have a financial dog in this fight?