David Leonhardt’s value-added piece is characteristically thoughtful, but I think he skips over some important issues here:
Value-added data is not gospel. Among the limitations, scores can bounce around from year to year for any one teacher, notes Ross Wiener of the Aspen Institute, who is generally a fan of the value-added approach. So a single year of scores — which some states may use for evaluation — can be misleading. In addition, students are not randomly assigned to teachers; indeed, principals may deliberately assign slow learners to certain teachers, unfairly lowering their scores. As for the tests themselves, most do not even try to measure the social skills that are crucial to early learning.
The value-added data probably can identify the best and worst teachers, researchers say, but it may not be very reliable at distinguishing among teachers in the middle of the pack. Joel Klein, New York’s reformist superintendent, told me that he considered the Los Angeles data powerful stuff. He also said, “I wouldn’t try to make big distinctions between the 47th and 55th percentiles.” Yet what parent would not be tempted to?
Many things in life fall along a normal distribution, and there’s no reason to think teachers are an exception. I wouldn’t try to make a big distinction between the 47th and 55th percentiles of adult male height, for example, because there isn’t a big distinction. That doesn’t mean that tape measures are an unreliable method of estimating height.
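To put rough numbers on the height comparison, here is a minimal sketch. The mean of 175 cm and standard deviation of 7 cm are illustrative assumptions, not figures from any survey, and `percentile_gap` is a hypothetical helper name. The point is only that percentiles near the middle of a normal distribution sit very close together compared with the spread between the tails.

```python
from statistics import NormalDist

# Assumed, illustrative parameters for adult male height in centimeters.
heights = NormalDist(mu=175.0, sigma=7.0)

def percentile_gap(dist, low, high):
    """Difference between two percentiles (given as fractions) of dist."""
    return dist.inv_cdf(high) - dist.inv_cdf(low)

# Gap in the middle of the pack versus the gap between the tails.
middle_gap = percentile_gap(heights, 0.47, 0.55)
tail_gap = percentile_gap(heights, 0.05, 0.95)

print(f"47th to 55th percentile: {middle_gap:.1f} cm")
print(f"5th to 95th percentile: {tail_gap:.1f} cm")
```

Under these assumptions the middle gap comes to less than two centimeters, while the tail gap is more than twenty. The same arithmetic applies to any roughly normal measure, whatever the measuring tool.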
Similarly, a single year of scores can be a misleading way of estimating a teacher’s effectiveness over multiple years. But that’s also an inherently dumb way to interpret the meaning of a single year of scores. The fact that teacher value-added scores vary from year to year isn’t a bug, it’s a feature. Teacher effectiveness isn’t a permanent attribute–it’s what happened among a given group of students over a given length of time. There are plenty of unreasonable ways to interpret that information. That’s not an argument that the information shouldn’t exist.
Teachers are only disadvantaged by being assigned to slow learners if the value-added measure doesn’t account for students’ previous learning history. But widely used systems like EVAAS do exactly that. In those measures, each student’s growth is compared to his or her own previous learning trajectory. If students learned slowly in previous grades, that becomes the baseline for comparison.
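The baseline idea can be illustrated with a toy calculation. This is a deliberately simplified sketch, not how EVAAS or any real system works; actual models are far more elaborate. The function names, the prediction rule (extend each student's average yearly gain by one year), and the scores are all made up for illustration.

```python
from statistics import mean

def predicted_score(prior_scores):
    """Extend a student's past trajectory by one year using the average yearly gain."""
    gains = [later - earlier for earlier, later in zip(prior_scores, prior_scores[1:])]
    return prior_scores[-1] + mean(gains)

def value_added(students):
    """Average amount by which students beat (or miss) their own predicted score."""
    return mean(actual - predicted_score(prior) for prior, actual in students)

# Each student is (scores from earlier grades, score this year). Made-up data.
slow_learners = [([40, 42, 44], 49), ([35, 38, 41], 46)]
fast_learners = [([70, 78, 86], 94), ([65, 72, 79], 86)]

print("Teacher with slow learners:", value_added(slow_learners))
print("Teacher with fast learners:", value_added(fast_learners))
```

In this toy setup the teacher with the slow learners comes out ahead, because those students beat their own modest trajectories while the fast learners only matched theirs. A low starting point does not by itself drag down the estimate.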
More broadly, the whole LA Times debate illustrates the long-run futility of aligning your interests against public information. Once schools started testing students annually, it was just a matter of time before someone took the inherently logical step of estimating how much student learning in a given classroom changed from the beginning of the year to the end. To be sure, there are all kinds of effective short-term ways to prevent this kind of information from being created and/or released. When I sat down seven years ago to write a policy paper on value-added, I assumed this day would come sooner than it did. Many aspects of that paper, my first attempt at the form, make me wince today–it’s far too long, awkwardly organized, and all but ignores the crucial issue of statistical error. But I think the basic point–value-added data can help create a more functional job market for teachers–still stands. And now that the genie is out of the bottle, it’s not going back in.


{ 5 comments }
Many things in life fall along a normal distribution and there’s no reason to think teachers are an exception. I wouldn’t try to make a big distinction between the 47th and 55th percentiles in the average height of adult males, for example, because there isn’t a big distinction.
Mr Carey,
Although I agree that the non-random assignment issue in VAM is not quite as dire as some people make it out to be, your statement that “Teachers are only disadvantaged by being assigned to slow learners if the value-added measure doesn’t account for students’ previous learning history” is not quite accurate, at least from my understanding.
As you indicate, the inclusion of student fixed effects (previous learning history) does mitigate the non-random assignment bias, as does the use of multiple years of data. But if students are sorted based on prior gains, or on any unobserved characteristic correlated with performance, the fixed effects cannot remove the bias, only lessen it. It remains a significant problem for VAM, even if it is overstated sometimes.
I also think it’s important to remember that when error of any type is reduced, even to a relatively low level, this still usually means that many *individual* teachers’ estimates will be biased. I just always worry that people will hear that error is reduced to low levels and fail to understand that the distribution of that error is uneven. But maybe I’m paranoid.
The fact that teachers’ value-added ratings vary from year to year does not mean that teachers are good one year with one group of students and bad the next with another group. It means that the estimates are crude. A teacher can be equally good two years in a row but the value-added rating would show that they are high one year and low the next. Value-added scores are not tape measures.
That said, Leonhardt is right: they do provide important information that should not be dismissed. But to assign them a precision they do not have does a disservice not only to teachers but to students as well.
Students would benefit from a more functional job market for teachers.
“value-added data can help create a more functional job market for teachers”
And here I thought you guys were all about helping students. But no, you are about markets.