Tomorrow, the *LA Times* will release data on teacher value-added test scores for elementary teachers in LA Unified. My colleague Elena Silva lays out some of the details (here). While I acknowledge that I am and always will be a data geek, I think that this move will have a lasting impact in the education world. Just as school accountability systems have led parents and real estate agents to focus on school quality like never before, the information that the *LA Times* will be providing will empower parental decisions about their child’s teacher in a way that has not been seen before. But what is the quality of the information that will be released?

I think that value-added data is an essential next step in school accountability, but at the school level, and not necessarily at the teacher level. Why the school level and not the teacher level? The reason is that the accuracy of the measure increases with the number of students. For a 3rd grade teacher with 20 students (the good old days in California before class sizes started to grow), 20 observations are just not enough to determine a teacher’s effectiveness. In contrast, hundreds of observations at a school can provide a much more accurate measure.
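As a rough illustration of why the number of students matters, here is a minimal sketch. It assumes each student's measured gain carries independent noise with the same spread, which is a simplification, so the standard error of an average gain shrinks with the square root of the number of students. The spread of 1.0 and the school size of 400 are hypothetical values chosen only for the comparison.

```python
import math


def standard_error(student_sd: float, n_students: int) -> float:
    """Standard error of an average gain, assuming independent student-level noise."""
    if n_students <= 0:
        raise ValueError("n_students must be positive")
    return student_sd / math.sqrt(n_students)


# Hypothetical spread of individual student gains, in arbitrary score units.
STUDENT_SD = 1.0

classroom_se = standard_error(STUDENT_SD, 20)   # one teacher with 20 students
school_se = standard_error(STUDENT_SD, 400)     # hypothetical school with 400 tested students

print(f"classroom standard error: {classroom_se:.3f}")
print(f"school standard error:    {school_se:.3f}")
print(f"classroom estimate is {classroom_se / school_se:.2f}x noisier")
```

Under these assumptions, the single-classroom estimate is several times noisier than the school-level one, which is the intuition behind preferring school-level reporting.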

Unfortunately, this value-added data will not be as accurate a measure of a teacher’s effectiveness as we would like, especially if it is going to be publicly linked to individual teachers. But even with those data problems, does it still make sense to provide public access to these value-added measures? I think that there may be a middle ground that focuses the attention of the education world on the importance of teacher effectiveness and gains in student achievement without publicly embarrassing individual teachers with data that is not necessarily reflective of a teacher’s true effectiveness.

The *Times* will release the data tomorrow, so it is not worth debating whether they should do this or not. Let’s hope that they only release it for teachers with at least several years of data, and not just 2 or 3 years of test scores. The important question is whether a similar analysis should be conducted in the other 10,000 school districts in the country. I have struggled with this for the last couple of days. The data geek in me can’t wait for Thursday’s release to take a look at the data and all of the interesting findings that will come out in the days that follow. But my policy wonk side thinks that this release has gone too far, that all of the problems with this type of data will be glossed over, and that the data will be misused. See below for details on some of the bigger data problems with value-added measures for teachers.

I think that there is a middle ground that can raise the focus on student progress and teacher quality without publicly embarrassing specific teachers. It would seem fair to release schoolwide summaries of the data without singling out individual teachers. This would be more information than the single-year schoolwide summary of value added that many state accountability systems already provide (think Colorado, Tennessee, Florida…). It would highlight the importance of focusing on this data. It would also focus on the need for the education system to develop meaningful teacher evaluation systems and better measures of teacher effectiveness. But it would not focus on publicly embarrassing teachers.

For example, you could first divide teachers into one of three groups based on their value-added results: high growth, typical growth, or low growth. This is basically what the Colorado school accountability system currently does, but at a school level and not an individual teacher level. Then, for schools with at least a minimum number of teachers, say 8 teachers, the state (or a watchdog group or newspaper) could release the number of teachers in each of the three groups. If the school had fewer than 8 teachers, then a single schoolwide average would be provided.
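The reporting rule described above could be sketched roughly as follows. This is only an illustration: the function names, the score scale, and the cut points of -0.5 and 0.5 are hypothetical, and only the three groups and the 8-teacher minimum come from the proposal itself.

```python
from collections import Counter

MIN_TEACHERS = 8  # minimum school size suggested in the text
GROUPS = ("low growth", "typical growth", "high growth")


def growth_group(score: float, low_cut: float, high_cut: float) -> str:
    """Assign one teacher's value-added score to one of three groups."""
    if score < low_cut:
        return "low growth"
    if score > high_cut:
        return "high growth"
    return "typical growth"


def school_report(teacher_scores, low_cut=-0.5, high_cut=0.5):
    """Return group counts for larger schools, or one average for small ones.

    The cut points are hypothetical placeholders, not values from any
    actual accountability system.
    """
    if not teacher_scores:
        raise ValueError("school has no teacher scores")
    if len(teacher_scores) < MIN_TEACHERS:
        # Too few teachers: publish only a single schoolwide figure.
        return {"schoolwide average": sum(teacher_scores) / len(teacher_scores)}
    counts = Counter(growth_group(s, low_cut, high_cut) for s in teacher_scores)
    return {group: counts.get(group, 0) for group in GROUPS}


# Made-up scores for illustration only.
large_school = [-1.2, -0.7, -0.1, 0.0, 0.2, 0.3, 0.6, 0.9, 1.1]
small_school = [0.5, -0.5, 0.3]

print(school_report(large_school))
print(school_report(small_school))
```

In neither case does the report name an individual teacher, which is the point of the middle-ground approach.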

First of all, it is important to explain why the release of any information on value added is a good idea. Publicly releasing value-added data will jump-start a lot of important conversations that are not currently happening. From the first article that the *Times* released, the part that I found most disturbing was the fact that for many teachers this will be the first time they will see their own value-added results. Should it take a newspaper analysis and public release for a teacher to know something about how their students are doing over time? Of course not. Even if this is not the most accurate measure, teachers should be informed, and asked to question what these value-added measures tell them. A teacher should know about the value-added gains of each student, so that they can match the data with a face, and really understand how students are progressing. Looking at the data may start to impact the only thing that really matters in the end, and that is the quality of instruction.

In addition to creating the opportunity for teachers to use this data to inform their instruction, this type of data will also put pressure on the school system to develop higher quality measures of teacher effectiveness, and to use those measures in improved teacher evaluation systems. To date, the system has been resistant to this type of change. Daniel Willingham thoughtfully lays out why teacher groups need to step up and develop quality evaluation systems before others do it for them.

Releasing a school summary of value-added will support the collaborative environment that teachers want. A school’s staff will want to have all of their teachers producing high growth, and will work to support each other to make that happen.

Finally, to put the public release of school value-added summaries in context: in school accountability, public reporting is often referred to as low-stakes accountability. High stakes is when accountability results actually force a school to make changes (think School Improvement Grants). Similarly, when compared to what is happening in other school districts, where value-added data is starting to play a major role in layoff decisions or salary determinations, public reporting does not seem like such a major step.

**Problems with Value Added Measures of Teacher Effectiveness.**

Some of the biggest problems with value-added measures are (1) value-added measures are often not stable over time, (2) students are not randomly assigned to schools or classrooms, (3) the state test is not great at measuring value added, and (4) the measures can only be calculated for around one-third of teachers.

(1) Teacher value-added data not stable. Research shows that teacher value-added measures aren’t stable over time. The chart below, from an analysis conducted by Julian Betts and Cory Koedel on San Diego test data, shows how stable a teacher’s value-added measures are from one time period to the next. For a teacher in the top quintile in the first 3-year time period (a top performer), that teacher has a 35 percent chance of being a top performer in the second 3-year time period, but also has a 13 percent chance of being a bottom-quintile performer.

The Stability of Value-Added Measures over Time (overlapping 3 year periods)

| First Time Period (rows) / Second Time Period (columns) | Bottom Quintile | Second Quintile | Third Quintile | Fourth Quintile | Top Quintile |
|---|---|---|---|---|---|
| Bottom Quintile | 30% | 20% | 19% | 18% | 13% |
| Second Quintile | 23% | 25% | 13% | 21% | 18% |
| Third Quintile | 18% | 20% | 25% | 24% | 13% |
| Fourth Quintile | 15% | 16% | 26% | 20% | 23% |
| Top Quintile | 13% | 17% | 16% | 19% | 35% |
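One way to read the table is to look at its diagonal, which gives the share of teachers who land in the same quintile in both periods. The short sketch below copies the percentages from the table, checks that each row accounts for all teachers who started in that quintile, and averages the diagonal. For comparison, if quintile placement in the second period were completely unrelated to the first, every cell would be 20 percent.

```python
QUINTILES = ["Bottom", "Second", "Third", "Fourth", "Top"]

# Rows: quintile in the first period. Columns: quintile in the second period,
# ordered Bottom to Top. Percentages are copied from the table above.
TRANSITIONS = {
    "Bottom": [30, 20, 19, 18, 13],
    "Second": [23, 25, 13, 21, 18],
    "Third":  [18, 20, 25, 24, 13],
    "Fourth": [15, 16, 26, 20, 23],
    "Top":    [13, 17, 16, 19, 35],
}

# Each row should cover every teacher who started in that quintile.
for start, row in TRANSITIONS.items():
    assert sum(row) == 100, f"{start} row does not sum to 100"

# Share of teachers who stay in the same quintile (the table's diagonal).
stay_rates = {q: TRANSITIONS[q][i] for i, q in enumerate(QUINTILES)}
average_stay = sum(stay_rates.values()) / len(stay_rates)

print(stay_rates)
print(average_stay)  # 27.0
```

On these numbers, only about a quarter of teachers on average stay in the same quintile, not far above the 20 percent that pure chance would produce.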

See this previous post for a discussion of other studies of the accuracy (or lack thereof) of value-added measures over time (pre-tenure and post-tenure) and across assessments (here). A recent IES technical report looks at the error rates for elementary teachers (statistical geeks click here, or see Bruce Baker’s summary of the paper). It finds that with only 2 years of data, there is a 1 in 4 chance that an average teacher would be misidentified as a low-performing teacher, and a 1 in 4 chance of an actual low-performing teacher not being identified. The accuracy gets better with 3 or more years of data, with error rates dropping to around 1 in 10 misidentified.

So, at least at the teacher level, this data is not likely to be accurate enough to base decisions about a teacher’s effectiveness on without information from many other measures of effectiveness.

(2) Students not randomly assigned. For this value-added data to be a good (unbiased) measure of teacher effectiveness and comparable across a district, students would need to be randomly assigned to schools and to teachers. This obviously does not happen. Although there is some level of school choice in LA, student assignment across schools is still largely based on a student’s home address, which means that demographics are the largest indicator of school assignment. Similarly, well-informed parents are likely to influence which teacher their children get. And, while these decisions are based upon the rumor mill instead of hard data, the rumor mill may be just as accurate or more accurate than the value-added data (an interesting empirical question).

(3) The California test used for this analysis was not built to measure a value-added score for teachers. And, while the test is very accurate at determining whether a student is above or below the proficiency cut score, the further from that cut score you go, the less accurate it is. For example, for over 10 percent of students at the bottom end of the distribution, their test results cannot be differentiated from random guessing. So, if you have students starting in this achievement range, you really have no idea what they know, and you can’t really figure out how much they have learned. The next generation of tests will do a much better job of this, but it will take years before next-generation assessments are in place, and then three additional years of data to measure value-added growth with the new assessments.

(4) Value-added measures aren’t available for all teachers. There is a reason that the *Times* only focused on grades 3-5. These are the only grades where a single teacher is largely in charge of a student’s education. In other grades, value added can only be measured in math and English, and math gets a little complicated when students start to take different math courses later in middle school and high school. Generally only one-third of teachers will have value-added scores, meaning that these teachers will be singled out.

