Stories from the NY Times, Mother Jones, and the Washington Post bemoaned the flat National Assessment of Educational Progress (NAEP) reading scores released Wednesday. Jay Mathews called the results the epitaph of the No Child Left Behind era. The results aren’t quite so simple.

See, NAEP is different from most standardized tests. It tests a sample of the current student population in every state, so this year’s population of kids is compared to the population tested the last time the test was administered. The sample automatically tracks changing demographics, so as America has gotten less white, so has NAEP. In statistical terms this creates something called Simpson’s Paradox, which makes trend lines seem worse than they really are because of a hidden variable, in this case, race (Matthew Yglesias touched on this point yesterday).

To show how this affects NAEP scores, here are the long-term trend NAEP results for fourth-grade reading from 1975 to 2008. (I’m using the long-term trend version of NAEP because it’s been largely unchanged since its first administrations in the 1970s. It’s had one significant format change, in 2004, when NAEP administered both the new and old formats, hence the dotted and solid lines in all of the following graphs.) As the chart below shows, average fourth-grade reading scores have risen only modestly, from 210 in 1975 to 220 in 2008.

This is basically the same thing that showed up in yesterday’s results. There have been some small gains over time, but year-to-year progress has been slight.

But that’s not the whole story. See, these overall trend lines are a sampling of America. As we’ve become more diverse, NAEP has changed its sampling ratios to reflect our changing society. This chart shows the percentage of students drawn from racial/ethnic categories over time. In 1975, NAEP test-takers were 80 percent white. By 2008, only 56 percent were. The share of black students was three percentage points higher in 2008 than in 1975, and the Hispanic share had quadrupled from five to 20 percent.

So, because NAEP has gradually included more black and Hispanic students, and black and Hispanic students score lower, on average, than white students, the total score doesn’t reflect the true gains made by each group. The chart below shows scores taken from the same testing years, this time disaggregated by race.

Each group has actually made greater gains over time than the overall total. White students’ scores increased 11 points, one more than the national average. Black students scored 23 points higher, and Hispanic students scored 24 points higher in 2008 than in 1975, even as their share of test-takers quadrupled. In other words, the white-black and white-Hispanic gaps are closing and every group is scoring higher, but the national score is showing more modest improvements because of demographic changes.
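If the arithmetic feels counterintuitive, a toy calculation may help. The sketch below uses two made-up groups, "A" and "B", with hypothetical shares and scores (these are not actual NAEP figures) to show how an overall average can rise by less than every group's own gain when the mix of test-takers shifts toward the lower-scoring group. It also splits the overall change into a part that comes from score gains and a part that comes from the changing mix.

```python
# Toy illustration of Simpson's Paradox in a weighted average.
# All shares and scores below are HYPOTHETICAL, chosen only to make
# the arithmetic easy to follow. They are not NAEP data.

def overall_mean(shares, scores):
    """Population-weighted mean score across groups."""
    assert abs(sum(shares.values()) - 1.0) < 1e-9, "shares must sum to 1"
    return sum(shares[g] * scores[g] for g in shares)

# Share of test-takers in each group, earlier year vs. later year.
shares_then = {"A": 0.80, "B": 0.20}
shares_now = {"A": 0.60, "B": 0.40}

# Average score of each group, earlier year vs. later year.
scores_then = {"A": 220.0, "B": 190.0}
scores_now = {"A": 230.0, "B": 210.0}

# Gain within each group, and gain in the overall weighted average.
group_gains = {g: scores_now[g] - scores_then[g] for g in scores_then}
overall_gain = (overall_mean(shares_now, scores_now)
                - overall_mean(shares_then, scores_then))

# Hold the mix fixed at the earlier shares to isolate the score gains,
# then attribute whatever is left to the change in mix.
fixed_mix_gain = (overall_mean(shares_then, scores_now)
                  - overall_mean(shares_then, scores_then))
composition_effect = overall_gain - fixed_mix_gain

for group, gain in group_gains.items():
    print(f"Group {group} gain: {gain:.1f}")
print(f"Overall gain: {overall_gain:.1f}")
print(f"Gain with the mix held fixed: {fixed_mix_gain:.1f}")
print(f"Effect of the changing mix: {composition_effect:.1f}")
```

With these made-up numbers, group A gains 10 points and group B gains 20, yet the overall average rises only 8. Holding the mix at its earlier shares gives a 12-point gain, and the shift toward the lower-scoring group subtracts 4. Holding the mix at the later shares instead would give a slightly different split, so treat the decomposition as one reasonable way to slice it rather than the only one.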

This is an important distinction to make, because it means the test score results are not just a matter of classroom teaching and learning (to be completely clear, I don’t think NAEP results can be easily attributed to national education policies like NCLB either). The *students themselves* have changed in important ways, and to break even or to make small achievement gains as society becomes more diverse is an accomplishment worth celebrating. At the very least it’s worth understanding.

For more background reading on NAEP, try Education Sector’s NAEP Explainer.
