Kevin Drum wrote a good post a couple of weeks ago about statistical illiteracy in the media, *viz.* the widespread tendency to characterize election poll results in which one candidate’s percentage-point lead is equal to or less than the poll’s statistical margin of error (MOE) as a “statistical tie” or “dead heat.” Kevin notes:

> …probability isn’t a cutoff, it’s a continuum: the bigger the lead, the more likely that someone is ahead and that the result isn’t just a polling fluke. So instead of lazily reporting any result within the MOE as a “tie,” which is statistically wrong anyway, it would be more informative to just go ahead and tell us how probable it is that a candidate is really ahead. Here’s a table that gives you the answer to within a point or two:

As Kevin notes, if Obama is up by three points and the MOE is three points, it’s 84% likely that he’s actually ahead. That’s very different from 50% likely, i.e. an actual tie.
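For the curious, the arithmetic behind that 84% can be sketched with a simple normal approximation (a simplified model of how such tables are computed, not Kevin’s exact calculation): a 95% MOE is roughly 1.96 standard errors on one candidate’s share, and the *lead* between two candidates has about twice that standard error.

```python
from math import erf, sqrt

def prob_ahead(lead, moe):
    """Approximate probability a candidate is truly ahead, given a
    percentage-point lead and the poll's 95% margin of error."""
    se_share = moe / 1.96        # MOE covers ~1.96 standard errors of one share
    se_lead = 2 * se_share       # lead = p_A - p_B has roughly double that SE
    z = lead / se_lead
    return 0.5 * (1 + erf(z / sqrt(2)))  # standard normal CDF

print(round(prob_ahead(3, 3), 2))  # a 3-point lead with a 3-point MOE -> 0.84
```

With lead equal to MOE, z comes out to about 1, and the normal CDF at 1 is roughly 0.84 — which is where the 84% figure comes from.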

This is directly relevant to education because most states use precisely the same statistical techniques when deciding whether a school has made Adequate Yearly Progress (AYP) under No Child Left Behind. If, say, 65% of students need to pass the test in order to make AYP, and only 62% pass, but the state determines an MOE of 4 percentage points, then the school makes AYP because the score was “within the margin of error.”
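A minimal sketch of that rule with the hypothetical numbers above (the function names and the normal approximation are mine, not any state’s actual formula) shows what the “within the margin of error” decision is really saying:

```python
from math import erf, sqrt

def normal_cdf(z):
    return 0.5 * (1 + erf(z / sqrt(2)))

def makes_ayp_within_moe(observed, target, moe):
    """The rule as described: the school passes if the target
    lies anywhere within observed +/- MOE."""
    return observed + moe >= target

def prob_truly_met_target(observed, target, moe, z_conf=1.96):
    """Probability the school's true pass rate is at least the target,
    treating the MOE as z_conf standard errors."""
    se = moe / z_conf
    return normal_cdf((observed - target) / se)

print(makes_ayp_within_moe(62, 65, 4))             # True -- school "makes AYP"
print(round(prob_truly_met_target(62, 65, 4), 2))  # 0.07 -- but it probably didn't
```

Under this sketch, a school at 62% with a 4-point MOE “makes AYP” even though there is only about a 7% chance its true pass rate actually reached 65%.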

This is silly for two reasons. First, unlike opinion polls, NCLB doesn’t test a *sample* of students. It tests *all* students. The only way states can even justify using MOEs in the first place is with the strange assertion that the entire population of a school *is* a sample, of some larger universe of imaginary children who *could* have taken the test, theoretically. In other words, the message to parents is “Yes, it is true that *your* children didn’t learn very much this year, but we’re pretty sure, statistically speaking, that had we instead been teaching another group of children who do not actually exist, they’d have done fine. So there’s nothing to worry about.”

Second, per Kevin’s chart above, the idea that scores that fall below the cutoff but within the margin of error are statistically indistinguishable from actual passing scores is incorrect. This is particularly true given that, while opinion polls almost always use a 95% confidence interval to establish their MOEs, most states use a 99% confidence interval for NCLB, which results in substantially larger margins of error around the passing score. But states do it anyway, because many of them basically see NCLB accountability as a malevolent force emanating from Washington, DC from which schools need to be shielded by any means necessary.
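The effect of that choice of confidence level is easy to quantify under the usual normal approximation: moving from 95% to 99% confidence swaps a z-score of about 1.96 for about 2.576, widening the MOE by roughly 31% (the standard-error figure below is hypothetical):

```python
Z_95, Z_99 = 1.96, 2.576   # two-sided z-scores for 95% and 99% confidence
se = 2.0                   # hypothetical standard error of a school's pass rate, in points

moe_95 = Z_95 * se         # 3.92 points
moe_99 = Z_99 * se         # 5.15 points: the same score now clears the
                           # target from farther below it
print(round(moe_99 / moe_95, 2))  # 1.31 -- about 31% wider
```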

Think of it this way: let’s say your child is sick and you bring him to the doctor. After the diagnosis is complete, you and the doctor have the following conversation:

*Doctor*: My diagnosis is that your son has pneumonia and needs to be hospitalized.

*You*: That’s terrible! Are you sure?

*Doctor*: Well, there are few absolute certainties in medicine. It’s possible that he only has bronchitis. But I’m pretty sure it’s pneumonia.

*You*: How sure?

*Doctor*: 84% sure.

What would you do? Would you (A) Check your son into the hospital? Or would you (B) Say “Hey, there’s a 16 percent chance this whole thing will work itself out with bedrest and chicken soup. Let’s go that way.”

States implementing NCLB nearly always choose option (B). That’s because they see the law as a process for making the lives of educators worse, not what it actually is: a process for making the lives of students better.

> If you think students need 80% correct to be proficient, set the cut score at 75% to account for error. But having done so, don't then proceed to tell parents that schools have met AMOs under NCLB when in fact they have not.

So are you saying that the schools have *already* done this, or that they *should* do this?

It's true that if you have a 100% sample of test scores you don't need a confidence interval. You have sampled your entire population. That said, no one is interested in the population of test scores - they are interested in what the students know. You have only *sampled* what it is that students know. Ignoring the systematic error associated with the instrument, we still have a random error associated with the measurement. Different versions of a test may be identical on average, but individuals will perform differently on different versions of a given test. Even with a given test, individuals will perform differently at different times. You might be hungry one day, sleepy another, and in peak mental condition the third.

One could, of course, fudge the standards up front, as you suggested, and build in a "margin of error." But why build junk like that into the system? Why not use something more reliable - like a margin of error? You know - something that you can actually calculate from the data...

You: That's terrible! Are you sure?

Dr: Well, there are few absolute certainties in medicine...But I'm pretty sure it's pneumonia.

You: How sure?

Dr: 51% sure.

What would you do? Hey, what matters is that your kid is sick! If there's a 51% chance he needs to be treated for pneumonia, then we begin treatment--now! It sure beats the status quo.

Another lesson learned from Q&E. When it comes to statistics--kids, don't try this at home.


It's true that tests only assess a subset of what students need to know, and test results are subject to measurement error. But the proper way to account for measurement error is in setting cut scores on the test. No state requires students to get 100% correct to pass. So when policymakers decide what score is good enough, that's the place to make allowances for the imprecision of the instrument. If you think students need 80% correct to be proficient, set the cut score at 75% to account for error. But having done so, don't then proceed to tell parents that schools have met AMOs under NCLB when in fact they have not.

And the MOE around test results doesn't reflect the fact that not every kid was tested; it reflects the fact that any one test is only a sample of a student's ability -- if 100 different tests were given, the scores would differ each time -- and recognizes that a student's score would vary each time they took a test based on how they're feeling and what questions happen to be on the test.
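The cut-score approach described above reduces to one line of arithmetic (the standard-error value here is a made-up illustration, not a real test's statistic):

```python
target = 80.0   # percent correct deemed "proficient"
se = 2.5        # hypothetical standard error of an individual student's score
z = 1.96        # 95% confidence

# Pass any student whose observed score is statistically consistent
# with a true score of 80:
cut_score = target - z * se
print(round(cut_score, 1))  # 75.1
```

The allowance for measurement error is built in once, up front, rather than re-applied as a second fudge factor when reporting whether schools met their targets.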