While it’s easy to pick at flaws or cast doubt on policy ideas, meaningful change almost always requires navigating thorny dilemmas. Last week I touched on the dilemmas in the DC IMPACT teacher evaluation program. Below, I’ll outline a few of the many dilemmas faced by the two consortia developing the assessments that will replace current NCLB-mandated tests in most states, beginning in the 2014-15 school year.
First, let’s acknowledge that the status quo is not tenable. We must improve the process and practice of student assessment, summative and formative alike. Few people would defend the quality of most state tests or the low bar that they set to proclaim a student “proficient.” And here are just a few things that are driven by or dependent on high-quality assessment systems:
- Efforts to evaluate teachers based on student learning
- Accountability determinations in any new ESEA re-authorization
- Data for our fancy new data systems and colorful data walls
- The actualization of common core standards, including comparability and the ability for everything from textbooks to digital resources to online courses to be utilized across states
- Evaluation of charter schools and other “innovations”
The good news is that it can be done. Student assessment can be significantly improved. A number of promising new assessment tools and approaches are increasingly available. I’ve written about the potential to utilize technology. My colleague Elena Silva has researched and written about tools, such as the College and Work Readiness Assessment, that have the potential to help assess higher-order skills. And a number of other countries successfully help instructors develop high, consistent expectations for students by implementing smart practices for teacher involvement in assessment design and scoring.
But the implementation challenge is immense.
While assessment could be significantly improved, large-scale state testing systems are designed to be very slow to change. Test scores are equated to be comparable over time. State and district officials are reluctant to do anything that could risk lower scores. Changes in design, administration processes, and scoring procedures are costly. Any new tests must be considered fair to both educators and students. These and many more factors make it difficult to implement anything but extremely incremental change.
This context is critical to understanding the dilemmas behind the efforts to improve assessment in our country. So, on to a couple of issues raised by Rick Hess last week:
1. The cost to implement online testing: This is a big issue and I share concerns that it is not being addressed proactively. A strong case can be made for these investments: There are tremendous cost and use inefficiencies in the current paper-based testing system, including the slow turnaround of scores that renders them less effective for accountability, teacher evaluation, and instructional uses. But unless states and districts are fully prepared for the heavy lifting in the years ahead, we won’t get the change we seek. For example, schools could cobble together enough computers and bandwidth and devote it all to testing, or they could begin now to figure out how to pay for and leverage this technology for instructional use. The second path is critical because, to be fair and accurate, the test can’t be the first time that a third-grader is exposed to a computer-based simulation or online writing assignment. Last fall I sat in on a day-long panel that representatives from the two assessment consortia attended with state technology leaders from SETDA, so I know they are trying. But they need to make the case and help states re-direct and prioritize funding in smart ways.
2. Curriculum: Hess worries that the assessment designers are slowly backing into curricula development, leading to not just common standards, but a common approach and sequence to teaching those standards. This is a big issue and needs to be discussed at length. Here are a few initial ideas.
First, the consortia want to develop “through-course,” or periodic, assessments throughout the year, partly to reduce the inherent noise from a one-shot test. (The one-game format is why NCAA basketball’s “March Madness” is much more volatile and open to upsets than the NBA’s seven-game series.) How to get the benefits of through-course assessments, without what Hess fears, is a dilemma that I hope the assessment consortia are prepared to confront.
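To put rough numbers on the basketball analogy, here is a minimal sketch. It assumes, purely for illustration, that the stronger team wins any single game with a fixed probability and that games are independent; the function name and the 60 percent figure are hypothetical, not drawn from real tournament data.

```python
from math import comb


def series_win_probability(p_game: float, games: int) -> float:
    """Chance that a team wins a majority of `games` independent games.

    p_game is the assumed (hypothetical) chance of winning any one game.
    `games` should be odd, e.g. 1 for a single game or 7 for a best-of-seven.
    Counting wins as if all games were played gives the same answer as
    stopping the series once one side has clinched.
    """
    needed = games // 2 + 1  # wins required for a majority
    return sum(
        comb(games, k) * p_game**k * (1 - p_game) ** (games - k)
        for k in range(needed, games + 1)
    )


if __name__ == "__main__":
    single = series_win_probability(0.6, 1)
    best_of_seven = series_win_probability(0.6, 7)
    print(f"single game: {single:.3f}, best of seven: {best_of_seven:.3f}")
```

Under these assumptions, a team that wins 60 percent of individual games wins a best-of-seven series about 71 percent of the time, so the chance of an upset falls from 40 percent to roughly 29 percent. The same logic is why several measurements across a year tend to give a steadier reading than one test day.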
Second, the “clearer” part of “fewer, clearer, higher” standards is critical. If the standards cannot be clearly defined within the curriculum, then we end up with generic tests and weaker instruments. Here’s one example from a paper I wrote two years ago: Simulated exercises are particularly useful for assessing students’ knowledge of interactions among multiple variables in a complex system, such as in an ecosystem. But, since these models assess both process and content, they require assessments that are closely linked with classroom instruction. This presents a problem for the broad use of these models. An early prototype in science, for example, restricted its assessment to scientific problem solving with technology—rather than science content—because NAEP cannot assume that students in the nation’s some 14,000 school districts have all covered the same science content. Most of the time in science, however, as University of Maryland researcher Robert Mislevy explains, “it’s not ‘here’s the situation in the world, and you give the answer.’ Usually you have some hypotheses, some conjectures, but then you do something, and the world does something back. It’s these cycles that really get at the nature of what model-based reasoning under constraints is really about.”
Again, how can we navigate this tension between stronger assessments and the need for instructional flexibility? One essential tool is a cognitive model. The cognitive model forms the bridge between two different uses of testing. Summative assessment describes what has been learned over time and is used to judge the performance of students or schools, while formative assessment is meant to guide teaching and learning as part of the instructional process. Projects built on cognitive models attempt to build both summative and formative components, held together by a common conception of how students learn a particular subject. Cognitive models are also key to the development of promising technology-supported learning approaches, such as the School of One.
Finally, Hess points to a certain tone deafness from the assessment consortia. This is a big problem. Hess is right to note the political importance of these issues. Here’s one suggestion for the consortia: open, public engagement. Rather than dismiss concerns, discuss these dilemmas and trade-offs openly. Go beyond the assessment experts and interest groups and take a cue from successful tech companies. Employ a blogger to describe and let others into your development process, illustrating your thinking and reasoning along the way.



