Neither student course evaluations nor student results on curriculum-aligned standardized tests are good predictors of professor quality. Even though students tend to give better reviews to instructors whose students earn higher test scores, both of these performance evaluation tools are, according to a new study, severely flawed. It turns out that professors with lower student test scores and more negative student reviews are the ones adding the most value to college students’ educations.
If this seems completely illogical to you, you are well-suited for reading the rest of this post.
The conclusions above were reached by two professors—Scott E. Carrell of UC Davis and James E. West of the U.S. Air Force Academy—in a new research paper, “Does Professor Quality Matter? Evidence from Random Assignment of Students to Professors,” featured in the June 2010 issue of the Journal of Political Economy. As the title suggests, 10,534 students who attended the U.S. Air Force Academy from the fall of 2000 to the spring of 2007 were randomly assigned to required introductory courses. Their academic performance in each course, measured by standardized course exams, was then tracked as they progressed through the curriculum.
To minimize any bias in this experiment, all instructors teaching core courses used common syllabi, and all students took standardized course exams that were graded by several professors (Professor A graded problem 1, Professor B graded problem 2, etc.). In the study, Carrell and West set out to determine which instructors caused the greatest amount of learning for students, both in the “contemporaneous course” (i.e., the course taught by the instructor) and in “follow-on courses” (i.e., sequence courses taken after the contemporaneous course). An example of a contemporaneous course would be Calculus I, and some of its follow-on courses are Calculus II and aeronautical engineering.
Using advanced statistical methods, Carrell and West created a value-added model for instructors, which allowed them to isolate each instructor’s contribution to student achievement both in the course she actually taught and in her students’ follow-on courses. They found that instructors who cause higher student achievement in the courses they teach are less experienced, less likely to hold a terminal degree, and mostly not tenured professors. These instructors also receive better evaluations from their students. On the other hand, their model finds that instructors who cause less student achievement in the courses they actually teach cause more achievement in follow-on courses. These instructors are more experienced, more likely to hold a terminal degree, and mostly tenured faculty.
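The paper’s econometric specification is far more sophisticated than anything that fits in a blog post, but the core intuition of a value-added model is simple: adjust each student’s exam score for what you can observe about the student, then attribute the leftover variation to the instructor. The sketch below is purely illustrative—all numbers are simulated and every variable name is my own invention, not the authors’—and it estimates contemporaneous value-added only (a follow-on version would repeat step 2 using residuals from the next course in the sequence).

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated setting: students randomly assigned to instructors, as at USAFA.
n_students, n_instructors = 600, 6
instructor = rng.integers(0, n_instructors, n_students)
ability = rng.normal(0, 1, n_students)            # observed proxy, e.g. entrance scores
true_va = np.linspace(-0.5, 0.5, n_instructors)   # hypothetical instructor effects

# Standardized exam score = baseline + ability + instructor effect + noise.
score = 50 + 5 * ability + 10 * true_va[instructor] + rng.normal(0, 2, n_students)

# Step 1: regress exam scores on observable student characteristics.
X = np.column_stack([np.ones(n_students), ability])
beta, *_ = np.linalg.lstsq(X, score, rcond=None)
residual = score - X @ beta

# Step 2: an instructor's estimated value-added is the mean residual
# of the students randomly assigned to her section.
va_hat = np.array([residual[instructor == j].mean()
                   for j in range(n_instructors)])
print(np.round(va_hat, 2))
```

Random assignment is what makes step 2 credible here: because students do not sort into sections, the mean residual per instructor is not contaminated by unobserved student quality.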
So, how can one possibly make sense of this paradox? Apparently, it can be explained away by experienced, tenured professors providing students with some kind of enigmatic “deeper learning”:
Results show that there are statistically significant and sizable differences in student achievement across introductory course professors in both contemporaneous and follow‐on course achievement. However, our results indicate that professors who excel at promoting contemporaneous student achievement, on average, harm the subsequent performance of their students in more advanced classes. Academic rank, teaching experience, and terminal degree status of professors are negatively correlated with contemporaneous value‐added but positively correlated with follow on course value‐added. Hence, students of less experienced instructors who do not possess a doctorate perform significantly better in the contemporaneous course but perform worse in the follow‐on related curriculum.
Student evaluations are positively correlated with contemporaneous professor value‐added and negatively correlated with follow‐on student achievement. That is, students appear to reward higher grades in the introductory course but punish professors who increase deep learning (introductory course professor value‐added in follow‐on courses). Since many U.S. colleges and universities use student evaluations as a measurement of teaching quality for academic promotion and tenure decisions, this latter finding draws into question the value and accuracy of this practice.
As you can see from the excerpt above, it’s pretty clear that Carrell and West are searching for an intellectual argument to undermine student course evaluations and the common sense method of judging instructors by their students’ achievement on material from the actual courses they teach. If you’re like me, you are utterly perplexed by a system that would mostly determine the quality of a Calculus I instructor by students’ performance in a Calculus II or aeronautical engineering course taught by a different instructor, while discounting students’ mastery of Calculus I concepts.
The trouble with complex value-added models, like the one used in this report, is that the number of people who have the technical skills necessary to participate in the debate and critique process is very limited—mostly to academics themselves, who have their own special interests. I hope that this report isn’t used by the media or lobbyists to claim something like, “Research shows that tenured faculty are more effective instructors because they provide students with a deeper understanding of concepts. Course-aligned standardized exams and instructor evaluations by students are poor tools for assessing professor quality.”
To the skeptical consumer of higher education, this research report appears to be an attempt to erode support for evolving professor evaluation tools, while replacing them with evaluation methods that are designed by (and can only be understood by) the elite suppliers of higher education. The authors must communicate their findings to the public and to policy makers more effectively if they want to dispel this perception.
Thanks to Greg Mankiw for directing me to this research report. He calls the findings “fascinating.”