Student evaluations were based upon what appeared to be a reasonable premise: feedback from students provides a window into the strengths and weaknesses of a professor’s approach to education. To a certain extent, that premise is sound. But feedback of that kind is best used by the professor. And longitudinal feedback (evaluations gathered over a period of time) is where helpful patterns surface. On the whole, such patterns are far more useful and significant than individual evaluations gathered in a single semester.
When academic institutions began using student evaluations as a major metric for promotion and tenure, however, their use became counterproductive. And a growing body of empirical evidence suggests that they are not a reliable indicator of effective instruction.
Worse yet, their use has had a negative impact on the larger mission of colleges, universities, and (of particular interest to this writer) theological education. A recent review of that research and a report on further research conducted by Wolfgang Stroebe and others make the point succinctly:
Student Evaluations of Teaching (SETs) do not measure teaching effectiveness, and their widespread use by university administrators in decisions about faculty hiring, promotions, and merit increases encourages poor teaching and causes grade inflation.
How did we get here? The question is all the more pointed given that almost a century ago the creators of SETs cautioned against drawing a correlation between student evaluations and good teaching.
There are a number of factors at play:
One: There has been a decades-long passion for measuring good teaching. The ability to say with empirical certainty that a teacher has been effective allows institutions to evaluate the performance of their faculties. It offers ostensibly concrete evidence for rewarding good performance or sanctioning poor performance. And, more broadly, the ability to measure good teaching implies that good teachers can be reproduced.
Two: That ability has implications for educational institutions across the board. It bears upon the management of faculties and budgets, the enrollment and retention of students, advertising campaigns, institutional ratings, and the viability of schools of education in particular.
Three: As Stroebe notes, the creators of SETs also have a vested interest in advocating for the use of student evaluations. He points out that positive student evaluations correlate poorly with effective teaching as measured by student performance, and that a whole host of intervening variables deeply shapes the outcome of evaluations, including race, gender, physical appearance, and implied or implicit promises of a “good grade.” At best, Stroebe notes, the creators of SETs seem predisposed to overlook these weaknesses; at worst, they willfully ignore them. Quoting Upton Sinclair, he observes, “It is difficult to get a man to understand something, when his salary depends on his not understanding it.”
Four: Stroebe goes on to note that educational institutions are drawn into a mutually reinforcing relationship that blinds administrators to the same defects in student evaluations. An increasing number of administrators charged with evaluating faculties depend upon them. Their jobs, in turn, depend upon certain “deliverables.” And, as such, the process takes an ever deeper hold on the administrative structure of the institutions that use them.
Five: What Stroebe doesn’t note is that student evaluations are also an effective way for administrators to compel faculty to cooperate in making institutional goals a reality across the curriculum. For some time now, student evaluations have allowed institutions to craft a set of their own questions, which often embody a variety of implied curricular goals that the institution believes are important to achieve. So, on the face of it, such questions function as a basis for evaluating the quality of instruction; what they actually do is promote certain institutional goals.
Now some might be tempted to overlook the embedded bureaucracy that has grown up around student evaluations. They may even shrug off the empirical evidence that falsifies the claim that they measure effective teaching. But SETs don’t just fail to do what they are supposed to do; as Stroebe notes, they have eroded the larger academic enterprise, inflating grades, distorting the picture we have of what students have really achieved, compromising academic standards, and dumbing down the curriculum of every school that uses them. Their use also overlooks a fundamental truth: learning is often a struggle, not a lawn chair, and comfort isn’t necessarily the point.
Theological schools, in particular, ought to take this truth to heart. But as maintaining enrollment numbers becomes an ever-greater challenge, they are, in fact, among the institutions most likely to overlook the impact of student evaluations.
What might provide a way forward? The answer deserves broad-based conversation, but these are a few of the suggestions I would make:
- Acknowledge the problems with student evaluations and return their use to professors, who can make of them what they will over time, knowing just how unreliable they are.
- Create a deeper collegial set of relationships, devoted to intentional conversations about the vocation of theological education.
- And develop a mentoring program that relies on building creative partnerships between established and younger faculty members.
Theological institutions should avoid aping the secular academy. And they have the theological tools to develop an approach to nurturing educators that is more deeply rooted in their commitments.
Return, for a moment, to the institutionally crafted questions described above. On the face of it, they seem reasonable enough: an insistence on diversity, for example, ensures a breadth of viewpoints and guards against sexism or racism in the classroom. But imagine that you teach church history and your subject matter is the Caroline Divines. You will find it impossible to achieve diversity among their primary voices or the academics who write about them. (I offer this as just one example, because it so obviously underlines the problematic character of such goals, writ large, across a curriculum. There are endless others.)