Friday, March 02, 2012

Teaching by the Numbers

cross-posted from Dagblog

Last week, New York City released Teacher Data Reports for every teacher in its system. This week, I got my own teaching numbers: last semester's teaching evaluation scores. Getting my numbers was a good thing for me personally; they were very high, and my bosses tend to reward that. Releasing the New York City numbers was a bad thing generally, which can only set education in the city back. But both sets of numbers are largely bullshit.

Teaching quality is very difficult to quantify, and no method that currently exists comes anywhere close to doing it accurately. It's not impossible. But it is impossible with the current tools. It's like going to Mars: a good thing to do that is impossible this week but can be possible someday if we keep working. In fact, getting quantifiable teacher assessment right is probably a bigger technical challenge, in terms of how far the existing technology is from the ultimate goal, than a Mars mission. Measuring teacher quality accurately is something that we should do, and that will have important benefits for K-12 and university teaching alike. But it's important to be realistic about what we can actually achieve right now. And relying too quickly on the flawed early technology, like launching a mission to Mars in a rocket that can't make it, will only set back progress and cause real people pointless harm.

These aren't sour grapes. The current method of quantifying university teaching treats me very favorably, and if more weight were given to those numbers it could only be to my benefit. My evaluation scores are reliably high, and last semester's were close to perfect. But that's no reason for me to believe them. Student evaluations, which are the only widespread quantitative measure of university teaching and by which some numbers-minded administrators set great store, measure student satisfaction, not student learning. These are obviously not the same thing. My numbers don't prove I'm effective. They prove that I'm well-liked. And the students have no way to know how much, or how little, they would have learned with a different teacher, because mine is the only version of that class they've taken. If, for example, I did not cover material that students at other universities would routinely learn in that course, my students would have no idea. And if my students learned significantly more than students in a similar course somewhere else, if they covered more material, understood it better, and developed stronger intellectual skills than students elsewhere, they couldn't know that either. They do have an intuitive sense of whether they're learning or not, and that intuitive sense is part of their satisfaction with the course and the teacher, but it's only part. These numbers are correlated with effective teaching, but the correlation isn't strong and (even worse) the correlation itself varies widely from class to class. Some popular teachers are very effective. Some are simply easy. Still others are merely likeable. Which am I? The numbers don't tell me.

There are other ways to evaluate university teaching: peer observations, teaching portfolios, testimonial letters from former students, and so on. All of these methods are also imperfect. They work best when several methods are combined, which gives you a fuzzy but reasonably adequate picture of how someone is doing in the classroom. The fuzziness of those measures has fundamental consequences for the entire profession of college teaching. They're good enough to weed out flagrant incompetence, or at least to reassure the school that the flagrantly incompetent have been weeded out. But they're not focused enough to make fine distinctions between good teachers and very good ones, let alone distinctions between the very good and the truly excellent. This is why professors ultimately advance their careers either as researchers or as administrators; research talent and administrative skill are easier to measure, and make it easier to distinguish the excellent from the merely good and the very best from the merely excellent. There isn't a career path that rewards superb teachers for their teaching, because the available measurements can't reliably tell those people from the teachers who are only above-average.

If college administrators tried to use the existing measurements to reward the best teachers by, say, promoting people whose teaching scores were 10% better than their colleagues', they'd be prospecting for fool's gold. Was I really 5% or 10% better last semester than I was the semester before? Hell, no. In fact, I was badly distracted last semester: I had an especially laborious and high-stakes administrative role to perform, I was making an inter-state commute almost every weekend, I got terrible medical news about people dear to me, and halfway through the semester I got married. I never slacked on my course prep, but I guarantee you that there was no extra time to put into it. The numbers, taken at face value, suggest that I should strive for that level of stress and distraction every semester, but the numbers should shut their damn mouths. And when this semester's numbers "show" that my effectiveness has "declined" 10% or 15% (because last semester's scores can only decline), that won't mean that I've actually become less effective. It will mean that the scores fluctuate widely from semester to semester because they are extremely imprecise.
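The point about imprecise scores can be illustrated with a small simulation. This is only a sketch under invented assumptions: the 1-to-5 rating scale, the class size of 25, the per-student noise, and the "true quality" of 4.2 are all hypothetical values chosen for illustration, not figures from any real evaluation system. The teacher's underlying quality never changes, yet the semester averages still move.

```python
import random
import statistics

def semester_mean(true_quality, class_size, noise_sd, rng):
    """Average rating for one class: the same true quality plus
    per-student noise, clipped to a hypothetical 1-5 scale."""
    ratings = [
        min(5.0, max(1.0, rng.gauss(true_quality, noise_sd)))
        for _ in range(class_size)
    ]
    return statistics.mean(ratings)

rng = random.Random(2012)   # fixed seed so the run is repeatable
TRUE_QUALITY = 4.2          # assumed, and held constant every semester

# Eight semesters taught by the "same" teacher.
scores = [
    semester_mean(TRUE_QUALITY, class_size=25, noise_sd=0.9, rng=rng)
    for _ in range(8)
]

# Semester-over-semester change, in percent of the previous score.
changes = [100 * (b - a) / a for a, b in zip(scores, scores[1:])]

for i, s in enumerate(scores, start=1):
    print(f"semester {i}: {s:.2f}")
print("changes (%):", ", ".join(f"{c:+.1f}" for c in changes))
```

Every change printed here comes from sampling noise alone, since the simulated teacher is identical each semester. Smaller classes or noisier ratings make the swings larger.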

But my bullshit evaluation numbers are a masterpiece of statistical rigor compared to the numbers that New York City just published. That data is related to students' performance on a standardized test, which is already imperfectly correlated with how much students have learned. So from the start you've got a shaky correlation with a shaky correlation. Then the numbers are adjusted in various ways to make them more "meaningful," but considering how small the sample sizes are, all the extra variables and sub-tabs, and the addition of new correlation problems with each new variable, actually make the numbers murkier and more volatile. Even if you ignore all those problems, and you shouldn't, there's the problem of sample size itself. In some instances the number crunchers themselves admit that the margin of error for particular teachers hits 53%. This means that a teacher ranked in the 50th percentile might actually belong in the 103rd percentile and be a miracle worker, or belong in the -3rd percentile and have been dead since the 1990s.
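The arithmetic behind that last joke can be written out in a few lines. This sketch assumes the 53% margin of error is read as 53 percentile points on either side of the reported rank, which is how the paragraph above treats it; the function name is made up for illustration.

```python
def percentile_interval(estimate, margin):
    """Return the raw interval (estimate ± margin) and the same interval
    clipped to the only values a percentile rank can take, 0 through 100."""
    raw = (estimate - margin, estimate + margin)
    clipped = (max(0, raw[0]), min(100, raw[1]))
    return raw, clipped

raw, clipped = percentile_interval(50, 53)
print("raw interval:    ", raw)      # (-3, 103)
print("clipped interval:", clipped)  # (0, 100)
```

Once the impossible values are clipped away, the interval covers the entire scale. Under this reading, a 50th-percentile ranking with that margin rules out nothing at all about where the teacher actually stands.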

When I call these numbers bullshit, I don't mean that they serve no purpose at all. We will only get meaningful techniques of measurement by experimenting with different approaches. The numbers we have are not useful as actual measurements. They are useful as steps in the project of devising better measurements. Bullshit, put to the right use in your garden and combined with the right mix of water, seeds, and sunlight, will eventually yield a nutritious salad. But that doesn't mean you put the bullshit on a plate and call it lettuce. The New York City numbers are pretty obviously not ready for public consumption. Serving them up represents a health hazard.

Bill Gates, a champion of number-driven education reform, published an op-ed in the Times opposing the release of the teacher numbers. By and large, Gates gets it: the numbers aren't ready for prime time, and using them to publicly shame teachers will only cause harm. And Gates is right that using numbers punitively, especially when the numbers themselves aren't even half-baked, will only make teachers resist the whole project of numerical assessment. Of course it will.

Finding ways to measure teaching quality would eventually benefit teachers enormously. Teachers don't oppose measurement and numerical assessment because they fear change, or don't want to be held accountable, or because they're union thugs. Teachers oppose these "reform" initiatives because the "reformers," sometimes with the best of intentions, often use badly flawed measurements as if they were self-evident facts. No one in their right mind would want to be evaluated that way, especially when "education reform" in its current form has no suggestions for helping "underperforming" teachers except firing them. Gates understands that education reform should ultimately aim to help teachers improve, rather than simply replacing them, but many "reformers" take a much cruder approach. Claiming that the teachers are just looking out for their self-interest doesn't cut it; you can ask teachers to put their own interests aside for the sake of the kids' education, but you can't ask them to put their interests aside for the sake of number-driven policies that don't help the kids' education and likely hurt it. Turning over K-12 education to a set of statistics that don't actually measure learning is not a worthwhile goal, period, let alone a goal worth getting fired for.

People with the best intentions can do enormous damage to our education system by naively relying on numbers that are a long way from becoming reliable. These people are perfectly sincere. They really think that the bullshit is lettuce, and they will tell you at length how important leafy greens are to a good diet. If someone tells you that identifying the best teachers is perfectly simple, you're likely talking to one of these naive and disastrously well-meaning souls. They not only do damage to the current education system, but they set back reform, because peddling bullshit and calling it lettuce has the long-term effect of making teachers oppose lettuce on principle, and moves us further from the day that we can actually produce a healthy salad.

And what Bill Gates does not get is that not everyone who advocates these number-driven policies is naive or well-intentioned. There are a number of people supporting numerical assessment who are not interested in improving education at all, but who are simply anti-teacher or even anti-education. Some are union-busters, some have ideological problems with public schools, some have other motives. But they are not interested in producing lettuce. They just want to see some teachers eating shit. This can be difficult for well-meaning "reformers" to see; when you understand yourself as crusading for the public good, you tend to see anyone who joins you as one of the good guys. But it is transparently and intuitively obvious to teachers. When the same politicians and interest groups who were down on teachers last year are suddenly talking about "assessment" and "reform" this year, it's obvious that those politicians and activists are just adopting a new name for the same old ends. And that leads many teachers to see all advocates of reform, no matter how well-intentioned, as part of an older anti-education agenda. When reformers talk about reform leading to higher pay for the best teachers while the "underperformers" are fired, it is very obvious to people who actually teach that no one is going to get much of a raise, but that the firings are at the top of the agenda. (Even when school systems follow through with merit pay, the increases are small, and in many systems the "best" teachers don't do any better financially under the "reformed" system.) The sincere reformers, such as Arne Duncan or Barack Obama, generally don't grasp this. Their opportunistic allies do.

The genuine reformers damage their cause through their careless choice of allies, and by working with people who are operating in bad faith. They not only create resistance from the very people who should be their most important allies, the teachers themselves, but they ensure that any "reforms" enacted will be implemented abusively rather than productively: that flawed numbers will be treated as hard data, that results will be used to punish teachers and not to help them, that the promised raises never come but the threatened firings do. "School reform" will be a thin disguise for teacher-bashing as long as the "reformers" include education-bashers in their political coalition. That alliance will always provide enough political backing for new punishments, but not enough for the promised rewards. Bill Gates should be applauded for reminding the well-meaning readers of the New York Times what education reform is supposed to be. But his plans will never bear fruit until he comes to grips with what "education reform" actually is.

