On e-Literate, Elijah Mayfield has a good post addressing some of the myths (his term) circulating around the subject of machine grading, particularly in response to the NY Times article that provocatively suggested that "Essay Grading Software Offers Professors a Break." I've been re-reading Manuel DeLanda's Philosophy and Simulation for my speculative realism class, which has me thinking about this from some different angles, I suppose. From Mayfield's perspective, as someone invested in machine learning and in developing these kinds of applications, machine grading isn't about replacing professors (or giving them a break) but rather about providing a different kind of feedback to authors. It really isn't grading at all. As Mayfield writes: "If I were to ask whether a computer can grade an essay, many readers will compulsively respond that of course it can’t. If I asked whether that same computer could compile a list of every word, phrase, and element of syntax that shows up in a text, I think many people would nod along happily, and few would be signing petitions denouncing the practice as immoral and impossible."
This would be the point, right? The computer isn't "reading," so it clearly isn't "grading," if grading requires reading in the first place and means establishing relative quality (e.g. this essay is better than this other one).
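Just to make Mayfield's uncontroversial half concrete: the compiling he describes really is trivial. Here is a minimal sketch in Python (my example, not Mayfield's code) that tallies every word and two-word phrase in a text without doing anything we'd call reading:

```python
import re
from collections import Counter

def tally_features(text):
    """Count every word and two-word phrase in a text.

    This is the part nobody objects to: a computer compiling
    surface features, not "reading" or "grading."
    """
    words = re.findall(r"[a-z']+", text.lower())
    unigrams = Counter(words)
    bigrams = Counter(zip(words, words[1:]))
    return unigrams, bigrams

words, phrases = tally_features("The essay argues that the essay form is dead.")
print(words.most_common(3))   # [('the', 2), ('essay', 2), ('argues', 1)]
print(phrases.most_common(1)) # [(('the', 'essay'), 2)]
```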
Mayfield doesn't want to think about machine grading as replacing teaching but rather as a supplement, helping students: "What if, instead of thinking about how this technology makes education cheaper, we think about how it can make education better? What if we lived in a world where students could get scaffolded, detailed feedback to every sentence that they write, as they’re writing it, and it doesn’t require any additional time from a teacher or a TA?" Perhaps it is Pollyannaish to imagine this outcome, but I am interested in different questions here.
First, let's dispense with the grading aspect. The problem with the grading process is that it almost always underlies the lamest possible writing activity: one where 100s of students are asked to write essentially the same response to a single, fairly narrow prompt. There is no real purpose, no communication with another human. No one wants to write the texts, and no one wants to read them. As I understand it, the machine can sort responses only because the answers are so uniform and predictable. Mayfield, at one point, uses the example of sorting between photos of ducks and photos of houses. And the reality is that this kind of essay writing is equivalent to asking students to go take a photo of a duck and submit it for a grade. As a result, the computer can tell whether the student has taken a duck photo or not. But if the assignment were to take an "interesting" photo? Well, let's just say that we don't yet have to worry about a computer making that judgment for us.
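To caricature that sorting in code, here is a sketch (the vocabulary and threshold are entirely invented) of why sorting uniform responses is easy while judging "interesting" is not:

```python
import re

# Invented example: the expected answer vocabulary for a narrow prompt.
EXPECTED = {"photosynthesis", "chlorophyll", "sunlight", "glucose"}

def is_duck_photo(response, threshold=2):
    """Sorting works only because the answers are uniform: did the
    student hand in the expected 'duck photo' or not?"""
    words = set(re.findall(r"[a-z]+", response.lower()))
    return len(words & EXPECTED) >= threshold

print(is_duck_photo("Plants use sunlight and chlorophyll to make glucose."))  # True
print(is_duck_photo("My grandmother kept a garden full of secrets."))         # False

# Note there is no comparable function for is_interesting_photo().
```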
On the other hand, I think part of the problem with (and resistance to) machine grading lies in a serious misunderstanding of what humans do when they grade. To appropriate William Carlos Williams, a text is a machine made out of words. A text is a machine. For a Deleuzian like DeLanda, a human might be a machine as well. That is, humans reading texts are already machines processing other machines. In machine-to-machine relations, we are talking about capacities: not just the properties of a given text, which are finite, but the interaction of those properties with any possible reader in any given situation, which creates infinite capacities to affect. We already know all the things we do to norm readers to create predictable responses. In other words, grading is always about creating a situation that is unlike reading elsewhere, not only in a large standardized test but in typical composition classroom grading as well. By regularizing one end of the equation, we hope to get better measurements of the other end (i.e., the student). As a grader, I do not and cannot care about what the author says. To care would be to invalidate the evaluative mechanism. It doesn't matter if I agree with your politics or not. All I am looking for is whether the text meets certain objective criteria. Have you ever watched a movie and started focusing on things like directorial or acting decisions (e.g., that's an interesting camera angle, or that was a curious facial expression to match that line of dialog)? It's almost impossible to become affectively invested in the film. That's what grading is like.
That said, when one responds to student writing, one has to offer up real engagement with the text, because writing for the purpose of a grade is even more depressing than reading for the purpose of a grade. You have to generate some real, genuine human response to the subject matter. But then this stops being about evaluation of the student, because we open up the Pandora's box of capacities; we become a chain of machinic interactions. And here the computer is already a welcome participant. Human feedback is valuable, but the network can also analyze our text and offer thousands of interesting and useful responses. Today we use the network to uncover plagiarism, but tomorrow we could use it to link our students to 100s or 1000s of other writers and texts that share their interests. Think of a kind of reverse Google: your text is composed of these search terms.
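A rough sketch of what that reverse Google might look like (the stopword list and example text are mine, and a real system would weight terms against a larger corpus rather than counting raw frequency):

```python
import re
from collections import Counter

STOPWORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "is", "that", "it"}

def reverse_query(text, k=5):
    """Treat a text as its own search query: pull out its most
    frequent non-stopwords and use them to find other writers
    and texts that share the author's interests."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(w for w in words if w not in STOPWORDS)
    return " ".join(term for term, _ in counts.most_common(k))

essay = "Migration shapes cities, and cities in turn reshape migration policy."
print(reverse_query(essay))  # "migration cities shapes turn reshape"
```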
Whether the machine is a human or a computer, the mass grading process is a statistical procedure that says that a writer who produces a text with certain measurable qualities is likely to be an "A," "B," or whatever, which in turn means they are statistically likely to "know" the content on which they are being tested. On the other hand, when a text is read, by a human or a computer, the reader establishes links between the text and a larger network of data, which generates a response. Do I mean to equate humans and computers? Not really. I suppose my point is that the issue here is with the practice of grading, not the machine doing it.
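One last caricature, with entirely invented weights and thresholds, of grading as that statistical procedure: measurable surface qualities in, likely grade out, with no one "reading" anything at either end:

```python
def predict_grade(text):
    """Map measurable qualities of a text to a likely grade bucket.

    The weights below are invented for illustration; real systems
    train them against human-scored samples, but the logic is the
    same: properties in, statistical likelihood out.
    """
    words = text.split()
    n_words = len(words)
    avg_word_len = sum(len(w) for w in words) / max(n_words, 1)
    n_sentences = max(text.count("."), 1)
    score = (0.02 * n_words
             + 0.5 * avg_word_len
             + 0.1 * (n_words / n_sentences))
    return "A" if score > 7 else "B" if score > 5 else "C"

print(predict_grade("This essay considers machine grading."))  # "C"
```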