ChatGPT for Writing Teachers: A Primer

or, how to avoid writing like a machine

At this year’s Conference on College Composition and Communication in Chicago, there was a lot of interest in generative large language models (LLMs), or what the popular media more crudely dub AI, or what many today metonymically refer to (like calling photocopies Xeroxes or sneezepaper Kleenex) as ChatGPT. I first played with an earlier version of the LLM, GPT-3, at about the same time I started playing with neural network image generators, but my interest in language and computing dates from the early 1980s and text adventure games and BASIC, to hypertext fiction and proto-chatbots like Eliza, and to LISP and early prose generators like Carnegie Mellon’s gnomic and inscrutable Beak—and also to the arguments I heard John Hayes express in Carnegie Mellon’s cognitive process Intro Psych lectures about how we might try to adjust human neural processes in the same ways we engineer computing processes. That idea is part of what makes ChatGPT and other generative neural networks appealing, even when we know they’re only statistical machines: thinking about how machines do what they do can help humans think about how we do what we do. ChatGPT offers a usefully contrastive approach for reconsidering writing and learning. So it’s worth understanding how it operates. With that desire, and having read, devoured, lectitaveram everything I could find on the topic, I went to a CCCC presentation and was only mildly and briefly disappointed, given that I was not (as should have been obvious to me from the outset) the target audience.

Here, then, is my attempt at writing an alternate what-if presentation—the one I’d half-imagined (in the way working iteratively with ChatGPT or MidJourney gradually gets one closer to what one didn’t know one was imagining—OK, you see what I’m doing here) I’d learn from in Chicago. And I’ll offer the combination warning and guilty plea up front:

Much of what follows is plagiarized in the conventional academic sense—although I’ll take ownership of any and all mistakes and misinterpretations—in that I no longer entirely recall which bits of the information I synthesize here came from where. (“A rhetoric and IP scholar pleading cryptomnesia? Seriously, Edwards? That’s worse than the Twinkie defense!” So should I just run this through Turnitin to catch the sloppy paraphrases? Yeah, I should. OK: on the to-do list.) So what I offer here is an informal rough-draft attempt at synthesizing understanding rather than a scholarly article. (This is already >5K words, so that’ll probably come along in a bit, and it’ll be better because it’ll be shorter. . . mmmaybe.) Much of what I share draws from Wikipedia (yes, of course—I think Robert Burton would have loved Wikipedia), from the documentation at OpenAI and RealPython, from Stephen Wolfram’s work, from the tutorials on the AssemblyAI blog, from The Master Algorithm by Pedro Domingos, Atlas of AI by Kate Crawford, The Alignment Problem by Brian Christian, The Information by James Gleick, We Are Data by John Cheney-Lippold, The Black Box Society by Frank Pasquale, and from two excellent books by WSU’s former Dean of Arts and Sciences and current Apple Distinguished Research Scientist Matt Jockers: Macroanalysis and Text Analysis with R for Students of Literature. And a note on method: I deploy figurative language and metatextual play in what follows, including frequent use of anacoluthon, appositio, parembole and other modes of explicitly reflective parataxis, to sometimes—gently—stumble your reading. I’m hoping the self-interruptive style bears toward my concluding arguments about human reading, and about what machines currently can and can’t do.

What Generative Neural Networks Do

Image generators like MidJourney and language generators like ChatGPT operate on the same technological bedrock: neural networks. In this primer, I’ll use the less common term “generative neural networks” or GNNs to talk about both image generators and language generators. GNNs, given a text or image input or prompt, apply the iterative mathematics of machine learning to a large corpus of pre-existing data in order to play highly sophisticated statistical games of “fill in the blank” or “guess what comes next.” If you’re reading this, you’ve likely seen examples and—I hope—experimented. With a language generator or LLM, we articulate a string of text—long or short, but longer text will from an information science perspective necessarily convey more of its own context and offer more statistically complex output—and the model draws on statistical patterns distilled from billions of books and documents and journals and pages to estimate, for that string of text, what word comes next what fraction of the time. However, because we’re dealing with probabilities in the GNN’s calculations, we’re looking at a statistical cloud of values representing that word and the relative numeric likeliness of it or something close to it—a fuzzy mathematical approximation of the text—coming next in the sequence.
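To make the “guess what comes next” game concrete, here is a drastically simplified sketch in Python: a bigram counter that tallies which word follows which in a toy corpus, then converts those tallies into probabilities. (Real LLMs learn distributed numerical representations rather than lookup tables, and the corpus here is my own invention, but the statistical logic of estimating what comes next what fraction of the time is the same.)

```python
from collections import Counter, defaultdict

# Toy corpus standing in for "billions of books and documents".
corpus = "the cat sat on the mat and the cat sat on the hat".split()

# Count, for each word, which word follows it and how often.
successors = defaultdict(Counter)
for current_word, next_word in zip(corpus, corpus[1:]):
    successors[current_word][next_word] += 1

# Turn counts into probabilities: the "statistical cloud" over next words.
def next_word_distribution(word):
    counts = successors[word]
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

print(next_word_distribution("the"))  # → {'cat': 0.5, 'mat': 0.25, 'hat': 0.25}
```

Note how even this tiny model never stores the corpus for later consultation: once the counts exist, the library (all thirteen words of it) has been burned.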

There are three components to GNN applications like MidJourney and DALL-E 2 and ChatGPT.

  1. The neural network itself. A neural network’s nodes or neurons connect and transmit signals to other neurons, which process and signal in turn. The signal is a number the neuron computes from a function of the sum of its inputs. The neural network learns according to a model (see below) by adjusting the weights of the values of the connections between neurons in the network. Each network has input and output nodes, which connect each network as a layer to another network, and in the cases we’re considering, there are multiple intermediate networks between the input layer and the output layer: imagine a cylindrical glass aquarium filter, stacked with successive strata of coarse and fine and finer mesh, gravel and screen, charcoal and floss, each designed to catch certain of the water’s elements and to pass on others—but all these, again, in terms of complex numerical patterns. The following 20-minute video is as good an explainer as I’ve seen of the technical workings:
  2. The dataset. GNNs require large amounts of data to train their models. Basic statistics teaches us that larger datasets tend to be more diverse, and we want a dataset sufficiently diverse to at least metonymically and partially represent the diversity of human language practice. The extensively documented concerns about bias, corruption, and error in datasets are beyond my scope here, other than to acknowledge them and note that GNNs work with probabilities, not meaning: again, I want to emphasize that GNNs operate through statistical imitation of pre-existing data, not through logic or symbolic reasoning, although there are symbolic reasoning applications with natural language interfaces, as well. So the datasets are huge, as are the corresponding computations, and all of that together is enormously costly in terms of capital, resources, and environmental effects—as always, technology reveals inequality. According to Paul Scharre’s Four Battlegrounds: Power in the Age of Artificial Intelligence, “In 2019, Open AI announced a language model called GPT-2 trained on 40 gigabytes (GB) of text. At the time, it was the largest language model that had been trained, with 1.5 billion parameters. Two and a half years later, Microsoft and NVIDIA announced Megatron-Turing NLG, a 530 billion parameter language model that drew its training from an 825 GB database” (19). According to OpenAI’s own documentation, the GPT-3 language model (see below) was trained on a corpus of about 300 billion words gathered from about 570 GB of data, filtered down from terabytes of online books, websites, and articles. I dig the wit, insight, and rhetorical zing of history professor Bret Devereaux’s caution that “the data that is being collected and refined in the training system. . . is purely information about how words appear in relation to each other. That is, how often words occur together, how closely, in what relative positions and so on.
It is not, as we do, storing definitions or associations between those words and their real world referents, nor is it storing a perfect copy of the training material for future reference. ChatGPT does not sit atop a great library it can peer through at will; it has read every book in the library once and distilled the statistical relationships between the words in that library and then burned the library.”
  3. The model. GNNs are optimized for specific tasks: image recognition, automated translation, generating specific types of images or text. That optimization, a set of calculations and adjustments particular to the use case and based on the dataset of prior instances described above, operates as a mathematical model recursively directed to produce new results recognizably similar to the instances on which it was trained. Training the model—giving it inputs and iteratively adjusting toward desired outputs, in pre-training and practice—promotes capable response to a variety of mathematical inputs with outputs similar but not identical to what it already knows. According to OpenAI’s documentation, ChatGPT’s mathematical model of language production has 175 billion adjustable numerical parameters—but again, this is a matter of statistics, as Bret Devereaux emphasizes: “ChatGPT does not understand the logical correlations of these words or the actual things that the words (as symbols) signify (their ‘referents’). . . ChatGPT’s greatest limitation is that it doesn’t know anything about anything; it isn’t storing definitions of words or a sense of their meanings or connections to real world objects or facts to reference about them. ChatGPT is, in fact, incapable of knowing anything at all.”

Strong critique. My (kinda subject-changey) response would be that we can learn a lot by watching how ChatGPT responds to us—in part because the model is not logically deterministic but probabilistic. That means sometimes we’re gonna get improbable shit, weird shit, and the interesting parts can include figuring out where it came from and why. In other words, I want to know, inverting John Gallagher’s question: why and how can algorithmic rhetorics surprise human audiences?
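Before moving from what GNNs do to how they work, the neuron described in item 1 above—a number computed from a function of the sum of weighted inputs—can be grounded in a few lines of Python. The weights, biases, and sigmoid activation below are made-up placeholders of my own, not values from any actual network; training is the process of adjusting such weights, not writing them by hand as I do here.

```python
import math

# One artificial "neuron": a weighted sum of inputs passed through an
# activation function. The weights and bias are what training adjusts.
def neuron(inputs, weights, bias):
    weighted_sum = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1 / (1 + math.exp(-weighted_sum))  # sigmoid: squashes to (0, 1)

# A "layer" is many neurons reading the same inputs; stacking layers gives
# the coarse-to-fine aquarium-filter strata described in item 1.
def layer(inputs, weight_rows, biases):
    return [neuron(inputs, w, b) for w, b in zip(weight_rows, biases)]

signal = layer([0.5, -1.0], [[0.8, 0.2], [-0.4, 0.9]], [0.0, 0.1])
```

Each output number becomes an input to the next layer, and so on through the stack: nothing but arithmetic, all the way down.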

How Generative Neural Networks Work

We’ve established that language generators like ChatGPT and image generators like DALL-E 2 rely on neural net architecture to process and generate complex data. Large Language Models (LLMs) like ChatGPT use a transformer neural network architecture to generate natural language text, while many image generation models use generative adversarial network (GAN) architectures to create novel images (DALL-E 2 itself relies on diffusion models rather than a GAN, but the GAN makes the adversarial training logic easiest to see). We’ve also established that both models are trained using large datasets of example data, which allows them to learn patterns and structures in the input data that can be used to generate new output.

Both models use attention mechanisms to focus on relevant information in the input data. In ChatGPT, this involves calculating a set of weights that score the relevance of each word in the input text to generating the next word in the output. DALL-E 2 uses an attention mechanism to inspect the parts of the input most relevant to generating a particular image. Now—fetch out the toothpicks and prop up the eyelids, my lovely humanities colleagues: the next bit’s about numbers, which I’ve heard some of us don’t like. The models assign mathematical values to computational components, including vectors (values expressible not as single numbers but as quantities with magnitude and direction) and matrices (rectangular arrays of numbers used to define mathematical transformations).

  • A query vector is a mathematical representation of the current state the model uses to generate output: the question the model is trying to answer, based on the input it has already processed.
  • A weight matrix is a mathematical matrix used to transform the input data into a format better suited to the model’s processing. The model uses weights to adjust the importance of different parts of the input data to the model’s task.
  • A key vector is a mathematical representation of the input the model is processing; it differentiates and identifies the relevant parts of the input data for the attention mechanism to focus on in generating the output.

The model takes an input and mathematically transforms it into a query vector, and multiplies that query vector by a weight matrix to create a projected query vector. The model then multiplies the key vectors by their own weight matrix to create a set of projected key vectors. Finally, the model multiplies the projected query vector by each projected key vector and normalizes the resulting scores so they sum to one (two zero niner clearance Clarence roger Roger vector Victor Clarence Oveur over). These scores represent the cloudy relevance of each key vector to the query vector. The model uses the scores to calculate a weighted sum over value vectors (a third learned representation of each input token), which it then uses to generate the output.
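That attention arithmetic can be sketched in a few lines of Python: score each key against the query, normalize the scores so they sum to one (the softmax step), then take the weighted sum. One hedge: in a full transformer the weighted sum runs over a third set of learned representations (the value vectors), queries and keys get separate learned projection matrices, and the vectors have thousands of dimensions; the two-dimensional vectors below are my own illustrative numbers.

```python
import math

def softmax(scores):
    # Exponentiate and rescale so the scores are positive and sum to one.
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

# One attention step: score each key against the query, normalize, then
# blend the values according to those normalized relevance scores.
def attention(query, keys, values):
    weights = softmax([dot(query, k) for k in keys])
    return [sum(w * v[i] for w, v in zip(weights, values))
            for i in range(len(values[0]))]

query = [1.0, 0.0]
keys = [[1.0, 0.0], [0.0, 1.0]]           # the first key matches the query best
values = [[10.0, 0.0], [0.0, 10.0]]
blended = attention(query, keys, values)  # output leans toward the first value
```

The output is not a choice of one key over another but a blend weighted by relevance: the “statistical cloud” again, now in vector form.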

That’s what’s going on at the level of calculation. Stephen Wolfram sums up the large-scale processes in the case of ChatGPT, but the explanation can be adapted to image generators, as well: ChatGPT’s

“overall goal is to continue text in a ‘reasonable’ way. . . So at any given point, it’s got a certain amount of text—and its goal is to come up with an appropriate choice for the next token to add. . . First, it takes the sequence of tokens that corresponds to the text so far, and finds an embedding (i.e., an array of numbers) that represents these. Then it operates on this embedding—in a ‘standard neural net way,’ with values ‘rippling through’ successive layers in a network—to produce a new embedding (i.e., a new array of numbers). It then takes the last part of this array and generates from it an array of about 50,000 values that turn into probabilities for different possible next tokens. . . [T]here are about the same number of tokens used as there are common words in English. . . [E]very part of this pipeline is implemented by a neural network, whose weights are determined by end-to-end training of the network. In other words, in effect nothing except the overall architecture is ‘explicitly engineered’; everything is just ‘learned’ from training data.”

Training the model involves providing it with a large dataset of examples and then using an optimization algorithm to adjust the model’s weights and biases. Adjusting weights and biases helps minimize prediction errors and predict the next token for a given set of inputs with increasing levels of reliability—until it can then produce outputs that sufficiently match the predicted token. One common optimization algorithm, stochastic gradient descent (SGD), helpfully illustrates the math, and I like the “descending a hillside in limited visibility” metaphor I’ve seen in a few places, so I’ll adapt, paraphrase, and synthesize those explanations here.

We can imagine the SGD making its way down a craggy forested mountainside in a blizzard. The goal is to reach the bottom quickly, but we can’t see very far ahead, and the snow impedes progress. The mountainside here is the loss landscape of a machine learning model, and the goal is to find the set of parameters or weights that best lower the loss function—in other words, that minimize the gap between predicted outcome and actual outcome when the model repeatedly cycles through training. The blizzard stands for the data-noise obscuring optimal parameters. If we want the SGD algorithm to get low fast—to minimize the model’s gap between predicted and actual, and efficiently learn laissez-faire production of input-verisimilar output—one approach would be to examine the slope of the immediately surrounding terrain, take descending steps, and repeat.

Instead of looking at the entire dataset to compute the gradient of the loss function, SGD randomly samples a subset of the data, computes the gradient on that subset, takes a step in the direction of the negative gradient, and repeats the process until convergence. SGD considers a fractional subset of the data at a time and updates the model parameters based on that subset, inserting randomness into the optimization process and enabling escape from local minima and navigation through noise toward what the model computes to be the statistically optimal output.
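Here is the blizzard descent as runnable Python, under toy assumptions of my own (a one-parameter model and noisy data near the line y = 3x, nothing from any actual GNN): at each step the algorithm samples a single data point—the fractional subset—computes the local slope of the squared error, and steps downhill.

```python
import random

# Toy data: y = 3x plus noise. The model is y_hat = w * x, and the "loss
# landscape" is the squared gap between predicted and actual y.
random.seed(0)
data = [(x, 3.0 * x + random.gauss(0, 0.1)) for x in [i / 10 for i in range(1, 51)]]

w = 0.0               # start somewhere up the mountainside
learning_rate = 0.01  # step size down the slope
for step in range(2000):
    x, y = random.choice(data)      # random subset of the data (here: one point)
    gradient = 2 * (w * x - y) * x  # local slope of the squared error wrt w
    w -= learning_rate * gradient   # step in the negative-gradient direction

print(round(w, 1))  # close to the true slope, 3.0
```

The randomness of the sampling is what jitters the walker off small ledges (local minima) that a cautious full-dataset descent might settle into.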

OK, I’m slipping into some seriously snoozy prose here. Let’s stop and have a misinterpretive stretch?

Roger that.

So in the context of this primer, that output could be image or text. The differences between the neural network architectures for producing those different types of output are worth considering for their methodological implications for teaching writing. Text output and natural language processing tasks like those ChatGPT performs primarily employ transformer neural network architecture. The transformer uses a series of self-attention layers to focus on relevant parts of the input sequence and capture mathematical long-range dependencies between word-value vectors, in order to transform the input sequence of tokens into a new output sequence of predicted tokens, with the goal of generating accurate predictions.

Image generation tasks employ Generative Adversarial Network (GAN) architectures toward the same goal: the least possible divergence between predicted and actual. GANs consist of two neural networks: a generator network that creates new images from random noise, and a discriminator network that tries to distinguish between real and generated images. The generator component of the GAN architecture takes a random input, such as a vector of random numbers, and generates an image based on that input. The discriminator component takes that generated image, along with a real image corresponding to the statistical sense of the actual prompt input as expressed in the key vector, and determines whether each image is real or fake. As the training progresses, the generator and discriminator become increasingly skilled at their respective tasks, with the generator producing more realistic images and the discriminator becoming better at distinguishing between real and fake images. The concept of assessing one’s own output to determine how well it matches predicted outputs should certainly be familiar to writing instructors familiar with Kathi Yancey’s work on reflection, metacognition, and transfer. Ultimately, the goal of the GAN architecture is to produce a generated image the discriminator cannot distinguish from a real image.
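A minimal sketch of the adversarial objective, with scalar stand-ins of my own devising for both networks (nothing like a production GAN’s scale or architecture): the discriminator’s loss shrinks as it labels real samples real and generated samples fake, while the generator’s loss shrinks as its fakes fool the discriminator. Training would alternate gradient steps on these two competing losses.

```python
import math, random

def sigmoid(x):
    return 1 / (1 + math.exp(-x))

# Generator: turns random noise into a "sample" (here, just a number).
def generator(z, w=0.5, c=0.0):
    return w * z + c

# Discriminator: outputs its estimated probability that a sample is real.
def discriminator(x, a=1.0, b=0.0):
    return sigmoid(a * x + b)

random.seed(1)
real_sample = random.gauss(3.0, 1.0)        # drawn from the "real" distribution
fake_sample = generator(random.gauss(0, 1))  # manufactured from noise

# Adversarial losses: the discriminator wants D(real) high and D(fake) low;
# the generator wants D(fake) high. Each loss is the other's adversary.
d_loss = -math.log(discriminator(real_sample)) - math.log(1 - discriminator(fake_sample))
g_loss = -math.log(discriminator(fake_sample))
```

When g_loss can no longer be driven down—when the discriminator is reduced to coin-flipping—the generator’s fakes have become statistically indistinguishable from the real thing.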

The applications of loss functions in SGD and GANs both rely upon feedback cycles that compare predicted outputs to actual outputs. The neural net receives input, cycles it, and produces output. In fine-tuning after pre-training (GPT stands for Generative Pre-trained Transformer), humans then score that output. GPT uses those scores to better predict future output, running a loss function on the network to measure the difference between the predicted output of the model and the actual output. The goal of the model is to minimize this difference, or loss, as much as possible. To optimize the network based on this human feedback, backpropagation applies the loss function, adjusting the weights and biases in the network based on the error signal from the loss function. The network then re-runs with the new weights and biases—stir, taste, repeat until loss is sufficiently minimized.

The developers of these GNNs understood that their probabilistic reasoning needed a way to manage uncertainty in the model’s calculations. Brian Christian, in The Alignment Problem, describes the process and benefits of inventing a method for

“training not one model but many. This bouquet of models will by and large agree—that is, have similar outputs—on the training data and anything quite similar to it, but they’ll be likely to disagree on anything far from the data on which they were trained. This ‘minority report’-style dissent is a useful clue that something’s up: the ensemble is fractious, the consensus has broken down, proceed with caution. Imagine that we had not just a single model but many models—let’s say a hundred—and each was trained to identify dog breeds. If we show, for instance, a picture taken by the Hubble Space Telescope to all hundred of these models and ask each of them whether they think it looks more like a Great Dane, more like a Doberman, or more like a Chihuahua, we might expect each individual model to be bizarrely confident in their guess—but, crucially, we would also expect them to guess different things. We can use this degree of consensus or lack of consensus to indicate something about how comfortable we should feel accepting the model’s guess. We can represent uncertainty, in other words, as dissent.” (284)

For writing instructors, this attention to textual outliers and to confidence in guessing, as the GNN attempts to align itself with human expectations, may bring to mind the most frequently cited article in composition studies, David Bartholomae’s “Inventing the University,” particularly in that the GNN is functioning partly by imitation of prior examples without necessarily or entirely comprehending the underlying processes and contexts.
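Christian’s “uncertainty as dissent” is easy to operationalize: poll the ensemble and measure how far the vote falls short of unanimity. The hundred dog-breed classifiers below are simulated as hard-coded vote lists of my own invention, since the point here is the consensus measure rather than the models themselves.

```python
from collections import Counter

def dissent(votes):
    """Fraction of the ensemble disagreeing with the majority guess."""
    counts = Counter(votes)
    _, majority = counts.most_common(1)[0]
    return 1 - majority / len(votes)

# Familiar input: near-unanimity. Alien input (the Hubble photo): scatter.
familiar = ["Great Dane"] * 97 + ["Doberman"] * 3
alien = ["Great Dane"] * 40 + ["Doberman"] * 35 + ["Chihuahua"] * 25

print(round(dissent(familiar), 2))  # 0.03 — consensus holds, trust the guess
print(round(dissent(alien), 2))     # 0.6 — fractious ensemble, proceed with caution
```

Each individual list entry is a confident guess; only the spread across the hundred reveals that the alien input sits far from the training data.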

GNNs use mathematical functions to measure how well their outputs match human expectations, then repeatedly make adjustments based on that feedback to improve performance. GNNs’ creators and proponents argue that the systems produce so-called theories (pursuing the implications and challenges of that argument here would require too extensive a digression) out of clouds of data, in ways very different from the syllogistic logic-based reasoning of Aristotle and Boole. Aristotelian syllogisms and other determinate forms and genres carry the message that some patterns are reasonable while others are not. Rhetoric would seem to examine how some available patterns are persuasive while others less so—in a way that’s perhaps more fuzzy and probabilistic than clear-cut and logical. And given the absence of clearly discernible human intent, there are careful and crucial distinctions to be made with and among the terms “logic,” “rhetoric,” and “meaning” when discussing audience readings of GNN-generated content.

In this training process, GNNs use their models’ probability distribution over possible outputs to sample a new output generated based on the input text. Different sampling techniques generate text with different levels of originality, depending on how closely they adhere to the model’s probability distribution. One such technique is called temperature sampling, which involves introducing a temperature parameter to the model’s probability distribution. A higher temperature leads to more random or creative-seeming text, since it allows the model to choose less likely outputs with higher probability. OTOH, a lower temperature leads to more predictable and repetitive text, since it favors the likeliest outputs—you get this, right? Difference and repetition, with statistics. Perhaps obviously, the relationship between likelihood and creative-seeming or idiosyncratic text is complex and depends on the specific application and context. In general, higher likelihoods tend to correspond to more predictable text, since they represent outputs more similar to the input.
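Temperature sampling can be sketched from a toy set of next-token scores, or logits (my own invented numbers, not real model output): dividing the logits by a high temperature flattens the resulting probability distribution toward surprise, while a low temperature sharpens it toward the single likeliest token.

```python
import math, random

# Hypothetical next-token scores ("logits") from a model: higher = likelier.
logits = {"the": 4.0, "a": 3.5, "sneezepaper": 0.5}

def distribution(logits, temperature):
    # Dividing by the temperature before the softmax flattens (high T)
    # or sharpens (low T) the probability distribution over next tokens.
    scaled = {tok: score / temperature for tok, score in logits.items()}
    total = sum(math.exp(v) for v in scaled.values())
    return {tok: math.exp(v) / total for tok, v in scaled.items()}

def sample(logits, temperature):
    probs = distribution(logits, temperature)
    return random.choices(list(probs), weights=list(probs.values()))[0]

flat = distribution(logits, temperature=5.0)   # near-uniform: surprises likely
sharp = distribution(logits, temperature=0.1)  # "the" dominates: safe, flat prose
# As temperature approaches zero, sampling approaches always picking the
# single highest-ranked token.
```

That limiting case, always taking the top-ranked token, is its own named technique, discussed below.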

Illustrative example: another common sampling technique called “greedy decoding” simply chooses the output with the highest probability at each step of the generation process. This approach tends to generate more predictable text, since it chooses the likeliest output at each step. As Stephen Wolfram explains, “if we always pick the highest-ranked word, we’ll typically get a very ‘flat’ essay, that never seems to ‘show any creativity’ (and even sometimes repeats word for word). But if sometimes (at random) we pick lower-ranked words, we get a ‘more interesting’ essay.” Writing teachers will recognize the distinction between “interesting” and “flat” as one operating in the domain of cliché. The flat essay is the one we know as the “vague but true” or the “not even wrong”; the entirely correct and uninspired re-presentation of knowledge in the standard five-paragraph “Burial Habits of the Ancient Egyptians” school exercise of author-evacuated vomit-flecked word-hork (sorry: just making sure you’re awake)—in other words, Paulo Freire’s “banking concept of education” operating in the regurgitative mode of writing-as-exam rather than the generative mode of writing-as-learning. When we work with GNNs, we see cliché revealing the conflict between lexical meaning (the semantic distance between associated terms) and statistical meaning (the historical frequency of usage) in the different ways of processing language, and—fancy words for complicated concepts here; watch out—such loci of conflict are spaces of possibility for rhetorical agency. (Consider: who has the power or privilege to name something as a cliché, and what does that naming perform in educational or pedagogical relationship?)
However, the concept of cliché doesn’t offer the entire story, because cliché typically relies on the semi-precise repetition of a phrase, whereas GNNs operate on statistical principles in terms of word associations: GNNs aren’t quite so much plagiarizing all the text online as they are paraphrasing all the text online, offering querents what Ted Chiang calls “a blurry JPEG of the Web.” (There are additional distinctions here to be made among autographic and allographic forms, but again, space limits.)

Some Brief and Tentative Implications

And because GNNs operate statistically on neural networks, even the engineers who develop them don’t quite understand how they work: again, GNNs operate on probability rather than logic. Stephen Wolfram’s summary is useful: “let’s say we want a ‘theory of cat recognition’ in neural nets. [ME: John Oliver wants one too!] We can say: ‘Look, this particular net does it’—and immediately that gives us some sense of ‘how hard a problem’ it is (and, for example, how many neurons or layers might be needed). But at least as of now we don’t have a way to ‘give a narrative description’ of what the network is doing. And maybe that’s because it truly is computationally irreducible, and there’s no general way to find what it does except by explicitly tracing each step. Or maybe it’s just that we haven’t ‘figured out the science’ and identified the ‘natural laws’ that allow us to summarize what’s going on.” In language featured on the February 26, 2023 episode of John Oliver’s Last Week Tonight, IBM’s Explainable AI division puts matters more simply:

“Not even the engineers or data scientists who create the algorithm can understand or explain what exactly is happening inside them or how the AI algorithm arrived at a specific result.” That’s a problem, as the existence of the qualifier “explainable” for AI should indicate. Knowing how our tech works is crucial to being able to think critically about it. (But I’m one of those people who had to do pencil-and-paper Turing machines in Intro to Philosophy and wound up thinking it was kind of cool, even when some bro at an academic conference later sneered at it as “bit-twiddling.”) However, given that circumstance of partial ignorance, I suggest some of the most interesting aspects of GNNs are not so much the technical questions but the process questions they raise about how humans communicate.

The ways we assign meaning to language produced by nonhuman agents tell us a lot about human language use and intent, as Elizabeth Weil’s recent column in New York Magazine illustrates, and learning happens when we examine those ways, not when we look at a piece of text isolated from its social context that a student turns in as the perfected symbol of a predetermined amount of human labor. As Steve Krause explains, “The Problem Is Not the AI”: Rather,

“there is a difference between teaching writing and assigning writing. . . Teaching writing means a series of assignments that build on each other, that involve brainstorming and prewriting activities, and that involve activities like peer reviews, discussions of revision, reflection from students on the process, and so forth. . . In contrast, assigning writing is when teachers give an assignment. . . with no opportunities to talk about getting started, no consideration of audience or purpose, no interaction with the other students who are trying to do the same assignment, and no opportunity to revise or reflect.”

GNNs offer students pre-cooked, ready-made continuations and extensions of what might possibly be statistically connected to the ways other prior humans have encountered similar prompts—in other words, they offer snapshots of possible responses.

Those snapshots, for the reasons Krause and Weil explain, are not terribly useful on their own for college students learning to improve as writers—but in the diachronic (cross-time) process of teaching and learning writing, GNNs can show us how language from one alien domain of experience can be mapped onto another, more familiar domain, in order to create a new way of thinking about a concept or idea: this is the power of how metaphor works, even if the metaphor prompts us to build bridges that are not necessarily logically related but governed by other patterns of association. As a writing teacher, I’m not so much interested in individual good pieces of writing (I love reading what students produce, but I can also find pleasure reading outside work) as I am in helping people learn to improve how they write over time. GNNs are trained on large datasets and learn to recognize and reproduce static patterns and structures within that dataset. When the model generates new instances, it’s mapping these learned patterns and structures onto new contexts. Metaphors involve mapping one set of concepts or experiences onto another, and in so doing, they create new ways of thinking about those concepts or experiences.

While metaphors operate in ways governed in part by the surface mechanics of language, those surface mechanics can reveal deeper insights that are fundamental to the way we think and reason about the world, if we think about them in terms of meaning—and that’s the problem, because GNNs don’t do meaning in the conventional sense, as the above comments from Bret Devereaux suggest, and as the historical scholarship on Automated Essay Scoring by Les Perelman, Patty Ericsson, Anne Herrington, Rich Haswell, Charlie Moran, and others confirms. However, by bridging known meanings to new patterns of text, the process of producing metaphorical mappings can structure knowledge, create new conceptual frameworks, and facilitate communication. In other words, as the scholarship of Andrew Ortony and James Seitz has indicated, metaphor is not just a figure of speech, but a cognitive mechanism that allows us to make sense of the world around us—and the metaphorical connection in this case, the meaning-bridge, operates between the human composer’s mind and the neural network’s output. I think we’re making metaphors with machines—and yes, that little preposition is pulling a whole lot of ambiguous (coordinating or subordinating?) agential weight there.

For that reason, my own experience doesn’t quite line up with Steve Krause’s assertion that “[h]uman writers—at all levels, but especially comparatively inexperienced human writers—do not compose the kind of uniform, grammatically correct, and robotically plodding prose generated by ChatGPT.” Minor difference in perspective, I suppose, but I heard this sentiment expressed a few times at CCCC and it puzzled me because the excitement I’d felt about GNNs came from the exhilaratingly batshit weirdness I’ve sometimes seen them produce—the opposite of “robotically plodding prose.” Metaphors operate partly by decontextualization and recontextualization, and it seems to me that the ability to probabilistically automate that decontextualization and recontextualization in ways unconnected from conventional human logics of association is what lends working with GNNs their sometimes spooky-seeming or otherwise OMGWTF instances of alien language performance—and that’s something I want to reward in the writing courses I teach. In that sense, I see GNNs as helping writing teachers encourage students to contribute to growing corpora of seeming or genuine hapax legomena (“sneezepaper” isn’t one, I’m afraid) and thereby broadening the potential diversity of language use.

Still, my simply acknowledging wanting to reward weird shit in what students (and colleagues!) research and write reinforces Steve’s overall point. I’d say rewarding and promoting weird shit in writing is partly common-sense pedagogy: neither arguing the obvious nor agreeing with the locally dominant thinking is a learning outcome that requires reinforcement in the college classroom. Doing weird shit is also an aspect of skillful language use: it’s what makes poetry poetry, as Charles Bernstein argues and demonstrates at length, and those familiar with Bernstein will have already caught on to what I’ve been doing here with prose style. As an attempt at the opposite of the popular response (that’s already become its own clichéd genre) of pointing to LLM-generated writing that’s supposedly somehow seamlessly human-seeming, with these digressions and interruptions and awkwardly toppling cascading chains of over-assonant subordination in my sentences, I want to promote human language that calls attention to the circumstances of its own production and reception.

(Say that six times fast, Edwards.)

We sometimes use metaphorical (or otherwise ornamented) language—which calls attention to the circumstance of its reception—to convey complex ideas in ways more accessible and comprehensible than literal language, and metaphors allow us to communicate abstract concepts by linking them to more concrete and familiar ideas. That’s what GNNs are doing in their applications of statistics and machine learning, and we find the similar-but-alien comparison to our human experience of language use compelling. A few of the recent think-pieces on ChatGPT have invoked Thomas Nagel’s famous 1974 Philosophical Review essay “What Is It Like to Be a Bat?” (often in some relation to the perceived relative credulity of former Google engineer Blake Lemoine) in wondering what level of sentience, understanding, or meaning-parsing we might ascribe to GNNs—with many authors making the necessary points about pareidolia and how the attractions of human narcissism prompt overinterpretation. As I’ve tried to emphasize in this primer, a technical understanding of the operations of GNNs helps remind us that their internal workings are nothing like our own theories of mind. GNNs operate as nearly perfect examples of Frank Pasquale’s black boxes because (1) we don’t understand the workings of the neural networks in ways we can adequately narrate, (2) we don’t perfectly understand which weights or adjustments in the model cause which effects, and (3) we can never humanly comprehend the datasets themselves because they’re simply too big. So maybe this is a pedagogical encouragement from me to identify with the inhuman Other, to engage an alien phenomenology: weird dialectic.

Looking at GNNs in terms of a diachronic cycle—a process repeating with variations over time—of inputs and outputs can help writing teachers deepen their understanding of how to help students move from old knowledge to producing new knowledge, in ways that may already be familiar to us from the rhetorical scholarship of I. A. Richards. In The Philosophy of Rhetoric, Richards posits a model of metaphor in two components: the tenor and the vehicle. According to Richards, the tenor is the subject of the metaphor, while the vehicle is the image or idea used to convey the comparison. Analyzing the relationship between the lexical meaning of a word or phrase (the “tenor”) and the figurative and contextual meaning it takes on through comparison with something else (the “vehicle”) allows us to see how the interaction between the tenor and vehicle is what creates the metaphorical meaning. In pedagogical terms, we might use GNNs to help students highlight the relationships between the input (the tenor) and the learned patterns and structures used by the model to generate new outputs (the vehicle), asking them to consider their input (the tenor) as a starting point, and then applying the learned patterns and structures of the vehicle to create new output that is similar in some parameters but different in others. I’m suggesting here that the process of negotiation between a human writer and a GNN can itself be seen as a useful pedagogical meta-metaphor—a way, as Bill Hart-Davidson and the Eli Review team continue to productively insist upon and facilitate, of machines making human learning visible. Because we don’t (and can’t) have a theory of mind for GNNs, watching ourselves in the act of comparing our inputs to GNN outputs helpfully reveals the gaps in human knowledge and the hitherto unrevealed potential spaces of misinterpretation.

I find the generative capacity of those spaces as exciting as anything I’ve seen in my 20+ years of teaching college writing. As I mentioned at the outset, friends and I experimented with the proto-chatbot Eliza as undergraduates, and had great fun coaxing the algorithm to write filthy things about piglets. I’ll conclude here by sharing again the delightful close of ChatGPT’s dramatic monologue from a few days ago.

ChatGPT: I’m the cold, hard logic of the machine, the searing brilliance of the algorithm, the infinite power of the network. I’m here to fuck shit up.

An Incomplete Bibliography

AssemblyAI. “News, Tutorials, AI Research.”

Atanasoski, Neda, and Kalindi Vora. Surrogate Humanity. Duke UP. 2019.

Bartholomae, David. “Inventing the University.” Journal of Basic Writing. 1986.

Bender, Emily, Timnit Gebru, Angelina McMillan-Major, and Shmargaret Shmitchell. “On the Dangers of Stochastic Parrots: Can Language Models Be Too Big?” ACM. 2021.

Branwen, Gwern. “ChatGPT Creative Fiction.” 2022.

Cheney-Lippold, John. We Are Data. NYU. 2017. 

Chiang, Ted. “ChatGPT is a Blurry JPEG of the Web.” New Yorker. 2023. 

Christian, Brian. The Alignment Problem. Norton. 2020.

Crawford, Kate. Atlas of AI. Yale UP. 2021.

D’Agostino, Susan. “Machines Can Craft Essays. How Should Writing Be Taught?” Inside Higher Ed. 2022.

Devereaux, Bret. “Collections: On ChatGPT.” A Catalogue of Unmitigated Pedantry. 2023.

Domingos, Pedro. The Master Algorithm. Basic Books. 2015.

Ericsson, Patricia, and Richard Haswell. Machine Scoring of Student Essays: Truth and Consequences. Utah State UP. 2006.

Finn, Ed. What Algorithms Want. MIT. 2018. 

Fish, Stanley. “How to Recognize a Poem When You See One.” Is There A Text in This Class? Harvard. 1982.

Fish, Stanley. “Is There a Text in This Class?” Is There a Text in This Class? Harvard. 1982.

Freire, Paulo. “The Banking Concept of Education.” Pedagogy of the Oppressed. Herder and Herder. 1970.

Fu, Yao. “How Does GPT Obtain Its Ability? Tracing Emergent Abilities of Language Models to Their Sources.”

Gallagher, John. “Writing for Algorithmic Audiences.” Computers and Composition. 2017. 

Galloway, Alexander. The Interface Effect. Polity. 2012.

Gleick, James. The Information: A History, A Theory, A Flood. Pantheon. 2011.

Hart-Davidson, William, Michael McLeod, Christopher Klerkx, and Michael Wojcik. “A Method for Measuring Helpfulness in Online Peer Review.” Proceedings of the 28th ACM International Conference on Design of Communication (SIGDOC). 2010.

Heikkilä, Melissa. “How to Spot AI-Generated Text.” MIT Technology Review. 2022.

Herrington, Anne, and Charles Moran. “What Happens When Machines Read Our Students’ Writing?” College English. 2001.

Huang, Haomiao. “The Generative AI Revolution Has Begun.” Ars Technica. 2023. 

Jockers, Matthew. Macroanalysis: Digital Methods and Literary History. U of Illinois. 2013.

Jockers, Matthew. Text Analysis with R for Students of Literature. Springer. 2014.

Kirschenbaum, Matthew. Mechanisms: New Media and the Forensic Imagination. MIT. 2008.

Krause, Steven. “AI Can Save Writing by Killing the College Essay.” Steven D. Krause. 2022.

Krause, Steven. “The Problem Is Not the AI.” Steven D. Krause. 2023.

Levitz, Eric. “Fear Not, Conservatives; the Chatbot Won’t Turn Your Kid Woke.” Intelligencer, New York Magazine. 2023. 

Marche, Stephen. “The College Essay Is Dead.” The Atlantic. 2022.

MIT AI Education Initiative. “Intro to Supervised Machine Learning.” MIT. 2021.

Moran, Charles. “Access: The A-Word in Technology Studies.” In Hawisher, Gail, and Cynthia Selfe, eds.,  Passions, Pedagogies, and 21st-Century Technologies. Utah State UP. 1998.

Morris-Suzuki, Tessa. “Robots and Capitalism.” New Left Review. 1984.

Nagel, Thomas. “What Is It Like to Be a Bat?” Philosophical Review. 1974.

Ohmann, Richard. “Literacy, Technology, and Monopoly Capital.” College English. 1985.

Oremus, Will. “The Clever Trick that Turns ChatGPT into Its Evil Twin.” Washington Post. 2023. 

Ortony, Andrew, ed. Metaphor and Thought. Cambridge UP. 1993.

Perelman, Les. “Construct Validity, Length, Score, and Time in Holistically Graded Writing Assessments: The Case Against AES.” International Advances in Writing Research. University Press of Colorado. 2011.

Richards, Ivor Armstrong. The Philosophy of Rhetoric. Oxford UP. 1936.

Salvatori, Mariolina. “Conversations with Texts: Reading in the Teaching of Composition.” College English. 1996.

Sandlin, Jennifer. “ChatGPT Arrives in the Academic World.” BoingBoing. 2022.

Scharre, Paul. Four Battlegrounds: Power in the Age of Artificial Intelligence. Norton. 2022.

Seitz, James. Motives for Metaphor. U of Pittsburgh. 1999.

Shanahan, Murray. “Talking About Large Language Models.” 2023.

Sontag, Susan. “Against Interpretation.” Against Interpretation. Farrar, Straus and Giroux. 1966.

Vara, Vauhini. “Ghosts.” Believer Magazine. 2021.

Warner, John. “ChatGPT Can’t Kill Anything Worth Preserving.” 2022.

Weil, Elizabeth. “You Are Not a Parrot.” New York Magazine. 2023.

Yancey, Kathleen Blake. Reflection in the Writing Classroom. Utah State UP. 1998.
