AI Loves—and Loathes—Language

Deep-learning networks may look like brains, but that doesn’t mean they can think like humans. On the ever-expanding meganet, that’s a problem.

A few years ago, I found myself investigating the thorny problem of Shakespearean authorship. I wanted to know if the anonymous Renaissance play Arden of Faversham (1590) was written partly or entirely by William Shakespeare. Perhaps, as some research claimed, an AI could look over a field of plays divided into just two categories—Shakespeare on one side of the fence and everyone else on the other—and place Arden of Faversham decisively on the correct side.

Researchers put Shakespeare’s plays on one side of a fence and every other Renaissance play on the other. We then unleashed an AI, tasking it with figuring out which features are common to Shakespeare’s plays and, more important, which features are common only to Shakespeare’s plays: the words Shakespeare and only Shakespeare tended to use, as well as the words Shakespeare and only Shakespeare avoided. When Arden was thrown at the AI, it would place the play on the Shakespearean or non-Shakespearean side of the fence based on which “Shakespearean” words it contained.
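
Conceptually, this setup is an ordinary supervised text classifier. Here is a minimal sketch in Python with scikit-learn, assuming the play texts are available as plain files; the file names are hypothetical stand-ins, not the actual corpus the researchers used:

```python
# A toy version of the authorship experiment: a linear classifier
# over word frequencies. The file names are hypothetical stand-ins.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.svm import LinearSVC

shakespeare = [open(f).read() for f in ["hamlet.txt", "macbeth.txt"]]
others = [open(f).read() for f in ["tamburlaine.txt", "spanish_tragedy.txt"]]

texts = shakespeare + others
labels = [1] * len(shakespeare) + [0] * len(others)  # 1 = Shakespeare

# Turn each play into a vector of word frequencies.
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(texts)

# The "fence": a linear decision boundary through word-frequency space.
clf = LinearSVC()
clf.fit(X, labels)

# Ask which side of the fence Arden of Faversham falls on.
arden = vectorizer.transform([open("arden_of_faversham.txt").read()])
print(clf.predict(arden))  # 1 = Shakespearean side, 0 = everyone else
```

The classifier’s learned weights are, in effect, the list of “Shakespearean” and “un-Shakespearean” words the researchers were after.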

The result, it turns out, is inconclusive. The field is far less neat than I have portrayed. AIs don’t see the fence I mentioned dividing the categories; what they do, instead, is build that fence. Here is where the problem arises. If, after drawing the fence, the plays separate cleanly on either side, then we have a neat cleavage between the two categories of Shakespearean and non-Shakespearean plays. But if that separation is not so neat, then it becomes far more difficult to be certain of our classification.

As you would perhaps expect, Renaissance plays don’t cluster so nicely into Shakespearean and non-Shakespearean. Shakespeare’s style and verbiage are so varied and dynamic that he intrudes into other authors’ spaces—as other authors frequently do to one another. And word frequencies alone are likely not enough to prove authorship definitively. We need to take other features into consideration, like word sequence and grammar, in the hope of finding a field on which a fence can be neatly drawn. We have yet to find it. The same goes for the line between abusive and nonabusive language that Perspective API—a Google project launched in 2017 with the aim of filtering abusive language out of internet conversations and comments—had such trouble identifying, and for chatbots’ persistent inability to tell appropriate responses from inappropriate ones.
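
One standard way to fold word sequence into such a model is to count short runs of adjacent words (n-grams) rather than isolated words. A toy sketch, with two throwaway strings standing in for full play texts:

```python
# Count pairs and triples of adjacent words alongside single words,
# so the classifier sees some word order, not just bare vocabulary.
from sklearn.feature_extraction.text import CountVectorizer

texts = [
    "to be or not to be",            # toy stand-ins for full play texts
    "the fraughting souls within her",
]
ngram_vectorizer = CountVectorizer(ngram_range=(1, 3))
X_ngrams = ngram_vectorizer.fit_transform(texts)
print(ngram_vectorizer.get_feature_names_out()[:10])
```

Even with richer features like these, the plays have yet to separate cleanly; the fence stays crooked.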

The failure of AI to classify Arden of Faversham can be attributed to several causes. Perhaps there simply aren’t enough plays to train an AI properly. Or perhaps something about the nature of Renaissance-play data makes particular types of classification problems harder for AI. I would argue that it’s the nature of the data itself. The particular kind of data that foils AI more than anything is human language. Unfortunately, human language is also a primary form of data on the meganet. As language confounds deep-learning applications, AI—and meganets—will learn to avoid it in favor of numbers and images, a move that stands to imperil how humans use language with each other.

Meganets are what I’m calling the persistent, evolving, and opaque data networks that control (or at least heavily influence) how we see the world. They’re bigger than any one platform or algorithm; rather, meganets are a way to describe how all of these systems become tangled up in each other. They accumulate data about all our daily activities, vital statistics, and our very inner selves. They construct social groupings that could not have even existed 20 years ago. And, as the new minds of the world, they constantly modify themselves in response to user behavior, resulting in collectively authored algorithms none of us intend—not even the corporations and governments operating them. AI is the part of the meganet that looks most like a brain. But by themselves, deep-learning networks are brains without vision processing, speech centers, or an ability to grow or act.

As my experiment with Shakespearean plays shows, language provides the best counterargument to machine learning’s contention that problems of “thinking” can be solved through sheer classification alone. Deep learning has achieved some remarkable approximations of human performance by stacking layers and layers of classifiers on top of one another, but at what point could a mathematically based classifier sufficiently approximate the knowledge of, for example, when to use the familiar French pronoun tu versus the polite pronoun vous? Vous may be the formal form of “you” and tu the informal, but there is no fixed definition of formality. There is no hard-and-fast rule for usage, only an ever-shifting, culturally driven set of guidelines that even humans don’t wholly agree on. Sorting through the inconsistent and contradictory examples of each, one begins to doubt whether deep learning’s pattern recognition could ever suffice to mimic human performance. The distinction between tu and vous is really a sharper, more fine-grained version of the distinction between abusive and nonabusive language that Perspective had so much difficulty with. The ambiguity and context built into human language escape the sort of analysis that deep learning performs.

Perhaps one day deep learning’s opaque brains will approximate human linguistic understanding to the point where they can be said to have a genuine grasp of tu versus vous and countless other such distinctions. After all, we cannot open up our own brains and see how we make such distinctions. Yet we can explain why we chose tu or vous in a particular case; we can account for the interactions of our own embodied brains. Deep learning cannot, and that is but one indication of how far it has to go.

Deep learning’s insufficiency is more insidious than its errors. Errors we have a chance of noticing, but the structural inadequacies of deep learning produce subtler and more systemic effects whose flaws are often not at all obvious. It is risky to outsource human thought to machines that lack the capacity for such thought. At the meganet scale, deep learning’s analysis is so wide-ranging and complex that in failing to understand language, it skews the entirety of our online experience in unpredictable and often unmeasurable directions. As we turn administration of meganets over to these deep-learning brains, they presort the information we feed into them by distinctions that neither we nor they can even specify. Every time Google provides us with a suggested response to a text message or Amazon proposes the next book we should read, that is deep learning doing the thinking for us. The more we adopt its suggestions, the more we reinforce its tendencies. It is often unclear whether these tendencies are “right” or “wrong,” or even exactly what those tendencies are. And we don’t have the opportunity to question them.

Deep-learning systems learn only in response to more inputs being fed into them. With the growth of massive, always-on meganets that interact with hundreds of millions of users and process a nonstop flux of petabytes of data, deep-learning networks could evolve and learn incessantly, without monitoring—which, arguably, is the only way real learning can take place. Yet the present state of AI has deep and mostly unexamined implications for the future of meganets. Comparing Google Perspective’s embarrassing handling of natural language with the generally impressive performance of image-recognition algorithms is not merely revealing; it also prescribes the future direction of AI and the meganet. Corporations, governments, and individuals are all predisposed to migrate toward systems that work over ones that don’t, and whatever the failings of image-recognition systems, they frequently approach human performance. Perspective, like every AI system to date that purports to understand natural language meaningfully, does not even remotely approach human performance.

Consequently, meganets and deep-learning applications will evolve increasingly toward applications that avoid or minimize human language. Numbers, taxonomies, images, and video already increasingly dominate meganet applications, a trend that the metaverse, with its emphasis on commerce and games, will only accelerate. In turn, such forms of data will increasingly dominate our own lives online and eventually offline. The vitality of human language, with its endless implicit contexts and nuances, will decline. Those more easily grasped forms of data will condition the deep-learning networks that guide the meganet, while much of the linguistic data will simply be thrown away because there will be no deep-learning network sufficiently competent to process it.

In such a world, language will nonetheless retain a vital role, but a diminished and strictly regimented one. While AI presently falls down on understanding human-generated language, strictly limiting linguistic context and variation mitigates those failures of comprehension. If AIs are generating language rather than trying to understand it, problems of comprehension evaporate. OpenAI’s GPT-3 will produce text in response to any prompt given to it, whether “write a paper about Hannah Arendt” or “write a romance novel” or “tell me the darkest desires of your shadow self.” The resulting texts are usually fluent, sometimes convincing, and invariably not truly understood by GPT-3—certainly not at a human level.
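
Generation is mechanically simple compared with comprehension. A minimal sketch using the legacy (pre-1.0) OpenAI Python client; the model name, token budget, and API-key placeholder here are illustrative, not a prescription:

```python
# Prompting GPT-3 for text via OpenAI's legacy completions endpoint.
# The model name and max_tokens value are illustrative choices.
import openai

openai.api_key = "YOUR_API_KEY"  # placeholder

response = openai.Completion.create(
    model="text-davinci-003",
    prompt="Write a paper about Hannah Arendt.",
    max_tokens=400,
)
print(response.choices[0].text)
```

The model predicts plausible next words, one after another; nothing in the exchange requires it to understand Arendt.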

That lack of understanding is not impeding the deployment of such models, however. The company Jasper touts its “Artificial Intelligence trained to write original, creative content,” providing auto-generated blog posts, advertising copy, and social media posts. Jasper produces homogeneous, anodyne, and clear copy by absorbing the style of millions of existing posts like the ones it seeks to emulate. Jasper’s writings, produced in instants, restrict and regularize verbal expression around the most dominant qualities of the most common sorts of text. All this is fitting, given that Jasper does not actually understand anything of what it is producing. We will increasingly read text constructed by entities with no grasp of what any of it actually means. So too will deeper meaning slowly drain away from language.

For all the talk of algorithmic bias today, this ubiquitous and presently unfixable bias against human language goes unspoken. It is not a problem with an individual system, nor is it a problem that we can fix by training a system differently. Machine learning, like the meganet more generally, manifests a ubiquitous bias for the simple and the explicit against the complex and the ambiguous. Ultimately, physicist Juan G. Roederer’s judgment of 2005 still holds true: “To imply, as it is frequently done, including by myself, that the brain works like a computer is really an insult to both.”


Excerpted from Meganets: How Digital Forces Beyond Our Control Commandeer Our Daily Lives and Inner Realities by David Auerbach. Copyright 2023. Available from PublicAffairs, an imprint of Hachette Book Group, Inc.