AI Dungeon and Creativity

AI Dungeon Logo

In early January I joined a group of AI researchers from Microsoft and my fellow humanist Kathleen Fitzpatrick to talk at the Modern Language Association convention about the implications of artificial intelligence. Our panel was called Being Human, Seeming Human. Each participant came to this question of “seeming human” from a different angle. My own focus was on creativity. Here’s the text of my prepared remarks.

Today I want to talk briefly about artificial intelligence and creativity. And not just creativity as it pertains to AI but human creativity as well. So, has anyone heard of or played AI Dungeon yet?

AI Dungeon was released just a few weeks ago and it has gone absolutely viral. It’s an online text adventure you play in your browser or run as an app on your phone. Now, text adventure, that was a popular kind of game in the 1980s. A lot of people know Zork. In these games the player is offered textual descriptions of a house, a cave, a spaceship, dungeon, whatever, and the player types short sentences like go east, get lamp, or kill troll in order to solve puzzles, collect treasure, and win the game. There’s a parser that understands these simple commands and responds with canned interactions prewritten by the game developers. Text adventures are also known as interactive fiction and there’s a rabid fan base online that’s part geek nostalgia, part genuine fondness for these text-based games.
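To make the parser-plus-canned-response model concrete, here is a minimal sketch in Python. It is not code from Zork or any real game: the response table, the room text, and the function name `respond` are all hypothetical, and real parsers of the era handled far more vocabulary. The point is only that every reply has to be written in advance.

```python
# Hypothetical two-word parser in the style of a 1980s text adventure.
# Every response below is prewritten; nothing is generated.
CANNED_RESPONSES = {
    ("go", "east"): "You walk east into a damp cave.",
    ("get", "lamp"): "Taken.",
    ("kill", "troll"): "The troll collapses in a heap.",
}


def respond(command: str) -> str:
    """Split a command into verb and noun, then look up a canned reply."""
    words = command.lower().split()
    if len(words) != 2:
        return "I only understand two-word commands."
    verb, noun = words
    # Anything the developers did not anticipate falls through to a stock refusal.
    return CANNED_RESPONSES.get(
        (verb, noun), f"I don't know how to {verb} the {noun}."
    )
```

With this table, `respond("get lamp")` returns the prewritten "Taken.", while a command nobody planned for, such as `respond("eat dragon")`, gets only the stock refusal.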

Interactive fiction often revolves around choice, where players have multiple ways to traverse the world and solve the puzzles. Following this generic convention, AI Dungeon opens with a major choice: literally, which genre of text adventure you want. Fantasy, mystery, apocalyptic, and so on.

Selecting the genre for AI Dungeon

So here I picked fantasy and immediately I’m thrust into a procedurally generated story: a fantasy world entirely written by a natural language processing program.

Generating a static dungeon on the fly is one thing. But what’s amazing about AI Dungeon is that it’s not a scripted world so much as an improv stage. You can literally type anything, and AI Dungeon will roll with it, generating an on-the-fly response.

Eating a dragon in AI Dungeon

So here, we have a stock feature of fantasy text adventures, a dragon. And I eat it. The game doesn’t bat an eye. It runs with it and lets me eat the dragon, responding with a fairly sophisticated sentence that, aside from its subject matter, sounds like something you’d read in a classic text adventure. “You quickly grab the dragon’s corpse and tear of a piece of its flesh.”

Let me be clear. No human wrote that sentence. No human preconceived a scenario where the player might eat the dragon. The AI generated this. Semantically and grammatically the AI nails language. It’s not as good at ontology. It lets me fly the dragon corpse to Seattle. The AI is a sponge that accepts all interactions. As you can imagine people go crazy with this. The amount of AI Dungeon erotica out there is staggering, and disturbing.

Later I run into some people and I ask them about the MLA convention.

Asking about the MLA convention in AI Dungeon

A man responds to my question about the MLA, “It’s a convention where all wizards use the same language. It’ll make things easier.”

Oh, that answer is both so right and so wrong.

So how does this all work? I obviously don’t have time to go into all the details. But it’s roughly this: AI Dungeon relies on GPT-2, an AI-powered natural language generator. The full GPT-2 model has 1.5 billion parameters and was trained on roughly 40 gigabytes of text scraped from the Internet. The training of GPT-2 took months on super-powered computers. It was developed by OpenAI, a not-for-profit research company funded by a mix of private donors like Elon Musk and by Microsoft, which invested $1 billion in OpenAI in July.

One innovation of GPT-2 is that you can take the base language model and fine-tune it on more specific genres or discourse. For a while OpenAI stalled on releasing the full GPT-2 model because of concerns it could be abused, say by extremist groups generating massive quantities of AI-written propaganda. In the more benign case of AI Dungeon, the AI is fine-tuned using text adventures scraped from chooseyourstory.com.

Summoning a Giraffe in AI Dungeon

There’s much more to be said about AI Dungeon, but I’ll leave you with just a few provocations.

  1. Games are often defined by their rules. So is AI Dungeon a game if you can do anything?
  2. Stories are often defined by their storytellers. Is AI Dungeon a story if no one is telling it?
  3. And finally, a mantra I repeat often to my students when it comes to technology: everything comes from somewhere else. Everything comes from somewhere else. GPT-2 didn’t emerge whole-cloth out of nothing. It’s trained on the Internet, specifically, sources linked to from Reddit. There’s money involved, lots of it. Follow the money. Likewise, AI Dungeon itself comes from somewhere else. On one hand its creator is a Brigham Young University undergraduate student, Nick Walton. On the other hand, the vision behind AI Dungeon—computers telling stories—goes back decades, a history Noah Wardrip-Fruin explores in Expressive Processing. The genre fiction invoked by AI Dungeon has an even longer history.

All this adds up to the fact that AI Dungeon turns out to be a perfect object of study for so many disciplines in the humanities. Whether you think it’s a silly gimmick, an abomination of the creative spirit, the precursor to a new age of storytelling, whatever, I think humanists ignore AI storytelling at our own peril.

The Maze and the Other in Interactive Fiction

Albayzin from Alhambra

I’m spending July in Cádiz, Spain, with my family and a bunch of students from Davidson College. The other weekend we visited Granada, home of the Alhambra. Built by the last Arabic dynasty on the Iberian peninsula in the 13th century, the Alhambra is a stunning palace overlooking the city below. The city of Granada itself—like several other cities in Spain—is a palimpsest of Islamic, Jewish, and Christian art, culture, and architecture.

Take the streets of Granada. In the Albayzín neighborhood the cobblestone streets are winding, narrow alleys, branching off from each other at odd angles. Even though I’ve wandered Granada several times over the past decade, it’s easy to get lost in these serpentine streets. The photograph above (Flickr source) of the Albayzín, shot from the Alhambra, can barely reveal the maze that these medieval Muslim streets form. The Albayzín is a marked contrast to the layout of historically Christian cities in Spain. Influenced by Roman design, a typical Spanish city features a central square—the Plaza Mayor—from which streets extend out at right angles toward the cardinal points of the compass. Whereas the Muslim streets are winding and organic, the Christian streets are neat and angular. It’s the difference between a labyrinth and a grid.

It just so happened that on our long bus ride to Granada I finished playing Anchorhead, Michael Gentry’s monumental work of interactive fiction (IF) from 1998. Even if you’ve never played IF, you likely recognize it when you see it, thanks to the ongoing hybridization of geek culture with pop culture. Entirely text-based, these story-games present puzzles and narrative situations that you traverse through typed commands, like GO NORTH, GET LAMP, OPEN JEWELED BOX, etc. As for Anchorhead, it’s a Lovecraftian horror with cosmic entities, incestuous families, and the requisite insane asylum. Anchorhead also includes a mainstay of early interactive fiction: a maze.

Two of them in fact.

It’s difficult to overstate the role of mazes in interactive fiction. Will Crowther and Don Woods’ Adventure (or Colossal Cave) was the first work of IF in the mid-seventies. It also had the first maze, a “maze of twisty little passages, all alike.” Later on Zork would have a maze, and so would many other games, including Anchorhead. Mazes are so emblematic of interactive fiction that the first scholarly book on the subject references Adventure’s maze in its title: Nick Montfort’s Twisty Little Passages: An Approach to Interactive Fiction (MIT Press, 2003). Mazes are also singled out in the manual for Inform 7, a high-level programming language used to create many contemporary works of interactive fiction. As the official Inform 7 “recipe book” puts it, “Many old-school IF puzzles involve journeys through the map which are confused, randomised or otherwise frustrated.” Mazes are now considered passé in contemporary IF, but only because they were used for years to convey a sense of disorientation and anxiety.

And so, there I was in Granada having just played one of the most acclaimed works of interactive fiction ever. It occurred to me then, among the twisty little passages of Granada, that a relationship exists between the labyrinthine alleys of the Albayzín and the way interactive fiction has used mazes.

See, the usual way of navigating interactive fiction is to use cardinal directions. GO WEST. SOUTHEAST. OPEN THE NORTH DOOR. Navigating by the eight points of the compass rose is an IF convention that, like mazes, goes all the way back to Colossal Cave. The Inform 7 manual briefly acknowledges this convention in its section on rooms:

In real life, people are seldom conscious of their compass bearing when walking around buildings, but it makes a concise and unconfusing way for the player to say where to go next, so is generally accepted as a convention of the genre.

Let’s dig into this convention a bit. Occasionally, it’s been challenged (Aaron Reed’s Blue Lacuna comes to mind), but for the most part, navigating interactive fiction with cardinal directions is simply what you expect to do. It’s essentially a grid system that helps players mentally map the game’s narrative spaces. Witness my own map of Anchorhead, literally drawn on graph paper as I played the game (okay, I drew it on OneNote on an iPad, but you get the idea):

My partial map of Anchorhead, drawn by hand

And when IF wants to confuse, frustrate, or disorient players, along comes the maze. Labyrinths, the kind evoked by the streets of the Albayzín, defy the grid system of Western logic. Mazes in interactive fiction are defined by the very breakdown of the compass. Directions don’t work anymore. The maze evokes otherness by defying rationality.
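One way to see what "the breakdown of the compass" means in procedural terms is to model a game map as a table of exits and check whether each move can be undone by walking the opposite way. The sketch below is an illustration only: the room names, both maps, and the function `compass_holds` are hypothetical and are not taken from Anchorhead, Adventure, or Inform 7.

```python
# Each world maps a room to its exits: direction -> destination room.
OPPOSITE = {"north": "south", "south": "north", "east": "west", "west": "east"}

# A grid-like map: walking east and then west returns you to where you started.
GRID = {
    "square": {"east": "market", "north": "church"},
    "market": {"west": "square"},
    "church": {"south": "square"},
}

# A maze in the old-school style: exits loop back or lead somewhere unexpected.
MAZE = {
    "passage A": {"east": "passage B", "north": "passage C"},
    "passage B": {"west": "passage C", "east": "passage B"},
    "passage C": {"south": "passage B", "west": "passage A"},
}


def compass_holds(world: dict) -> bool:
    """Return True if every exit can be reversed by the opposite direction."""
    for room, exits in world.items():
        for direction, destination in exits.items():
            if world[destination].get(OPPOSITE[direction]) != room:
                return False
    return True
```

Here `compass_holds(GRID)` is true and `compass_holds(MAZE)` is false: in the maze, going east from passage A and then west again lands you in passage C. A player drawing the maze on graph paper has nothing to anchor the drawing to.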

When the grid/maze dichotomy of interactive fiction is mapped onto actual history, say the city of Granada, something interesting happens. You start to see the narrative trope of the maze as an essentially Orientalist move. I’m using “Orientalist” here in the way Edward Said uses it, a name for discourse about the Middle East that mysticizes yet disempowers the culture and its people. As Said describes it, Orientalism is part of a larger project of dominating that culture and its people. Orientalist tropes of the Middle East include ahistorical images that present an exotic, irrational counterpart to the supposed logic of European modernity. In an article in the European Journal of Cultural Studies about the representation of Arabs in videogames, Vít Šisler provides a quick list of such tropes. They include “motifs such as headscarves, turbans, scimitars, tiles and camels, character concepts such as caliphs, Bedouins, djinns, belly dancers and Oriental topoi such as deserts, minarets, bazaars and harems.” In nearly every case, for white American and European audiences these tropes provide a shorthand for an alien other.

My argument is this:

  1. Interactive fiction relies on a Christian-influenced, Western European-centric sense of space. Grid-like, organized, navigable. Mappable. In a word, knowable.
  2. Occasionally, to evoke the irrational, the unmappable, the unknowable, interactive fiction employs mazes. The connection of these textual mazes to the labyrinthine Middle Eastern bazaar that appears in, say, Raiders of the Lost Ark, is unacknowledged and usually unintentional.
  3. We cannot truly understand the role that mazes play vis-à-vis the usual Cartesian grid in interactive fiction unless we also understand the interplay between these dissimilar ways of organizing spaces in real life, which are bound up in social, cultural, and historical conflict. In particular, the West has valorized the rigid grid while looking with disdain upon organic irregularity.

Notwithstanding exceptions like Lisa Nakamura and Zeynep Tufekci, scholars of digital media in the U.S. and Europe have done a poor job looking beyond their own doorsteps for understanding digital culture. Case in point: the “Maze” chapter of 10 PRINT CHR$(205.5+RND(1)); : GOTO 10 (MIT Press, 2012), where my co-authors and I address the significance of mazes, both in and outside of computing, with nary a mention of non-Western or non-Christian labyrinths. In hindsight, I see the Western-centric perspective of this chapter (and others) as a real flaw of the book.

I don’t know why I didn’t know at the time about Laura Marks’ Enfoldment and Infinity: An Islamic Genealogy of New Media Art (MIT Press, 2010). Marks doesn’t talk about mazes per se, but you can imagine the labyrinths of Albayzín or the endless maze design generated by the 10 PRINT program as living enactments of what Marks calls “enfoldment.” Marks sees enfoldment as a dominant feature of Islamic art and describes it as the way image, information, and the infinite “enfold each other and unfold from one another.” Essentially, image gives way to information which in turn is an index (an impossible one though) to infinity itself. Marks argues that this dynamic of enfoldment is alive and well in algorithmic digital art.
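For readers who have never seen the 10 PRINT program run, the following is a rough Python rendition of the idea, not the Commodore 64 original. The BASIC one-liner picks one of two diagonal characters at random and prints it, forever. This sketch swaps in the Unicode characters ╱ and ╲ for the two PETSCII diagonals and stops after a fixed number of rows; the function name `ten_print` and its parameters are hypothetical.

```python
import random


def ten_print(rows, cols, seed=None):
    """Build a block of text made of randomly chosen diagonal characters."""
    rng = random.Random(seed)  # a seed makes a given pattern repeatable
    return "\n".join(
        "".join(rng.choice("╱╲") for _ in range(cols)) for _ in range(rows)
    )
```

Calling `print(ten_print(10, 40))` fills ten rows with a pattern that reads as a maze, even though each character is chosen independently of its neighbors. The original has no row limit, which is what makes it an "endless" design.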

With Marks, Granada, and interactive fiction on my mind, I have a series of questions. What happens when we shift our understanding of mazes from non-Cartesian spaces meant to confound players to transcendental expressions of infinity? What happens when we break the convention in interactive fiction by which grids are privileged over mazes? What happens when we recognize that even with something as non-essential to political power as a text-based game, the underlying procedural system reinscribes a model that values one valid way of seeing the world over another, equally valid way of seeing the world?

Header Image: Anh Dinh, “Albayzin from Alhambra” on Flickr (August 10, 2013). Creative Commons BY-NC license.