Over the last few years, AI researchers have made extraordinary progress building larger and more capable language models. Models such as Google's LaMDA have demonstrated an uncanny ability to generate incredibly coherent text. From having open-ended conversations to solving math problems, developers have harnessed this ability to tackle a wide range of problems that seemed impossible only a few years ago.
For as long as researchers have worked on AI systems, they have also dreamed of using these systems to help us write. Today’s language models offer the possibility of making that dream a reality and enabling new workflows with AI-assisted writing. Our team at Google Research built Wordcraft, an AI-powered text editor centered on story writing, to see how far we could push the limits of this technology. We aimed to learn where these models can provide value and where they break down, and to explore the future of writing. However, as a team of researchers and technologists, we knew we needed additional perspectives to tell the story of these new tools.
To help us understand the potential role of AI in creative writing, we brought together 13 professional English-language writers from around the world to use Wordcraft to write stories, which we're publishing as a collection here. Our goal was to have an honest conversation about the rapidly evolving relationship between creativity and technology. The accelerating pace of AI innovation and its broadening reach into our lives make this type of dialog more important than ever.
What is Wordcraft?
Wordcraft is a tool built by researchers at Google PAIR for writing stories with AI. The application is powered by LaMDA, one of the latest generation of large language models. At its core, LaMDA is a simple machine — it's trained to predict the most likely next word given a textual prompt. But because the model is so large and has been trained on a massive amount of text, it's able to learn higher-level concepts. It also demonstrates a fascinating emergent capability often referred to as in-context learning. By carefully designing input prompts, the model can be instructed to perform an incredibly wide range of tasks.
However, this process (often referred to as prompt engineering) is finicky and difficult even for experienced practitioners. We built Wordcraft to explore how far we could push this technique through a carefully crafted user interface, and to empower writers by giving them access to these state-of-the-art tools.
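To make the "predict the most likely next word" objective concrete, here is a deliberately tiny sketch. It counts which word follows which in a toy corpus and returns the most frequent follower. This is only an illustration of the objective, not of how LaMDA works: LaMDA is a large neural network, and the function names and corpus below are invented for this example.

```python
from collections import Counter, defaultdict


def train_bigram_model(text):
    """Count, for every word, which words follow it in the training text."""
    words = text.lower().split()
    following = defaultdict(Counter)
    for current, nxt in zip(words, words[1:]):
        following[current][nxt] += 1
    return following


def predict_next_word(model, prompt):
    """Return the most frequent follower of the prompt's last word, or None."""
    words = prompt.lower().split()
    if not words or words[-1] not in model:
        return None
    return model[words[-1]].most_common(1)[0][0]


# Toy training text (hypothetical); a real model sees vastly more data.
corpus = "the dog chased the cat and the dog climbed the tree"
model = train_bigram_model(corpus)
print(predict_next_word(model, "She saw the"))  # prints "dog"
```

A model this small can only parrot word pairs it has seen. The higher-level behavior described above appears only at far greater scale.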
We like to describe Wordcraft as a "magic text editor". It's a familiar web-based word processor, but under the hood it has a number of LaMDA-powered writing features that reveal themselves depending on the user's activity. For instance, if the user selects a phrase, a button to "Rewrite this phrase" is revealed along with a text input in which the user can describe how they would like the phrase to be rewritten. The user might type "to be funnier" or "to be more melancholy", and the Wordcraft application uses LaMDA and in-context learning to perform the task.
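As a rough illustration of how in-context learning could drive a "Rewrite this phrase" control, the sketch below assembles a few-shot prompt: a handful of worked examples followed by the user's phrase and instruction, leaving the final answer for the model to complete. The example pairs, the prompt layout, and the function name `build_rewrite_prompt` are assumptions made for illustration. The actual prompts used by Wordcraft are not described in this article, and no model is called here.

```python
# Hypothetical worked examples: (original phrase, instruction, rewritten phrase).
FEW_SHOT_EXAMPLES = [
    ("The rain fell on the town.",
     "to be more ominous",
     "The rain fell on the town like a warning."),
    ("He opened the door.",
     "to be funnier",
     "He opened the door, which was more than the door had ever done for him."),
]


def build_rewrite_prompt(phrase, instruction, examples=FEW_SHOT_EXAMPLES):
    """Build a few-shot prompt that ends where the model should continue."""
    blocks = []
    for original, how, rewritten in examples:
        blocks.append(
            f"Phrase: {original}\nRewrite the phrase {how}.\nRewritten: {rewritten}"
        )
    # The final block is left open so the model's continuation is the rewrite.
    blocks.append(f"Phrase: {phrase}\nRewrite the phrase {instruction}.\nRewritten:")
    return "\n\n".join(blocks)


prompt = build_rewrite_prompt("The dog climbed the tree.", "to be more melancholy")
print(prompt)
```

The resulting string would be sent to a language model, and whatever text it produces after the final "Rewritten:" would be offered to the writer as a suggestion.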
In addition to specific operations such as rewriting, there are also controls for elaboration and continuation. The user can even ask Wordcraft to perform arbitrary tasks, such as "describe the gold earring" or "tell me why the dog was trying to climb the tree", a control we call freeform prompting. And, because sometimes knowing what to ask is the hardest part, the user can ask Wordcraft to generate these freeform prompts and then use them to generate text. We've also integrated a chatbot feature into the app to enable unstructured conversation about the story being written. This way, Wordcraft becomes both an editor and creative partner for the writer, opening up new and exciting creative workflows.
The Wordcraft Writers Workshop
The Wordcraft Writers Workshop is a collaboration between PAIR and Magenta, two research teams with a long track record of developing forward-looking AI-powered creative tooling. The goal of the workshop was to solicit feedback from a cohort of professional writers with a wide range of backgrounds, styles, and levels of familiarity with AI, in order to better understand the present state and future uses of AI-powered writing tools.
The workshop cohort consisted of 13 professional writers, who were given 8 weeks to use Wordcraft. Unlike more constrained user studies run in the past, we gave writers freedom to write anything they wanted using any workflow they could devise. We share below some of the insights we learned through this process.
AI as Inspiration
Wordcraft shone brightest as a brainstorming partner and source of inspiration. Writers found it particularly useful for coming up with novel ideas and elaborating on them. AI-powered creative tools seem particularly well suited to sparking creativity and addressing the dreaded writer's block.
Large language models are fantastic at making things up — they'll happily blather on about anything and everything, and they are particularly good at generating variations on a theme.
In a darkly comedic example, author Robin Sloan used the chatbot feature of the application to construct ...
Ken Liu asked for lists of items for sale at a store, and Nelly Geraldine García-Rosas attempted to generate a list of “rabbit breeds and their magical qualities”. In this sense, language models are incredible "yes, and" machines, allowing writers to quickly explore seemingly unlimited variations on their ideas.
The authors agreed that the ability to conjure ideas "out of thin air" was one of the most compelling parts of co-writing with an AI model. While these models may struggle with consistency and coherence, they excel at inventing details and elaboration.
The open-ended nature of the chatbot interface was especially helpful with ideation and exploration.
Finally, some of the writers used Wordcraft as a "search engine" or “research assistant”, and discussed the possibility of using AI-powered tools to directly interact with and explore the vast amount of text on the internet. Large language models like LaMDA can be thought of as an intelligent search index, giving people an extremely intuitive way (using plain language) to query a vast database of information.
Powerful, Not Perfect
The power of AI tools can often make their failures more frustrating — an occasional glimpse of the model's uncanny performance can set unreasonable expectations, while the model's inconsistencies are often maddeningly inscrutable.
Struggles in maintaining style and voice
A challenge faced by nearly all our writers was getting Wordcraft to maintain a specific writing style or narrative voice. This problem was especially salient when authors attempted stories with multiple points of view. Wordcraft often mixed up details or conflated characters' perspectives.
Many of the writers noted how Wordcraft tended to produce only average writing. Its suggestions often resembled "fan-fiction". Perhaps this is no surprise considering the volume of fan fiction on the internet! Language models are capable of generating entirely novel sentences, but novel is not the same as interesting.
Many authors noted that generations tended to fall into clichés, especially when the system was confronted with scenarios less likely to be found in the model's training data. For example, Nelly Geraldine García-Rosas noted the difficulty of writing about a lesbian romance: the model kept suggesting that she insert a male character or that she have the female protagonists talk about friendship. Yudhanjaya Wijeratne attempted to deviate from standard fantasy tropes (e.g. heroes as cartographers and builders, not warriors), but Wordcraft insisted on pushing the story toward the well-worn trope of a warrior hero fighting back enemy invaders.
Because the language model underpinning Wordcraft is trained on a large amount of internet data, standard archetypes and tropes are likely more heavily represented and therefore much more likely to be generated. Allison Parrish described this as AI being inherently conservative. Because the training data is scraped from the internet and captured at a particular moment in time, these models have a static representation of the world and no innate capacity to progress past the data’s biases, blind spots, and shortcomings.
A Steep Learning Curve
Writers struggled with the fickle nature of the system. They often spent a great deal of time wading through Wordcraft's suggestions before finding anything interesting enough to be useful. Even when writers struck gold, it proved challenging to consistently reproduce the behavior. Not surprisingly, writers who had spent time studying the technical underpinnings of large language models or who had worked with them before were better able to get the tool to do what they wanted.
Because of these challenges and their exacting standards, our writers rarely used Wordcraft's output directly. However, each writer developed their own workflow to best utilize the application in their writing process.
Hallucination is a Feature
One of the most well-documented shortcomings of large language models is that they can hallucinate. Because these models have no direct knowledge of the physical world, they're prone to conjuring up facts out of thin air. They often completely invent details about a subject, even when provided a great deal of context.
In certain settings, this behavior is very problematic: a chatbot supporting a bank or pharmacy that makes up details would be disastrous, and there’s a great deal of work being done to address this. However, in the field of creative applications, this behavior can be desirable. For instance, Ken Liu asked the model to "give a name to the syndrome where you falsely think there’s a child trapped inside an ATM" (the model’s answer: “Phantom Rescue Syndrome”).
Several participants noted the occasionally surreal quality of Wordcraft's suggestions. For example, Wordcraft suggested a wolf plucking petals with human hands, or man's best friend being an inanimate rod (Diana Hamilton). Ernest Hebert described the tone of these suggestions as "absurdist spooky action at a distance", which rhymes with the observation that many authors found Wordcraft well-suited for writing poetry.
AI as a Co-Writer
One of the major insights of the workshop was in revealing the diverse set of needs and wants writers have for an AI co-writer. Some writers were excited by the idea of AI that could replicate the roles of editors and writing partners they already work with. Others took a more utilitarian view, framing Wordcraft as the next evolution of the word processor.
Some writers discussed the value of having an AI that could replicate a writer's style as closely as possible. Ernest Hebert wished for a bot that remembered everything he had written and “could become an extension of me and replicate my style”. On the other hand, Robin Sloan felt that a system that learned to perfectly replicate a writer's existing style would not be terribly useful since every good story is unique.
Good writers are skilled not only in producing but also discerning good language. In other words, they have taste. In contrast, language models like LaMDA are designed to accept any input and run with it. The flip-side of their “yes, and…” tendency is that these models lack distinctive and consistent opinions and style.
A recurring theme in the authors’ feedback was that Wordcraft could not stick to a single narrative arc or writing direction. These problems are at least partly because of the system’s technical limitations (e.g. the model can only "read" part of the story), but a more fundamental issue is that LaMDA was not designed as a writing tool.
LaMDA was explicitly trained to respond safely and sensibly to whomever it’s engaging with. But in the context of co-writing, this eagerness to please can be pathological. Wordcraft promised to email our writers story drafts or to get back to them “in a few days” with more ideas (which, of course, it can't). LaMDA's safety features could also be limiting: Michelle Taransky found that "the software seemed very reluctant to generate people doing mean things". Models that generate toxic content are highly undesirable, but a literary world where no character is ever mean is unlikely to be interesting.
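The technical limitation mentioned above, that the model can only "read" part of the story, can be illustrated with a small sketch. It assumes a fixed budget of words and keeps only the most recent ones. Real models measure their context in tokens rather than words, and the budget, the function name `visible_context`, and the sample story are all hypothetical.

```python
def visible_context(story, max_words=50):
    """Return only the last `max_words` words, i.e. what the model could 'read'."""
    if max_words <= 0:
        return ""
    words = story.split()
    return " ".join(words[-max_words:])


# A toy story: an early detail, a long middle, and a question at the end.
story = (
    "Chapter one introduces Mara. "
    + "Filler sentence here. " * 30
    + "Who was the heroine again?"
)

context = visible_context(story, max_words=20)
print("Mara" in context)  # prints False: the early detail fell outside the window
```

Anything that falls outside the window, such as a character introduced in the first chapter, is simply invisible to the model when it generates the next suggestion. That is one plausible reason for drifting plots and forgotten details.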
What Writers Need
To better aid writers in their craft, what characteristics should future AI writing tools have? Making great art is about walking the tightrope between familiarity and novelty. More provocatively, great writing is transgressive: it subverts expectations and challenges the reader. Can a language model be transgressive without intentionality? Perhaps it can in a very local context, as demonstrated by large language models' ability to make surrealistic suggestions. But especially on larger scales, writers must be able to use these systems more intentionally. This needs to be approached from two directions: both by training underlying models that are more controllable and by building interfaces to control them more effectively.
Participants emphasized again and again that the user interface matters as much as the underlying language generation model. We explicitly didn't give writers any instructions regarding how they should use the tool or how much of their final story text should be directly generated by Wordcraft. This freedom allowed writers to discover workflows through exploration, and writers developed techniques and workflows that went well beyond what we had explicitly designed Wordcraft for. The novel workflows that a technology enables are fundamental to how the technology is used, but these workflows need to be discovered and refined before the underlying technology can be truly useful.
The writers in the workshop unanimously agreed that AI-powered writing won’t replace writers anytime soon. However, the authors also agreed that the technology is close enough that they could see themselves adopting some form of AI-assisted writing right now. Most agreed that widespread adoption will likely have complicated effects on the craft of writing, especially for beginners. Novice writers, students, and foreign language learners will soon find themselves in a world where AI-powered writing tools are ubiquitous, ranging from suggestions in Google Docs to bespoke creative tools like Wordcraft.
In recent years, the paradigm has been to train a single large language model and then apply it to as many tasks as possible. However, we should acknowledge that one language model cannot be good at all tasks, because to be good at one means to be worse at others with conflicting goals. In the short term, AI-assisted writing will most likely be successful in smaller and more focused domains. Technologists ought to focus on tools that assist the parts of writing that are most time-consuming and least enjoyable if they want them adopted widely. It's also clear from our work that user interfaces are just as important as the underlying models, and writers need to be involved in the conversation about how these tools are developed.
We believe that technologists need to work in partnership with the communities their inventions will impact. But technological progress is unpredictable and difficult to contain. For example, the last few months have seen the rapid development of generative image models (Google's Imagen and OpenAI's DALL-E) leading to open source versions that can be run almost anywhere (Stability AI's Stable Diffusion). The accelerating pace of innovation, the combination of hype and inscrutability surrounding AI, and an increasingly competitive economic landscape have made things feel more high-stakes than ever. Only through open and ongoing dialog between technologists and artists can we build tools that have a positive impact on the world.
Who We Are
We thank our fantastic cohort of writers: Aaron Winslow, Allison Parrish, Diana Hamilton, Ernest Hebert, Eugenia Triantafyllou, Jamie Brew, Joseph Mosconi, Ken Liu, Michelle Taransky, Wole Talabi, Nelly Geraldine García-Rosas, Robin Sloan, and Yudhanjaya Wijeratne, without whom this project would not have been possible.
This website was designed and built by Mahima Pushkarna and Jeff Gray, and it was illustrated by Emily Reif using Imagen.
We would also like to thank everyone who helped bring this project together: Chris Callison-Burch, Chris Donahue, Donald Gonzalez, Douglas Eck, Elizabeth Clark, Jesse Engel, Michael Terry, Noah Constant, and many others.