Generative AI: How Do You Make Images, Text, or Code?

David Song, founder of Everyprompt, an AI-backed startup whose software generates images, text, and code, believes generative AI is a leap in the potential of AI technology similar to one beginning in 2012 that reshaped the whole tech industry and the products it offers. That’s when engineers found that artificial neural networks, a type of machine learning model, could perform remarkable new tricks when given sufficient training data and computing power, such as recognizing the content of photos or transcribing speech.
In general, people are happy to have machine learning assist pathologists, sharpen a phone photo, or make a better map. But the AI generators bother a lot of people. The tools work by spidering images from across the internet, absorbing the visual culture within them by scanning their captions, and then adding noise to them until they look like static. To make a new image, the AI starts with a caption and some static, then runs the process in reverse, removing noise until an image appears that lines up with the caption, more or less. (It’s bad at drawing hands, but so am I.)
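That noise-then-denoise loop can be caricatured in a few lines of code. The sketch below is a toy, not a real diffusion model: the clean image is a stand-in vector, and the trained neural network that would normally predict the noise from the caption is replaced by a made-up "oracle" that already knows the answer. It only shows the mechanics of the forward (noising) and reverse (denoising) passes the paragraph describes.

```python
import numpy as np

rng = np.random.default_rng(0)

# A tiny stand-in "image": 8 pixel values in [0, 1].
image = np.linspace(0.0, 1.0, 8)

T = 50                               # number of diffusion steps
betas = np.linspace(1e-3, 0.2, T)    # per-step noise variances
alphas = np.cumprod(1.0 - betas)     # cumulative signal retention

def noisy(x0, t):
    """Forward process: sample x_t, the image after t+1 noising steps."""
    return np.sqrt(alphas[t]) * x0 + np.sqrt(1 - alphas[t]) * rng.normal(size=x0.shape)

x_T = noisy(image, T - 1)            # after all steps: nearly pure static

def oracle_denoise(x_t, t):
    # In a real system a trained network estimates the clean image from
    # (x_t, t, caption). This oracle just "knows" it, to keep the demo honest.
    return image

# Reverse process: repeatedly estimate the clean image, then re-noise the
# estimate back to the previous (slightly less noisy) step.
x = x_T
for t in reversed(range(T)):
    x0_hat = oracle_denoise(x, t)
    if t == 0:
        x = x0_hat
    else:
        x = np.sqrt(alphas[t - 1]) * x0_hat + np.sqrt(1 - alphas[t - 1]) * rng.normal(size=x.shape)

print(np.abs(x - image).max())       # prints 0.0: the oracle recovers the image
```

With a learned denoiser instead of the oracle, the same loop starts from static plus a caption and walks its way to a plausible image, which is the "run the process backward" trick described above.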
There was a party in San Francisco for Stability AI, which provides tools for generating images with few restrictions. The company has $101 million in new funding and is valued at $1 billion. Tech celebrities like Sergey Brin were at the gathering.
Song works at Everyprompt, a startup that makes it easier for companies to use text generation. Like many of those contributing to the buzz, he says testing generative AI tools that make images, text, or code has left him with a sense of wonder at the possibilities. It had been a long time, he says, since he used a piece of software that felt genuinely helpful. “Using generative AI makes me feel like I’m using magic.”
Anne Carson published a book called Short Talks in 1992. It’s a series of micro-essays, ranging in length from a sentence to a paragraph, on seemingly disconnected subjects—orchids, rain, the mythic Andean vicuña. Her talk on the sensation of airplane takeoff sounds like the thing it describes. Her “Short Talk on Trout” is mostly about the types of trout that appear in haiku. In what passes for the book’s introduction, Carson writes, with dry Canadian relatability, “I will do anything to avoid boredom. It is the task of a lifetime.” Right about when she published that, the internet started to take off.
Thirty years later, staying up late and messing with artificial intelligence is one of the best ways I know to avoid boredom. Tools like DALL-E 2, Midjourney, and Stable Diffusion turn written prompts into pictures: exaggerated oil paintings of dogs in hats, astronauts riding horses. When I first started playing with Stable Diffusion—which is open source and very fun—I was reminded of Carson’s talks, and I went back to them to figure out why. The resemblance, I found, had to do with form.
Carson’s Little Lecture: Exploring a Giant Idiot World-Brain through the Lens of a Digital Articulator
Everyone says content is king, but the secret monarch is form: constraints and rules. You grow up learning form. A high school essay is five paragraphs. Sitcoms leave eight minutes in the half hour for ads. Novels are long. A tweet, originally, was capped at 140 characters.
My five-paragraph essay isn’t the same as yours; within a form, each of us makes choices, and the word for those choices is style. Carson’s book takes a familiar form, the little lecture, and subverts it, manipulates it, until as the reader you start to feel like you’re inside her wonderful brain, scrolling through her mental browser history, joining her in hyperlinked fancies and half-abandoned rabbit holes. Image generation is kind of like that—but instead of communing with a single brilliant Canadian brain, you’re communing with a giant idiot world-brain. (A less neurological way to put it: vast numbers of data objects grouped in layers, connected together to an incomprehensible degree, like string-and-nail wall art of a many-masted clipper ship, but on fire with the flow of data.)
The people who can use the new tools will have new power. The people who were great at the old tools will be remembered for their service, which was rendered into Soylent.
This feels gross. Artists databased into oblivion feels gross. It feels gross that someone could ask the computer for a portrait in a living artist’s style, and the computer would paint it without moral judgment. These systems roll up styles, territories, cultures—things people thought of as “theirs,” their living, their craft—into a 4-gigabyte, open source tarball that you can download onto a Mac in order to make a baseball-playing penguin in the style of Hayao Miyazaki. It’s as if a guy wearing Allbirds stumbled into a residential neighborhood where everyone is just barely holding on and said, “I love this place, it’s so quirky! Siri, play my Quirky playlist! And open a Blue Bottle on the corner!”
So naturally, people are upset. At least for now, art websites are banning AI-generated work, and stock image services are refusing it. Prominent bloggers who experimented with having an AI illustrate their writing have been chastened on Twitter and have promised not to do it again. Me, I’m suspicious of the ethics of the companies involved, and certain words are banned from the image generators’ interfaces, which is bad because I wanted to ask the bot to paint a cottage in the style of Thomas Kinkade. (One must confront one’s deepest fears.)