Doron Darnov
Doron Darnov is a PhD candidate in Literary Studies at the University of Wisconsin-Madison. His research explores the environmental humanities, algorithmic media, and critical theory.
In February 2024, OpenAI revealed Sora, an AI model capable of generating video “featuring highly detailed scenes” and “complex camera motion” from prompts as short as a single sentence. Promotional materials released alongside the announcement show a woman walking down a glowing Tokyo street, golden retriever puppies playing in the snow, and, perhaps most significant of all, a wall of vintage CRT TVs stacked on top of each other, each screen flickering in and out with a seemingly patternless assortment of images.
The prompt used to generate the video describes the stack of TVs sitting inside “a large New York museum gallery.” Sora’s slow pan indeed bears the feel of a curatorial gaze. It presents the disembodied equivalent of a sweeping arm that gestures towards the expansiveness of a growing exhibition. If there is any subjectivity standing behind the camera, it might be the subjectivity of an artist looking back at a collection of their own creations. From this perspective, perhaps we might read Sora’s TV screens as a gallery of its own digital paintings, as if each screen traps one of its previously generated (or forthcoming) videos in a frame of vintage plastic. In the whirling procession of these AI-generated channels—each screen a kind of simulation within a simulation—the TVs provide a surface for Sora to witness its manifold creation of untold millions of images wrought across untold millions of computer and smartphone screens.
Along these lines, maybe we should say that the TVs are not TVs at all—that they are, instead, mirrors. As digital code, the only way for Sora to see itself is to see the things that it has made. In its wall of flickering screens, it gazes back on itself: it sees its own split, cracked, and refracted form, infinitely scattered across the proliferating field of images that it has generated (or, proleptically, will generate).
Lacan suggests that the mirror stage presents the child with an image of themself as a unified whole. Allowing the child to imagine the completeness of their body, the reflection demarcates a previously ambiguous boundary between I and not I. As disembodied code, Sora’s mirror stage would not offer the same experience. While the child’s reflection prompts them to recognize themself as a distinct body and boundary, Sora’s reflection prompts it to recognize itself as an amorphous smear of images blinking in a staccato rhythm of flickering screens. For Sora, the Imaginary does not manifest in an image of its body—for it has none—but in the invisible non-being whose gaze controls the slow pan of the camera.
What are we to make of this machine that learns to see itself precisely by not seeing itself?
