✘ AI Will Eat Itself... And That's Okay
And: the shadows of minstrelsy lurking in AI content, the creator economy as propaganda machine, and the role of sameness in ecological diversity.
Maarten’s off this week, and I’m thrilled to be back to share some thoughts on synthetic data and the future of creativity in the age of AI. Don’t worry; there’s a hopeful ending!
Jaron Lanier, internet thinker extraordinaire, recently expressed his concerns regarding AI and society. Refreshingly, Lanier didn’t share the more typical doom hype - that a supermind would ignite and destroy us. Lanier sees something arguably worse: he worries AI will drive us mad.
AI’s most dire threat to sanity comes from dis- and misinformation, but another creeping force undermining our wellbeing stems from something far duller: content ennui and endless sameness. Less than a year into the post-GPT-4 and Stable Diffusion era, we’re drowning in junk webpages undermining the digital ad networks that have long been the financial engines of the internet. We’re seeing books of AI-generated nonsense flood online stores. We’ve had AI-generated music on DSPs for years, but the number of tracks is increasing exponentially: Mubert recently announced its services had been used to generate 100 million tracks, setting its sights on a billion in the near future. As recent difficulties surrounding mass-generated AI tracks suggest, some of these tracks may be genAI junk uploaded to perpetrate stream fraud.
Creatives are notably and reasonably upset with these developments. They were not aware their creations were included in training sets, and they were not asked or compensated. Class-action and other lawsuits have already been mounted here in the US by authors and artists. Meanwhile, Hollywood has ground to a halt with union strikes, sparked in part by the legitimate concerns of screenwriters and actors about the role AI will play in scripts and production.
This all kinda sucks, but it’s nothing to make you lose your mind—at least, not yet. The content deluge and the discontent of creatives—again, a justified reaction—are about to land us somewhere we’ve never imagined. We’re about to be trapped in an aesthetic feedback loop, as synthetic data is used to train the models and AI eats itself. By understanding the nature of the issue, however, we can address the underlying aesthetics of AI purposefully, find new approaches to our creative work, and break out.
The Dataset Supply Chain and Synthetic Data
Before we talk about the perils of synthetic data, let’s take a quick tour of how datasets work: You collect a massive pile of some sort of data—images, texts, sounds—and figure out a way to label it, via automation and/or hard (often exploitative) human work. This annotated data becomes a dataset that is then used to train a model, allowing the complex algorithms of machine learning to eventually create the desired results. The quality of the dataset, and the accuracy and sophistication of the annotation, help determine how good the model’s output will be.
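That supply chain can be sketched in a few lines of Python. Everything here is illustrative: the file names, the `annotate` function, and the labels are invented stand-ins for the pipeline described above, not any real library’s API.

```python
# A toy "dataset supply chain": raw items are paired with labels
# (annotations), and the labeled pairs become the training set.
# All names here are hypothetical.

raw_clips = ["clip_001.wav", "clip_002.wav", "clip_003.wav"]

def annotate(clip):
    # Stand-in for the hard human (or automated) labeling work:
    # genre, tempo, mood, instrumentation, and so on.
    return {"file": clip, "genre": "ambient", "bpm": 90}

dataset = [annotate(clip) for clip in raw_clips]

# A model would then be trained on `dataset`; output quality depends
# heavily on how large and how accurately labeled this collection is.
```

The point of the sketch is just the shape of the process: for audio, the `annotate` step is the expensive part, since good labels mean someone (or something) actually listened.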
Much of the data in existing datasets was scraped from the internet, drawing on the so-called “corpus” that sites like Reddit and private services like Getty Images want to protect. In audio, things are more complicated, as finding properly labeled audio online ain’t easy. Audio, scraped or not, is hard to annotate. Annotation requires musical knowledge and, for really good metadata, actually listening to the files in question, a time-consuming process compared to glancing at a picture of a puppy or a muffin. So there aren’t as many datasets for audio and music, and they have limitations.
Because getting extremely large, properly annotated datasets can be difficult, AI engineers have turned to synthetic data. In other words, they use AI itself to generate content for the datasets other models are trained on. The temptation to turn to synthetic data is particularly great in areas like music where properly labeled data is rare “in the wild” online, where provenance and copyright status aren’t clear, and where datasets are limited in size and variety.
Synthetic data, if used over and over again in training, leads to some odd phenomena and distortions, however, distortions that only grow with each generation of training. This can lead, as some researchers put it recently, to “model collapse,” when models accumulate “irreversible defects.” Strategically feeding “fresh data” into the system (i.e. new human-generated text, images, or sounds) can offset some of this distortion, but it is no longer clear to internet scrapers what is human-generated and what is machine-made. (In the case of GPT-4, its output can’t reliably be detected at this point, and detectors often pick up antiquated or non-native usage, not a machine.)
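You can feel model collapse in toy form without any real model at all. In this minimal sketch (plain Python standard library, no actual training), each “generation” is built only from samples of the previous one; since resampling can never invent an item it hasn’t seen, variety only ratchets downward, a cartoon of those irreversible defects.

```python
import random

random.seed(42)  # reproducible toy run

# Generation 0: 500 distinct "human-made" works, represented as labels.
population = list(range(500))

diversity = [len(set(population))]
for generation in range(50):
    # Each generation is "trained" only on samples of the previous one:
    # drawing with replacement, so rare items tend to vanish for good.
    population = [random.choice(population) for _ in range(len(population))]
    diversity.append(len(set(population)))

print(diversity[0], diversity[-1])  # the count of distinct works only shrinks
```

Injecting “fresh data” would mean appending new, never-seen items to `population` each round, which is exactly the offsetting move described above, and exactly what becomes hard when scrapers can’t tell human from machine.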
The Drift Toward Mid
To make matters even stranger, some of the very people working to label training data are using AI to automate a great deal of that work, as incentives encourage them to outsource annotation to machines and thus increase their productivity. And human creatives are increasingly leery of throwing their works into the billion-dollar content black hole of commercialized AI. So, data annotation is itself becoming synthetic, and human “fresh data” may become increasingly inaccessible. This is a problem.
Part of this problem is aesthetic: Synthetic data stands to slowly filter into and morph what we see, hear, and read every day. The industries and niches that are already the most automated and templated, such as marketing copy, ad images, website design, and production music, are the most vulnerable to this dilution. The middle-of-the-road output most models are trained to produce has a clear, well-established use and will have an impact on art, literature, and music. It will further the drift toward mid, an aesthetic sameness already rampant due to pattern matching and recommendation algorithms on social platforms, among brands, and in design, interior or otherwise.
In audio and music, a similar drift is occurring, as trends cycle faster and faster on TikTok, and as nostalgia and retro vibes surface and colonize all sorts of new tracks by major artists. Lots of new, amazing music is out there, but AI elements or fully AI-generated tracks or beats could begin to exert growing inertia on how we think about and enjoy music, seeping into our creation tools and our playlists. Add to that the drag of model distortions and collapse, and we are looking at a strangely depressing aesthetic future.
How can we break free?
Because we can! It’s not all grim or predetermined; I’m not about to go full Horkheimer here. And anyone who claims technological impacts are inevitable is trying to sell you something.
In fact, AI could spark all sorts of new creativity, despite its autophagous tendency. To make something of this potential mess, we need to respond, and we might as well do it now and deliberately. Our response can be practical, based in music and tech industry realities, and aesthetic, based in creative principles and approaches to our art and craft.
A practical idea
Dataset cooperatives: There’s no reason artists, especially self-releasing artists, can’t create cooperative datasets of their work, including stems, finished tracks, live recordings, outtakes, you name it. A useful dataset might need as few as 20,000 tracks, or perhaps far fewer.
The cooperative could craft a license that rewards contributors equitably, ideally upfront, and allows specific uses for the dataset. Labeled correctly, this audio could be deployed ethically, and would result in a new income stream for artists and fresh inputs to keep models useful. It would be amazing if this process were controlled and run by artists themselves, perhaps using a DAO or other cooperative structure. This space is wide open at the moment. It’s worth exploring.
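To make the equitable-rewards idea concrete, here’s one hypothetical way a co-op might split a licensing fee pro rata by tracks contributed. The artists, track counts, and fee below are all invented for illustration; the real terms (splits, upfront payments, permitted uses) would live in the cooperative’s own license.

```python
# Toy sketch of a pro-rata licensing payout for a dataset co-op.
# All figures are hypothetical.

contributions = {"artist_a": 120, "artist_b": 40, "artist_c": 40}
license_fee = 10_000.00  # hypothetical one-time licensing payment

total_tracks = sum(contributions.values())
payouts = {
    artist: round(license_fee * n / total_tracks, 2)
    for artist, n in contributions.items()
}
# → {"artist_a": 6000.0, "artist_b": 2000.0, "artist_c": 2000.0}
```

Pro rata by track count is only one design choice; a co-op (or DAO) could just as easily weight by stems contributed, by usage in training, or split evenly, and that governance question is part of what makes this space worth exploring.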
Some aesthetic ideas
Emotional saturation: The Romantic movement of the early 19th century was, in part, a reaction to automated machinery and the social disruptions that followed its implementation. The insistence on inner experience, on emotion (a word that came into wider usage in the 1820s), was a counterbalance to the creeping metaphors of the materialist springs and gears that were presumed to rule human behavior, just as they ruled the factory floor.
We’re facing a similar moment, when mechanistic models dominate our understanding of our minds (see every pop explanation based on dopamine) and humanity feels disconnected from expression; there will be pictures, poems, and songs in the age of autophagous AI, but they will have little relation to human experience. Our models mapping experience to expressive acts have already collapsed.
We need to set aside our squeamishness around emotion (the ironic/cynical turn of past decades) and the pathologization of profound feeling, which isn’t necessarily mental illness. We can strive to imbue whatever we make, however we make it, with powerful, distilled human emotion.
This means drawing on experience, looking for shared threads, not relying on cliche or endless recursive navel-gazing. This saturation process could take a myriad of forms in music, from the gritty and minimalist to the cinematic and epic; the intent defines the aesthetics, not the details of the final product. As musicians and producers find the forms and frameworks, we can all learn to listen for and savor emotional intensity. We need to discover a new Technicolor of the heart, and the musical vocabulary to parse and appreciate it.
Relational novelty: In the early 20th century, at the dawn of the age of mechanical reproduction, there was serious soul searching about new urban environments and the layers of reproduced text and images they produced. Many visual artists and writers grappled with this disjointed, disorienting environment by digging into juxtaposition via media like collage or avant-garde sound-based poems. They may have used what they found and the inputs may have been banal, but the relationship between the pieces made a novel, sometimes jarring statement.
We, too, might find new inspiration in doing something similar. Generative AI blurs lines and edges between disparate inputs; what if we exaggerate and celebrate these edges and tensions and contrasts more? We can do this alone, or with others, making global collaboration less about bringing one participant into a mode of expression that meets another participant’s cultural standards and more about working as peers to collage and mashup our respective aesthetic cultures in jagged but meaningful ways. We can embrace friction and fissures.
Adapted imperfection: As AI degrades, could the artifacts generated in that process themselves become a spark for some new art? Will its errors and failings, like clock noise and vinyl hiss, lead to new aesthetics? I imagine the artists of the near future, listening to some lo-fi, glitched-out generative AI audio, and hearing something new, much like the master craftsperson who takes a board with a huge knot and purposefully makes it the aesthetic center of a new table. We can seek out and even create imperfections, errors, and anomalies, and incorporate them into a new structure.
I could go on. These are a few directions I’ve been contemplating; there must be many, many more: things you are thinking about that everyone else has yet to discover. Let’s pick some, talk about them, make things to try them out, and see where we get. We don’t have to collapse with the models or submit to the mid.
LINKS
⛔️The New Minstrels Are Here (Jason Parham)
“At its most menacing, the mass adoption of AI tools is a mass adoption of the biases they absorb and perpetuate. In doing so, we grant the wrong dogmas credibility… Without safeguards, this new minstrelsy will produce the inverse effect of the post-racial fallacy peddled during the Obama years. Race and gender inequities will not vanish so much as infect the visual vernacular of everything we watch, share, and learn from.”
✘The problem of racial bias in digital media has deep roots, stretching, some historians argue, all the way back to the earliest dawn of computing. The problem will only get weirder and more pernicious with mass AI adoption. Parham puts current developments in the context of past exploitative entertainment and makes a case for vigilance and awareness.
⚔️The New Media Goliaths (Renée DiResta)
“We should not glorify the era of a consolidated handful of media properties translating respectable institutional thinking for the masses — consolidated narrative control enables lies and deception. But rather than entering an age of ‘global public squares’ full of deliberative discourse and constructive conversation, we now have gladiatorial arenas in which participants in niche realities do battle. … We have a proliferation of irreconcilable understandings of the world and no way of bridging them.”
✘The creator economy and the wonderful world of niches have dark sides, and DiResta outlines a big one: the proliferation of propaganda. DiResta connects the creator economy to past media formations, and hopes that by shining a light on the dynamics enabled by the internet, we can better guard ourselves against manipulation. I’m skeptical - there are lots of psychological reasons people fling themselves into certain rabbit holes - but hopeful.
🌿The Key to Species Diversity May Be in Their Similarities (Veronique Greenwood)
“Ecological processes may have a way of canceling each other out, so that what seems like endless variety can have a simple outcome.”
✘Ecologists are trying to figure out why diversity happens (and why natural systems are often more diverse than older models account for). The idea of “emergent neutrality” - that individuals’ life history can tell us as much as their species when it comes to their fate in an ecological niche - is fascinating, a middle ground of sorts between hardcore Darwinian competition and symbiosis. Though nothing here is directly connected to music, it’s cool to imagine how certain kinds of sameness can provoke endless variations and highly complex systems, and how that might inspire our thinking.
MUSIC
Wata Igarashi has been rearranging the furniture in my brain lately. This track in particular slides and roams around your head, with this peculiar subtlety that feels both unnerving and pleasant. The perfect state for trying to wrap your mind around our fractured AI future.