✘ FUD about AI? Let's talk about it
- Teach your kids prompt engineering - Takedown fails - The loneliness apparatus - History of Niche Internet - Notes from Beethoven's genome
Fear, Uncertainty, Doubt - there’s a lot of it when it comes to AI. There are even people who say that the general skepticism - read fear - about AI is why we’re not in an investment bubble around the tech right now. But that doesn’t bring us any closer to understanding how it works, how it impacts our work and livelihoods, and what we need to do. And if one thing is for sure, we - yes, you and I both - need to act. Some developments are happening at breakneck speed. In other areas, there’s hardly any progress. In music, we’ve seen the release of first Suno and then Udio. Some people are wildly enthusiastic; others - that would be me - hammer on about how the underlying models have clearly been trained on copyrighted content. Let’s start there.
The models have been built
Two years ago, the major tech companies had a little department somewhere working on large language models. Then OpenAI launched ChatGPT and everyone had to release their own. MAGMA needs to stay on top. Now, all of these models are out there in the world, eating up information and growing in capabilities. And yet, the finer tuning is hard, and will get harder still. That’s because all the major choices around these models have already been made. If I go to any one of the image generators and type in that I want a photo of a band, it’ll be an all-male band. Attempts to correct this through ‘reinforcement learning from human feedback’ aren’t foolproof either. It’s a problem, and it’s not going away anytime soon, because - and I repeat - the models have already been built.
This also rings true for Suno and Udio. That genie is out of the bottle - these tools now exist and will not go away. We have to deal with their shortcomings, and need to work together to bring them to heel on issues of copyright. What’s more, these models now deliver based on prompts - text-to-audio. The next step is the move from LLMs [large language models] to LAMs [large action models]. If you’re following along, you’ll have heard about AI agents. Breaking it down, the big difference is that an LLM works through its own data to formulate an answer to your prompt. An AI agent works in a loop, starting from your prompt and working through external data to formulate not just an answer but also an action. These kinds of developments will eventually - say this year or next year - result in a LAM operating at scale, with many people interacting with it.
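To make the LLM-versus-agent distinction concrete, here’s a toy sketch of that loop. Everything in it is invented for illustration - `fake_llm` stands in for any language model and the `search` tool for any external data source; no real product’s code or API is implied:

```python
def fake_llm(prompt: str) -> str:
    """Stand-in for a language model: one prompt in, one answer out."""
    return f"answer({prompt})"

# The agent's 'external world': stub tools it can act through.
TOOLS = {
    "search": lambda query: f"results for {query!r}",
}

def agent(goal: str, max_steps: int = 3) -> list[str]:
    """Minimal agent loop: each step consults the 'model', acts on
    external data via a tool, and feeds the observation back in."""
    history = [goal]
    for _ in range(max_steps):
        decision = fake_llm(history[-1])         # decide what to do next
        observation = TOOLS["search"](decision)  # act on external data
        history.append(observation)              # observe, then repeat
    return history

steps = agent("find venues for a release show")
print(len(steps))  # → 4: the goal plus one observation per step
```

A plain LLM call is the single `fake_llm` line; the agent is the loop around it, which is why agents can end in an action rather than just an answer.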
Build your own model
Let’s take OpenAI and their ChatGPT as the example here. This is basically a walled garden. We can use the tool, but we don’t know what’s under the hood. Nor can we have any influence on where training data comes from and how it’s used. OpenAI can create developer access, which they’ve done with their custom GPTs. This is building within the walled garden. Opposite this strategy is the open-source model. An example is Meta’s Llama 2 model, which leads the way as the basis of most models available on Hugging Face. While transparency will help - and there are questions to ask about Meta not being fully transparent about Llama - there are also dangers here. All this open-source data will lead, and is already leading, to what we could call an adaptation of 4chan’s infamous Rule 34: if it exists, there’s deepfake porn of it.
But instead of moving into the FUD, we’re moving away from it. Through open-source models, we can all build our own models. The creative researchers at Open Culture Tech are currently doing just this with Eveline Ypma, one of the artists they’re working with. They’re using MusicGen, another open-source model released by Meta. Another one to keep an eye on in this respect is Spawning, which is due to release a new modular model later this year as well. Both of these initiatives focus heavily on the need to be open about training data and to keep rights with the creators. On that note, one more to keep an ear out for: Somms.ai. All of these initiatives allow people to build their own models on top of a larger model. This gives you more influence over both the input and the outcomes.
The great interaction shift
It took around 20 years to get from going online through phone lines to post on bulletin boards to an Internet of Things. Another way to describe this is the move from a Web1 - read only - to a Web2 - read and write - Internet. The next shift is just as profound. During the last crypto bull run, Web3 became shorthand for a blockchain-based Internet, and we may need it to come to fruition now more than ever. But while a Web3 will require firm groundings in transparency, accountability, and immutability, there’s another shift happening. We saw this in the multiplayer experiments of the blockchain in recent years. This, in itself, was an extension of a Creator Economy which brought into being an endless number of creator tools. Each new tool made it easier to create and produce music (or images or video or text) and publish it.
Back in September, Cherie Hu and Yung Spielberg wrote that music’s Midjourney moment had arrived:
“We’re seeing an influx of stakeholders from every corner — developers, artists, rights holders — race to build large-scale music models, ship improved user experiences on top of those models, and close industry partnership deals, all at an unprecedented commercial scale and technical quality.”
This advances what I wrote back in 2021 about the potential for one billion music creators to exist:
“So, in a way we’re moving into a world that’s thoroughly mediatized by the sonic in the form of melodies, beats, hooks. Some of these put together by people calling themselves artists, others by people who quickly threw together a few loops. The former might be looking to make a living from their art. The latter might just be enjoying themselves and have no ambition to share their creations beyond a few friends and like-minded people. The question of originality remains pertinent.”
The stuff - the songs, sounds, albums - that will hit the auditory senses differently will be those original sonic structures. But all the other stuff - projects started and abandoned, music prompted - will also be there. As Mark Mulligan recently put it, these two types of ‘stuff’ just “occupy different spaces.”
The real change is in the interaction, which will only grow as we move from LLM to LAM. Already, we can have fun together with a variety of generative music AI apps. I can prompt a sonic structure and share it with you. We can be amazed at how good it sounds or laugh at how crap it is. The point is that we’re sharing it. And the next step is already on the horizon - not just sharing but getting a fully arranged score and the steps needed to publish it. It’s what Lex Dromgoole said about Bronze, the company he’s building:
“Currently, when we create a piece of music, we structure it and we arrange it to be static, to be inert. And we refine it and we distil it down to one specific thing. And the aim of Bronze is to allow us to create an arrangement of music for variation, music that always exists within variation, and then release that rather than the static piece of music.”
As with any new technological developments for creation and distribution, we’ll see these new formats come up. This one will be defined by interaction. Interaction with the tech itself - when we build our own models - and interaction with each other - as we create and share those sonic structures.
What the FUD - solution thinking
Yes, there’s a lot of fear, a lot of uncertainty, and a lot of doubt. But we cannot make this go away anymore. AI is here; what we need to do is work to harness it. This is done at a higher level through licensing and accountability around training data and output. This is done through regulation at an international level. Mostly, however, this is done by comprehending what’s happening, as creators and consumers alike. Embrace AI while considering the state of your data, paying attention to opting out, and understanding how to protect yourself, and your digital selves, more broadly.
LINKS
🎸 Teach your kids prompt engineering instead of the guitar (Yash Bagal)
“The technical barriers to music creation will no longer exist -- creativity (and originality, whatever that means in a post-AI world) will be the true differentiating factors. This need not mean "musical" creativity - it could mean creative ways to capture attention via memes, tiktoks, wearing a giant chicken outfit outside a taylor swift gig, whatever. There was a time in history where only scribes possessed the technical ability to write. Today, most educated humans can, and I think most of us agree that it's been a net good for humanity.”
✘ Yash is shouting from the fence and we should all listen. Music is, as always, a bellwether for broader societal changes through tech. What we do will impact how other industries can and will react.
😞 Takedown fails: Artists are seeing their music removed from DSPs for streaming fraud they didn’t commit (Ari Herstand)
“Although distributors and streaming services frequently use language that places the blame on the artist for fraudulent activity detected on their accounts, it has become clear that artists are often caught in the middle of a crossfire between streaming services, distributors and fraudsters attempting to game the system for their own financial gain.”
✘ As we speak about streaming fraud more and more, we’ll also get more of these stories. We need to take them to heart and work on fixing problems, not issuing band-aids.
⛭ The loneliness apparatus (Toby Shorin)
“The social patterns of welcoming and enervating spaces need more creativity and attention. This is not something a national playbook is suited to address, but the best organizers and teachers know well. At the level of implementation, the negative term (“loneliness”) ought to be deemphasized in favor of the positive social content.”
✘ Not specifically music related, but it definitely resonates with Tristra’s latest piece here in MUSIC x. It also provides a mirror to a lot of perceived problems in the music industry. Often, what we all talk about as ‘the problem’ requires a reframing to understand the core issue at play. Only then can we work on changing the infrastructure and effect change.
▪️ Rise, marginalization & return of the Niche Internet (Mac Budkowski)
“Niches are what make the world go round. Hardcore geeks and enthusiasts are the ones who explore new ideas and move their domains forward. Eventually - once their ideas reach mainstream - we all can reap the rewards of the new technologies, frameworks, and methods these people came up with.”
✘ This presents a solid history of what Mac calls the ‘niche internet’, or those places where hyper-knowledgeable people share ideas and solutions. He talks about discoverability as the main problem to tackle, adding to the notion that it’s not content, but distribution, that’s king.
🪮 Notes from Beethoven’s Genome (Laura Wesseldijk et al)
“Above all, it is essential to keep in mind that human traits, including musical skills, are not determined solely by genes or environment, but rather shaped by their complex interplay, and that genetic influences, such as those captured by PGIs [ed. polygenic indices], are probabilistic rather than deterministic causes that shape an individual’s future.”
✘ If you haven’t seen Beethoven’s Hair yet, go look it up. Anyway, new research on this strand of hair shows that Beethoven scores pretty low on these PGIs, which would indicate he wasn’t actually that musical. It’s both interesting and fun to read this short paper, even if you’re not into all the academic language!
MUSIC
First off, World of Work by Clarissa Connelly is a beautiful album; feel free to listen to it and escape into its sonic structures. There is, however, more happening here, so I also encourage you to listen more deeply to this record. What Clarissa has done is bring piano and guitar together with a focus on overtones. There’s harmony, and then the overtones create something new as time passes. Listen and see if you can catch this in progress.