Riffusion: we’ve replaced these musicians with an AI model.

“Henceforth, all toys in this company will be designed by computers.”

— Mr. Daggart, Monkee Vs. Machine

Stable Diffusion is an Artificial Intelligence* model that generates images from text. AI art has become hugely controversial, because the models are trained on existing artwork, much of it by working artists, without asking. Some generate artists’ signatures. Some are even trained on stock image sites without removing the “Getty Images” watermark. Actual artists, particularly those who’ve had their stuff stolen before, want to fucking kill anyone using this.

Personally, I think an AI model is an Akai S-900, and the issues are largely the same as with sampling — where you make bangin’ dance choons by blatantly stealing people’s shit. It’s definitely art, and it’s definitely large scale theft. Though at least the rich guys tried to stop sampling — thus making it much cooler as art — rather than funding it.

But the images are interesting and compelling. Even as viewers rapidly learn to tell there’s something not quite right about them.

Could we generate music with a trained AI model? Of course we could. Enjoy Riffusion, by Seth Forsgren and Hayk Martiros!

Forsgren and Martiros didn’t try to feed audio straight into an AI model. Instead, they went via pictures, because we already have pretty good picture generators. Music is converted into spectrograms (images of frequency content over time) and tagged with words, and the image model is trained on those. You enter some words into Riffusion, and it generates a spectrogram to match. The spectrogram is then turned back into audio.
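
If you want a feel for the spectrogram round trip, here is a minimal sketch in plain NumPy. It is not Riffusion's code. The function names (`stft`, `istft`, `griffin_lim`), the window size and the hop length are all my own assumptions for illustration. The catch it shows: a spectrogram image only stores how loud each frequency is, not its phase, so getting audio back means guessing the phase. Griffin-Lim is one common way to make that guess, and it is the technique swapped in here.

```python
import numpy as np

def stft(signal, n_fft=512, hop=128):
    """Short-time Fourier transform: one row of frequency bins per frame."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack([signal[i * hop : i * hop + n_fft] * window
                       for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)

def istft(spec, n_fft=512, hop=128):
    """Inverse STFT by windowed overlap-add."""
    window = np.hanning(n_fft)
    n_frames = spec.shape[0]
    length = n_fft + hop * (n_frames - 1)
    out = np.zeros(length)
    norm = np.zeros(length)
    frames = np.fft.irfft(spec, n=n_fft, axis=1)
    for i in range(n_frames):
        start = i * hop
        out[start : start + n_fft] += frames[i] * window
        norm[start : start + n_fft] += window ** 2
    return out / np.maximum(norm, 1e-8)

def griffin_lim(magnitude, n_iter=32, n_fft=512, hop=128, seed=0):
    """Estimate audio from magnitudes alone by iteratively refining the phase."""
    rng = np.random.default_rng(seed)
    phase = np.exp(2j * np.pi * rng.random(magnitude.shape))  # random start
    for _ in range(n_iter):
        audio = istft(magnitude * phase, n_fft, hop)
        rebuilt = stft(audio, n_fft, hop)
        phase = np.exp(1j * np.angle(rebuilt))  # keep phase, discard magnitude
    return istft(magnitude * phase, n_fft, hop)

# Stand-in for "music": one second of a 440 Hz tone at an assumed 8 kHz rate.
sample_rate = 8000
t = np.arange(sample_rate) / sample_rate
tone = np.sin(2 * np.pi * 440 * t)

magnitude = np.abs(stft(tone))     # this is the "picture" (phase thrown away)
rebuilt = griffin_lim(magnitude)   # audio recovered from the picture alone
```

In the real thing, the `magnitude` array is what gets drawn as an image for the picture model to learn from and generate. Everything about how Riffusion scales or encodes that image is left out here.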

It does surprisingly well. Here’s an example:

Riffusion generates five-second clips. But you can also get it to interpolate between clips, blending smoothly from one to the next by moving through what AI image discourse calls the “latent space”.
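
As a rough illustration of what interpolating in a latent space means, here is a sketch using spherical linear interpolation (slerp), a common choice for blending between two diffusion latents. This is an assumption on my part, not a description of Riffusion's actual code. The `slerp` helper is hypothetical, the random arrays only stand in for real latents, and the `(4, 64, 64)` shape is a placeholder.

```python
import numpy as np

def slerp(t, a, b):
    """Spherical interpolation between arrays a and b, for t in [0, 1]."""
    a_flat, b_flat = a.ravel(), b.ravel()
    cos_omega = np.dot(a_flat, b_flat) / (
        np.linalg.norm(a_flat) * np.linalg.norm(b_flat))
    omega = np.arccos(np.clip(cos_omega, -1.0, 1.0))  # angle between a and b
    if np.isclose(np.sin(omega), 0.0):
        # Nearly parallel vectors: fall back to a straight-line blend.
        return (1 - t) * a + t * b
    return (np.sin((1 - t) * omega) * a + np.sin(t * omega) * b) / np.sin(omega)

# Placeholder "latents": random noise standing in for two generated clips.
rng = np.random.default_rng(0)
latent_a = rng.standard_normal((4, 64, 64))
latent_b = rng.standard_normal((4, 64, 64))

# Five steps from clip A to clip B; each would be decoded to a spectrogram.
path = [slerp(t, latent_a, latent_b) for t in np.linspace(0.0, 1.0, 5)]
```

The point of going around the sphere rather than straight across is that a straight-line average of two noise-like vectors ends up shorter than either, which image models tend not to like.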

You can try out Riffusion here.

The code is here if you think you’re up to building AI models and want to run it at home. TechCrunch also interviewed the guys.

Riffusion is a long way from being a songwriter-in-a-box. The creators are musicians themselves, and don’t see this replacing them any time soon.

But you can see what’s obviously going to happen next. I can’t wait for Spotify to offer streams of generated nonsense music. And see if the record companies can spot the source tracks it was trained on.

And you know how sometimes the Midjourney AI image generator just completely fails to understand how to draw an arm, let alone how many fingers are on a hand? Can’t wait for the musical equivalent of that either.

“Through? You think you can stand before the march of the machine? You, an indecisive jellyfish? You’ll change your mind.”

* yes, I know this version is properly called “statistics”
