Text-to-image diffusion models learn key visual concepts by being trained on billions of tagged images. It's a similar (but much more rapid) process to how we learn visual concepts ourselves. We know what cats are because we've seen lots of different cats and understand their common features. And if we want to draw a cat, we use that general knowledge to create a recognisable picture of a cat, but one that is (very likely to be) slightly different from any other cat picture we've ever seen.
If the diffusion model is trained specifically on spectrograms, however, rather than pictures of cats, fruit, people, etc., then something really interesting happens. Spectrograms are visual representations of sound: plots of frequency against time. Trained on these, the machine comes to learn that certain keywords go with certain spectrographic features. It knows what a generic smooth jazz piece 'looks like' in the same way that it knows what a generic dog looks like. You can then prompt the machine to produce new spectrograms in any style you can think of.
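To make the idea of a spectrogram concrete, here is a minimal sketch in Python with NumPy. It is an illustration only, not the pipeline used for this album: the frame size, hop length, sample rate and test tone are all assumptions picked for the example. It slices a one-second sine wave into short overlapping frames and measures how much of each frequency is present in each frame, giving a grid of frequency against time.

```python
import numpy as np

def spectrogram(signal, frame_size=512, hop=128):
    """Magnitude spectrogram: rows are frequency bins, columns are time frames."""
    window = np.hanning(frame_size)
    n_frames = 1 + (len(signal) - frame_size) // hop
    # Cut the signal into overlapping, windowed frames.
    frames = np.stack([
        signal[i * hop : i * hop + frame_size] * window
        for i in range(n_frames)
    ])
    # One FFT per frame, then transpose so frequency runs down the rows.
    return np.abs(np.fft.rfft(frames, axis=1)).T

# Illustrative input (an assumption): one second of a 1000 Hz tone at 8 kHz.
sample_rate = 8000
t = np.arange(sample_rate) / sample_rate
tone = np.sin(2 * np.pi * 1000 * t)

spec = spectrogram(tone)

# The brightest row of the picture corresponds to the pitch of the tone.
peak_bin = int(spec.mean(axis=1).argmax())
peak_hz = peak_bin * sample_rate / 512
print(spec.shape, peak_hz)
```

Drawn as an image, a steady tone like this is a single horizontal stripe; a piece of music is a much busier pattern of stripes and smears, which is what the model learns to imitate.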
All you have to do then is convert these spectrograms into audio, which is possible thanks to some clever code written by other people, and you end up with some excitingly strange new music, produced by a machine that has absolutely no concept of sound, only of images.
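The tricky part of that conversion is that a spectrogram image records how loud each frequency is but not its phase, so the phase has to be guessed. One common way to do this is the Griffin-Lim algorithm, sketched below in Python with NumPy. This is a simplified illustration under stated assumptions, not the code of the tools credited here: the helper names, frame size, hop length, iteration count and test tone are all hypothetical choices for the example.

```python
import numpy as np

FRAME = 512   # samples per analysis frame (assumed)
HOP = 128     # samples between frame starts (assumed)
WINDOW = np.hanning(FRAME)

def stft(signal):
    """Complex short-time Fourier transform, shape (frames, bins)."""
    n_frames = 1 + (len(signal) - FRAME) // HOP
    frames = np.stack([
        signal[i * HOP : i * HOP + FRAME] * WINDOW for i in range(n_frames)
    ])
    return np.fft.rfft(frames, axis=1)

def istft(spec, length):
    """Inverse STFT by windowed overlap-add, normalised by the window energy."""
    frames = np.fft.irfft(spec, n=FRAME, axis=1) * WINDOW
    out = np.zeros(length)
    norm = np.zeros(length)
    for i, frame in enumerate(frames):
        out[i * HOP : i * HOP + FRAME] += frame
        norm[i * HOP : i * HOP + FRAME] += WINDOW ** 2
    return out / np.maximum(norm, 1e-8)

def griffin_lim(magnitude, length, iterations=50, seed=0):
    """Estimate audio from magnitudes alone by iteratively refining the phase."""
    rng = np.random.default_rng(seed)
    # Start from random phase, since the picture carries none.
    phase = np.exp(2j * np.pi * rng.random(magnitude.shape))
    for _ in range(iterations):
        audio = istft(magnitude * phase, length)
        # Keep the phase of the re-analysed audio, discard its magnitude.
        phase = np.exp(1j * np.angle(stft(audio)))
    return istft(magnitude * phase, length)

# Illustrative input (an assumption): the magnitudes of a 1000 Hz tone,
# standing in for a spectrogram produced by the model.
sample_rate = 8000
t = np.arange(sample_rate) / sample_rate
tone = np.sin(2 * np.pi * 1000 * t)
magnitude = np.abs(stft(tone))

audio = griffin_lim(magnitude, length=len(tone))
```

The result is an approximation: the guessed phase is rarely perfect, which is likely one source of the slightly smeared, watery character that reconstructed audio of this kind tends to have.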
But what if you prompt the spectrogram model for something that isn't audio? How will it interpret visual requests? This album was started - as is so often the case - by accident: a detailed image prompt entered into the wrong model. That first result was sufficiently interesting to inspire me to revisit some of my favourite previously generated images, so the track titles here reflect the prompt used to generate those images and the corresponding audio.
I was initially going to include the original images in the download, but I thought it'd be more interesting to get the listener to imagine them.
Spectrograms by Stable Diffusion with the Riffusion 1.0 model.
Processed into audio with chavinlo's Riffusion Manipulation Tools.
Cover image by Stable Diffusion.