Riffusion is a new tool that generates music from text prompts. Built on a text-to-image diffusion model, it lets users explore the latent space of sound by experimenting with different styles, instruments, modifiers, and genres.
The technology behind Riffusion is based on the open-source AI model Stable Diffusion. The model has been fine-tuned to generate images of spectrograms, which can then be converted into audio clips. At the time of writing, the model is Stable Diffusion v1.5 with no modifications other than fine-tuning on images of spectrograms paired with text. All audio processing happens downstream of the model.
An audio spectrogram is a visual representation of the frequency content of a sound clip. The x-axis represents time, and the y-axis represents frequency. The color of each pixel gives the amplitude of the audio at the frequency and time given by its row and column.
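To make the pixel-to-amplitude mapping concrete, here is a minimal Python sketch that turns a magnitude spectrogram (rows are frequencies, columns are time frames) into an 8-bit grayscale image and back. The function names, the decibel scaling, and the 80 dB dynamic range are illustrative assumptions, not Riffusion's exact conversion.

```python
import numpy as np

# Hypothetical conversion: decibel scale, 80 dB dynamic range, 8-bit pixels.
# Riffusion's real image format may differ; this only illustrates "pixel = amplitude".

def spectrogram_to_image(mag, top_db=80.0):
    """Map a (freq, time) magnitude array to a uint8 image plus its peak level in dB."""
    db = 20.0 * np.log10(np.maximum(mag, 1e-10))
    peak_db = db.max()
    # Clip to the chosen dynamic range below the loudest bin, then scale to 0..1
    scaled = (np.clip(db, peak_db - top_db, peak_db) - (peak_db - top_db)) / top_db
    # Flip vertically so low frequencies end up in the bottom row, as in a plotted spectrogram
    img = np.flipud(np.round(scaled * 255.0).astype(np.uint8))
    return img, peak_db

def image_to_spectrogram(img, peak_db, top_db=80.0):
    """Invert spectrogram_to_image, up to 8-bit quantization and the clipped floor."""
    scaled = np.flipud(img).astype(np.float64) / 255.0
    db = scaled * top_db + (peak_db - top_db)
    return 10.0 ** (db / 20.0)
```

The image only records levels relative to its loudest pixel, so the absolute level (`peak_db` here) has to be stored separately or fixed by convention.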
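A spectrogram can be computed from audio using the short-time Fourier transform (STFT), which represents the audio, in short overlapping windows, as a combination of sine waves of varying amplitudes and phases. The STFT itself is invertible, so the original audio can be reconstructed from it. However, a spectrogram image stores only the amplitudes, not the phases, so Riffusion approximates the phase with the Griffin-Lim algorithm when converting a generated spectrogram back into audio.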
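The relationship between the STFT, the spectrogram, and phase reconstruction can be sketched in plain NumPy. This is an illustration under assumed settings (a 512-sample Hann window with a hop of 128), not Riffusion's code: `stft` and `istft` form an exact round trip on the complex values, while `griffin_lim` starts from random phase and repeatedly alternates between the time and frequency domains, keeping the known magnitudes and updating only the phase estimate.

```python
import numpy as np

N_FFT, HOP = 512, 128  # window length and hop size (75% overlap); illustrative values

def _window():
    return np.hanning(N_FFT)

def stft(x):
    """Complex STFT with shape (freq_bins, frames): rows are frequencies, columns are time."""
    w = _window()
    n_frames = 1 + (len(x) - N_FFT) // HOP
    frames = np.stack([x[i * HOP : i * HOP + N_FFT] * w for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1).T

def istft(X):
    """Weighted overlap-add inverse: the least-squares signal for a (possibly inconsistent) STFT."""
    w = _window()
    frames = np.fft.irfft(X.T, n=N_FFT, axis=1)
    length = N_FFT + HOP * (frames.shape[0] - 1)
    out = np.zeros(length)
    norm = np.zeros(length)
    for i, frame in enumerate(frames):
        out[i * HOP : i * HOP + N_FFT] += frame * w
        norm[i * HOP : i * HOP + N_FFT] += w ** 2
    # Guard against division by ~0 at the very edges, where the window is zero
    return out / np.maximum(norm, 1e-8)

def griffin_lim(magnitude, n_iter=50, seed=0):
    """Estimate a signal whose STFT magnitude approximates `magnitude`, starting from random phase."""
    rng = np.random.default_rng(seed)
    phase = np.exp(2j * np.pi * rng.random(magnitude.shape))
    for _ in range(n_iter):
        x = istft(magnitude * phase)               # back to the time domain
        phase = np.exp(1j * np.angle(stft(x)))     # keep only the new phase estimate
    return istft(magnitude * phase)
```

Running `griffin_lim(np.abs(stft(x)))` yields audio whose spectrogram approximates the original's, but the samples themselves generally differ from `x`, since the phase is only estimated. Libraries such as librosa (`librosa.griffinlim`) and torchaudio (`torchaudio.transforms.GriffinLim`) provide tuned implementations.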
With Riffusion, the generated audio can be conditioned not only on a text prompt but also on an existing spectrogram image. This lets users modify a sound while preserving the structure of an original clip they like. The denoising strength parameter controls how far the generated audio deviates from the original clip.
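Denoising strength is easiest to understand by looking at how image-to-image diffusion typically starts. The following minimal sketch assumes a generic DDPM-style linear noise schedule (not Stable Diffusion's exact schedule) and a hypothetical helper `img2img_start`: the initial latent is noised to a timestep proportional to the strength, and only a matching fraction of the denoising steps is then run. A strength near 0 leaves the original nearly intact; a strength of 1 effectively starts from pure noise.

```python
import numpy as np

T = 1000
betas = np.linspace(1e-4, 0.02, T)      # illustrative DDPM-style schedule (an assumption)
alphas_cumprod = np.cumprod(1.0 - betas)

def img2img_start(init_latent, strength, num_inference_steps=50, seed=0):
    """Hypothetical helper: return the noised starting latent and how many denoising steps to run."""
    if not 0.0 <= strength <= 1.0:
        raise ValueError("strength must be in [0, 1]")
    steps_to_run = int(num_inference_steps * strength)
    if steps_to_run == 0:
        return init_latent.copy(), 0     # nothing to denoise: output equals the input
    # Start timestep grows with strength: higher strength -> more noise -> larger deviation
    t = int(round(strength * (T - 1)))
    a = alphas_cumprod[t]
    noise = np.random.default_rng(seed).standard_normal(init_latent.shape)
    noisy = np.sqrt(a) * init_latent + np.sqrt(1.0 - a) * noise
    return noisy, steps_to_run
```

The denoising loop that follows, which calls the fine-tuned model `steps_to_run` times, is omitted here. In Hugging Face diffusers, the corresponding knob is the `strength` argument of `StableDiffusionImg2ImgPipeline`.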
Riffusion can also generate infinite AI jams by looping and interpolating between prompts and seeds in the model's latent space. Interpolation keeps the key properties of each clip while producing smooth transitions from one clip to the next.
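Interpolating between seeds usually means interpolating between the initial noise latents those seeds produce. A common choice, assumed here rather than taken from Riffusion's code, is spherical linear interpolation (slerp) for the noise and plain linear interpolation for the prompt embeddings. The helper `interpolation_path` and its default latent shape are hypothetical.

```python
import numpy as np

def slerp(t, v0, v1, dot_threshold=0.9995):
    """Spherical linear interpolation between two same-shaped arrays, treated as flat vectors."""
    a, b = v0.ravel(), v1.ravel()
    cos_theta = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
    if abs(cos_theta) > dot_threshold:
        # Nearly (anti)parallel vectors: sin(theta) is close to 0, so fall back to a linear blend
        return (1.0 - t) * v0 + t * v1
    theta = np.arccos(cos_theta)
    sin_theta = np.sin(theta)
    return (np.sin((1.0 - t) * theta) / sin_theta) * v0 + (np.sin(t * theta) / sin_theta) * v1

def interpolation_path(seed_a, seed_b, emb_a, emb_b, num_steps, shape=(4, 64, 64)):
    """Hypothetical helper: yield (noise_latent, prompt_embedding) pairs from clip A to clip B."""
    noise_a = np.random.default_rng(seed_a).standard_normal(shape)
    noise_b = np.random.default_rng(seed_b).standard_normal(shape)
    for t in np.linspace(0.0, 1.0, num_steps):
        # Noise is interpolated on the sphere; embeddings are blended linearly
        yield slerp(t, noise_a, noise_b), (1.0 - t) * emb_a + t * emb_b
```

Each yielded pair would drive one diffusion run (not shown). Slerp is used for the noise because high-dimensional Gaussian samples lie close to a sphere: the linear midpoint of two independent samples has a noticeably smaller norm, outside the range of noise the model expects, while the slerp midpoint stays near the expected norm.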
Overall, Riffusion is an exciting new tool for experimenting with and exploring the latent space of sound, with many possibilities for creating original music.
Inference server: https://github.com/riffusion/riffusion
Model checkpoint: https://huggingface.co/riffusion/riffusion-model-v1