
Everything you need to know:
✓ Stable Audio 2.0 now allows users to generate full-length, coherent musical tracks up to 3 minutes long.
✓ The new audio-to-audio feature enables transforming uploaded audio samples through natural language prompts.
✓ Expanded sound-effect generation provides artists and musicians with greater flexibility and control.
Stable Audio 2.0 got 10x better, but still has room to grow.
The most notable addition is the ability to generate high-quality, full-length musical tracks up to three minutes long. This is a significant leap from the 90-second limit of the previous version, allowing users to explore more complex and cohesive musical structures.
But that’s not all. Stable Audio 2.0 has also introduced a game-changing audio-to-audio transformation capability, enabling users to upload their own audio samples and manipulate them through natural language prompts. This opens up a world of creative possibilities, as musicians and artists can now take their own sounds and seamlessly integrate them into AI-generated compositions.
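For readers curious what this looks like in practice, here is a minimal sketch of an audio-to-audio request against Stability AI's developer platform. The endpoint path, field names, and the `strength` parameter are assumptions for illustration, not confirmed details from the article; consult Stability AI's official API documentation for the actual interface.

```python
import requests

# Hypothetical sketch of a Stable Audio 2.0 audio-to-audio request.
# The endpoint path, field names, and parameters are assumptions for
# illustration; check Stability AI's developer docs for the real API.
API_KEY = "YOUR_STABILITY_API_KEY"  # placeholder

with open("my_guitar_loop.wav", "rb") as f:
    response = requests.post(
        "https://api.stability.ai/v2beta/audio/stable-audio-2/audio-to-audio",  # assumed path
        headers={"Authorization": f"Bearer {API_KEY}"},
        files={"audio": f},  # the uploaded sample to transform
        data={
            "prompt": "lo-fi hip hop beat, warm vinyl crackle, 80 BPM",
            "strength": 0.7,  # assumed knob: how far to depart from the source
        },
        timeout=300,
    )

response.raise_for_status()
with open("transformed.wav", "wb") as out:
    out.write(response.content)
```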
Ezra Sandzer-Bell of AudioCipher, a respected voice in the AI music space, has been closely examining the new Stable Audio 2.0 features. In his review, he highlights the potential of the audio-to-audio function, noting that it “appears to be applying timbre transfer rather than the style transfer offered by MusicGen’s melody mode.”
While the output may not yet rival the full instrumental arrangements of MusicGen, Ezra acknowledges that “Stable Audio 2.0 has a particularly beautiful interface” and calls it “the more consumer-friendly service.”
Stable Audio uses the AudioSparx dataset
One of the key advantages of Stable Audio 2.0 is its expanded sound-effect generation, which gives artists and musicians greater flexibility and control in shaping their sonic landscapes. Combined with the extended track length and the audio-to-audio capability, it makes the tool markedly more versatile than its predecessor.
Ezra’s review also delves into the importance of understanding the AudioSparx dataset, the licensed music library on which the Stable Audio 2.0 model was exclusively trained. By learning the terminology and subgenres used within that extensive library, users can craft more targeted and effective prompts, resulting in more compelling and distinctive musical outputs.
As Stability research scientist Jordi Pons explains, one of the keys to unlocking the full potential of Stable Audio 2.0 lies in effective text prompting techniques. Pons suggests starting with simple descriptors like genre, instrument, mood, and tempo, then building upon those foundations by incorporating non-musical elements to shape the overall sonic experience.
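As a rough illustration of that starting point, a baseline prompt can be assembled from just those four descriptors. The values below are invented examples, not taken from Pons or the review:

```python
# Building a baseline prompt from Pons's four starting descriptors:
# genre, instrument, mood, and tempo. Values are invented examples.
descriptors = {
    "genre": "ambient electronica",
    "instrument": "analog synthesizer, soft pads",
    "mood": "calm, introspective",
    "tempo": "70 BPM",
}

baseline_prompt = ", ".join(descriptors.values())
print(baseline_prompt)
# ambient electronica, analog synthesizer, soft pads, calm, introspective, 70 BPM
```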
Pons also suggests going beyond a simple list of musical elements. Rather than focusing solely on genre, instruments, or tempo, he recommends weaving in vivid, sensory descriptions: evoking a particular atmosphere or feeling can shape the final sound and emotional quality of the generated music. Blending those non-musical details with the musical parameters tends to produce more nuanced, compelling compositions.
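Building on the baseline above, a sketch of what that blending might look like. All of the sensory phrases here are invented for illustration:

```python
# Blending musical parameters with non-musical, sensory description,
# following Pons's suggestion. All phrases are invented examples.
musical = "ambient electronica, analog synthesizer, soft pads, 70 BPM"
sensory = (
    "recorded late at night in an empty train station, "
    "distant rain against glass, a feeling of quiet anticipation"
)

prompt = f"{musical}, {sensory}"
print(prompt)
```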
The review also touches on the “genre fusion” technique pioneered by CJ Carr of Dadabots, which combines unlikely musical styles to create entirely new genres. This innovative approach exemplifies the transformative potential of Stable Audio 2.0, as it empowers users to push the boundaries of what’s possible in AI-generated music.
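As a playful sketch of that idea, one could enumerate prompt candidates by pairing deliberately mismatched styles. The genre names below are invented examples and are not drawn from Carr’s actual work:

```python
import itertools

# Sketch of the "genre fusion" idea: pairing unlikely styles in a single
# prompt to push the model toward hybrid output. Genre names are
# invented examples for illustration.
genres = [
    "baroque harpsichord",
    "drum and bass",
    "mongolian throat singing",
    "surf rock",
]

for a, b in itertools.combinations(genres, 2):
    print(f"{a} fused with {b}, cohesive arrangement, full track")
```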
Stability AI is clearly committed to advancing AI-powered music creation, and Stable Audio 2.0 showcases both that ambition and the company’s attention to user-focused design. With its expanded feature set and sleek, intuitive interface, this latest version is poised to become a go-to resource for musicians, artists, and creators of all kinds. Anyone eager to harness AI for their own creative pursuits will find a lot to love in Stable Audio 2.0.