New AI audio tool Fugatto by NVIDIA does the impossible – and music producers are stunned
Nvidia has unveiled Fugatto, a new AI audio generator capable of creating and transforming any combination of music, voices, and sounds through text and audio inputs.
Key developments:
The tool, formally named Foundational Generative Audio Transformer Opus 1, demonstrates notable versatility in both audio generation and audio manipulation.
Here’s an overview of the features & capabilities:
| Feature | Capability |
|---|---|
| Sound Generation | Creates unique sounds from text prompts |
| Voice Transformation | Modifies accents and emotional tones |
| Music Editing | Isolates vocals and swaps instruments |
| Audio Control | Offers fine-grained control over generated content |
“This thing is wild,” says Ido Zmishlany, a multi-platinum producer and songwriter.
“The idea that I can create entirely new sounds on the fly in the studio is incredible.”
Technical specifications:
- Uses 2.5 billion parameters
- Trained on millions of audio samples
- Trained on NVIDIA DGX systems with 32 H100 GPUs
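To put the 2.5 billion parameter figure in perspective, here is a rough back-of-the-envelope sketch of how much memory the model weights alone would occupy. The numeric precisions listed are common formats chosen for illustration; which one Fugatto actually uses is not stated, so treat the byte widths and the helper name `weight_memory_gb` as assumptions.

```python
# Rough weight-memory estimate for a 2.5-billion-parameter model.
# Assumption: the precisions below are illustrative; the article does not
# say which numeric format Fugatto's weights are stored in.
PARAMS = 2_500_000_000

BYTES_PER_PARAM = {"float32": 4, "float16/bfloat16": 2, "int8": 1}


def weight_memory_gb(params: int, bytes_per_param: int) -> float:
    """Return the memory needed for the weights alone, in gigabytes (10**9 bytes)."""
    return params * bytes_per_param / 1e9


if __name__ == "__main__":
    for name, width in BYTES_PER_PARAM.items():
        print(f"{name}: {weight_memory_gb(PARAMS, width):.1f} GB")
```

This counts only the stored weights. Activations, optimizer state during training, and audio buffers would add to the total, so real-world requirements would be higher.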
Industry impact:
The model’s versatility opens new possibilities for music producers, ad agencies, and game developers.
Practical applications:
Music Production:
- Quick prototyping of song ideas
- Testing different styles and instruments
- Enhancing audio quality
Voice Modification:
- Accent changes
- Emotional tone adjustments
- Language learning applications
Future implications:
While Nvidia hasn’t announced a release date, the tool’s development signals a transformative moment for music production. Built on a dataset of millions of audio samples, including the BBC’s sound effects library, Fugatto represents a new frontier in AI-assisted creativity.
For musicians looking to stay ahead of the curve, Fugatto could become as essential to production work as stem separation tools, offering fine-grained control over sound design and manipulation.
Rafael Valle, manager of applied audio research at Nvidia, emphasizes the human element: “We wanted to create a model that understands and generates sound like humans do.”
This framing positions Fugatto as an instrument of human creativity rather than a replacement for it, even as it pushes technological boundaries.

