Have you ever gotten frustrated trying to understand a podcast or conversation obscured by background noise? A team of researchers may have found a solution by training an AI model on human perceptions of audio quality.
This new model from The Ohio State University promises to significantly enhance noisy speech by leveraging what we, as people, actually notice and find distracting.
Read on to find out how this technology could lead to improved hearing aids and communication tools.
Most speech enhancement algorithms focus only on objective metrics like intelligibility and signal quality. But our judgments don’t always line up with such technical measurements.
“What distinguishes this study from others is that we’re trying to use perception to train the model to remove unwanted sounds,” said Williamson. “If something about the signal in terms of its quality can be perceived by people, then our model can use that as additional information to learn and better remove noise.”
The team gathered subjective ratings from listeners evaluating the quality of conversations containing various background noises. Their joint-learning model combines this human data with a specialized language module to generate quality scores closely aligned with human opinions.
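The article doesn't give the model's training objective, but a joint-learning setup like the one described is commonly expressed as a weighted sum of a signal-reconstruction term and a term penalizing mispredicted human quality ratings (mean opinion scores, or MOS). A minimal sketch, where the function names and the `alpha` weight are illustrative assumptions rather than details from the study:

```python
def mse(a, b):
    """Mean squared error between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def joint_loss(enhanced, clean, predicted_mos, human_mos, alpha=0.5):
    """Hypothetical joint objective: objective signal error plus a
    weighted penalty for mispredicting human quality ratings."""
    reconstruction = mse(enhanced, clean)        # objective signal error
    perception = mse(predicted_mos, human_mos)   # rating-prediction error
    return reconstruction + alpha * perception
```

Training against both terms at once lets the human ratings act as the "additional information" Williamson describes, steering the enhancer toward outputs people actually judge as cleaner.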
Testing showed the new approach outperformed other methods across a variety of objective metrics, leading to clearer speech. Its quality predictions also correlated strongly with the ratings human listeners gave.
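"Strongly correlated" here would typically be quantified with a Pearson correlation coefficient between the model's predicted quality scores and the human ratings. A small self-contained sketch of that check (the function and data are illustrative, not from the study):

```python
from math import sqrt

def pearson_r(pred, human):
    """Pearson correlation between predicted and human quality scores."""
    n = len(pred)
    mp, mh = sum(pred) / n, sum(human) / n
    cov = sum((p - mp) * (h - mh) for p, h in zip(pred, human))
    var_p = sum((p - mp) ** 2 for p in pred)
    var_h = sum((h - mh) ** 2 for h in human)
    return cov / sqrt(var_p * var_h)
```

A coefficient near 1.0 would mean the model ranks noisy clips almost exactly as human listeners do.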
This kind of personalized processing could improve technologies like hearing aids that today often struggle with complex audio scenes. As augmented audio devices emerge, perceiving sound through human ears may become central to optimizing our listening experiences.
However, perception varies significantly between individuals based on acuity, environment, and equipment. “What makes noisy audio so difficult to evaluate is that it’s very subjective,” notes Williamson.
Accounting for these differences will be key if enhanced speech systems are to benefit diverse users. The team plans additional human testing to ensure the model generalizes well across a range of hearing capabilities.
As this research into machine learning that improves human speech recognition demonstrates, building perceptually aware AI presents both opportunities and obstacles. With further real-world data, algorithms like this one may soon “tune out the noise” to enhance audio for all.