"AI Headphones Enable Users to Listen to a Single Speaker in a Crowd"

"AI Headphones Enable Users to Listen to a Single Speaker in a Crowd"

"

Engineers at the University of Washington have developed an artificial intelligence system that lets a headphone wearer single out one speaker in a noisy environment. The system, called 'Target Speech Hearing', allows a user to 'enroll' a person simply by looking at them for three to five seconds. Once enrolled, the system plays only that speaker's voice, even amid loud surroundings and even after the listener turns away.

Noise-cancelling headphones have become increasingly adept at creating an auditory blank slate, but letting particular sounds back through that erasure remains a challenge. Apple's AirPods Pro, for example, automatically adjust audio levels when they sense conversation, but the wearer has little control over whom to listen to or when this happens.

'Target Speech Hearing' addresses this. After a wearer enrolls a speaker by looking at them for three to five seconds, the AI system cancels all other sounds and plays only that speaker's voice clearly, in real time, so the wearer can stay focused on the chosen voice even in a noisy setting. The team presented its findings at the ACM CHI Conference on Human Factors in Computing Systems on May 14.

To set up the system, a user wearing standard headphones fitted with microphones taps a button while pointing their head at the speaker. The sound waves from the speaker's voice then reach the microphones on both sides of the headset simultaneously, within a 16-degree margin of error. The signal passes to an onboard embedded computer, where the team's machine learning software learns the desired speaker's vocal patterns. The system latches onto that voice and keeps playing it back to the listener, even as the pair move around.
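The enrollment gate described above can be illustrated with a small sketch. The idea is that a source directly ahead reaches both microphones at (nearly) the same time, so the inter-microphone delay maps to an arrival angle that must fall inside the 16-degree cone before enrollment proceeds. This is not the team's code; the microphone spacing, sample rate, and helper names below are illustrative assumptions.

```python
import numpy as np

SOUND_SPEED = 343.0          # speed of sound in air, m/s
MIC_SPACING = 0.18           # assumed distance between the two headset mics, m
SAMPLE_RATE = 16_000         # assumed audio sample rate, Hz
ENROLL_TOLERANCE_DEG = 16.0  # the margin of error stated in the article

def estimate_arrival_angle(left: np.ndarray, right: np.ndarray) -> float:
    """Estimate the direction of arrival from the inter-mic time delay.

    Cross-correlate the two channels to find the lag (in samples) at which
    they best align, convert that lag to a time difference, and map it to
    an angle using the far-field relation sin(theta) = c * dt / d.
    """
    corr = np.correlate(left, right, mode="full")
    lag = int(np.argmax(corr)) - (len(right) - 1)   # delay of left vs. right, samples
    dt = lag / SAMPLE_RATE                          # seconds
    sin_theta = np.clip(SOUND_SPEED * dt / MIC_SPACING, -1.0, 1.0)
    return float(np.degrees(np.arcsin(sin_theta)))

def can_enroll(left: np.ndarray, right: np.ndarray) -> bool:
    """True when the dominant source lies within the enrollment cone."""
    return abs(estimate_arrival_angle(left, right)) <= ENROLL_TOLERANCE_DEG

# Simulated check: a source straight ahead reaches both mics at once.
rng = np.random.default_rng(0)
voice = rng.standard_normal(SAMPLE_RATE)  # one second of wideband signal
print(can_enroll(voice, voice))           # identical arrival times -> enrollable
```

In the real system this gate is only the first step; the signal that passes it is then fed to the machine learning model, which learns the speaker's vocal characteristics so the voice can be tracked even after the listener turns away.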

In tests with 21 subjects, participants on average rated the clarity of the enrolled speaker's voice nearly twice as high as that of the unfiltered audio. The system does have limitations: it can currently enroll only one speaker at a time, and only when no other loud voice is coming from the same direction.

The team hopes to extend the technology to earbuds and hearing aids in the future. The research was funded by a Moore Inventor Fellow award, a Thomas J. Cable Endowed Professorship, and a UW CoMotion Innovation Gap Fund, and is detailed in the team's paper at the ACM CHI Conference on Human Factors in Computing Systems.

Disclaimer: The above article was written with the assistance of AI. The original sources can be found on ScienceDaily.

"