Inclusivity in Speech Recognition Systems: Addressing the Gap for African American English Speakers

AI-powered voice assistants such as Amazon's Alexa, Apple's Siri, and Google Assistant have ushered in a new era of efficiency and streamlined daily living. However, these platforms frequently mishear or mis-transcribe what users say. To be better understood, users often modify their normal speech patterns, adopting a slower, louder register known as technology-directed speech.

Existing research on technology-directed speech has focused mainly on mainstream U.S. English and has largely overlooked the speaker groups these systems misunderstand most often. One such group is speakers of African American English (AAE).

In a study published in JASA Express Letters by the Acoustical Society of America, researchers from Google Research, the University of California, Davis, and Stanford University found that speech from AAE speakers is often transcribed incorrectly. These errors can have harmful consequences, including linguistic discrimination through voice technology applications, particularly in high-stakes sectors like healthcare and employment.

In the experiment, the research team examined how AAE speakers adapt their speech when interacting with a voice assistant compared with speaking to friends, family, or strangers, measuring speech rate and pitch variation in both the human-directed and voice assistant-directed conditions.

Nineteen adults identifying as Black or African American, all of whom had experienced difficulties with voice technologies, participated in the study. Each was asked to pose a series of questions to a voice assistant, then repeat them as if speaking to a familiar person, and once more as if speaking to a stranger. Each question was recorded, yielding 153 recordings in total.

Analysis of these recordings showed that participants consistently adjusted their speech when addressing the voice-activated technology: they spoke more slowly and with less pitch variation (flatter, more monotone speech) than when addressing people.
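The two acoustic measures the study compares can be approximated with a short sketch. Everything here is illustrative, not the study's actual analysis: the function names and the sample F0 (pitch) values are hypothetical, and pitch variation is computed as the standard deviation of F0 on a semitone scale, one common convention for making variability comparable across speakers with different baseline pitches.

```python
import numpy as np

def speech_rate(n_words, duration_s):
    """Speech rate as words per second over the utterance."""
    return n_words / duration_s

def pitch_variation_semitones(f0_hz):
    """Standard deviation of F0 in semitones relative to the
    speaker's median pitch; lower values mean more monotone speech."""
    f0 = np.asarray(f0_hz, dtype=float)
    f0 = f0[f0 > 0]  # drop unvoiced frames (conventionally coded as 0)
    semitones = 12 * np.log2(f0 / np.median(f0))
    return semitones.std()

# Hypothetical F0 tracks for the same question in two conditions:
assistant_f0 = [110, 112, 111, 113, 110, 112]  # flatter contour
friend_f0 = [100, 130, 95, 140, 105, 125]      # livelier contour

print(speech_rate(8, 4.0))  # 2.0 words per second
print(pitch_variation_semitones(assistant_f0)
      < pitch_variation_semitones(friend_f0))  # True: assistant speech is more monotone
```

In a real pipeline the F0 track would come from a pitch tracker rather than hand-entered values, but the comparison logic, slower rate and smaller semitone spread in device-directed speech, is the same.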

Michelle Cohn, a member of the research team, noted, "These findings suggest that people have mental models of how to talk to technology. It's like a specific mode that they engage with to enhance comprehension, considering the disparities present in the speech recognition systems."

This issue isn't limited to AAE speakers; other groups, such as second-language speakers, face similar difficulties. The researchers hope this work will shed light on the barriers encountered by a broader range of technology users and help make voice technology more accessible and effective for everyone.

Disclaimer: The above article was written with the assistance of AI. The original sources can be found on ScienceDaily.