The same kind of machine learning-driven racial bias found in some facial recognition technologies is also at play in speech recognition, according to new research from Stanford Engineering.
In a paper published in the Proceedings of the National Academy of Sciences, the researchers detail findings showing that speech recognition systems are significantly more likely to misunderstand the speech of black speakers than that of white speakers. They assessed speech recognition systems from Amazon, Apple, Google, IBM, and Microsoft, and found that all five had error rates for black speakers that were roughly twice as high as those for white speakers.
On average, the speech recognition systems misunderstood 35 percent of the words spoken by black people; for white speakers, the error rate was 19 percent. Apple’s system fared the worst, with an error rate of 23 percent for white speakers and 45 percent for black speakers, while Microsoft’s was the best performer, with an error rate of 15 percent for white speakers and 27 percent for black speakers.
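The figures above are word error rates, the standard measure of how well a system transcribes speech: the number of word-level mistakes in a transcript divided by the number of words actually spoken. The sketch below shows one common way such a rate can be computed; the function and the example transcripts are illustrative only and are not drawn from the study’s data.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance between what was said (reference) and what
    was transcribed (hypothesis), divided by the number of words said."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edits needed to turn the first i reference words
    # into the first j hypothesis words
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,         # deleted word
                           dp[i][j - 1] + 1,         # inserted word
                           dp[i - 1][j - 1] + cost)  # substituted word
    return dp[len(ref)][len(hyp)] / len(ref)

# Illustrative only: one substitution and one dropped word in a 5-word utterance
print(word_error_rate("we went to the store", "we want to store"))  # 0.4
```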
The researchers speculate that the problem arises from the use of non-representative samples in machine learning – the same issue that causes lower accuracy rates in facial recognition systems when it comes to identifying non-white (and non-male) subjects. Such AI systems are trained to recognize patterns by scanning large numbers of sample faces and speech clips; if the majority of these samples come from one particular ethnic group or gender, the AI systems will become more proficient at recognizing members of that group than others.
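The basic dynamic is easy to reproduce in miniature. The toy sketch below – which is not the study’s methodology, and uses entirely synthetic data and an assumed two-group setup – trains a simple classifier on a sample dominated by one group and then measures accuracy separately for each group; the underrepresented group comes out measurably worse.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

def make_group(n_per_class, shift):
    """Synthetic two-class data whose feature distribution is offset by `shift`."""
    X = np.vstack([rng.normal(0.0 + shift, 1.0, (n_per_class, 2)),
                   rng.normal(2.0 + shift, 1.0, (n_per_class, 2))])
    y = np.array([0] * n_per_class + [1] * n_per_class)
    return X, y

# Training sample: heavily skewed toward group A
X_a, y_a = make_group(950, shift=0.0)   # group A supplies 95% of training data
X_b, y_b = make_group(50, shift=1.5)    # group B supplies only 5%
clf = LogisticRegression().fit(np.vstack([X_a, X_b]),
                               np.concatenate([y_a, y_b]))

# Evaluate on fresh, equal-sized test sets from each group
for name, shift in [("group A", 0.0), ("group B", 1.5)]:
    X_test, y_test = make_group(1000, shift)
    print(f"{name} accuracy: {clf.score(X_test, y_test):.2f}")
```

Real speech systems are vastly more complex, but the same pattern – accuracy tracking the composition of the training data – is what the researchers point to.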
The speech samples used in the Stanford Engineering study came from two databases: Voices of California, which features recordings of individuals from that state, and the Corpus of Regional African American Language, which comprises recordings from African-American communities in North Carolina, New York, and Washington, D.C.
With speech recognition increasingly used not only for device interaction but also for applications like automated job screening and the transcription of court proceedings, the disparities point to potentially serious real-world consequences. That said, the speech recognition systems were tested in May and June of last year, and it’s possible that the companies have improved their technologies since then.
Sources: Stanford News, The New York Times
March 24, 2020 – by Alex Perala