Siri, It Hurts When You Constantly Misunderstand Me. You too, Alexa

Researchers Seek To Reduce Harm to Multicultural Users of Voice Assistants

July 11, 2024

A blue line drawing shows a stick figure with a speech bubble, clearly making a request, while a stick figure operator works a switchboard in the background.

Users of voice assistants such as Siri, Alexa or Google Assistant know the frustration of being misunderstood by a machine.

But for people who may lack a standard American accent, such miscommunication can go beyond simply irritating to downright dangerous, according to researchers in the Human-Computer Interaction Institute (HCII) in Carnegie Mellon University's School of Computer Science.

In a new study, HCII Ph.D. student Kimi Wenzel and Associate Professor Geoff Kaufman identified six downstream harms caused by voice assistant errors and devised strategies to reduce them. Their work won a Best Paper award at the Association for Computing Machinery's Conference on Human Factors in Computing Systems.

"This paper is part of a larger research project in our lab looking at documenting and understanding the impact of biases that are embedded in technology," Kaufman said.

White Americans are overrepresented in most datasets used to train voice assistants, and studies have shown that these assistants are far more likely to misinterpret or misunderstand Black speakers and people with accents or dialects that vary from standard American. Earlier researchers tended to look at this problem as a technical issue to be overcome, as opposed to a failure that has repercussions on the user, Kaufman said. But having your speech misunderstood, whether by a person or a machine, can be experienced as a microaggression.

"It can have effects on self-esteem or your sense of belonging," Kaufman said.

In a controlled experiment last year, Kaufman and Wenzel studied how a voice assistant's error rates affected white and Black volunteers. Black people who experienced high error rates had higher levels of self-consciousness, lower levels of self-esteem and a less favorable view of technology than Black people who experienced low error rates. White people didn't have this reaction, regardless of error rate.

"We hypothesize that because Black people experience miscommunication more frequently, or have more everyday experience with racism, these experiences build up and they suffer more negative effects," Wenzel said.

In the latest study, Wenzel and Kaufman interviewed 16 volunteers who experienced problems with voice assistants. They found six potential harms that can result from seemingly innocuous voice assistant errors. These included emotional harm as well as cultural or identity harm caused by microaggressions. They also included relational harm, which is when an error leads to interpersonal conflict. A voice assistant, for instance, might make a calendar entry with the wrong time for a meeting or misdirect a call. Other harms include paying the same price for a technology as other people even though it doesn't work as well for you, as well as needing to exert extra effort — such as altering an accent — to make the technology work.

A sixth harm is physical endangerment.

"Voice technologies are not only used as a simple voice assistant in your smartphone," Wenzel said. "Increasingly they are being used in more serious contexts, for example in medical transcription."

Voice technologies also are used in conjunction with auto navigation systems, "and that has very high stakes," Wenzel added.

One person interviewed for the study related their own hair-raising experiences with a voice-controlled navigation system: "Oftentimes, I feel like I'm pronouncing things very clearly and loudly, but it still can't understand me. And I don't know what's going on. And I don't know where I'm going. So, it's just this, this frustrating experience and very dangerous and confusing."

The ultimate solution is to eliminate bias in voice technologies, but creating datasets representative of the full range of human variation is a perplexing task, Wenzel said. So she and Kaufman talked to the participants about things voice assistants could say to their users to mitigate those harms.

One communication repair strategy they identified was blame redirection — not a simple apology, but an explanation describing the error that doesn't put the blame on the user.

Wenzel and Kaufman also suggest that voice technologies be more culturally sensitive. Addressing cultural harms is to some extent limited by technology, but one simple yet profound action would be to expand the database of proper nouns.

"Misrecognition of non-Anglo names has been a persistent harm across many language technologies," the researchers noted in the paper.

A wealth of social psychology research has shown that self-affirmation — a statement of an individual's values or beliefs — can be protective when their identity is threatened, Kaufman said. He and Wenzel are looking for ways that voice assistants can include affirmations in their conversations with users, preferably in a way that isn't obvious to the user. Wenzel is currently testing some of those affirmations in a follow-up study.

In all these conversational interventions, the need for brevity is paramount. People often use voice technologies, after all, in hopes of being more efficient or able to work hands-free. Adding messages into the conversation tends to work against that goal.

"This is a design challenge that we have: How can we emphasize that the blame is on the technology and not on the user at all? How can you make that emphasis as clear as possible in as few words as possible?" Wenzel said. "Right now, the technology says 'sorry,' but we think it should be more than that."

For More Information

Aaron Aupperlee | 412-268-9068 | aaupperlee@cmu.edu

Author
Byron Spice

Related People
Kimi Wenzel, Geoff Kaufman

Research Areas

Social Computing