
Sound, Vibration Recognition Boost Context-Aware Computing


Sensors in the kitchen recognize the sound of someone chopping vegetables and display "chopping" on a laptop screen.

New methods help smart devices detect what’s happening around them


Smart devices can seem dumb if they don’t understand where they are or what people around them are doing. Carnegie Mellon University researchers say this environmental awareness can be enhanced by complementary methods for analyzing sound and vibrations.

“A smart speaker sitting on a kitchen countertop cannot figure out if it is in a kitchen, let alone know what a person is doing in a kitchen,” said Chris Harrison, assistant professor in CMU’s Human-Computer Interaction Institute (HCII). “But if these devices understood what was happening around them, they could be much more helpful.”

Harrison and colleagues in the Future Interfaces Group will report today at the Association for Computing Machinery’s User Interface Software and Technology Symposium in Berlin about two approaches to this problem — one that uses the most ubiquitous of sensors, the microphone, and another that employs a modern-day version of eavesdropping technology used by the KGB in the 1950s.

In the first case, the researchers have sought to develop a sound-based activity recognition system, called Ubicoustics. This system would use the existing microphones in smart speakers, smartphones and smartwatches, enabling them to recognize sounds associated with places, such as bedrooms, kitchens, workshops, entrances and offices.

“The main idea here is to leverage the professional sound-effect libraries typically used in the entertainment industry,” said Gierad Laput, a Ph.D. student in HCII. “They are clean, properly labeled, well-segmented and diverse. Plus, we can transform and project them into hundreds of different variations, creating volumes of data perfect for training deep-learning models.

“This system could be deployed to an existing device as a software update and work immediately,” he added.

The plug-and-play system could work in any environment. It could alert the user when someone knocks on the front door, for instance, or move to the next step in a recipe when it detects an activity, such as running a blender or chopping.

The researchers, including Karan Ahuja, a Ph.D. student in HCII, and Mayank Goel, assistant professor in the Institute for Software Research, began with an existing model for labeling sounds and tuned it using sound effects from the professional libraries, such as kitchen appliances, power tools, hair dryers, keyboards and other context-specific sounds. They then synthetically altered the sounds to create hundreds of variations.
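The idea of turning one clean recording into many training variations can be illustrated with a minimal sketch. This is not the researchers' pipeline: the article does not say which transformations they applied, so the gain change, time shift and added noise used here are assumptions, and `sine_clip` and `augment` are hypothetical helpers that stand in for a real sound-effect library and audio toolkit.

```python
import math
import random


def sine_clip(freq_hz, seconds=0.1, rate=16000):
    """Stand-in for a clean library sound effect: a pure tone in [-1, 1]."""
    n = int(seconds * rate)
    return [math.sin(2 * math.pi * freq_hz * i / rate) for i in range(n)]


def augment(clip, rng, n_variants=100, max_shift=200, noise_level=0.05):
    """Return n_variants altered copies of clip, each the same length.

    The three transformations are illustrative assumptions, not the
    method described in the article.
    """
    variants = []
    for _ in range(n_variants):
        gain = rng.uniform(0.3, 1.0)       # quieter copy, as if farther away
        shift = rng.randint(0, max_shift)  # circular shift of the onset
        shifted = clip[-shift:] + clip[:-shift] if shift else list(clip)
        # Scale the samples and add Gaussian background noise.
        noisy = [gain * s + rng.gauss(0.0, noise_level) for s in shifted]
        # Keep every sample inside the valid amplitude range.
        variants.append([max(-1.0, min(1.0, s)) for s in noisy])
    return variants


rng = random.Random(0)  # fixed seed so the sketch is reproducible
clean = sine_clip(440.0)
variants = augment(clean, rng)
```

Each entry in `variants` keeps the label of the original clip, so a single labeled recording yields many labeled training examples.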

Laput said recognizing sounds and placing them in the correct context is challenging, in part because multiple sounds are often present and can interfere with each other. In their tests, Ubicoustics had an accuracy of about 80 percent — competitive with human accuracy, but not yet good enough to support user applications. Better microphones, higher sampling rates and different model architectures all might increase accuracy with further research.

A video explaining Ubicoustics is available below.