Audio Emotion Recognition


We have built an emotion recognition system based on prosodic features (i.e., intensity, pitch, and formant frequencies of sounds) combined with short-term perceptual features. It classifies the following emotions: anger, fear, happiness, sadness, surprise, and neutral. Additional emotional states can be included. Prosodic information applies to syllables, words, or phrases. An interactive dialog elicits responses from the user, and the audio emotion recognizer gauges the user's emotion from these responses. Classification with Linear Discriminant Analysis (LDA) has demonstrated high accuracy, and a novel minimum-error feature removal mechanism increases accuracy further. The classifier uses a two-stage hierarchical approach together with a One-Against-All (OAA) framework. We have integrated audio emotion recognition into the Virtual Coach for Stroke Rehabilitation Therapy. Virtual Coaching feedback includes encouraging the user, suggesting a rest, suggesting a different exercise, and stopping altogether.
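To make the One-Against-All idea concrete, the following is a minimal sketch, not the system's actual implementation. It trains one binary Fisher linear discriminant per emotion (that emotion against all others) and picks the emotion whose discriminant gives the highest score. Everything specific here is an assumption made for illustration: the two-dimensional feature vectors, the synthetic cluster centers, the three-emotion subset, the small ridge term, and all function names. The two-stage hierarchy and the minimum-error feature removal step are not reproduced.

```python
import random

def mean(rows):
    """Column-wise mean of a list of feature vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]

def scatter(rows, mu):
    """Scatter matrix: sum of outer products of deviations from mu."""
    d = len(mu)
    s = [[0.0] * d for _ in range(d)]
    for r in rows:
        diff = [a - b for a, b in zip(r, mu)]
        for i in range(d):
            for j in range(d):
                s[i][j] += diff[i] * diff[j]
    return s

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x

def fit_binary_lda(pos, neg, ridge=1e-6):
    """Fisher discriminant: w = Sw^-1 (mu_pos - mu_neg), threshold at the midpoint."""
    mu_p, mu_n = mean(pos), mean(neg)
    sp, sn = scatter(pos, mu_p), scatter(neg, mu_n)
    d = len(mu_p)
    # Within-class scatter; the ridge term (an assumption) keeps it invertible.
    sw = [[sp[i][j] + sn[i][j] + (ridge if i == j else 0.0) for j in range(d)]
          for i in range(d)]
    w = solve(sw, [a - b for a, b in zip(mu_p, mu_n)])
    bias = -sum(wi * (a + b) / 2 for wi, a, b in zip(w, mu_p, mu_n))
    return w, bias

def fit_oaa(X, y):
    """Train one 'this emotion vs. all others' discriminant per label."""
    models = {}
    for label in sorted(set(y)):
        pos = [x for x, t in zip(X, y) if t == label]
        neg = [x for x, t in zip(X, y) if t != label]
        models[label] = fit_binary_lda(pos, neg)
    return models

def predict_oaa(models, x):
    """Return the label whose discriminant scores x highest."""
    scores = {label: sum(wi * xi for wi, xi in zip(w, x)) + b
              for label, (w, b) in models.items()}
    return max(scores, key=scores.get)

# Synthetic 2-D "features" (hypothetical, e.g. scaled pitch and intensity).
rng = random.Random(0)
centers = {"anger": (0.0, 0.0), "sadness": (6.0, 0.0), "neutral": (3.0, 6.0)}
X, y = [], []
for label, (cx, cy) in centers.items():
    for _ in range(30):
        X.append([rng.gauss(cx, 0.5), rng.gauss(cy, 0.5)])
        y.append(label)

models = fit_oaa(X, y)
print(predict_oaa(models, [5.8, 0.3]))
```

Taking the maximum over raw discriminant scores is only one simple way to combine the binary decisions. The scores of separately trained discriminants are not calibrated against each other, so a real system might normalize them or, as described above, resolve confusable emotions in a second stage.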