Publication of the week: A. Al-Talabani, H. Sellahewa & S.A. Jassim
31 August 2015
A. Al-Talabani, H. Sellahewa & S.A. Jassim, “Emotion recognition from speech: Tools and challenges”, Proc. SPIE 9497, Mobile Multimedia/Image Processing, Security, and Applications 2015, 94970N, April 2015. DOI: 10.1117/12.2191623.
Human emotion recognition from speech is studied widely because of its importance in many applications, e.g. human-computer interaction. There is little agreement about what the basic emotions or emotion-related states are, on the one hand, and about where emotion-related information lies in the speech signal on the other. These open questions motivated the authors’ investigation into extracting meta-features, using either PCA or a non-adaptive random projection (RP), to significantly reduce the dimensionality of the large speech feature vectors that may contain a wide range of emotion-related information. Subsets of meta-features are fused to improve the performance of the recognition model, which adopts a score-based LDC classifier. The authors demonstrate that their scheme outperforms state-of-the-art results when tested on both non-prompted and acted databases (i.e. databases in which subjects act out specific emotions while uttering a sentence). However, the large gap between the accuracy rates achieved on the different types of speech datasets raises questions about how emotions modulate speech. In particular, the article argues that emotion recognition from speech should not be treated as a pure classification problem: it demonstrates the presence of a spectrum of different emotions in the same speech portion, especially in the non-prompted datasets, which tend to be more “natural” than the acted datasets, where subjects attempt to suppress all but one emotion.
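The non-adaptive random projection mentioned above can be sketched in a few lines. The following is a minimal illustration (not the authors’ actual pipeline), with hypothetical dimensions: a fixed Gaussian matrix, drawn independently of the data, maps a high-dimensional speech feature vector down to a much smaller meta-feature vector while approximately preserving pairwise distances (the Johnson–Lindenstrauss property).

```python
import numpy as np

rng = np.random.default_rng(0)

n_features = 1582   # hypothetical size of the raw speech feature vector
n_meta = 100        # hypothetical target dimensionality of the meta-features

# Non-adaptive RP: the projection matrix is fixed in advance and does not
# depend on the data, unlike PCA; scaling by 1/sqrt(n_meta) keeps distances
# approximately preserved on average.
R = rng.standard_normal((n_meta, n_features)) / np.sqrt(n_meta)

x = rng.standard_normal(n_features)   # stand-in for one utterance's feature vector
meta = R @ x                          # reduced "meta-feature" vector

print(meta.shape)  # (100,)
```

Because the projection is data-independent, it is cheap to compute and needs no training set, which is one practical appeal of RP over PCA for large feature vectors.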
The article is available via the Buckingham E-Archive of Research (BEAR).