Sentiment Analysis and Emotion Recognition from Speech Using Universal Speech Representations

Bagus Tris Atmaja*, Akira Sasou

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

20 Citations (Scopus)

Abstract

Understanding sentiment and emotion in speech is a challenging task in human multimodal language research. However, in certain cases, such as telephone calls, only audio data can be obtained. In this study, we independently evaluated sentiment analysis and emotion recognition from speech using recent self-supervised learning models—specifically, universal speech representations with speaker-aware pre-training models. Three different sizes of universal models were evaluated on three sentiment tasks and one emotion task. The evaluation revealed that the best results were obtained with two-class sentiment analysis, based on both weighted and unweighted accuracy scores (81% and 73%). This binary classification with unimodal acoustic analysis also performed competitively compared to previous methods that used multimodal fusion. The models failed to make accurate predictions in the emotion recognition task and in the sentiment analysis tasks with higher numbers of classes. The unbalanced property of the datasets may also have contributed to the performance degradations observed in the six-class emotion, three-class sentiment, and seven-class sentiment tasks.
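The abstract does not specify the downstream classifier attached to the universal speech representations. A common recipe for such tasks is to mean-pool the frame-level embeddings produced by the pre-trained model and feed the pooled vector to a linear softmax head; the sketch below illustrates that recipe with toy NumPy data. The feature dimension (768), frame count, and the function name `pool_and_classify` are illustrative assumptions, not details from the paper.

```python
import numpy as np

def pool_and_classify(frame_embeddings, W, b):
    """Mean-pool frame-level speech representations over time and
    apply a linear softmax classifier (hypothetical downstream head)."""
    pooled = frame_embeddings.mean(axis=0)   # (D,) utterance-level vector
    logits = pooled @ W + b                  # (num_classes,)
    exp = np.exp(logits - logits.max())      # numerically stable softmax
    return exp / exp.sum()                   # class probabilities

# Toy example: 50 frames of 768-dim features (a typical "base"-size
# model output) classified into two sentiment classes (neg/pos).
rng = np.random.default_rng(0)
feats = rng.normal(size=(50, 768))
W = rng.normal(scale=0.01, size=(768, 2))
b = np.zeros(2)
probs = pool_and_classify(feats, W, b)
print(probs.shape)        # (2,)
```

In practice the pre-trained encoder would supply `feats` for each utterance, and `W`, `b` would be trained on labeled sentiment data while the encoder is kept frozen or fine-tuned.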

Original language: English
Article number: 6369
Journal: Sensors
Volume: 22
Issue number: 17
DOIs
Publication status: Published - Sept 2022
Externally published: Yes

Keywords

  • affective computing
  • sentiment analysis
  • sentiment analysis and emotion recognition
  • speech emotion recognition
  • universal speech representation

