Multilingual, Cross-lingual, and Monolingual Speech Emotion Recognition on EmoFilm Dataset

Bagus Tris Atmaja*, Akira Sasou*

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Peer-review

Abstract

Research on speech emotion recognition has been actively conducted, mostly in monolingual settings. Considering that emotion expressed in speech is universal, it is worthwhile to study multilingual emotion recognition across different cultures. This paper contributes an evaluation of multilingual, cross-lingual, and monolingual automatic speech emotion recognition (SER) on the same dataset, EmoFilm. We first evaluated these three scenarios on a fixed training/test split. For multilingual emotion recognition, we then extended the evaluation with cross-validation. The results show that multilingual SER achieved the highest performance, with a balanced accuracy of 74.86% on five categorical emotions, followed by monolingual (72.17%) and cross-lingual (58.03%) SER. We also reduced the number of training samples to observe its impact and found that, with the same number of training samples, the monolingual setting outperforms the other two. These results suggest the potential of multilingual SER over cross-lingual and monolingual SER in future speech technologies.
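To make the three evaluation scenarios and the reported metric concrete, the following is a minimal sketch, not the authors' code. It assumes each sample carries a language tag (`lang`) and a predefined `split` flag; the helper `make_split`, the field names, and the language codes are hypothetical and chosen for illustration only, and the paper's exact protocol (for example, which languages are held out in the cross-lingual case) may differ. Balanced accuracy is computed in its standard form, the unweighted mean of per-class recall, which keeps the majority emotion from dominating the score on imbalanced data.

```python
from collections import defaultdict


def balanced_accuracy(y_true, y_pred):
    """Unweighted mean of per-class recall over the classes present in y_true."""
    total = defaultdict(int)    # number of reference samples per class
    correct = defaultdict(int)  # number of correctly predicted samples per class
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        if t == p:
            correct[t] += 1
    return sum(correct[c] / total[c] for c in total) / len(total)


def make_split(samples, scenario, target_lang):
    """Hypothetical helper: build (train, test) lists for one scenario.

    Assumes each sample is a dict with a 'lang' tag and a 'split' flag
    ('train' or 'test'). The test set is always the target language here.
    """
    test = [s for s in samples if s["lang"] == target_lang and s["split"] == "test"]
    if scenario == "monolingual":
        # Train and test on the same language.
        train = [s for s in samples if s["lang"] == target_lang and s["split"] == "train"]
    elif scenario == "cross-lingual":
        # Train only on languages other than the test language.
        train = [s for s in samples if s["lang"] != target_lang and s["split"] == "train"]
    elif scenario == "multilingual":
        # Train on all languages pooled together.
        train = [s for s in samples if s["split"] == "train"]
    else:
        raise ValueError(f"unknown scenario: {scenario}")
    return train, test
```

As a usage note: with references `["a", "a", "a", "b"]` and a classifier that always predicts `"a"`, plain accuracy is 0.75 but balanced accuracy is 0.5, because the recall for class `"b"` is zero.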

Original language: English
Title of host publication: 2023 Asia Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2023
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 1019-1025
Number of pages: 7
ISBN (Electronic): 9798350300673
DOIs:
Publication status: Published - 2023
Externally published: Yes
Event: 2023 Asia Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2023 - Taipei, Taiwan, Province of China
Duration: 31 Oct 2023 to 3 Nov 2023

Publication series

Name: 2023 Asia Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2023

Conference

Conference: 2023 Asia Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2023
Country/Territory: Taiwan, Province of China
City: Taipei
Period: 31/10/23 to 3/11/23

Keywords

  • EmoFilm
  • multilingual emotion recognition
  • pre-trained model
  • speech emotion recognition
