Ensembling Multilingual Pre-Trained Models for Predicting Multi-Label Regression Emotion Share from Speech

Bagus Tris Atmaja*, Akira Sasou*

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceedingConference contributionpeer-review

Abstract

Speech emotion recognition has evolved from research to practical applications. Previous studies of emotion recognition from speech have focused on developing models on certain datasets like IEMOCAP. The lack of data in the domain of emotion modeling emerges as a challenge to evaluate models in the other dataset, as well as to evaluate speech emotion recognition models that work in a multilingual setting. This paper proposes an ensemble learning to fuse results of pre-trained models for emotion share recognition from speech. The models were chosen to accommodate multilingual data from English and Spanish. The results show that ensemble learning can improve the performance of the baseline model with a single model and the previous best model from the late fusion. The performance is measured using the Spearman rank correlation coefficient since the task is a regression problem with ranking values. A Spearman rank correlation coefficient of 0.537 is reported for the test set, while for the development set, the score is 0.524. These scores are higher than the previous study of a fusion method from monolingual data, which achieved scores of 0.476 for the test and 0.470 for the development.

Original languageEnglish
Title of host publication2023 Asia Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2023
PublisherInstitute of Electrical and Electronics Engineers Inc.
Pages1026-1029
Number of pages4
ISBN (Electronic)9798350300673
DOIs
Publication statusPublished - 2023
Externally publishedYes
Event2023 Asia Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2023 - Taipei, Taiwan, Province of China
Duration: 31 Oct 20233 Nov 2023

Publication series

Name2023 Asia Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2023

Conference

Conference2023 Asia Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2023
Country/TerritoryTaiwan, Province of China
CityTaipei
Period31/10/233/11/23

Keywords

  • emotion share
  • ensemble learning
  • multilingual speech processing
  • pre-trained model
  • speech emotion recognition

Fingerprint

Dive into the research topics of 'Ensembling Multilingual Pre-Trained Models for Predicting Multi-Label Regression Emotion Share from Speech'. Together they form a unique fingerprint.

Cite this