Jointly Predicting Emotion, Age, and Country Using Pre-Trained Acoustic Embedding

Bagus Tris Atmaja*, Zanjabila, Akira Sasou

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > peer-review

5 Citations (Scopus)

Abstract

In this paper, we demonstrated the benefit of using a pre-trained model to extract acoustic embeddings for jointly predicting (via multitask learning) three attributes: emotion, age, and native country. The pre-trained model was based on the wav2vec 2.0 large robust model and trained on a speech emotion corpus. The emotion and age tasks were regression problems, while country prediction was a classification task. A single harmonic mean of the three task metrics was used to evaluate multitask learning performance. The classifier was a linear network with two independent layers and shared layers connected to the output layers. This study explores multitask learning across different acoustic features (including the acoustic embedding extracted from a model trained on an affective speech dataset), seed numbers, batch sizes, and waveform normalizations for predicting paralinguistic information from speech.
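
The single evaluation score described in the abstract can be illustrated with a minimal sketch. It assumes each task yields one non-negative score where higher is better; the abstract does not name the three underlying metrics, so the function name, variable names, and the input values below are hypothetical and are not results from the paper.

```python
def harmonic_mean(scores):
    """Harmonic mean of non-negative scores (assumed: higher is better).

    Returns 0.0 if any score is 0, since a harmonic mean is dominated
    by its smallest term.
    """
    scores = list(scores)
    if not scores:
        raise ValueError("need at least one score")
    if any(s < 0 for s in scores):
        raise ValueError("scores must be non-negative")
    if any(s == 0 for s in scores):
        return 0.0
    return len(scores) / sum(1.0 / s for s in scores)


# Hypothetical per-task scores, for illustration only.
emotion_score, age_score, country_score = 0.5, 0.5, 0.25
combined = harmonic_mean([emotion_score, age_score, country_score])
print(combined)  # 0.375
```

Because the harmonic mean is pulled toward the lowest of its inputs, a model cannot reach a high combined score by doing well on only one or two of the three tasks.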

Original language: English
Title of host publication: 2022 10th International Conference on Affective Computing and Intelligent Interaction Workshops and Demos, ACIIW 2022
Publisher: Institute of Electrical and Electronics Engineers Inc.
ISBN (Electronic): 9781665454902
Publication status: Published - 2022
Externally published: Yes
Event: 10th International Conference on Affective Computing and Intelligent Interaction Workshops and Demos, ACIIW 2022 - Nara, Japan
Duration: 17 Oct 2022 - 21 Oct 2022

Publication series

Name: 2022 10th International Conference on Affective Computing and Intelligent Interaction Workshops and Demos, ACIIW 2022

Conference

Conference: 10th International Conference on Affective Computing and Intelligent Interaction Workshops and Demos, ACIIW 2022
Country/Territory: Japan
City: Nara
Period: 17/10/22 - 21/10/22

Keywords

  • acoustic embedding
  • affective computing
  • age prediction
  • country prediction
  • multitask learning
  • speech emotion recognition
