Leveraging Pre-Trained Acoustic Feature Extractor For Affective Vocal Bursts Tasks

Bagus Tris Atmaja*, Akira Sasou*

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Understanding human emotions is a challenge for computers. Research on speech emotion recognition has advanced steadily in recent years. Rather than in speech, affective information may lie in short vocal bursts (e.g., crying when sad). In this study, we evaluated a recent self-supervised learning model as an extractor of acoustic embeddings for affective vocal burst tasks. Four tasks were investigated, covering both regression and classification problems. Using similar architectures, we found that the pre-trained model was more effective than the baseline methods. The study is further extended to evaluate the effect of different numbers of seeds, patience values, and batch sizes on performance across the four tasks.

Original language: English
Title of host publication: Proceedings of 2022 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2022
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 1412-1417
Number of pages: 6
ISBN (Electronic): 9786165904773
DOIs
Publication status: Published - 2022
Event: 2022 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2022 - Chiang Mai, Thailand
Duration: 7 Nov 2022 to 10 Nov 2022

Publication series

Name: Proceedings of 2022 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2022

Conference

Conference: 2022 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2022
Country/Territory: Thailand
City: Chiang Mai
Period: 7/11/22 to 10/11/22

Keywords

  • affective computing
  • affective vocal bursts
  • pre-trained model
  • speech emotion recognition
  • wav2vec 2.0
