TY - GEN
T1 - On The Optimal Classifier For Affective Vocal Bursts And Stuttering Predictions Based On Pre-Trained Acoustic Embedding
AU - Atmaja, Bagus Tris
AU - Zanjabila,
AU - Sasou, Akira
N1 - Publisher Copyright:
© 2022 Asia-Pacific Signal and Information Processing Association (APSIPA).
PY - 2022
Y1 - 2022
N2 - Speech emotion recognition has recently gained increasing interest from researchers due to its potential applications in the market. Compared to speech, vocal bursts are understudied and may contain richer affective information for recognizing emotion (e.g., a laugh for happiness and a cry for sadness). On the other hand, the acoustic features used for affective vocalization may also be helpful for the stuttering evaluation task. Instead of handcrafted acoustic features, pre-trained model feature extractors are now attracting more attention due to their competitiveness in modeling universal speech embeddings. However, previous speech embedding evaluations are not well suited for emotion recognition. In this study, the researchers evaluated acoustic embeddings extracted from a model fine-tuned on an affective speech dataset for affective vocalization and stuttering predictions using different classifiers. The methods were evaluated with the baseline classifier from the previous study and five new classifiers, including an ensemble classifier. The results show improvements over the baseline methods; the ensemble classifier consistently achieved the best performance on new validation sets with balanced and unnormalized data for both affective vocal burst and stuttering predictions.
AB - Speech emotion recognition has recently gained increasing interest from researchers due to its potential applications in the market. Compared to speech, vocal bursts are understudied and may contain richer affective information for recognizing emotion (e.g., a laugh for happiness and a cry for sadness). On the other hand, the acoustic features used for affective vocalization may also be helpful for the stuttering evaluation task. Instead of handcrafted acoustic features, pre-trained model feature extractors are now attracting more attention due to their competitiveness in modeling universal speech embeddings. However, previous speech embedding evaluations are not well suited for emotion recognition. In this study, the researchers evaluated acoustic embeddings extracted from a model fine-tuned on an affective speech dataset for affective vocalization and stuttering predictions using different classifiers. The methods were evaluated with the baseline classifier from the previous study and five new classifiers, including an ensemble classifier. The results show improvements over the baseline methods; the ensemble classifier consistently achieved the best performance on new validation sets with balanced and unnormalized data for both affective vocal burst and stuttering predictions.
UR - http://www.scopus.com/inward/record.url?scp=85146294770&partnerID=8YFLogxK
U2 - 10.23919/APSIPAASC55919.2022.9980310
DO - 10.23919/APSIPAASC55919.2022.9980310
M3 - Conference contribution
AN - SCOPUS:85146294770
T3 - Proceedings of 2022 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2022
SP - 1690
EP - 1695
BT - Proceedings of 2022 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2022
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2022 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2022
Y2 - 7 November 2022 through 10 November 2022
ER -