TY - GEN
T1 - On the differences between song and speech emotion recognition
T2 - 2020 IEEE Region 10 Conference, TENCON 2020
AU - Atmaja, Bagus Tris
AU - Akagi, Masato
N1 - Publisher Copyright:
© 2020 IEEE.
PY - 2020/11/16
Y1 - 2020/11/16
N2 - In this paper, we argue that singing voice (song) is more emotional than speech. We evaluate different feature sets, feature types, and classifiers on both song and speech emotion recognition. Three feature sets (GeMAPS, pyAudioAnalysis, and LibROSA), two feature types (low-level descriptors and high-level statistical functions), and four classifiers (multilayer perceptron, LSTM, GRU, and convolutional neural networks) are examined on both song and speech data with the same parameter values. The results show no remarkable difference between song and speech data when using the same method. Comparisons of the two results reveal that song is more emotional than speech. In addition, high-level statistical functions of acoustic features achieved higher performance than low-level descriptors in this classification task. This result strengthens the previous finding on the regression task, which reported the advantage of using high-level features.
AB - In this paper, we argue that singing voice (song) is more emotional than speech. We evaluate different feature sets, feature types, and classifiers on both song and speech emotion recognition. Three feature sets (GeMAPS, pyAudioAnalysis, and LibROSA), two feature types (low-level descriptors and high-level statistical functions), and four classifiers (multilayer perceptron, LSTM, GRU, and convolutional neural networks) are examined on both song and speech data with the same parameter values. The results show no remarkable difference between song and speech data when using the same method. Comparisons of the two results reveal that song is more emotional than speech. In addition, high-level statistical functions of acoustic features achieved higher performance than low-level descriptors in this classification task. This result strengthens the previous finding on the regression task, which reported the advantage of using high-level features.
KW - Acoustic features
KW - Affective computing
KW - Emotion classifiers
KW - Song emotion recognition
KW - Speech emotion recognition
UR - http://www.scopus.com/inward/record.url?scp=85098995632&partnerID=8YFLogxK
U2 - 10.1109/TENCON50793.2020.9293852
DO - 10.1109/TENCON50793.2020.9293852
M3 - Conference contribution
AN - SCOPUS:85098995632
T3 - IEEE Region 10 Annual International Conference, Proceedings/TENCON
SP - 968
EP - 972
BT - 2020 IEEE Region 10 Conference, TENCON 2020
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 16 November 2020 through 19 November 2020
ER -