On the differences between song and speech emotion recognition: Effect of feature sets, feature types, and classifiers

Bagus Tris Atmaja, Masato Akagi

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

11 Citations (Scopus)

Abstract

In this paper, we argue that singing voice (song) is more emotional than speech. We evaluate different feature sets, feature types, and classifiers on both song and speech emotion recognition. Three feature sets (GeMAPS, pyAudioAnalysis, and LibROSA), two feature types (low-level descriptors and high-level statistical functions), and four classifiers (multilayer perceptron, LSTM, GRU, and convolutional neural networks) are examined on both song and speech data with the same parameter values. The results show no remarkable difference between song and speech data when using the same method. Comparisons of the two results reveal that song is more emotional than speech. In addition, high-level statistical functions of acoustic features achieved higher performance than low-level descriptors in this classification task. This result strengthens the previous finding on the regression task, which reported the advantage of using high-level features.
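The distinction between the two feature types can be illustrated with a minimal sketch. It assumes that low-level descriptors are a frames-by-features matrix and that the high-level statistical functions are the per-feature mean and standard deviation; the abstract does not state which functionals were used, so that choice, the function name `high_level_functionals`, and the toy values are all hypothetical.

```python
from statistics import fmean, pstdev


def high_level_functionals(lld_frames):
    """Collapse frame-level LLDs (frames x features) into one fixed-length vector.

    Returns the per-feature means followed by the per-feature (population)
    standard deviations. Mean and std are assumed functionals, not
    necessarily the ones used in the paper.
    """
    if not lld_frames:
        raise ValueError("need at least one frame")
    columns = list(zip(*lld_frames))  # transpose to features x frames
    means = [fmean(col) for col in columns]
    stds = [pstdev(col) for col in columns]
    return means + stds


# Hypothetical toy input: 4 frames, 2 LLDs per frame (values are made up).
frames = [
    [1.0, 10.0],
    [2.0, 10.0],
    [3.0, 10.0],
    [4.0, 10.0],
]
hsf = high_level_functionals(frames)  # length 4: two means, then two stds
```

Under these assumptions, an utterance of any duration maps to a vector of the same length, which suits a classifier such as a multilayer perceptron, whereas the frame-level matrix keeps the time axis that recurrent or convolutional models consume.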

Original language: English
Title of host publication: 2020 IEEE Region 10 Conference, TENCON 2020
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 968-972
Number of pages: 5
ISBN (Electronic): 9781728184555
DOIs
Publication status: Published - 16 Nov 2020
Event: 2020 IEEE Region 10 Conference, TENCON 2020 - Virtual, Osaka, Japan
Duration: 16 Nov 2020 → 19 Nov 2020

Publication series

Name: IEEE Region 10 Annual International Conference, Proceedings/TENCON
Volume: 2020-November
ISSN (Print): 2159-3442
ISSN (Electronic): 2159-3450

Conference

Conference: 2020 IEEE Region 10 Conference, TENCON 2020
Country/Territory: Japan
City: Virtual, Osaka
Period: 16/11/20 → 19/11/20

Keywords

  • Acoustic features
  • Affective computing
  • Emotion classifiers
  • Song emotion recognition
  • Speech emotion recognition
