Transformer-CNN Automatic Hyperparameter Tuning for Speech Emotion Recognition

Agustinus Bimo Gumelar, Eko Mulyanto Yuniarno, Derry Pramono Adi, Rudi Setiawan, Indar Sugiarto, Mauridhi Hery Purnomo

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-review

4 Citations (Scopus)

Abstract

Given the large number of hyperparameters in deep learning models, some research cases call for tuning these models automatically. Hyperparameters matter because they substantially influence a model's behavior, so optimizing a given model with a hyperparameter optimization technique can improve its performance significantly. This paper addresses hyperparameter optimization for a Speech Emotion Recognition (SER) research case using a Transformer-CNN deep learning model. We use the RAVDESS dataset, which contains 1,536 speech samples (192 samples for each of eight emotion classes), and transform each speech sample into spectrogram data. We apply Gaussian noise augmentation to reduce overfitting on the training data. After augmentation, the RAVDESS dataset yields a total of 2,400 emotional speech samples (300 samples for each of the eight emotion classes). For the SER model, we combine a Transformer and a CNN to process temporal and spatial speech features. Because different hyperparameter settings yield different accuracies, the Transformer-CNN must be tested thoroughly. We experiment with Naive Bayes to optimize many categorical and numerical hyperparameters of the Transformer-CNN, such as the learning rate, dropout rates, activation function, weight initialization, number of epochs, and even the best train/test split ratio. Our automatically tuned Transformer-CNN achieves an accuracy of 97.3%.
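
The abstract states that each clip becomes a spectrogram and that Gaussian noise augmentation grows each class from 192 to 300 samples, i.e. 108 added per class. A minimal NumPy sketch of those two steps follows, assuming that noise is added to the raw waveform, that the extra samples are noisy copies of randomly chosen originals, and that a plain Hann-windowed STFT stands in for the paper's spectrogram settings. The functions `add_gaussian_noise`, `augment_class`, and `magnitude_spectrogram`, as well as the noise level, frame length, and hop size, are hypothetical choices, not the authors' code.

```python
import numpy as np


def add_gaussian_noise(signal, noise_std=0.005, rng=None):
    """Return a copy of `signal` with zero-mean Gaussian noise added.

    noise_std=0.005 is an illustrative assumption, not a value from the paper.
    """
    rng = np.random.default_rng() if rng is None else rng
    return signal + rng.normal(0.0, noise_std, size=signal.shape)


def augment_class(samples, target_count, noise_std=0.005, seed=None):
    """Top up one emotion class to `target_count` with noisy copies of originals."""
    rng = np.random.default_rng(seed)
    n_extra = target_count - len(samples)
    if n_extra <= 0:
        return list(samples)
    # Pick source clips without replacement when there are enough originals.
    picks = rng.choice(len(samples), size=n_extra, replace=n_extra > len(samples))
    noisy = [add_gaussian_noise(samples[i], noise_std, rng) for i in picks]
    return list(samples) + noisy


def magnitude_spectrogram(signal, frame_len=512, hop=128):
    """Hann-windowed STFT magnitude with shape (frame_len // 2 + 1, n_frames).

    frame_len and hop are assumed values, not the paper's settings.
    """
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack(
        [signal[i * hop : i * hop + frame_len] * window for i in range(n_frames)]
    )
    return np.abs(np.fft.rfft(frames, axis=1)).T


# Per class: 192 originals + 108 noisy copies = 300; 8 classes x 300 = 2,400.
rng = np.random.default_rng(0)
clips = [rng.standard_normal(16000) * 0.1 for _ in range(192)]  # stand-in audio
balanced = augment_class(clips, target_count=300, seed=1)
spec = magnitude_spectrogram(balanced[-1])
print(len(balanced), spec.shape)
```

Keeping the originals untouched and only appending noisy copies means the clean recordings are always present in the augmented set; whether the paper augments before or after splitting into training and testing data is not stated in the abstract.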

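The abstract mixes categorical hyperparameters (activation function, weight initialization) with numerical ones (learning rate, dropout, epochs, train/test split). The sketch below shows one way such a mixed search space might be encoded and searched. It is not the authors' method: the abstract names Naive Bayes as the optimizer, whereas this sketch swaps in plain random search, and the ranges, choice lists, and `toy_objective` are hypothetical placeholders for training and evaluating the Transformer-CNN.

```python
import math
import random

# Hypothetical search space. The hyperparameter names follow the abstract;
# every range and choice list is an illustrative assumption.
SEARCH_SPACE = {
    "learning_rate": ("log_float", 1e-5, 1e-2),
    "dropout": ("float", 0.0, 0.5),
    "activation": ("categorical", ["relu", "gelu", "tanh"]),
    "weight_init": ("categorical", ["glorot_uniform", "he_normal"]),
    "epochs": ("int", 10, 100),
    "train_split": ("float", 0.6, 0.9),  # fraction of data used for training
}


def sample_config(space, rng):
    """Draw one configuration, handling categorical and numerical entries."""
    config = {}
    for name, spec in space.items():
        kind = spec[0]
        if kind == "categorical":
            config[name] = rng.choice(spec[1])
        elif kind == "int":
            config[name] = rng.randint(spec[1], spec[2])  # inclusive bounds
        elif kind == "float":
            config[name] = rng.uniform(spec[1], spec[2])
        elif kind == "log_float":
            # Sample uniformly in log space so small learning rates are not starved.
            low, high = math.log10(spec[1]), math.log10(spec[2])
            config[name] = 10 ** rng.uniform(low, high)
        else:
            raise ValueError(f"unknown parameter kind: {kind}")
    return config


def random_search(objective, space, n_trials=50, seed=0):
    """Random search (a stand-in, not Naive Bayes): keep the best-scoring config."""
    rng = random.Random(seed)
    best_config, best_score = None, float("-inf")
    for _ in range(n_trials):
        config = sample_config(space, rng)
        score = objective(config)
        if score > best_score:
            best_config, best_score = config, score
    return best_config, best_score


def toy_objective(config):
    """Hypothetical placeholder for: train the Transformer-CNN with `config`
    on a `train_split` partition and return validation accuracy."""
    score = -abs(math.log10(config["learning_rate"]) + 3.5)
    score -= abs(config["dropout"] - 0.2)
    score += 0.1 if config["activation"] == "gelu" else 0.0
    return score


best, best_score = random_search(toy_objective, SEARCH_SPACE, n_trials=200, seed=0)
print(best, best_score)
```

Replacing random search with a model-based optimizer would change only `random_search`; the search-space encoding and the objective interface could stay the same.
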
Original language: English
Title of host publication: IST 2022 - IEEE International Conference on Imaging Systems and Techniques, Proceedings
Publisher: Institute of Electrical and Electronics Engineers Inc.
ISBN (Electronic): 9781665481021
Publication status: Published, 2022
Event: 2022 IEEE International Conference on Imaging Systems and Techniques, IST 2022, Virtual, Online, Taiwan, Province of China
Duration: 21 Jun 2022 to 23 Jun 2022

Publication series

Name: IST 2022 - IEEE International Conference on Imaging Systems and Techniques, Proceedings

Conference

Conference: 2022 IEEE International Conference on Imaging Systems and Techniques, IST 2022
Country/Territory: Taiwan, Province of China
City: Virtual, Online
Period: 21/06/22 to 23/06/22

Keywords

  • Automatic Hyperparameter Tuning
  • Naive Bayes Optimization
  • Speech Emotion Recognition
  • Transformer-CNN
