TY - GEN
T1 - Exploring the Impact of Spatio-Temporal Patterns in Audio Spectrograms on Emotion Recognition
AU - Hidayati, Shintami Chusnul
AU - Satria Adidarma, Adam
AU - Sungkono, Kelly Rossa
N1 - Publisher Copyright: © 2023 IEEE.
PY - 2023
Y1 - 2023
AB - Speech emotion recognition plays a vital role in enhancing human-computer interaction and improving user experience in various applications. This paper investigates the utilization of spatio-temporal patterns in speech emotion recognition, contrasting them with conventional methods that rely solely on spatial or temporal information. The approach involves a parallel architecture, coupling Convolutional Neural Networks (CNNs) with Transformers as an encoder block network. This design combines the spatial feature extraction capabilities of CNNs with the temporal modeling strengths of Transformers, enabling the capture of intricate patterns and contextual relationships within speech data. We present a comprehensive experimental analysis conducted on three benchmark datasets, shedding light on the impact of the utilization of spatio-temporal patterns in advancing the field of speech emotion recognition.
KW - audio signal processing
KW - automation
KW - spatio-temporal pattern
KW - speech emotion recognition
KW - technology
UR - http://www.scopus.com/inward/record.url?scp=85186493969&partnerID=8YFLogxK
U2 - 10.1109/ICAMIMIA60881.2023.10427930
DO - 10.1109/ICAMIMIA60881.2023.10427930
M3 - Conference contribution
AN - SCOPUS:85186493969
T3 - 2023 International Conference on Advanced Mechatronics, Intelligent Manufacture and Industrial Automation, ICAMIMIA 2023 - Proceedings
SP - 200
EP - 205
BT - 2023 International Conference on Advanced Mechatronics, Intelligent Manufacture and Industrial Automation, ICAMIMIA 2023 - Proceedings
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2023 International Conference on Advanced Mechatronics, Intelligent Manufacture and Industrial Automation, ICAMIMIA 2023
Y2 - 14 November 2023 through 15 November 2023
ER -