Emotion Recognition from Video Frame Sequence using Face Mesh and Pre-Trained Models of Convolutional Neural Network

Derry Pramono Adi, Eko Mulyanto Yuniarno, Diah Puspito Wulandari

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Emotions are a collection of subjective cognitive experiences and psychological and physiological characteristics that express a wide range of feelings, thoughts, and behaviors in human interaction. They can be conveyed through several channels, such as facial expressions, tone of voice, and behavior. Deep Learning (DL) research has focused on facial expressions, and images of facial expressions are commonly used as input to DL models. Unfortunately, most DL models for Facial Emotion Recognition (FER) use static images, an approach that does not capture the full range of facial expressions. A single static image is insufficient for recognizing emotions; a sequence of images from a video is required. In this study, we extract MediaPipe's face mesh features, state-of-the-art multidimensional expression key points, from the video image sequence. We then feed the image sequence data into pre-trained Convolutional Neural Network (CNN) models. The data come from the Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS), using the emotion classes 'Anger,' 'Fearful,' 'Happy,' and 'Sad.' For this specific FER task, the highest accuracy among the pre-trained CNN models was 92.8% (VGG-19), and the fastest runtime was ∼2.3 seconds (SqueezeNet).
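
The front end of the pipeline described in the abstract (select RAVDESS clips for the four classes, take a sequence of frames from each video, extract face mesh landmarks per frame) could be sketched roughly as follows. This is an illustrative sketch, not the authors' implementation: the abstract does not say how frames are sampled or how the landmarks are presented to the CNN, so the even-spacing strategy, the `n_samples=16` default, and all function names are assumptions. The emotion codes follow the published RAVDESS file-naming convention, and the landmark step assumes the legacy `mp.solutions.face_mesh` API is available in the installed MediaPipe version, alongside OpenCV.

```python
from pathlib import Path

# RAVDESS stores the emotion as the third dash-separated field of the file name.
# Only the four classes named in the abstract are kept here.
RAVDESS_EMOTIONS = {"03": "Happy", "04": "Sad", "05": "Anger", "06": "Fearful"}


def emotion_from_filename(path):
    """Return the emotion label of a RAVDESS file, or None for other classes."""
    fields = Path(path).stem.split("-")
    if len(fields) != 7:
        return None
    return RAVDESS_EMOTIONS.get(fields[2])


def sample_frame_indices(n_frames, n_samples):
    """Pick up to n_samples evenly spaced frame indices (assumed strategy)."""
    if n_frames <= 0 or n_samples <= 0:
        return []
    if n_samples >= n_frames:
        return list(range(n_frames))
    return [i * n_frames // n_samples for i in range(n_samples)]


def extract_mesh_sequence(video_path, n_samples=16):
    """Return one list of (x, y, z) face mesh landmarks per sampled frame."""
    # Imported lazily so the helpers above work without these packages.
    import cv2
    import mediapipe as mp

    cap = cv2.VideoCapture(str(video_path))
    n_frames = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
    wanted = set(sample_frame_indices(n_frames, n_samples))
    sequence = []
    with mp.solutions.face_mesh.FaceMesh(
        static_image_mode=False, max_num_faces=1
    ) as mesh:
        index = 0
        while True:
            ok, frame = cap.read()
            if not ok:
                break
            if index in wanted:
                # MediaPipe expects RGB input; OpenCV decodes frames as BGR.
                result = mesh.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
                if result.multi_face_landmarks:
                    face = result.multi_face_landmarks[0]
                    sequence.append([(p.x, p.y, p.z) for p in face.landmark])
            index += 1
    cap.release()
    return sequence
```

Frames in which no face is detected are skipped, so the returned sequence can be shorter than `n_samples`. Turning the landmark sequence into input for a pre-trained CNN such as VGG-19 or SqueezeNet is left out, since the abstract does not describe that step.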

Original language: English
Title of host publication: 2023 International Seminar on Intelligent Technology and Its Applications
Subtitle of host publication: Leveraging Intelligent Systems to Achieve Sustainable Development Goals, ISITIA 2023 - Proceeding
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 353-358
Number of pages: 6
ISBN (Electronic): 9798350313956
Publication status: Published - 2023
Event: 24th International Seminar on Intelligent Technology and Its Applications, ISITIA 2023 - Hybrid, Surabaya, Indonesia
Duration: 26 Jul 2023 to 27 Jul 2023

Publication series

Name: 2023 International Seminar on Intelligent Technology and Its Applications: Leveraging Intelligent Systems to Achieve Sustainable Development Goals, ISITIA 2023 - Proceeding

Conference

Conference: 24th International Seminar on Intelligent Technology and Its Applications, ISITIA 2023
Country/Territory: Indonesia
City: Hybrid, Surabaya
Period: 26/07/23 to 27/07/23

Keywords

  • Convolutional Neural Network
  • Face Mesh
  • Facial Emotion Recognition
  • Video Frame Sequence
