Lip Reading Using Spatio Temporal Convolutions (STCNN) And Long Short Term Memory (LSTM)

Nur Azizah*, Eko Mulyanto Yuniarno, Mauridhi Hery Purnomo

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > peer-review

Abstract

This study aims to develop a computer-based speech recognition system that deciphers words by analyzing lip movements, using deep learning with Spatio Temporal Convolutions (STCNN) and Long Short-Term Memory (LSTM). Traditional speech recognition methods, such as those that rely on microphones, are sometimes ineffective in noisy environments or when speech is inaudible. Lip movement imagery therefore offers an alternative feature with the potential to improve speech recognition in such situations. The proposed solution combines the two methods: STCNN extracts spatial and temporal features from lip movement images, while LSTM models the temporal relationships between these features. Together they allow the system to learn spatial and temporal patterns in lip movements and so recognize and interpret speech more accurately. The results are promising: in the trials conducted, the system recognized words with a high degree of accuracy. Applying STCNN and LSTM lets the system capture the spatial and temporal information in lip movements that is key to distinguishing between different phonemes. Accurate lip movement-based speech recognition has potential uses in various applications, such as speech recognition in noisy environments, automatic transcription, and human-machine interaction through lip movements.
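The pipeline described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the clip size, kernel size, hidden size, random weights, and the helper names `conv3d_valid`, `lstm_step`, and `encode_clip` are all assumptions chosen for illustration, and the example uses a single channel, a single 3D kernel, and no classifier. It only shows how a kernel that spans time, height, and width produces one feature map per time step, and how an LSTM then consumes those feature maps in order.

```python
import math
import random

def conv3d_valid(clip, kernel):
    """Single-channel 3D cross-correlation over (time, height, width), no padding, then ReLU."""
    T, H, W = len(clip), len(clip[0]), len(clip[0][0])
    kt, kh, kw = len(kernel), len(kernel[0]), len(kernel[0][0])
    out = []
    for t in range(T - kt + 1):
        frame = []
        for y in range(H - kh + 1):
            row = []
            for x in range(W - kw + 1):
                s = 0.0
                # The kernel covers kt consecutive frames, so each output
                # value mixes spatial and temporal context.
                for dt in range(kt):
                    for dy in range(kh):
                        for dx in range(kw):
                            s += clip[t + dt][y + dy][x + dx] * kernel[dt][dy][dx]
                row.append(max(s, 0.0))
            frame.append(row)
        out.append(frame)
    return out

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lstm_step(x, h, c, weights, bias):
    """One LSTM step; weights maps concat(x, h) to the i, f, g, o pre-activations."""
    n = len(h)
    v = x + h  # list concatenation of input and previous hidden state
    z = [sum(w * a for w, a in zip(weights[k], v)) + bias[k] for k in range(4 * n)]
    i = [sigmoid(a) for a in z[:n]]             # input gate
    f = [sigmoid(a) for a in z[n:2 * n]]        # forget gate
    g = [math.tanh(a) for a in z[2 * n:3 * n]]  # candidate cell values
    o = [sigmoid(a) for a in z[3 * n:]]         # output gate
    c_new = [f[k] * c[k] + i[k] * g[k] for k in range(n)]
    h_new = [o[k] * math.tanh(c_new[k]) for k in range(n)]
    return h_new, c_new

def encode_clip(clip, kernel, weights, bias, hidden):
    """Run the 3D convolution, then feed each time step's flattened features to the LSTM."""
    h, c = [0.0] * hidden, [0.0] * hidden
    for frame in conv3d_valid(clip, kernel):
        x = [v for row in frame for v in row]
        h, c = lstm_step(x, h, c, weights, bias)
    return h

# Hypothetical toy sizes: a 6-frame clip of 5x5 mouth crops, one 3x3x3 kernel.
random.seed(0)
FRAMES, HEIGHT, WIDTH, HIDDEN, K = 6, 5, 5, 4, 3
clip = [[[random.random() for _ in range(WIDTH)] for _ in range(HEIGHT)]
        for _ in range(FRAMES)]
kernel = [[[random.uniform(-1, 1) for _ in range(K)] for _ in range(K)]
          for _ in range(K)]
feat_dim = (HEIGHT - K + 1) * (WIDTH - K + 1)
weights = [[random.uniform(-0.1, 0.1) for _ in range(feat_dim + HIDDEN)]
           for _ in range(4 * HIDDEN)]
bias = [0.0] * (4 * HIDDEN)

final_state = encode_clip(clip, kernel, weights, bias, HIDDEN)
print(len(final_state))
```

In a trained system the final hidden state (or the per-step states) would feed a classifier over words or characters, and the convolution would use many learned kernels and channels through a deep learning framework rather than plain Python loops.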

Original language: English
Title of host publication: 2024 International Seminar on Intelligent Technology and Its Applications
Subtitle of host publication: Collaborative Innovation: A Bridging from Academia to Industry towards Sustainable Strategic Partnership, ISITIA 2024 - Proceeding
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 734-739
Number of pages: 6
Edition: 2024
ISBN (Electronic): 9798350378573
Publication status: Published - 2024
Event: 25th International Seminar on Intelligent Technology and Its Applications, ISITIA 2024 - Hybrid, Mataram, Indonesia
Duration: 10 Jul 2024 → 12 Jul 2024

Conference

Conference: 25th International Seminar on Intelligent Technology and Its Applications, ISITIA 2024
Country/Territory: Indonesia
City: Hybrid, Mataram
Period: 10/07/24 → 12/07/24

Keywords

  • deep learning
  • lip reading
  • long short-term memory (lstm)
  • spatial-temporal convolution (stcnn)
