Abstract
This study develops a computer-based speech recognition system that deciphers words by analyzing lip movements. Conventional speech recognition relies on audio captured by a microphone, which can be ineffective in noisy environments or when speech is inaudible. Lip movement imagery offers an alternative feature with the potential to improve speech recognition in these situations. The proposed approach combines Spatio-Temporal Convolutions (STCNN) with Long Short-Term Memory (LSTM): the STCNN extracts spatial and temporal features from lip movement images, and the LSTM models the temporal relationships between those features. Combining the two lets the system learn spatial and temporal patterns in lip movements and so recognize and interpret speech more accurately. In the trials conducted, the system recognized words with a high degree of accuracy. Capturing the spatial and temporal information in lip movements is key to distinguishing between different phonemes. With accurate lip movement-based recognition, the system could be used in applications such as speech recognition in noisy environments, automatic transcription, and human-machine interaction through lip movements.
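The pipeline described in the abstract (a spatio-temporal convolution over the frame sequence, followed by an LSTM over the resulting per-frame features) can be illustrated with a toy sketch in plain Python. This is not the authors' implementation: the abstract gives no layer sizes, so the clip size, kernel size, hidden size, random weights, and the helper names `conv3d_relu`, `lstm_step`, and `init_lstm` are all assumptions chosen only to make the data flow visible.

```python
import math
import random

def conv3d_relu(clip, kernel):
    """Valid 3D cross-correlation of a T x H x W clip with a kt x kh x kw kernel, then ReLU."""
    T, H, W = len(clip), len(clip[0]), len(clip[0][0])
    kt, kh, kw = len(kernel), len(kernel[0]), len(kernel[0][0])
    out = []
    for t in range(T - kt + 1):
        fmap = []
        for y in range(H - kh + 1):
            row = []
            for x in range(W - kw + 1):
                s = 0.0
                # The kernel spans several frames, so each output mixes space and time.
                for dt in range(kt):
                    for dy in range(kh):
                        for dx in range(kw):
                            s += clip[t + dt][y + dy][x + dx] * kernel[dt][dy][dx]
                row.append(max(s, 0.0))
            fmap.append(row)
        out.append(fmap)
    return out

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lstm_step(x, h, c, params):
    """One LSTM time step; params maps each gate name to (weight rows, biases)."""
    z = x + h  # concatenate the input features with the previous hidden state

    def gate(name, act):
        rows, biases = params[name]
        return [act(sum(w * v for w, v in zip(row, z)) + b)
                for row, b in zip(rows, biases)]

    i = gate("i", sigmoid)    # input gate
    f = gate("f", sigmoid)    # forget gate
    o = gate("o", sigmoid)    # output gate
    g = gate("g", math.tanh)  # candidate cell state
    c_new = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, g)]
    h_new = [oj * math.tanh(cj) for oj, cj in zip(o, c_new)]
    return h_new, c_new

def init_lstm(input_size, hidden_size, rng):
    """Small random weights and zero biases for the four gates (illustrative only)."""
    def matrix():
        return [[rng.uniform(-0.1, 0.1) for _ in range(input_size + hidden_size)]
                for _ in range(hidden_size)]
    return {name: (matrix(), [0.0] * hidden_size) for name in "ifog"}

rng = random.Random(0)

# Hypothetical input: 6 grayscale mouth-region frames of 5 x 5 pixels.
clip = [[[rng.random() for _ in range(5)] for _ in range(5)] for _ in range(6)]
kernel = [[[rng.uniform(-0.1, 0.1) for _ in range(3)] for _ in range(3)] for _ in range(3)]

# Spatio-temporal convolution: 6 x 5 x 5 -> 4 x 3 x 3, with the time axis kept.
feature_maps = conv3d_relu(clip, kernel)

# Flatten each time step into one feature vector for the LSTM (4 steps x 9 features).
features = [[v for row in fmap for v in row] for fmap in feature_maps]

hidden_size = 4
params = init_lstm(len(features[0]), hidden_size, rng)
h = [0.0] * hidden_size
c = [0.0] * hidden_size
for x in features:
    h, c = lstm_step(x, h, c, params)

print(len(features), len(features[0]), h)
```

The final hidden state `h` summarizes the whole clip. In a trained system a classifier layer would map such a state (or the per-step states) to words or characters, but the abstract does not specify that stage.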
Original language | English |
---|---|
Title of host publication | 2024 International Seminar on Intelligent Technology and Its Applications |
Subtitle of host publication | Collaborative Innovation: A Bridging from Academia to Industry towards Sustainable Strategic Partnership, ISITIA 2024 - Proceeding |
Publisher | Institute of Electrical and Electronics Engineers Inc. |
Pages | 734-739 |
Number of pages | 6 |
Edition | 2024 |
ISBN (Electronic) | 9798350378573 |
Publication status | Published - 2024 |
Event | 25th International Seminar on Intelligent Technology and Its Applications, ISITIA 2024 - Hybrid, Mataram, Indonesia. Duration: 10 Jul 2024 → 12 Jul 2024 |
Conference
Conference | 25th International Seminar on Intelligent Technology and Its Applications, ISITIA 2024 |
---|---|
Country/Territory | Indonesia |
City | Hybrid, Mataram |
Period | 10/07/24 → 12/07/24 |
Keywords
- deep learning
- lip reading
- long short-term memory (lstm)
- spatial-temporal convolution (stcnn)