OOV Handling Using Partial Lemma-Based Language Model in LF-MMI Based ASR for Bahasa Indonesia

Agung Santosa, Asril Jarin, Eko Mulyanto Yuniarno, Hammam Riza, Mauridhi Hery Purnomo

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

2 Citations (Scopus)

Abstract

A common problem in automatic speech recognition (ASR) is that out-of-vocabulary (OOV) words in an utterance degrade system performance. Bahasa Indonesia, an agglutinative language, forms words through affixation, combining root words with a set of affixes. We propose a partial lemma-based language model (LM) and lexicon that can handle words created by affixation. Both are derived from the original LM and lexicon, using the output of a morphological analyzer as a reference. Experiments show that using this LM in ASR with the LF-MMI cost function gives a better WER when the heuristic for inserting inter-word short pauses is modified to also consider the affixes.
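The idea of rewriting affixed words as affix and root tokens can be illustrated with a minimal sketch. It is not the authors' implementation: the affix lists, the root set, the `+` marker convention, and the rule that only words missing from the vocabulary are split are all assumptions made for illustration, and a real system would take its segmentations from a morphological analyzer rather than from the toy lookup used here.

```python
# Hypothetical toy data; a real system would rely on a morphological analyzer.
PREFIXES = ("mem", "ber", "di")
SUFFIXES = ("kan", "an", "i")
ROOTS = {"baca", "main", "ajar"}


def segment(word):
    """Split a word into prefix+, root, +suffix tokens.

    Returns [word] unchanged when no split produces a known root.
    The '+' markers are an assumed convention, not taken from the paper.
    """
    for prefix in ("",) + PREFIXES:
        if not word.startswith(prefix):
            continue
        rest = word[len(prefix):]
        for suffix in ("",) + SUFFIXES:
            if suffix and not rest.endswith(suffix):
                continue
            root = rest[:len(rest) - len(suffix)] if suffix else rest
            if root in ROOTS:
                tokens = []
                if prefix:
                    tokens.append(prefix + "+")
                tokens.append(root)
                if suffix:
                    tokens.append("+" + suffix)
                return tokens
    return [word]


def rewrite_sentence(sentence, vocab):
    """Keep in-vocabulary words whole; try to segment everything else."""
    tokens = []
    for word in sentence.split():
        tokens.extend([word] if word in vocab else segment(word))
    return tokens
```

Under these assumptions, `rewrite_sentence("saya membaca buku", {"saya", "buku"})` turns the unseen word `membaca` into the two tokens `mem+` and `baca`, so an LM and lexicon built over such tokens could cover affixed forms that never appeared as whole words in the training text.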

Original language: English
Title of host publication: Proceeding of the International Conference on Computer Engineering, Network and Intelligent Multimedia, CENIM 2022
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 167-171
Number of pages: 5
ISBN (Electronic): 9781665476508
Publication status: Published - 2022
Event: 2022 International Conference on Computer Engineering, Network and Intelligent Multimedia, CENIM 2022 - Surabaya, Indonesia
Duration: 22 Nov 2022 → 23 Nov 2022

Publication series

Name: Proceeding of the International Conference on Computer Engineering, Network and Intelligent Multimedia, CENIM 2022

Conference

Conference: 2022 International Conference on Computer Engineering, Network and Intelligent Multimedia, CENIM 2022
Country/Territory: Indonesia
City: Surabaya
Period: 22/11/22 → 23/11/22

Keywords

  • ASR
  • Bahasa Indonesia
  • LF-MMI
  • Language Model
  • OOV
