Abstract

Many previous studies have discussed extracting ambiguous sentences in Indonesian to improve text extraction performance, especially to increase the accuracy of Indonesian-English Machine Translation (MT). Homonymous and polysemous words are among the factors that give rise to ambiguous sentences; however, few extraction methods have been developed to address them. This research proposes word feature extraction of homonyms and polysemous words in Indonesian to improve Indonesian-English MT accuracy. First, part-of-speech (POS) tagging is used to obtain the word types in each sentence. Next, Word2vec and synonym-based term expansion are used to measure word similarity when extracting homonyms and polysemous words from uni-gram and bi-gram features. The terms identified as homonyms and polysemous words are compiled in dictionaries 1 and 2. Then, semantic similarity is computed between the existing terms in the sentence and the extracted synonym terms, and the term with the highest similarity value is selected as the updated term. Neural Machine Translation (NMT) is used for translation, and the proposed MT runs two translation steps: plain NMT and NMT with the proposed word feature extraction. The proposed MT then modifies the translation output based on both translation results to obtain a more accurate translation. Finally, the performance of the proposed MT is evaluated, yielding a precision of 0.7791, a recall of 0.8428, an F1-measure of 0.8097, and an accuracy of 0.7975.
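The dictionary step can be illustrated by extracting uni-gram and bi-gram features from a sentence and flagging those listed in either dictionary. This is a minimal sketch under stated assumptions: naive whitespace tokenization stands in for the paper's preprocessing, POS tagging is omitted, and `dictionary_1`, `dictionary_2`, `ngrams`, and `flag_ambiguous_terms` are hypothetical names filled with toy entries, not the paper's actual dictionaries or code.

```python
from typing import List, Set, Tuple


def ngrams(tokens: List[str], n: int) -> List[str]:
    """Return the contiguous n-grams of a token list, joined with single spaces."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def flag_ambiguous_terms(sentence: str, dictionary_1: Set[str],
                         dictionary_2: Set[str]) -> List[Tuple[str, str]]:
    """Return (term, dictionary name) pairs for every uni-gram or bi-gram
    of the sentence that appears in either dictionary (hypothetical helper)."""
    tokens = sentence.lower().split()  # naive whitespace tokenization (assumption)
    flagged: List[Tuple[str, str]] = []
    for term in ngrams(tokens, 1) + ngrams(tokens, 2):
        if term in dictionary_1:
            flagged.append((term, "dictionary_1"))
        if term in dictionary_2:
            flagged.append((term, "dictionary_2"))
    return flagged


# Toy entries for illustration only; not the paper's dictionaries.
dictionary_1 = {"bisa"}            # uni-gram: 'can' or 'venom'
dictionary_2 = {"kepala sekolah"}  # bi-gram: 'principal' (literally 'head' + 'school')

print(flag_ambiguous_terms("Bisa ular itu membuat kepala sekolah takut",
                           dictionary_1, dictionary_2))
# [('bisa', 'dictionary_1'), ('kepala sekolah', 'dictionary_2')]
```

Uni-grams are checked before bi-grams, so single-word matches are reported first; a real pipeline would likely restrict candidates by POS tag before the lookup.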
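Selecting the updated term by highest semantic similarity can be sketched with cosine similarity over word vectors. The sketch assumes hypothetical three-dimensional toy vectors in place of trained Word2vec embeddings and scores each candidate synonym against the averaged vector of the surrounding words; this averaging rule and the names `pick_replacement` and `embeddings` are illustrative assumptions, not the paper's exact formulation. The example uses the Indonesian homonym "bisa" ("can" or "venom") in a sentence about a snake.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def mean_vector(vectors: List[Vector]) -> List[float]:
    """Element-wise mean of a list of equal-length vectors ([] if the list is empty)."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def pick_replacement(context: List[str], candidates: List[str],
                     embeddings: Dict[str, Vector]) -> str:
    """Return the candidate whose vector is most similar to the mean vector
    of the known context words (hypothetical scoring rule)."""
    context_vec = mean_vector([embeddings[w] for w in context if w in embeddings])
    return max(candidates, key=lambda c: cosine(embeddings[c], context_vec))


# Hypothetical toy vectors standing in for trained Word2vec embeddings.
embeddings: Dict[str, Vector] = {
    "ular": [0.9, 0.1, 0.0],        # 'snake'
    "menggigit": [0.8, 0.2, 0.1],   # 'to bite'
    "racun": [0.85, 0.1, 0.05],     # 'poison': synonym for the 'venom' sense of 'bisa'
    "dapat": [0.05, 0.9, 0.3],      # 'can': synonym for the 'able to' sense of 'bisa'
}

print(pick_replacement(["ular", "menggigit"], ["racun", "dapat"], embeddings))  # racun
```

Averaging the context vectors is only one possible scoring rule; comparing each candidate with every context word and keeping the maximum is an equally plausible alternative.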
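The F1-measure is the harmonic mean of precision and recall, F1 = 2PR / (P + R), so the reported F1 can be checked against the reported precision and recall. Accuracy also depends on true negatives, so it cannot be recomputed from these two figures and is left out of this small sketch.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0.0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported in the abstract.
reported_precision = 0.7791
reported_recall = 0.8428
print(round(f1_score(reported_precision, reported_recall), 4))  # 0.8097
```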

Original language: English
Title of host publication: 2023 14th International Conference on Information and Communication Technology and System, ICTS 2023
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 232-237
Number of pages: 6
ISBN (Electronic): 9798350312164
Publication status: Published - 2023
Event: 14th International Conference on Information and Communication Technology and System, ICTS 2023 - Surabaya, Indonesia
Duration: 4 Oct 2023 to 5 Oct 2023

Publication series

Name: 2023 14th International Conference on Information and Communication Technology and System, ICTS 2023

Conference

Conference: 14th International Conference on Information and Communication Technology and System, ICTS 2023
Country/Territory: Indonesia
City: Surabaya
Period: 4/10/23 to 5/10/23

Keywords

  • Machine Translation
  • Semantic similarity
  • Synonym-based term expansion
  • Word features extraction
  • Word2vec

Title

Homonym and Polysemy Approaches in Term Weighting for Indonesian-English Machine Translation
