TY - GEN
T1 - Homonym and Polysemy Approaches in Term Weighting for Indonesian-English Machine Translation
AU - Abdullah, Rachmad
AU - Sarno, Riyanarto
AU - Purwitasari, Diana
AU - Akhsani, Alifa Izzan
AU - Suhariyanto,
N1 - Publisher Copyright: © 2023 IEEE.
PY - 2023
Y1 - 2023
N2 - Many previous studies have discussed extracting ambiguous sentences in the Indonesian language to increase text extraction performance, especially to improve the accuracy of Indonesian-English machine translation (MT). One factor that can cause ambiguous sentences is the presence of homonymous and polysemous words; however, few extraction methods have been developed to address these issues. This research proposes homonym and polysemy word feature extraction for Indonesian to improve Indonesian-English MT accuracy. First, part-of-speech (POS) tagging is used to obtain the word types in each sentence. Next, Word2vec and synonym-based term expansion are used to measure word similarity for extracting homonyms and polysemous words from uni-gram and bi-gram features. The terms identified as homonyms and polysemous words are compiled into Dictionaries 1 and 2. Semantic similarity is then used to compute the similarity between the existing terms in the sentence and the extracted synonym terms, and the term with the highest similarity value is selected as the updated term. Neural machine translation (NMT) is used for translation, which the proposed MT performs in two steps: plain NMT and NMT with the proposed word feature extraction. The proposed MT then revises the translation output based on these two translation results to obtain a more accurate translation. Finally, the performance of the proposed MT is evaluated, yielding a precision, recall, F1-measure, and accuracy of 0.7791, 0.8428, 0.8097, and 0.7975, respectively.
AB - Many previous studies have discussed extracting ambiguous sentences in the Indonesian language to increase text extraction performance, especially to improve the accuracy of Indonesian-English machine translation (MT). One factor that can cause ambiguous sentences is the presence of homonymous and polysemous words; however, few extraction methods have been developed to address these issues. This research proposes homonym and polysemy word feature extraction for Indonesian to improve Indonesian-English MT accuracy. First, part-of-speech (POS) tagging is used to obtain the word types in each sentence. Next, Word2vec and synonym-based term expansion are used to measure word similarity for extracting homonyms and polysemous words from uni-gram and bi-gram features. The terms identified as homonyms and polysemous words are compiled into Dictionaries 1 and 2. Semantic similarity is then used to compute the similarity between the existing terms in the sentence and the extracted synonym terms, and the term with the highest similarity value is selected as the updated term. Neural machine translation (NMT) is used for translation, which the proposed MT performs in two steps: plain NMT and NMT with the proposed word feature extraction. The proposed MT then revises the translation output based on these two translation results to obtain a more accurate translation. Finally, the performance of the proposed MT is evaluated, yielding a precision, recall, F1-measure, and accuracy of 0.7791, 0.8428, 0.8097, and 0.7975, respectively.
KW - Machine Translation
KW - Semantic similarity
KW - Synonym-based term expansion
KW - Word features extraction
KW - Word2vec
UR - http://www.scopus.com/inward/record.url?scp=85180372651&partnerID=8YFLogxK
U2 - 10.1109/ICTS58770.2023.10330875
DO - 10.1109/ICTS58770.2023.10330875
M3 - Conference contribution
AN - SCOPUS:85180372651
T3 - 2023 14th International Conference on Information and Communication Technology and System, ICTS 2023
SP - 232
EP - 237
BT - 2023 14th International Conference on Information and Communication Technology and System, ICTS 2023
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 14th International Conference on Information and Communication Technology and System, ICTS 2023
Y2 - 4 October 2023 through 5 October 2023
ER -