Adapting Google Translate using Dictionary and Word Embedding for Arabic-Indonesian Cross-lingual Information Retrieval

Maryamah Maryamah, Agus Zainal Arifin, Riyanarto Sarno, Ahmad Makki Hasan

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

3 Citations (Scopus)

Abstract

Translation plays an essential role in cross-lingual information retrieval. Dictionary-based translation is reliable, although its vocabulary is limited. Google Translate, in some cases, produces words that differ from those used in the target documents. This mismatch makes the translated query less accurate for retrieving relevant documents. In this paper, we propose a new translation approach that adapts Google Translate using a dictionary and word embedding for Arabic-Indonesian cross-lingual information retrieval. The dictionary is the primary translation resource, improved by Levenshtein distance and FastText to find the correct word translation. Google Translate completes the translation when a word does not exist in the dictionary. The proposed method achieves a BLEU score of 0.47, higher than the scores of the other resources compared. In this implementation, the proposed method improves the translated query so that it retrieves more relevant documents in cross-lingual information retrieval.
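The dictionary-first lookup with a machine-translation fallback described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the `max_distance` threshold, the toy transliterated dictionary entries, and the stand-in `fallback` callable (used here in place of a real Google Translate call) are all assumptions made for illustration. The FastText re-ranking step is omitted because it requires a trained embedding model.

```python
from typing import Callable, Dict, List


def levenshtein(a: str, b: str) -> int:
    """Edit distance between two strings (row-by-row dynamic programming)."""
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution
            ))
        previous = current
    return previous[-1]


def translate_word(
    word: str,
    dictionary: Dict[str, str],
    fallback: Callable[[str], str],
    max_distance: int = 1,  # hypothetical threshold, not taken from the paper
) -> str:
    """Try an exact dictionary hit, then the nearest entry, then the fallback."""
    if word in dictionary:
        return dictionary[word]
    if dictionary:
        nearest = min(dictionary, key=lambda key: levenshtein(word, key))
        if levenshtein(word, nearest) <= max_distance:
            return dictionary[nearest]
    # Assumption: the fallback stands in for an external translation service.
    return fallback(word)


def translate_query(
    query: str,
    dictionary: Dict[str, str],
    fallback: Callable[[str], str],
    max_distance: int = 1,
) -> List[str]:
    """Translate a whitespace-tokenised query word by word."""
    return [
        translate_word(word, dictionary, fallback, max_distance)
        for word in query.split()
    ]


# Toy, transliterated entries for illustration only.
toy_dictionary = {"kitab": "buku", "ilm": "ilmu"}


def toy_fallback(word: str) -> str:
    """Marks words that would be sent to the external translator."""
    return f"<mt:{word}>"


translated = translate_query("kitb ilm madrasa", toy_dictionary, toy_fallback)
```

In this sketch the misspelled `kitb` is one edit away from `kitab` and so resolves through the dictionary, while `madrasa` has no close entry and is passed to the fallback.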

Original language: English
Title of host publication: IoTaIS 2020 - Proceedings
Subtitle of host publication: 2020 IEEE International Conference on Internet of Things and Intelligence Systems
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 205-209
Number of pages: 5
ISBN (Electronic): 9781728194486
Publication status: Published - 27 Jan 2021
Event: 2020 IEEE International Conference on Internet of Things and Intelligence Systems, IoTaIS 2020 - Virtual, Bali, Indonesia
Duration: 27 Jan 2021 - 28 Jan 2021

Publication series

Name: IoTaIS 2020 - Proceedings: 2020 IEEE International Conference on Internet of Things and Intelligence Systems

Conference

Conference: 2020 IEEE International Conference on Internet of Things and Intelligence Systems, IoTaIS 2020
Country/Territory: Indonesia
City: Virtual, Bali
Period: 27/01/21 - 28/01/21

Keywords

  • Cross-lingual information retrieval
  • Dictionary
  • FastText
  • Google Translate
  • Levenshtein distance
