Abstract
This paper presents the development of a machine translation engine for Sasak, a low-resource language with several dialects, including Kuto-Kete, Ngento-Ngente, Meno-Mene, Ngeno-Ngene, and Meriak-Meriku. Existing machine translation systems fail to preserve these dialects, which leads to output that lacks fluency. Preserving the distinctiveness of each dialect is itself a challenge, because the diversity of dialects requires a dataset that covers complex variation. Sasak was chosen as the study language because it shows significant dialect variation on a relatively small island and can serve as an example for similar issues in Indonesia. Transformer and sequence-to-sequence models are appropriate and widely used for machine translation, but their output can be inconsistent in dialect. A method is therefore needed that maintains dialect consistency during translation. This study builds a Transformer model that translates English into Sasak and adds a lock tokenization method intended to preserve the characteristics of the dialect in the regional-language output. The work includes collecting and creating a dataset that reflects the variation among Sasak dialects, and developing an algorithm that can recognize and maintain the unique linguistic features of each dialect. The study collected a total of 105,327 English-Indonesian word pairs and achieved an average validation accuracy (val-accuracy) of 0.8562 for English-Indonesian translation and 0.8408 for Sasak translation. These findings show that lock tokenization can improve translation accuracy and contextual relevance, contributing to the development of machine translation systems capable of handling languages with multiple dialects.
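The abstract does not specify how lock tokenization works. As a rough illustration of the general idea of keeping the output dialect fixed, the sketch below uses a different, well-known technique from multilingual translation: prepending a dialect tag token to the source sentence so that a sequence-to-sequence model can condition on it. The tag strings, the function names `tag_source`, `build_vocab`, and `encode`, and the whitespace tokenizer are all assumptions made for this example and are not taken from the paper.

```python
from typing import Dict, Iterable, List

# Hypothetical tag tokens, one per dialect named in the abstract.
# The tag spellings are illustrative only.
DIALECT_TAGS: Dict[str, str] = {
    "Kuto-Kete": "<kuto-kete>",
    "Ngento-Ngente": "<ngento-ngente>",
    "Meno-Mene": "<meno-mene>",
    "Ngeno-Ngene": "<ngeno-ngene>",
    "Meriak-Meriku": "<meriak-meriku>",
}


def tag_source(sentence: str, dialect: str) -> List[str]:
    """Lowercase and whitespace-tokenize a source sentence, then prepend the dialect tag."""
    if dialect not in DIALECT_TAGS:
        raise ValueError(f"unknown dialect: {dialect}")
    return [DIALECT_TAGS[dialect]] + sentence.lower().split()


def build_vocab(token_lists: Iterable[List[str]]) -> Dict[str, int]:
    """Assign an integer id to every token; ids 0 and 1 are reserved."""
    vocab = {"<pad>": 0, "<unk>": 1}
    for tokens in token_lists:
        for tok in tokens:
            vocab.setdefault(tok, len(vocab))
    return vocab


def encode(tokens: List[str], vocab: Dict[str, int]) -> List[int]:
    """Convert tokens to ids, mapping unseen tokens to <unk>."""
    return [vocab.get(tok, vocab["<unk>"]) for tok in tokens]


# The same English sentence, tagged once per target dialect.
examples = [tag_source("Where are you going", d) for d in DIALECT_TAGS]
vocab = build_vocab(examples)
ids = encode(examples[0], vocab)
```

Because each tag is a single vocabulary entry, it is never split into subword pieces, and the model sees one unambiguous signal for the requested dialect at the start of every input. Whether the authors' method resembles this is not stated in the abstract.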
Original language | English |
---|---|
Title of host publication | 2024 International Seminar on Intelligent Technology and Its Applications |
Subtitle of host publication | Collaborative Innovation: A Bridging from Academia to Industry towards Sustainable Strategic Partnership, ISITIA 2024 - Proceeding |
Publisher | Institute of Electrical and Electronics Engineers Inc. |
Pages | 19-24 |
Number of pages | 6 |
Edition | 2024 |
ISBN (Electronic) | 9798350378573 |
Publication status | Published - 2024 |
Event | 25th International Seminar on Intelligent Technology and Its Applications, ISITIA 2024 - Hybrid, Mataram, Indonesia; Duration: 10 Jul 2024 → 12 Jul 2024 |
Conference
Conference | 25th International Seminar on Intelligent Technology and Its Applications, ISITIA 2024 |
---|---|
Country/Territory | Indonesia |
City | Hybrid, Mataram |
Period | 10/07/24 → 12/07/24 |
Keywords
- linguistic diversity
- low-resource language
- semantic adaptation
- sequence-to-sequence model
- transformer model
- translation engine