Indonesian Part-of-Speech Tagger: A Comparative Study

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > peer-review

Abstract

POS Tagging is one of the essential tasks in Natural Language Processing (NLP). Researchers are competing to find the best model for Indonesian POS Tagging. However, they still use a single corpus as the reference for model development. This study compares three different corpora and three state-of-the-art models to explore which corpora and models are appropriate for Indonesian POS Tagging. We divided the corpora into training, validation, and test datasets. We use the training and validation datasets to tune the model, and the test dataset to evaluate its performance. The experimental results show that the Yunshan and Dinakaramani corpora yield outstanding performance in POS Tagging, while the Feedforward and BiLSTM models perform equally well, outperforming the other models with a highest value of 96.10%. This experiment shows that both models are stable when applied to different corpora. Further investigation is needed to improve the performance of the models by considering variations in word embedding usage, architecture, and evaluation methods.
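
The split-and-evaluate procedure described in the abstract can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the 80/10/10 split ratios, the helper names `split_corpus` and `token_accuracy`, the toy sentences, and the tag labels.

```python
import random


def split_corpus(sentences, train=0.8, val=0.1, seed=0):
    """Shuffle tagged sentences and split them into train/validation/test lists.

    The 80/10/10 ratios are assumed; the abstract does not state the ratios used.
    """
    items = list(sentences)
    random.Random(seed).shuffle(items)
    n_train = int(len(items) * train)
    n_val = int(len(items) * val)
    return (
        items[:n_train],
        items[n_train:n_train + n_val],
        items[n_train + n_val:],
    )


def token_accuracy(gold, predicted):
    """Fraction of tokens whose predicted tag matches the gold tag."""
    pairs = [(g, p) for gs, ps in zip(gold, predicted) for g, p in zip(gs, ps)]
    return sum(g == p for g, p in pairs) / len(pairs)


# Toy data: each sentence is a list of (word, tag) pairs; the tags are illustrative.
corpus = [[("saya", "PRON"), ("makan", "VERB"), ("nasi", "NOUN")]] * 10
train_set, val_set, test_set = split_corpus(corpus)

# Stand-in for a tagger's output: it mislabels the last token of each test sentence.
gold = [[tag for _, tag in sent] for sent in test_set]
pred = [["PRON", "VERB", "VERB"] for _ in test_set]

print(len(train_set), len(val_set), len(test_set))
print(round(token_accuracy(gold, pred), 4))
```

In this sketch the training and validation parts would be used for fitting and tuning a tagger, and only the held-out test part feeds the final accuracy figure.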

Original language: English
Title of host publication: 2023 10th International Conference on Advanced Informatics
Subtitle of host publication: Concept, Theory and Application, ICAICTA 2023
Publisher: Institute of Electrical and Electronics Engineers Inc.
ISBN (Electronic): 9798350329919
Publication status: Published - 2023
Event: 10th International Conference on Advanced Informatics: Concept, Theory and Application, ICAICTA 2023 - Lombok, Indonesia
Duration: 7 Oct 2023 → 9 Oct 2023

Publication series

Name: 2023 10th International Conference on Advanced Informatics: Concept, Theory and Application, ICAICTA 2023

Conference

Conference: 10th International Conference on Advanced Informatics: Concept, Theory and Application, ICAICTA 2023
Country/Territory: Indonesia
City: Lombok
Period: 7/10/23 → 9/10/23

Keywords

  • Indonesia
  • POS Tagging
  • corpus
  • tagset
