LFF-POS: A linguistic fusion method to handle out-of-vocabulary words in low-resource part-of-speech tagging

  • Institut Teknologi Sepuluh Nopember
  • Universitas Airlangga
  • La Trobe University

Research output: Contribution to journal › Article › peer-review

Abstract

Accurate part-of-speech (POS) tagging is needed for classroom learning evaluation in order to improve the quality of education. However, it is hampered by the limited amount of training data and the high proportion of out-of-vocabulary (OOV) tokens. We present LFF-POS, a linguistic feature fusion method that addresses these limitations for Indonesian. The procedure consists of five sequential steps: (1) tokenizing raw text; (2) extracting three complementary features (orthographic, morphological, and character-level); (3) merging the resulting vectors; (4) applying self-attention; and (5) training a BiLSTM sequence labeler. By combining the three features, LFF-POS improves tagging accuracy without relying on an external lexicon. Experimental results show that the combined features improve the model's ability to handle OOV words and yield higher POS tagging accuracy than the baseline and existing methods.

  • OOV words cannot be recognized by the model, which reduces the accuracy of POS tagging.
  • This study aims to overcome the OOV problem by combining linguistic features such as orthography, morphology, and characters to improve word representation.
  • LFF-POS is shown to improve POS tagging performance, in particular raising the OOV F1 score by approximately 14% over the baseline.
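The first three steps of the procedure (tokenizing, extracting three features, and merging the vectors) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the article: the affix lists, the four orthographic flags, the hashed character trigrams, the `CHAR_DIM` bucket count, and all function names are hypothetical. The merge is shown as plain concatenation, and the self-attention and BiLSTM steps are omitted.

```python
import re
import zlib

# Assumption: a handful of common Indonesian affixes, for illustration only.
PREFIXES = ("meng", "mem", "men", "me", "ber", "ter", "di", "ke", "pe", "se")
SUFFIXES = ("kan", "nya", "an", "i")
CHAR_DIM = 16  # hypothetical number of hash buckets for character trigrams


def tokenize(text):
    """Step 1: split raw text into word and punctuation tokens."""
    return re.findall(r"\w+|[^\w\s]", text)


def orthographic_features(token):
    """Surface-form flags: initial capital, all caps, digit, hyphen."""
    return [
        float(token[:1].isupper()),
        float(token.isupper()),
        float(any(ch.isdigit() for ch in token)),
        float("-" in token),
    ]


def _first_match(word, affixes, matches):
    """One-hot vector marking the first affix that matches, if any."""
    vec = [0.0] * len(affixes)
    for i, affix in enumerate(affixes):
        # Require a stem of at least three characters besides the affix.
        if len(word) > len(affix) + 2 and matches(word, affix):
            vec[i] = 1.0
            break
    return vec


def morphological_features(token):
    """Prefix one-hot followed by suffix one-hot."""
    word = token.lower()
    return (_first_match(word, PREFIXES, str.startswith)
            + _first_match(word, SUFFIXES, str.endswith))


def character_features(token):
    """Counts of character trigrams hashed into CHAR_DIM buckets."""
    padded = f"<{token.lower()}>"
    vec = [0.0] * CHAR_DIM
    for i in range(len(padded) - 2):
        bucket = zlib.crc32(padded[i:i + 3].encode("utf-8")) % CHAR_DIM
        vec[bucket] += 1.0
    return vec


def fuse_features(token):
    """Step 3 (assumed to be concatenation): merge the three vectors."""
    return (orthographic_features(token)
            + morphological_features(token)
            + character_features(token))


sentence = "Mahasiswa membacakan buku."
fused = [fuse_features(tok) for tok in tokenize(sentence)]
```

Because every feature is computed from the token's surface form alone, a word never seen in training still receives a non-trivial vector, which is the intuition behind using such features for OOV handling. In a full model, the `fused` vectors would feed the attention and sequence-labeling layers.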

Original language: English
Article number: 103615
Journal: MethodsX
Volume: 15
Publication status: Published - Dec 2025

Keywords

  • Deep learning
  • Low-resource language
  • Morphological-rich language
  • Out-of-vocabulary
  • Part-of-speech tagging
  • Quality of education
