Feature-based POS tagging and sentence relevance for news multi-document summarization in Bahasa Indonesia

Moch Zawaruddin Abdullah*, Chastine Fatichah

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

1 Citation (Scopus)

Abstract

Sentence extraction in news document summarization selects representative sentences primarily by means of a news feature score (NeFS). NeFS can identify meaningful sentences by analyzing the frequency and similarity of phrases, but it neglects grammatical information and the relevance of each sentence to the title. Grammatical information, carried by part-of-speech (POS) tags, indicates the presence of instructive content. POS tagging assigns a meaningful tag to each term based on qualified data and the surrounding words. Sentence relevance to the title measures how closely a sentence is connected to the title in terms of both word-based and meaning-based similarity, primarily for news documents in Bahasa Indonesia. In this study, we present an alternative sentence weighting method that incorporates news features, POS tagging, and sentence relevance to the title, and we use it to extract representative sentences. Experimental results on 11 groups of Indonesian news documents are compared with those of the news feature scores with grammatical information method (NeFGIS). The proposed method achieved better results: the F-scores for ROUGE-1, ROUGE-2, ROUGE-L, and ROUGE-SU4 increased by 1.84%, 3.03%, 3.85%, and 2.08%, respectively.
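The abstract does not give the weighting formula, so the following is only a minimal sketch of how three per-sentence signals (a news feature score, a POS-based score, and relevance to the title) might be combined into one weight. Everything in it is an assumption made for illustration: the function names, the `INFORMATIVE_TAGS` set, the equal weights, the frequency-based stand-in for the news feature score, and the use of Jaccard overlap for the word-based part of title relevance. Meaning-based similarity is not modeled, and the input is assumed to be already tokenized and POS-tagged.

```python
from collections import Counter

# Hypothetical set of content-bearing POS tags; the paper's tag set is not stated here.
INFORMATIVE_TAGS = {"NOUN", "PROPN", "VERB", "NUM"}


def pos_score(tagged_tokens):
    """Fraction of tokens whose POS tag is in INFORMATIVE_TAGS."""
    if not tagged_tokens:
        return 0.0
    hits = sum(1 for _, tag in tagged_tokens if tag in INFORMATIVE_TAGS)
    return hits / len(tagged_tokens)


def title_relevance(tokens, title_tokens):
    """Word-based relevance only: Jaccard overlap between sentence and title."""
    a, b = set(tokens), set(title_tokens)
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def frequency_score(tokens, doc_counts):
    """Stand-in for a news feature score: mean relative frequency of the tokens."""
    total = sum(doc_counts.values())
    if not tokens or total == 0:
        return 0.0
    return sum(doc_counts[t] for t in tokens) / (len(tokens) * total)


def sentence_weight(tagged_tokens, title_tokens, doc_counts,
                    weights=(1 / 3, 1 / 3, 1 / 3)):
    """Weighted sum of the three signals; the weights are illustrative only."""
    tokens = [word.lower() for word, _ in tagged_tokens]
    title = [word.lower() for word in title_tokens]
    w_news, w_pos, w_rel = weights
    return (w_news * frequency_score(tokens, doc_counts)
            + w_pos * pos_score(tagged_tokens)
            + w_rel * title_relevance(tokens, title))


# Toy, hand-tagged input (not data from the study).
title = ["Banjir", "Jakarta"]
sentences = [
    [("Banjir", "NOUN"), ("melanda", "VERB"), ("Jakarta", "PROPN")],
    [("Itu", "PRON"), ("saja", "ADV")],
]
doc_counts = Counter(word.lower() for sent in sentences for word, _ in sent)

# Rank sentences by weight, highest first.
ranked = sorted(sentences,
                key=lambda sent: sentence_weight(sent, title, doc_counts),
                reverse=True)
```

In this toy input the first sentence ranks higher because it shares words with the title and consists entirely of tags in the assumed informative set. A real system would replace each component with the corresponding measure defined in the full paper.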

Original language: English
Pages (from-to): 541-549
Number of pages: 9
Journal: Bulletin of Electrical Engineering and Informatics
Volume: 11
Issue number: 1
Publication status: Published - Feb 2022

Keywords

  • Indonesian news
  • Multi-document summarization
  • News features
  • POS tagging
  • Sentence extraction
  • Sentence relevance
