TY - JOUR
T1 - Feature-based POS tagging and sentence relevance for news multi-document summarization in Bahasa Indonesia
AU - Abdullah, Moch Zawaruddin
AU - Fatichah, Chastine
N1 - Publisher Copyright: © 2022, Institute of Advanced Engineering and Science. All rights reserved.
PY - 2022/2
Y1 - 2022/2
N2 - Sentence extraction in news document summarization determines representative sentences primarily by using a news feature score (NeFS). NeFS can identify meaningful sentences by analyzing the frequency and similarity of phrases, but it neglects grammatical information and sentence relevance to the title. Grammatical information carried by part of speech (POS) indicates the presence of instructive content. POS tagging is the process of assigning a meaningful tag to each term based on qualified data and even surrounding words. Sentence relevance to the title measures how closely a sentence is connected to the title in terms of both word-based and meaning-based similarity, primarily for news documents in Bahasa Indonesia. In this study, we present an alternative sentence weighting method that incorporates news features, POS tagging, and sentence relevance to the title, and we use it to extract representative sentences. Experimental results on 11 groups of Indonesian news documents are compared with those of the news feature scores with grammatical information method (NeFGIS). The proposed method achieved better results: the F-score gains for ROUGE-1, ROUGE-2, ROUGE-L, and ROUGE-SU4 are 1.84%, 3.03%, 3.85%, and 2.08%, respectively.
AB - Sentence extraction in news document summarization determines representative sentences primarily by using a news feature score (NeFS). NeFS can identify meaningful sentences by analyzing the frequency and similarity of phrases, but it neglects grammatical information and sentence relevance to the title. Grammatical information carried by part of speech (POS) indicates the presence of instructive content. POS tagging is the process of assigning a meaningful tag to each term based on qualified data and even surrounding words. Sentence relevance to the title measures how closely a sentence is connected to the title in terms of both word-based and meaning-based similarity, primarily for news documents in Bahasa Indonesia. In this study, we present an alternative sentence weighting method that incorporates news features, POS tagging, and sentence relevance to the title, and we use it to extract representative sentences. Experimental results on 11 groups of Indonesian news documents are compared with those of the news feature scores with grammatical information method (NeFGIS). The proposed method achieved better results: the F-score gains for ROUGE-1, ROUGE-2, ROUGE-L, and ROUGE-SU4 are 1.84%, 3.03%, 3.85%, and 2.08%, respectively.
KW - Indonesian news
KW - Multi-document summarization
KW - News features
KW - POS tagging
KW - Sentence extraction
KW - Sentence relevance
UR - http://www.scopus.com/inward/record.url?scp=85124610049&partnerID=8YFLogxK
U2 - 10.11591/eei.v11i1.3275
DO - 10.11591/eei.v11i1.3275
M3 - Article
AN - SCOPUS:85124610049
SN - 2089-3191
VL - 11
SP - 541
EP - 549
JO - Bulletin of Electrical Engineering and Informatics
JF - Bulletin of Electrical Engineering and Informatics
IS - 1
ER -