Indonesian news classification using naïve bayes and two-phase feature selection model

M. Ali Fauzi*, Agus Zainal Arifin, Sonny Christiano Gosaria, Isnan Suryo Prabowo

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

17 Citations (Scopus)

Abstract

Since the rise of the WWW, the amount of information available online has grown rapidly. One example is Indonesian online news. Automatic text classification has therefore become a very important task for information filtering. One of the major issues in text classification is the high dimensionality of the feature space. Most of the features are irrelevant, noisy, or redundant, which may reduce the accuracy of the system. Hence, feature selection is needed. Maximal Marginal Relevance for Feature Selection (MMR-FS) has been shown to be a good feature selection method for text with many redundant features, but it has high computational complexity. In this paper, we propose a two-phase feature selection method. In the first phase, to lower the complexity of MMR-FS, we first use Information Gain to reduce the number of features. In the second phase, MMR-FS selects from this reduced feature set. The experimental results showed that our new method reaches a best accuracy of 86%. The new method lowers the complexity of MMR-FS while retaining its accuracy.
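The two-phase idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The function name `two_phase_select`, the parameters `k_first`, `k_final`, and `lam`, and the toy documents are all hypothetical. The abstract does not state how MMR-FS measures redundancy, so this sketch substitutes the mutual information between two term-presence columns as an assumed stand-in.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a list of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def information_gain(feature, target):
    """Reduction in entropy of `target` after splitting on a 0/1 `feature`."""
    n = len(target)
    gain = entropy(target)
    for value in (0, 1):
        subset = [t for f, t in zip(feature, target) if f == value]
        if subset:
            gain -= (len(subset) / n) * entropy(subset)
    return gain


def two_phase_select(docs, labels, k_first, k_final, lam=0.5):
    """Hypothetical helper: Information Gain filter, then greedy MMR-style pick."""
    vocab = sorted(set().union(*docs))
    # One 0/1 presence column per term.
    columns = {t: [1 if t in d else 0 for d in docs] for t in vocab}
    relevance = {t: information_gain(columns[t], labels) for t in vocab}

    # Phase 1: keep only the k_first terms with the highest Information Gain.
    candidates = sorted(vocab, key=lambda t: -relevance[t])[:k_first]

    # Phase 2: greedily add the term with the best trade-off between
    # relevance and redundancy with the terms already selected.
    # ASSUMPTION: redundancy is the mutual information between two
    # presence columns; the paper's exact measure may differ.
    selected = []
    while candidates and len(selected) < k_final:
        def mmr_score(term):
            redundancy = max(
                (information_gain(columns[term], columns[s]) for s in selected),
                default=0.0,
            )
            return lam * relevance[term] - (1 - lam) * redundancy

        best = max(candidates, key=mmr_score)
        selected.append(best)
        candidates.remove(best)
    return selected


# Toy corpus (invented): "bola" and "gol" always co-occur, so they are redundant.
docs = [
    {"bola", "gol", "dan", "hari"},
    {"bola", "gol", "dan"},
    {"partai", "dan", "hari"},
    {"partai", "dan"},
    {"saham", "dan", "hari"},
    {"saham", "dan"},
]
labels = ["olahraga", "olahraga", "politik", "politik", "ekonomi", "ekonomi"]

selected = two_phase_select(docs, labels, k_first=4, k_final=3)
print(selected)
```

In this sketch the pairwise redundancy comparisons run only over the `k_first` terms that survive the first phase instead of over the full vocabulary, which is where the reduction in cost comes from.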

Original language: English
Pages (from-to): 610-615
Number of pages: 6
Journal: Indonesian Journal of Electrical Engineering and Computer Science
Volume: 8
Issue number: 3
Publication status: Published - Dec 2017

Keywords

  • Feature selection
  • Information gain
  • Maximal marginal relevance
  • Naïve bayes
  • News classification
