TY - JOUR
T1 - Indonesian news classification using naïve Bayes and two-phase feature selection model
AU - Ali Fauzi, M.
AU - Arifin, Agus Zainal
AU - Gosaria, Sonny Christiano
AU - Prabowo, Isnan Suryo
N1 - Publisher Copyright: © 2017 Institute of Advanced Engineering and Science. All rights reserved.
PY - 2017/12
Y1 - 2017/12
N2 - Since the rise of the WWW, the amount of information available online has grown rapidly. One example is Indonesian online news. Automatic text classification has therefore become a very important task for information filtering. One of the major issues in text classification is the high dimensionality of the feature space. Most of the features are irrelevant, noisy, or redundant, which may reduce the accuracy of the system. Hence, feature selection is needed. Maximal Marginal Relevance for Feature Selection (MMR-FS) has been shown to be a good feature selection method for text with many redundant features, but it has high computational complexity. In this paper, we propose a two-phase feature selection method. In the first phase, to lower the complexity of MMR-FS, we use Information Gain to reduce the number of features. In the second phase, MMR-FS selects from this reduced feature set. The experimental results show that the new method reaches a best accuracy of 86%. The new method can lower the complexity of MMR-FS while still retaining its accuracy.
AB - Since the rise of the WWW, the amount of information available online has grown rapidly. One example is Indonesian online news. Automatic text classification has therefore become a very important task for information filtering. One of the major issues in text classification is the high dimensionality of the feature space. Most of the features are irrelevant, noisy, or redundant, which may reduce the accuracy of the system. Hence, feature selection is needed. Maximal Marginal Relevance for Feature Selection (MMR-FS) has been shown to be a good feature selection method for text with many redundant features, but it has high computational complexity. In this paper, we propose a two-phase feature selection method. In the first phase, to lower the complexity of MMR-FS, we use Information Gain to reduce the number of features. In the second phase, MMR-FS selects from this reduced feature set. The experimental results show that the new method reaches a best accuracy of 86%. The new method can lower the complexity of MMR-FS while still retaining its accuracy.
KW - Feature selection
KW - Information gain
KW - Maximal marginal relevance
KW - Naïve Bayes
KW - News classification
UR - http://www.scopus.com/inward/record.url?scp=85037611172&partnerID=8YFLogxK
U2 - 10.11591/ijeecs.v8.i3.pp610-615
DO - 10.11591/ijeecs.v8.i3.pp610-615
M3 - Article
AN - SCOPUS:85037611172
SN - 2502-4752
VL - 8
SP - 610
EP - 615
JO - Indonesian Journal of Electrical Engineering and Computer Science
JF - Indonesian Journal of Electrical Engineering and Computer Science
IS - 3
ER -