TY - JOUR
T1 - Integration of synthetic minority oversampling technique for imbalanced class
AU - Santoso, Noviyanti
AU - Wibowo, Wahyu
AU - Himawati, Hilda
N1 - Publisher Copyright: © 2019 Institute of Advanced Engineering and Science. All rights reserved.
PY - 2019/1
Y1 - 2019/1
N2 - In data mining, class imbalance is a difficult problem that still calls for solutions. This is probably because machine learning algorithms are generally built on the assumption that the number of instances in each class is balanced, so when they are applied to imbalanced classes, the prediction results may be unreliable. Several solutions have been proposed to address class imbalance, including oversampling, undersampling, and the synthetic minority oversampling technique (SMOTE). Both oversampling and undersampling have disadvantages, so SMOTE is an alternative that aims to overcome them. Integrating SMOTE with data mining classification methods such as Naive Bayes, Support Vector Machine (SVM), and Random Forest (RF) is expected to improve accuracy. In this research, the SMOTE-processed data gave better accuracy than the original data. Among the three classification methods used, RF gives the highest average AUC, F-measure, and G-mean scores.
AB - In data mining, class imbalance is a difficult problem that still calls for solutions. This is probably because machine learning algorithms are generally built on the assumption that the number of instances in each class is balanced, so when they are applied to imbalanced classes, the prediction results may be unreliable. Several solutions have been proposed to address class imbalance, including oversampling, undersampling, and the synthetic minority oversampling technique (SMOTE). Both oversampling and undersampling have disadvantages, so SMOTE is an alternative that aims to overcome them. Integrating SMOTE with data mining classification methods such as Naive Bayes, Support Vector Machine (SVM), and Random Forest (RF) is expected to improve accuracy. In this research, the SMOTE-processed data gave better accuracy than the original data. Among the three classification methods used, RF gives the highest average AUC, F-measure, and G-mean scores.
KW - Accuracy
KW - Data mining
KW - Imbalanced class
KW - SMOTE
UR - http://www.scopus.com/inward/record.url?scp=85059207727&partnerID=8YFLogxK
U2 - 10.11591/ijeecs.v13.i1.pp102-108
DO - 10.11591/ijeecs.v13.i1.pp102-108
M3 - Article
AN - SCOPUS:85059207727
SN - 2502-4752
VL - 13
SP - 102
EP - 108
JO - Indonesian Journal of Electrical Engineering and Computer Science
JF - Indonesian Journal of Electrical Engineering and Computer Science
IS - 1
ER -