TY - GEN
T1 - Binary Classification on Imbalanced Data
T2 - 1st International Conference on Advanced Technology and Multidiscipline: Advanced Technology and Multidisciplinary Prospective Towards Bright Future, ICATAM 2021
AU - Ningrum, Ratih A.
AU - Fahmiyah, Indah
AU - Syahputra, M. A.
AU - Levi, Aretha
AU - Firdausanti, Neni Alya
AU - Nurlaily, Diana
N1 - Publisher Copyright: © 2023 American Institute of Physics Inc. All rights reserved.
PY - 2023/5/19
Y1 - 2023/5/19
AB - Classification of binary data with imbalanced classes remains an active research topic, especially for data-driven classification, where the target classes are often imbalanced. Studying class imbalance is therefore unavoidable. In this study, we classified birth events in the Indonesia Demographic and Health Survey (DHS) 2017 data. We implemented two machine learning classifiers, Logistic Regression (LR) and Support Vector Machine (SVM), to classify birth events for women in Indonesia. Several resampling techniques, including Undersampling, Oversampling, and Hybrid, were applied to rebalance the data distribution. The performance of each technique was evaluated using Accuracy, Sensitivity, F1-Score, Area Under Curve, and Geometric mean. A significant discrepancy in the metric scores was found between the resampling methods when the LR and SVM classifiers were employed. Specifically, the scores were high for the balanced data obtained from Undersampling techniques, namely Nearmiss-1 for the LR classifier and NCL for the SVM classifier. The values of Accuracy, Sensitivity, F1-Score, Area Under Curve, and Geometric mean for Nearmiss-1 were 0.9859, 0.9720, 0.9858, 0.9860, and 0.9859, respectively. For NCL, the corresponding scores were 0.9829, 0.9767, 0.9882, 0.9884, and 0.9883, respectively. Overall, Undersampling techniques gave higher evaluation scores than Oversampling and Hybrid techniques for imbalanced classification of the Indonesia DHS 2017 data.
UR - http://www.scopus.com/inward/record.url?scp=85161374045&partnerID=8YFLogxK
U2 - 10.1063/5.0118994
DO - 10.1063/5.0118994
M3 - Conference contribution
AN - SCOPUS:85161374045
T3 - AIP Conference Proceedings
BT - Proceedings of the International Conference on Advanced Technology and Multidiscipline, ICATAM 2021
A2 - Widiyanti, Prihartini
A2 - Jiwanti, Prastika Krisma
A2 - Prihandana, Gunawan Setia
A2 - Ningrum, Ratih Ardiati
A2 - Prastio, Rizki Putra
A2 - Setiadi, Herlambang
A2 - Rizki, Intan Nurul
PB - American Institute of Physics Inc.
Y2 - 13 October 2021 through 14 October 2021
ER -