TY - JOUR
T1 - A Multi-Class Classification of Dengue Infection Cases with Feature Selection in Imbalanced Clinical Diagnosis Data
AU - Fahmi, Amiq
AU - Muqtadiroh, Feby Artwodini
AU - Purwitasari, Diana
AU - Sumpeno, Surya
AU - Purnomo, Mauridhi Hery
N1 - Publisher Copyright: © (2023) All Rights Reserved.
PY - 2023
Y1 - 2023
AB - Dengue infection is a dangerous, potentially fatal infectious disease that threatens people of every age. Class imbalance in dengue infection datasets biases classifiers toward the majority class at the expense of the minority classes, which undermines the interpretation of the predicted results. This study aims to improve classification accuracy by resolving multi-class imbalance problems with a proposed new approach that assigns class weights to the minority and majority classes. In addition, the imbalanced datasets are resampled using random resampling and SMOTE. Eight classification algorithms (NN, KNN, Decision Tree, Random Forest, Naïve Bayes, AdaBoost, SVM, and Logistic Regression) were tested on the balanced datasets using 10-fold cross-validation and feature selection. The experimental results show that the proposed approach yields higher accuracy than the original, unbalanced data. On the dengue infection cases, AdaBoost achieved the highest accuracy of the eight algorithms, 87.0%. We then tested the method on a second case, hypothyroid disease, to demonstrate its effectiveness and efficiency in increasing accuracy. Thus, our method can be applied universally to classification problems on imbalanced datasets. The results indicate that AdaBoost again delivers the best outcome, with the highest accuracy of 99.7% on the hypothyroid cases and average AUC, F1, precision, and recall approaching 99.8%.
KW - Accuracy
KW - Class weights
KW - Classification
KW - Feature selection
KW - Multi-class imbalanced data
KW - Random resampling
KW - SMOTE
UR - http://www.scopus.com/inward/record.url?scp=85164251628&partnerID=8YFLogxK
U2 - 10.22266/ijies2022.0630.15
DO - 10.22266/ijies2022.0630.15
M3 - Article
AN - SCOPUS:85164251628
SN - 2185-310X
VL - 15
SP - 176
EP - 192
JO - International Journal of Intelligent Engineering and Systems
JF - International Journal of Intelligent Engineering and Systems
IS - 3
ER -