TY - JOUR
T1 - Pruning-based oversampling technique with smoothed bootstrap resampling for imbalanced clinical dataset of Covid-19
AU - Wibowo, Prasetyo
AU - Fatichah, Chastine
N1 - Publisher Copyright: © 2021 The Authors
PY - 2022/10
Y1 - 2022/10
AB - The World Health Organization (WHO) declared Coronavirus Disease (COVID-19) a pandemic, and it has not yet ended. As the COVID-19 infection rate increases, a computational approach is needed to predict which patients are infected, in order to speed up diagnosis and reduce human error compared with conventional diagnosis. However, negative samples outnumber positive samples, and the resulting data imbalance degrades classification performance and biases the model evaluation results. This study proposes a new oversampling technique, TRIM-SBR, to generate minority-class data for diagnosing patients infected with COVID-19. Developing an oversampling technique remains challenging because of the data generalization issue. The proposed method uses pruning to find specific minority regions while retaining data generalization, yielding minority data seeds that serve as benchmarks for creating new synthetic data with bootstrap resampling techniques. Accuracy, specificity, sensitivity, F-measure, and AUC are used to evaluate classifier performance under data imbalance. The results show that TRIM-SBR provides the best performance among the oversampling techniques compared.
KW - COVID-19
KW - Imbalanced data
KW - Machine learning
KW - Oversampling
KW - Smoothed bootstrap resampling
UR - http://www.scopus.com/inward/record.url?scp=85117768292&partnerID=8YFLogxK
U2 - 10.1016/j.jksuci.2021.09.021
DO - 10.1016/j.jksuci.2021.09.021
M3 - Article
AN - SCOPUS:85117768292
SN - 1319-1578
VL - 34
SP - 7830
EP - 7839
JO - Journal of King Saud University - Computer and Information Sciences
JF - Journal of King Saud University - Computer and Information Sciences
IS - 9
ER -