TY - GEN
T1 - Synthesis Ensemble Oversampling and Ensemble Tree-Based Machine Learning for Class Imbalance Problem in Breast Cancer Diagnosis
AU - Slamet Sudaryanto, N.
AU - Purnomo, Mauridhi Hery
AU - Purwitasari, Diana
AU - Yuniarno, Eko Mulyanto
N1 - Publisher Copyright: © 2022 IEEE.
PY - 2022
Y1 - 2022
AB - The Wisconsin Breast Cancer Database is a class-imbalanced dataset. Class imbalance yields accuracy that favors the majority class at the expense of the minority class. The ensemble oversampling methods used are SMOTE and Random Over Sampling (ROS). The tree-based machine learning ensembles used are Random Forest, Adaptive Boosting (AdaBoost), and eXtreme Gradient Boosting (XGBoost). At the level 1 ensemble stage, the best-performing ensemble model is selected as input for the level 2 ensemble stage. The level 2 ensemble is a boosting ensemble in which the results of the best model chosen at level 1 are used as the base model for boosting with the XGBoost algorithm. Evaluation yielded a 10-fold cross-validation score of 0.981, accuracy of 0.987, recall of 0.980, and precision of 0.982. The proposed framework outperforms several recent classification studies in the breast cancer domain.
KW - AdaBoost
KW - Ensemble
KW - Imbalanced Class
KW - ROS
KW - Random Forest
KW - SMOTE
KW - XGBoost
UR - http://www.scopus.com/inward/record.url?scp=85149102406&partnerID=8YFLogxK
U2 - 10.1109/CENIM56801.2022.10037251
DO - 10.1109/CENIM56801.2022.10037251
M3 - Conference contribution
AN - SCOPUS:85149102406
T3 - Proceeding of the International Conference on Computer Engineering, Network and Intelligent Multimedia, CENIM 2022
SP - 110
EP - 116
BT - Proceeding of the International Conference on Computer Engineering, Network and Intelligent Multimedia, CENIM 2022
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2022 International Conference on Computer Engineering, Network and Intelligent Multimedia, CENIM 2022
Y2 - 22 November 2022 through 23 November 2022
ER -