TY - GEN
T1 - Enhancing XGBoost and CatBoost Methods for Diagnosing Parkinson's Disease Through the Integration of SMOTE and Feature Selection Techniques
AU - Joses, Steven
AU - Saikhu, Ahmad
N1 - Publisher Copyright: © 2024 IEEE.
PY - 2024
Y1 - 2024
N2 - Parkinson's disease, a neurodegenerative condition affecting movement, requires early detection for effective treatment. Its main symptoms include tremors, muscle stiffness, slow movements, and difficulty controlling movements. When detected early, Parkinson's disease can be given effective medical treatment. Recent studies suggest that machine learning can be a useful tool for diagnosis. However, predicting Parkinson's disease remains challenging because of imbalanced data. Feature selection, imbalance correction techniques, and hyperparameter optimization can improve prediction accuracy. The results of this study show that XGBoost consistently outperforms CatBoost in the initial scenarios across various data split ratios. Both XGBoost and CatBoost achieved their highest accuracy at an 80:20 data split ratio, with XGBoost reaching 91.39% and CatBoost 90.72%. Without hyperparameter tuning through GridSearchCV, CatBoost achieved its highest accuracy of 98.01% using SMOTE and FS3, while XGBoost achieved its highest accuracy of 96.68% with SMOTE and FS1. Applying GridSearchCV significantly improved both accuracy and F1 score for CatBoost and XGBoost, and both models showed consistent performance gains across all tested scenarios. Overall, these findings highlight the effectiveness of hyperparameter tuning and feature selection in optimizing model performance for complex classification tasks.
AB - Parkinson's disease, a neurodegenerative condition affecting movement, requires early detection for effective treatment. Its main symptoms include tremors, muscle stiffness, slow movements, and difficulty controlling movements. When detected early, Parkinson's disease can be given effective medical treatment. Recent studies suggest that machine learning can be a useful tool for diagnosis. However, predicting Parkinson's disease remains challenging because of imbalanced data. Feature selection, imbalance correction techniques, and hyperparameter optimization can improve prediction accuracy. The results of this study show that XGBoost consistently outperforms CatBoost in the initial scenarios across various data split ratios. Both XGBoost and CatBoost achieved their highest accuracy at an 80:20 data split ratio, with XGBoost reaching 91.39% and CatBoost 90.72%. Without hyperparameter tuning through GridSearchCV, CatBoost achieved its highest accuracy of 98.01% using SMOTE and FS3, while XGBoost achieved its highest accuracy of 96.68% with SMOTE and FS1. Applying GridSearchCV significantly improved both accuracy and F1 score for CatBoost and XGBoost, and both models showed consistent performance gains across all tested scenarios. Overall, these findings highlight the effectiveness of hyperparameter tuning and feature selection in optimizing model performance for complex classification tasks.
KW - classification
KW - data balancing techniques
KW - feature selection
KW - hyperparameter tuning
KW - machine learning
UR - http://www.scopus.com/inward/record.url?scp=85210474745&partnerID=8YFLogxK
U2 - 10.1109/ICITISEE63424.2024.10729906
DO - 10.1109/ICITISEE63424.2024.10729906
M3 - Conference contribution
AN - SCOPUS:85210474745
T3 - 2024 8th International Conference on Information Technology, Information Systems and Electrical Engineering, ICITISEE 2024
SP - 487
EP - 492
BT - 2024 8th International Conference on Information Technology, Information Systems and Electrical Engineering, ICITISEE 2024
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 8th International Conference on Information Technology, Information Systems and Electrical Engineering, ICITISEE 2024
Y2 - 29 August 2024 through 30 August 2024
ER -