Performance Analysis of Resampling and Ensemble Learning Methods on Diabetes Detection as Imbalanced Dataset

Fiqey Indriati Eka Sari, Frederick William Edlim, Fitrah Arie Ramadhan, Muhtadin, Dini Adni Navastara

Research output: Chapter in Book/Report/Conference proceeding; Conference contribution; peer-review

1 Citation (Scopus)

Abstract

Early detection of diabetes is essential for reducing its high mortality rate. It can be achieved by estimating the likelihood of diabetes from the variables recorded in patient data. Diagnosing patients from medical data is challenging, however, because such data are usually imbalanced: negative cases far outnumber positive ones. To address this, this paper combines resampling techniques, applied as a preprocessing step, with ensemble learning algorithms. The oversampling techniques evaluated are ADASYN, random oversampling (ROS), and SMOTE; the undersampling techniques are random undersampling (RUS), Tomek links, and edited nearest neighbours (ENN). The hybrid techniques SMOTE-ENN and SMOTE-Tomek are also applied to the highly imbalanced diabetes dataset. The ensemble learning algorithms used are Random Forest, Bagging, AdaBoost, and XGBoost. In the experiments, SMOTE-ENN with AdaBoost performed best, with a recall of 0.7330, even though its F1 score is 0.6459. The AdaBoost classifier also gives good and stable results across the different resampling techniques. Using SMOTE-ENN increased the model's recall by 0.1819 and decreased its F1 score by 0.2000 relative to the original model. In medical diagnosis, higher sensitivity (recall) matters more than the F1 score, because the priority is to correctly identify patients who have the disease.
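The resampling and evaluation steps described above can be illustrated with a minimal, pure-Python sketch. It is not the authors' implementation: the function names `smote`, `enn`, and `recall_f1`, the toy data, and the simple majority-vote ENN rule are assumptions made for illustration, and F1 is computed for the positive class only, since the abstract does not state how F1 was averaged. In practice one would more likely use a library such as imbalanced-learn (for example `imblearn.combine.SMOTEENN`) together with scikit-learn's ensemble classifiers.

```python
import math
import random
from collections import Counter


def smote(X, y, minority, k=5, seed=0):
    """Simplified SMOTE (illustrative): add synthetic minority samples until the
    minority class is as large as the largest class. Each new point lies on the
    segment between a random minority sample and one of its k nearest minority
    neighbours: x_new = x + lam * (neighbour - x), lam ~ U(0, 1)."""
    rng = random.Random(seed)
    counts = Counter(y)
    pts = [x for x, label in zip(X, y) if label == minority]
    n_new = max(counts.values()) - counts[minority]
    X_out, y_out = [list(x) for x in X], list(y)
    for _ in range(n_new):
        i = rng.randrange(len(pts))
        base = pts[i]
        # k nearest minority neighbours of `base`, excluding base itself
        neighbours = sorted(
            (p for j, p in enumerate(pts) if j != i),
            key=lambda p: math.dist(base, p),
        )[:k]
        nb = rng.choice(neighbours)
        lam = rng.random()
        X_out.append([b + lam * (n - b) for b, n in zip(base, nb)])
        y_out.append(minority)
    return X_out, y_out


def enn(X, y, k=3):
    """Edited Nearest Neighbours, majority-vote variant (an assumption; library
    implementations may use stricter rules): drop every sample whose label
    disagrees with the majority label of its k nearest neighbours."""
    X_out, y_out = [], []
    for i, (x, label) in enumerate(zip(X, y)):
        idx = sorted(
            (j for j in range(len(X)) if j != i),
            key=lambda j: math.dist(x, X[j]),
        )[:k]
        majority = Counter(y[j] for j in idx).most_common(1)[0][0]
        if majority == label:
            X_out.append(x)
            y_out.append(label)
    return X_out, y_out


def recall_f1(y_true, y_pred, positive=1):
    """Recall = TP / (TP + FN); F1 = 2PR / (P + R), for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    if precision + recall == 0:
        return recall, 0.0
    return recall, 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Hypothetical toy data: 8 negatives (0) near the origin, 3 positives (1) near (5, 5).
    X = [[0, 0], [0, 1], [1, 0], [1, 1], [0, 2], [2, 0], [2, 2], [1, 2],
         [5, 5], [5, 6], [6, 5]]
    y = [0] * 8 + [1] * 3
    # SMOTE-ENN: oversample the positives, then clean with ENN
    X_res, y_res = enn(*smote(X, y, minority=1, k=2))
    print(Counter(y_res))  # → Counter({0: 8, 1: 8})

    # Hypothetical predictions: 3 of 4 positives found, at the cost of 2 false positives.
    y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
    y_pred = [1, 1, 1, 0, 1, 1, 0, 0, 0, 0]
    print(tuple(round(v, 4) for v in recall_f1(y_true, y_pred)))  # → (0.75, 0.6667)
```

The metric helper shows why recall and F1 can move in opposite directions: a model that flags more positives raises recall, TP / (TP + FN), but the extra false positives lower precision and can pull F1 down. Resampling should be applied only to the training split, so that synthetic or removed samples do not distort the test-set scores.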

Original language: English
Title of host publication: 2022 5th International Conference on Vocational Education and Electrical Engineering
Subtitle of host publication: The Future of Electrical Engineering, Informatics, and Educational Technology Through the Freedom of Study in the Post-Pandemic Era, ICVEE 2022 - Proceeding
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 1-5
Number of pages: 5
ISBN (Electronic): 9781665475815
Publication status: Published - 2022
Event: 5th International Conference on Vocational Education and Electrical Engineering, ICVEE 2022 - Virtual, Surabaya, Indonesia
Duration: 10 Sept 2022 to 11 Sept 2022

Publication series

Name: 2022 5th International Conference on Vocational Education and Electrical Engineering: The Future of Electrical Engineering, Informatics, and Educational Technology Through the Freedom of Study in the Post-Pandemic Era, ICVEE 2022 - Proceeding

Conference

Conference: 5th International Conference on Vocational Education and Electrical Engineering, ICVEE 2022
Country/Territory: Indonesia
City: Virtual, Surabaya
Period: 10/09/22 to 11/09/22

Keywords

  • diabetes
  • ensemble learning
  • imbalanced dataset
  • resampling
