Diabetes Prediction in Machine Learning using Feature Selection: A Comparative Analysis based on Sampling Techniques

Zulchair Asy'ari*, Shintami Chusnul Hidayati, Riyanarto Sarno

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Diabetes poses a global threat to both patient well-being and healthcare resources, and prevention has become a key focus in mitigating its impact. This research uses machine learning to predict diabetes, learning patterns indicative of the disease from a dataset of diabetic and non-diabetic individuals. The study uses the Diabetes 130-US hospitals dataset, which covers the years 1999-2008. It analyzes the dataset's features, applies data preprocessing, selects relevant features, and addresses data imbalance through several sampling techniques to improve prediction accuracy. The models compared are Logistic Regression, K-Nearest Neighbors (KNN), Random Forest, and Support Vector Machine (SVM). The findings highlight the importance of feature selection in optimizing model performance, identifying the key features that improve accuracy, and assessing the effectiveness of each model. A comparative analysis of the models across sampling techniques shows that Random Forest performs best, achieving 99.8% accuracy on the raw dataset. Logistic Regression, by contrast, improves when the techniques are combined, which suggests it has room for further gains in predictive capability.
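The abstract does not name the sampling techniques that were compared. As an illustration only, the sketch below shows two common choices, random oversampling and random undersampling, applied to made-up toy labels. The function names `random_oversample` and `random_undersample` and the toy data are hypothetical and are not taken from the paper.

```python
import random
from collections import Counter


def _group_by_class(rows, labels):
    """Collect the rows that belong to each class label."""
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)
    return by_class


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen rows until every class is as large as the largest one."""
    rng = random.Random(seed)
    by_class = _group_by_class(rows, labels)
    target = max(len(group) for group in by_class.values())
    out_rows, out_labels = [], []
    for label, group in by_class.items():
        # Draw with replacement to fill the gap up to the target size.
        extra = [rng.choice(group) for _ in range(target - len(group))]
        out_rows.extend(group + extra)
        out_labels.extend([label] * target)
    return out_rows, out_labels


def random_undersample(rows, labels, seed=0):
    """Keep a random subset of each class, sized to match the smallest class."""
    rng = random.Random(seed)
    by_class = _group_by_class(rows, labels)
    target = min(len(group) for group in by_class.values())
    out_rows, out_labels = [], []
    for label, group in by_class.items():
        # Draw without replacement so no row is repeated.
        out_rows.extend(rng.sample(group, target))
        out_labels.extend([label] * target)
    return out_rows, out_labels


# Toy, made-up data (not from the paper): 8 negative and 2 positive examples.
rows = [[i] for i in range(10)]
labels = [0] * 8 + [1] * 2

over_rows, over_labels = random_oversample(rows, labels)
under_rows, under_labels = random_undersample(rows, labels)
print(Counter(over_labels))
print(Counter(under_labels))
```

In a comparison like the one described, resampling would normally be applied to the training split only, so that the evaluation data keeps its original class distribution.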

Original language: English
Title of host publication: 2024 7th International Conference on Informatics and Computational Sciences, ICICoS 2024
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 221-226
Number of pages: 6
ISBN (Electronic): 9798350375886
Publication status: Published - 2024
Event: 7th International Conference on Informatics and Computational Sciences, ICICoS 2024 - Hybrid, Semarang, Indonesia
Duration: 17 Jul 2024 → 18 Jul 2024

Publication series

Name: Proceedings - International Conference on Informatics and Computational Sciences
ISSN (Print): 2767-7087

Conference

Conference: 7th International Conference on Informatics and Computational Sciences, ICICoS 2024
Country/Territory: Indonesia
City: Hybrid, Semarang
Period: 17/07/24 → 18/07/24

Keywords

  • comparative analysis
  • diabetes prediction
  • feature selection
  • machine learning
  • sampling technique
