Analyzing Oversampling and Machine Learning Approaches for Imbalanced Dataset Classification

Dini Adni Navastara*, Chastine Fatichah, Yulia Niza, Fiqey Indriati Eka Sari, Muchamad Maroqi Abdul Jalil

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > peer-review

Abstract

Imbalanced data, characterized by a substantial difference in data distribution between majority and minority classes, poses a critical challenge in predictive modeling. This disparity often leads to the misclassification of the minority class, which may contain vital information for real-world applications. Consequently, addressing imbalanced data is paramount, given its potential repercussions in critical classification scenarios. In this study, we conducted a comprehensive analysis of oversampling and ensemble learning techniques to mitigate imbalanced data issues. Through an extensive evaluation process employing confusion matrices, we measured the performance of these methods across various binary datasets. Remarkably, our findings showcased the efficacy of these techniques, with standout results such as a recall score of 0.6883 for the Haberman's Survival dataset using the KNN classification method in conjunction with Borderline-SMOTE oversampling, a recall score of 0.8391 for the COVID19 dataset with the KNN classification method and SMOTE oversampling, and an impressive recall value of 0.9476 for the Credit Card Fraud dataset when applying the XGBoost classification method and ROS oversampling. It is important to note that the performance outcomes are intrinsically tied to the unique characteristics of each dataset. This study provides insights for handling imbalanced data based on dataset characteristics, aiding predictive modeling in real-world scenarios.
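The evaluation described in the abstract (oversample the minority class, classify, then read recall off the confusion matrix) can be illustrated with a minimal sketch. This is not the authors' code: it uses only the Python standard library, the toy data and the function names (`random_oversample`, `confusion_counts`, `recall`) are hypothetical, and only random oversampling (ROS) is shown, not SMOTE or Borderline-SMOTE.

```python
import random
from collections import Counter


def random_oversample(X, y, seed=0):
    """Random oversampling (ROS): duplicate randomly chosen rows of each
    smaller class until every class matches the largest class in size."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, n in counts.items():
        rows = [x for x, lab in zip(X, y) if lab == label]
        for _ in range(target - n):
            X_out.append(rng.choice(rows))
            y_out.append(label)
    return X_out, y_out


def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, fn, tn) for a binary problem."""
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive:
            if t == positive:
                tp += 1
            else:
                fp += 1
        else:
            if t == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, fn, tn


def recall(y_true, y_pred, positive=1):
    """Recall = TP / (TP + FN): the share of positive cases that were found."""
    tp, _, fn, _ = confusion_counts(y_true, y_pred, positive)
    return tp / (tp + fn) if (tp + fn) else 0.0


# Hypothetical toy data: 8 majority rows (label 0), 2 minority rows (label 1).
X = [[0.1], [0.2], [0.3], [0.4], [0.5], [0.6], [0.7], [0.8], [2.0], [2.1]]
y = [0] * 8 + [1] * 2
X_bal, y_bal = random_oversample(X, y, seed=42)
class_counts = Counter(y_bal)

# Hypothetical predictions, used only to show how recall is computed.
y_true = [1, 1, 1, 1, 0, 0]
y_pred = [1, 1, 1, 0, 0, 1]
toy_recall = recall(y_true, y_pred)
```

In practice, oversampling is normally applied to the training split only, so that duplicated or synthetic minority samples do not leak into the data used to compute recall.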

Original language: English
Title of host publication: 2023 IEEE 21st Student Conference on Research and Development, SCOReD 2023
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 27-32
Number of pages: 6
ISBN (Electronic): 9798350318821
Publication status: Published - 2023
Event: 21st IEEE Student Conference on Research and Development, SCOReD 2023 - Kuala Lumpur, Malaysia
Duration: 13 Dec 2023 → 14 Dec 2023

Publication series

Name: 2023 IEEE 21st Student Conference on Research and Development, SCOReD 2023

Conference

Conference: 21st IEEE Student Conference on Research and Development, SCOReD 2023
Country/Territory: Malaysia
City: Kuala Lumpur
Period: 13/12/23 → 14/12/23

Keywords

  • Classification
  • Ensemble Learning
  • Imbalanced Data
  • Machine Learning
  • Oversampling
