Hybrid method of undersampling and oversampling for handling imbalanced data

Shabrina Choirunnisa, Joko Lianto

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

24 Citations (Scopus)

Abstract

Data imbalance occurs in many kinds of data, including naturally imbalanced data. When imbalanced data are processed (for example, in clustering), the imbalance can cause misclassification because the majority class dominates the minority class, which reduces accuracy. Combining oversampling and undersampling is one way to address imbalanced cases. This study aims to address the problem of imbalanced data by combining an oversampling method with an undersampling method to obtain more representative synthetic data. The undersampling method used is the Neighborhood Cleaning Rule (NCL), and Adaptive Semi-Unsupervised Weighted Oversampling (A-SUWO) is used as the oversampling method. After undersampling and oversampling, the data are classified using the Decision Tree C4.5 and Random Forest algorithms. Performance is evaluated using precision, recall, F-measure, and accuracy.
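
The four evaluation measures named in the abstract can be illustrated with a minimal Python sketch. It is not taken from the paper: the function name `evaluate`, the `Metrics` container, and the toy labels (95 majority samples, 5 minority samples) are assumptions made for illustration only. The example shows why accuracy alone can be misleading on imbalanced data, which is the motivation for also reporting precision, recall, and F-measure.

```python
from dataclasses import dataclass


@dataclass
class Metrics:
    precision: float
    recall: float
    f_measure: float
    accuracy: float


def evaluate(y_true, y_pred, positive=1):
    """Compute the four measures, treating `positive` as the minority class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    # Guard against division by zero when a class is never predicted or absent.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f_measure = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    return Metrics(precision, recall, f_measure, accuracy)


# Hypothetical toy labels: 95 majority (0) and 5 minority (1) samples.
y_true = [0] * 95 + [1] * 5

# A degenerate classifier that always predicts the majority class.
baseline = evaluate(y_true, [0] * 100)
print(baseline)
```

Under these assumed labels, the always-majority classifier scores high accuracy while its recall on the minority class is zero, so the minority class is never detected even though the headline number looks good.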

Original language: English
Title of host publication: 2018 International Seminar on Research of Information Technology and Intelligent Systems, ISRITI 2018
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 276-280
Number of pages: 5
ISBN (Electronic): 9781538674222
Publication status: Published - Nov 2018
Event: 2018 International Seminar on Research of Information Technology and Intelligent Systems, ISRITI 2018 - Yogyakarta, Indonesia
Duration: 21 Nov 2018 → 22 Nov 2018

Publication series

Name: 2018 International Seminar on Research of Information Technology and Intelligent Systems, ISRITI 2018

Conference

Conference: 2018 International Seminar on Research of Information Technology and Intelligent Systems, ISRITI 2018
Country/Territory: Indonesia
City: Yogyakarta
Period: 21/11/18 → 22/11/18

Keywords

  • A-SUWO
  • Imbalance
  • NCL
  • Natural data
  • Oversampling
  • Undersampling
