Two-Stage Sampling: A Framework for Imbalanced Classification With Overlapped Classes

Neni Alya Firdausanti*, Tirana Noor Fatyanosa, Mahendra Data, Israel Mendonca, Masayoshi Aritsugi

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

2 Citations (Scopus)

Abstract

Class imbalance and class overlap have long been recognized as major causes of degraded classifier performance. Moreover, the majority class may contain irrelevant and noisy instances that shift the decision boundary far from the ideal one. We propose a framework that balances the class distribution and mitigates class overlap in a dataset. Its key feature is the ability to detect instances that overlap between classes and then remove the problematic ones from the majority class, which gives the oversampling method more precise information for generating synthetic minority instances. We evaluated the proposed framework on the Lending Club dataset and ten other datasets from the KEEL repository. We demonstrate implementations of the framework that use Tomek links and Edited Nearest Neighbor to remove overlapping instances from the majority class and SWIM-MD to generate synthetic minority instances. We also used eight well-known classifiers to show that the framework can improve the performance of various classifiers. Lastly, we present a detailed analysis of the experimental results, in which the proposed framework outperformed state-of-the-art methods in terms of the geometric mean (G-mean) classification metric.
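The two stages described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`enn_clean_majority`, `oversample_minority`, `g_mean`) and the toy data are hypothetical, the cleaning step is a simplified Edited Nearest Neighbor rule applied to the majority class only, and the oversampling step uses plain interpolation between minority neighbours as a stand-in for SWIM-MD, which is not reproduced here. G-mean is computed as the square root of the true positive rate times the true negative rate.

```python
import math
import random


def knn(point, data, k, skip=None):
    """Indices of the k rows of `data` nearest to `point` (Euclidean), excluding `skip`."""
    order = sorted(
        (i for i in range(len(data)) if i != skip),
        key=lambda i: math.dist(point, data[i]),
    )
    return order[:k]


def enn_clean_majority(X, y, majority, k=3):
    """Stage 1: drop majority rows whose k nearest neighbours are mostly from another class."""
    keep = []
    for i, (xi, yi) in enumerate(zip(X, y)):
        if yi == majority:
            votes = [y[j] for j in knn(xi, X, k, skip=i)]
            if sum(v == majority for v in votes) <= k / 2:
                continue  # treated as an overlapping majority instance
        keep.append(i)
    return [X[i] for i in keep], [y[i] for i in keep]


def oversample_minority(X, y, minority, n_target, k=3, seed=0):
    """Stage 2 (stand-in for SWIM-MD): interpolate between minority neighbours."""
    rng = random.Random(seed)
    pool = [x for x, label in zip(X, y) if label == minority]
    if len(pool) < 2:
        raise ValueError("need at least two minority instances to interpolate")
    kk = min(k, len(pool) - 1)
    X_out, y_out = list(X), list(y)
    for _ in range(max(0, n_target - len(pool))):
        b = rng.randrange(len(pool))
        nb = pool[rng.choice(knn(pool[b], pool, kk, skip=b))]
        t = rng.random()
        X_out.append(tuple(a + t * (c - a) for a, c in zip(pool[b], nb)))
        y_out.append(minority)
    return X_out, y_out


def g_mean(y_true, y_pred, positive):
    """Geometric mean of sensitivity (TPR) and specificity (TNR)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    pos = sum(t == positive for t in y_true)
    neg = len(y_true) - pos
    return math.sqrt((tp / pos) * (tn / neg))


# Toy data (invented for illustration): class 0 is the majority, class 1 the minority.
# The majority point (5, 5) sits inside the minority cluster.
X = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (0.5, 0.5), (5.0, 5.0),
     (5.0, 4.0), (4.0, 5.0), (6.0, 5.0), (5.0, 6.0)]
y = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]

X_clean, y_clean = enn_clean_majority(X, y, majority=0)
X_bal, y_bal = oversample_minority(X_clean, y_clean, minority=1,
                                   n_target=y_clean.count(0))
```

In this toy run the cleaning stage removes only the majority point inside the minority cluster, and the oversampling stage then adds synthetic minority points until both classes have the same size. Cleaning first means the synthetic points are generated in a region no longer contested by majority instances, which is the ordering the abstract argues for.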

Original language: English
Title of host publication: Proceedings - 2022 IEEE International Conference on Big Data, Big Data 2022
Editors: Shusaku Tsumoto, Yukio Ohsawa, Lei Chen, Dirk Van den Poel, Xiaohua Hu, Yoichi Motomura, Takuya Takagi, Lingfei Wu, Ying Xie, Akihiro Abe, Vijay Raghavan
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 271-280
Number of pages: 10
ISBN (Electronic): 9781665480451
DOIs
Publication status: Published - 2022
Externally published: Yes
Event: 2022 IEEE International Conference on Big Data, Big Data 2022 - Osaka, Japan
Duration: 17 Dec 2022 to 20 Dec 2022

Publication series

Name: Proceedings - 2022 IEEE International Conference on Big Data, Big Data 2022

Conference

Conference: 2022 IEEE International Conference on Big Data, Big Data 2022
Country/Territory: Japan
City: Osaka
Period: 17/12/22 to 20/12/22

Keywords

  • imbalance
  • machine learning
  • overlap
  • synthetic oversampling
  • undersampling
