A Cross-Sampling Method for Hidden Structure Extraction to Improve Imbalanced Multiclass Classification Accuracy

Wiyli Yustanti*, Nur Iriawan, Irhamah, I. Kadek Dwi Nuryana, Aries Dwi Indriyanti

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

1 Citation (Scopus)

Abstract

Classification problems often involve irregular data conditions. This research contributes a more structured approach to extracting hidden patterns from data sets that already have class labels. The sampling process used in this study is called the cross-sampling (CS) technique. Its basic idea is to regroup the data using an appropriate clustering method. This study applied the Mini Batch K-Means and Spectral clustering algorithms to four public datasets with an imbalanced multiclass distribution. Each resulting group is then assigned a label by reference to the original labels, using a pattern-matching concept on the first principal component obtained through PCA. The labelling results are then used as external validation against the actual data labels so that all classes are represented. The validation produces a confusion matrix, which serves as the basis for the cross-sampling method. Classification performance on the datasets before and after cross-sampling was compared using the F1-Score and Area Under Curve (AUC) measurements. Statistical hypothesis testing shows a significant difference in performance before and after the cross-sampling procedure. This difference is reflected in the accuracy of all the classification algorithms used, which increased significantly from an average of 82.09% to 96.7%.
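The regroup, relabel, and validate steps described in the abstract can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation. The abstract does not define the pattern-matching rule, so the sketch assumes a simple one: each cluster takes the label of the original class whose mean score on the first principal component is closest to its own. The function name `cluster_and_relabel`, the synthetic data, and all parameter values are hypothetical. The cross-sampling rule itself is not specified in the abstract, so the sketch stops at the confusion matrix.

```python
import numpy as np
from sklearn.cluster import MiniBatchKMeans
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA
from sklearn.metrics import confusion_matrix


def cluster_and_relabel(X, y, seed=0):
    """Regroup X, then name each cluster after the class nearest to it on PC1."""
    classes = np.unique(y)
    # Regroup the data without looking at the original labels.
    clusters = MiniBatchKMeans(
        n_clusters=len(classes), n_init=3, random_state=seed
    ).fit_predict(X)
    # Project every sample onto the first principal component.
    pc1 = PCA(n_components=1).fit_transform(X).ravel()
    # ASSUMED matching rule: compare mean PC1 scores of clusters and classes.
    class_means = np.array([pc1[y == c].mean() for c in classes])
    relabelled = np.empty_like(y)
    for k in np.unique(clusters):
        members = clusters == k
        nearest = np.argmin(np.abs(class_means - pc1[members].mean()))
        relabelled[members] = classes[nearest]
    # Rows: original labels; columns: labels derived from the clusters.
    cm = confusion_matrix(y, relabelled, labels=classes)
    return relabelled, cm


# Synthetic imbalanced three-class data (not one of the paper's datasets).
X, y = make_blobs(n_samples=[200, 60, 20], n_features=4, random_state=0)
relabelled, cm = cluster_and_relabel(X, y)
print(cm)
```

Under this assumed rule, two clusters can map to the same class, which leaves another class without a matching cluster. A one-to-one assignment would be needed to guarantee that every class is represented.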

Original language: English
Title of host publication: 2023 6th International Conference on Vocational Education and Electrical Engineering
Subtitle of host publication: Integrating Scalable Digital Connectivity, Intelligence Systems, and Green Technology for Education and Sustainable Community Development, ICVEE 2023 - Proceeding
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 353-358
Number of pages: 6
ISBN (Electronic): 9798350326642
Publication status: Published - 2023
Event: 6th International Conference on Vocational Education and Electrical Engineering, ICVEE 2023 - Hybrid, Surabaya, Indonesia
Duration: 14 Oct 2023 to 15 Oct 2023

Publication series

Name: 2023 6th International Conference on Vocational Education and Electrical Engineering: Integrating Scalable Digital Connectivity, Intelligence Systems, and Green Technology for Education and Sustainable Community Development, ICVEE 2023 - Proceeding

Conference

Conference: 6th International Conference on Vocational Education and Electrical Engineering, ICVEE 2023
Country/Territory: Indonesia
City: Hybrid, Surabaya
Period: 14/10/23 to 15/10/23

Keywords

  • AUC
  • F1-Score
  • classification
  • clustering
  • cross-sampling
  • imbalanced
  • multiclass
  • non-separable
  • structure extraction
