Combine Sampling Support Vector Machine for Imbalanced Data Classification

Hartayuni Sain, Santi Wulan Purnami*

*Corresponding author for this work

Research output: Contribution to journal › Conference article › peer-review

48 Citations (Scopus)

Abstract

Imbalanced data refers to a dataset in which one class is larger than another: the larger class is called the majority or negative class, and the smaller class is called the minority or positive class. This condition is considered a problem in data classification because most classifiers tend to predict the majority class and ignore the minority class, so the analysis yields poor accuracy for the minority class. At the data level, the basic sampling-based approaches to this classification problem are undersampling and oversampling. The synthetic minority oversampling technique (SMOTE) is an oversampling method that increases the number of positive-class samples, using a sample-drawing technique that randomly replicates data until the positive class is as large as the negative class. Tomek links is an undersampling method that works by decreasing the number of negative-class samples. In this research, combine sampling was carried out by combining the SMOTE and Tomek links techniques, with SVM as the binary classification method. Based on the accuracy rates in this study, the combine sampling method gave better results than SMOTE or Tomek links alone under 5-fold cross-validation. However, in some extreme cases the combine sampling method is no better than Tomek links alone.
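
The resampling step described in the abstract can be illustrated with a minimal, pure-Python sketch. It is not the authors' implementation. Several things here are assumptions: the function names (`smote`, `tomek_majority_indices`, `combine_sampling`) and the toy data are hypothetical; the oversampling follows the commonly described form of SMOTE, which interpolates between a minority point and one of its nearest minority neighbours rather than copying points exactly; and only the majority member of each Tomek link is removed, in line with the abstract's description of Tomek links as reducing the negative class. SVM training and the 5-fold cross-validation are left out.

```python
import math
import random

def nearest(idx, points, candidates):
    """Index in `candidates` whose point is closest to points[idx] (excluding idx)."""
    return min((j for j in candidates if j != idx),
               key=lambda j: math.dist(points[idx], points[j]))

def smote(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points, each on the segment between a random
    minority point and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbours = sorted((p for p in minority if p is not base),
                            key=lambda p: math.dist(base, p))[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(tuple(b + gap * (o - b) for b, o in zip(base, other)))
    return synthetic

def tomek_majority_indices(X, y, majority_label=0):
    """Indices of majority points that are in a Tomek link: a pair of points
    from different classes that are each other's nearest neighbour."""
    all_idx = range(len(X))
    nn = [nearest(i, X, all_idx) for i in all_idx]
    return {i for i in all_idx
            if y[i] == majority_label
            and y[nn[i]] != majority_label
            and nn[nn[i]] == i}

def combine_sampling(X, y, minority_label=1, majority_label=0, k=3, seed=0):
    """Oversample the minority class to the majority size (binary labels
    assumed), then drop majority points that form Tomek links."""
    minority = [x for x, lab in zip(X, y) if lab == minority_label]
    n_new = max((len(X) - len(minority)) - len(minority), 0)
    X_res = list(X) + smote(minority, n_new, k, seed)
    y_res = list(y) + [minority_label] * n_new
    drop = tomek_majority_indices(X_res, y_res, majority_label)
    keep = [i for i in range(len(X_res)) if i not in drop]
    return [X_res[i] for i in keep], [y_res[i] for i in keep]

# Toy data (invented for illustration): 8 negative points, 3 positive points.
X = [(0.0, 0.0), (0.5, 0.2), (1.0, 0.1), (0.2, 1.0), (1.2, 1.1),
     (0.8, 0.7), (1.5, 0.4), (2.6, 2.7),
     (3.0, 3.0), (3.4, 2.8), (2.9, 3.5)]
y = [0] * 8 + [1] * 3

X_res, y_res = combine_sampling(X, y)
print("negative:", y_res.count(0), "positive:", y_res.count(1))
```

The resampled `X_res`, `y_res` would then be passed to a binary classifier such as an SVM. In a cross-validation setting, resampling is normally applied to the training folds only, so that the held-out fold keeps its original class distribution.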

Original language: English
Pages (from-to): 59-66
Number of pages: 8
Journal: Procedia Computer Science
Volume: 72
Publication status: Published - 2015
Event: 3rd Information Systems International Conference, 2015 - Shenzhen, China
Duration: 16 Apr 2015 → 18 Apr 2015

Keywords

  • Classification
  • Imbalanced data
  • Support Vector Machine (SVM)
  • Synthetic Minority Oversampling Technique (SMOTE)
  • Tomek Links
