Boosting Support Vector Machines for Imbalanced Microarray Data

Risky Frasetio Wahyu Pratama, Santi Wulan Purnami*, Santi Puteri Rahayu

*Corresponding author for this work

Research output: Contribution to journal › Conference article › peer-review

8 Citations (Scopus)

Abstract

Microarray data play an important role in detecting and classifying almost all types of cancer tissue. The gene expression profiles produced by microarray technology carry information from genes, which is then matched to a specific cancer condition. Two problems often arise when classifying microarray data: high dimensionality and class imbalance. High dimensionality can be addressed with Fast Correlation-Based Filter (FCBF) feature selection. In this paper, a Support Vector Machine (SVM) classifier is used because of its advantages. However, some studies report that almost all classifier models, including SVM, are sensitive to class imbalance. The Synthetic Minority Oversampling Technique (SMOTE) is a sampling-based data preprocessing method that handles class imbalance by increasing the number of samples in the minority class. This method often works well but can sometimes suffer from overfitting. An alternative approach to improving classification performance on imbalanced data is boosting, which constructs a powerful final classifier by combining a set of SVMs, used as base classifiers, over the boosting iterations. In this study, colon cancer and myeloma data are used in the analysis. The results show that SMOTEBoost with SVM as the base classifier outperforms SVM, SMOTE-SVM, and AdaBoost with SVM as the base classifier in terms of the G-mean metric.
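As a rough illustration of two ideas named in the abstract, the sketch below shows SMOTE-style oversampling and the G-mean metric in plain Python. It is not the authors' implementation. The function names `smote` and `g_mean`, the parameters `k` and `seed`, and the toy data are hypothetical, and G-mean is assumed to be the usual square root of sensitivity times specificity, which the abstract itself does not spell out.

```python
import math
import random


def g_mean(y_true, y_pred, positive=1):
    """Geometric mean of sensitivity and specificity for binary labels.

    Assumed definition: sqrt(sensitivity * specificity).
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return math.sqrt(sensitivity * specificity)


def smote(minority, n_new, k=2, seed=0):
    """Create n_new synthetic minority samples.

    Each synthetic point lies on the line segment between a randomly
    chosen minority sample and one of its k nearest minority neighbours.
    Requires at least two minority samples.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the chosen sample (Euclidean)
        neighbours = sorted(
            (x for x in minority if x is not base),
            key=lambda x: math.dist(base, x),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


# Toy data: a classifier that always predicts the majority class
# reaches 80% accuracy here, yet its G-mean is zero.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
always_majority = [0] * 10
print(g_mean(y_true, always_majority))

# Three hypothetical minority samples expanded with five synthetic ones.
minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
print(smote(minority, n_new=5))
```

The toy example suggests why a metric such as G-mean is preferred over plain accuracy on imbalanced data: ignoring the minority class entirely drives sensitivity, and therefore G-mean, to zero.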

Original language: English
Pages (from-to): 174-183
Number of pages: 10
Journal: Procedia Computer Science
Volume: 144
Publication status: Published - 2018
Event: 3rd International Neural Network Society Conference on Big Data and Deep Learning, INNS BDDL 2018 - Sanur, Bali, Indonesia
Duration: 17 Apr 2018 → 19 Apr 2018

Keywords

  • AdaBoost
  • Imbalanced Class
  • Microarray Data
  • SMOTE
  • SMOTEBoost
  • SVM
