Abstract
Microarray data play an important role in detecting and classifying almost all types of cancer tissue. Gene expression profiles produced by microarray technology carry information from genes, which is then matched to a specific cancer condition. Two problems often arise when classifying microarray data: high dimensionality and class imbalance. High dimensionality can be addressed with Fast Correlation-Based Filter (FCBF) feature selection. In this paper, a Support Vector Machine (SVM) classifier is used because of its advantages. However, some studies report that almost all classifiers, including SVM, are sensitive to class imbalance. Synthetic Minority Oversampling Technique (SMOTE) is a sampling-based data preprocessing method that handles class imbalance by increasing the number of minority-class samples. It often works well but can sometimes suffer from over-fitting. An alternative approach to improving classification performance on imbalanced data is boosting, which constructs a powerful final classifier by combining a set of SVMs, used as base classifiers, over successive iterations. In this study, colon cancer and myeloma data are used in the analysis. The results show that, in terms of the G-mean metric, SMOTEBoost with SVM as the base classifier outperforms SVM, SMOTE-SVM, and AdaBoost with SVM as the base classifier.
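To make two of the ideas in the abstract concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes binary labels and the common definition of G-mean as the geometric mean of sensitivity and specificity, and it illustrates SMOTE-style oversampling as interpolation between a minority sample and one of its nearest minority neighbours. The function names `g_mean` and `smote_sample` and the parameters `k`, `n_new`, and `seed` are hypothetical and chosen only for illustration.

```python
import math
import random


def g_mean(y_true, y_pred, positive=1):
    """Geometric mean of sensitivity and specificity for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return math.sqrt(sensitivity * specificity)


def smote_sample(minority, k=5, n_new=10, seed=0):
    """Create n_new synthetic points, each lying on the segment between a
    randomly chosen minority sample and one of its k nearest minority
    neighbours (Euclidean distance). Needs at least two minority samples."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        x = minority[i]
        others = [m for j, m in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda m: math.dist(x, m))[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([xi + gap * (ni - xi) for xi, ni in zip(x, nb)])
    return synthetic


if __name__ == "__main__":
    # Toy labels: 2 minority (1) and 6 majority (0) samples.
    y_true = [1, 1, 0, 0, 0, 0, 0, 0]
    always_majority = [0] * 8
    # Accuracy would be 0.75 here, but G-mean exposes the missed minority class.
    print(g_mean(y_true, always_majority))

    minority = [[1.0, 2.0], [1.5, 1.8], [2.0, 2.2]]
    print(smote_sample(minority, k=2, n_new=3))
```

A classifier that always predicts the majority class gets a G-mean of zero regardless of its accuracy, which is why G-mean is a natural metric for imbalanced data.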
Original language | English |
---|---|
Pages (from-to) | 174-183 |
Number of pages | 10 |
Journal | Procedia Computer Science |
Volume | 144 |
Publication status | Published - 2018 |
Event | 3rd International Neural Network Society Conference on Big Data and Deep Learning, INNS BDDL 2018 - Sanur, Bali, Indonesia. Duration: 17 Apr 2018 → 19 Apr 2018 |
Keywords
- AdaBoost
- Imbalanced Class
- Microarray Data
- SMOTE
- SMOTEBoost
- SVM