TY - GEN
T1 - Support vector machine for imbalanced microarray dataset classification using ant colony optimization and genetic algorithm
AU - Nurlaily, Diana
AU - Irhamah
AU - Purnami, Santi Wulan
AU - Kuswanto, Heri
N1 - Publisher Copyright: © 2019 Author(s).
PY - 2019/12/18
Y1 - 2019/12/18
N2 - A microarray dataset contains a series of samples in which the number of variables reaches thousands of gene expression values. DNA microarrays are used to determine gene expression levels and gene sequences in a sample. In cancer research, microarrays are used to study molecular variation between tumors in order to develop better diagnosis and treatment of the disease. Classification is an important method in microarray research for classifying gene expression data. Microarray datasets are typically high-dimensional and imbalanced, and these characteristics cause classification models to overfit. The purpose of this study is to overcome that problem through variable selection and synthetic data generation. The variable selection method is Ant Colony Optimization (ACO), which is compared with the Genetic Algorithm (GA). ACO was inspired by the behavior of ant colonies searching for the shortest path between the nest and food sources. The method used to address class imbalance is the Synthetic Minority Oversampling Technique (SMOTE), which randomly generates synthetic data for the minority class. In this study, the Support Vector Machine (SVM) is used to classify the microarray datasets. The study uses breast cancer and lymphoma datasets, which differ in imbalance ratio and number of variables. The results show that variable selection with ACO selects fewer variables and yields a higher AUC than GA, but GA is more efficient in running time. SVM with SMOTE performs better than SVM without SMOTE.
AB - A microarray dataset contains a series of samples in which the number of variables reaches thousands of gene expression values. DNA microarrays are used to determine gene expression levels and gene sequences in a sample. In cancer research, microarrays are used to study molecular variation between tumors in order to develop better diagnosis and treatment of the disease. Classification is an important method in microarray research for classifying gene expression data. Microarray datasets are typically high-dimensional and imbalanced, and these characteristics cause classification models to overfit. The purpose of this study is to overcome that problem through variable selection and synthetic data generation. The variable selection method is Ant Colony Optimization (ACO), which is compared with the Genetic Algorithm (GA). ACO was inspired by the behavior of ant colonies searching for the shortest path between the nest and food sources. The method used to address class imbalance is the Synthetic Minority Oversampling Technique (SMOTE), which randomly generates synthetic data for the minority class. In this study, the Support Vector Machine (SVM) is used to classify the microarray datasets. The study uses breast cancer and lymphoma datasets, which differ in imbalance ratio and number of variables. The results show that variable selection with ACO selects fewer variables and yields a higher AUC than GA, but GA is more efficient in running time. SVM with SMOTE performs better than SVM without SMOTE.
UR - http://www.scopus.com/inward/record.url?scp=85077684913&partnerID=8YFLogxK
U2 - 10.1063/1.5139808
DO - 10.1063/1.5139808
M3 - Conference contribution
AN - SCOPUS:85077684913
T3 - AIP Conference Proceedings
BT - 2nd International Conference on Science, Mathematics, Environment, and Education
A2 - Indriyanti, Nurma Yunita
A2 - Ramli, Murni
A2 - Nurhasanah, Farida
PB - American Institute of Physics Inc.
T2 - 2nd International Conference on Science, Mathematics, Environment, and Education, ICoSMEE 2019
Y2 - 26 July 2019 through 28 July 2019
ER -