TY - JOUR
T1 - Comparison of Feature Selection Methods to Classify Inhibitors in DUD-E Database
AU - Kuswanto, Heri
AU - Nurhidayah, Renny Yunia
AU - Ohwada, Hayato
N1 - Publisher Copyright: © 2018 The Authors. Published by Elsevier Ltd.
PY - 2018
Y1 - 2018
N2 - In designing a new drug, an inhibitor compound is usually used to control enzyme activity in order to treat a particular disease. In drug design, inhibitors are classified using docking software that simulates the binding of a compound (a new inhibitor candidate) to the targeted enzyme. DUD-E is a database for docking simulation whose data are high-dimensional, which makes a machine learning approach feasible as the analytical tool. A compound can be classified as a ligand or a decoy from its characteristics, but using many characteristics creates a problem for the machine learning algorithm. This paper discusses feature selection analysis to identify the compound characteristics that effectively distinguish ligands from decoys. This paper examines Mutual Information-based Feature Selection (MIFS), Correlation-based Feature Selection (CFS), and Fast Correlation-Based Filter (FCBF); the results show that FCBF always selects the fewest features and gives the fastest classification runtime. The highest classification accuracy is obtained when k-NN uses all features; however, the accuracy differs only slightly from that of classification using selected features. CFS performs well on Data-A with an accuracy of 89.55%, while MIFS outperforms the other methods on Data-B and Data-C with classification accuracies of 92.34% and 95.20%, respectively.
AB - In designing a new drug, an inhibitor compound is usually used to control enzyme activity in order to treat a particular disease. In drug design, inhibitors are classified using docking software that simulates the binding of a compound (a new inhibitor candidate) to the targeted enzyme. DUD-E is a database for docking simulation whose data are high-dimensional, which makes a machine learning approach feasible as the analytical tool. A compound can be classified as a ligand or a decoy from its characteristics, but using many characteristics creates a problem for the machine learning algorithm. This paper discusses feature selection analysis to identify the compound characteristics that effectively distinguish ligands from decoys. This paper examines Mutual Information-based Feature Selection (MIFS), Correlation-based Feature Selection (CFS), and Fast Correlation-Based Filter (FCBF); the results show that FCBF always selects the fewest features and gives the fastest classification runtime. The highest classification accuracy is obtained when k-NN uses all features; however, the accuracy differs only slightly from that of classification using selected features. CFS performs well on Data-A with an accuracy of 89.55%, while MIFS outperforms the other methods on Data-B and Data-C with classification accuracies of 92.34% and 95.20%, respectively.
KW - DUD-E
KW - accuracy
KW - feature
KW - k-NN
KW - runtime
UR - http://www.scopus.com/inward/record.url?scp=85061115770&partnerID=8YFLogxK
U2 - 10.1016/j.procs.2018.10.519
DO - 10.1016/j.procs.2018.10.519
M3 - Conference article
AN - SCOPUS:85061115770
SN - 1877-0509
VL - 144
SP - 194
EP - 202
JO - Procedia Computer Science
JF - Procedia Computer Science
T2 - 3rd International Neural Network Society Conference on Big Data and Deep Learning, INNS BDDL 2018
Y2 - 17 April 2018 through 19 April 2018
ER -