TY - GEN
T1 - An empirical study of classifier behavior in Rattle tool
AU - Wibowo, Wahyu
AU - Abdul-Rahman, Shuzlina
N1 - Publisher Copyright: © Springer Nature Singapore Pte Ltd. 2019.
PY - 2019
Y1 - 2019
N2 - Many factors influence classifier behavior in machine learning, so determining the best classifier is not an easy task. One way of tackling this problem is to evaluate the classifiers against several performance measures. In this paper, the behavior of machine learning classifiers is examined using the Rattle tool. Rattle is a graphical user interface (GUI) package for R used to carry out data mining modeling with the following classifiers: tree, boost, random forest, support vector machine, logit and neural net. The study was conducted on simulated and real data, and the behavior of the classifiers was observed in terms of accuracy, ROC curve and modeling time. On the simulated data, the algorithms fall into three groups by accuracy: the first is logit, neural net and support vector machine; the second is boost and random forest; and the third is decision tree. On the real data, the boost algorithm has the highest accuracy on the training data and the neural net algorithm has the highest accuracy on the testing data. Overall, the support vector machine and neural net classifiers are the two best classifiers on both simulated and real data.
KW - Accuracy
KW - Classifier
KW - Empirical data
KW - Machine learning
UR - http://www.scopus.com/inward/record.url?scp=85059100566&partnerID=8YFLogxK
U2 - 10.1007/978-981-13-3441-2_25
DO - 10.1007/978-981-13-3441-2_25
M3 - Conference contribution
AN - SCOPUS:85059100566
SN - 9789811334405
T3 - Communications in Computer and Information Science
SP - 322
EP - 334
BT - Soft Computing in Data Science - 4th International Conference, SCDS 2018, Proceedings
A2 - Yap, Bee Wah
A2 - Mohamed, Azlinah Hj
A2 - Berry, Michael W.
PB - Springer Verlag
T2 - 4th International Conference on Soft Computing in Data Science, SCDS 2018
Y2 - 15 August 2018 through 16 August 2018
ER -