TY - JOUR
T1 - Classification model based on URL and content feature approach for detection phishing website in Indonesia
AU - Purwiantono, Febry Eka
AU - Tjahyanto, Aris
N1 - Publisher Copyright: © 2005 - Ongoing JATIT & LLS.
PY - 2017/9/15
Y1 - 2017/9/15
AB - This research proposes a classification model for accurately detecting phishing websites. It takes Indonesia as a case study: the data consist of sites that use Bahasa Indonesia, are hosted in Indonesia, and are frequently accessed by Internet users in Indonesia. The dataset comprises approximately 102 authentic websites and 364 phishing websites. The proposed detection technique analyzes websites using a URL- and content-feature-based approach. The classification model combines several heterogeneous features from previous research with newly proposed URL and content features, which are expected to improve detection performance compared with previous research. In addition, a web crawler was created to extract the feature vectors. Four algorithms are evaluated: Sequential Minimal Optimization (SMO), Naive Bayes, Bagging and Multilayer Perceptron, which achieve accuracies of approximately 89.27%, 93.78%, 95.49% and 92.70%, respectively. Bagging, the most accurate algorithm, is used in the proposed classification model for comparison with the classification model from previous research on the same dataset. The proposed model reaches 95.49% accuracy, outperforming the previous model, which yielded only 89.70% accuracy, by 5.79 percentage points.
KW - Classification model
KW - Detection
KW - Feature
KW - Indonesia
KW - Phishing website
UR - http://www.scopus.com/inward/record.url?scp=85029710829&partnerID=8YFLogxK
M3 - Article
AN - SCOPUS:85029710829
SN - 1992-8645
VL - 95
SP - 4181
EP - 4191
JO - Journal of Theoretical and Applied Information Technology
JF - Journal of Theoretical and Applied Information Technology
IS - 17
ER -