TY - JOUR
T1 - Clustering under-sampling data for improving the performance of intrusion detection system
AU - Aziz, Mohammad Nasrul
AU - Ahmad, Tohari
N1 - Publisher Copyright: © School of Engineering, Taylor's University.
PY - 2021/4
Y1 - 2021/4
AB - The rapid development of information technology has made information security and computer networks essential. One way to protect these resources is an Intrusion Detection System (IDS), which recognizes abnormal packets among incoming data. In this study, we improve its detection capability using a machine learning-based data mining approach, which requires suitable training data to produce a useful detection model. Preprocessing, for example by removing noise, is one way to raise the quality of the training data. We cluster the majority-class data with k-means and apply an appropriate threshold to recognize noise: clusters with a value below the threshold are treated as noise, so the resulting majority class should no longer contain noise. This new majority class is then combined with the minority class to form a new training data set. The method is evaluated with several classifiers, namely Naive Bayes (NB), k-Nearest Neighbor (k-NN), Support Vector Machine (SVM), Multi-Layer Perceptron (MLP), and Random Forest (RF), on the NSL-KDD and UNSW-NB15 datasets. The results show that the proposed method can improve performance, with the best improvement achieved by the NB classifier: from 88.60% to 88.85% on NSL-KDD and from 76.04% to 92.57% on UNSW-NB15.
KW - Classification
KW - Computer security
KW - Intrusion detection system
KW - Machine learning
KW - Network security
KW - Undersampling
UR - http://www.scopus.com/inward/record.url?scp=85104191164&partnerID=8YFLogxK
M3 - Article
AN - SCOPUS:85104191164
SN - 1823-4690
VL - 16
SP - 1342
EP - 1355
JO - Journal of Engineering Science and Technology
JF - Journal of Engineering Science and Technology
IS - 2
ER -