Clustering under-sampling data for improving the performance of intrusion detection system

Mohammad Nasrul Aziz, Tohari Ahmad*

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

4 Citations (Scopus)


The rapid development of information technology has made the security of information and computer networks essential. One way to protect these resources is an Intrusion Detection System (IDS), which recognizes abnormal packets among incoming data. In this study, we improve its detection capability by exploring a machine learning-based data mining approach. Such an approach needs proper training data to produce a useful detection model. Preprocessing, for example by removing noise, is one way to raise the quality of the training data. We cluster the majority-class data with k-means and identify noise by applying an appropriate threshold: clusters with a value below the threshold are treated as noise, so that the new majority class should no longer contain noise. This majority class is then combined with the minority class to form a new training data set. The method is tested with several classifiers, namely Naive Bayes (NB), k-Nearest Neighbor (k-NN), Support Vector Machine (SVM), Multi-Layer Perceptron (MLP) and Random Forest (RF), on the NSL-KDD and UNSW-NB15 datasets. The results show that the proposed method improves performance, with the largest improvement obtained by the NB classifier: from 88.60% to 88.85% on NSL-KDD and from 76.04% to 92.57% on UNSW-NB15.
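
The preprocessing step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and it rests on several assumptions the abstract does not confirm: the cluster "value" compared against the threshold is taken to be the cluster's share of the majority class, centroids are seeded with a deterministic farthest-first rule, and the names `remove_noise_clusters`, `build_training_set`, `k` and `min_fraction` are hypothetical.

```python
from collections import Counter


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length feature tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def farthest_first_centroids(points, k):
    """Deterministic seeding (an assumption, not from the paper): start at
    points[0], then repeatedly add the point farthest from the chosen ones."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(squared_distance(p, c) for c in centroids))
        )
    return centroids


def kmeans_labels(points, k, iterations=20):
    """Plain Lloyd's k-means; returns one cluster index per point."""
    centroids = farthest_first_centroids(points, k)
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels


def remove_noise_clusters(majority, k, min_fraction):
    """Drop majority-class clusters holding less than min_fraction of the class.
    Using cluster size as the thresholded 'value' is an assumption."""
    labels = kmeans_labels(majority, k)
    sizes = Counter(labels)
    cutoff = min_fraction * len(majority)
    return [p for p, lab in zip(majority, labels) if sizes[lab] >= cutoff]


def build_training_set(majority, minority, k=2, min_fraction=0.2):
    """Combine the cleaned majority class with the untouched minority class."""
    cleaned = remove_noise_clusters(majority, k, min_fraction)
    return [(p, "majority") for p in cleaned] + [(p, "minority") for p in minority]
```

Calling `build_training_set(majority, minority)` with lists of feature tuples returns `(features, label)` pairs that could be passed to any of the classifiers listed above. Only the majority class is clustered and filtered, so the minority class keeps all of its samples.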

Original language: English
Pages (from-to): 1342-1355
Number of pages: 14
Journal: Journal of Engineering Science and Technology
Issue number: 2
Publication status: Published - Apr 2021


  • Classification
  • Computer security
  • Intrusion detection system
  • Machine learning
  • Network security
  • Undersampling

