TY - GEN
T1 - A Review of Imbalanced Datasets and Resampling Techniques in Network Intrusion Detection System
AU - Rajasa, Mahesa Cadi
AU - Rahma, Fayruz
AU - Rachmadi, Reza Fuad
AU - Pratomo, Baskoro Adi
AU - Purnomo, Mauridhi Hery
N1 - Publisher Copyright:
© 2023 IEEE.
PY - 2023
Y1 - 2023
AB - The rapid development of network connections and the widespread use of Internet of Things (IoT) devices have increased network traffic. This surge in traffic has created new vulnerabilities in cyberspace, leaving networks exposed to cyber-attacks. To address this challenge, researchers have turned to intelligent techniques, especially machine learning and deep learning, to improve the detection of attacks in network traffic. However, a common problem arises: data imbalance, where normal samples occur far more often than attack samples, which degrades the classification performance of machine learning and deep learning models. This study conducted a systematic literature review to identify the imbalanced datasets used in network intrusion detection research and the resampling techniques applied to address their imbalance. We found four widely used imbalanced datasets: NSL-KDD, CIC-IDS2017, UNSW-NB15, and KDD-Cup 1999. Researchers have used three resampling approaches to tackle the imbalance problem: oversampling, undersampling, and hybrid sampling (combining oversampling and undersampling). By applying resampling techniques, researchers and practitioners can improve the security and efficiency of attack detection in network traffic.
KW - imbalanced data
KW - network intrusion detection
KW - resampling techniques
UR - http://www.scopus.com/inward/record.url?scp=85186527787&partnerID=8YFLogxK
DO - 10.1109/ICITDA60835.2023.10427217
M3 - Conference contribution
AN - SCOPUS:85186527787
BT - ICITDA 2023 - Proceedings of the 2023 8th International Conference on Information Technology and Digital Applications
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 8th International Conference on Information Technology and Digital Applications, ICITDA 2023
Y2 - 17 November 2023 through 18 November 2023
ER -