The Effects of Feature Selection and Balancing Dataset to Improve IoT-Based IDS Using Machine Learning

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > peer-review

Abstract

The volume of data has grown dramatically over the past decade, which complicates classification, particularly when the class distribution is uneven, that is, when one majority class has more instances than the others. In this situation, standard classifiers tend to assign all samples to the majority class and may disregard the minority class entirely. Imbalanced datasets are common in intrusion detection system (IDS) research because benign traffic naturally outnumbers cyber threats. Researchers often address this challenge with resampling, most commonly oversampling with SMOTE. Other resampling approaches, namely undersampling and combined sampling, are rarely implemented. In this work, we balance the RT IoT 2022 dataset in combination with wrapper feature selection and k-fold cross-validation. The results were then evaluated using several machine learning classifiers: K-nearest neighbors, Naive Bayes, Decision Tree, Random Forest, Support Vector Machines, and Adaptive Boosting. The results indicate that, in terms of accuracy, Random Forest outperforms the other classifiers in the oversampling, undersampling, and combined-sampling experiments, at 99.09%, 98.65%, and 99.97%, respectively.
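
The two ideas in the abstract, that a majority-class predictor can look accurate while missing every minority sample, and that resampling changes the class counts, can be illustrated with a minimal sketch. This is not the authors' implementation: it uses only the Python standard library, and the toy labels, the function names (`majority_baseline_metrics`, `smote_like_oversample`, `random_undersample`), and the neighbour count `k` are all hypothetical. The oversampler is a simplified SMOTE-style interpolation, not the full algorithm.

```python
import random
from collections import Counter


def majority_baseline_metrics(labels, minority):
    """Accuracy and minority recall of a model that always predicts the majority class."""
    majority = Counter(labels).most_common(1)[0][0]
    preds = [majority] * len(labels)
    accuracy = sum(p == y for p, y in zip(preds, labels)) / len(labels)
    minority_total = sum(y == minority for y in labels)
    minority_hits = sum(p == minority and y == minority for p, y in zip(preds, labels))
    recall = minority_hits / minority_total if minority_total else 0.0
    return accuracy, recall


def smote_like_oversample(minority_points, n_new, k=3, rng=None):
    """Simplified SMOTE-style oversampling (assumes at least two minority points).

    Each synthetic point lies on the segment between a randomly chosen minority
    point and one of its k nearest minority neighbours.
    """
    rng = rng or random.Random(0)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority_points))
        base = minority_points[i]
        others = [p for j, p in enumerate(minority_points) if j != i]
        # Sort the remaining minority points by squared Euclidean distance to base.
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(p, base)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(base, neighbour)))
    return synthetic


def random_undersample(majority_points, n_keep, rng=None):
    """Keep a random subset of the majority class, without replacement."""
    rng = rng or random.Random(0)
    return rng.sample(majority_points, n_keep)


# Hypothetical toy data: 95 benign flows and 5 attack flows.
labels = ["benign"] * 95 + ["attack"] * 5
accuracy, attack_recall = majority_baseline_metrics(labels, "attack")
print(f"majority baseline: accuracy={accuracy:.2f}, attack recall={attack_recall:.2f}")

# Hypothetical two-feature minority samples, oversampled to add 10 synthetic points.
attacks = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (0.5, 0.5)]
extra_attacks = smote_like_oversample(attacks, n_new=10, k=2)
print(f"attack samples after oversampling: {len(attacks) + len(extra_attacks)}")
```

A combined strategy would apply both steps, undersampling the majority class and oversampling the minority class, before training. In practice, a maintained library such as imbalanced-learn would normally replace these hand-written helpers.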

Original language: English
Title of host publication: 2024 IEEE 3rd Industrial Electronics Society Annual On-Line Conference, ONCON 2024
Publisher: Institute of Electrical and Electronics Engineers Inc.
ISBN (Electronic): 9798331540319
Publication status: Published - 2024
Event: 3rd IEEE Industrial Electronics Society Annual On-Line Conference, ONCON 2024 - Beijing, China
Duration: 8 Dec 2024 to 10 Dec 2024

Publication series

Name: 2024 IEEE 3rd Industrial Electronics Society Annual On-Line Conference, ONCON 2024

Conference

Conference: 3rd IEEE Industrial Electronics Society Annual On-Line Conference, ONCON 2024
Country/Territory: China
City: Beijing
Period: 8/12/24 to 10/12/24

Keywords

  • IDS
  • RT IoT 2022
  • SMOTE
  • cross-validation
  • imbalanced dataset
  • machine learning
