Abstract
Brute force attacks are among the most significant and persistent cybersecurity threats. Protocols such as Secure Shell (SSH) and File Transfer Protocol (FTP) are primary targets because they are vulnerable to repeated login attempts. There is a growing need for detection mechanisms that are not only accurate but also robust when the dataset is unclean, that is, when its features or its labels have been manipulated. Here, features include attributes such as network addresses and timestamps, and labels denote attack status: (1) benign, (2) SSH brute force, and (3) FTP brute force. Existing methods typically address unclean datasets by considering only one aspect, either the features or the labels, without considering the correlation between them. Although attacks rarely target both features and labels, such combined attempts remain possible. To simulate this condition, our work applies label flipping to the data labels and perturbation-based adversarial attacks using the Fast Gradient Sign Method (FGSM) to the data features. These manipulation steps produce unclean datasets. This study aims to evaluate the robustness of deep learning models in detecting brute force attacks on SSH and FTP protocols when tested with unclean datasets, whether only the labels or both the features and the labels are manipulated. Because brute force attacks unfold over a short time, the feature set is large, and detection must be fast, our model reduces the data dimension using a stacked autoencoder (SAE). The evaluated models include deep learning approaches such as Convolutional Neural Network (CNN), Long Short-Term Memory (LSTM), and transformer, as well as baseline models for performance comparison, including traditional and ensemble machine learning methods. The experiments follow a three-phase design.
Phase-0 compares model performance and computational time on the original data and on data reduced by the stacked autoencoder. Phase-1 evaluates how label flipping applied to the training data at varying percentages (0.01% to 10%) affects model performance. Phase-2 introduces a combined attack scenario, applying label flipping to the training labels and the FGSM attack to feature values in the test data. The experimental results show that the stacked autoencoder reduces deep learning computation time by a factor of three to five. In Phase-1, ensemble learning achieved the best performance. In Phase-2, however, deep learning models demonstrated greater robustness than the baseline models, indicating that deep learning architectures are better at handling datasets in which both features and labels are unclean.
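The two manipulations described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The function names `flip_labels` and `fgsm`, the zero-based class indices (0 = benign, 1 = SSH, 2 = FTP), the toy weights, and the value of `epsilon` are all assumptions made for illustration. A linear softmax classifier stands in for the CNN, LSTM, and transformer models so that the FGSM gradient can be written out by hand.

```python
import math
import random


def flip_labels(labels, rate, classes, seed=0):
    """Return a copy of `labels` with a fraction `rate` moved to a different class."""
    rng = random.Random(seed)
    flipped = list(labels)
    n_flip = round(rate * len(flipped))
    # Sample distinct positions so exactly n_flip labels change.
    for i in rng.sample(range(len(flipped)), n_flip):
        flipped[i] = rng.choice([c for c in classes if c != flipped[i]])
    return flipped


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fgsm(x, y, weights, bias, epsilon):
    """FGSM on a linear softmax classifier: x_adv = x + epsilon * sign(dLoss/dx)."""
    logits = [sum(w * xi for w, xi in zip(row, x)) + b
              for row, b in zip(weights, bias)]
    probs = softmax(logits)
    # Cross-entropy gradient w.r.t. the logits is (probs - one_hot(y)).
    dlogits = [p - (1.0 if k == y else 0.0) for k, p in enumerate(probs)]
    # Chain rule through the linear layer gives the gradient per input feature.
    grad = [sum(dlogits[k] * weights[k][j] for k in range(len(weights)))
            for j in range(len(x))]
    sign = [(g > 0) - (g < 0) for g in grad]
    return [xi + epsilon * s for xi, s in zip(x, sign)]


# Hypothetical data: 1000 training labels, flipped at the 10% rate from Phase-1.
labels = [0] * 900 + [1] * 50 + [2] * 50
noisy_labels = flip_labels(labels, rate=0.10, classes=[0, 1, 2])

# Hypothetical two-feature test sample and toy classifier weights.
W = [[1.0, -1.0], [-1.0, 1.0], [0.5, 0.5]]
b = [0.0, 0.0, 0.0]
x_adv = fgsm([0.2, 0.1], y=0, weights=W, bias=b, epsilon=0.05)
```

In a Phase-2 style setup, `flip_labels` would be applied to the training labels only and `fgsm` to the test features only, which matches the split described above. With a deep model, the hand-written gradient would be replaced by automatic differentiation.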
| Original language | English |
|---|---|
| Pages (from-to) | 176-198 |
| Number of pages | 23 |
| Journal | International Journal of Intelligent Engineering and Systems |
| Volume | 18 |
| Issue number | 11 |
| Publication status | Published - 31 Dec 2025 |
Keywords
- Brute-force attacks
- Flipped labels
- Perturbed features
- Stacked autoencoder
- Unclean dataset
Title
Integration of Stacked Autoencoder and Deep Learning Against Unclean Datasets in Detecting SSH and FTP Brute Force Attack