TY - GEN
T1 - Enhancing Anomaly Classification Over Log Files through Topic Modeling and Ensemble Methods
AU - Islami, Achmad Mujaddid
AU - Maulani, Irham
AU - Zumadila, Rifqi
AU - Yoga Putra, Anggi Malanda
AU - Santoso, Bagus Jati
N1 - Publisher Copyright:
© 2023 IEEE.
PY - 2023
Y1 - 2023
N2 - Log files play a vital role in monitoring system changes, and their usage is rapidly increasing in cloud computing. To effectively address this challenge, text processing is an essential step in extracting information from unstructured text data, specifically log files, and transforming it into identifiable patterns using specific methods. Given the brevity of most log file text, this study focuses on the application of short-text topic modeling and ensemble methods to classify log files and extract meaningful insights. The findings demonstrate that anomaly detection using short-text topic modeling and ensemble methods surpasses the benchmark method of Latent Dirichlet Allocation (LDA) topic modeling. Notably, the classification approach utilizing GSDMM, in combination with experiments involving XGBoost, achieves the highest performance when compared to other ensemble methods such as Random Forest, Gradient Boosting, and AdaBoost. To further optimize the performance of the XGBoost method in anomaly detection classification, hyperparameter tuning is conducted using Optuna. This approach effectively identifies the optimal hyperparameters for XGBoost, leading to enhanced performance. Overall, this research illustrates that the utilization of short-text topic modeling and ensemble methods, along with hyperparameter optimization, significantly improves the accuracy and effectiveness of anomaly detection in log file classification.
AB - Log files play a vital role in monitoring system changes, and their usage is rapidly increasing in cloud computing. To effectively address this challenge, text processing is an essential step in extracting information from unstructured text data, specifically log files, and transforming it into identifiable patterns using specific methods. Given the brevity of most log file text, this study focuses on the application of short-text topic modeling and ensemble methods to classify log files and extract meaningful insights. The findings demonstrate that anomaly detection using short-text topic modeling and ensemble methods surpasses the benchmark method of Latent Dirichlet Allocation (LDA) topic modeling. Notably, the classification approach utilizing GSDMM, in combination with experiments involving XGBoost, achieves the highest performance when compared to other ensemble methods such as Random Forest, Gradient Boosting, and AdaBoost. To further optimize the performance of the XGBoost method in anomaly detection classification, hyperparameter tuning is conducted using Optuna. This approach effectively identifies the optimal hyperparameters for XGBoost, leading to enhanced performance. Overall, this research illustrates that the utilization of short-text topic modeling and ensemble methods, along with hyperparameter optimization, significantly improves the accuracy and effectiveness of anomaly detection in log file classification.
KW - Anomaly Classification
KW - Ensemble Methods
KW - Log
KW - Natural Language Processing
KW - Short Text
KW - Topic Modelling
UR - http://www.scopus.com/inward/record.url?scp=85187215642&partnerID=8YFLogxK
U2 - 10.1109/ICITCOM60176.2023.10442730
DO - 10.1109/ICITCOM60176.2023.10442730
M3 - Conference contribution
AN - SCOPUS:85187215642
T3 - Proceeding - International Conference on Information Technology and Computing 2023, ICITCOM 2023
SP - 57
EP - 61
BT - Proceeding - International Conference on Information Technology and Computing 2023, ICITCOM 2023
A2 - Chen, Hsing-Chung
A2 - Damarjati, Cahya
A2 - Blum, Christian
A2 - Jusman, Yessi
A2 - Kanafiah, Siti Nurul Aqmariah Mohd
A2 - Ejaz, Waleed
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2023 International Conference on Information Technology and Computing, ICITCOM 2023
Y2 - 1 December 2023 through 2 December 2023
ER -