TY - JOUR
T1 - Android Malware Classification Using Gain Ratio and Ensembled Machine Learning
AU - Ansori, Dwinanda Bagoes
AU - Slamet, Joko
AU - Ghufron, Muhammad Zakky
AU - Putra, Muhammad Aidiel Rachman
AU - Ahmad, Tohari
N1 - Publisher Copyright: © 2024 The authors.
PY - 2024/2
Y1 - 2024/2
N2 - Recently, the number of Android users has increased significantly, which has made Android a target for attackers. Malware, or malicious code, is often embedded in Android apps to gain access to the user's device and retrieve personal data. Researchers have explored various approaches to mitigate the spread of Android malware. However, Android malware datasets are high-dimensional, with hundreds of features, so choosing a proper feature selection method is one of the challenges in producing a reliable detection model. This paper proposes an approach to detecting Android malware and classifying it into five categories using gain ratio feature selection and an ensemble machine learning algorithm. Features are reduced based on their importance values, computed with the gain ratio method. The features considered necessary are then passed to a classification process that combines multiple models. Experiments using the CICMalDroid2020 (Canadian Institute for Cybersecurity Malware of Android 2020) dataset show that the proposed approach can improve detection performance. Gain ratio feature selection improves the detection accuracy of several machine learning classification algorithms: by 2.59% in Naïve Bayes, 0.90% in k-Nearest Neighbor, and 2.29% in Support Vector Machine. The ensemble of Random Forest, Extra Tree, and k-Nearest Neighbors models achieved the highest performance, with an accuracy of 94.57% and a precision score of 94.71%.
AB - Recently, the number of Android users has increased significantly, which has made Android a target for attackers. Malware, or malicious code, is often embedded in Android apps to gain access to the user's device and retrieve personal data. Researchers have explored various approaches to mitigate the spread of Android malware. However, Android malware datasets are high-dimensional, with hundreds of features, so choosing a proper feature selection method is one of the challenges in producing a reliable detection model. This paper proposes an approach to detecting Android malware and classifying it into five categories using gain ratio feature selection and an ensemble machine learning algorithm. Features are reduced based on their importance values, computed with the gain ratio method. The features considered necessary are then passed to a classification process that combines multiple models. Experiments using the CICMalDroid2020 (Canadian Institute for Cybersecurity Malware of Android 2020) dataset show that the proposed approach can improve detection performance. Gain ratio feature selection improves the detection accuracy of several machine learning classification algorithms: by 2.59% in Naïve Bayes, 0.90% in k-Nearest Neighbor, and 2.29% in Support Vector Machine. The ensemble of Random Forest, Extra Tree, and k-Nearest Neighbors models achieved the highest performance, with an accuracy of 94.57% and a precision score of 94.71%.
KW - Android malware
KW - Android security
KW - ensemble machine learning
KW - gain ratio
KW - information security
KW - malware detection
KW - national security
KW - network infrastructure
KW - network security
UR - http://www.scopus.com/inward/record.url?scp=85187678764&partnerID=8YFLogxK
U2 - 10.18280/ijsse.140126
DO - 10.18280/ijsse.140126
M3 - Article
AN - SCOPUS:85187678764
SN - 2041-9031
VL - 14
SP - 259
EP - 266
JO - International Journal of Safety and Security Engineering
JF - International Journal of Safety and Security Engineering
IS - 1
ER -