Android Malware Classification Using Gain Ratio and Ensembled Machine Learning

Dwinanda Bagoes Ansori, Joko Slamet, Muhammad Zakky Ghufron, Muhammad Aidiel Rachman Putra, Tohari Ahmad*

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review


The number of Android users has grown significantly in recent years, making the platform an attractive target for attackers. Malware, or malicious code, is often embedded in Android apps to gain access to the user's device and retrieve personal data. Researchers have explored various approaches to mitigate the spread of Android malware. However, Android malware datasets are high-dimensional, with hundreds of features, so choosing a suitable feature selection method is one of the challenges in producing a reliable detection model. This paper proposes an approach that detects Android malware and classifies it into five categories using gain ratio feature selection and an ensemble machine learning algorithm. Features are ranked by their importance value, computed with the gain ratio method, and the less important ones are discarded. The retained features are then passed to a classification stage that combines several models. Experiments using the CICMalDroid2020 (Canadian Institute for Cybersecurity Malware of Android 2020) dataset show that the proposed approach improves detection performance. Gain ratio feature selection improves detection accuracy for several machine learning classifiers: by 2.59% for Naïve Bayes, 0.90% for k-Nearest Neighbor, and 2.29% for Support Vector Machine. The ensemble of Random Forest, Extra Tree, and k-Nearest Neighbors models achieved the highest performance, with an accuracy of 94.57% and a precision of 94.71%.
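The two steps named in the abstract, gain ratio ranking and combining several models, can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes discrete feature values (continuous features would need binning first), uses the standard definition of gain ratio as information gain divided by split information, and assumes hard majority voting for the ensemble, which the abstract does not specify. The helper names `gain_ratio`, `select_top_k`, and `majority_vote` are hypothetical.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    total = len(values)
    counts = Counter(values)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def gain_ratio(feature, labels):
    """Gain ratio of one discrete feature: information gain / split information."""
    total = len(labels)
    # Group the class labels by the feature value they co-occur with.
    groups = {}
    for value, label in zip(feature, labels):
        groups.setdefault(value, []).append(label)
    # Conditional entropy H(labels | feature), weighted by group size.
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    info_gain = entropy(labels) - conditional
    split_info = entropy(feature)
    # A constant feature has zero split information; score it as 0.
    return info_gain / split_info if split_info > 0 else 0.0


def select_top_k(columns, labels, k):
    """Rank named feature columns by gain ratio and keep the k best names."""
    scores = {name: gain_ratio(col, labels) for name, col in columns.items()}
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:k], scores


def majority_vote(predictions):
    """Hard voting (an assumption): most common label per sample across models."""
    return [Counter(sample).most_common(1)[0][0] for sample in zip(*predictions)]
```

With a toy label list such as `["a", "a", "b", "b"]`, a feature that separates the classes perfectly scores highest, while a feature that is independent of the labels scores zero and would be dropped. In practice the per-model predictions passed to `majority_vote` would come from trained classifiers such as those listed in the abstract.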

Original language: English
Pages (from-to): 259-266
Number of pages: 8
Journal: International Journal of Safety and Security Engineering
Issue number: 1
Publication status: Published - Feb 2024


  • Android malware
  • Android security
  • ensemble machine learning
  • gain ratio
  • information security
  • malware detection
  • national security
  • network infrastructure
  • network security

