A Comparative Analysis and Evaluation of Machine Learning Algorithms for Malware Detection
Abstract
The increasing complexity of malware threatens the security of individuals, businesses, and institutions that operate in cyberspace. New malware variants are regularly created with obfuscation techniques that let them steal confidential information and harm users' computers while evading detection. Malware detection and analysis are therefore critical components of cybersecurity. This study documents a comparative analysis and evaluation of current machine learning algorithms for malware detection and analysis, carried out to determine the most efficient model. Efficiency was measured in terms of accuracy, precision, recall, specificity, F1 score, index of balanced accuracy, and Matthews correlation coefficient. The findings indicate that the Random Forest classifier is the most efficient, as it outperformed the other algorithms studied. The study also identified factors that enhance the performance of machine learning models, concluding that feature selection using recursive feature elimination (RFE) and handling class imbalance in the dataset using the Synthetic Minority Oversampling Technique (SMOTE) improve model performance.
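
The seven evaluation measures named in the abstract can all be derived from the four cells of a binary confusion matrix. The following is a minimal sketch of those standard definitions, not the study's own code. The function name `binary_metrics`, the example counts, and the weighting factor `alpha=0.1` for the index of balanced accuracy are assumptions made for illustration; the counts are not results from the study.

```python
import math


def binary_metrics(tp, fp, tn, fn, alpha=0.1):
    """Compute common binary classification metrics from confusion-matrix counts.

    tp/fp/tn/fn are counts with malware treated as the positive class.
    alpha is the dominance weight for the index of balanced accuracy
    (0.1 is an assumed, commonly used value, not taken from the study).
    """
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0          # true positive rate
    specificity = tn / (tn + fp) if (tn + fp) else 0.0     # true negative rate
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    # Index of balanced accuracy: squared geometric mean of recall and
    # specificity, weighted by the dominance (recall - specificity).
    iba = (1 + alpha * (recall - specificity)) * recall * specificity
    # Matthews correlation coefficient; defined as 0 when the denominator is 0.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "f1": f1,
        "iba": iba,
        "mcc": mcc,
    }


# Hypothetical counts for illustration only.
metrics = binary_metrics(tp=90, fp=5, tn=95, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Reporting specificity, the index of balanced accuracy, and the Matthews correlation coefficient alongside accuracy matters on imbalanced malware datasets, where a classifier that favors the majority class can still score a high accuracy.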