Abstract
Malware analysis is a vital and challenging task in the ever-changing cyber threat landscape. Traditional signature-based methods cannot keep up with the rapid evolution of malware variants. This underscores the need to develop a more effective malware classification tool, which also helps in exploring new research directions. In this work, we first present a comprehensive assessment of malware classification using dendrogram clustering techniques. We then propose a robust approach to malware analysis that constructs a dendrogram grouping malware samples according to their ancestral relationships. Semantic analysis of the code provides deeper insight into the malware's behavior and functionality, enabling more precise identification. Together, these methods help to detect new variants and improve the recognition of existing malware families. Our results show that the proposed model is robust in terms of accuracy and false positive rate when compared with existing models.
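To make the clustering idea concrete, the following is a minimal sketch of the merge tree that underlies a dendrogram, built by agglomerative clustering in plain Python. It is an illustration under stated assumptions, not the authors' implementation: the toy feature vectors, the helper names `average_linkage` and `cut`, and the choice of Euclidean distance with average linkage are all hypothetical. In practice, library routines such as SciPy's `scipy.cluster.hierarchy.linkage` and `dendrogram` would normally be used for this step.

```python
from itertools import combinations
from math import dist  # Euclidean distance, Python 3.8+


def average_linkage(points):
    """Agglomerative clustering with average linkage.

    Returns the merge history as (id_a, id_b, distance, size) tuples,
    which is the information a dendrogram draws.
    """
    # Each sample starts as its own cluster, keyed by an integer id.
    clusters = {i: [i] for i in range(len(points))}
    merges = []
    next_id = len(points)
    while len(clusters) > 1:
        # Find the pair of clusters with the smallest mean pairwise distance.
        best = None
        for a, b in combinations(sorted(clusters), 2):
            total = sum(dist(points[i], points[j])
                        for i in clusters[a] for j in clusters[b])
            d = total / (len(clusters[a]) * len(clusters[b]))
            if best is None or d < best[0]:
                best = (d, a, b)
        d, a, b = best
        # Merge the closest pair into a new cluster with a fresh id.
        members = clusters.pop(a) + clusters.pop(b)
        clusters[next_id] = members
        merges.append((a, b, d, len(members)))
        next_id += 1
    return merges


def cut(points, merges, n_groups):
    """Replay merges until n_groups clusters remain; return flat groups."""
    clusters = {i: [i] for i in range(len(points))}
    next_id = len(points)
    for a, b, _, _ in merges[:len(points) - n_groups]:
        clusters[next_id] = clusters.pop(a) + clusters.pop(b)
        next_id += 1
    return sorted(sorted(members) for members in clusters.values())


# Hypothetical two-dimensional feature vectors for six samples
# (assumed for illustration; real features would come from the dataset).
samples = [(1.0, 1.1), (1.2, 0.9), (0.9, 1.0),
           (8.0, 8.2), (8.1, 7.9), (7.8, 8.0)]

merges = average_linkage(samples)
groups = cut(samples, merges, n_groups=2)

for a, b, d, size in merges:
    print(f"merge {a} + {b} at distance {d:.3f} (size {size})")
print("two-group cut:", groups)
```

Samples that merge at a small distance sit low in the dendrogram and are candidates for the same family, while the height of the final merges indicates how well separated the top-level groups are. This exhaustive pairwise search is only suitable for small inputs.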
Data availability
The dataset ‘Malware Memory Analysis/CIC-MalMem-2022’ by Lucca Godoy is publicly available on Kaggle. The dataset can be retrieved from the following URL: https://www.kaggle.com/datasets/luccagodoy/obfuscated-malware-memory-2022-cic.
Funding
No funding was received for conducting this study.
Ethics declarations
Conflict of interest
The authors have no conflicts of interest to declare that are relevant to the content of this article.
Ethical approval
Not applicable.
Consent for publication
Yes.
About this article
Cite this article
Naveen Kumar, N., Balamurugan, S., Maruthamuthu, R. et al. A robust method for malware analysis using stacking classifiers and dendrogram visualization. Int. j. inf. tecnol. (2024). https://doi.org/10.1007/s41870-024-01982-z
DOI: https://doi.org/10.1007/s41870-024-01982-z