A robust method for malware analysis using stacking classifiers and dendrogram visualization

  • Original Research
International Journal of Information Technology

Abstract

Malware analysis is a vital and challenging task in the ever-changing cyber threat landscape. Traditional signature-based methods cannot keep up with the rapid evolution of malware variants, which underscores the need for a more effective malware classification tool that also helps open new research directions. In this work, we first present a comprehensive assessment of malware classification using dendrogram clustering techniques. We then propose a robust approach to malware analysis that constructs a dendrogram grouping malware samples according to their ancestral relationships. Semantic analysis of the code provides deeper insight into the malware's behavior and functionality, enabling more precise identification. Together, these methods help detect new variants and improve the recognition of existing malware families. Results show that the proposed model is robust in terms of accuracy and False Positive Rate when compared with existing models.
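The two ingredients named in the abstract, a stacking classifier and a dendrogram over malware samples, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the abstract does not name the base learners, the meta-learner, the linkage method, or the features, so the random forest, k-nearest neighbours, logistic regression, Ward linkage, and the synthetic feature matrix used here are all placeholders chosen for the example.

```python
import numpy as np
from scipy.cluster.hierarchy import dendrogram, fcluster, linkage
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler

# Assumption: synthetic data stands in for real malware features
# (label 0 = benign, label 1 = malicious). This is NOT CIC-MalMem-2022.
X, y = make_classification(n_samples=600, n_features=20, n_informative=8,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

# Scale features using statistics from the training split only.
scaler = StandardScaler().fit(X_train)
X_train_s = scaler.transform(X_train)
X_test_s = scaler.transform(X_test)

# Stacking: base learners produce cross-validated predictions,
# and a meta-learner is trained on those predictions.
# Assumption: these particular estimators are illustrative choices.
stack = StackingClassifier(
    estimators=[
        ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
        ("knn", KNeighborsClassifier(n_neighbors=5)),
    ],
    final_estimator=LogisticRegression(max_iter=1000),
    cv=5,
)
stack.fit(X_train_s, y_train)
y_pred = stack.predict(X_test_s)

# The two metrics the abstract reports: accuracy and False Positive Rate.
accuracy = accuracy_score(y_test, y_pred)
tn, fp, fn, tp = confusion_matrix(y_test, y_pred, labels=[0, 1]).ravel()
false_positive_rate = fp / (fp + tn)

# Dendrogram: agglomerative clustering of the malicious test samples.
# Assumption: Ward linkage on scaled features approximates "relatedness".
malicious = X_test_s[y_test == 1]
Z = linkage(malicious, method="ward")

# no_plot=True returns the tree layout without requiring matplotlib.
tree = dendrogram(Z, no_plot=True)

# Cut the tree into at most four groups (a hypothetical family count).
groups = fcluster(Z, t=4, criterion="maxclust")

print(f"accuracy={accuracy:.3f}  FPR={false_positive_rate:.3f}")
print(f"{len(malicious)} malicious samples in {len(set(groups))} groups")
```

To draw the tree instead of only computing it, call `dendrogram(Z)` with matplotlib installed. The cut into four groups is arbitrary here; with real data the number of groups would be chosen from the tree's structure or from known family labels.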

Data availability

The dataset ‘Malware Memory Analysis/CIC-MalMem-2022’ by Lucca Godoy is publicly available on Kaggle. The dataset can be retrieved from the following URL: https://www.kaggle.com/datasets/luccagodoy/obfuscated-malware-memory-2022-cic.

Funding

No funding was received for conducting this study.

Author information

Corresponding author

Correspondence to N. Naveen Kumar.

Ethics declarations

Conflict of interest

The authors have no conflicts of interest to declare that are relevant to the content of this article.

Ethical approval

Not applicable.

Consent for publication

Yes.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Naveen Kumar, N., Balamurugan, S., Maruthamuthu, R. et al. A robust method for malware analysis using stacking classifiers and dendrogram visualization. Int. j. inf. tecnol. (2024). https://doi.org/10.1007/s41870-024-01982-z

  • DOI: https://doi.org/10.1007/s41870-024-01982-z
