Abstract
Developing more efficient network traffic data generation techniques that can reproduce the intricate features of traffic flows is central to secure network monitoring systems and cybersecurity. This study investigates selected Generative Adversarial Network (GAN) architectures for generating realistic network traffic samples. It incorporates Extreme Gradient Boosting (XGBoost), an ensemble machine learning algorithm, to classify and detect observed and unobserved Advanced Persistent Threat (APT) attack samples in the synthetic and new data distributions. Results show that, compared with the vanilla GAN architecture, the Wasserstein GAN architectures achieve optimal generation, sustaining an Earth Mover's distance estimate of \(10^{-3}\) between the critic loss and the generator loss. Performance statistics from XGBoost and other evaluation metrics indicate successful generation and detection, with an accuracy of 99.97%, a recall of 99.94%, and a precision of 100%. Further results show an \(F_1\) score of 99.97% for detecting APT samples in the synthetic data and a Receiver Operating Characteristic Area Under the Curve (ROC AUC) value of 1.0, indicating optimal behavior and surpassing previous state-of-the-art methods. When evaluated on unseen data, the proposed approach maintains optimal detection performance, with 100% recall, an AUC of 100%, and precision above 90%.
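The headline figures are internally consistent: \(F_1\) is the harmonic mean of precision and recall, so 100% precision combined with 99.94% recall yields an \(F_1\) of about 99.97%. The sketch below uses plain Python and is independent of the authors' pipeline. It computes the standard binary detection metrics from confusion-matrix counts and assumes APT is the positive class. `ConfusionCounts` and the helper functions are hypothetical names. The counts in the usage example are illustrative only: they were chosen to reproduce the reported precision and recall and are not the paper's actual confusion matrix.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Binary confusion-matrix counts; APT is assumed to be the positive class."""
    tp: int  # APT samples correctly flagged as APT
    fp: int  # benign samples wrongly flagged as APT
    tn: int  # benign samples correctly passed as benign
    fn: int  # APT samples missed by the detector


def precision(c: ConfusionCounts) -> float:
    """Fraction of flagged samples that really are APT."""
    return c.tp / (c.tp + c.fp) if (c.tp + c.fp) else 0.0


def recall(c: ConfusionCounts) -> float:
    """Fraction of true APT samples that were flagged."""
    return c.tp / (c.tp + c.fn) if (c.tp + c.fn) else 0.0


def accuracy(c: ConfusionCounts) -> float:
    """Fraction of all samples classified correctly."""
    total = c.tp + c.fp + c.tn + c.fn
    return (c.tp + c.tn) / total if total else 0.0


def f1_from_pr(p: float, r: float) -> float:
    """Harmonic mean of precision p and recall r."""
    return 2 * p * r / (p + r) if (p + r) else 0.0


def f1(c: ConfusionCounts) -> float:
    """F1 score computed directly from confusion counts."""
    return f1_from_pr(precision(c), recall(c))


if __name__ == "__main__":
    # Consistency check using the reported precision (1.0) and recall (0.9994).
    print(round(f1_from_pr(1.0, 0.9994), 4))  # 0.9997

    # Hypothetical counts chosen to match the reported precision and recall.
    counts = ConfusionCounts(tp=9994, fp=0, tn=10000, fn=6)
    print(precision(counts), recall(counts), round(accuracy(counts), 4), round(f1(counts), 4))
```

Accuracy depends on the class balance, whereas precision, recall, and \(F_1\) focus on the positive (APT) class. For that reason, the latter metrics are usually the more informative ones for rare-attack detection.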
Supported by the University of Warwick.
Acknowledgements
This research is supported by the University of Warwick School of Engineering and Cyber Security Global Research Priorities - Early Career Fellowships (Cyber Security GRP).
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Anande, T.J., Leeson, M.S. (2023). Synthetic Network Traffic Data Generation and Classification of Advanced Persistent Threat Samples: A Case Study with GANs and XGBoost. In: Conte, D., Fred, A., Gusikhin, O., Sansone, C. (eds) Deep Learning Theory and Applications. DeLTA 2023. Communications in Computer and Information Science, vol 1875. Springer, Cham. https://doi.org/10.1007/978-3-031-39059-3_1
DOI: https://doi.org/10.1007/978-3-031-39059-3_1
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-39058-6
Online ISBN: 978-3-031-39059-3
eBook Packages: Computer Science, Computer Science (R0)