Synthetic Network Traffic Data Generation and Classification of Advanced Persistent Threat Samples: A Case Study with GANs and XGBoost

Anande, T. J.; Leeson, M. S.

doi:10.1007/978-3-031-39059-3_1

Part of the book series: Communications in Computer and Information Science ((CCIS,volume 1875))

Included in the following conference series:

International Conference on Deep Learning Theory and Applications

616 Accesses

Abstract

The need to develop more efficient network traffic data generation techniques that can reproduce the intricate features of traffic flows forms a central element in secured monitoring systems for networks and cybersecurity. This study investigates selected Generative Adversarial Network (GAN) architectures to generate realistic network traffic samples. It incorporates Extreme Gradient Boosting (XGBoost), an Ensemble Machine Learning algorithm effectively used for the classification and detection of observed and unobserved Advanced Persistent Threat (APT) attack samples in the synthetic and new data distributions. Results show that the Wasserstein GAN architectures achieve optimal generation with a sustained Earth Mover distance estimation of \(10^{-3}\) between the Critic loss and the Generator loss compared to the vanilla GAN architecture. Performance statistics using XGBoost and other evaluation metrics indicate successful generation and detection with an accuracy of 99.97% a recall rate of 99.94%, and 100% precision. Further results show a 99.97% \(f_1\) score for detecting APT samples in the synthetic data, and a Receiver Operator Characteristic Area Under the Curve (ROC_AUC) value of 1.0, indicating optimum behavior, surpassing previous state-of-the-art methods. When evaluated on unseen data, the proposed approach maintains optimal detection performance with 100% recall, 100% Area Under the Curve (AUC) and precision above 90%.

Supported by the University of Warwick.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Log in via an institution

Subscribe and save

Springer+ Basic

EUR 32.99 /Month

Get 10 units per month
Download Article/Chapter or Ebook
1 Unit = 1 Article or 1 Chapter
Cancel anytime

Subscribe now

Buy Now

Chapter: USD 29.95; Price excludes VAT (USA)

eBook: USD 69.99; Price excludes VAT (USA)

Softcover Book: USD 89.99; Price excludes VAT (USA)

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

References

Abdullayeva, F.J.: Advanced persistent threat attack detection method in cloud computing based on autoencoder and softmax regression algorithm. Array 10, 100067-1–100067-11 (2021)
Google Scholar
Ahmad, A., Webb, J., Desouza, K.C., Boorman, J.: Strategically-motivated advanced persistent threat: definition, process, tactics and a disinformation model of counterattack. Comput. Secur. 86, 402–418 (2019)
Article Google Scholar
Alqahtani, S.H., Thorne, M.K., Kumar, G.: Applications of generative adversarial networks (GANs): an updated review. Arch. Comput. Methods Eng. 28(2), 525–552 (2021)
Article MathSciNet Google Scholar
Alshamrani, A., Myneni, S., Chowdhary, A., Huang, D.: A survey on advanced persistent threats: techniques, solutions, challenges, and research opportunities. IEEE Commun. Surv. Tutor. 21(2), 1851–1877 (2019)
Article Google Scholar
Anande, T.J., Leeson, M.S.: Generative adversarial networks (GANs): a survey on network traffic generation. Int. J. Mach. Learn. Comput. 12(6), 333–343 (2022)
Google Scholar
Anande, T.J., Al-Saadi, S., Leeson, M.S.: Generative adversarial networks for network traffic feature generation. Int. J. Comput. Appl. 1–9 (2023). https://doi.org/10.1080/1206212X.2023.2191072
Bentéjac, C., Csörgö, A., Martínez-Muñoz, G.: A comparative analysis of gradient boosting algorithms. Artif. Intell. Rev. 54(3), 1937–1967 (2021)
Article Google Scholar
Biggio, B., Šrndić, N.: Machine learning for computer security. In: Joseph, A.D., Laskov, P., Roli, F., Tygar, J.D., Nelson, B. (eds.) Machine Learning Methods for Computer Security, vol. 3, pp. 5–10. Dagstuhl Manifestos, Dagstuhl (2012)
Google Scholar
Chan, T.N., Yiu, M.L., U, L.H.: The power of bounds: answering approximate earth mover’s distance with parametric bounds. IEEE Trans. Knowl. Data Eng. 33(2), 768–781 (2021)
Google Scholar
Chen, T., Guestrin, C.: XGBoost: a scalable tree boosting systems. In: 2nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD 2016), pp. 785–794. ACM, New York (2016)
Google Scholar
Chen, Y.W., Lin, C.J.: Combining SVMs with various feature selection strategies. In: Guyon, I., Nikravesh, M., Gunn, S., Zadeh, L.A. (eds.) Feature Extraction. Studies in Fuzziness and Soft Computing, vol. 207, pp. 314–324. Springer, Berlin (2006). https://doi.org/10.1007/978-3-540-35488-8_13
Chapter Google Scholar
Chen, P., Desmet, L., Huygens, C.: A study on advanced persistent threats. In: De Decker, B., Zúquete, A. (eds.) CMS 2014. LNCS, vol. 8735, pp. 63–72. Springer, Heidelberg (2014). https://doi.org/10.1007/978-3-662-44885-4_5
Chapter Google Scholar
Dhaliwal, S., Nahid, A., Abbas, R.: Effective intrusion detection system using XGBoost. Information 9(7), 149-1–149-24 (2018)
Google Scholar
Ding, B., Qian, H., Zhou, J.: Activation functions and their characteristics in deep neural networks. In: Chinese Control and Decision Conference (CCDC), pp. 1836–1841. IEEE, Piscataway (2018)
Google Scholar
Dingledine, R., Mathewson, N., Syverson, P.: Tor: the second-generation onion router. In: 13th USENIX Security Symposium, pp. 303–320. USENIX Association (2004)
Google Scholar
Dixon, M.F., Polson, N.G., Sokolov, V.O.: Deep learning for spatio-temporal modeling: dynamic traffic flows and high frequency trading. Appl. Stoch. Model. Bus. Ind. 35(3), 788–807 (2019)
Article MathSciNet Google Scholar
Ferdowsi, A., Saad, W.: Generative adversarial networks for distributed intrusion detection in the internet of things. In: IEEE Global Communications Conference (GLOBECOM), pp. 1–6. IEEE, Piscataway (2019)
Google Scholar
Ghafir, I., Prenosil, V.: Proposed approach for targeted attacks detection. In: Sulaiman, H.A., Othman, M.A., Othman, M.F.I., Rahim, Y.A., Pee, N.C. (eds.) Advanced Computer and Communication Engineering Technology. LNEE, vol. 362, pp. 73–80. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-24584-3_7
Chapter Google Scholar
Gibert, D., Mateu, C., Planes, J., Vicens, R.: Using convolutional neural networks for classification of malware represented as images. J. Comput. Virol. Hacking Tech. 15(1), 15–28 (2019)
Article Google Scholar
Goodfellow, I.J., et al.: Generative adversarial nets. In: Ghahramani, Z., Welling, M., Cortes, C., Lawrence, N., Weinberger, K. (eds.) Proceedings of Advances in Neural Information Processing Systems (NIPS 2014), vol. 27, pp. 2672–2680. Curran Associates Inc., Red Hook (2014)
Google Scholar
Gulrajani, I., Ahmed, F., Arjovsky, M., Dumoulin, V., Courville, A.C.: Improved training of Wasserstein GANs. In: 30th Conference on Advances in Neural Information Processing Systems (NIPS 2017), pp. 5767–5777. Curran Associates Inc., Red Hook (2017)
Google Scholar
Hitaj, B., Gasti, P., Ateniese, G., Perez-Cruz, F.: PassGAN: a deep learning approach for password guessing. In: NeurIPS 2018 Workshop on Security in Machine Learning (2018)
Google Scholar
Hurtik, P., Tomasiello, S., Hula, J., Hynar, D.: Binary cross-entropy with dynamical clip**. Neural Comput. Appl. (1), 1–13 (2022). https://doi.org/10.1007/s00521-022-07091-x
Ishitaki, T., Obukata, R., Oda, T., Barolli, L.: Application of deep recurrent neural networks for prediction of user behavior in tor networks. In: Barolli, L., Takizawa, M., Enokido, T., Hsu, H.H., Lin, C.Y. (eds.) 31st International Conference on Advanced Information Networking and Applications Workshops (WAINA), vol. 56, pp. 238–243. IEEE, Piscataway (2017)
Google Scholar
Javaid, A., Niyaz, Q., Sun, W., Alam, M.: Deep learning for spatio-temporal modeling: dynamic traffic flows and high frequency trading. In: 9th EAI International Conference on Bio-Inspired Information and Communications Technologies, vol. 3, pp. 21–26. Institute for Computer Sciences, Social-Informatics and Telecommunications Engineering (2016)
Google Scholar
Jung, W., Kim, S., Choi, S.: Deep learning for zero-day flash malware detection. In: 36th IEEE Symposium on Security and Privacy. IEEE (2015, poster)
Google Scholar
Kingma, D.P., Ba, L.J.: Adam: a method for stochastic optimization. In: International Conference on Learning Representations (ICLR). ICLR (2015, poster)
Google Scholar
Kiperberg, M., Resh, A., Zaidenberg, N.: Malware analysis. In: Lehto, M., Neittaanmäki, P. (eds.) Cyber Security. CMAS, vol. 56, pp. 475–484. Springer, Cham (2022). https://doi.org/10.1007/978-3-030-91293-2_21
Chapter Google Scholar
Kobojek, P., Saeed, K.: Application of recurrent neural networks for user verification based on keystroke dynamics. J. Telecommun. Inf. Technol. 3, 80–90 (2016)
Google Scholar
Kos, J., Fischer, I., Song, D.: Adversarial examples for generative model. In: IEEE Security and Privacy Workshops (SPW), pp. 36–42. IEEE, Piscataway (2018)
Google Scholar
Kramer, O.: Machine Learning for Evolution Strategies. Springer, Cham (2016)
Book MATH Google Scholar
Kudugunta, S., Ferrara, E.: Deep neural networks for bot detection. Inf. Sci. 467, 312–322 (2018)
Article Google Scholar
Li, A.J., Madry, A., Peebles, J., Schmidt, L.: On the limitations of first-order approximation in GAN dynamics. Proc. Mach. Learn. Res. 80, 3005–3013 (2018)
Google Scholar
Li, W., Moore, A.: A machine learning approach for efficient traffic classification. In: 15th International Symposium on Modelling, Analysis, and Simulation of Computer and Telecommunication Systems (MASCOTS), pp. 310–317. IEEE, Piscataway (2007)
Google Scholar
Li, Y., Wu, H.: A clustering method based on k-means algorithm. Phys. Procedia 25, 1104–1109 (2012)
Article Google Scholar
Lin, Z., Shi, Y., Xue, Z.: IDSGAN: generative adversarial networks for attack generation against intrusion detection. In: Gama, J., Li, T., Yu, Y., Chen, E., Zheng, Y., Teng, F. (eds.) PAKDD 2022. LNAI, vol. 13282, pp. 79–91. Springer, Cham (2022). https://doi.org/10.1007/978-3-031-05981-0_7
Chapter Google Scholar
Liu, Z., Li, S., Zhang, Y., Yun, X., Cheng, Z.: Efficient malware originated traffic classification by using generative adversarial networks. In: IEEE Symposium on Computers and Communications (ISCC), pp. 1–7. IEEE, Piscataway (2020)
Google Scholar
de Melo, C.M., Torralba, A., Guibas, L., DiCarlo, J., Chellappa, R., Hodgins, J.: Next-generation deep learning based on simulators and synthetic data. Trends Cogn. Sci. 26(2), 174–187 (2022)
Article Google Scholar
Mirza, M., Osindero, S.: Conditional generative adversarial nets. ar**v e-prints (2014)
Google Scholar
Moustafa, N., Slay, J.: UNSW-NB15: a comprehensive data set for network intrusion detection systems (UNSW-NB15 network data set). In: Military Communications and Information Systems Conference (MilCIS), pp. 1–6. IEEE, Piscataway (2015)
Google Scholar
Nath, H.V., Mehtre, B.M.: Static malware analysis using machine learning methods. In: Martínez Pérez, G., Thampi, S.M., Ko, R., Shu, L. (eds.) SNDS 2014. CCIS, vol. 420, pp. 440–450. Springer, Heidelberg (2014). https://doi.org/10.1007/978-3-642-54525-2_39
Chapter Google Scholar
Nikos, V., Oscar, S., Luc, D.: Big data analytics for sophisticated attack detection. ISASCA J. 3, 1–8 (2014)
Google Scholar
Pang, G., Shen, C., Cao, L., Hengel, A.V.D.: Deep learning for anomaly detection: a review. ACM Comput. Surv. 54(2), 1–38 (2022)
Article Google Scholar
Powers, D.M.W.: Evaluation: from precision, recall and f-measure to roc, informedness, markedness and correlation. Int. J. Mach. Learn. Technol. 2(1), 37–63 (2011)
MathSciNet Google Scholar
Sedgwick, P.: Pearson’s correlation coefficient. Br. Med. J. 345, e4483-1–e4483-2 (2012)
Google Scholar
Seo, E., Song, H.M., Kim, H.K.: GIDS: GAN based intrusion detection system for in-vehicle network. In: 16th Annual Conference on Privacy, Security and Trust (PST), pp. 1–6. IEEE, Piscataway (2018)
Google Scholar
Thakkar, A., Lohiyan, R.: A review on machine learning and deep learning perspectives of ids for IoT: recent updates, security issues, and challenges. Arch. Comput. Methods Eng. 28(4), 3211–3243 (2021)
Article Google Scholar
Torres, P., Catania, C., Garcia, S., Garino, C.G.: An analysis of recurrent neural networks for botnet detection behavior. In: Biennial Congress of Argentina (ARGENCON), pp. 1–6. IEEE (2016)
Google Scholar
Tran, D., Mac, H., Tong, V., Tran, H.A., Nguyen, L.G.: A LSTM based framework for handling multiclass imbalance in DGA botnet detection. Neurocomputing 275, 2401–2413 (2018)
Article Google Scholar
Usama, M., Asim, M., Latif, S., Qadir, J., Ala-Al-Fuqaha: Generative adversarial networks for launching and thwarting adversarial attacks on network intrusion detection systems. In: 15th International Wireless Communications & Mobile Computing Conference (IWCMC), pp. 78–83. IEEE, Piscataway (2019)
Google Scholar
Wang, K., Gou, C., Duan, Y., Lin, Y., Zheng, X., Wang, F.Y.: Generative adversarial networks: introduction and outlook. IEEE/CAA J. Automatica Sinica 4(4), 588–598 (2017)
Article MathSciNet Google Scholar
Wang, Z.: The applications of deep learning on traffic identification (2015). https://www.blackhat.com/docs/us-15/materials/us-15-Wang-The-Applications-Of-Deep-Learning-On-Traffic-Identification-wp.pdf. Accessed 9 Nov 2022
Yin, C., Zhu, Y., Liu, S., Fei, J., Zhang, H.: An enhancing framework for botnet detection using generative adversarial networks. In: International Conference on Artificial Intelligence and Big Data, pp. 228–234. IEEE (2018)
Google Scholar
Zhang, M., Xu, B., Bai, S., Lu, S., Lin, Z.: A deep learning method to detect web attacks using a specially designed CNN. In: Liu, D., **e, S., Li, Y., Zhao, D., El-Alfy, E.-S.M. (eds.) ICONIP 2017. LNCS, vol. 10638, pp. 828–836. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-70139-4_84
Chapter Google Scholar
Zheng, M., et al.: Conditional Wasserstein generative adversarial network-gradient penalty-based approach to alleviating imbalanced data classification. Inf. Sci. 512, 1009–1023 (2020)
Article Google Scholar

Download references

Acknowledgements

This research is supported by the University of Warwick School of Engineering and Cyber Security Global Research Priorities - Early Career Fellowships (Cyber Security GRP).

Author information

Authors and Affiliations

School of Engineering, University of Warwick, Coventry, CV4 7AL, UK
T. J. Anande & M. S. Leeson

Authors

T. J. Anande
View author publications
You can also search for this author in PubMed Google Scholar
M. S. Leeson
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to M. S. Leeson .

Editor information

Editors and Affiliations

Université de Tours, Tours, France
Donatello Conte
Instituto de Telecomunicações and University of Lisbon, Lisbon, Portugal
Ana Fred
Ford Motor Company, Commerce Township, MI, USA
Oleg Gusikhin
University of Naples Federico II, Naples, Italy
Carlo Sansone

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Anande, T.J., Leeson, M.S. (2023). Synthetic Network Traffic Data Generation and Classification of Advanced Persistent Threat Samples: A Case Study with GANs and XGBoost. In: Conte, D., Fred, A., Gusikhin, O., Sansone, C. (eds) Deep Learning Theory and Applications. DeLTA 2023. Communications in Computer and Information Science, vol 1875. Springer, Cham. https://doi.org/10.1007/978-3-031-39059-3_1

Download citation

DOI: https://doi.org/10.1007/978-3-031-39059-3_1
Published: 31 July 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-39058-6
Online ISBN: 978-3-031-39059-3
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics

Synthetic Network Traffic Data Generation and Classification of Advanced Persistent Threat Samples: A Case Study with GANs and XGBoost