Synthetic Network Traffic Data Generation and Classification of Advanced Persistent Threat Samples: A Case Study with GANs and XGBoost

  • Conference paper
Deep Learning Theory and Applications (DeLTA 2023)

Abstract

The need for more efficient network traffic data generation techniques that can reproduce the intricate features of traffic flows is central to secure monitoring systems for networks and cybersecurity. This study investigates selected Generative Adversarial Network (GAN) architectures for generating realistic network traffic samples. It incorporates Extreme Gradient Boosting (XGBoost), an ensemble machine learning algorithm, to classify and detect observed and unobserved Advanced Persistent Threat (APT) attack samples in the synthetic and new data distributions. Results show that, compared to the vanilla GAN architecture, the Wasserstein GAN architectures achieve optimal generation, with a sustained Earth Mover distance estimate of \(10^{-3}\) between the Critic loss and the Generator loss. Performance statistics using XGBoost and other evaluation metrics indicate successful generation and detection, with an accuracy of 99.97%, a recall rate of 99.94%, and 100% precision. Further results show a 99.97% \(f_1\) score for detecting APT samples in the synthetic data and a Receiver Operating Characteristic Area Under the Curve (ROC_AUC) value of 1.0, indicating optimum behavior and surpassing previous state-of-the-art methods. When evaluated on unseen data, the proposed approach maintains optimal detection performance, with 100% recall, 100% Area Under the Curve (AUC), and precision above 90%.
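
The quantities reported above follow standard definitions, and a minimal Python sketch may help readers relate them to one another. It is not the authors' implementation: the function names are hypothetical, the confusion-matrix counts in the demo are invented for illustration, and `earth_mover_1d` computes the empirical one-dimensional Earth Mover (Wasserstein-1) distance directly from two samples, whereas a Wasserstein GAN estimates this distance through its Critic network.

```python
from typing import Dict, Sequence


def f1_from_rates(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    denom = precision + recall
    return 2 * precision * recall / denom if denom else 0.0


def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Accuracy, precision, recall and F1 from binary confusion-matrix counts."""
    total = tp + fp + tn + fn
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": precision,
        "recall": recall,
        "f1": f1_from_rates(precision, recall),
    }


def earth_mover_1d(a: Sequence[float], b: Sequence[float]) -> float:
    """Empirical Wasserstein-1 distance between two equal-sized 1-D samples.

    For equal-sized samples this is the mean absolute difference
    between the sorted values of the two samples.
    """
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("samples must be non-empty and of equal size")
    return sum(abs(x - y) for x, y in zip(sorted(a), sorted(b))) / len(a)


if __name__ == "__main__":
    # Hypothetical counts, chosen only to illustrate the arithmetic.
    print(classification_metrics(tp=90, fp=0, tn=100, fn=10))
    # F1 implied by the precision and recall quoted in the abstract.
    print(round(f1_from_rates(1.0, 0.9994), 4))
    # Two toy samples offset by 0.5.
    print(earth_mover_1d([0.0, 1.0, 2.0], [0.5, 1.5, 2.5]))
```

As a consistency check, a precision of 1.0 and a recall of 0.9994 give an \(f_1\) score of about 0.9997, which agrees with the 99.97% figure reported in the abstract.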

Supported by the University of Warwick.


Acknowledgements

This research is supported by the University of Warwick School of Engineering and Cyber Security Global Research Priorities - Early Career Fellowships (Cyber Security GRP).

Author information

Corresponding author

Correspondence to M. S. Leeson.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Anande, T.J., Leeson, M.S. (2023). Synthetic Network Traffic Data Generation and Classification of Advanced Persistent Threat Samples: A Case Study with GANs and XGBoost. In: Conte, D., Fred, A., Gusikhin, O., Sansone, C. (eds) Deep Learning Theory and Applications. DeLTA 2023. Communications in Computer and Information Science, vol 1875. Springer, Cham. https://doi.org/10.1007/978-3-031-39059-3_1

  • DOI: https://doi.org/10.1007/978-3-031-39059-3_1

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-39058-6

  • Online ISBN: 978-3-031-39059-3

  • eBook Packages: Computer Science, Computer Science (R0)
