Abstract
This chapter proposes data-centric machine learning to protect web applications from dynamic application security testing (DAST) attacks. DAST scanning consists of automated penetration testing against web applications to find exploitable vulnerabilities. Malicious actors often use DAST scanners in a brute-force manner for attack reconnaissance, with a view to eventual compromise. Traditionally, defensive cybersecurity systems have used threshold-based methods to detect such malicious events and behaviour. Thresholding has inherent challenges, however, not least the arguably arbitrary and brittle nature of selecting and applying a threshold in a production environment. Given these drawbacks, we present a machine learning method that uses random forests and aggregated event data, collected from our proprietary web application firewall, to detect DAST reconnaissance attacks. Using a dataset of over 40 million real-world events, we demonstrate that our method detects DAST attacks effectively, achieving an F1 score of 0.94 with a low miss rate of 6%. This approach provides important insights into the development of accurate and reliable detection systems that minimise manual tuning, which is essential in safeguarding against evolving cyber threats.
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this chapter
Cite this chapter
Shahrivar, P., Millar, S. (2024). Detecting Web Application DAST Attacks in Large-Scale Event Data. In: Sipola, T., Alatalo, J., Wolfmayr, M., Kokkonen, T. (eds) Artificial Intelligence for Security. Springer, Cham. https://doi.org/10.1007/978-3-031-57452-8_14
DOI: https://doi.org/10.1007/978-3-031-57452-8_14
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-57451-1
Online ISBN: 978-3-031-57452-8
eBook Packages: Computer Science (R0)