Detecting Web Application DAST Attacks in Large-Scale Event Data

Shahrivar, Pojan; Millar, Stuart

doi:10.1007/978-3-031-57452-8_14

Abstract

This chapter proposes data-centric machine learning to protect web applications from dynamic application security testing (DAST) attacks. DAST scanning consists of automated penetration testing against web applications to find exploitable vulnerabilities. Malicious actors often use DAST scans in a brute-force manner for attack reconnaissance with a view to eventual compromise. Traditionally, defensive cybersecurity systems have used threshold-based methods to detect such malicious events and behaviour. Thresholding has inherent challenges, however, not least the arguably arbitrary and brittle nature of selecting and applying a threshold in a production environment. Given these drawbacks, we present a machine learning method that uses random forests and aggregated event data, collected from our proprietary web application firewall, to detect DAST reconnaissance attacks. Using a dataset of over 40 million real-world events, we demonstrate that our method detects DAST attacks effectively, achieving an F1 score of 0.94 with a low miss rate of 6%. This approach provides insights into the development of accurate and reliable detection systems that minimise manual tuning, which is essential in safeguarding against evolving cyber threats.
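To make the reported metrics concrete, the following minimal sketch shows how an F1 score and a miss rate (false negative rate) are derived from confusion-matrix counts. The function name `detection_metrics` and the counts are hypothetical, chosen only so that the arithmetic reproduces figures of the same size as those in the abstract; they are not the chapter's actual results table.

```python
def detection_metrics(tp: int, fp: int, fn: int) -> dict[str, float]:
    """Compute precision, recall, miss rate and F1 from confusion counts.

    tp: attacks correctly flagged, fp: benign traffic wrongly flagged,
    fn: attacks that were missed.
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # Miss rate is the share of real attacks that go undetected (1 - recall).
    miss_rate = fn / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {
        "precision": precision,
        "recall": recall,
        "miss_rate": miss_rate,
        "f1": f1,
    }


# Hypothetical counts (an assumption for illustration, not data from the chapter).
metrics = detection_metrics(tp=940, fp=60, fn=60)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

Under these assumed counts, a miss rate of 6% corresponds to a recall of 0.94, and an F1 score of 0.94 then implies a precision of roughly the same size.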

Author information

Authors and Affiliations

Rapid7, LLC, Boston, MA, USA
Pojan Shahrivar & Stuart Millar

Corresponding author

Correspondence to Pojan Shahrivar.

Editor information

Editors and Affiliations

Jamk University of Applied Sciences, Jyväskylä, Finland
Tuomo Sipola, Janne Alatalo, Monika Wolfmayr & Tero Kokkonen

About this chapter

Cite this chapter

Shahrivar, P., Millar, S. (2024). Detecting Web Application DAST Attacks in Large-Scale Event Data. In: Sipola, T., Alatalo, J., Wolfmayr, M., Kokkonen, T. (eds) Artificial Intelligence for Security. Springer, Cham. https://doi.org/10.1007/978-3-031-57452-8_14

DOI: https://doi.org/10.1007/978-3-031-57452-8_14
Published: 17 April 2024
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-57451-1
Online ISBN: 978-3-031-57452-8
eBook Packages: Computer Science, Computer Science (R0)