Dom-BERT: Detecting Malicious Domains with Pre-training Model

  • Conference paper
Passive and Active Measurement (PAM 2024)

Part of the book series: Lecture Notes in Computer Science ((LNCS,volume 14537))

Abstract

The Domain Name System (DNS) is widely abused by attackers, which makes malicious domain detection a crucial routine task for operators combating cyber crime. Existing classification-based models often struggle to achieve high accuracy in practical settings because of the task's class imbalance. Moreover, inference-based models, which hinge on the resolution similarity between domains, often fail to exploit the linguistic associations among domains. This paper first analyzes the characteristics of malicious domains in detail and contrasts them with those of benign ones, using a real-life passive DNS dataset obtained from several major ISPs (Internet Service Providers). On this basis, we then propose an efficient solution for detecting malicious domains, called Dom-BERT. To capture the resolution associations among domains, Dom-BERT constructs a heterogeneous graph and incorporates a pruning module, which facilitates modeling the relationships among domains, clients, and hosting servers. Building on this graph, we employ techniques such as random walks with restart and a domain association prediction downstream task to compute similarity scores for domains. These scores are then used to fine-tune the pre-trained BERT model. The performance of Dom-BERT is evaluated using our passive DNS logs. The results show that Dom-BERT surpasses state-of-the-art solutions, achieving higher F1 scores and demonstrating resilience to class imbalance. The implementation of Dom-BERT is publicly available at https://github.com/yutian99/Dom-BERT.
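To make the idea of graph-based similarity scores more concrete, the following is a minimal sketch of a random walk with restart over a toy resolution graph. It is not the Dom-BERT implementation: the graph here links only domains and IP addresses (the abstract also mentions clients and a pruning module, both omitted), and the function names, the sample records, and the restart probability of 0.15 are all assumptions chosen for illustration.

```python
from collections import defaultdict

# Hypothetical (domain, ip) resolution records, for illustration only.
RESOLUTIONS = [
    ("evil-a.example", "192.0.2.1"),
    ("evil-b.example", "192.0.2.1"),
    ("evil-b.example", "192.0.2.2"),
    ("shop.example", "192.0.2.2"),
    ("news.example", "198.51.100.7"),
]


def build_graph(resolutions):
    """Build an undirected domain-IP graph as an adjacency map."""
    adj = defaultdict(set)
    for domain, ip in resolutions:
        adj[("domain", domain)].add(("ip", ip))
        adj[("ip", ip)].add(("domain", domain))
    return adj


def random_walk_with_restart(adj, seed, restart=0.15, iters=100, tol=1e-10):
    """Approximate the visiting probabilities of a walker that, at each
    step, jumps back to `seed` with probability `restart` (an assumed
    value) and otherwise moves to a uniformly chosen neighbour."""
    nodes = list(adj)
    scores = {node: 0.0 for node in nodes}
    scores[seed] = 1.0
    for _ in range(iters):
        nxt = {node: 0.0 for node in nodes}
        for node, mass in scores.items():
            if mass == 0.0:
                continue
            # Spread the non-restarting mass evenly over the neighbours.
            share = (1.0 - restart) * mass / len(adj[node])
            for neighbour in adj[node]:
                nxt[neighbour] += share
        nxt[seed] += restart  # total mass is 1, so the restart mass is `restart`
        delta = sum(abs(nxt[node] - scores[node]) for node in nodes)
        scores = nxt
        if delta < tol:
            break
    return scores


graph = build_graph(RESOLUTIONS)
seed_node = ("domain", "evil-a.example")
scores = random_walk_with_restart(graph, seed_node)

# Rank the other domains by their proximity to the seed domain.
ranked = sorted(
    ((name, score) for (kind, name), score in scores.items()
     if kind == "domain" and (kind, name) != seed_node),
    key=lambda item: item[1],
    reverse=True,
)
for name, score in ranked:
    print(f"{name}: {score:.4f}")
```

In this toy graph, a domain that shares a hosting IP with the seed receives a higher score than one reachable only through a longer path, and a domain in a disconnected component receives zero. Scores of this kind could then serve as association labels for a downstream model, which is the role the abstract assigns to them.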

Acknowledgement

This work is supported by the National Key R&D Program of China (Grant No. 2022YFB3103000) and by the National Natural Science Foundation of China (Grant Nos. U20A20180 and 62072437).

Author information

Corresponding author

Correspondence to Zhenyu Li.

Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Tian, Y., Li, Z. (2024). Dom-BERT: Detecting Malicious Domains with Pre-training Model. In: Richter, P., Bajpai, V., Carisimo, E. (eds) Passive and Active Measurement. PAM 2024. Lecture Notes in Computer Science, vol 14537. Springer, Cham. https://doi.org/10.1007/978-3-031-56249-5_6

  • DOI: https://doi.org/10.1007/978-3-031-56249-5_6

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-56248-8

  • Online ISBN: 978-3-031-56249-5

  • eBook Packages: Computer Science, Computer Science (R0)
