LogoMotive: Detecting Logos on Websites to Identify Online Scams - A TLD Case Study

  • Conference paper
Passive and Active Measurement (PAM 2022)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13210)

Abstract

Logos give a website a familiar feel and promote trust. Scammers take advantage of that by using well-known organizations’ logos on malicious websites. Unsuspecting Internet users see these logos and think they are looking at a government website or legitimate webshop, when it is a phishing site, a counterfeit webshop, or a site set up to spread misinformation. We present the largest logo detection study on websites to date. We analyze 6.2M domain names from the Netherlands’ country-code top-level domain .nl, in two case studies to detect logo misuse for two organizations: the Dutch national government and Thuiswinkel Waarborg, an organization that issues certified webshop trust marks. We show how we can detect phishing, spear phishing, dormant phishing attacks, and brand misuse. To that end, we developed LogoMotive, an application that crawls domain names, generates screenshots, and detects logos using supervised machine learning. LogoMotive is operational in the .nl registry, and it is generalizable to detect any other logo in any DNS zone to help identify abuse.
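The three stages named in the abstract (crawl, screenshot, detect) can be pictured with a minimal Python sketch. It is an illustration under stated assumptions, not LogoMotive's actual implementation: the names `Detection`, `Finding`, and `scan_domains`, the allowlist of the logo owner's own domains, and the 0.5 confidence threshold are all hypothetical. The screenshot and detection steps are passed in as functions, so any browser automation tool or trained detector could be plugged in.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, Iterable, List


@dataclass(frozen=True)
class Detection:
    """One logo found in a screenshot (hypothetical structure)."""
    label: str         # which logo the detector believes it saw
    confidence: float  # detector score in [0, 1]


@dataclass(frozen=True)
class Finding:
    """A domain that shows a monitored logo and may need manual review."""
    domain: str
    label: str
    confidence: float


def scan_domains(
    domains: Iterable[str],
    take_screenshot: Callable[[str], bytes],
    detect_logos: Callable[[bytes], List[Detection]],
    known_legitimate: FrozenSet[str] = frozenset(),
    min_confidence: float = 0.5,  # assumed threshold, not taken from the paper
) -> List[Finding]:
    """Crawl each domain, screenshot it, and report logos above a threshold."""
    findings: List[Finding] = []
    for domain in domains:
        if domain in known_legitimate:
            continue  # the logo owner's own sites need no review
        try:
            image = take_screenshot("http://" + domain)
        except OSError:
            continue  # unreachable site: nothing to analyse
        for det in detect_logos(image):
            if det.confidence >= min_confidence:
                findings.append(Finding(domain, det.label, det.confidence))
    return findings


if __name__ == "__main__":
    # Stand-ins for a real browser and a real trained detector.
    def fake_screenshot(url: str) -> bytes:
        return url.encode()

    def fake_detector(image: bytes) -> List[Detection]:
        score = 0.9 if b"lookalike" in image else 0.1
        return [Detection("government", score)]

    for finding in scan_domains(["lookalike.example", "bakery.example"],
                                fake_screenshot, fake_detector):
        print(finding)
```

Keeping the crawler and the detector behind plain function parameters is a design choice of this sketch: it lets the flow be exercised with stand-ins, and the returned findings are only candidates that a human analyst would still need to validate.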


Acknowledgments

We are very grateful to the anonymous analysts at the Dutch national government and Thuiswinkel Waarborg, who manually validated and annotated more than 10k domain names. We would also like to thank our colleagues at SIDN for reviewing and indirectly contributing to this study.

SIDN was partly funded by the European Union’s Horizon 2020 Research and Innovation programme under Grant Agreement No 830927 (https://cordis.europa.eu/project/id/830927). Project website: https://www.concordia-h2020.eu/.

Author information

Corresponding author

Correspondence to Thijs van den Hout.

A Appendix: LogoMotive Dashboard

(See Fig. 9).

Fig. 9. Dashboard annotation pop-up screen

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

van den Hout, T., Wabeke, T., Moura, G.C.M., Hesselman, C. (2022). LogoMotive: Detecting Logos on Websites to Identify Online Scams - A TLD Case Study. In: Hohlfeld, O., Moura, G., Pelsser, C. (eds) Passive and Active Measurement. PAM 2022. Lecture Notes in Computer Science, vol 13210. Springer, Cham. https://doi.org/10.1007/978-3-030-98785-5_1

  • DOI: https://doi.org/10.1007/978-3-030-98785-5_1

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-98784-8

  • Online ISBN: 978-3-030-98785-5

  • eBook Packages: Computer Science, Computer Science (R0)
