LogoMotive: Detecting Logos on Websites to Identify Online Scams - A TLD Case Study

van den Hout, Thijs; Wabeke, Thymen; Moura, Giovane C. M.; Hesselman, Cristian

doi:10.1007/978-3-030-98785-5_1

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13210)

Included in the following conference series:

International Conference on Passive and Active Network Measurement

Abstract

Logos give a website a familiar feel and promote trust. Scammers take advantage of that by using well-known organizations’ logos on malicious websites. Unsuspecting Internet users see these logos and think they are looking at a government website or a legitimate webshop, when it is in fact a phishing site, a counterfeit webshop, or a site set up to spread misinformation. We present the largest logo detection study on websites to date. We analyze 6.2M domain names from the Netherlands’ country-code top-level domain .nl in two case studies that detect logo misuse for two organizations: the Dutch national government and Thuiswinkel Waarborg, an organization that issues trust marks to certified webshops. We show how we can detect phishing, spear phishing, dormant phishing attacks, and brand misuse. To that end, we developed LogoMotive, an application that crawls domain names, generates screenshots, and detects logos using supervised machine learning. LogoMotive is operational in the .nl registry, and it generalizes to detecting any other logo in any DNS zone to help identify abuse.
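
The abstract describes a pipeline whose last stage turns logo detections into candidate abuse cases. The following is a minimal Python sketch of one way such a triage step could work, not the authors' implementation. It assumes the detector emits one record per detected logo with a confidence score, and that a logo appearing on a domain outside a per-organization allowlist (for example, official government domains or certified Thuiswinkel Waarborg members) deserves a human look. The `Detection` record, the `flag_for_review` function, the 0.5 threshold and all domain names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Detection:
    """One logo detected on a website screenshot (hypothetical schema)."""
    domain: str        # domain name whose website was screenshotted
    logo: str          # logo class label, e.g. "government" or "thuiswinkel"
    confidence: float  # detector score in [0, 1]


def flag_for_review(detections, allowlists, threshold=0.5):
    """Return sorted (domain, logo) pairs where a logo was detected with
    confidence >= threshold on a domain that is not on that logo's allowlist.

    Assumption: a logo outside its allowlist is only a *candidate* for misuse;
    a human analyst still decides whether the site is malicious.
    """
    flagged = set()
    for d in detections:
        if d.confidence < threshold:
            continue  # treat low-scoring detections as noise
        if d.domain in allowlists.get(d.logo, set()):
            continue  # legitimate use by the organization or a certified member
        flagged.add((d.domain, d.logo))
    return sorted(flagged)


if __name__ == "__main__":
    # Hypothetical example data; all domain names are illustrative.
    allowlists = {
        "government": {"official-gov-example.nl"},
        "thuiswinkel": {"certified-shop-example.nl"},
    }
    detections = [
        Detection("official-gov-example.nl", "government", 0.97),
        Detection("refund-example.nl", "government", 0.91),
        Detection("certified-shop-example.nl", "thuiswinkel", 0.88),
        Detection("shoes-example.nl", "thuiswinkel", 0.75),
        Detection("blog-example.nl", "government", 0.20),
    ]
    for domain, logo in flag_for_review(detections, allowlists):
        print(f"{domain}: {logo} logo, send to analyst")
```

In this hypothetical run, the allowlisted domains and the low-confidence hit are skipped, so only refund-example.nl and shoes-example.nl are sent for manual review. The allowlist check is an exact match on full domain names, so subdomains of an allowlisted domain would still be flagged unless they are listed explicitly.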

Acknowledgments

We are very grateful to the anonymous analysts at the Dutch national government and Thuiswinkel Waarborg for the manual validation and annotation work they carried out on more than 10k domain names. We would also like to thank our colleagues at SIDN for reviewing and indirectly contributing to this study.

SIDN was partly funded by the European Union’s Horizon 2020 Research and Innovation programme under Grant Agreement No 830927 (https://cordis.europa.eu/project/id/830927). Project website: https://www.concordia-h2020.eu/.

Author information

Authors and Affiliations

SIDN Labs, Arnhem, The Netherlands
Thijs van den Hout, Thymen Wabeke, Giovane C. M. Moura & Cristian Hesselman
TU Delft, Delft, The Netherlands
Giovane C. M. Moura
University of Twente, Enschede, The Netherlands
Cristian Hesselman

Corresponding author

Correspondence to Thijs van den Hout.

Editor information

Editors and Affiliations

Brandenburg University of Technology, Cottbus, Germany
Oliver Hohlfeld
SIDN Labs - TU Delft, Arnhem, The Netherlands
Giovane Moura
ICube - University of Strasbourg, Illkirch, France
Cristel Pelsser

A Appendix: LogoMotive Dashboard

(See Fig. 9).

About this paper

Cite this paper

van den Hout, T., Wabeke, T., Moura, G.C.M., Hesselman, C. (2022). LogoMotive: Detecting Logos on Websites to Identify Online Scams - A TLD Case Study. In: Hohlfeld, O., Moura, G., Pelsser, C. (eds) Passive and Active Measurement. PAM 2022. Lecture Notes in Computer Science, vol 13210. Springer, Cham. https://doi.org/10.1007/978-3-030-98785-5_1

DOI: https://doi.org/10.1007/978-3-030-98785-5_1
Published: 22 March 2022
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-98784-8
Online ISBN: 978-3-030-98785-5
eBook Packages: Computer Science, Computer Science (R0)