Forecasting Click Fraud via Machine Learning Algorithms

  • Conference paper
Codes, Cryptology and Information Security (C2SI 2023)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13874)

Abstract

Click fraud, the manipulation of online advertisement traffic figures, is becoming a major concern for businesses that advertise online, because it causes financial losses and distorts click statistics. Addressing it requires a reliable method for identifying click fraud, that is, for distinguishing legitimate clicks made by users from fraudulent clicks generated by bots or other software, so that companies can advertise their products safely. An XGBoost model was trained on the TalkingData AdTracking Fraud Detection dataset from Kaggle as a binary classifier that predicts the likelihood of a click being fraudulent. Particular attention was paid to data treatment during training: the data were carefully preprocessed and cleaned before being fed to the model, which helped improve its accuracy and performance. The resulting model reached an AUC of 0.96 and a log loss of 0.15.
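The abstract reports performance as an AUC and a log loss. As a hedged illustration of what those two numbers measure, and not the authors' evaluation code, the following dependency-free Python sketch computes both for binary labels, where 1 denotes the positive class and each score is a predicted probability. The function names `log_loss` and `roc_auc` are our own; the AUC is computed with the rank-based (Mann-Whitney U) formulation.

```python
import math
from typing import Sequence


def log_loss(y_true: Sequence[int], p_pred: Sequence[float], eps: float = 1e-15) -> float:
    """Mean binary cross-entropy; probabilities are clipped to avoid log(0)."""
    if not y_true or len(y_true) != len(p_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1.0 - eps)  # keep p strictly inside (0, 1)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(y_true)


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as the probability that a random positive is scored above a
    random negative (Mann-Whitney U statistic); ties count as one half."""
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC is undefined when only one class is present")
    pairs = sorted(zip(scores, y_true))  # ascending by score
    rank_sum_pos = 0.0
    i = 0
    while i < len(pairs):
        # Find the run of tied scores pairs[i..j].
        j = i
        while j + 1 < len(pairs) and pairs[j + 1][0] == pairs[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # average of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            if pairs[k][1] == 1:
                rank_sum_pos += avg_rank
        i = j + 1
    # U counts (positive, negative) pairs where the positive ranks higher.
    u = rank_sum_pos - n_pos * (n_pos + 1) / 2
    return u / (n_pos * n_neg)
```

For example, with labels `[0, 0, 1, 1]` and scores `[0.1, 0.4, 0.35, 0.8]`, `roc_auc` returns 0.75 and `log_loss` returns about 0.472. In practice, library implementations such as scikit-learn's `roc_auc_score` and `log_loss`, or XGBoost's built-in `eval_metric` values `"auc"` and `"logloss"`, compute the same quantities.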

References

  1. Digital advertising soared 35% to $189 billion in 2021 according to the IAB Internet Advertising Revenue Report (2022). https://www.iab.com/news/digital-advertising-soared-35-to-189-billion-in-2021-according-to-the-iab-internet-advertising-revenue-report/. Accessed 15 June 2022

  2. Fourberg, N., et al.: Online Advertising: The Impact of Targeted Advertising on Advertisers, Market Access and Consumer Choice (2021). https://www.europarl.europa.eu/thinktank/en/document.html?reference=IPOL_STU%282021%29662913

  3. Thejas, G.S., Boroojeni, K.G., Chandna, K., Bhatia, I., Iyengar, S.S., Sunitha, N.R.: Deep Learning-based model to fight against Ad click fraud. In: Proceedings of the 2019 ACM Southeast Conference. ACM SE 2019. ACM (2019). https://doi.org/10.1145/3299815.3314453

  4. Thejas, G.S., Soni, J., Chandna, K., Iyengar, S.S., Sunitha, N.R., Prabakar, N.: Learning-based model to fight against fake like clicks on instagram posts. In: 2019 SoutheastCon. IEEE (2019). https://doi.org/10.1109/southeastcon42311.2019.9020533

  5. Thejas, G.S., et al.: A multi-time-scale time series analysis for click fraud forecasting using binary labeled imbalanced dataset. In: 2019 4th International Conference on Computational Systems and Information Technology for Sustainable Solution (CSITSS). IEEE (2019). https://doi.org/10.1109/csitss47250.2019.9031036

  6. Crussell, J., Stevens, R., Chen, H.: MAdFraud. In: Proceedings of the 12th Annual International Conference on Mobile Systems, Applications, and Services. MobiSys 2014. ACM (2014). https://doi.org/10.1145/2594368.2594391

  7. Kwon, J., Kim, J., Lee, J., Lee, H., Perrig, A.: PsyBoG: power spectral density analysis for detecting botnet groups. In: 2014 9th International Conference on Malicious and Unwanted Software: The Americas (MALWARE). IEEE (2014). https://doi.org/10.1109/malware.2014.6999414

  8. Kantardzic, M., Walgampaya, C., Yampolskiy, R., Joung Woo, R.: Click fraud prevention via multimodal evidence fusion by Dempster-Shafer theory. In: 2010 IEEE Conference on Multisensor Fusion and Integration for Intelligent Systems (MFI 2010). IEEE (2010). https://doi.org/10.1109/mfi.2010.5604480

  9. Ge, L., Kantardzic, M., King, D.: CCFDP: collaborative click fraud detection and prevention system. In: 18th International Conference on Computer Application in Industry and Engineering - CAINE 2005, Honolulu (2005)

  10. Iqbal, M.S., Zulkernine, M., Jaafar, F., Gu, Y.: FCFraud: fighting click-fraud from the user side. In: 2016 IEEE 17th International Symposium on High Assurance Systems Engineering (HASE). IEEE (2016). https://doi.org/10.1109/hase.2016.17

  11. Batista, G.E., Prati, R.C., Monard, M.C.: A study of the behavior of several methods for balancing machine learning training data. ACM SIGKDD Explor. Newsl. 6(1), 20–29 (2004). https://doi.org/10.1145/1007730.1007735

  12. TalkingData AdTracking Fraud Detection Challenge. https://www.kaggle.com/competitions/talkingdata-adtracking-fraud-detection/data. Accessed 24 June 2022

  13. Bergstra, J., Bengio, Y.: Random search for hyper-parameter optimization. J. Mach. Learn. Res. 13, 281–305 (2012)

  14. Friedman, J.H.: Greedy function approximation: a gradient boosting machine. Ann. Stat. 29(5), 1189–1232 (2001). https://www.jstor.org/stable/2699986

  15. Chen, T., Guestrin, C.: XGBoost. In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. KDD 2016. ACM (2016). https://doi.org/10.1145/2939672.2939785

  16. Urbanowicz, R.J., Moore, J.H.: The application of Michigan-style learning classifier systems to address genetic heterogeneity and epistasis in association studies. In: Proceedings of the 12th Annual Conference on Genetic and Evolutionary Computation. GECCO 2010. ACM (2010). https://doi.org/10.1145/1830483.1830518

  17. Qiu, X., Zuo, Y., Liu, G.: ETCF: an ensemble model for CTR prediction. In: 2018 15th International Conference on Service Systems and Service Management (ICSSSM), pp. 1–5 (2018). https://doi.org/10.1109/ICSSSM.2018.8465044

  18. Stone, M.: Cross-validatory choice and assessment of statistical predictions. J. Roy. Stat. Soc. Ser. B (Methodol.) 36(2), 111–147 (1974). https://www.jstor.org/stable/2984809

Author information

Corresponding author

Correspondence to Nadir Sahllal.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Sahllal, N., Souidi, E.M. (2023). Forecasting Click Fraud via Machine Learning Algorithms. In: El Hajji, S., Mesnager, S., Souidi, E.M. (eds) Codes, Cryptology and Information Security. C2SI 2023. Lecture Notes in Computer Science, vol 13874. Springer, Cham. https://doi.org/10.1007/978-3-031-33017-9_18

  • DOI: https://doi.org/10.1007/978-3-031-33017-9_18

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-33016-2

  • Online ISBN: 978-3-031-33017-9

  • eBook Packages: Computer Science, Computer Science (R0)