Abstract
Click fraud, the manipulation of online advertisement traffic figures, is becoming a major concern for businesses that advertise online. It can lead to financial losses and inaccurate click statistics. Addressing the problem requires a reliable method for identifying click fraud, that is, for distinguishing legitimate clicks made by users from fraudulent clicks generated by bots or other software, so that companies can advertise their products safely. An XGBoost model was trained on the TalkingData AdTracking Fraud Detection dataset from Kaggle as a binary classifier that predicts the likelihood of a click being fraudulent. Particular attention was paid to data treatment: the data were carefully preprocessed and cleaned before being fed into the model. This improved the model's performance, which reached an AUC of 0.96 and a LogLoss of 0.15.
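To make the two reported metrics concrete, the following is a minimal sketch of how AUC and LogLoss can be computed for a binary classifier's predicted probabilities. It is not the authors' evaluation code. The function names `roc_auc` and `log_loss`, the clipping constant `eps`, and the toy labels and scores are assumptions introduced for illustration; none of them come from the paper or its dataset.

```python
import math


def roc_auc(labels, scores):
    """Area under the ROC curve via the rank-sum (Mann-Whitney U) identity.

    Equals the probability that a randomly chosen positive (fraudulent)
    example gets a higher score than a randomly chosen negative one,
    with ties counted as one half.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        # Find the group of tied scores starting at position i.
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # average 1-based rank of the tied group
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1

    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs at least one positive and one negative label")
    rank_sum = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def log_loss(labels, probs, eps=1e-15):
    """Mean binary cross-entropy; probabilities are clipped to avoid log(0)."""
    total = 0.0
    for y, p in zip(labels, probs):
        p = min(max(p, eps), 1 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(labels)


# Toy data (hypothetical): 1 = fraudulent click, 0 = legitimate click.
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
print(roc_auc(labels, scores))   # 0.75
print(log_loss(labels, scores))
```

AUC depends only on how the scores rank fraudulent clicks relative to legitimate ones, whereas LogLoss also penalises poorly calibrated probabilities, which is one reason the two are often reported together.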
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Sahllal, N., Souidi, E.M. (2023). Forecasting Click Fraud via Machine Learning Algorithms. In: El Hajji, S., Mesnager, S., Souidi, E.M. (eds) Codes, Cryptology and Information Security. C2SI 2023. Lecture Notes in Computer Science, vol 13874. Springer, Cham. https://doi.org/10.1007/978-3-031-33017-9_18
DOI: https://doi.org/10.1007/978-3-031-33017-9_18
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-33016-2
Online ISBN: 978-3-031-33017-9
eBook Packages: Computer Science, Computer Science (R0)