A semi-supervised Anti-Fraud model based on integrated XGBoost and BiGRU with self-attention network: an application to internet loan fraud detection

Multimedia Tools and Applications

Abstract

Fraudulent debt has recently become one of the major problems facing Internet financial institutions, which suffer huge losses from fraudulent activity. A method is therefore needed to analyze transactions, detect fraudulent ones and separate them from genuine ones. Supervised learning approaches are mainly used for fraud detection because they exploit the set of known fraud cases obtained from analysis of past transactions. Although these models are interpretable, achieving high prediction accuracy remains challenging, and they perform poorly when customer behaviour changes. Moreover, data imbalance makes abnormal transactions difficult to identify. This work therefore presents a semi-supervised, outlier-score-based Anti-Fraud model that classifies a loan applicant as a genuine or fraudulent debtor. The proposed work comprises three stages: a pre-processing module, data augmentation and a classification model. After pre-processing, outlier models such as the Z-score and Isolation Forest (IF) are applied to generate additional data. An unsupervised K-Means Clustering (KMC) granularity-based outlier scoring method is then proposed to augment the datasets with many more scores: the clustering module groups loan applicants by credit history, and the Z-score and IF are applied within each cluster to augment the original dataset with different scores. The normalized data is input to the XGBoost-Bidirectional Gated Recurrent Unit (BiGRU) self-attention network (SAN), referred to as XGB-BiGRU-SAN, which is used to capture dynamic information more effectively. Further, a mathematical optimization method, the Arithmetic Optimization Algorithm (AOA), is used to optimize the network weights. The performance of the proposed XGB-BiGRU-SAN Internet loan fraud detection model is analyzed on two benchmark datasets, Lending Club and Bank Loan Status. The proposed XGB-BiGRU-SAN achieved classification accuracy, precision and recall of 99.05%, 99.11% and 99.34%, respectively, on the Lending Club dataset, and 98.67%, 98.82% and 98.62%, respectively, on the Bank Loan Status dataset.
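
The cluster-wise outlier scoring described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the function names, the single `amount` feature and the toy cluster labels are all hypothetical, and the cluster labels are assumed to have been produced beforehand by a clustering step such as K-Means. Only the Z-score is shown; an Isolation Forest score (for example from scikit-learn's `IsolationForest`) could presumably be appended as a further column in the same way.

```python
from collections import defaultdict
from statistics import mean, pstdev


def zscores(values):
    """Return the Z-score of each value relative to the given list."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:  # constant column: no value deviates from the mean
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def augment_with_scores(amounts, cluster_ids):
    """Append a global and a cluster-wise Z-score to each record.

    amounts     -- one numeric feature per applicant (hypothetical)
    cluster_ids -- cluster label per applicant, assumed precomputed
    """
    global_z = zscores(amounts)

    # Group row indices by cluster so each cluster is scored separately.
    members = defaultdict(list)
    for idx, cid in enumerate(cluster_ids):
        members[cid].append(idx)

    cluster_z = [0.0] * len(amounts)
    for idxs in members.values():
        for idx, z in zip(idxs, zscores([amounts[i] for i in idxs])):
            cluster_z[idx] = z

    return [
        {"amount": a, "cluster": c, "z_global": g, "z_cluster": z}
        for a, c, g, z in zip(amounts, cluster_ids, global_z, cluster_z)
    ]


# Toy data: two clusters of applicants with different typical loan amounts.
amounts = [10.0, 12.0, 11.0, 100.0, 102.0, 130.0]
cluster_ids = [0, 0, 0, 1, 1, 1]
rows = augment_with_scores(amounts, cluster_ids)
```

In this toy data the applicant with amount 12.0 lies below the global mean but above the mean of its own cluster, so its global and cluster-wise Z-scores have opposite signs. That difference is the kind of extra signal a cluster-granular score adds to the original features.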

Data availability

Data sharing is not applicable to this article.

Abbreviations

IF: Isolation forest
KMC: K-Means Clustering
BiGRU: Bidirectional Gated Recurrent Unit
SAN: Self-attention network
ML: Machine learning
SVM: Support vector machine
KNN: K-nearest neighbour
LR: Logistic regression
NB: Naïve Bayes
AO: Arithmetic Optimizer
CNN: Convolutional Neural Network
LSTM: Long Short-Term Memory network
FFD: Financial fraud detection
PCA: Principal Component Analysis
t-SNE: t-Distributed Stochastic Neighbor Embedding
UMAP: Uniform manifold approximation and projection
ABC: Artificial bee colony
GA: Genetic algorithm
ROC: Receiver operating characteristic
AUC: Area under the curve

Funding

No funding is provided for the preparation of the manuscript.

Author information

Contributions

All authors read and approved the final manuscript.

Corresponding author

Correspondence to Venkata Lakshmi Narayana Gorle.

Ethics declarations

Ethical approval

This article does not contain any studies with human participants or animals performed by any of the authors.

Consent to participate

All the authors involved have agreed to participate in this submitted article.

Consent to publish

All the authors involved in this manuscript give full consent for publication of this submitted article.

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Gorle, V.L.N., Panigrahi, S. A semi-supervised Anti-Fraud model based on integrated XGBoost and BiGRU with self-attention network: an application to internet loan fraud detection. Multimed Tools Appl 83, 56939–56964 (2024). https://doi.org/10.1007/s11042-023-17681-z

  • DOI: https://doi.org/10.1007/s11042-023-17681-z
