Abstract
Fraudulent debt has recently become one of the major issues facing Internet financial institutions, which incur huge losses as a result of fraudulent activity. Hence, there is a need for a method that analyzes transactions, detects the fraudulent ones, and separates them from the genuine ones. Supervised learning approaches are mainly used for fraud detection because they can exploit the set of fraudulent cases known from the analysis of past transactions. Although these models are interpretable, their prediction accuracy remains a challenge, and they fail to perform well when customer behaviour changes. Moreover, data imbalance makes abnormal transactions difficult to identify. Hence, this work presents a semi-supervised, outlier score-based Anti-Fraud model that classifies a loan applicant as a genuine or fraudulent debtor. The proposed work comprises a pre-processing module, a data augmentation module, and a classification model. After the data are pre-processed, outlier models such as the Z-score and Isolation Forest (IF) are applied to generate additional data in the form of outlier scores. An unsupervised K-Means Clustering (KMC) granularity-based outlier scoring method is then proposed to augment the datasets with further scores: the clustering module groups the loan applicants by their credit history, and the Z-score and IF are applied to each cluster to augment the original dataset with different scores. The normalized data are input to the XGBoost-bidirectional Gated Recurrent Unit (BiGRU) self-attention network (SAN), referred to as XGB-BiGRU-SAN, which is used to capture dynamic information more effectively. Further, the Arithmetic Optimization Algorithm (AOA) is used to optimize the network weights. The performance of the proposed XGB-BiGRU-SAN Internet loan fraud detection model is analyzed on two benchmark datasets, Lending Club and Bank Loan Status.
The proposed XGB-BiGRU-SAN achieved a classification accuracy, precision and recall of 99.05%, 99.11% and 99.34%, respectively, on the Lending Club dataset. On the Bank Loan Status dataset, the accuracy, precision and recall are 98.67%, 98.82% and 98.62%, respectively.
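The cluster-granularity scoring step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it uses only the Python standard library, a plain Lloyd's-algorithm K-Means with a deterministic seed, and the Z-score alone (the Isolation Forest score is omitted and would simply become one more appended column). The function names, the toy applicants, and the two hypothetical features (loan amount, years of credit history) are all assumptions made for illustration.

```python
from statistics import mean, pstdev


def kmeans(points, k, iters=20):
    """Plain Lloyd's algorithm; the first k points seed the centroids."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign each point to its nearest centroid (squared Euclidean distance).
        labels = [
            min(range(k),
                key=lambda c: sum((x - m) ** 2 for x, m in zip(p, centroids[c])))
            for p in points
        ]
        # Move each centroid to the mean of its current members.
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:
                centroids[c] = [mean(col) for col in zip(*members)]
    return labels


def cluster_zscores(values, labels):
    """Z-score of each value computed against its own cluster only."""
    stats = {}
    for c in set(labels):
        group = [v for v, l in zip(values, labels) if l == c]
        stats[c] = (mean(group), pstdev(group))
    return [
        (v - stats[l][0]) / stats[l][1] if stats[l][1] > 0 else 0.0
        for v, l in zip(values, labels)
    ]


def augment(rows, k=2, score_col=0):
    """Append the cluster label and the within-cluster Z-score to every row."""
    labels = kmeans(rows, k)
    scores = cluster_zscores([r[score_col] for r in rows], labels)
    return [list(r) + [l, s] for r, l, s in zip(rows, labels, scores)]


# Hypothetical applicants: (loan amount, years of credit history).
rows = [(10, 1), (12, 1), (11, 2), (30, 2),
        (100, 10), (102, 11), (98, 9), (101, 10)]
augmented = augment(rows)
for row in augmented:
    print(row)
```

In this toy data the applicant with amount 30 lies below the global mean of all eight amounts, so a global Z-score would not flag it, yet it receives the largest score inside its own cluster. That is the motivation for scoring at cluster granularity; in practice a library implementation such as scikit-learn's `KMeans` and `IsolationForest` would replace the hand-written pieces.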
Data availability
Data sharing is not applicable to this article.
Abbreviations
- IF: Isolation forest
- KMC: K-Means Clustering
- BiGRU: Bidirectional Gated Recurrent Unit
- SAN: Self-attention network
- ML: Machine learning
- SVM: Support vector machine
- KNN: K-nearest neighbour
- LR: Logistic regression
- NB: Naïve Bayes
- AO: Arithmetic Optimizer
- CNN: Convolutional Neural Network
- LSTM: Long Short Term Memory Network
- FFD: Financial fraud detection
- PCA: Principal Component Analysis
- t-SNE: T-Distributed Stochastic Neighbor Embeddings
- UMAP: Uniform manifold approximation and projection
- ABC: Artificial bee colony
- GA: Genetic algorithm
- ROC: Receiver operating curve
- AUC: Area under the curve
Funding
No funding is provided for the preparation of the manuscript.
Author information
Authors and Affiliations
Contributions
All authors read and approved the final manuscript.
Corresponding author
Ethics declarations
Ethical approval
This article does not contain any studies with human participants or animals performed by any of the authors.
Consent to participate
All the authors involved have agreed to participate in this submitted article.
Consent to publish
All the authors involved in this manuscript give full consent for publication of this submitted article.
Conflict of interest
Authors declare that they have no conflict of interest.
About this article
Cite this article
Gorle, V.L.N., Panigrahi, S. A semi-supervised Anti-Fraud model based on integrated XGBoost and BiGRU with self-attention network: an application to internet loan fraud detection. Multimed Tools Appl 83, 56939–56964 (2024). https://doi.org/10.1007/s11042-023-17681-z
DOI: https://doi.org/10.1007/s11042-023-17681-z