Abstract
Software fault prediction (SFP) refers to the early identification of fault-prone modules in software development, i.e., modules that are susceptible to faults and can incur high development cost. Machine learning (ML)-based classifiers are extensively used for SFP. Such models use handcrafted metrics (or features), i.e., static code metrics, to classify software modules into one of two categories: {buggy, clean}. This entails the overhead of selecting the most significant features, since some features are correlated or non-significant. With the paradigm shift from machine learning to deep learning, it is desirable to improve the performance of SFP classifiers to keep pace with changing industrial needs. This study proposes a novel model (SCM-DLA-SFP) based on a deep learning architecture (DLA) that predicts defects from static code metrics (SCMs). A defect dataset with SCMs is fed to the input layer of a specially designed deep learning model, where the input is automatically conditioned using normalization. The conditioned data then pass through the dense layers of a deep neural network to predict the faulty modules. The study uses five datasets from the PROMISE repository, namely camel, jedit, lucene, synapse, and xalan. The proposed SCM-DLA-SFP model achieves average values of 88.01%, 79.83%, and 73.3% for AUC, accuracy, and F-measure, respectively. The comparison shows that the proposed model outperforms the state-of-the-art DL-based SFP methods on average by 16.28%, 19.61%, and 18.45% in AUC, accuracy, and F-measure, respectively.
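The pipeline described in the abstract, normalization of static code metrics followed by dense layers producing a buggy/clean decision, can be sketched in plain numpy. This is a minimal illustrative forward pass only: the layer sizes, weight initialization, and feature count are assumptions for the sketch, not the paper's actual trained configuration.

```python
import numpy as np

rng = np.random.default_rng(0)

def normalize(X):
    """Condition raw metric values via per-feature z-score normalization."""
    mu, sigma = X.mean(axis=0), X.std(axis=0) + 1e-8
    return (X - mu) / sigma

def dense(X, W, b, activation):
    """One fully connected layer: affine transform plus nonlinearity."""
    return activation(X @ W + b)

relu = lambda z: np.maximum(z, 0.0)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

# Toy data: 100 software modules described by 20 static code metrics.
X = rng.normal(size=(100, 20))

# Randomly initialized weights for an illustrative 20 -> 16 -> 8 -> 1 stack.
W1, b1 = rng.normal(scale=0.1, size=(20, 16)), np.zeros(16)
W2, b2 = rng.normal(scale=0.1, size=(16, 8)), np.zeros(8)
W3, b3 = rng.normal(scale=0.1, size=(8, 1)), np.zeros(1)

h = dense(normalize(X), W1, b1, relu)
h = dense(h, W2, b2, relu)
p_buggy = dense(h, W3, b3, sigmoid).ravel()  # fault-proneness probability

labels = (p_buggy >= 0.5).astype(int)        # 1 = buggy, 0 = clean
```

In practice such a network would be trained with a binary cross-entropy loss and evaluated with AUC, accuracy, and F-measure, as the study does.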
Data availability
Enquiries about data availability should be directed to the authors.
Funding
No funding was received for this research work.
Ethics declarations
Conflict of interest
The author has no conflicts of interest.
Ethical approval
This article does not contain any studies with human participants or animals performed by any of the authors.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Goyal, S. Static code metrics-based deep learning architecture for software fault prediction. Soft Comput 26, 13765–13797 (2022). https://doi.org/10.1007/s00500-022-07365-5