Static code metrics-based deep learning architecture for software fault prediction

  • Application of soft computing
  • Soft Computing

Abstract

Software fault prediction (SFP) is the early identification of software modules that are likely to contain faults and can therefore incur high development cost. Machine learning (ML)-based classifiers are used extensively for SFP. These models rely on handcrafted metrics (features), namely static code metrics, to classify software modules into one of two categories, {buggy, clean}. This approach carries the overhead of selecting the most significant features, because some features are correlated or non-significant. With the paradigm shifting from machine learning to deep learning, it is desirable to improve the performance of SFP classifiers to keep pace with changing industrial needs. This study proposes a novel model (SCM-DLA-SFP) based on a deep learning architecture (DLA) that predicts defects from static code metrics (SCMs). The defect dataset with SCMs is fed to the input layer of a specially designed deep learning model, where the input is automatically conditioned using normalization. The conditioned data then pass through the dense layers of a deep neural network to predict the faulty modules. The study uses five datasets from the PROMISE repository: camel, jedit, lucene, synapse and xalan. The proposed SCM-DLA-SFP model achieves average values of 88.01%, 79.83%, and 73.3% for AUC, accuracy and F-measure, respectively. The comparison shows that the proposed model outperforms the state-of-the-art DL-based SFP methods on average by 16.28%, 19.61%, and 18.45% in AUC, accuracy and F-measure, respectively.
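The pipeline described in the abstract (normalize the static code metrics, pass them through dense layers, score each module as buggy or clean, then evaluate with AUC, accuracy and F-measure) can be illustrated with a minimal pure-Python sketch. This is not the SCM-DLA-SFP implementation. The z-score normalization, the layer sizes, the ReLU and sigmoid activations, the untrained random weights, and all function names are assumptions made only for illustration, since the abstract does not specify them.

```python
import math
import random


def zscore_normalize(rows):
    """Scale each metric column to zero mean and unit variance (assumed scheme)."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    # A constant column has zero spread; fall back to 1.0 to avoid dividing by zero.
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]


def relu(z):
    return max(0.0, z)


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def init_layer(n_in, n_out, rng):
    """Hypothetical initializer: small random weights, zero biases."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def dense(inputs, weights, biases, activation):
    """One fully connected layer: activation(W x + b)."""
    return [activation(sum(w * x for w, x in zip(w_row, inputs)) + b)
            for w_row, b in zip(weights, biases)]


def predict_proba(row, layers):
    """Forward pass: ReLU on hidden layers, sigmoid on the output = P(buggy)."""
    out = row
    for i, (weights, biases) in enumerate(layers):
        activation = sigmoid if i == len(layers) - 1 else relu
        out = dense(out, weights, biases, activation)
    return out[0]


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f_measure(y_true, y_pred):
    """Harmonic mean of precision and recall for the buggy class (label 1)."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def auc(y_true, scores):
    """AUC as the chance a buggy module outscores a clean one (ties count half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Made-up metric rows (e.g. lines of code, complexity, coupling); not PROMISE data.
    metrics = [[120.0, 7.0, 3.0], [45.0, 2.0, 1.0], [300.0, 15.0, 9.0], [60.0, 3.0, 2.0]]
    labels = [1, 0, 1, 0]  # 1 = buggy, 0 = clean
    rng = random.Random(0)
    layers = [init_layer(3, 8, rng), init_layer(8, 4, rng), init_layer(4, 1, rng)]
    scores = [predict_proba(row, layers) for row in zscore_normalize(metrics)]
    preds = [1 if s >= 0.5 else 0 for s in scores]
    print(accuracy(labels, preds), f_measure(labels, preds), auc(labels, scores))
```

Because the weights here are random and untrained, the printed scores carry no meaning. The sketch only shows how the normalization step, the dense layers and the three evaluation measures fit together.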

Data availability

Enquiries about data availability should be directed to the authors.

Funding

No funding was received for this research work.

Author information

Corresponding author

Correspondence to Somya Goyal.

Ethics declarations

Conflict of interest

The author has no conflicts of interest.

Ethical approval

This article does not contain any studies with human participants or animals performed by any of the authors.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Goyal, S. Static code metrics-based deep learning architecture for software fault prediction. Soft Comput 26, 13765–13797 (2022). https://doi.org/10.1007/s00500-022-07365-5

  • DOI: https://doi.org/10.1007/s00500-022-07365-5
