Static code metrics-based deep learning architecture for software fault prediction

Goyal, Somya

doi:10.1007/s00500-022-07365-5

Application of soft computing
Published: 11 August 2022

Volume 26, pages 13765–13797, (2022)

Soft Computing

Somya Goyal¹


Abstract

Software fault prediction (SFP) refers to the early identification of fault-prone modules, i.e., modules that are susceptible to faults and can incur high development cost. Machine learning (ML)-based classifiers are widely used for SFP. These models rely on handcrafted metrics (features), namely static code metrics, to classify software modules into one of two categories, {buggy, clean}. Because some of these metrics are correlated or non-significant, the approach carries the overhead of selecting the most significant features. As the paradigm shifts from machine learning to deep learning, SFP classifiers need to improve to keep pace with changing industrial needs. This study proposes a novel model (SCM-DLA-SFP) that uses a deep learning architecture (DLA) to predict defects from static code metrics (SCMs). The defect dataset with SCMs is fed to the input layer of a specially designed deep learning model, which automatically conditions the input using normalization. The conditioned data then pass through the dense layers of a deep neural network to predict the faulty modules. The study uses five datasets from the PROMISE repository: camel, jedit, lucene, synapse and xalan. The proposed SCM-DLA-SFP model achieves average values of 88.01%, 79.83% and 73.3% for AUC, accuracy and F-measure, respectively. On average, it outperforms state-of-the-art DL-based SFP methods by 16.28%, 19.61% and 18.45% in AUC, accuracy and F-measure, respectively.
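The pipeline outlined in the abstract normalizes the static code metrics, passes them through dense layers, scores each module as buggy or clean, and then reports AUC, accuracy and F-measure. A minimal, dependency-free Python sketch can illustrate this flow. It is not the author's SCM-DLA-SFP implementation. The following are illustrative assumptions:

* z-score normalization;
* the layer sizes;
* ReLU hidden layers with a sigmoid output;
* the 0.5 decision threshold;
* the toy metric values.

The network is also left untrained. AUC is computed as the probability that a randomly chosen buggy module scores higher than a randomly chosen clean one, with ties counted as one half.

```python
import math
import random


def zscore_fit(rows):
    """Per-metric mean and standard deviation, computed on the training rows only."""
    n = len(rows)
    cols = list(zip(*rows))
    means = [sum(c) / n for c in cols]
    # A constant metric would give std 0; fall back to 1.0 to avoid division by zero.
    stds = [math.sqrt(sum((x - m) ** 2 for x in c) / n) or 1.0 for c, m in zip(cols, means)]
    return means, stds


def zscore_apply(rows, means, stds):
    """Normalize each static code metric to zero mean and unit variance."""
    return [[(x - m) / s for x, m, s in zip(r, means, stds)] for r in rows]


def relu(z):
    return max(0.0, z)


def sigmoid(z):
    # Numerically stable logistic function (avoids overflow in math.exp).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def init_layer(n_in, n_out, rng):
    """Random weights scaled by 1/sqrt(n_in); biases start at zero."""
    weights = [[rng.gauss(0.0, 1.0 / math.sqrt(n_in)) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def dense(x, weights, bias, activation):
    """Fully connected layer: activation(W x + b)."""
    return [activation(sum(w * xi for w, xi in zip(row, x)) + b) for row, b in zip(weights, bias)]


def forward(x, layers):
    """ReLU hidden layers, then one sigmoid unit giving a score for the 'buggy' class."""
    for weights, bias in layers[:-1]:
        x = dense(x, weights, bias, relu)
    weights, bias = layers[-1]
    return dense(x, weights, bias, sigmoid)[0]


def accuracy_and_f1(y_true, y_pred):
    """Accuracy and F-measure with 'buggy' (label 1) as the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    acc = sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return acc, f1


def auc(y_true, scores):
    """AUC as P(score of a buggy module > score of a clean one), ties counted as 1/2."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy, made-up metric rows (e.g. WMC, LOC, CBO) and labels; not PROMISE data.
    X = [[5, 120, 3], [22, 900, 14], [8, 200, 4], [30, 1500, 20], [3, 60, 1], [15, 640, 9]]
    y = [0, 1, 0, 1, 0, 1]
    means, stds = zscore_fit(X)
    Xn = zscore_apply(X, means, stds)
    rng = random.Random(0)
    # Untrained network: 3 inputs -> 8 -> 4 -> 1 output (sizes are assumptions).
    layers = [init_layer(3, 8, rng), init_layer(8, 4, rng), init_layer(4, 1, rng)]
    scores = [forward(row, layers) for row in Xn]
    preds = [1 if s >= 0.5 else 0 for s in scores]
    print(accuracy_and_f1(y, preds), auc(y, scores))
```

In a real setup, the weights would be fitted before evaluation, for example by gradient descent on binary cross-entropy. The normalization statistics would be taken from the training split only, so that test modules do not leak into the conditioning step.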

Data availability

Enquiries about data availability should be directed to the authors.

References

Afzal W, Torkar R (2016) Towards benchmarking feature subset selection methods for software fault prediction. In: Computational intelligence and quantitative software engineering, Springer, Cham, pp 33–58. https://doi.org/10.1007/978-3-319-25964-2_3
Aggarwal (2021) Software defect prediction dataset. figshare. Dataset. https://doi.org/10.6084/m9.figshare.13536506.v1
Boucher A, Badri M (2018) Software metrics thresholds calculation techniques to predict fault-proneness: an empirical comparison. Inf Softw Technol 96:38–67
Chen J, Yang Y, Hu K, Xuan Q, Liu Y, Yang C (2019) Multiview transfer learning for software defect prediction. IEEE Access 7:8901–8916
Dam HK, Tran T, Pham T, Ng SW, Grundy J, Ghose A (2018) Automatic feature learning for predicting vulnerable software components. IEEE Trans Software Eng 47(1):67–85
Erturk E, Sezer EA (2016) Iterative software fault prediction with a hybrid approach. Appl Soft Comput 49:1020–1033
Fan G, Diao X, Yu H, Yang K, Chen L (2019) Software defect prediction via attention-based recurrent neural network. Sci Program. https://doi.org/10.1155/2019/6230953
Ferreira F, Silva LL, Valente MT (2021) Software engineering meets deep learning: a mapping study. In: Proceedings of the 36th annual ACM symposium on applied computing, pp 1542–1549
Ghotra B, McIntosh S, Hassan AE (2017) A large-scale study of the impact of feature selection techniques on defect classification models. In: 2017 IEEE/ACM 14th international conference on mining software repositories (MSR), pp 146–157, IEEE
Goyal S (2020) Heterogeneous stacked ensemble classifier for software defect prediction. In: 2020 sixth international conference on parallel, distributed and grid computing (PDGC), pp 126–130, IEEE. https://doi.org/10.1109/PDGC50313.2020.9315754.
Goyal S (2021) Predicting the defects using stacked ensemble learner with filtered dataset. Autom Softw Eng 28(2):1–81. https://doi.org/10.1007/s10515-021-00285-y
Goyal S (2022a) Software measurements using machine learning techniques—a review. Recent Adv Comput Sci Commun e070422203243. https://doi.org/10.2174/2666255815666220407101922
Goyal S (2022b) Comparative analysis of machine learning techniques for software effort estimation. Intelligent computing techniques for smart energy systems, Springer, Singapore, pp 63–73
Goyal S (2022c) Effective software effort estimation using heterogenous stacked ensemble. In: 2022 IEEE international conference on signal processing, informatics, communication and energy systems (SPICES), vol 1, pp 584–588, IEEE
Goyal S (2022d) 3PcGE: 3-parent child-based genetic evolution for software defect prediction. Innov Syst Softw Eng, pp 1–20. https://doi.org/10.1007/s11334-021-00427-1
Goyal S (2022e) Genetic evolution-based feature selection for software defect prediction using SVMs. J Circ Syst Comput 2250161. https://doi.org/10.1142/S0218126622501614
Goyal S (2022f) Effective software defect prediction using support vector machines (SVMs). Int J Syst Assur Eng Manag 13(2):681–696. https://doi.org/10.1007/s13198-021-01326-1
Goyal S (2022g) Handling class-imbalance with KNN (neighbourhood) under-sampling for software defect prediction. Artif Intell Rev 55(3):2023–2064. https://doi.org/10.1007/s10462-021-10044-w
Goyal S (2022h) Metaheuristics for empirical software measurements. Computational intelligence in software modeling, vol 13, De Gruyter, Boston, p 67. https://doi.org/10.1515/9783110709247-005
Goyal S (2022i) FOFS: firefly optimization for feature selection to predict fault-prone software modules. Data engineering for smart systems, Springer, Singapore, pp 479–487. https://doi.org/10.1007/978-981-16-2641-8_46
Goyal S, Bhatia PK (2020) Empirical software measurements with machine learning. Computational intelligence techniques and their applications to software engineering problems, CRC Press, pp. 49–64. https://doi.org/10.1201/9781003079996
Goyal S, Bhatia PK (2021) Software quality prediction using machine learning techniques. Innovations in computational intelligence and computer vision, Springer, Singapore, vol 1189, pp 551–560. https://doi.org/10.1007/978-981-15-6067-5_62
Halstead MH (1977) Elements of software science (operating and programming systems series), vol 2. Elsevier, Amsterdam, Netherlands
Hanley JA, McNeil BJ (1982) The meaning and use of the area under a receiver operating characteristic (ROC) curve. Radiology 143(1):29–36
Huda S, Alyahya S, Ali MM, Ahmad S, Abawajy J, Al-Dossari H, Yearwood J (2017) A framework for software defect prediction and metric selection. IEEE Access 6:2844–2858. https://doi.org/10.1109/ACCESS.2017.2785445
Jayanthi R, Florence L (2019) Software defect prediction techniques using metrics based on neural network classifier. Clust Comput 22(1):77–88
Jiarpakdee J, Tantithamthavorn C, Hassan AE (2019) The impact of correlated metrics on the interpretation of defect prediction models. IEEE Trans Softw Eng Early Access. https://doi.org/10.1109/TSE.2019.2891758
Jureczko M, Spinellis D (2010) Using object-oriented design metrics to predict software defects. Models and Methods Syst Dependabil. Oficyna Wydawnicza Politechniki Wrocławskiej, pp 69–81
Khoshgoftaar TM, Allen EB (1998) Classification of fault-prone software modules: prior probabilities, costs, and model evaluation. Empir Softw Eng 3(3):275–298
Kumar L, Sripada SK, Sureka A, Rath SK (2018) Effective fault prediction model developed using least square support vector machine (LSSVM). J Syst Softw 137:686–712
Laradji IH, Alshayeb M, Ghouti L (2015) Software defect prediction using ensemble learning on selected features. Inf Softw Technol 58:388–402
Lehmann EL, Romano JP, Casella (2005) Testing statistical hypotheses, vol 3, Springer, New York
Li J, Li X, He D (2019) A directed acyclic graph network combined with CNN and LSTM for remaining useful life prediction. IEEE Access 7:75464–75475
Li J, He P, Zhu J, Lyu MR (2017) Software defect prediction via convolutional neural network. In: 2017 IEEE international conference on software quality, reliability and security (QRS), pp 318–328, IEEE. https://doi.org/10.1109/QRS.2017.42
Ma Y, Luo G, Zeng X, Chen A (2012) Transfer learning for cross-company software defect prediction. Inf Softw Technol 54(3):248–256
McCabe TJ (1976) A complexity measure. IEEE Trans Softw Eng 4:308–320
Okutan A, Yıldız OT (2014) Software defect prediction using Bayesian networks. Empir Softw Eng 19(1):154–181
Özakıncı R, Tarhan A (2018) Early software defect prediction: a systematic map and review. J Syst Softw 144:216–239. https://doi.org/10.1016/j.jss.2018.06.025
PROMISE (2006) https://github.com/feiwww/PROMISE-backup/tree/master/bug-data
Rathore SS, Kumar S (2019) A study on software fault prediction techniques. Artif Intell Rev 51(2):255–327. https://doi.org/10.1007/s10462-017-9563-5
Ross SM (2005) Probability and statistics for engineers and scientists, 3rd edn, Elsevier
Sayyad S, Menzies T (2005) The PROMISE repository of software engineering databases, University of Ottawa, Canada. http://promise.site.uottawa.ca/SERepository
Selby RW, Porter AA (1988) Learning from examples: generation and evaluation of decision trees for software resource analysis. IEEE Trans Software Eng 14(12):1743–1757
Sheng L, Lu L, Lin J (2020) An adversarial discriminative convolutional neural network for cross-project defect prediction. IEEE Access 8:55241–55253
Shippey T, Bowes D, Hall T (2019) Automatically identifying code features for software defect prediction: using AST N-grams. Inf Softw Technol 106:142–160
Tong H, Liu B, Wang S (2018) Software defect prediction using stacked denoising autoencoders and two-stage ensemble learning. Inf Softw Technol 96:94–111
Wang H, Zhuang W, Zhang X (2021) Software defect prediction based on gated hierarchical LSTMs. IEEE Trans Reliab 70(2):711–727. https://doi.org/10.1109/TR.2020.3047396
Witten IH, Frank E (2005) Data mining: practical machine learning tools and techniques, 2nd edn. Morgan Kaufmann, San Francisco
Xu Z, Liu J, Yang Z, An G, Jia X (2016) The impact of feature selection on defect prediction performance: an empirical comparison. In: 2016 IEEE 27th international symposium on software reliability engineering (ISSRE), pp 309–320, IEEE


Funding

No funding was received for this research work.

Author information

Authors and Affiliations

Manipal University Jaipur, Jaipur, 303007, Rajasthan, India
Somya Goyal


Corresponding author

Correspondence to Somya Goyal.

Ethics declarations

Conflict of interest

The author has no conflicts of interest.

Ethical approval

This article does not contain any studies with human participants or animals performed by the author.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article

Cite this article

Goyal, S. Static code metrics-based deep learning architecture for software fault prediction. Soft Comput 26, 13765–13797 (2022). https://doi.org/10.1007/s00500-022-07365-5


Accepted: 05 July 2022
Published: 11 August 2022
Issue Date: December 2022
DOI: https://doi.org/10.1007/s00500-022-07365-5
