
Design and development of novel hybrid optimization-based convolutional neural network for software bug localization

  • Application of soft computing
Soft Computing

Abstract

Software systems are frequently affected by defects. When reporting a bug, developers generally draw on the information present in the files; this information helps to fix the bug by localizing the source code fragments that need to be improved. This paper implements a bug localization model using an improved deep learning algorithm, which localizes the buggy files to enhance the productivity and efficiency of software quality assurance teams. The AspectJ and SWT datasets were used to assess the proposed bug localization model. The proposed model involves fundamental steps such as data pre-processing, word embedding, CNN-based feature detection, and CNN-based classification. Initially, the source files in which the bug has to be localized are given as input. The files are pre-processed by erasing punctuation and then splitting the text. In the word embedding step, the relevant features are gathered from the source files using Word2Vec, a bag of n-grams, and the term frequency–inverse document frequency (TF-IDF) technique. The final feature vector is then extracted by a convolutional neural network (CNN). The extracted features are subjected to optimized CNN-based classification to localize the bugs, where the number of hidden neurons of the CNN is optimized using hybridized cuckoo search-based sea lion optimization (CS-SLnO). The main objective of this work is to introduce a new optimized CNN-based classifier for bug localization, with the hidden neurons tuned by the hybrid CS-SLnO algorithm, in order to improve localization accuracy. Finally, the bug fixing ability of the proposed model is validated through performance analysis. The experimental results show that the suggested CS-SLnO-CNN attains the best accuracy@1 among the compared algorithms: it is 85.1% better than DeepLoc, 51.9% better than DeepLocator, 54.3% better than HyLoc, 65.1% better than LR + WE, and 73.5% better than BugLocator. Thus, the developed CS-SLnO-CNN achieves better bug localization performance than the other deep learning models.
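The pre-processing, n-gram, and TF-IDF steps named in the abstract, together with the accuracy@1 measure it reports, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names (`preprocess`, `ngrams`, `tfidf`, `accuracy_at_k`), the lower-casing step, the particular TF-IDF weighting (term count divided by document length, times log(N/df)), and the sample texts are all assumptions made for illustration. The Word2Vec embedding, the CNN, and the CS-SLnO optimizer are not shown.

```python
import math
import string
from collections import Counter


def preprocess(text):
    """Erase punctuation, then split on whitespace.

    Lower-casing is an added assumption; the abstract only mentions
    erasing punctuation followed by splitting.
    """
    table = str.maketrans(string.punctuation, " " * len(string.punctuation))
    return text.translate(table).lower().split()


def ngrams(tokens, n):
    """Return the contiguous n-grams of a token list as space-joined strings."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def tfidf(docs):
    """Compute one {term: weight} dict per tokenized document.

    Assumed weighting: tf = count / len(doc), idf = log(N / df).
    Other TF-IDF variants exist and may be what the paper uses.
    """
    n_docs = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


def accuracy_at_k(ranked_files, buggy_files, k=1):
    """Fraction of bug reports with at least one truly buggy file in the top k."""
    hits = sum(
        1
        for ranking, truth in zip(ranked_files, buggy_files)
        if set(ranking[:k]) & set(truth)
    )
    return hits / len(ranked_files)


# Hypothetical inputs, for illustration only.
reports = [
    "Widget.dispose() throws NullPointerException!",
    "Fix layout of Widget; resize shell.",
]
tokens = [preprocess(r) for r in reports]
print(tokens[0])
print(ngrams(tokens[0], 2))
print(tfidf(tokens)[0])
print(accuracy_at_k([["A.java", "B.java"], ["C.java", "D.java"]],
                    [["A.java"], ["D.java"]], k=1))
```

Under this weighting, a term that occurs in every document (here `widget`) receives weight zero, which is the usual motivation for the inverse document frequency factor.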




Data availability statement

The data underlying this article are available in the AspectJ, JDT, Tomcat, ZXing, and SWT datasets, which were used for evaluating the proposed bug localization model.

Abbreviations

  • TF-IDF: Term Frequency–Inverse Document Frequency
  • CNN: Convolutional Neural Network
  • CSO: Cuckoo Search Optimization
  • SLnO: Sea Lion Optimization Algorithm
  • CS-SLnO: Cuckoo Search-based Sea Lion Optimization
  • IR: Information Retrieval
  • VSM: Vector Space Model
  • MAP: Mean Average Precision
  • TBCNN: Tree-based CNN
  • BLiM2: Bug Localization in Models
  • FPR: False Positive Rate
  • NGD: Normalized Google Distance
  • GWO: Grey Wolf Optimization
  • NPV: Negative Predictive Value
  • PMI: Pointwise Mutual Information
  • FNR: False Negative Rate
  • NetML: Network-clustered Multi-modal Bug Localization
  • TF-IDuF: Term Frequency–user-focused Inverse Document Frequency
  • MCC: Matthews Correlation Coefficient
  • WOA: Whale Optimization Algorithm
  • FDR: False Discovery Rate
  • NN: Neural Network
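Several of the abbreviations listed here (FPR, FNR, NPV, FDR, MCC) name standard confusion-matrix measures. The sketch below shows their textbook definitions in Python. How the paper itself computes them is not shown in this excerpt, so the function name `confusion_metrics`, the convention of returning 0 when a denominator is zero, and the sample counts are assumptions.

```python
import math


def confusion_metrics(tp, fp, tn, fn):
    """Return FPR, FNR, NPV, FDR and MCC from confusion-matrix counts."""

    def ratio(num, den):
        # Assumed convention: report 0.0 when the measure is undefined.
        return num / den if den else 0.0

    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "FPR": ratio(fp, fp + tn),   # false positives among actual negatives
        "FNR": ratio(fn, fn + tp),   # false negatives among actual positives
        "NPV": ratio(tn, tn + fn),   # true negatives among predicted negatives
        "FDR": ratio(fp, fp + tp),   # false positives among predicted positives
        "MCC": ratio(tp * tn - fp * fn, mcc_den),
    }


# Hypothetical counts, for illustration only.
print(confusion_metrics(tp=40, fp=10, tn=45, fn=5))
```

MCC ranges from -1 to 1 and takes all four counts into account, which makes it less sensitive to class imbalance than plain accuracy.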


Funding

This research did not receive any specific funding.

Author information


Corresponding author

Correspondence to Ginika Mahajan.

Ethics declarations

Conflict of Interest

The authors declare that they have no conflict of interest.

Ethical approval

This paper does not contain any studies with human participants or animals performed by any of the authors.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Mahajan, G., Chaudhary, N. Design and development of novel hybrid optimization-based convolutional neural network for software bug localization. Soft Comput 26, 13651–13672 (2022). https://doi.org/10.1007/s00500-022-07341-z


  • DOI: https://doi.org/10.1007/s00500-022-07341-z
