Abstract
Software systems are frequently affected by defects. When reporting a bug, developers generally make use of the data present in the source files; this information can help fix the bug by localizing the source code fragments that need to be improved. This paper implements a bug localization model based on an improved deep learning algorithm, which localizes the buggy files in order to enhance the productivity and efficiency of software quality assurance teams. The AspectJ and SWT datasets were used to assess the proposed model. The model involves four fundamental steps: data pre-processing, word embedding, CNN-based feature detection, and CNN-based classification. Initially, the source files in which the bug has to be localized are given as input. These files are pre-processed by erasing punctuation and then splitting the text. In the word embedding process, the relevant features are gathered from the source files using Word2Vec, a bag of n-grams, and the term frequency–inverse document frequency (TF-IDF) technique. The final feature vector is then extracted by a convolutional neural network (CNN), and the extracted features are passed to an optimized CNN-based classifier to localize the bugs. The number of hidden neurons of the CNN is optimized using a hybridized cuckoo search-based sea lion optimization (CS-SLnO) algorithm. The main objective of this work is thus to introduce an optimized CNN-based classifier for bug localization, with the hidden neurons tuned by CS-SLnO, in order to improve localization accuracy. Finally, the bug-fixing ability of the proposed model is demonstrated through performance analysis. The experimental results show that the suggested CS-SLnO-CNN attains the best accuracy@1 among the compared algorithms: it is 85.1% better than DeepLoc, 51.9% better than DeepLocator, 54.3% better than HyLoc, 65.1% better than LR + WE, and 73.5% better than BugLocator. It can therefore be confirmed that the developed CS-SLnO-CNN achieves better bug localization performance than the other deep learning models.
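The first two steps of the pipeline (punctuation removal followed by splitting, then n-gram and TF-IDF features) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `preprocess`, `ngrams`, and `tfidf` are hypothetical, lower-casing is an added assumption, and the weighting `tf * log(N / df)` is one common TF-IDF variant that the abstract does not specify. The Word2Vec embedding, the CNN, and the CS-SLnO optimizer are omitted.

```python
import math
import string
from collections import Counter


def preprocess(text):
    """Erase punctuation, then split on whitespace (lower-casing is an assumption)."""
    # Map every punctuation character to a space so "w.redraw()" yields "w" and "redraw".
    table = str.maketrans(string.punctuation, " " * len(string.punctuation))
    return text.translate(table).lower().split()


def ngrams(tokens, n_max=2):
    """Return all contiguous n-grams of length 1..n_max as space-joined strings."""
    grams = []
    for n in range(1, n_max + 1):
        for i in range(len(tokens) - n + 1):
            grams.append(" ".join(tokens[i:i + n]))
    return grams


def tfidf(documents, n_max=2):
    """Compute one sparse TF-IDF dictionary per document over a bag of n-grams."""
    bags = [Counter(ngrams(preprocess(doc), n_max)) for doc in documents]

    # Document frequency: the number of documents that contain each n-gram.
    df = Counter()
    for bag in bags:
        df.update(bag.keys())

    n_docs = len(documents)
    vectors = []
    for bag in bags:
        total = sum(bag.values())
        vectors.append({
            gram: (count / total) * math.log(n_docs / df[gram])
            for gram, count in bag.items()
        })
    return vectors


if __name__ == "__main__":
    # Two toy "source files"; real inputs would be the project's source files.
    files = [
        "void paint(Widget w) { w.redraw(); }",
        "int parse(String s) { return s.length(); }",
    ]
    for vector in tfidf(files):
        top = sorted(vector.items(), key=lambda kv: kv[1], reverse=True)[:3]
        print(top)
```

With this weighting, an n-gram that occurs in every file receives a weight of zero, so only terms that help distinguish one file from another contribute to the feature vector.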
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs00500-022-07341-z/MediaObjects/500_2022_7341_Fig1_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs00500-022-07341-z/MediaObjects/500_2022_7341_Fig2_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs00500-022-07341-z/MediaObjects/500_2022_7341_Fig3_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs00500-022-07341-z/MediaObjects/500_2022_7341_Fig4_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs00500-022-07341-z/MediaObjects/500_2022_7341_Fig5_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs00500-022-07341-z/MediaObjects/500_2022_7341_Fig6_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs00500-022-07341-z/MediaObjects/500_2022_7341_Fig7_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs00500-022-07341-z/MediaObjects/500_2022_7341_Fig8_HTML.png)
Data availability statement
The data underlying this article are available in the AspectJ, JDT, Tomcat, ZXing, and SWT datasets, which were used for evaluating the proposed bug localization model.
Abbreviations
- TF-IDF: Term Frequency–Inverse Document Frequency
- CNN: Convolutional Neural Network
- CSO: Cuckoo Search Optimization
- SLnO: Sea Lion Optimization Algorithm
- CS-SLnO: Cuckoo Search-based Sea Lion Optimization
- IR: Information Retrieval
- VSM: Vector Space Model
- MAP: Mean Average Precision
- TBCNN: Tree-based CNN
- BLiM2: Bug Localization in Models
- FPR: False Positive Rate
- NGD: Normalized Google Distance
- GWO: Grey Wolf Optimization
- NPV: Negative Predictive Value
- PMI: Point-wise Mutual Information
- FNR: False Negative Rate
- NetML: Network-clustered Multi-modal Bug Localization
- TF-IDuF: Term Frequency–user-focused Inverse Document Frequency
- MCC: Matthews Correlation Coefficient
- WOA: Whale Optimization Algorithm
- FDR: False Discovery Rate
- NN: Neural Network
Funding
This research did not receive any specific funding.
Ethics declarations
Conflict of Interest
The authors declare that they have no conflict of interest.
Ethical approval
This paper does not contain any studies with human participants or animals performed by any of the authors.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
About this article
Cite this article
Mahajan, G., Chaudhary, N. Design and development of novel hybrid optimization-based convolutional neural network for software bug localization. Soft Comput 26, 13651–13672 (2022). https://doi.org/10.1007/s00500-022-07341-z
DOI: https://doi.org/10.1007/s00500-022-07341-z