Automatic software code repair using deep learning techniques

Research article, published in Software Quality Journal.

Abstract

In the multi-hundred-billion-dollar software development industry, debugging is an expensive task for developers, so considerable effort has been put into automating it. Over the last decade, researchers have repaired code according to predefined rules, which are effective only for limited types of bugs. Extensive experimentation has shown that deep learning models are useful for automated code repair, mirroring the revolutionary results these approaches have produced in many other fields. Because general program repair is a difficult problem, recent works focus on bugs that appear on a single line of code; such bugs have been reported to occur, on average, once in every 1600 lines of code in a software project, which is significant. The current research follows these approaches and introduces a novel automatic code-repair system. We employ transfer learning to reuse a pre-trained model for the problem. The proposed system is based on an encoder-decoder architecture: the encoder is a pre-trained BERT model for Java named JavaBERT, which we fine-tune for the task, and the decoder is a transformer with an autoregressive structure. The ManySStuBs4J dataset (Karampatsis & Sutton, 2020) is used for evaluation. The results show that the proposed system achieves higher accuracy and BLEU scores than both CodeBERT, one of the models most similar to ours, and the baseline, a simple model that acts as a reference in machine learning studies. The bilingual evaluation understudy (BLEU) score improvement is between 0.04 and 0.16%, the accuracy improvement is between 0.64 and 5.81%, the recall improvement is between 1.08 and 9.2%, and the F-score improvement is between 3.27 and 6.18%.
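The encoder-decoder pipeline summarized above can be sketched with off-the-shelf tooling. The snippet below is a minimal illustration, not the authors' implementation: the CAUKiel/JavaBERT checkpoint id is an assumption based on the JavaBERT release (De Sousa & Hasselbring, 2021), the HuggingFace EncoderDecoderModel wrapper stands in for the paper's specific decoder, and the model would still need fine-tuning on bug/fix pairs before it produces meaningful repairs.

```python
# Minimal sketch (assumptions noted): a pre-trained BERT-style Java encoder
# wired to an autoregressive transformer decoder via HuggingFace Transformers.
from transformers import AutoTokenizer, EncoderDecoderModel

tokenizer = AutoTokenizer.from_pretrained("CAUKiel/JavaBERT")  # assumed checkpoint id
model = EncoderDecoderModel.from_encoder_decoder_pretrained(
    "CAUKiel/JavaBERT",  # encoder: reused pre-trained weights (transfer learning)
    "CAUKiel/JavaBERT",  # decoder: same weights, adapted with causal and cross-attention
)
# Token ids the wrapper needs for autoregressive generation.
model.config.decoder_start_token_id = tokenizer.cls_token_id
model.config.pad_token_id = tokenizer.pad_token_id
model.config.eos_token_id = tokenizer.sep_token_id

# After fine-tuning on (buggy line, fixed line) pairs, repair would look like:
buggy_line = "if (count = 0) { return result; }"
inputs = tokenizer(buggy_line, return_tensors="pt")
candidate = model.generate(**inputs, max_length=32, num_beams=4)
print(tokenizer.decode(candidate[0], skip_special_tokens=True))
```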

Data availability

To evaluate the proposed system, the ManySStuBs4J (Karampatsis & Sutton, 2020) dataset is used. No new datasets were created in this study.
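For orientation, the sketch below shows one plausible way to read the dataset's JSON records into (buggy, fixed) pairs for training and evaluation. The file name and the field names (bugType, sourceBeforeFix, sourceAfterFix) are assumptions taken from the dataset's public documentation and should be verified against the downloaded release.

```python
# Hedged sketch: turn ManySStuBs4J records into (buggy, fixed) string pairs.
# "sstubs.json" and the field names are assumptions; check the actual release.
import json

with open("sstubs.json", encoding="utf-8") as f:
    records = json.load(f)  # the release ships bug records as a JSON array

pairs = [(r["sourceBeforeFix"], r["sourceAfterFix"]) for r in records]
print(f"{len(pairs)} single-statement bug/fix pairs loaded")
print("first bug type:", records[0]["bugType"])
```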

Notes

  1. In computer-based language recognition, ANTLR (ANother Tool for Language Recognition) is a parser generator that uses an LL(*) parsing algorithm.

  2. https://github.com/GumTreeDiff/gumtree

Abbreviations

  • BLEU: Bilingual Evaluation Understudy (see the short worked example after this list)
  • GRU: Gated Recurrent Unit
  • LSTM: Long Short-Term Memory network
  • AST: Abstract Syntax Tree
  • CNN: Convolutional Neural Network
  • BERT: Bidirectional Encoder Representations from Transformers
  • MLP: Multilayer Perceptron
  • DNN: Deep Neural Network
  • RNN: Recurrent Neural Network
  • Seq2Seq: Sequence-to-Sequence
  • BPE: Byte Pair Encoding
  • API: Application Programming Interface
  • GPU: Graphics Processing Unit
  • MLM: Masked Language Model
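As a concrete illustration of the BLEU metric listed above, the snippet below scores a generated fix against a reference fix with the sacrebleu package (the metric itself is from Papineni et al., 2002, cited in the references); the two code strings are invented for the example.

```python
# Illustrative only: corpus-level BLEU between generated and reference fixes,
# computed with the sacrebleu package. The strings are invented examples.
import sacrebleu

candidates = ["if (count == 0) { return result; }"]    # model outputs
references = [["if (count == 0) { return result; }"]]  # one reference stream

bleu = sacrebleu.corpus_bleu(candidates, references)
print(f"BLEU = {bleu.score:.2f}")  # 100.00 for an exact match
```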

References

  • Ahmed, U. Z., Kumar, P., Karkare, A., Kar, P., & Gulwani, S. (2018). Compilation error repair: for the student programs, from the student programs. In 40th International Conference on Software Engineering: Software Engineering Education and Training, Gothenburg.

  • Al-Ghamdi, S., Al-Khalifa, H., & Al-Salman, A. (2023). Fine-tuning BERT-based pre-trained models for Arabic dependency parsing. Applied Sciences, 13(7), 4225.

  • Anvik, J., Hiew, L., & Murphy, G. C. (2005, October). Coping with an open bug repository. In Proceedings of the OOPSLA Workshop on Eclipse Technology eXchange, San Diego, California, USA, p. 35–39.

  • Bahdanau, D., Cho, K., & Bengio, Y. (2015). Neural machine translation by jointly learning to align and translate. In 3rd International Conference on Learning Representations, San Francisco.

  • Bengio, Y., Goodfellow, I., & Courville, A. (2017). Deep learning. MIT Press.

  • Britton, T., Jeng, L., Carver, G., Cheak, P., & Katzenellenbogen, T. (2013). Reversible debugging software: Quantify the time and cost saved using reversible debuggers. University of Cambridge.

  • Carzaniga, A., Gorla, A., Perino, N., & Pezzè, M. (2010, November). Automatic workarounds for web applications. In Proceedings of the 18th ACM SIGSOFT International Symposium on Foundations of Software Engineering, Santa Fe, New Mexico, USA, p. 237–246.

  • Carzaniga, A., Gorla, A., Mattavelli, A., Perino, N., & Pezzè, M. (2013, May). Automatic recovery from runtime failures. In Proceedings of the 35th International Conference on Software Engineering (ICSE), San Francisco, CA, USA, p. 782–791.

  • Chen, Z., Kommrusch, S., Tufano, M., Pouchet, L. N., Poshyvanyk, D., & Monperrus, M. (2021). SequenceR: Sequence-to-sequence learning for end-to-end program repair. IEEE Transactions on Software Engineering, 47(9), 1943–1959.

  • De Sousa, N. T., & Hasselbring, W. (2021). JavaBERT: Training a transformer-based model for the Java programming language. In 36th IEEE/ACM International Conference on Automated Software Engineering Workshops, Los Alamitos.

  • Debroy, V., & Wong, W. E. (2010). Using mutation to automatically suggest fixes for faulty programs. In Third International Conference on Software Testing, Verification and Validation, Paris.

  • Devlin, J., Chang, M. -W., Lee, K., & Toutanova, K. (2019). BERT: Pre-training of deep bidirectional transformers for language understanding. In North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Minneapolis.

  • Feng, Z., Guo, D., Tang, D., Duan, N., Feng, X., Gong, M., Shou, L., Qin, B., Liu, T., Jiang, D., & Zhou, M. (2020). CodeBERT: A pre-trained model for programming and natural languages. In Findings of the Association for Computational Linguistics: EMNLP 2020, p. 1536–1547.

  • Gabel, M., & Su, Z. (2010, November). A study of the uniqueness of source code. In Proceedings of the 28th ACM SIGSOFT International Symposium on Foundations of Software Engineering, New York, NY, USA, p. 147–156.

  • Ganaie, M. A., Hu, M., Malik, A. K., Tanveer, M., & Suganthan, P. N. (2022). Ensemble deep learning: A review. Engineering Applications of Artificial Intelligence.

  • Gupta, R., Pal, S., Kanade, A., & Shevade, S. (2017). DeepFix: Fixing common C language errors by deep learning. In Thirty-First AAAI Conference on Artificial Intelligence, San Francisco.

  • Jang, Y., Won, K., Choi, H. D., & Shin, S. Y. (2023). Classification of research papers on radio frequency electromagnetic field (RF-EMF) using graph neural networks (GNN). Applied Sciences, 13(7), 4614.

  • Jones, J. A., & Harrold, M. J. (2005). Empirical evaluation of the tarantula automatic fault-localization technique. In 20th IEEE/ACM International Conference on Automated Software Engineering, Long Beach.

  • Karampatsis, R. M., & Sutton, C. (2020, June). How often do single-statement bugs occur? The ManySStuBs4J dataset. In Proceedings of the 17th International Conference on Mining Software Repositories, New York, NY, USA, p. 573–577.

  • Lamy-Poirier, J. (2021). Layered gradient accumulation and modular pipeline parallelism: Fast and efficient training of large language models. arXiv preprint arXiv:2106.02679.

  • Le, X. B. D., Lo, D., & Le Goues, C. (2016). History driven program repair. In 23rd International Conference on Software Analysis, Evolution, and Reengineering, Osaka.

  • Le Goues, C., Nguyen, T., Forrest, S., & Weimer, W. (2012). GenProg: A generic method for automatic software repair. IEEE Transactions on Software Engineering, 38(1), 54–72.

  • Liblit, B., Aiken, A., Zheng, A. X., & Jordan, M. I. (2003, June). Bug isolation via remote program sampling. In Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation, San Diego, California, USA, p. 141–154.

  • Liu, Y., Ott, M., Goyal, N., Du, J., Joshi, M., Chen, D., Levy, O., Lewis, M., Zettlemoyer, L., & Stoyanov, V. (2019). RoBERTa: A robustly optimized BERT pretraining approach. arXiv preprint arXiv:1907.11692.

  • Long, F., & Rinard, M. (2016). Automatic patch generation by learning correct code. ACM SIGPLAN Notices, 51(1), 298–312.

  • Long, F., Amidon, P., & Rinard, M. (2017). Automatic inference of code transforms for patch generation. In 11th Joint Meeting on Foundations of Software Engineering, Paderborn.

  • Lutellier, T., Pham, H. V., Pang, L., Li, Y., Wei, M., & Tan, L. (2020). CoCoNuT: Combining context-aware neural translation models using ensemble for program repair. In 29th ACM SIGSOFT International Symposium on Software Testing and Analysis, Virtual Event.

  • Mashhadi, E., & Hemmati, H. (2021, May). Applying CodeBERT for automated program repair of Java simple bugs. In Proceedings of the 18th International Conference on Mining Software Repositories, Madrid, Spain, p. 505–509.

  • Mechtaev, S., Yi, J., & Roychoudhury, A. (2016, May). Angelix: Scalable multiline program patch synthesis via symbolic analysis. In Proceedings of the 38th International Conference on Software Engineering, Austin, Texas, p. 691–701.

  • Monperrus, M. (2018). Automatic software repair: A bibliography. ACM Computing Surveys, 51(1), 1–24.

  • Niu, Z., Zhong, G., & Yu, H. (2021). A review on the attention mechanism of deep learning. Neurocomputing.

  • Papineni, K., Roukos, S., Ward, T., & Zhu, W. J. (2002). BLEU: A method for automatic evaluation of machine translation. In 40th Annual Meeting of the Association for Computational Linguistics, Philadelphia.

  • Saha, R. K., Lyu, Y., Yoshida, H., & Prasad, M. R. (2017, October). ELIXIR: Effective object oriented program repair. In Proceedings of the 32nd IEEE/ACM International Conference on Automated Software Engineering, Urbana-Champaign, IL, USA, p. 648–659.

  • Sarker, I. H. (2021). Deep learning: A comprehensive overview on techniques, taxonomy, applications and research directions. SN Computer Science, 2, Article 420.

  • Tassey, G. (2002, May). The economic impacts of inadequate infrastructure for software testing. RTI Project Report, National Institute of Standards and Technology.

  • Tufano, M., Watson, C., Bavota, G., Di Penta, M., White, M., & Poshyvanyk, D. (2018, September). An empirical investigation into learning bug-fixing patches in the wild via neural machine translation. In Proceedings of the 33rd ACM/IEEE International Conference on Automated Software Engineering, Montpellier, France, p. 832–837.

  • Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, Ł., & Polosukhin, I. (2017). Attention is all you need. Advances in Neural Information Processing Systems, 30.

  • Vrbančič, G., & Podgorelec, V. (2020). Transfer learning with adaptive fine-tuning. IEEE Access, 8, 196197–196211.

  • Wang, S. (2003). Artificial neural network. Interdisciplinary Computing in Java Programming, p. 81–100.

  • Weiss, C., Premraj, R., Zimmermann, T., & Zeller, A. (2007, May). How long will it take to fix this bug? In Proceedings of the 4th International Workshop on Mining Software Repositories, Minneapolis, MN, USA, p. 1.

  • Wen, M., Chen, J., Wu, R., Hao, D., & Cheung, S. C. (2018, May). Context-aware patch generation for better automated program repair. In Proceedings of the 40th International Conference on Software Engineering, Gothenburg, Sweden, p. 1–11.

  • Xu, Q., Jiang, H., Zhang, X., Li, J., & Chen, L. (2023). Multiscale convolutional neural network based on channel space attention for gearbox compound fault diagnosis. Sensors, 23(8), 3827.

  • Zhuang, F., Qi, Z., Duan, K., Xi, D., Zhu, Y., Zhu, H., Xiong, H., & He, Q. (2020). A comprehensive survey on transfer learning. Proceedings of the IEEE, 109(1), 43–79.

Author information

Contributions

Mohammad Mahdi Abdollahpour: Conceptualization, Methodology, Software, Visualization, and Original draft preparation. Mehrdad Ashtiani: Conceptualization, Writing – Reviewing and Editing, Verification of the results, Supervision. Fatemeh Bakhshi: Writing – Reviewing and Editing, Verification of the results.

Corresponding author

Correspondence to Mehrdad Ashtiani.

Ethics declarations

Ethical approval

This article does not contain any studies with human participants or animals performed by any of the authors.

Competing interests

The authors declare no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Abdollahpour, M.M., Ashtiani, M. & Bakhshi, F. Automatic software code repair using deep learning techniques. Software Qual J 32, 361–390 (2024). https://doi.org/10.1007/s11219-023-09653-1
