Development of a Program Code Review System Using Machine Learning Methods

  • Conference paper
Artificial Intelligence in Models, Methods and Applications (AIES 2022)

Abstract

The paper describes an approach and a service for analyzing Python source code. The service shortens code review by partially automating it. The FastText algorithm is used to obtain vector representations of source code texts. A pre-trained neural language model based on the transformer architecture is used to derive a likely natural-language description of a function's purpose. A classifier based on gradient boosting is used to detect duplicate pull requests (PRs). After a changeset is published to a remote Git repository, the service checks it and publishes error and duplicate reports as comments on the changeset. Testing did not reveal any errors that affect operation, and all the main functions of the system work correctly.
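The idea of comparing changesets through vector representations of their source text can be illustrated with a minimal, standard-library-only sketch. It is not the authors' implementation: instead of a trained FastText model it uses untrained hashed character n-gram counts (which only mimic FastText's subword idea), and instead of a gradient boosting classifier it simply ranks candidates by cosine similarity. The names `DIM`, `embed`, `cosine`, and `most_similar`, as well as the n-gram range, are assumptions made for this example.

```python
import hashlib
import math
import re

DIM = 4096  # assumed vector size, chosen only for this illustration


def tokens(code: str) -> list:
    """Split source text into identifiers, numbers, and single symbols."""
    return re.findall(r"[A-Za-z_]\w*|\d+|\S", code)


def ngrams(token: str, n_min: int = 3, n_max: int = 5) -> list:
    """Return the boundary-marked token plus its character n-grams."""
    wrapped = f"<{token}>"
    grams = [wrapped]
    for n in range(n_min, n_max + 1):
        grams += [wrapped[i:i + n] for i in range(len(wrapped) - n + 1)]
    return grams


def bucket(gram: str) -> int:
    """Map an n-gram to a stable bucket (built-in hash() is salted per run)."""
    digest = hashlib.md5(gram.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % DIM


def embed(code: str) -> list:
    """Build an L2-normalised bag of hashed n-grams (no training involved)."""
    vec = [0.0] * DIM
    for tok in tokens(code):
        for gram in ngrams(tok):
            vec[bucket(gram)] += 1.0
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else vec


def cosine(a: list, b: list) -> float:
    """Cosine similarity of two vectors that are already L2-normalised."""
    return sum(x * y for x, y in zip(a, b))


def most_similar(new_code: str, existing: list) -> int:
    """Index of the existing snippet closest to new_code (simple ranking,
    standing in for the paper's gradient boosting classifier)."""
    target = embed(new_code)
    scores = [cosine(target, embed(code)) for code in existing]
    return max(range(len(scores)), key=scores.__getitem__)


if __name__ == "__main__":
    original = "def add(a, b):\n    return a + b"
    renamed = "def add(x, y):\n    return x + y"
    unrelated = "class Parser:\n    def parse(self, text):\n        raise NotImplementedError"
    print(most_similar(renamed, [unrelated, original]))
```

In a real service the similarity scores, possibly together with other changeset features, would be passed to a trained classifier rather than ranked directly.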

Acknowledgements

This study was supported by the RFBR (project No. 20-07-00672) and by the Ministry of Education and Science of Russia within the framework of project No. 075-00233-20-05 dated 03.11.2020, «Research of intelligent predictive multimodal analysis of big data, and the extraction of knowledge from different sources».

Author information

Corresponding author

Correspondence to Vadim Moshkin.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Moshkin, V., Andreev, I. (2023). Development of a Program Code Review System Using Machine Learning Methods. In: Dolinina, O., et al. Artificial Intelligence in Models, Methods and Applications. AIES 2022. Studies in Systems, Decision and Control, vol 457. Springer, Cham. https://doi.org/10.1007/978-3-031-22938-1_5
