Abstract
This paper describes an approach and a service for analyzing source code in Python. The service shortens code review by partially automating it. The FastText algorithm is used to obtain vector representations of source code texts. A pre-trained neural network language model based on the transformer architecture is used to derive a probable natural-language description of a function's purpose. A classifier based on the gradient boosting algorithm is used to detect duplicate pull requests. After a changeset is published to a remote Git repository, the service checks it and publishes error and duplicate reports as comments on the changeset. Testing did not reveal any errors that affect operation, and all the main functions of the system work correctly.
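The idea of comparing vector representations of code to flag likely duplicates can be illustrated with a minimal, standard-library sketch. It is not the authors' implementation: hashed character n-gram counts stand in for trained FastText vectors, and a fixed cosine threshold stands in for the gradient boosting classifier. The names `embed`, `is_probable_duplicate`, `DIM`, and the threshold value are hypothetical choices made for this illustration.

```python
import hashlib
import math

DIM = 256  # hypothetical vector size chosen for this sketch


def char_ngrams(token, n_min=3, n_max=5):
    """Return character n-grams of a token padded with boundary markers."""
    padded = f"<{token}>"
    return [
        padded[i:i + n]
        for n in range(n_min, n_max + 1)
        for i in range(len(padded) - n + 1)
    ]


def embed(source, dim=DIM):
    """Map source text to an L2-normalised vector of hashed n-gram counts.

    Assumption: this replaces a trained FastText model with plain counts.
    """
    vec = [0.0] * dim
    for token in source.split():
        for gram in char_ngrams(token):
            digest = hashlib.md5(gram.encode("utf-8")).digest()
            bucket = int.from_bytes(digest[:4], "big") % dim
            vec[bucket] += 1.0
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else vec


def cosine(a, b):
    """Cosine similarity of two vectors that are already normalised."""
    return sum(x * y for x, y in zip(a, b))


def is_probable_duplicate(new_code, old_code, threshold=0.8):
    """Flag a pair whose similarity reaches a hypothetical threshold.

    Assumption: a fixed threshold replaces the trained classifier.
    """
    return cosine(embed(new_code), embed(old_code)) >= threshold
```

A service along these lines might call `is_probable_duplicate(new_diff_text, earlier_diff_text)` for each earlier changeset and report the pairs that are flagged. In a trained system the similarity score would be one input feature among several rather than the final decision.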
Acknowledgements
This study was supported by the RFBR (project No. 20-07-00672) and by the Ministry of Education and Science of Russia within the framework of project No. 075-00233-20-05 of 03.11.2020, "Research of intelligent predictive multimodal analysis of big data, and the extraction of knowledge from different sources".
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Moshkin, V., Andreev, I. (2023). Development of a Program Code Review System Using Machine Learning Methods. In: Dolinina, O., et al. Artificial Intelligence in Models, Methods and Applications. AIES 2022. Studies in Systems, Decision and Control, vol 457. Springer, Cham. https://doi.org/10.1007/978-3-031-22938-1_5
DOI: https://doi.org/10.1007/978-3-031-22938-1_5
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-22937-4
Online ISBN: 978-3-031-22938-1
eBook Packages: Intelligent Technologies and Robotics (R0)