Suggesting Descriptive Method Names: An Exploratory Study of Two Machine Learning Approaches

Zaitsev, Oleksandr; Ducasse, Stephane; Bergel, Alexandre; Eveillard, Mathieu

doi:10.1007/978-3-030-58793-2_8


Part of the book series: Communications in Computer and Information Science (CCIS, volume 1266)

Included in the following conference series:

International Conference on the Quality of Information and Communications Technology


Abstract

Programming is a form of communication between the person who writes code and the person who reads it. Nevertheless, developers very often neglect readability, and even well-written code becomes less understandable as software evolves. Together with the growing complexity of software systems, this creates an increasing need for automated tools that improve the readability of source code. In this work, we focus on method names and study how a descriptive name can be generated automatically from a method's body. We experiment with two approaches from the field of text summarization: one based on TF-IDF and the other on a deep recurrent neural network. We collect a dataset of methods from 50 real-world projects. We evaluate our approaches by comparing the generated names to the actual ones and report the results using precision and recall. For TF-IDF, we obtain up to 28% precision and 45% recall; for the deep neural network, 46% precision and 32% recall.
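The TF-IDF idea and the precision/recall evaluation described in the abstract can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the tokenizer, the smoothed IDF formula, the set-based definition of precision and recall, the toy corpus, and all function names are assumptions made for this example.

```python
import math
import re
from collections import Counter


def split_identifier(identifier):
    """Split a camelCase identifier into lowercase words (illustrative tokenizer)."""
    parts = re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+", identifier)
    return [part.lower() for part in parts]


def tfidf_name_words(method_words, corpus, k=2):
    """Return the k words of one method body with the highest TF-IDF score.

    Assumption: a smoothed IDF, log((1 + N) / (1 + df)) + 1, so that words
    missing from the corpus do not cause a division by zero.
    """
    if not method_words:
        return []
    n_docs = len(corpus)
    # Document frequency: in how many method bodies does each word occur?
    doc_freq = Counter(word for doc in corpus for word in set(doc))
    term_freq = Counter(method_words)
    scores = {
        word: (count / len(method_words))
        * (math.log((1 + n_docs) / (1 + doc_freq[word])) + 1)
        for word, count in term_freq.items()
    }
    # Highest score first; ties are broken alphabetically for determinism.
    return sorted(scores, key=lambda word: (-scores[word], word))[:k]


def precision_recall(generated_words, actual_words):
    """Compare the word sets of a generated name and the actual name."""
    generated, actual = set(generated_words), set(actual_words)
    common = generated & actual
    precision = len(common) / len(generated) if generated else 0.0
    recall = len(common) / len(actual) if actual else 0.0
    return precision, recall


# Hypothetical toy corpus: each method body is already split into words.
corpus = [
    ["collection", "add", "element", "collection", "size"],
    ["collection", "remove", "element"],
    ["stream", "print", "name", "stream", "print", "size"],
]
suggested = tfidf_name_words(corpus[2], corpus, k=2)
actual = split_identifier("printOn")
print(suggested, precision_recall(suggested, actual))
```

In this toy case, words that are frequent in one method but rare across the corpus are suggested as name words, and the suggestion is scored by how many of its words overlap with the words of the real name.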

Notes

1. pharo.org/.
2. The actual meaning of the code is not important, but for reference: double quotes delimit comments, pipes delimit local variable declarations, square brackets delimit lambda functions, and a caret denotes a return.
3.
4. The probability that, during training, the word generated by the model is replaced by the corresponding word from the real name. It is used to make training smoother.
5. The harmonic mean is more intuitive than the arithmetic mean when computing a mean of ratios.
6. The complete list of stop words that we used in this study can be found here: https://gist.github.com/olekscode/125804150f2a559a171bf695c0a3f809.
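Notes 4 and 5 can be illustrated with a small sketch. It assumes that note 4 describes what is commonly called a teacher forcing ratio and that note 5 refers to combining precision and recall into an F-measure; the function names and the numeric values are hypothetical and are not taken from the paper.

```python
import random


def next_decoder_input(predicted_word, real_word, teacher_forcing_ratio, rng):
    """Pick the word fed to the decoder at the next training step.

    With probability teacher_forcing_ratio the word from the real name is
    used; otherwise the model's own prediction is kept.
    """
    if rng.random() < teacher_forcing_ratio:
        return real_word
    return predicted_word


def harmonic_mean(precision, recall):
    """Harmonic mean of two ratios (the F1 score when applied to precision and recall)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


rng = random.Random(0)  # fixed seed so the sketch is reproducible
print(next_decoder_input("stream", "print", teacher_forcing_ratio=0.5, rng=rng))

# Illustrative values only: the harmonic mean is pulled toward the lower ratio.
precision, recall = 0.5, 0.25
print(harmonic_mean(precision, recall), (precision + recall) / 2)
```

With a ratio of 1.0 the decoder always sees the real name during training, and with 0.0 it always sees its own output. The second part shows why the harmonic mean is the more conservative summary: it stays closer to the smaller of the two ratios than the arithmetic mean does.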

References

1. Allamanis, M., Barr, E.T., Bird, C., Sutton, C.: Suggesting accurate method and class names. In: Proceedings of the 2015 10th Joint Meeting on Foundations of Software Engineering, pp. 38–49. ACM (2015)
2. Allamanis, M., Barr, E.T., Devanbu, P., Sutton, C.: A survey of machine learning for big code and naturalness. ACM Comput. Surv. (CSUR) 51(4), 81 (2018)
3. Allamanis, M., Peng, H., Sutton, C.: A convolutional attention network for extreme summarization of source code. In: International Conference on Machine Learning, pp. 2091–2100 (2016)
4. Alon, U., Zilberstein, M., Levy, O., Yahav, E.: code2vec: learning distributed representations of code. arXiv preprint arXiv:1803.09473 (2018)
5. Bavishi, R., Pradel, M., Sen, K.: Context2name: a deep learning-based approach to infer natural variable names from usage contexts. arXiv preprint arXiv:1809.05193 (2018)
6. Beck, K.: Test Driven Development: By Example. Addison-Wesley Longman (2002)
7. Cho, K., Van Merriënboer, B., Bahdanau, D., Bengio, Y.: On the properties of neural machine translation: encoder-decoder approaches. arXiv preprint arXiv:1409.1259 (2014)
8. Demeyer, S., Ducasse, S., Nierstrasz, O.: Object-Oriented Reengineering Patterns. Morgan Kaufmann, Burlington (2002)
9. Fowler, M., Beck, K., Brant, J., Opdyke, W., Roberts, D.: Refactoring: Improving the Design of Existing Code. Addison Wesley, Boston (1999)
10. Gabel, M., Su, Z.: A study of the uniqueness of source code. In: Proceedings of the Eighteenth ACM SIGSOFT International Symposium on Foundations of Software Engineering, pp. 147–156. ACM (2010)
11. Hindle, A., Barr, E.T., Gabel, M., Su, Z., Devanbu, P.: On the naturalness of software. Commun. ACM 59(5), 122–131 (2016)
12. Hindle, A., Barr, E.T., Su, Z., Gabel, M., Devanbu, P.: On the naturalness of software. In: 2012 34th International Conference on Software Engineering (ICSE), pp. 837–847. IEEE (2012)
13. Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural Comput. 9(8), 1735–1780 (1997)
14. Iyer, S., Konstas, I., Cheung, A., Zettlemoyer, L.: Summarizing source code using a neural attention model. In: Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), vol. 1, pp. 2073–2083 (2016)
15. Knuth, D.E.: Literate programming. Comput. J. 27(2), 97–111 (1984)
16. Koenig, A.: Patterns and antipatterns. J. Object-Oriented Program. 8(1), 46–48 (1995)
17. Lehman, M., Belady, L.: Program Evolution: Processes of Software Change. London Academic Press, London (1985). ftp://ftp.umh.ac.be/pub/ftp_infofs/1985/ProgramEvolution.pdf
18. Martin, R.C.: Clean Code: A Handbook of Agile Software Craftsmanship. Pearson Education, London (2009)
19. Ramos, J.: Using TF-IDF to determine word relevance in document queries. In: Proceedings of the First Instructional Conference on Machine Learning, vol. 242, pp. 133–142 (2003)
20. Raychev, V., Vechev, M., Yahav, E.: Code completion with statistical language models. In: ACM SIGPLAN Notices, vol. 49, pp. 419–428. ACM (2014)
21. Rush, A.M., Harvard, S., Chopra, S., Weston, J.: A neural attention model for sentence summarization. In: Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, ACLWeb (2017)
22. Sasaki, Y., et al.: The truth of the F-measure. Teach Tutor Mater 1(5), 1–5 (2007)
23. Sutskever, I., Vinyals, O., Le, Q.V.: Sequence to sequence learning with neural networks. In: Advances in Neural Information Processing Systems, pp. 3104–3112 (2014)
24. White, M., Vendome, C., Linares-Vásquez, M., Poshyvanyk, D.: Toward deep learning software repositories. In: Proceedings of the 12th Working Conference on Mining Software Repositories, pp. 334–345. IEEE Press (2015)
25. Zaitsev, O.: Aspects of software naturalness through the generation of identifier names. Master's thesis, Ukrainian Catholic University, Faculty of Applied Sciences, Department of Computer Sciences, Lviv, Ukraine (January 2019). http://er.ucu.edu.ua/handle/1/1338. Under the supervision of Stéphane Ducasse and Alexandre Bergel
26. Zaitsev, O., Ducasse, S., Anquetil, N.: Characterizing Pharo code: a technical report. Technical report, Inria Lille Nord Europe - Laboratoire CRIStAL - Université de Lille; Arolla (January 2020). https://hal.inria.fr/hal-02440055

Acknowledgements

This work is based on the Master's thesis of Oleksandr Zaitsev, defended at the Ukrainian Catholic University [25]. Oleksandr would like to thank the University of Chile, Inria Lille, the Pharo Association, and Arolla for financial support. Alexandre Bergel thanks Lam Research and the FONDECYT Regular 1200067 project for their financial sponsorship.

Author information

Authors and Affiliations

Inria, Univ. Lille, CNRS, Centrale Lille, UMR 9189 - CRIStAL, Lille, France
Oleksandr Zaitsev & Stephane Ducasse
Arolla, Paris, France
Oleksandr Zaitsev & Mathieu Eveillard
ISCLab, Department of Computer Science (DCC), University of Chile, Santiago, Chile
Alexandre Bergel

Corresponding author

Correspondence to Oleksandr Zaitsev.

Editor information

Editors and Affiliations

Brunel University, London, UK
Martin Shepperd
Lisbon University Institute, Lisbon, Portugal
Fernando Brito e Abreu
University of Lisbon, Lisbon, Portugal
Alberto Rodrigues da Silva
University of Castilla-La Mancha, Talavera de la Reina, Spain
Ricardo Pérez-Castillo

About this paper

Cite this paper

Zaitsev, O., Ducasse, S., Bergel, A., Eveillard, M. (2020). Suggesting Descriptive Method Names: An Exploratory Study of Two Machine Learning Approaches. In: Shepperd, M., Brito e Abreu, F., Rodrigues da Silva, A., Pérez-Castillo, R. (eds) Quality of Information and Communications Technology. QUATIC 2020. Communications in Computer and Information Science, vol 1266. Springer, Cham. https://doi.org/10.1007/978-3-030-58793-2_8

DOI: https://doi.org/10.1007/978-3-030-58793-2_8
Published: 31 August 2020
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-58792-5
Online ISBN: 978-3-030-58793-2
eBook Packages: Computer Science, Computer Science (R0)
