Suggesting Descriptive Method Names: An Exploratory Study of Two Machine Learning Approaches

  • Conference paper
Quality of Information and Communications Technology (QUATIC 2020)

Abstract

Programming is a form of communication between the person who writes code and the person who reads it. Nevertheless, developers very often neglect readability, and even well-written code becomes less understandable as software evolves. Together with the growing complexity of software systems, this creates an increasing need for automated tools that improve the readability of source code. In this work, we focus on method names and study how a descriptive name can be automatically generated from a method’s body. We experiment with two approaches from the field of text summarization: one based on TF-IDF and the other on a deep recurrent neural network. We collect a dataset of methods from 50 real-world projects. We evaluate our approaches by comparing the generated names to the actual ones and report the results using precision and recall metrics. For TF-IDF, we obtain up to 28% precision and 45% recall; for the deep neural network, up to 46% precision and 32% recall.
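As a rough illustration of the pipeline the abstract describes, the sketch below picks the highest-scoring TF-IDF words of a method body as a candidate name and compares them to the actual name using word-level precision and recall. It is a minimal sketch under stated assumptions, not the authors' implementation: the helper names (`split_name`, `tf_idf_scores`, `suggest_name_words`, `precision_recall`), the smoothed IDF formula, the choice of `k`, the set-based word matching, and the toy corpus are all hypothetical.

```python
import math
import re
from collections import Counter


def split_name(name):
    """Split a camelCase or snake_case identifier into lowercase words."""
    parts = re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", name)
    return [part.lower() for part in parts]


def tf_idf_scores(body_words, corpus):
    """Score each word of one method body against a corpus of bodies.

    Uses a smoothed IDF (an assumption of this sketch) so that words
    absent from the corpus do not cause a division by zero.
    """
    counts = Counter(body_words)
    scores = {}
    for word, count in counts.items():
        doc_freq = sum(1 for other in corpus if word in other)
        idf = math.log((1 + len(corpus)) / (1 + doc_freq)) + 1
        scores[word] = (count / len(body_words)) * idf
    return scores


def suggest_name_words(body_words, corpus, k=2):
    """Return the k highest-scoring words as a candidate name (k is assumed)."""
    scores = tf_idf_scores(body_words, corpus)
    return sorted(scores, key=scores.get, reverse=True)[:k]


def precision_recall(generated_words, actual_name):
    """Word-level precision and recall of a generated name vs. the actual one."""
    generated = set(generated_words)
    actual = set(split_name(actual_name))
    common = generated & actual
    precision = len(common) / len(generated) if generated else 0.0
    recall = len(common) / len(actual) if actual else 0.0
    return precision, recall


# Toy corpus of already-tokenized method bodies (hypothetical data).
corpus = [
    ["remove", "item", "remove", "index", "item", "remove"],
    ["add", "item", "item"],
    ["print", "item", "stream"],
]
words = suggest_name_words(corpus[0], corpus, k=2)
precision, recall = precision_recall(words, "removeFirstItem")
print(words, precision, recall)
```

In this toy case both suggested words occur in the actual name, so precision is perfect, while the missing word "first" lowers recall. Suggesting fewer, safer words tends to raise precision at the expense of recall.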

Notes

  1. pharo.org/.

  2. The actual meaning of the code is not important; for reference, double quotes delimit comments, pipes delimit local variable declarations, square brackets delimit lambda functions, and a caret marks a return.

  3.

  4. The probability that, during training, the word generated by the model is replaced by the corresponding word from the real name. It is used to make training smoother.

  5. The harmonic mean is more intuitive than the arithmetic mean when computing a mean of ratios.

  6. The complete list of stop words used in this study is available at https://gist.github.com/olekscode/125804150f2a559a171bf695c0a3f809.
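Notes 4 and 5 can be made concrete with a small sketch. The first function applies the substitution probability from note 4 to a single decoding step; the second computes the harmonic mean of precision and recall from note 5. Both function names, the use of `random.Random`, and the numeric values are assumptions chosen for illustration, not taken from the paper.

```python
import random


def next_decoder_input(predicted_word, target_word, ratio, rng):
    """With probability `ratio`, feed the word from the real name
    instead of the word the model just generated (note 4)."""
    return target_word if rng.random() < ratio else predicted_word


def harmonic_mean(precision, recall):
    """Harmonic mean of two ratios, i.e. the F1 score (note 5)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


rng = random.Random(0)  # seeded only to make the sketch reproducible

# ratio = 1.0 always substitutes the real word; ratio = 0.0 never does.
always = next_decoder_input("print", "remove", 1.0, rng)
never = next_decoder_input("print", "remove", 0.0, rng)

# Illustrative values only: a lopsided precision/recall pair.
p, r = 0.9, 0.1
arithmetic = (p + r) / 2
harmonic = harmonic_mean(p, r)
print(always, never, arithmetic, harmonic)
```

With these illustrative values the arithmetic mean is 0.5, which hides the very poor recall, while the harmonic mean is about 0.18 and stays close to the weaker of the two ratios.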

References

  1. Allamanis, M., Barr, E.T., Bird, C., Sutton, C.: Suggesting accurate method and class names. In: Proceedings of the 2015 10th Joint Meeting on Foundations of Software Engineering, pp. 38–49. ACM (2015)

  2. Allamanis, M., Barr, E.T., Devanbu, P., Sutton, C.: A survey of machine learning for big code and naturalness. ACM Comput. Surv. (CSUR) 51(4), 81 (2018)

  3. Allamanis, M., Peng, H., Sutton, C.: A convolutional attention network for extreme summarization of source code. In: International Conference on Machine Learning, pp. 2091–2100 (2016)

  4. Alon, U., Zilberstein, M., Levy, O., Yahav, E.: code2vec: learning distributed representations of code. arXiv preprint arXiv:1803.09473 (2018)

  5. Bavishi, R., Pradel, M., Sen, K.: Context2name: a deep learning-based approach to infer natural variable names from usage contexts. arXiv preprint arXiv:1809.05193 (2018)

  6. Beck, K.: Test Driven Development: By Example. Addison-Wesley Longman (2002)

  7. Cho, K., Van Merriënboer, B., Bahdanau, D., Bengio, Y.: On the properties of neural machine translation: encoder-decoder approaches. arXiv preprint arXiv:1409.1259 (2014)

  8. Demeyer, S., Ducasse, S., Nierstrasz, O.: Object-Oriented Reengineering Patterns. Morgan Kaufmann, Burlington (2002)

  9. Fowler, M., Beck, K., Brant, J., Opdyke, W., Roberts, D.: Refactoring: Improving the Design of Existing Code. Addison Wesley, Boston (1999)

  10. Gabel, M., Su, Z.: A study of the uniqueness of source code. In: Proceedings of the Eighteenth ACM SIGSOFT International Symposium on Foundations of Software Engineering, pp. 147–156. ACM (2010)

  11. Hindle, A., Barr, E.T., Gabel, M., Su, Z., Devanbu, P.: On the naturalness of software. Commun. ACM 59(5), 122–131 (2016)

  12. Hindle, A., Barr, E.T., Su, Z., Gabel, M., Devanbu, P.: On the naturalness of software. In: 2012 34th International Conference on Software Engineering (ICSE), pp. 837–847. IEEE (2012)

  13. Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural Comput. 9(8), 1735–1780 (1997)

  14. Iyer, S., Konstas, I., Cheung, A., Zettlemoyer, L.: Summarizing source code using a neural attention model. In: Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), vol. 1, pp. 2073–2083 (2016)

  15. Knuth, D.E.: Literate programming. Comput. J. 27(2), 97–111 (1984)

  16. Koenig, A.: Patterns and antipatterns. J. Object-Oriented Program. 8(1), 46–48 (1995)

  17. Lehman, M., Belady, L.: Program Evolution: Processes of Software Change. London Academic Press, London (1985). ftp://ftp.umh.ac.be/pub/ftp_infofs/1985/ProgramEvolution.pdf

  18. Martin, R.C.: Clean Code: A Handbook of Agile Software Craftsmanship. Pearson Education, London (2009)

  19. Ramos, J.: Using TF-IDF to determine word relevance in document queries. In: Proceedings of the First Instructional Conference on Machine Learning, vol. 242, pp. 133–142 (2003)

  20. Raychev, V., Vechev, M., Yahav, E.: Code completion with statistical language models. In: ACM SIGPLAN Notices, vol. 49, pp. 419–428. ACM (2014)

  21. Rush, A.M., Harvard, S., Chopra, S., Weston, J.: A neural attention model for sentence summarization. In: Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, ACLWeb (2017)

  22. Sasaki, Y., et al.: The truth of the F-measure. Teach Tutor Mater 1(5), 1–5 (2007)

  23. Sutskever, I., Vinyals, O., Le, Q.V.: Sequence to sequence learning with neural networks. In: Advances in Neural Information Processing Systems, pp. 3104–3112 (2014)

  24. White, M., Vendome, C., Linares-Vásquez, M., Poshyvanyk, D.: Toward deep learning software repositories. In: Proceedings of the 12th Working Conference on Mining Software Repositories, pp. 334–345. IEEE Press (2015)

  25. Zaitsev, O.: Aspects of software naturalness through the generation of identifier names. Master’s thesis, Ukrainian Catholic University, Faculty of Applied Sciences, Department of Computer Sciences, Lviv, Ukraine (January 2019). http://er.ucu.edu.ua/handle/1/1338. Under the supervision of Stéphane Ducasse and Alexandre Bergel

  26. Zaitsev, O., Ducasse, S., Anquetil, N.: Characterizing pharo code: a technical report. Technical report, Inria Lille Nord Europe - Laboratoire CRIStAL - Université de Lille; Arolla (January 2020). https://hal.inria.fr/hal-02440055

Acknowledgements

This work is based on the Master’s thesis of Oleksandr Zaitsev, defended at the Ukrainian Catholic University [25]. Oleksandr would like to thank the University of Chile, Inria Lille, the Pharo Association, and Arolla for financial support. Alexandre Bergel thanks Lam Research and the FONDECYT Regular project 1200067 for financial sponsorship.

Author information

Corresponding author

Correspondence to Oleksandr Zaitsev.

Copyright information

© 2020 Springer Nature Switzerland AG

About this paper

Cite this paper

Zaitsev, O., Ducasse, S., Bergel, A., Eveillard, M. (2020). Suggesting Descriptive Method Names: An Exploratory Study of Two Machine Learning Approaches. In: Shepperd, M., Brito e Abreu, F., Rodrigues da Silva, A., Pérez-Castillo, R. (eds) Quality of Information and Communications Technology. QUATIC 2020. Communications in Computer and Information Science, vol 1266. Springer, Cham. https://doi.org/10.1007/978-3-030-58793-2_8

  • DOI: https://doi.org/10.1007/978-3-030-58793-2_8

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-58792-5

  • Online ISBN: 978-3-030-58793-2

  • eBook Packages: Computer Science, Computer Science (R0)
