Mining Source Code for Snippet Reuse

Diamantopoulos, Themistoklis; Symeonidis, Andreas L.

doi:10.1007/978-3-030-30106-4_7


Part of the book series: Advanced Information and Knowledge Processing (AI&KP)


Abstract

As developers rely more and more on reusing components from online sources, an important challenge is finding snippets that integrate these components and/or address common programming problems. Several snippet mining systems have therefore been developed; however, they have important limitations. API usage mining systems require the developer to know beforehand which library to use, while more generic snippet mining systems usually output a flat list of examples, without distinguishing among different implementations and without assessing the quality and reusability of the proposed snippets. In this chapter, we present CodeCatch, a system that receives queries in natural language and assesses the retrieved snippets both for their quality and for their preference by developers. Furthermore, our system clusters the snippets according to their API calls, allowing the developer to select among the different implementations. We provide an example usage scenario for CodeCatch and evaluate it on a set of programming queries to illustrate how it can be useful to the developer.
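The abstract states that CodeCatch groups retrieved snippets by the API calls they make and ranks them by quality and developer preference, but this preview does not describe the actual representation, clustering algorithm, or scoring. The following is therefore only a minimal Python sketch under assumed simplifications: each snippet is already reduced to a set of API call names (extraction from Java source is assumed to have happened elsewhere), snippets are grouped by single-link clustering over Jaccard similarity with a hypothetical threshold of 0.5, and a hypothetical weighted sum w * quality + (1 - w) * preference orders snippets inside a cluster. The function names, the threshold, the weight and the example data are all illustrative, not the authors'.

```python
from __future__ import annotations

from itertools import combinations


def jaccard(a: frozenset[str], b: frozenset[str]) -> float:
    """Jaccard similarity of two API-call sets (defined as 1.0 for two empty sets)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def cluster_by_api_calls(snippets: dict[str, set[str]],
                         threshold: float = 0.5) -> list[list[str]]:
    """Single-link clustering (hypothetical stand-in): snippets joined by a chain
    of pairs with Jaccard similarity >= threshold end up in the same cluster."""
    ids = sorted(snippets)
    parent = {i: i for i in ids}  # union-find forest

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in combinations(ids, 2):
        if jaccard(frozenset(snippets[a]), frozenset(snippets[b])) >= threshold:
            parent[find(a)] = find(b)  # merge the two clusters

    groups: dict[str, list[str]] = {}
    for i in ids:
        groups.setdefault(find(i), []).append(i)
    # Deterministic output: members sorted, clusters ordered by first member.
    return sorted(groups.values(), key=lambda g: g[0])


def rank_cluster(cluster: list[str], quality: dict[str, float],
                 preference: dict[str, float], w: float = 0.5) -> list[str]:
    """Order snippets by a hypothetical weighted sum of quality and preference."""
    return sorted(cluster,
                  key=lambda s: w * quality[s] + (1 - w) * preference[s],
                  reverse=True)


# Hypothetical API-call sets for the query "read a file line by line" (Java).
example = {
    "s1": {"FileReader.<init>", "BufferedReader.<init>", "BufferedReader.readLine"},
    "s2": {"FileReader.<init>", "BufferedReader.<init>", "BufferedReader.readLine",
           "BufferedReader.close"},
    "s3": {"Paths.get", "Files.readAllLines"},
    "s4": {"Paths.get", "Files.readAllLines", "List.forEach"},
    "s5": {"File.<init>", "Scanner.<init>", "Scanner.hasNextLine", "Scanner.nextLine"},
}
clusters = cluster_by_api_calls(example)
print(clusters)  # [['s1', 's2'], ['s3', 's4'], ['s5']]
```

Each resulting cluster corresponds to one implementation style (BufferedReader, Files.readAllLines, Scanner), which is the kind of choice the abstract says the developer is offered. Single-link merging was picked here only because it needs no preset number of clusters; the chapter's own vectorization and clustering method may well differ.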

Notes

1. As already mentioned in the previous chapters, the Google Code Search Engine was hosted at http://www.google.com/codesearch; however, the service was discontinued in 2013.
2. https://scrapy.org/.
3. As a side note, this local index is used only to assess the reusability of the components; our search, however, is not limited to it (as is the case with other systems), since we employ a search engine and crawl multiple pages. To further ensure that our reusability evaluator is always up to date, we could rebuild its index in step with the rebuild cycles of the AGORA index.
4. https://gist.github.com/jaysridhar/d61ea9cbede617606256933378d71751.

Author information

Authors and Affiliations

Thessaloniki, Greece
Themistoklis Diamantopoulos & Andreas L. Symeonidis

Corresponding author

Correspondence to Themistoklis Diamantopoulos.

About this chapter

Cite this chapter

Diamantopoulos, T., Symeonidis, A.L. (2020). Mining Source Code for Snippet Reuse. In: Mining Software Engineering Data for Software Reuse. Advanced Information and Knowledge Processing. Springer, Cham. https://doi.org/10.1007/978-3-030-30106-4_7

DOI: https://doi.org/10.1007/978-3-030-30106-4_7
Published: 31 March 2020
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-30105-7
Online ISBN: 978-3-030-30106-4
eBook Packages: Computer Science, Computer Science (R0)