Mining Source Code for Snippet Reuse

Mining Software Engineering Data for Software Reuse

Abstract

As developers rely more and more on reusing components from online sources, an important challenge is finding snippets that help them integrate these components and/or address common programming problems. Several snippet mining systems have been developed for this purpose; however, they have important limitations. API usage mining systems require the developer to know beforehand which library to use, while more generic snippet mining systems usually output a list of examples without distinguishing among different implementations and without assessing the quality and reusability of the proposed snippets. In this chapter, we present CodeCatch, a system that receives queries in natural language and assesses the retrieved snippets both for their quality and for how strongly they are preferred by developers. Furthermore, our system clusters the snippets according to their API calls, thus allowing the developer to select among different implementations. We provide an example usage scenario for CodeCatch and evaluate it on a set of programming queries to illustrate how it can be useful to developers.
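The abstract states that CodeCatch groups the retrieved snippets by the API calls they make, so that each group corresponds to a different implementation. The chapter's actual clustering pipeline is not reproduced on this page, so the following is only a minimal Python sketch under stated assumptions: API calls are approximated with a hypothetical regular expression that picks up `.name(` invocations in Java-like text, snippets are compared with Jaccard similarity over their sets of called methods, and the grouping uses single-linkage clustering cut at an arbitrary similarity threshold (a swapped-in technique, not necessarily the one CodeCatch uses). The names `api_calls`, `jaccard`, `cluster_by_api_calls`, and `example` are illustrative.

```python
import re
from itertools import combinations

# Hypothetical, crude extractor (an assumption, not CodeCatch's parser):
# every identifier followed by "(" and preceded by "." counts as an API call.
CALL_PATTERN = re.compile(r"\.\s*([A-Za-z_]\w*)\s*\(")


def api_calls(snippet: str) -> frozenset[str]:
    """Return the set of method names invoked via '.' in a Java-like snippet."""
    return frozenset(CALL_PATTERN.findall(snippet))


def jaccard(a: frozenset[str], b: frozenset[str]) -> float:
    """Jaccard similarity of two sets; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def cluster_by_api_calls(snippets: list[str], threshold: float = 0.5) -> list[list[int]]:
    """Group snippet indices by single-linkage clustering on Jaccard similarity.

    Two snippets share a cluster if a chain of pairs links them, each pair
    having API-call sets with similarity >= threshold (union-find merges them).
    """
    calls = [api_calls(s) for s in snippets]
    parent = list(range(len(snippets)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps trees shallow
            i = parent[i]
        return i

    # Compare every pair once and merge the clusters of similar snippets.
    for i, j in combinations(range(len(snippets)), 2):
        if jaccard(calls[i], calls[j]) >= threshold:
            parent[find(i)] = find(j)

    # Collect snippet indices under their cluster representative.
    groups: dict[int, list[int]] = {}
    for i in range(len(snippets)):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())


# Two BufferedReader-based snippets and one java.nio.file-based snippet.
example = [
    "BufferedReader br = new BufferedReader(new FileReader(f)); String line = br.readLine(); br.close();",
    "BufferedReader r = new BufferedReader(new FileReader(path)); r.readLine(); r.close();",
    "List<String> lines = Files.readAllLines(Paths.get(path));",
]
print(cluster_by_api_calls(example))  # → [[0, 1], [2]]
```

Single linkage is used here only because it needs no preset number of clusters; a system handling many snippets might instead vectorize the API calls (for example with tf-idf) and use a clustering algorithm that avoids the quadratic pairwise loop above.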

Notes

  1.

    As already mentioned in the previous chapters, the Google Code Search engine was hosted at http://www.google.com/codesearch; however, the service was discontinued in 2013.

  2.

    https://scrapy.org/.

  3.

    As a side note, this local index is only used to assess the reusability of the components; our search, however, is not limited to it (as is the case with other systems), since we employ a search engine and crawl multiple pages. To further ensure that our reusability evaluator is always up to date, we could rebuild its index along with the rebuild cycles of the AGORA index.

  4.

    https://gist.github.com/jaysridhar/d61ea9cbede617606256933378d71751.

Author information

Corresponding author: Themistoklis Diamantopoulos.

Copyright information

© 2020 Springer Nature Switzerland AG

About this chapter

Cite this chapter

Diamantopoulos, T., Symeonidis, A.L. (2020). Mining Source Code for Snippet Reuse. In: Mining Software Engineering Data for Software Reuse. Advanced Information and Knowledge Processing. Springer, Cham. https://doi.org/10.1007/978-3-030-30106-4_7

  • DOI: https://doi.org/10.1007/978-3-030-30106-4_7

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-30105-7

  • Online ISBN: 978-3-030-30106-4

  • eBook Packages: Computer Science, Computer Science (R0)
