Source Code Indexing for Component Reuse

Mining Software Engineering Data for Software Reuse

Abstract

The momentum of the open-source community has been constantly increasing, leading to numerous tools for writing, maintaining, and sharing source code. Several code search engines have been developed to support development tasks and facilitate reuse, either directly or by serving as information sources for code recommenders. In this chapter, we present AGORA, a code search engine that facilitates reuse at the component, snippet, and project levels. Through its Elasticsearch index, AGORA supports advanced queries (syntax-aware search, regular expressions), while the engine also integrates with popular code hosting repositories and offers a well-designed API. We provide representative examples and a usage scenario to illustrate the functionality of AGORA, and we perform a comparative analysis in a code reuse context, which indicates that AGORA provides an efficient alternative to current solutions.

Notes

  1. https://github.com/.

  2. https://sourceforge.net/.

  3. http://stackoverflow.com/.

  4. The term “agora” refers to the central spot of ancient Greek city-states. Although it is roughly translated to the English term “market”, “agora” can be better viewed as an assembly, a place where people met not only to trade goods, but also to exchange ideas. It is a good fit for our search engine since we envision it as a place where developers can (freely) distribute and exchange their source code, and subsequently their ideas.

  5. Source: http://googleblog.blogspot.gr/2011/10/fall-sweep.html.

  6. Source: http://techcrunch.com/2012/07/20/ohloh-wants-to-fill-the-gap-left-by-google-code-search/.

  7. Elasticsearch supports removing stop words for several languages. In our case, the default choice of English is adequate.

  8. Note that the standard analyzer initially splits the filename into two parts, the filename without the extension and the extension, since it splits on punctuation.

  9. One of 040000, 100644, 100664, 100755, 120000, or 160000, which correspond to directory, regular non-executable file, regular non-executable group-writeable file, regular executable file, symbolic link, or gitlink, respectively.

  10. One of blob, tree, commit, or tag.

  11. The CamelCase analyzer is quite effective for fields that contain Java types: type names are conventionally written in CamelCase, while primitives are not affected by the CamelCase tokenizer, i.e., the text “float” results in the token “float”.
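
As an illustration of note 11, the following minimal Python sketch approximates CamelCase tokenization. It is not the Elasticsearch pattern analyzer nor AGORA's actual configuration: the function name camel_case_tokens, the ASCII-only regular expression, and the lowercasing of the resulting tokens are all assumptions made for this example.

```python
import re
from typing import List

# ASCII-only approximation of CamelCase splitting (an assumption for this
# sketch, not the pattern shipped with Elasticsearch). Alternatives, in order:
#   1. a run of capitals not followed by a lowercase letter (e.g. "HTTP"),
#   2. an optional capital followed by lowercase letters (e.g. "Server"),
#   3. a run of digits.
_TOKEN = re.compile(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+")


def camel_case_tokens(text: str) -> List[str]:
    """Split text into lowercased CamelCase tokens, dropping punctuation."""
    return [token.lower() for token in _TOKEN.findall(text)]


if __name__ == "__main__":
    print(camel_case_tokens("float"))              # ['float']
    print(camel_case_tokens("ArrayList"))          # ['array', 'list']
    print(camel_case_tokens("HTTPServer"))         # ['http', 'server']
    print(camel_case_tokens("Map<String, List>"))  # ['map', 'string', 'list']
```

Under these assumptions, a query for “list” would match a field containing the type ArrayList, whereas a primitive such as float remains a single token.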

Author information

Corresponding author

Correspondence to Themistoklis Diamantopoulos.

Copyright information

© 2020 Springer Nature Switzerland AG

About this chapter

Cite this chapter

Diamantopoulos, T., Symeonidis, A.L. (2020). Source Code Indexing for Component Reuse. In: Mining Software Engineering Data for Software Reuse. Advanced Information and Knowledge Processing. Springer, Cham. https://doi.org/10.1007/978-3-030-30106-4_5

  • DOI: https://doi.org/10.1007/978-3-030-30106-4_5

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-30105-7

  • Online ISBN: 978-3-030-30106-4

  • eBook Packages: Computer Science, Computer Science (R0)
