Theoretical Background and State-of-the-Art

Chapter in Mining Software Engineering Data for Software Reuse

Abstract

This chapter provides an overview of the background knowledge that is relevant to the main application areas of this book. The areas of software engineering, software reuse, and software quality are discussed in the context of exploiting useful data to improve the software development process. After providing the relevant definitions and outlining the data and metrics produced as part of software development, we discuss how data mining techniques can be applied to software engineering data and illustrate the reuse potential they offer in an integrated manner.

Notes

  1. According to Sommerville [1], the term was first proposed in 1968 at a conference held by NATO. Nowadays, we may use the definition by IEEE [2], which describes software engineering as “the application of a systematic, disciplined, quantifiable approach to the development, operation, and maintenance of software; that is, the application of engineering to software”.

  2. In the context of this book, software reuse can be defined as “the systematic application of existing software artifacts during the process of building a new software system, or the physical incorporation of existing software artifacts in a new software system” [5]. Note that the term “artifact” corresponds to any piece of software engineering-related data and covers not only source code components, but also software requirements, documentation, etc.

  3. A formalized alternative is that of reuse-oriented software engineering, which can be defined as defining requirements, determining whether some of them can be covered by reusing existing components, and finally designing and developing the system with reuse in mind (i.e., including designing component inputs/outputs and integrating the components). However, many development teams already follow this style as part of their process, regardless of whether that process is traditional or modern. As a result, we argue that there is no need to force development teams to adopt a different methodology, and we do not focus on the reuse-oriented perspective as a formalized way of developing software. Instead, we present a set of methodologies and indicate the potential of using software engineering data to actually incorporate reuse as a philosophy of writing software.

  4. There are several open data repositories, one of the most prominent being the EU Open Data Portal (https://data.europa.eu/euodp/en/home), which contains more than 14,000 public datasets provided by different publishers.

  5. Note that we focus here mainly on software engineering data that are available online. There are also other resources worth mining, which form what we may call the landscape of information that a developer is faced with when joining a project, a term introduced in [15]. Specifically, apart from the source code of the project, the developer has to become familiar with the architecture of the project, the software process followed, the dependent APIs, the development environment, etc.

  6. https://stackoverflow.com/.

  7. The term is attributed to Jeff Atwood, one of the two creators of Stack Overflow, the other being Joel Spolsky [19, 20].

  8. https://stackexchange.com/sites.

  9. https://softwareengineering.stackexchange.com/.

  10. https://pm.stackexchange.com/.

  11. https://sqa.stackexchange.com/.

  12. https://www.quora.com/.

  13. https://www.reddit.com/.

  14. https://www.reddit.com/r/programming/.

  15. https://www.reddit.com/r/softwaredevelopment/.

  16. https://www.reddit.com/r/softwarearchitecture/.

  17. https://www.codeproject.com/.

  18. http://central.sonatype.org/.

  19. https://www.npmjs.com/.

  20. https://pypi.python.org/pypi.

  21. https://search.maven.org/stats.

  22. https://www.programmableweb.com/.

  23. https://www.bugzilla.org/.

  24. https://bugzilla.mozilla.org/.

  25. https://bugs.eclipse.org/bugs/.

  26. https://bugzilla.kernel.org/.

  27. http://boa.cs.iastate.edu/.

  28. http://ghtorrent.org/.

  29. Indicatively, we may refer to an excerpt of the rather abstract “definition” provided by Robert M. Pirsig [22]: “Quality... you know what it is, yet you don’t know what it is”.

  30. The SWEBOK, short for Software Engineering Body of Knowledge, is an international ISO standard created through the cooperative efforts of software engineering professionals and published by the IEEE Computer Society. The standard aspires to summarize the basic knowledge of software engineering and to include reference lists for its different concepts.

  31. A rather important note is that any metric should be used only when it fits the context and when its value is indeed meaningful. A popular quote, often attributed to Microsoft co-founder and former chairman Bill Gates, states that “Measuring programming progress by lines of code is like measuring aircraft building progress by weight”.

  32. This original definition of LCOM has been considered rather hard to compute, as its maximum value depends on the number of method pairs. An alternative computation formula was later proposed by Henderson-Sellers [36], usually referred to as LCOM3 (the original metric then being referred to as LCOM1), which defines LCOM as \((M - A/V) / (M - 1)\), where \(M\) is the number of methods, \(V\) is the number of class variables, and \(A\) is the total number of accesses to these variables, so that \(A/V\) is the average number of accesses per variable.
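
     As an illustration, the following minimal Python sketch (a hypothetical helper of our own, not part of the book's tooling) computes LCOM3 from a mapping of each method to the set of class variables it accesses:

         # Minimal sketch of the Henderson-Sellers LCOM3 computation.
         # Input: a dict mapping each method name to the set of class
         # variables (attributes) that the method accesses.
         def lcom3(accesses):
             methods = list(accesses)                      # M methods
             variables = set().union(*accesses.values())   # V class variables
             m, v = len(methods), len(variables)
             if m < 2 or v == 0:
                 return 0.0  # metric undefined for trivial classes
             # A/V: average number of method accesses per variable
             avg_accesses = sum(len(s) for s in accesses.values()) / v
             return (m - avg_accesses) / (m - 1)

         print(lcom3({"get_x": {"x"}, "set_x": {"x"}}))  # fully cohesive: 0.0
         print(lcom3({"get_x": {"x"}, "get_y": {"y"}}))  # no shared state: 1.0

     A fully cohesive class, in which every method uses every variable, yields 0, while a class whose methods share no variables yields values close to 1.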

  33. The interested reader is further referred to [39] for a comprehensive review of object-oriented static analysis metrics.

  34. There are several other metrics in this field; for an outline of popular metrics, the reader is referred to [40].

  35. On the other hand, static analysis is usually focused on the characteristics of maintainability and usability.

  36. MTTF (mean time to failure) is actually the reciprocal of ROCOF (rate of occurrence of failures).
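
     As a simple worked example (with illustrative numbers only): a system exhibiting \(ROCOF = 0.002\) failures per hour of operation has \(MTTF = 1 / 0.002 = 500\) hours, i.e., a failure is expected on average every 500 hours of operation.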

  37. In fact, the original publication by Albrecht [44] indicated that function-oriented and size-oriented metrics were highly correlated; however, the rise of different programming languages and models quickly proved that their differences are quite substantial [45].
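
     To see why the two metric families diverge, consider the following minimal Python sketch; the lines-of-code-per-function-point ratios below are hypothetical placeholders for illustration only (real “backfiring” tables, as discussed in [45], vary by language and counting rules):

         # Hypothetical LOC-per-function-point ratios (illustrative only).
         LOC_PER_FP = {"assembly": 320, "c": 130, "java": 55, "python": 25}

         def estimated_loc(function_points, language):
             # The same functional size implies very different code sizes.
             return function_points * LOC_PER_FP[language]

         for lang in LOC_PER_FP:
             print(f"{lang}: ~{estimated_loc(100, lang)} LOC for 100 function points")

     Under such ratios, a size-oriented metric (LOC) ranks these systems very differently even though their function-oriented size is identical.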

  38. There are of course other, more focused interpretations, such as the one provided in [50], which defines testing as “the process of executing a program with the intent of finding errors”. As times have changed, however, testing has grown into a major process that can add actual value to the software, and is thus not limited to error discovery. An interesting view on this matter is provided by Beizer [51], who claims that different development teams have different levels of maturity concerning the purpose of testing: some consider it equivalent to debugging, others view it as a way to show that the software works, others use it to reduce product quality risks, etc.

  39. An alternative definition provided by Hand et al. [58] involves the notions of “finding unsuspected relationships” and “summarizing data in novel ways”. These phrases indicate the real challenges that differentiate data mining from traditional data analysis.

  40. Another possible distinction concerns the type of the input data used, which, according to Xie et al. [68], fall into three large categories: sequences (e.g., execution traces), graphs (e.g., call graphs), and text (e.g., documentation).
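
     As a minimal illustration (toy data of our own, not drawn from [68]), the three categories can be represented as follows:

         # Toy Python examples of the three input data categories (hypothetical data).
         trace = ["open", "read", "read", "close"]   # sequence: an execution trace

         call_graph = {                              # graph: caller -> callees
             "main": ["parse", "run"],
             "run": ["parse"],
             "parse": [],
         }

         documentation = (                           # text: unstructured docs
             "Opens the file, reads its contents, and closes it."
         )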

  41. We analyze here certain popular directions; however, we obviously do not exhaust all of them. An indicative list of applications of source code analysis for the interested reader can be found in [77].

References

  1. Sommerville I (2010) Software engineering, 9th edn. Addison-Wesley, Harlow

  2. IEEE (1993) IEEE standards collection: software engineering, IEEE standard 610.12-1990. Technical report, IEEE

  3. Pressman R, Maxim B (2019) Software engineering: a practitioner’s approach, 9th edn. McGraw-Hill Inc, New York

  4. Pfleeger SL, Atlee JM (2009) Software engineering: theory and practice, 4th edn. Pearson, London

  5. Dusink L, van Katwijk J (1995) Reuse dimensions. SIGSOFT Softw Eng Notes 20(SI):137–149

  6. Sametinger J (1997) Software engineering with reusable components. Springer-Verlag New York Inc, New York

  7. Krueger CW (1992) Software reuse. ACM Comput Surv 24(2):131–183

  8. Capilla R, Gallina B, Cetina C, Favaro J (2019) Opportunities for software reuse in an uncertain world: from past to emerging trends. J Softw: Evol Process 31(8):e2217

  9. Arango GF (1988) Domain engineering for software reuse. PhD thesis, University of California, Irvine. AAI8827979

  10. Prieto-Diaz R, Arango G (1991) Domain analysis and software systems modeling. IEEE Computer Society Press, Washington

  11. Frakes W, Prieto-Diaz R, Fox C (1998) DARE: domain analysis and reuse environment. Ann Softw Eng 5:125–141

  12. Czarnecki K, Østerbye K, Völter M (2002) Generative programming. In: Object-oriented technology ECOOP 2002 workshop reader. Springer, Berlin, Heidelberg, pp 15–29

  13. Sillitti A, Vernazza T, Succi G (2002) Service oriented programming: a new paradigm of software reuse. In: Proceedings of the 7th international conference on software reuse: methods, techniques, and tools (ICSR-7). Springer, Berlin, Heidelberg, pp 269–280

  14. Robillard MP, Maalej W, Walker RJ, Zimmermann T (2014) Recommendation systems in software engineering. Springer, Berlin

  15. Dagenais B, Ossher H, Bellamy RKE, Robillard MP, de Vries JP (2010) Moving into a new software project landscape. In: Proceedings of the 32nd ACM/IEEE international conference on software engineering - volume 1 (ICSE’10), New York, NY, USA. ACM, pp 275–284

  16. Sim SE, Gallardo-Valencia RE (2015) Finding source code on the web for remix and reuse. Springer, Berlin

  17. Brandt J, Guo PJ, Lewenstein J, Dontcheva M, Klemmer SR (2009) Two studies of opportunistic programming: interleaving web foraging, learning, and writing code. In: Proceedings of the SIGCHI conference on human factors in computing systems (CHI’09), New York, NY, USA. ACM, pp 1589–1598

  18. Li H, Xing Z, Peng X, Zhao W (2013) What help do developers seek, when and how? In: 2013 20th working conference on reverse engineering (WCRE), pp 142–151

  19. Atwood J (2008) Introducing stackoverflow.com. https://blog.codinghorror.com/introducing-stackoverflow-com/. Accessed Nov 2017

  20. Franck S (2008) None of us is as dumb as all of us. https://blog.codinghorror.com/stack-overflow-none-of-us-is-as-dumb-as-all-of-us/. Accessed Nov 2017

  21. Garvin DA (1984) What does ‘product quality’ really mean? MIT Sloan Manag Rev 26(1)

  22. Pirsig RM (1974) Zen and the art of motorcycle maintenance: an inquiry into values. HarperTorch, New York

  23. Suryn W (2014) Software quality engineering: a practitioner’s approach. Wiley-IEEE Press, Hoboken

  24. Pfleeger SL, Kitchenham B (1996) Software quality: the elusive target. IEEE Softw 13(1):12–21

  25. McCall JA, Richards PK, Walters GF (1977) Factors in software quality. Volume I. Concepts and definitions of software quality. Technical report ADA049014, General Electric Co, Sunnyvale, CA

  26. McCall JA, Richards PK, Walters GF (1977) Factors in software quality. Volume II. Metric data collection and validation. Technical report ADA049014, General Electric Co, Sunnyvale, CA

  27. McCall JA, Richards PK, Walters GF (1977) Factors in software quality. Volume III. Preliminary handbook on software quality for an acquisition manager. Technical report ADA049014, General Electric Co, Sunnyvale, CA

  28. ISO/IEC (1991) ISO/IEC 9126:1991. Technical report, ISO/IEC

  29. ISO/IEC 25010:2011 (2011) https://www.iso.org/standard/35733.html. Accessed Nov 2017

  30. Kan SH (2002) Metrics and models in software quality engineering, 2nd edn. Addison-Wesley Longman Publishing Co, Inc, Boston

  31. Fenton N, Bieman J (2014) Software metrics: a rigorous and practical approach, 3rd edn. CRC Press Inc, Boca Raton

  32. Bourque P, Fairley RE (eds) (2014) SWEBOK: guide to the software engineering body of knowledge, version 3.0 edn. IEEE Computer Society, Los Alamitos, CA

  33. McCabe TJ (1976) A complexity measure. In: Proceedings of the 2nd international conference on software engineering (ICSE’76), Los Alamitos, CA, USA. IEEE Computer Society Press, p 407

  34. Halstead MH (1977) Elements of software science (operating and programming systems series). Elsevier Science Inc, New York

  35. Chidamber SR, Kemerer CF (1994) A metrics suite for object oriented design. IEEE Trans Softw Eng 20(6):476–493

  36. Henderson-Sellers B (1996) Object-oriented metrics: measures of complexity. Prentice-Hall Inc, Upper Saddle River

  37. Brito e Abreu F, Carapuça R (1994) Candidate metrics for object-oriented software within a taxonomy framework. J Syst Softw 26(1):87–96

  38. Lorenz M, Kidd J (1994) Object-oriented software metrics: a practical guide. Prentice-Hall Inc, Upper Saddle River

  39. Srinivasan KP, Devi T (2014) A comprehensive review and analysis on object-oriented software metrics in software measurement. Int J Comput Sci Eng 6(7):247

  40. Chhabra JK, Gupta V (2010) A survey of dynamic software metrics. J Comput Sci Technol 25(5):1016–1029

  41. Yacoub SM, Ammar HH, Robinson T (1999) Dynamic metrics for object oriented designs. In: Proceedings of the 6th international symposium on software metrics (METRICS’99), Washington, DC, USA. IEEE Computer Society, p 50

  42. Arisholm E, Briand LC, Foyen A (2004) Dynamic coupling measurement for object-oriented software. IEEE Trans Softw Eng 30(8):491–506

  43. Schneidewind N (2009) Systems and software engineering with applications. Institute of Electrical and Electronics Engineers, New York

  44. Albrecht AJ (1979) Measuring application development productivity. In: Proceedings of the IBM application development symposium. IBM Press, pp 83–92

  45. Jones TC (1998) Estimating software costs. McGraw-Hill Inc, Hightstown

  46. Moser R, Pedrycz W, Succi G (2008) A comparative analysis of the efficiency of change metrics and static code attributes for defect prediction. In: Proceedings of the 30th international conference on software engineering (ICSE’08), New York, NY, USA. ACM, pp 181–190

  47. Hassan AE (2009) Predicting faults using the complexity of code changes. In: Proceedings of the 31st international conference on software engineering (ICSE’09), Washington, DC, USA. IEEE Computer Society, pp 78–88

  48. Spinellis D (2006) Code quality: the open source perspective. Effective software development series. Addison-Wesley Professional, Boston

  49. Boehm BW (1981) Software engineering economics, 1st edn. Prentice Hall PTR, Upper Saddle River

  50. Myers GJ, Sandler C, Badgett T (2011) The art of software testing, 3rd edn. Wiley Publishing, Hoboken

  51. Beizer B (1990) Software testing techniques, 2nd edn. Van Nostrand Reinhold Co, New York

  52. Copeland L (2003) A practitioner’s guide to software test design. Artech House Inc, Norwood

  53. Beck K (2002) Test driven development: by example. Addison-Wesley Longman Publishing Co, Inc, Boston

  54. Beck K (2000) Extreme programming explained: embrace change. Addison-Wesley Longman Publishing Co, Inc, Boston

  55. McConnell S (2004) Code complete, 2nd edn. Microsoft Press, Redmond

  56. Witten IH, Frank E, Hall MA (2011) Data mining: practical machine learning tools and techniques, 3rd edn. Morgan Kaufmann Publishers Inc, San Francisco

  57. Tan P-N, Steinbach M, Kumar V (2005) Introduction to data mining, 1st edn. Addison-Wesley Longman Publishing Co, Inc, Boston

  58. Hand DJ, Smyth P, Mannila H (2001) Principles of data mining. MIT Press, Cambridge

  59. Mitchell TM (1997) Machine learning, 1st edn. McGraw-Hill Inc, New York

  60. Bishop CM (2006) Pattern recognition and machine learning. Springer-Verlag New York Inc, Secaucus

  61. Manning CD, Raghavan P, Schütze H (2008) Introduction to information retrieval. Cambridge University Press, New York

  62. Engelbrecht AP (2007) Computational intelligence: an introduction, 2nd edn. Wiley Publishing, Hoboken

  63. Han J, Pei J, Kamber M (2011) Data mining: concepts and techniques. Morgan Kaufmann Publishers Inc, San Francisco

  64. Delen D, Demirkan H (2013) Data, information and analytics as services. Decis Support Syst 55(1):359–363

  65. Hastie T, Tibshirani R, Friedman J (2001) The elements of statistical learning. Springer series in statistics. Springer New York Inc, New York

  66. Webb AR (2002) Statistical pattern recognition, 2nd edn. Wiley, Hoboken

  67. Urbanowicz RJ, Browne WN (2017) Introduction to learning classifier systems. Springerbriefs in intelligent systems. Springer New York Inc, New York

  68. Xie T, Thummalapenta S, Lo D, Liu C (2009) Data mining for software engineering. Computer 42(8):55–62

  69. Monperrus M (2013) Data-mining for software engineering. https://www.monperrus.net/martin/data-mining-software-engineering. Accessed Nov 2017

  70. Halkidi M, Spinellis D, Tsatsaronis G, Vazirgiannis M (2011) Data mining in software engineering. Intell Data Anal 15(3):413–441

  71. Kumar M, Ajmeri N, Ghaisas S (2010) Towards knowledge assisted agile requirements evolution. In: Proceedings of the 2nd international workshop on recommendation systems for software engineering (RSSE’10), New York, NY, USA. ACM, pp 16–20

  72. Ghaisas S, Ajmeri N (2013) Knowledge-assisted ontology-based requirements evolution. In: Maalej W, Thurimella AK (eds) Managing requirements knowledge. Springer, Berlin, pp 143–167

  73. Chen K, Zhang W, Zhao H, Mei H (2005) An approach to constructing feature models based on requirements clustering. In: Proceedings of the 13th IEEE international conference on requirements engineering (RE’05), Washington, DC, USA. IEEE Computer Society, pp 31–40

  74. Alves V, Schwanninger C, Barbosa L, Rashid A, Sawyer P, Rayson P, Pohl C, Rummler A (2008) An exploratory study of information retrieval techniques in domain analysis. In: Proceedings of the 2008 12th international software product line conference (SPLC’08), Washington, DC, USA. IEEE Computer Society, pp 67–76

  75. Felfernig A, Schubert M, Mandl M, Ricci F, Maalej W (2010) Recommendation and decision technologies for requirements engineering. In: Proceedings of the 2nd international workshop on recommendation systems for software engineering (RSSE’10), New York, NY, USA. ACM, pp 11–15

  76. Maalej W, Thurimella AK (2009) Towards a research agenda for recommendation systems in requirements engineering. In: Proceedings of the 2009 2nd international workshop on managing requirements knowledge (MARK’09), Washington, DC, USA. IEEE Computer Society, pp 32–39

  77. Binkley D (2007) Source code analysis: a road map. In: 2007 Future of software engineering (FOSE’07), Washington, DC, USA. IEEE Computer Society, pp 104–119

  78. Janjic W, Hummel O, Schumacher M, Atkinson C (2013) An unabridged source code dataset for research in software reuse. In: Proceedings of the 10th working conference on mining software repositories (MSR’13), Piscataway, NJ, USA. IEEE Press, pp 339–342

  79. Bajracharya S, Ngo T, Linstead E, Dou Y, Rigor P, Baldi P, Lopes C (2006) Sourcerer: a search engine for open source code supporting structure-based search. In: Companion to the 21st ACM SIGPLAN symposium on object-oriented programming systems, languages, and applications (OOPSLA’06), New York, NY, USA. ACM, pp 681–682

  80. Linstead E, Bajracharya S, Ngo T, Rigor P, Lopes C, Baldi P (2009) Sourcerer: mining and searching internet-scale software repositories. Data Min Knowl Discov 18(2):300–336

  81. Holmes R, Murphy GC (2005) Using structural context to recommend source code examples. In: Proceedings of the 27th international conference on software engineering (ICSE’05), New York, NY, USA. ACM, pp 117–125

  82. Mandelin D, Lin X, Bodík R, Kimelman D (2005) Jungloid mining: helping to navigate the API jungle. SIGPLAN Not 40(6):48–61

  83. Thummalapenta S, Xie T (2007) PARSEWeb: a programmer assistant for reusing open source code on the web. In: Proceedings of the 22nd IEEE/ACM international conference on automated software engineering (ASE’07), New York, NY, USA. ACM, pp 204–213

  84. Hummel O, Janjic W, Atkinson C (2008) Code conjurer: pulling reusable software out of thin air. IEEE Softw 25(5):45–52

  85. Lazzarini Lemos OA, Bajracharya SK, Ossher J (2007) CodeGenie: a tool for test-driven source code search. In: Companion to the 22nd ACM SIGPLAN conference on object-oriented programming systems and applications companion (OOPSLA’07), New York, NY, USA. ACM, pp 917–918

  86. Reiss SP (2009) Semantics-based code search. In: Proceedings of the 31st international conference on software engineering (ICSE’09), Washington, DC, USA. IEEE Computer Society, pp 243–253

  87. Hummel O, Atkinson C (2004) Extreme harvesting: test driven discovery and reuse of software components. In: Proceedings of the 2004 IEEE international conference on information reuse and integration (IRI 2004), pp 66–72

  88. Nurolahzade M, Walker RJ, Maurer F (2013) An assessment of test-driven reuse: promises and pitfalls. In: Favaro J, Morisio M (eds) Safe and secure software reuse. Lecture notes in computer science, vol 7925. Springer, Berlin, pp 65–80

  89. Xie T, Pei J (2006) MAPO: mining API usages from open source repositories. In: Proceedings of the 2006 international workshop on mining software repositories (MSR’06), New York, NY, USA. ACM, pp 54–57

  90. Sahavechaphan N, Claypool K (2006) XSnippet: mining for sample code. SIGPLAN Not 41(10):413–430

  91. Zagalsky A, Barzilay O, Yehudai A (2012) Example overflow: using social media for code recommendation. In: Proceedings of the 3rd international workshop on recommendation systems for software engineering (RSSE’12), Piscataway, NJ, USA. IEEE Press, pp 38–42

  92. Wightman D, Ye Z, Brandt J, Vertegaal R (2012) SnipMatch: using source code context to enhance snippet retrieval and parameterization. In: Proceedings of the 25th annual ACM symposium on user interface software and technology (UIST’12), New York, NY, USA. ACM, pp 219–228

  93. Wei Y, Chandrasekaran N, Gulwani S, Hamadi Y (2015) Building Bing developer assistant. Technical report MSR-TR-2015-36, Microsoft Research

  94. Kagdi H, Collard ML, Maletic JI (2007) A survey and taxonomy of approaches for mining software repositories in the context of software evolution. J Softw Maint Evol 19(2):77–131

  95. Alves TL, Ypma C, Visser J (2010) Deriving metric thresholds from benchmark data. In: Proceedings of the IEEE international conference on software maintenance (ICSM). IEEE, pp 1–10

  96. Ferreira KAM, Bigonha MAS, Bigonha RS, Mendes LFO, Almeida HC (2012) Identifying thresholds for object-oriented software metrics. J Syst Softw 85(2):244–257

  97. Zhong S, Khoshgoftaar TM, Seliya N (2004) Unsupervised learning for expert-based software quality estimation. In: Proceedings of the 8th IEEE international conference on high assurance systems engineering (HASE’04), pp 149–155

  98. Hovemeyer D, Spacco J, Pugh W (2005) Evaluating and tuning a static analysis to find null pointer bugs. SIGSOFT Softw Eng Notes 31(1):13–19

  99. Ayewah N, Hovemeyer D, Morgenthaler JD, Penix J, Pugh W (2008) Using static analysis to find bugs. IEEE Softw 25(5):22–29

  100. Le Goues C, Weimer W (2012) Measuring code quality to improve specification mining. IEEE Trans Softw Eng 38(1):175–190

  101. Washizaki H, Namiki R, Fukuoka T, Harada Y, Watanabe H (2007) A framework for measuring and evaluating program source code quality. In: Proceedings of the 8th international conference on product-focused software process improvement (PROFES). Springer, pp 284–299

  102. Cai T, Lyu MR, Wong K-F, Wong M (2001) ComPARE: a generic quality assessment environment for component-based software systems. In: Proceedings of the 2001 international symposium on information systems and engineering (ISE’2001)

Author information

Correspondence to Themistoklis Diamantopoulos.

Copyright information

© 2020 Springer Nature Switzerland AG

About this chapter

Cite this chapter

Diamantopoulos, T., Symeonidis, A.L. (2020). Theoretical Background and State-of-the-Art. In: Mining Software Engineering Data for Software Reuse. Advanced Information and Knowledge Processing. Springer, Cham. https://doi.org/10.1007/978-3-030-30106-4_2

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-30105-7

  • Online ISBN: 978-3-030-30106-4
