Abstract
This chapter provides an overview of the background knowledge that is relevant to the main areas of application of this book. The areas of software engineering, software reuse, and software quality are discussed in the context of taking advantage of useful data in order to improve the software development process. After providing the relevant definitions and outlining the data and metrics produced as part of software development, we discuss how data mining techniques can be applied to software engineering data and illustrate, in an integrated manner, the reuse potential that these data offer.
Notes
- 1.
According to Sommerville [1], the term was first proposed in 1968 at a conference held by NATO. Nowadays, we may use the definition by IEEE [2], which describes software engineering as “the application of a systematic, disciplined, quantifiable approach to the development, operation, and maintenance of software; that is, the application of engineering to software”.
- 2.
In the context of this book, software reuse can be defined as “the systematic application of existing software artifacts during the process of building a new software system, or the physical incorporation of existing software artifacts in a new software system” [5]. Note that the term “artifact” corresponds to any piece of software engineering-related data and covers not only source code components, but also software requirements, documentation, etc.
- 3.
A formalized alternative is that of reuse-oriented software engineering, which consists of defining requirements, determining whether some can be covered by reusing existing components, and finally designing and developing the system with reuse in mind (i.e., including designing component inputs/outputs and integrating them). However, this is a development style already followed by several development teams as part of their process, regardless of its traditional or modern nature. As a result, we argue that there is no need to force development teams to use a different methodology, and we do not focus on the reuse-oriented perspective as a formalized way of developing software. Instead, we present here a set of methodologies and indicate the potential of using software engineering data in order to actually incorporate reuse as a philosophy of writing software.
- 4.
There are several open data repositories, one of the most prominent being the EU Open Data Portal (https://data.europa.eu/euodp/en/home), which contains more than 14000 public datasets provided by different publishers.
- 5.
Note that we focus here mainly on software engineering data that are available online. There are also other resources worth mining, which form what we may call the landscape of information that a developer is faced with when joining a project, a term introduced in [15]. Specifically, apart from the source code of the project, the developer has to become familiar with the architecture of the project, the software process followed, the dependent APIs, the development environment, etc.
- 6.
- 7.
- 8.
- 9.
- 10.
- 11.
- 12.
- 13.
- 14.
- 15.
- 16.
- 17.
- 18.
- 19.
- 20.
- 21.
- 22.
- 23.
- 24.
- 25.
- 26.
- 27.
- 28.
- 29.
Indicatively, we may refer to an excerpt of the rather abstract “definition” provided by Robert Pirsig [22]: “Quality... you know what it is, yet you don’t know what it is”.
- 30.
The SWEBOK, short for Software Engineering Body of Knowledge, is an international standard (ISO/IEC TR 19759) that has been created by the cooperative efforts of software engineering professionals and is published by the IEEE Computer Society. The standard aspires to summarize the basic knowledge of software engineering and to include reference lists for its different concepts.
- 31.
A rather important note is that any metric should be used only when it fits the context and when its value is indeed meaningful. A popular quote, often attributed to Microsoft co-founder and former chairman Bill Gates, states that “Measuring programming progress by lines of code is like measuring aircraft building progress by weight”.
- 32.
This original definition of LCOM has been considered rather hard to compute, as its maximum value depends on the number of method pairs. Another computation formula was proposed later by Henderson-Sellers [36], usually referred to as LCOM3 (with the original metric then referred to as LCOM1), which defines LCOM as \((M - A / V) / (M - 1)\), where M is the number of methods and A is the total number of accesses to the V class variables.
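The formula in this note can be sketched in a few lines of Python. This is only an illustration under stated assumptions: A is read as the sum, over the V class variables, of the number of methods that access each variable, and the function name `lcom3`, the input layout, and the example class are hypothetical and not taken from [36].

```python
from typing import Dict, Set


def lcom3(method_accesses: Dict[str, Set[str]], variables: Set[str]) -> float:
    """Compute LCOM as (M - A / V) / (M - 1).

    method_accesses maps each method name to the set of class variables it uses
    (an assumed input layout, chosen for illustration).
    """
    m = len(method_accesses)  # M: number of methods
    v = len(variables)        # V: number of class variables
    if m < 2 or v == 0:
        raise ValueError("undefined for fewer than two methods or no variables")
    # A: for every variable, count the methods that access it, then sum the counts.
    a = sum(
        sum(1 for used in method_accesses.values() if var in used)
        for var in variables
    )
    return (m - a / v) / (m - 1)


# Hypothetical class with three methods and two variables.
example = {
    "deposit": {"balance"},
    "withdraw": {"balance", "limit"},
    "set_limit": {"limit"},
}
print(lcom3(example, {"balance", "limit"}))  # 0.5
```

Under this reading, a class in which every method accesses every variable yields 0, while a class in which each variable is accessed by exactly one method yields 1.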
- 33.
The interested reader is further referred to [39] for a comprehensive review of object-oriented static analysis metrics.
- 34.
There are several other metrics in this field; for an outline of popular metrics, the reader is referred to [40].
- 35.
On the other hand, static analysis is usually focused on the characteristics of maintainability and usability.
- 36.
MTTF is actually the reciprocal of ROCOF.
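As a small worked illustration of this reciprocal relation (the numbers are made up, and a constant rate of occurrence of failures is assumed):

\[
\mathrm{MTTF} = \frac{1}{\mathrm{ROCOF}}, \qquad \mathrm{ROCOF} = 0.002\ \text{failures per hour} \;\Rightarrow\; \mathrm{MTTF} = \frac{1}{0.002} = 500\ \text{hours}.
\]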
- 37.
- 38.
There are of course other more focused interpretations, such as the one provided in [50], which defines testing as “the process of executing a program with the intent of finding errors”. As times change, however, testing has grown to be a major process that can add actual value to the software, and thus it is not limited to error discovery. An interesting view on this matter is provided by Beizer [51], who claims that different development teams have different levels of maturity concerning the purpose of testing, with some treating it as equivalent to debugging, others viewing it as a way to show that the software works, others using it to reduce product quality risks, etc.
- 39.
An alternative definition provided by Hand et al. [58] involves the notions of “finding unsuspected relationships” and “summarizing data in novel ways”. These phrases indicate the real challenges that differentiate data mining from traditional data analysis.
- 40.
Another possible distinction is that of the type of the input data used, which, according to Xie et al. [68], falls into three large categories: sequences (e.g., execution traces), graphs (e.g., call graphs), and text (e.g., documentation).
- 41.
We analyze here certain popular directions; however, we obviously do not exhaust all of them. An indicative list of applications of source code analysis for the interested reader can be found in [77].
References
Sommerville I (2010) Software engineering, 9th edn. Addison-Wesley, Harlow
IEEE (1993) IEEE standards collection: software engineering, IEEE standard 610.12-1990. Technical report, IEEE
Pressman R, Maxim B (2019) Software engineering: a practitioner’s approach, 9th edn. McGraw-Hill Inc, New York
Pfleeger SL, Atlee JM (2009) Software engineering: theory and practice, 4th edn. Pearson, London
Dusink L, van Katwijk J (1995) Reuse dimensions. SIGSOFT Softw Eng Notes, 20(SI):137–149
Sametinger J (1997) Software engineering with reusable components. Springer-Verlag New York Inc, New York
Krueger CW (1992) Software reuse. ACM Comput Surv 24(2):131–183
Capilla R, Gallina B, Cetina C, Favaro J (2019) Opportunities for software reuse in an uncertain world: from past to emerging trends. J Softw: Evol Process 31(8):e2217
Arango GF (1988) Domain engineering for software reuse. PhD thesis, University of California, Irvine. AAI8827979
Prieto-Diaz R, Arango G (1991) Domain analysis and software systems modeling. IEEE Computer Society Press, Washington
Frakes W, Prieto-Diaz R, Fox C (1998) DARE: domain analysis and reuse environment. Ann Softw Eng 5:125–141
Czarnecki K, Østerbye K, Völter M (2002) Generative programming. In: Object-oriented technology ECOOP 2002 workshop reader. Springer, Berlin, Heidelberg, pp 15–29
Sillitti A, Vernazza T, Succi G (2002) Service oriented programming: a new paradigm of software reuse. In: Proceedings of the 7th international conference on software reuse: methods, techniques, and tools (ICSR-7), Berlin, Heidelberg. Springer, pp 269–280
Robillard MP, Maalej W, Walker RJ, Zimmermann T (2014) Recommendation systems in software engineering. Springer Publishing Company, Incorporated, Berlin
Dagenais B, Ossher H, Bellamy RKE, Robillard MP, de Vries JP (2010) Moving into a new software project landscape. In: Proceedings of the 32nd ACM/IEEE international conference on software engineering - Volume 1 (ICSE’10), New York, NY, USA. ACM, pp 275–284
Sim SE, Gallardo-Valencia RE (2015) Finding source code on the web for remix and reuse. Springer Publishing Company, Incorporated, Berlin
Brandt J, Guo PJ, Lewenstein J, Dontcheva M, Klemmer SR (2009) Two studies of opportunistic programming: interleaving web foraging, learning, and writing code. In: Proceedings of the SIGCHI conference on human factors in computing systems (CHI’09), New York, NY, USA. ACM, pp 1589–1598
Li H, Xing Z, Peng X, Zhao W (2013) What help do developers seek, when and how? In: 2013 20th working conference on reverse engineering (WCRE), pp 142–151
Atwood J (2008) Introducing stackoverflow.com. https://blog.codinghorror.com/introducing-stackoverflow-com/. Accessed Nov 2017
Franck S (2008) None of us is as dumb as all of us. https://blog.codinghorror.com/stack-overflow-none-of-us-is-as-dumb-as-all-of-us/. Accessed Nov 2017
Garvin DA (1984) What does ‘product quality’ really mean? MIT Sloan Manag Rev 26(1)
Pirsig RM (1974) Zen and the art of motorcycle maintenance: an inquiry into values. HarperTorch, New York
Suryn W (2014) Software quality engineering: a practitioner’s approach. Wiley-IEEE Press, Hoboken
Pfleeger SL, Kitchenham B (1996) Software quality: the elusive target. IEEE Softw:12–21
Mccall JA, Richards PK, Walters GF (1977) Factors in software quality. Volume I. Concepts and definitions of software quality. Technical report ADA049014, General Electric Co, Sunnyvale, CA
Mccall JA, Richards PK, Walters GF (1977) Factors in software quality. Volume II. Metric data collection and validation. Technical report ADA049014, General Electric Co, Sunnyvale, CA
Mccall JA, Richards PK, Walters GF (1977) Factors in software quality. Volume III. Preliminary handbook on software quality for an acquisition manager. Technical report ADA049014, General Electric Co, Sunnyvale, CA
ISO/IEC (1991) ISO/IEC 9126:1991. Technical report, ISO/IEC
ISO/IEC 25010:2011 (2011) https://www.iso.org/standard/35733.html. Accessed Nov 2017
Kan SH (2002) Metrics and models in software quality engineering, 2nd edn. Addison-Wesley Longman Publishing Co, Inc, Boston
Fenton N, Bieman J (2014) Software metrics: a rigorous and practical approach, 3rd edn. CRC Press Inc, Boca Raton
Bourque P, Fairley RE (eds) (2014) SWEBOK: guide to the software engineering body of knowledge, version 3.0 edition. IEEE Computer Society, Los Alamitos, CA
McCabe TJ (1976) A complexity measure. In: Proceedings of the 2nd international conference on software engineering (ICSE’76), Los Alamitos, CA, USA. IEEE Computer Society Press, p 407
Halstead MH (1977) Elements of software science (operating and programming systems series). Elsevier Science Inc, New York
Chidamber SR, Kemerer CF (1994) A metrics suite for object oriented design. IEEE Trans Softw Eng 20(6):476–493
Henderson-Sellers B (1996) Object-oriented metrics: measures of complexity. Prentice-Hall Inc, Upper Saddle River
e Abreu FB, Carapuça R (1994) Candidate metrics for object-oriented software within a taxonomy framework. J Syst Softw 26(1):87–96
Lorenz M, Kidd J (1994) Object-oriented software metrics: a practical guide. Prentice-Hall Inc, Upper Saddle River
Srinivasan KP, Devi T (2014) A comprehensive review and analysis on object-oriented software metrics in software measurement. Int J Comput Sci Eng 6(7):247
Chhabra JK, Gupta V (2010) A survey of dynamic software metrics. J Comput Sci Technol 25(5):1016–1029
Yacoub SM, Ammar HH, Robinson T (1999) Dynamic metrics for object oriented designs. In: Proceedings of the 6th international symposium on software metrics (METRICS’99), Washington, DC, USA. IEEE Computer Society, p 50
Arisholm E, Briand LC, Foyen A (2004) Dynamic coupling measurement for object-oriented software. IEEE Trans Softw Eng 30(8):491–506
Schneidewind N (2009) Systems and software engineering with applications. Institute of Electrical and Electronics Engineers, New York
Albrecht AJ (1979) Measuring application development productivity. In: I. B. M. Press (ed) Proceedings of the IBM application development symposium, pp 83–92
Jones TC (1998) Estimating software costs. McGraw-Hill, Inc, Hightstown
Moser R, Pedrycz W, Succi G (2008) A comparative analysis of the efficiency of change metrics and static code attributes for defect prediction. In: Proceedings of the 30th international conference on software engineering (ICSE’08), New York, NY, USA. ACM, pp 181–190
Hassan AE (2009) Predicting faults using the complexity of code changes. In: Proceedings of the 31st international conference on software engineering (ICSE’09), Washington, DC, USA. IEEE Computer Society, pp 78–88
Spinellis D (2006) Code quality: the open source perspective. Effective software development series. Addison-Wesley Professional, Boston
Boehm BW (1981) Software engineering economics, 1st edn. Prentice Hall PTR, Upper Saddle River
Myers GJ, Sandler C, Badgett T (2011) The art of software testing, 3rd edn. Wiley Publishing, Hoboken
Beizer B (1990) Software testing techniques, 2nd edn. Van Nostrand Reinhold Co, New York
Copeland L (2003) A practitioner’s guide to software test design. Artech House Inc, Norwood
Beck K (2002) Test driven development: by example. Addison-Wesley Longman Publishing Co, Inc, Boston
Beck K (2000) Extreme programming explained: embrace change. Addison-Wesley Longman Publishing Co, Inc, Boston
McConnell S (2004) Code complete, 2nd edn. Microsoft Press, Redmond
Witten IH, Frank E, Hall MA (2011) Data mining: practical machine learning tools and techniques, 3rd edn. Morgan Kaufmann Publishers Inc, San Francisco
Tan P-N, Steinbach M, Kumar V (2005) Introduction to data mining, 1st edn. Addison-Wesley Longman Publishing Co, Inc, Boston
Hand DJ, Smyth P, Mannila H (2001) Principles of data mining. MIT Press, Cambridge
Mitchell TM (1997) Machine learning, 1st edn. McGraw-Hill Inc, New York
Bishop CM (2006) Pattern recognition and machine learning. Springer-Verlag New York Inc, Secaucus
Manning CD, Raghavan P, Schütze H (2008) Introduction to information retrieval. Cambridge University Press, New York
Engelbrecht AP (2007) Computational intelligence: an introduction, 2nd edn. Wiley Publishing, Hoboken
Han J, Pei J, Kamber M (2011) Data mining: concepts and techniques. Morgan Kaufmann Publishers Inc, San Francisco
Delen D, Demirkan H (2013) Data, information and analytics as services. Decis Support Syst 55(1):359–363
Hastie T, Tibshirani R, Friedman J (2001) The elements of statistical learning. Springer series in statistics. Springer New York Inc, New York
Webb AR (2002) Statistical pattern recognition, 2nd edn. Wiley, Hoboken
Urbanowicz RJ, Browne WN (2017) Introduction to learning classifier systems. Springerbriefs in intelligent systems. Springer New York Inc, New York
Xie T, Thummalapenta S, Lo D, Liu C (2009) Data mining for software engineering. Computer 42(8):55–62
Monperrus M (2013) Data-mining for software engineering. https://www.monperrus.net/martin/data-mining-software-engineering. Accessed Nov 2017
Halkidi M, Spinellis D, Tsatsaronis G, Vazirgiannis M (2011) Data mining in software engineering. Intell Data Anal 15(3):413–441
Kumar M, Ajmeri N, Ghaisas S (2010) Towards knowledge assisted agile requirements evolution. In: Proceedings of the 2nd international workshop on recommendation systems for software engineering (RSSE’10), New York, NY, USA. ACM, pp 16–20
Ghaisas S, Ajmeri N (2013) Knowledge-assisted ontology-based requirements evolution. In: Maalej W, Thurimella AK (eds) Managing requirements knowledge, pp 143–167. Springer, Berlin
Chen K, Zhang W, Zhao H, Mei H (2005) An approach to constructing feature models based on requirements clustering. In: Proceedings of the 13th IEEE international conference on requirements engineering (RE’05), Washington, DC, USA. IEEE Computer Society, pp 31–40
Alves V, Schwanninger C, Barbosa L, Rashid A, Sawyer P, Rayson P, Pohl C, Rummler A (2008) An exploratory study of information retrieval techniques in domain analysis. In: Proceedings of the 2008 12th international software product line conference (SPLC’08), Washington, DC, USA. IEEE Computer Society, pp 67–76
Felfernig A, Schubert M, Mandl M, Ricci F, Maalej W (2010) Recommendation and decision technologies for requirements engineering. In: Proceedings of the 2nd international workshop on recommendation systems for software engineering (RSSE’10), New York, NY, USA. ACM, pp 11–15
Maalej W, Thurimella AK (2009) Towards a research agenda for recommendation systems in requirements engineering. In: Proceedings of the 2009 2nd international workshop on managing requirements knowledge (MARK’09), Washington, DC, USA. IEEE Computer Society, pp 32–39
Binkley D (2007) Source code analysis: a road map. In: 2007 Future of software engineering (FOSE’07), Washington, DC, USA. IEEE Computer Society, pp 104–119
Janjic W, Hummel O, Schumacher M, Atkinson C (2013) An unabridged source code dataset for research in software reuse. In: Proceedings of the 10th working conference on mining software repositories (MSR’13), Piscataway, NJ, USA. IEEE Press, pp 339–342
Bajracharya S, Ngo T, Linstead E, Dou Y, Rigor P, Baldi P, Lopes C (2006) Sourcerer: a search engine for open source code supporting structure-based search. In: Companion to the 21st ACM SIGPLAN symposium on object-oriented programming systems, languages, and applications (OOPSLA’06), New York, NY, USA. ACM, pp 681–682
Linstead E, Bajracharya S, Ngo T, Rigor P, Lopes C, Baldi P (2009) Sourcerer: mining and searching internet-scale software repositories. Data Min Knowl Discov 18(2):300–336
Holmes R, Murphy GC (2005) Using structural context to recommend source code examples. In: Proceedings of the 27th international conference on software engineering (ICSE’05), New York, NY, USA. ACM, pp 117–125
Mandelin D, Lin X, Bodík R, Kimelman D (2005) Jungloid mining: helping to navigate the API jungle. SIGPLAN Not 40(6):48–61
Thummalapenta S, Xie T (2007) PARSEWeb: a programmer assistant for reusing open source code on the web. In: Proceedings of the 22nd IEEE/ACM international conference on automated software engineering (ASE’07), New York, NY, USA. ACM, pp 204–213
Hummel O, Janjic W, Atkinson C (2008) Code conjurer: pulling reusable software out of thin air. IEEE Softw 25(5):45–52
Lazzarini Lemos OA, Bajracharya SK, Ossher J (2007) CodeGenie: a tool for test-driven source code search. In: Companion to the 22nd ACM SIGPLAN conference on object-oriented programming systems and applications companion (OOPSLA’07), New York, NY, USA. ACM, pp 917–918
Reiss SP (2009) Semantics-based code search. In: Proceedings of the 31st international conference on software engineering (ICSE’09), Washington, DC, USA. IEEE Computer Society, pp 243–253
Hummel O, Atkinson C (2004) Extreme harvesting: test driven discovery and reuse of software components. In: Proceedings of the 2004 IEEE international conference on information reuse and integration (IRI 2004), pp 66–72
Nurolahzade M, Walker RJ, Maurer F (2013) An assessment of test-driven reuse: promises and pitfalls. In: Favaro J, Morisio M (eds) Safe and secure software reuse. Lecture notes in computer science, vol 7925. Springer, Berlin, pp 65–80
Xie T, Pei J (2006) MAPO: mining API usages from open source repositories. In: Proceedings of the 2006 international workshop on mining software repositories (MSR’06), New York, NY, USA. ACM, pp 54–57
Sahavechaphan N, Claypool K (2006) XSnippet: mining for sample code. SIGPLAN Not 41(10):413–430
Zagalsky A, Barzilay O, Yehudai A (2012) Example overflow: using social media for code recommendation. In: Proceedings of the 3rd international workshop on recommendation systems for software engineering (RSSE’12), Piscataway, NJ, USA. IEEE Press, pp 38–42
Wightman D, Ye Z, Brandt J, Vertegaal R (2012) SnipMatch: using source code context to enhance snippet retrieval and parameterization. In: Proceedings of the 25th annual ACM symposium on user interface software and technology (UIST’12), New York, NY, USA. ACM, pp 219–228
Wei Y, Chandrasekaran N, Gulwani S, Hamadi Y (2015) Building bing developer assistant. Technical report MSR-TR-2015-36, Microsoft Research
Kagdi H, Collard ML, Maletic JI (2007) A survey and taxonomy of approaches for mining software repositories in the context of software evolution. J Softw Maint Evol 19(2):77–131
Alves TL, Ypma C, Visser J (2010) Deriving metric thresholds from benchmark data. In: Proceedings of the IEEE international conference on software maintenance (ICSM). IEEE, pp 1–10
Ferreira KAM, Bigonha MAS, Bigonha RS, Mendes LFO, Almeida HC (2012) Identifying thresholds for object-oriented software metrics. J Syst Softw 85(2):244–257
Zhong S, Khoshgoftaar TM, Seliya N (2004) Unsupervised learning for expert-based software quality estimation. In: Proceedings of the 8th IEEE international conference on high assurance systems engineering (HASE’04), pp 149–155
Hovemeyer D, Spacco J, Pugh W (2005) Evaluating and tuning a static analysis to find null pointer bugs. SIGSOFT Softw Eng Notes 31(1):13–19
Ayewah N, Hovemeyer D, Morgenthaler JD, Penix J, Pugh W (2008) Using static analysis to find bugs. IEEE Softw 25(5):22–29
Le Goues C, Weimer W (2012) Measuring code quality to improve specification mining. IEEE Trans Softw Eng 38(1):175–190
Washizaki H, Namiki R, Fukuoka T, Harada Y, Watanabe H (2007) A framework for measuring and evaluating program source code quality. In: Proceedings of the 8th international conference on product-focused software process improvement (PROFES). Springer, pp 284–299
Cai T, Lyu MR, Wong K-F, Wong M (2001) ComPARE: a generic quality assessment environment for component-based software systems. In: Proceedings of the 2001 international symposium on information systems and engineering (ISE’2001)
© 2020 Springer Nature Switzerland AG
About this chapter
Cite this chapter
Diamantopoulos, T., Symeonidis, A.L. (2020). Theoretical Background and State-of-the-Art. In: Mining Software Engineering Data for Software Reuse. Advanced Information and Knowledge Processing. Springer, Cham. https://doi.org/10.1007/978-3-030-30106-4_2
DOI: https://doi.org/10.1007/978-3-030-30106-4_2
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-30105-7
Online ISBN: 978-3-030-30106-4
eBook Packages: Computer Science, Computer Science (R0)