Improved binary similarity measures for software modularization

Naseem, Rashid; Deris, Mustafa Bin Mat; Maqbool, Onaiza; Li, Jing-peng; Shahzad, Sara; Shah, Habib

doi:10.1631/FITEE.1500373

Published: 22 September 2017

Volume 18, pages 1082–1107, (2017)

Frontiers of Information Technology & Electronic Engineering

Rashid Naseem ORCID: orcid.org/0000-0002-4952-8100¹,
Mustafa Bin Mat Deris¹,
Onaiza Maqbool²,
Jing-peng Li³,
Sara Shahzad⁴ &
Habib Shah⁵

122 Accesses
6 Citations
6 Altmetric
2 Mentions

Abstract

Various binary similarity measures have been employed in clustering approaches to form homogeneous groups of similar entities in data. Most of these measures are based only on the presence or absence of features. Binary similarity measures have also been explored with different clustering approaches (e.g., agglomerative hierarchical clustering) for software modularization, to make software systems easier to understand and manage. Each similarity measure has its own strengths and weaknesses, which respectively improve and degrade the clustering results. We highlight the strengths of some well-known existing binary similarity measures for software modularization. Furthermore, building on these existing measures, we introduce several new, improved binary similarity measures. Proofs of correctness, with illustrations, are presented, and a series of experiments evaluates the effectiveness of the new measures.
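As a minimal illustration of what "based on the presence or absence of features" means, the sketch below uses the conventional a/b/c/d match counts from the binary-similarity literature and computes two well-known existing measures, the Jaccard coefficient and the simple matching coefficient. It is not one of the new measures proposed in this article; the function names, the feature interpretation, and the example vectors are hypothetical.

```python
from typing import Sequence, Tuple


def match_counts(x: Sequence[int], y: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (a, b, c, d) for two equal-length binary feature vectors.

    Each position might record, for example, whether an entity (such as a
    class or file) uses a given global variable or calls a given function;
    this feature interpretation is a hypothetical example.
      a: feature present in both entities
      b: feature present in x only
      c: feature present in y only
      d: feature absent from both
    """
    if len(x) != len(y):
        raise ValueError("feature vectors must have the same length")
    a = b = c = d = 0
    for xi, yi in zip(x, y):
        if xi and yi:
            a += 1
        elif xi:
            b += 1
        elif yi:
            c += 1
        else:
            d += 1
    return a, b, c, d


def jaccard(x: Sequence[int], y: Sequence[int]) -> float:
    """Jaccard coefficient a / (a + b + c); joint absences (d) are ignored."""
    a, b, c, _ = match_counts(x, y)
    denom = a + b + c
    # Convention chosen here (an assumption): two all-absent vectors score 0.0.
    return a / denom if denom else 0.0


def simple_matching(x: Sequence[int], y: Sequence[int]) -> float:
    """Simple matching coefficient (a + d) / n; joint absences count as agreement."""
    a, b, c, d = match_counts(x, y)
    n = a + b + c + d
    return (a + d) / n if n else 0.0


if __name__ == "__main__":
    x = [1, 1, 0, 0, 1, 0]
    y = [1, 0, 0, 0, 1, 1]
    print(match_counts(x, y), jaccard(x, y), simple_matching(x, y))
```

For these example vectors the counts are (a, b, c, d) = (2, 1, 1, 2), so `jaccard(x, y)` returns 0.5 while `simple_matching(x, y)` returns about 0.667. The difference comes entirely from d: Jaccard ignores joint absences, whereas simple matching treats them as agreement. A commonly given reason to prefer measures that ignore d in software clustering is that feature vectors tend to be sparse, so shared absences say little about whether two entities belong together.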

Similar content being viewed by others

A New Binary Similarity Measure Based on Integration of the Strengths of Existing Measures: Application to Software Clustering

Evaluating the Effectiveness of Multi-level Greedy Modularity Clustering for Software Architecture Recovery

Weighing lexical information for software clustering in the context of architecture recovery

Article 21 March 2015

Author information

Authors and Affiliations

Faculty of Computer Science and Information Technology, Universiti Tun Hussein Onn Malaysia, Parit Raja, 86400, Malaysia
Rashid Naseem & Mustafa Bin Mat Deris
Department of Computer Science, Quaid-i-Azam University, Islamabad, 45320, Pakistan
Onaiza Maqbool
Division of Computer Science and Mathematics, University of Stirling, Stirling, FK9 4LA, UK
Jing-peng Li
Department of Computer Science, University of Peshawar, Peshawar, 25120, Pakistan
Sara Shahzad
Faculty of Computer and Information Systems, Islamic University Madina, Madina, PO Box 170, Kingdom of Saudi Arabia
Habib Shah

Corresponding author

Correspondence to Rashid Naseem.

Additional information

Project supported by the Office of Research, Innovation, Commercialization and Consultancy (ORICC), Universiti Tun Hussein Onn Malaysia (UTHM), Malaysia (No. U063)

About this article

Cite this article

Naseem, R., Deris, M.B.M., Maqbool, O. et al. Improved binary similarity measures for software modularization. Frontiers Inf Technol Electronic Eng 18, 1082–1107 (2017). https://doi.org/10.1631/FITEE.1500373

Received: 30 October 2015
Accepted: 12 April 2016
Published: 22 September 2017
Issue Date: August 2017
DOI: https://doi.org/10.1631/FITEE.1500373

CLC number

TP311
