Improved binary similarity measures for software modularization

Frontiers of Information Technology & Electronic Engineering

Abstract

Various binary similarity measures have been employed in clustering approaches to form homogeneous groups of similar entities in the data. These measures are mostly based only on the presence or absence of features. Binary similarity measures have also been explored with different clustering approaches (e.g., agglomerative hierarchical clustering) for software modularization, to make software systems understandable and manageable. Each similarity measure has its own strengths and weaknesses, which respectively improve and degrade the clustering results. We highlight the strengths of some well-known existing binary similarity measures for software modularization. Furthermore, based on these existing measures, we introduce several improved binary similarity measures. Proofs of correctness with illustrations and a series of experiments are presented to evaluate the effectiveness of the new measures.
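To make the idea of a presence/absence-based measure concrete, the sketch below computes the classic Jaccard coefficient for two binary feature vectors. The abstract does not name the measures studied, so Jaccard is used here only as a familiar illustration and is not one of the improved measures the paper proposes. The entity names and feature vectors are hypothetical.

```python
def jaccard(x, y):
    """Jaccard similarity a / (a + b + c) for two equal-length 0/1 feature vectors.

    a = features present in both entities
    b = features present only in x
    c = features present only in y
    Features absent from both entities are ignored.
    """
    if len(x) != len(y):
        raise ValueError("feature vectors must have the same length")
    a = sum(1 for xi, yi in zip(x, y) if xi and yi)
    b = sum(1 for xi, yi in zip(x, y) if xi and not yi)
    c = sum(1 for xi, yi in zip(x, y) if not xi and yi)
    denom = a + b + c
    # Assumption: two entities with no features at all are treated as dissimilar.
    return a / denom if denom else 0.0


# Hypothetical entities described by five binary features
# (1 = the entity uses the feature, 0 = it does not).
parser = [1, 1, 0, 1, 0]
lexer = [1, 1, 0, 0, 0]
logger = [0, 0, 1, 0, 1]

print(jaccard(parser, lexer))   # shares two of three features present in either
print(jaccard(parser, logger))  # shares no features
```

In an agglomerative hierarchical clustering setting, a function like this would typically be evaluated for every pair of entities, and the most similar pair merged at each step.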

Author information

Corresponding author

Correspondence to Rashid Naseem.

Additional information

Project supported by the Office of Research, Innovation, Commercialization and Consultancy (ORICC), Universiti Tun Hussein Onn Malaysia (UTHM), Malaysia (No. U063).

About this article

Cite this article

Naseem, R., Deris, M.B.M., Maqbool, O. et al. Improved binary similarity measures for software modularization. Frontiers Inf Technol Electronic Eng 18, 1082–1107 (2017). https://doi.org/10.1631/FITEE.1500373

  • DOI: https://doi.org/10.1631/FITEE.1500373
