
An effective approach to improve the performance of eCPDP (early cross-project defect prediction) via data-transformation and parameter optimization


Abstract

Cross-project defect prediction (CPDP) uses data from other, finished projects (i.e., source projects) to predict defects in the current working project. Transfer learning (TL) has mainly been applied to CPDP to improve prediction performance by alleviating the data-distribution discrepancy between projects. However, existing TL-based CPDP techniques are not applicable at the unit testing phase because they require the entire historical target-project data. As a result, they miss the chance to increase the product’s reliability in the early phase by applying the prediction results. The objective of the present study is to increase the product’s reliability in the early phase by proposing a novel TL-based CPDP technique applicable at the unit testing phase (i.e., eCPDP). We use singular value decomposition (SVD), which requires only source-project data for TL. eCPDP performs similarly to or better than eight state-of-the-art TL-based CPDP techniques on nine performance metrics over 24 projects. In conclusion, (1) we show that eCPDP is a CPDP model applicable at the unit testing phase, and (2) it can help practitioners find and fix defects earlier than other TL-based CPDP techniques.
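The abstract's central technical claim is that the SVD-based transfer step needs only source-project data, so the transformation can be fixed before any target-project history exists. A minimal sketch of that idea follows; it is not the authors' implementation, and the projection dimension k, the logistic-regression classifier, and the array names (X_src, y_src, X_tgt) are illustrative assumptions.

```python
# Hedged sketch: project source and target metrics onto the top-k right singular
# vectors computed from the SOURCE project alone, so no historical target data is
# needed when the transformation is fit (k and the classifier are assumptions).
import numpy as np
from sklearn.linear_model import LogisticRegression

def fit_svd_projection(X_source, k=10):
    """Learn a k-dimensional projection from the source project only."""
    mean = X_source.mean(axis=0)
    # Economy-size SVD: X_centered = U * S * Vt; rows of Vt span the feature space.
    _, _, Vt = np.linalg.svd(X_source - mean, full_matrices=False)
    return mean, Vt[:k].T                      # (feature mean, p-by-k projection)

def apply_svd_projection(X, mean, components):
    """Map any project's metrics (source or new target modules) into the shared space."""
    return (X - mean) @ components

# Usage under assumed arrays X_src, y_src (finished source project) and X_tgt
# (unlabelled modules of the project currently under unit testing):
# mean, comps = fit_svd_projection(X_src, k=10)
# clf = LogisticRegression(max_iter=1000).fit(apply_svd_projection(X_src, mean, comps), y_src)
# defect_prob = clf.predict_proba(apply_svd_projection(X_tgt, mean, comps))[:, 1]
```

Because the projection is derived entirely from the source side, it can be reused unchanged as new target modules arrive, which is what makes prediction feasible at the unit testing phase.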




Notes

  1. https://github.com/cadet6465/eCPDP

  2. https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.PowerTransformer.html

  3. https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.GridSearchCV.html

  4. https://scikit-learn.org/stable/modules/generated/sklearn.feature_selection.SelectFromModel.html

  5. https://github.com/sherbold/autorank/issues

  6. https://docs.github.com/en/rest
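The title attributes the performance improvement to data transformation and parameter optimization, and Notes 2–4 above link the scikit-learn building blocks involved (PowerTransformer, GridSearchCV, SelectFromModel). The sketch below combines them in one plausible pipeline; the estimators, parameter grid, and scoring metric are assumptions for illustration, not the paper's exact configuration.

```python
# Hedged sketch of a transformation + feature-selection + tuning pipeline built from
# the components cited in Notes 2-4; estimator choices and the grid are assumptions.
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PowerTransformer
from sklearn.feature_selection import SelectFromModel
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

pipeline = Pipeline([
    ("transform", PowerTransformer()),   # Yeo-Johnson transform to reduce skew in metrics
    ("select", SelectFromModel(RandomForestClassifier(n_estimators=100, random_state=0))),
    ("clf", LogisticRegression(max_iter=1000)),
])

param_grid = {                           # illustrative grid, not the paper's settings
    "transform__standardize": [True, False],
    "clf__C": [0.01, 0.1, 1.0, 10.0],
}

# The search is fit on the labelled source project only; the tuned pipeline is then
# applied to the unlabelled target-project modules at the unit testing phase.
search = GridSearchCV(pipeline, param_grid, scoring="roc_auc", cv=5)
# search.fit(X_src, y_src)
# target_scores = search.best_estimator_.predict_proba(X_tgt)[:, 1]
```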


Acknowledgements

The authors thank the Editor-in-Chief and the anonymous reviewers for their thoughtful comments and suggestions.

Funding

This research was supported by the National Research Foundation of Korea (NRF-2020R1F1A1071888), the Ministry of Science and ICT (MSIT), Korea, under the Information Technology Research Center (ITRC) support program supervised by the Institute of Information & Communications Technology Planning & Evaluation (IITP-2021-2020-0-01795), and the National Research Foundation of Korea (NRF) funded by the Korean Government through the Ministry of Education under Grant (NRF-2022R1I1A3069233).

Author information


Contributions

All the authors contributed equally to this work.

Corresponding author

Correspondence to Jongmoon Baik.

Ethics declarations

Conflict of interest

The authors declare no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Kwon, S., Ryu, D. & Baik, J. An effective approach to improve the performance of eCPDP (early cross-project defect prediction) via data-transformation and parameter optimization. Software Qual J 31, 1009–1044 (2023). https://doi.org/10.1007/s11219-023-09624-6

