Abstract
Cross-project defect prediction (CPDP) uses data from other completed projects (i.e., source projects) to predict defects in the current project. Transfer learning (TL) has been widely applied to CPDP to improve prediction performance by alleviating the data distribution discrepancy between projects. However, existing TL-based CPDP techniques are not applicable at the unit testing phase because they require the entire historical data of the target project; as a result, they miss the chance to improve the product’s reliability in the early phase by acting on the prediction results. The objective of the present study is to improve the product’s reliability in the early phase by proposing a novel TL-based CPDP technique applicable at the unit testing phase (i.e., eCPDP). We use singular value decomposition (SVD), which requires only source project data for TL. eCPDP performs similarly to or better than eight state-of-the-art TL-based CPDP techniques on nine performance metrics over 24 projects. In conclusion, (1) we show that eCPDP is a CPDP model applicable at the unit testing phase, and (2) it can help practitioners find and fix defects in an earlier phase than other TL-based CPDP techniques.
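The key property claimed in the abstract, that the SVD transform can be fitted on source-project data alone, can be sketched as follows. This is a minimal illustration of fitting an SVD subspace on source metrics and projecting target modules into it one by one, not the paper's exact eCPDP pipeline; all variable names and the random data are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
X_source = rng.normal(size=(100, 20))  # source-project metric matrix (modules x metrics)
x_target = rng.normal(size=(5, 20))    # a few target modules arriving during unit testing

# Centre with SOURCE statistics only: no target history is needed up front,
# which is what makes this kind of transform usable at the unit testing phase.
mu = X_source.mean(axis=0)
U, s, Vt = np.linalg.svd(X_source - mu, full_matrices=False)

k = 5  # keep the top-k singular directions (illustrative choice)
Z_source = (X_source - mu) @ Vt[:k].T  # transformed training features for a classifier
z_target = (x_target - mu) @ Vt[:k].T  # target modules projected into the same subspace

print(Z_source.shape, z_target.shape)  # (100, 5) (5, 5)
```

A defect classifier trained on `Z_source` (with source labels) can then score each `z_target` row as soon as the corresponding module is written, without waiting for the target project to accumulate history.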
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs11219-023-09624-6/MediaObjects/11219_2023_9624_Fig1_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs11219-023-09624-6/MediaObjects/11219_2023_9624_Fig2_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs11219-023-09624-6/MediaObjects/11219_2023_9624_Fig3_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs11219-023-09624-6/MediaObjects/11219_2023_9624_Fig4_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs11219-023-09624-6/MediaObjects/11219_2023_9624_Fig5_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs11219-023-09624-6/MediaObjects/11219_2023_9624_Fig6_HTML.png)
Acknowledgements
The authors thank the Editor-in-Chief and the anonymous reviewers for their thoughtful comments and suggestions.
Funding
This research was supported by the National Research Foundation of Korea (NRF-2020R1F1A1071888), the Ministry of Science and ICT (MSIT), Korea, under the Information Technology Research Center (ITRC) support program supervised by the Institute of Information & Communications Technology Planning & Evaluation (IITP-2021-2020-0-01795), and the National Research Foundation of Korea (NRF) funded by the Korean Government through the Ministry of Education under Grant (NRF-2022R1I1A3069233).
Author information
Contributions
All the authors contributed equally to this work.
Ethics declarations
Conflict of interest
The authors declare no competing interests.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Kwon, S., Ryu, D. & Baik, J. An effective approach to improve the performance of eCPDP (early cross-project defect prediction) via data-transformation and parameter optimization. Software Qual J 31, 1009–1044 (2023). https://doi.org/10.1007/s11219-023-09624-6