Abstract
Bug prediction is an approach that helps automate bug detection during software development: a prediction model is built on a bug dataset and used to locate future bugs. Bug datasets contain information about previous defects in the code, such as process metrics or source code metrics. Since code smells can indicate potential flaws in the source code, they can be used for bug prediction as well.
In our previous work, we introduced several source code metrics to detect and describe the occurrence of Primitive Obsession in Java. This paper is a further study of three of the Primitive Obsession metrics. We integrated them into an existing, source code metrics-based bug dataset and studied the effectiveness of the prediction models built upon it. We performed a 10-fold cross-validation on the whole dataset as well as a cross-project validation, and compared the new models with the results of the original dataset. While the cross-validation showed no significant change, in the case of the cross-project validation we found that the amount of improvement exceeded the amount of deterioration by 5%. Furthermore, the variance the new metrics added to the dataset was confirmed by correlation and PCA calculations.
This work was partially supported by grant 2018-1.2.1-NKP-2018-00004 “Security Enhancing Technologies for the IoT” funded by the Hungarian National Research, Development and Innovation Office. It was also supported by the European Union, co-financed by the European Social Fund (EFOP-3.6.3-VEKOP-16-2017-00002). The research was supported by the Ministry of Innovation and Technology NRDI Office within the framework of the Artificial Intelligence National Laboratory Program (MILAB).
Notes
- 1.
- 2.
The precision can be calculated as the number of true positive elements divided by the sum of the true and false positive elements. It describes what portion of the identifications was actually correct.
- 3.
Recall can be calculated as the number of true positive elements divided by the sum of the true positive and false negative elements. It describes what portion of the actually relevant elements was identified correctly.
- 4.
F-measure is the harmonic mean of precision and recall.
- 5.
The ROC curve displays the performance of a classification model at all classification thresholds. The area under the ROC curve (AUC) is an aggregated measure of the model's performance across all classification thresholds.
- 6.
Weka calculates the Weighted Avg. F-Measure for classes n and y with the following formula: \(Weighted\ Avg.\ F\text{-}measure = \frac{F\text{-}measure(n) \cdot NumOfInstances(n) + F\text{-}measure(y) \cdot NumOfInstances(y)}{NumOfInstances(n) + NumOfInstances(y)}\), where NumOfInstances(n) and NumOfInstances(y) denote the number of instances in the given class.
- 7.
Correlation shows the degree to which a pair of variables is linearly related. We used Pearson correlation for our experiment.
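As an illustration, the precision, recall, F-measure, and weighted-average F-measure definitions in the notes above can be sketched in Python. The confusion-matrix counts and class sizes below are hypothetical, not values from the paper:

```python
# Illustrative sketch of the evaluation measures described in the notes.

def precision(tp, fp):
    # Portion of the positive identifications that was actually correct.
    return tp / (tp + fp)

def recall(tp, fn):
    # Portion of the actually relevant elements that was identified.
    return tp / (tp + fn)

def f_measure(p, r):
    # Harmonic mean of precision and recall.
    return 2 * p * r / (p + r)

def weighted_avg_f_measure(f_n, num_n, f_y, num_y):
    # Weka-style weighted average over the two classes n and y,
    # weighted by the number of instances in each class.
    return (f_n * num_n + f_y * num_y) / (num_n + num_y)

# Hypothetical counts for the "buggy" class.
tp, fp, fn = 40, 10, 20
p = precision(tp, fp)   # 0.8
r = recall(tp, fn)      # ~0.667
f = f_measure(p, r)     # ~0.727
```

With hypothetical per-class scores, e.g. `weighted_avg_f_measure(0.9, 90, 0.5, 10)` gives 0.86, showing how the larger class dominates the weighted average.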
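The Pearson correlation mentioned in the last note can likewise be sketched. The metric values below are made up for illustration and do not come from the paper's dataset:

```python
import math

def pearson(xs, ys):
    # Pearson correlation coefficient: covariance of the two variables
    # divided by the product of their standard deviations.
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

metric_a = [10, 20, 30, 40]   # hypothetical metric column
metric_b = [1, 2, 3, 4]       # hypothetical, perfectly linear in metric_a
r = pearson(metric_a, metric_b)   # ~1.0: perfect positive correlation
```

A coefficient near 0 would suggest the new metric adds variance not already captured by the existing ones, which is the kind of check the correlation and PCA calculations in the paper perform.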
Copyright information
© 2021 Springer Nature Switzerland AG
Cite this paper
Pengő, E. (2021). Examining the Bug Prediction Capabilities of Primitive Obsession Metrics. In: Gervasi, O., et al. Computational Science and Its Applications – ICCSA 2021. ICCSA 2021. Lecture Notes in Computer Science(), vol 12955. Springer, Cham. https://doi.org/10.1007/978-3-030-87007-2_14
Print ISBN: 978-3-030-87006-5
Online ISBN: 978-3-030-87007-2