Abstract
An important goal of software defect prediction is to increase the efficiency of software development. More accurate defect prediction reduces resource consumption, so effective feature selection techniques are needed to provide better inputs to the classifier. A carefully selected subset of features can both increase prediction accuracy and reduce computational cost. Wrapper methods for feature selection evaluate feature subsets against a predetermined criterion. In this paper, we propose an approach to feature selection that evaluates feature subsets using a clustering-based method. The cost-based feature selection method based on self-organizing maps (CFSSOM) consists of three steps. First, the subsets of the feature set that are to be considered for evaluation are computed. Second, the data restricted to each of those feature subsets is clustered into two clusters, using an ANN-based learning algorithm called the self-organizing map. Third, labels are applied to those two clusters based on the data representation, and the strength of their relation to the original labels is measured. We implemented the technique, and experiments on PROMISE repository datasets show that it improves prediction results.
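The three steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the chapter's implementation: the function names (`som_two_clusters`, `subset_score`, `select_features`) are hypothetical, the map is reduced to two nodes with winner-only updates, subsets are enumerated exhaustively up to a small size, and the score is plain agreement with the original labels rather than whatever cost measure CFSSOM actually defines.

```python
import itertools
import numpy as np


def som_two_clusters(X, epochs=20, lr0=0.5, seed=0):
    """Cluster rows of X with a two-node self-organizing map; return 0/1 assignments."""
    rng = np.random.default_rng(seed)
    # Initialise the two node weight vectors from two distinct random samples.
    weights = X[rng.choice(len(X), size=2, replace=False)].astype(float)
    n_steps = epochs * len(X)
    step = 0
    for _ in range(epochs):
        for i in rng.permutation(len(X)):
            lr = lr0 * (1.0 - step / n_steps)  # linearly decaying learning rate
            # Best matching unit: the node whose weights are closest to the sample.
            bmu = np.argmin(np.linalg.norm(weights - X[i], axis=1))
            # Assumption: with only two nodes, only the winner is updated.
            weights[bmu] += lr * (X[i] - weights[bmu])
            step += 1
    dists = np.linalg.norm(X[:, None, :] - weights[None, :, :], axis=2)
    return np.argmin(dists, axis=1)


def subset_score(X, y, features):
    """Agreement between the two-cluster labelling of a feature subset and labels y."""
    clusters = som_two_clusters(X[:, list(features)])
    agreement = np.mean(clusters == y)
    # Cluster ids are arbitrary, so take the better of the two label assignments.
    return max(agreement, 1.0 - agreement)


def select_features(X, y, max_size=2):
    """Score every subset of up to max_size features; return (best_subset, best_score)."""
    best = (None, -1.0)
    for k in range(1, max_size + 1):
        for subset in itertools.combinations(range(X.shape[1]), k):
            score = subset_score(X, y, subset)
            if score > best[1]:
                best = (subset, score)
    return best
```

Here `X` is assumed to be a numeric array of software metrics (one row per module) and `y` a 0/1 array of defect labels. Exhaustive enumeration grows exponentially with the number of features, so a practical version would restrict the candidate subsets in step one.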
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this chapter
Cite this chapter
Sharma, K.P., Shivam, Sharma, N., Sharma, R., Mishra, M. (2024). A Feature Selection Technique Using Self-Organizing Maps for Software Defect Prediction. In: Sharma, N., Mangla, M., Shinde, S.K. (eds) Big Data Analytics in Intelligent IoT and Cyber-Physical Systems. Transactions on Computer Systems and Networks. Springer, Singapore. https://doi.org/10.1007/978-981-99-4518-4_10
DOI: https://doi.org/10.1007/978-981-99-4518-4_10
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-4517-7
Online ISBN: 978-981-99-4518-4
eBook Packages: Engineering, Engineering (R0)