A Feature Selection Technique Using Self-Organizing Maps for Software Defect Prediction

Big Data Analytics in Intelligent IoT and Cyber-Physical Systems

Abstract

An important goal of software defect prediction is to increase the efficiency of software development. Higher defect prediction accuracy reduces resource consumption, so effective feature selection techniques are needed to provide better inputs to the classifier. A carefully selected subset of features can not only increase prediction accuracy but also reduce computational cost. Wrapper methods for feature selection evaluate feature subsets against a predetermined criterion. In this paper, we propose a feature selection approach that evaluates feature subsets using a clustering-based method. The proposed method, cost-based feature selection based on self-organizing maps (CFSSOM), has three steps. First, the subsets of the feature set to be considered for evaluation are computed. Second, each of those feature subsets is clustered into two clusters using self-organizing maps, an ANN-based learning algorithm. Third, the two clusters are labeled based on the data representation, and the strength of their relationship to the original labels is measured. We implemented the technique, and experiments on datasets from the PROMISE repository show that it improves prediction results.
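The three steps described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the chapter's implementation: the function names (`som_two_clusters`, `agreement_with_labels`, `select_features`) are hypothetical, the map has only two nodes with the neighbourhood radius fixed at zero (which reduces to online competitive learning), the cluster with the larger mean metric value is assumed to be the defective one, and plain label agreement stands in for whatever relatedness or cost measure the chapter uses. The toy data is synthetic.

```python
from itertools import combinations

import numpy as np


def som_two_clusters(X, epochs=20, lr0=0.5, seed=0):
    """Cluster the rows of X into two groups with a tiny two-node map."""
    rng = np.random.default_rng(seed)
    # Initialise the two node weight vectors from two distinct samples.
    weights = X[rng.choice(len(X), size=2, replace=False)].astype(float)
    for epoch in range(epochs):
        lr = lr0 * (1.0 - epoch / epochs)  # linearly decaying learning rate
        for i in rng.permutation(len(X)):
            # Best-matching unit: the node closest to the current sample.
            bmu = np.argmin(((weights - X[i]) ** 2).sum(axis=1))
            weights[bmu] += lr * (X[i] - weights[bmu])
    # Final assignment: each sample goes to its nearest node.
    dists = ((X[:, None, :] - weights[None, :, :]) ** 2).sum(axis=2)
    return dists.argmin(axis=1)


def agreement_with_labels(X_sub, clusters, y):
    """Label the two clusters and return the fraction matching y."""
    if clusters.min() == clusters.max():
        return 0.0  # degenerate clustering: only one cluster was used
    # ASSUMPTION: the cluster with larger mean metric values is "defective".
    means = [X_sub[clusters == c].mean() for c in (0, 1)]
    defective = int(means[1] > means[0])
    predicted = (clusters == defective).astype(int)
    return float((predicted == y).mean())


def select_features(X, y, max_size=2):
    """Score every feature subset up to max_size and return the best one."""
    std = X.std(axis=0)
    std[std == 0] = 1.0  # avoid division by zero for constant features
    Xs = (X - X.mean(axis=0)) / std
    best_subset, best_score = None, -1.0
    for k in range(1, max_size + 1):                         # step 1: subsets
        for subset in combinations(range(X.shape[1]), k):
            X_sub = Xs[:, list(subset)]
            clusters = som_two_clusters(X_sub)               # step 2: cluster
            score = agreement_with_labels(X_sub, clusters, y)  # step 3: score
            if score > best_score:
                best_subset, best_score = subset, score
    return best_subset, best_score


# Synthetic toy data: feature 0 separates the classes, features 1-2 are noise.
rng = np.random.default_rng(42)
y = np.repeat([0, 1], 100)
X = np.column_stack([8 * y + rng.normal(size=200),
                     rng.normal(size=200),
                     rng.normal(size=200)])
best_subset, best_score = select_features(X, y)
print(best_subset, round(best_score, 3))
```

Exhaustive enumeration is only practical for small feature sets, which is why the sketch caps the subset size with `max_size`; how the chapter restricts the candidate subsets is not stated in the abstract.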

Author information

Corresponding author

Correspondence to Krishna Pal Sharma.

Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this chapter

Cite this chapter

Sharma, K.P., Shivam, Sharma, N., Sharma, R., Mishra, M. (2024). A Feature Selection Technique Using Self-Organizing Maps for Software Defect Prediction. In: Sharma, N., Mangla, M., Shinde, S.K. (eds) Big Data Analytics in Intelligent IoT and Cyber-Physical Systems. Transactions on Computer Systems and Networks. Springer, Singapore. https://doi.org/10.1007/978-981-99-4518-4_10

  • DOI: https://doi.org/10.1007/978-981-99-4518-4_10

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-99-4517-7

  • Online ISBN: 978-981-99-4518-4

  • eBook Packages: Engineering, Engineering (R0)
