Evaluating the impact of filter-based feature selection in intrusion detection systems

  • Regular Contribution
International Journal of Information Security

Abstract

High dimensionality can lead to overfitting and degrade the modeling power of classification algorithms, resulting in an increase in the false positive rate (FPR) and false negative rate (FNR). Feature selection is therefore a critical issue that must be addressed with efficient techniques when developing Intrusion Detection Systems. This study assesses and compares the impact of five univariate filters (ReliefF, Pearson Correlation, Mutual Information, ANOVA, and Chi2) with different selection thresholds, and three multivariate filters (Correlation-based feature subset selection, Double Input Symmetric Relevance (DISR), and Consistency-based subset selection (CON)), on the performance of four classifiers (Multilayer Perceptron, Support Vector Machines, XGBoost and Random Forest) over the CIC-IDS2017, CSE-CIC-IDS2018 and CIC-ToN-IoT intrusion detection datasets. We evaluate 228 classifier variants to determine the features that positively impact classification efficiency across all the cyber-attack scenarios used. The results show that XGBoost and Random Forest trained with multivariate methods, such as CON and DISR, can effectively reduce the number of features without affecting classification performance or detection rate, compared to other filtering methods.
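To make the idea of a univariate filter with a selection threshold concrete, the following is a minimal sketch, not the authors' implementation. It ranks features by the absolute Pearson correlation with the class label, keeps a top fraction, and computes FPR and FNR from binary predictions. The function names (`pearson`, `select_top_fraction`, `fpr_fnr`), the toy data, and the 50% threshold are all assumptions made for illustration; the abstract does not state which thresholds the study uses.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0  # a constant column carries no linear signal
    return cov / math.sqrt(vx * vy)


def select_top_fraction(X, y, fraction):
    """Score each feature by |r| with the label and keep the top fraction.

    `fraction` is a hypothetical selection threshold (e.g. 0.5 keeps half).
    Returns the kept feature indices (sorted) and the per-feature scores.
    """
    n_features = len(X[0])
    scores = [abs(pearson([row[j] for row in X], y)) for j in range(n_features)]
    k = max(1, math.ceil(fraction * n_features))
    ranked = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:k]), scores


def fpr_fnr(y_true, y_pred):
    """False positive rate FP/(FP+TN) and false negative rate FN/(FN+TP)."""
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    return fp / (fp + tn), fn / (fn + tp)


# Toy data (illustrative only): label 0 = benign, 1 = attack.
y = [0, 0, 0, 0, 1, 1, 1, 1]
X = [
    # f0 tracks the label, f1 is uncorrelated, f2 is constant, f3 mirrors the label
    [0, 0, 5, 1],
    [0, 1, 5, 1],
    [0, 0, 5, 1],
    [0, 1, 5, 1],
    [2, 0, 5, 0],
    [2, 1, 5, 0],
    [2, 0, 5, 0],
    [2, 1, 5, 0],
]

kept, scores = select_top_fraction(X, y, fraction=0.5)
fpr, fnr = fpr_fnr(y, [0, 1, 0, 0, 1, 1, 0, 1])
print("kept features:", kept)
print("FPR:", fpr, "FNR:", fnr)
```

A univariate filter of this kind scores each feature independently, so it cannot detect redundancy between the two informative columns above; that limitation is what multivariate filters are meant to address.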

Data Availability

This article uses public datasets, which are available online at https://www.unb.ca/cic/datasets/index.html and https://staff.itee.uq.edu.au/marius/NIDS_datasets/.

Author information

Contributions

HZ and AI designed the experiments to empirically evaluate the impacts of feature selection in IDS. HZ implemented all the experiments. HZ and AI analysed and interpreted the results. All the authors prepared, reviewed and edited the paper.

Corresponding author

Correspondence to Ali Idri.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Ethical approval

This article does not contain any studies with human participants or animals performed by any of the authors.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix

See Tables 15, 16 and 17.

Table 15 Description list of features designed by CICFlowMeter tool
Table 16 Selected features in each dataset using univariate filters (each number refers to a feature ID)
Table 17 Selected features in each dataset using multivariate filters (each number refers to a feature ID)

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Zouhri, H., Idri, A. & Ratnani, A. Evaluating the impact of filter-based feature selection in intrusion detection systems. Int. J. Inf. Secur. 23, 759–785 (2024). https://doi.org/10.1007/s10207-023-00767-y

  • DOI: https://doi.org/10.1007/s10207-023-00767-y
