A distributed platform for intrusion detection system using data stream mining in a big data environment

  • Original Research
Annals of Telecommunications

Abstract

With the growth of computer networks worldwide, the need to protect local networks from malicious data traveling over the network has grown as well. The increase in the volume, speed, and variety of data calls for a more robust and accurate intrusion detection system capable of analyzing huge amounts of data. This work proposes an intrusion detection system that uses stream classifiers and three classification layers, with and without a reduction in the number of features of the records, and three classifiers in parallel with a voting system. The proposed system is validated on two datasets, and its results are compared against other models proposed in the literature. Accuracy improved in all cases, by up to 18.52% on the NSL-KDD dataset and up to 3.55% on the CICIDS2017 dataset. Classification time was also reduced, by up to 35.51% on NSL-KDD and up to 94.90% on CICIDS2017.
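The idea of combining three classifiers through a voting system can be illustrated with a minimal Python sketch. The abstract does not state the voting rule, so simple majority voting with ties resolved in favor of the earliest classifier is an assumption here. The three rule-based classifiers, their thresholds, and the record fields (`duration`, `src_bytes`, `error_rate`) are hypothetical stand-ins, not the stream classifiers used in the paper, and they are called one after another rather than in parallel.

```python
from collections import Counter
from typing import Callable, Hashable, Mapping, Sequence

Record = Mapping[str, float]
Classifier = Callable[[Record], Hashable]


def majority_vote(labels: Sequence[Hashable]) -> Hashable:
    """Return the most frequent label; ties go to the earliest label in the sequence."""
    if not labels:
        raise ValueError("at least one label is required")
    counts = Counter(labels)
    best = max(counts.values())
    # Scan in classifier order so that ties are resolved deterministically.
    for label in labels:
        if counts[label] == best:
            return label
    raise AssertionError("unreachable")


def classify(record: Record, classifiers: Sequence[Classifier]) -> Hashable:
    """Ask every classifier for a label and combine the answers by majority vote."""
    return majority_vote([clf(record) for clf in classifiers])


# Hypothetical toy classifiers (assumed thresholds, for illustration only).
def by_duration(record: Record) -> str:
    return "attack" if record["duration"] > 100 else "normal"


def by_bytes(record: Record) -> str:
    return "attack" if record["src_bytes"] > 10_000 else "normal"


def by_errors(record: Record) -> str:
    return "attack" if record["error_rate"] > 0.5 else "normal"


CLASSIFIERS = (by_duration, by_bytes, by_errors)

if __name__ == "__main__":
    sample = {"duration": 250, "src_bytes": 500, "error_rate": 0.9}
    print(classify(sample, CLASSIFIERS))
```

With three voters and two classes a tie cannot occur, so the tie rule only matters if more classes or an even number of classifiers are used.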

Availability of Data and Materials

The data are available; the author will be pleased to provide any data requested during the publication process. Both datasets, the results obtained, and the manipulated data are in a public repository [42].

References

  1. Symantec (2019) Internet security threat report, vol 24. https://www.symantec.com/content/dam/symantec/docs/reports/istr-24-2019-en.pdf

  2. Crowdstrike (2021) Global threat report. https://go.crowdstrike.com/rs/281-OBQ-266/images/Report2021GTR.pdf

  3. Lopez MA, Lobato AGP, Duarte OCMB, Pujolle G (2018) An evaluation of a virtual network function for real-time threat detection using stream processing. In: 2018 Fourth international conference on mobile and secure services (MobiSecServ), Miami Beach, FL, pp 1–5. https://doi.org/10.1109/MOBISECSERV.2018.8311440

  4. Wang F, Wang H, Xue L (2021) Research on data security in big data cloud computing environment. In: 2021 IEEE 5th advanced information technology, electronic and automation control conference (IAEAC), vol 5, pp 1446–1450. https://doi.org/10.1109/IAEAC50856.2021.9391025

  5. Schuartz FC, Fonseca MSP, Munaretto A (2022) A distributed platform for intrusion detection system using data stream mining in a big data environment. In: 6th Cyber security in networking conference, Rio de Janeiro

  6. Tavallaee M, Bagheri E, Lu W, Ghorbani AA (2009) NSL-KDD. https://www.unb.ca/cic/datasets/nsl.html

  7. Tavallaee M, Bagheri E, Lu W, Ghorbani AA (2009) A detailed analysis of the KDD CUP 99 data set. Submitted to Second IEEE symposium on computational intelligence for security and defense applications (CISDA)

  8. Chae H-S, Jo B-O, Choi S-H, Park T-K (2013) Feature selection for intrusion detection using NSL-KDD. Recent Adv Comput Sci 20132:184–187

  9. Sharafaldin I, Lashkari AH, Ghorbani AA (2017) CICIDS2017. https://www.unb.ca/cic/datasets/ids-2017.html

  10. Shone N, Ngoc TN, Phai VD, Shi Q (2018) A deep learning approach to network intrusion detection. IEEE Trans Emerg Top Comput Intell 2(1):41–50. https://doi.org/10.1109/TETCI.2017.2772792

  11. Wang Z (2018) Deep learning-based intrusion detection with adversaries. IEEE Access 6:38367–38384. https://doi.org/10.1109/ACCESS.2018.2854599

  12. Papamartzivanos D, Mármol FG, Kambourakis G (2019) Introducing deep learning self-adaptive misuse network intrusion detection systems. IEEE Access 7:13546–13560. https://doi.org/10.1109/ACCESS.2019.2893871

  13. Al-Qatf M, Lasheng Y, Al-Habib M, Al-Sabahi K (2018) Deep learning approach combining sparse autoencoder with SVM for network intrusion detection. IEEE Access 6:52843–52856. https://doi.org/10.1109/ACCESS.2018.2869577

  14. Schuartz FC, Fonseca MSP, Munaretto A (2019) Distributed system for threat detection in networks using machine learning. In: 1st Blockchain, robotics and ai for networking security conference - BRAINS 2019

  15. Lopez MA, Mattos DMF, Duarte OCMB, Pujolle G (2019) Toward a monitoring and threat detection system based on stream processing as a virtual network function for big data. Concurr Comput Pract Exp 31(20). https://doi.org/10.1002/cpe.5344

  16. Viegas E, Santin A, Bessani A, Neves N (2019) BigFlow: real-time and reliable anomaly-based intrusion detection for high-speed networks. Future Gener Comput Syst 93:473–485. https://doi.org/10.1016/j.future.2018.09.051. ISSN 0167-739X

  17. Alghushairy O, Alsini R, Ma X (2020) An efficient local outlier factor for data stream processing: a case study. In: 2020 International conference on computational science and computational intelligence (CSCI), pp 1525–1528. https://doi.org/10.1109/CSCI51800.2020.00282

  18. Seth S, Singh G, Chahal KK (2021) Drift-based approach for evolving data stream classification in Intrusion detection system

  19. ADWIN (2023) Jäger Computergesteuerte Messtechnik. https://www.adwin.de/index-us.html

  20. Gadal S, Mokhtar R, Abdelhaq M, Alsaqour R, Ali ES, Saeed R (2022) Machine learning-based anomaly detection using K-Mean array and sequential minimal optimization. Electronics 11(14). https://doi.org/10.3390/electronics11142158

  21. Jaradat AS, Barhoush MM, Easa RB (2022) Network intrusion detection system: machine learning approach. Indones J Electr Eng Comput Sci 25(2):1151–1158. https://doi.org/10.11591/ijeecs.v25.i2.pp1151-1158. ISSN: 2502-4752

  22. Qazi EUH, Imran M, Haider N, Shoaib M, Razzak I (2022) An intelligent and efficient network intrusion detection system using deep learning. Comput Electr Eng 99:107764. https://doi.org/10.1016/j.compeleceng.2022.107764. ISSN 0045-7906

  23. Kumar S, Pathak P, Agrawal K, Goswami V, Mahindru A (2023) Network intrusion detection system using machine learning. In: Noor A, Saroha K, Pricop E, Sen A, Trivedi G (eds) Proceedings of third emerging trends and technologies on intelligent systems. ETTIS 2023. Lecture notes in networks and systems, vol 730. Springer, Singapore. https://doi.org/10.1007/978-981-99-3963-3_56

  24. Ansari S, Rajeev SG, Chandrashekar HS (2003) Packet sniffing: a brief introduction. IEEE Potentials 21(5):17–19. https://doi.org/10.1109/MP.2002.1166620

  25. Hsu C-H, Wang S-D (2013) An embedded NIDS with multi-core aware packet capture. In: 2013 IEEE 16th International conference on computational science and engineering, pp 778–785. https://doi.org/10.1109/CSE.2013.119

  26. Masud MM, Al-khateeb T, Khan L, Thuraisingham B, Hamlen KW (2008) Flow-based identification of botnet traffic by mining multiple log files. In: 2008 First international conference on distributed framework and applications, pp 200–206. https://doi.org/10.1109/ICDFMA.2008.4784437

  27. Mahfouz AM, Venugopal D, Shiva SG (2020) Comparative analysis of ML classifiers for network intrusion detection. Fourth international congress on information and communication technology, pp 193–207. isbn: 978-981-32-9343-4

  28. Bhargava N, Sharma G, Bhargava R, Mathuria M (2013) Decision tree analysis on j48 algorithm for datamining. In: Proceedings of international journal of advanced research in computer science and software engineering, vol 3, Issue 6. ISSN: 2277 128X

  29. Song YY, Lu Y (2015) Decision tree methods: applications for classification and prediction. Shanghai Arch Psychiatry 27(2):130–135. https://doi.org/10.11919/j.issn.1002-0829.215044. PMID: 26120265; PMCID: PMC4466856

  30. Sun J (2010) Application of data mining for decision tree model of multi-variety discrete production and manufacture. 2010 Third international symposium on intelligent information technology and security informatics, Jinggangshan, pp 724–728. https://doi.org/10.1109/IITSI.2010.13

  31. Aggarwal CC (2014) Data classification: algorithms and applications (1st ed.). Chapman & Hall/CRC. ISBN:1466586745 9781466586741

  32. John GH, Langley P (1995) Estimating continuous distributions in Bayesian classifiers. In: Proceedings of the Eleventh conference on Uncertainty in artificial intelligence (UAI’95), Philippe Besnard and Steve Hanks (Eds.). Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, pp 338–345. ISBN: 1-55860-385-9

  33. Popescu M-C, Balas V, Perescu-Popescu L, Mastorakis N (2009) Multilayer perceptron and neural networks. WSEAS Transactions on Circuits and Systems. 8

  34. Karimi Z, Kashani MMR, Harounabadi A (2023) Feature ranking in intrusion detection dataset using combination of filtering methods. Int J Comput Appl 78:21–27. https://doi.org/10.5120/13478-1164

  35. Hal Daume III (2020) A course in machine learning. http://ciml.info/dl/v0_99/ciml-v0_99-all.pdf

  36. Kohavi R, John GH (1997) Wrappers for feature subset selection. Artif Intell 97(1):273–324. https://doi.org/10.1016/S0004-3702(97)00043-X

  37. Alhaj TA, Siraj MM, Zainal A, Elshoush HT, Elhaj F (2016) Feature selection using information gain for improved structural-based alert correlation. Public Libr Sci 11(11):e0166017. https://doi.org/10.1371/journal.pone.0166017

  38. Bereziński P, Jasiul B, Szpyrka M (2015) An entropy-based network anomaly detection method. Entropy 17(4):2367–2408. https://doi.org/10.3390/e17042367

  39. Hastie T, Tibshirani R, Friedman J, Franklin J (2004) The elements of statistical learning: data mining, inference, and prediction. Math Intell 27:83–85. https://doi.org/10.1007/BF02985802

  40. Kurniabudi, Stiawan D, Darmawijoyo, Idris MYB, Bamhdi AM, Budiarto R (2020) CICIDS-2017 Dataset feature analysis with information gain for anomaly detection. IEEE Access 8:132911–132921. https://doi.org/10.1109/ACCESS.2020.3009843

  41. Sharafaldin I, Lashkari AH, Ghorbani AA (2018) Toward generating a new intrusion detection dataset and intrusion traffic characterization. Proceedings of the 4th international conference on information systems security and privacy - vol 1: ICISSP, pp 108–116. https://doi.org/10.5220/0006639801080116

  42. Schuartz FC, Fonseca MSP, Munaretto A (2020) Data worked on the NSL-KDD and CICIDS2017 datasets. https://doi.org/10.6084/m9.figshare.25656966

  43. Information, C. S. U. of California. Kddcup 1999 data (1999). http://kdd.ics.uci.edu/databases/kddcup99/kddcup99.html

  44. Eid HF, Darwish A, Hassanien AE, Abraham A (2010) Principle components analysis and support vector machine based intrusion detection system. 2010 10th International conference on intelligent systems design and applications, pp 363–367. https://doi.org/10.1109/ISDA.2010.5687239

  45. Tavallaee M, Bagheri E, Lu W, Ghorbani AA (2009) A detailed analysis of the KDD CUP 99 data set. 2009 IEEE symposium on computational intelligence for security and defense applications, Ottawa, ON, pp 1–6. https://doi.org/10.1109/CISDA.2009.5356528

  46. Hasan MAM, Xu S, Kabir MMJ, Ahmad S (2016) Performance evaluation of different kernels for support vector machine used in intrusion detection system. Int J Comput Netw Commun 8:39–53. https://doi.org/10.5121/ijcnc.2016.8604

  47. Bifet A, Holmes G, Pfahringer B, Frank E (2010) Fast perceptron decision tree learning from evolving data streams. Adv Knowl Discovery Data Mining, pp 299–310. ISBN: 978-3-642-13672-6

  48. Gomes HM, Bifet A, Read J, Barddal JP, Enembreck F, Pfahringer B, Holmes G, Abdessalem T (2017) Adaptive random forests for evolving data stream classification. Mach Learn 106:1–27. https://doi.org/10.1007/s10994-017-5642-8

  49. Bifet A, de Francisci Morales G, Read J, Holmes G, Pfahringer B (2015) Efficient online evaluation of big data stream classifiers. Proceedings of the 21th ACM SIGKDD International conference on knowledge discovery and data mining, pp 59–68. https://doi.org/10.1145/2783258.2783372

  50. Apache Kafka, Apache Software Foundation. https://kafka.apache.org/

  51. Sahu SK, Sarangi S, Jena SK (2014) A detail analysis on intrusion detection datasets. 2014 IEEE International advance computing conference (IACC). Gurgaon, pp 1348–1353. https://doi.org/10.1109/IAdCC.2014.6779523

  52. Van NT, Thinh TN, Sach LT (2017) An anomaly-based network intrusion detection system using deep learning. 2017 International conference on system science and engineering (ICSSE), Ho Chi Minh City, pp 210–214. https://doi.org/10.1109/ICSSE.2017.8030867

  53. Kim K, Aminanto ME (2017) Deep learning in intrusion detection perspective: overview and further challenges. 2017 International workshop on big data and information security (IWBIS), Jakarta, pp 5–10. https://doi.org/10.1109/IWBIS.2017.8275095

  54. Alom MZ, Taha TM (2017) Network intrusion detection for cyber security using unsupervised deep learning approaches. In: 2017 IEEE National aerospace and electronics conference (NAECON), Dayton, OH, pp 63–69. https://doi.org/10.1109/NAECON.2017.8268746

  55. Žliobaitė I, Bifet A, Read J et al (2015) Evaluation methods and decision theory for classification of streaming data with temporal dependence. Mach Learn 98:455–482. https://doi.org/10.1007/s10994-014-5441-4

Author information

Corresponding author

Correspondence to Fábio César Schuartz.

Ethics declarations

Conflict of Interest

The authors declare no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Schuartz, F.C., Fonseca, M. & Munaretto, A. A distributed platform for intrusion detection system using data stream mining in a big data environment. Ann. Telecommun. (2024). https://doi.org/10.1007/s12243-024-01046-0

  • DOI: https://doi.org/10.1007/s12243-024-01046-0
