Abstract
With the growth of computer networks worldwide, there is a greater need to protect local networks from malicious data that travel over the network. The increase in the volume, speed, and variety of data requires a more robust and accurate intrusion detection system capable of analyzing huge amounts of data. This work proposes an intrusion detection system that uses stream classifiers and three classification layers, with and without a reduction in the number of features of the records, and three classifiers in parallel with a voting system. The proposed system is validated on two datasets, NSL-KDD and CICIDS2017, and its results are compared against other models proposed in the literature. In all cases, gains in accuracy were obtained, of up to 18.52% on NSL-KDD and up to 3.55% on CICIDS2017. Reductions in classification time of up to 35.51% and 94.90% were also obtained on the NSL-KDD and CICIDS2017 datasets, respectively.
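The idea of running three classifiers in parallel and combining their outputs through a voting system can be illustrated with a minimal sketch. This is not the authors' implementation: the three rule-based classifiers, the two-feature records, the `attack`/`normal` labels, and the simple majority rule are all hypothetical stand-ins chosen for illustration, and the three classification layers and the feature reduction are not modeled.

```python
from collections import Counter
from typing import Callable, Hashable, Iterable, Iterator, Sequence

Record = Sequence[float]
Classifier = Callable[[Record], Hashable]


def majority_vote(labels: Sequence[Hashable]) -> Hashable:
    """Return the most frequent label; ties go to the label seen first."""
    return Counter(labels).most_common(1)[0][0]


def classify_stream(
    records: Iterable[Record], classifiers: Sequence[Classifier]
) -> Iterator[Hashable]:
    """Classify records one at a time, as they would arrive from a stream."""
    for record in records:
        # Each classifier votes independently on the same record.
        votes = [clf(record) for clf in classifiers]
        yield majority_vote(votes)


# Hypothetical toy classifiers (assumptions, not models from the paper).
def rule_a(r: Record) -> str:
    return "attack" if r[0] > 0.5 else "normal"


def rule_b(r: Record) -> str:
    return "attack" if r[1] > 0.5 else "normal"


def rule_c(r: Record) -> str:
    return "attack" if r[0] + r[1] > 1.0 else "normal"


# Hypothetical normalized records with two features each.
stream = [(0.9, 0.8), (0.1, 0.2), (0.9, 0.05)]
predictions = list(classify_stream(stream, [rule_a, rule_b, rule_c]))
print(predictions)
```

With an odd number of voters and two classes, a majority always exists, so the tie rule only matters when more than two labels are in play.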
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs12243-024-01046-0/MediaObjects/12243_2024_1046_Fig1_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs12243-024-01046-0/MediaObjects/12243_2024_1046_Fig2_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs12243-024-01046-0/MediaObjects/12243_2024_1046_Fig3_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs12243-024-01046-0/MediaObjects/12243_2024_1046_Fig4_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs12243-024-01046-0/MediaObjects/12243_2024_1046_Fig5_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs12243-024-01046-0/MediaObjects/12243_2024_1046_Fig6_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs12243-024-01046-0/MediaObjects/12243_2024_1046_Fig7_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs12243-024-01046-0/MediaObjects/12243_2024_1046_Fig8_HTML.png)
Availability of Data and Materials
Data are available; the author will be pleased to provide the data requested during the publication process. Both datasets, the results obtained, and the manipulated data are in a public repository [42].
References
Symantec (2019) Internet security threat report, vol 24. https://www.symantec.com/content/dam/symantec/docs/reports/istr-24-2019-en.pdf
Crowdstrike (2021) Global threat report. https://go.crowdstrike.com/rs/281-OBQ-266/images/Report2021GTR.pdf
Lopez MA, Lobato AGP, Duarte OCMB, Pujolle G (2018) An evaluation of a virtual network function for real-time threat detection using stream processing. In: 2018 Fourth international conference on mobile and secure services (MobiSecServ), Miami Beach, FL, pp 1–5. https://doi.org/10.1109/MOBISECSERV.2018.8311440
Wang F, Wang H, Xue L (2021) Research on data security in big data cloud computing environment. In: 2021 IEEE 5th advanced information technology, electronic and automation control conference (IAEAC), vol 5, pp 1446–1450. https://doi.org/10.1109/IAEAC50856.2021.9391025
Schuartz FC, Fonseca MSP, Munaretto A (2022) A distributed platform for intrusion detection system using data stream mining in a big data environment. In: 6th Cyber security in networking conference, Rio de Janeiro
Tavallaee M, Bagheri E, Lu W, Ghorbani AA (2009) NSL-KDD. https://www.unb.ca/cic/datasets/nsl.html
Tavallaee M, Bagheri E, Lu W, Ghorbani AA (2009) A detailed analysis of the KDD CUP 99 data set. Submitted to Second IEEE symposium on computational intelligence for security and defense applications (CISDA)
Chae H-S, Jo B-O, Choi S-H, Park T-K (2013) Feature selection for intrusion detection using NSL-KDD. Recent Adv Comput Sci 20132:184–187
Sharafaldin I, Lashkari AH, Ghorbani AA (2017) CICIDS2017. https://www.unb.ca/cic/datasets/ids-2017.html
Shone N, Ngoc TN, Phai VD, Shi Q (2018) A deep learning approach to network intrusion detection. IEEE Trans Emerg Top Comput Intell 2(1):41–50. https://doi.org/10.1109/TETCI.2017.2772792
Wang Z (2018) Deep learning-based intrusion detection with adversaries. IEEE Access 6:38367–38384. https://doi.org/10.1109/ACCESS.2018.2854599
Papamartzivanos D, Mármol FG, Kambourakis G (2019) Introducing deep learning self-adaptive misuse network intrusion detection systems. IEEE Access 7:13546–13560. https://doi.org/10.1109/ACCESS.2019.2893871
Al-Qatf M, Lasheng Y, Al-Habib M, Al-Sabahi K (2018) Deep learning approach combining sparse autoencoder with SVM for network intrusion detection. IEEE Access 6:52843–52856. https://doi.org/10.1109/ACCESS.2018.2869577
Schuartz FC, Fonseca MSP, Munaretto A (2019) Distributed system for threat detection in networks using machine learning. In: 1st Blockchain, robotics and ai for networking security conference - BRAINS 2019
Lopez MA, Mattos DMF, Duarte OCMB, Pujolle G (2019) Toward a monitoring and threat detection system based on stream processing as a virtual network function for big data. Concurr Comput Pract Exp 31(20). https://doi.org/10.1002/cpe.5344
Viegas E, Santin A, Bessani A, Neves N (2019) BigFlow: real-time and reliable anomaly-based intrusion detection for high-speed networks. Future Gener Comput Syst 93:473–485. https://doi.org/10.1016/j.future.2018.09.051. ISSN 0167-739X
Alghushairy O, Alsini R, Ma X (2020) An efficient local outlier factor for data stream processing: a case study. In: 2020 International conference on computational science and computational intelligence (CSCI), pp 1525–1528. https://doi.org/10.1109/CSCI51800.2020.00282
Seth S, Singh G, Chahal KK (2021) Drift-based approach for evolving data stream classification in Intrusion detection system
ADWIN (2023) Jäger Computergesteuerte Messtechnik. https://www.adwin.de/index-us.html
Gadal S, Mokhtar R, Abdelhaq M, Alsaqour R, Ali ES, Saeed R (2022) Machine learning-based anomaly detection using K-Mean array and sequential minimal optimization. Electronics 11(14). https://doi.org/10.3390/electronics11142158
Jaradat AS, Barhoush MM, Easa RB (2022) Network intrusion detection system: machine learning approach. Indones J Electr Eng Comput Sci 25(2):1151–1158. https://doi.org/10.11591/ijeecs.v25.i2.pp1151-1158. ISSN: 2502-4752
Qazi EUH, Imran M, Haider N, Shoaib M, Razzak I (2022) An intelligent and efficient network intrusion detection system using deep learning. Comput Electr Eng 99:107764. https://doi.org/10.1016/j.compeleceng.2022.107764. ISSN 0045-7906
Kumar S, Pathak P, Agrawal K, Goswami V, Mahindru A (2023) Network intrusion detection system using machine learning. In: Noor A, Saroha K, Pricop E, Sen A, Trivedi G (eds) Proceedings of third emerging trends and technologies on intelligent systems. ETTIS 2023. Lecture notes in networks and systems, vol 730. Springer, Singapore. https://doi.org/10.1007/978-981-99-3963-3_56
Ansari S, Rajeev SG, Chandrashekar HS (2003) Packet sniffing: a brief introduction. IEEE Potentials 21(5):17–19. https://doi.org/10.1109/MP.2002.1166620
Hsu C-H, Wang S-D (2013) An embedded NIDS with multi-core aware packet capture. In: 2013 IEEE 16th International conference on computational science and engineering, pp 778–785. https://doi.org/10.1109/CSE.2013.119
Masud MM, Al-khateeb T, Khan L, Thuraisingham B, Hamlen KW (2008) Flow-based identification of botnet traffic by mining multiple log files. In: 2008 First international conference on distributed framework and applications, pp 200–206. https://doi.org/10.1109/ICDFMA.2008.4784437
Mahfouz AM, Venugopal D, Shiva SG (2020) Comparative analysis of ML classifiers for network intrusion detection. Fourth international congress on information and communication technology, pp 193–207. isbn: 978-981-32-9343-4
Bhargava N, Sharma G, Bhargava R, Mathuria M (2013) Decision tree analysis on j48 algorithm for datamining. In: Proceedings of international journal of advanced research in computer science and software engineering, vol 3, Issue 6. ISSN: 2277 128X
Song YY, Lu Y (2015) Decision tree methods: applications for classification and prediction. Shanghai Arch Psychiatry 27(2):130–5. https://doi.org/10.11919/j.issn.1002-0829.215044. PMID: 26120265; PMCID: PMC4466856
Sun J (2010) Application of data mining for decision tree model of multi-variety discrete production and manufacture. 2010 Third international symposium on intelligent information technology and security informatics. Jinggangshan, pp 724–728. https://doi.org/10.1109/IITSI.2010.13
Aggarwal CC (2014) Data classification: algorithms and applications (1st ed.). Chapman & Hall/CRC. ISBN:1466586745 9781466586741
John GH, Langley P (1995) Estimating continuous distributions in Bayesian classifiers. In: Proceedings of the Eleventh conference on Uncertainty in artificial intelligence (UAI’95), Philippe Besnard and Steve Hanks (Eds.). Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, pp 338–345. ISBN:1-55860-385-9
Popescu M-C, Balas V, Perescu-Popescu L, Mastorakis N (2009) Multilayer perceptron and neural networks. WSEAS Transactions on Circuits and Systems. 8
Karimi Z, Kashani MMR, Harounabadi A (2023) Feature ranking in intrusion detection dataset using combination of filtering methods. Int J Comput Appl 78:21–27. https://doi.org/10.5120/13478-1164
Hal Daume III (2020) A course in machine learning. http://ciml.info/dl/v0_99/ciml-v0_99-all.pdf
Kohavi R, John GH (1997) Wrappers for feature subset selection. Artif Intell 97(1):273–324. https://doi.org/10.1016/S0004-3702(97)00043-X
Alhaj TA, Siraj MM, Zainal A, Elshoush HT, Elhaj F (2016) Feature selection using information gain for improved structural-based alert correlation. Public Libr Sci 11(11):e0166017. https://doi.org/10.1371/journal.pone.0166017
Bereziński P, Jasiul B, Szpyrka M (2015) An entropy-based network anomaly detection method. Entropy 17(4):2367–2408. https://doi.org/10.3390/e17042367
Hastie T, Tibshirani R, Friedman J, Franklin J (2004) The elements of statistical learning: data mining, inference, and prediction. Math Intell 27:83–85. https://doi.org/10.1007/BF02985802
Kurniabudi, Stiawan D, Darmawijoyo, Idris MYB, Bamhdi AM, Budiarto R (2020) CICIDS-2017 Dataset feature analysis with information gain for anomaly detection. IEEE Access 8:132911–132921. https://doi.org/10.1109/ACCESS.2020.3009843
Sharafaldin I, Lashkari AH, Ghorbani AA (2018) Toward generating a new intrusion detection dataset and intrusion traffic characterization. Proceedings of the 4th international conference on information systems security and privacy - vol 1: ICISSP, pp 108–116. https://doi.org/10.5220/0006639801080116
Schuartz FC, Fonseca MSP, Munaretto A (2020) Data worked on the NSL-KDD and CICIDS2017 datasets. https://doi.org/10.6084/m9.figshare.25656966
Information, C. S. U. of California. Kddcup 1999 data (1999). http://kdd.ics.uci.edu/databases/kddcup99/kddcup99.html
Eid HF, Darwish A, Hassanien AE, Abraham A (2010) Principle components analysis and support vector machine based intrusion detection system. 2010 10th International conference on intelligent systems design and applications, pp 363–367. https://doi.org/10.1109/ISDA.2010.5687239
Tavallaee M, Bagheri E, Lu W, Ghorbani AA (2009) A detailed analysis of the KDD CUP 99 data set. In: 2009 IEEE symposium on computational intelligence for security and defense applications, Ottawa, ON, pp 1–6. https://doi.org/10.1109/CISDA.2009.5356528
Hasan MAM, Xu S, Kabir MMJ, Ahmad S (2016) Performance evaluation of different kernels for support vector machine used in intrusion detection system. Int J Comput Netw Commun 8:39–53. https://doi.org/10.5121/ijcnc.2016.8604
Bifet A, Holmes G, Pfahringer B, Frank E (2010) Fast perceptron decision tree learning from evolving data streams. Adv Knowl Discovery Data Mining, pp 299–310. isbn: 978-3-642-13672-6
Gomes HM, Bifet A, Read J, Barddal JP, Enembreck F, Pfahringer B, Holmes G, Abdessalem T (2017) Adaptive random forests for evolving data stream classification. Mach Learn 106:1–27. https://doi.org/10.1007/s10994-017-5642-8
Bifet A, de Francisci Morales G, Read J, Holmes G, Pfahringer B (2015) Efficient online evaluation of big data stream classifiers. Proceedings of the 21th ACM SIGKDD International conference on knowledge discovery and data mining, pp 59–68. https://doi.org/10.1145/2783258.2783372
Apache Kafka, Apache Software Foundation. https://kafka.apache.org/
Sahu SK, Sarangi S, Jena SK (2014) A detail analysis on intrusion detection datasets. 2014 IEEE International advance computing conference (IACC). Gurgaon, pp 1348–1353. https://doi.org/10.1109/IAdCC.2014.6779523
Van NT, Thinh TN, Sach LT (2017) An anomaly-based network intrusion detection system using deep learning. 2017 International conference on system science and engineering (ICSSE), Ho Chi Minh City, pp 210–214. https://doi.org/10.1109/ICSSE.2017.8030867
Kim K, Aminanto ME (2017) Deep learning in intrusion detection perspective: overview and further challenges. 2017 International workshop on big data and information security (IWBIS). Jakarta, pp 5–10. https://doi.org/10.1109/IWBIS.2017.8275095
Alom MZ, Taha TM (2017) Network intrusion detection for cyber security using unsupervised deep learning approaches. In: 2017 IEEE National aerospace and electronics conference (NAECON). Dayton, OH, pp 63–69. https://doi.org/10.1109/NAECON.2017.8268746
Žliobaitė I, Bifet A, Read J et al (2015) Evaluation methods and decision theory for classification of streaming data with temporal dependence. Mach Learn 98:455–482. https://doi.org/10.1007/s10994-014-5441-4
Ethics declarations
Conflict of Interest
The authors declare no competing interests.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Schuartz, F.C., Fonseca, M. & Munaretto, A. A distributed platform for intrusion detection system using data stream mining in a big data environment. Ann. Telecommun. (2024). https://doi.org/10.1007/s12243-024-01046-0
DOI: https://doi.org/10.1007/s12243-024-01046-0