A distributed platform for intrusion detection system using data stream mining in a big data environment

Schuartz, Fábio César; Fonseca, Mauro; Munaretto, Anelise

doi:10.1007/s12243-024-01046-0

Original Research
Published: 08 June 2024

Annals of Telecommunications (2024)

Fábio César Schuartz ORCID: orcid.org/0000-0002-7545-415X¹,
Mauro Fonseca¹ &
Anelise Munaretto¹

Abstract

With the growth of computer networks worldwide, the need to protect local networks from malicious traffic has increased. The growing volume, speed, and variety of data call for a more robust and accurate intrusion detection system capable of analyzing very large amounts of data. This work proposes an intrusion detection system built on stream classifiers and three classification layers, with and without a reduction in the number of features per record, and three classifiers running in parallel with a voting system. The proposed system is validated on two datasets and compared against other models from the literature. In all cases accuracy improved, with gains of up to 18.52% on NSL-KDD and 3.55% on CICIDS2017. Classification time was also reduced, by up to 35.51% on NSL-KDD and 94.90% on CICIDS2017.
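The idea of three classifiers running in parallel with a voting system can be illustrated with a minimal sketch. The abstract does not say which stream classifiers the authors use or how votes are combined, so everything below is an assumption made for illustration: the three toy learners (`MajorityClass`, `NearestCentroid`, `WindowedNN`), the simple majority rule, the test-then-train evaluation loop, and the synthetic `toy_stream` are all hypothetical and are not the authors' implementation.

```python
from collections import Counter, deque


class MajorityClass:
    """Hypothetical toy learner: predicts the most frequent label seen so far."""

    def __init__(self):
        self.counts = Counter()

    def predict(self, x):
        return self.counts.most_common(1)[0][0] if self.counts else None

    def learn(self, x, y):
        self.counts[y] += 1


class NearestCentroid:
    """Hypothetical toy learner: keeps a running mean per class."""

    def __init__(self):
        self.sums = {}
        self.n = Counter()

    def predict(self, x):
        if not self.n:
            return None

        def dist(label):
            centroid = [s / self.n[label] for s in self.sums[label]]
            return sum((a - b) ** 2 for a, b in zip(x, centroid))

        return min(self.n, key=dist)

    def learn(self, x, y):
        sums = self.sums.setdefault(y, [0.0] * len(x))
        for i, value in enumerate(x):
            sums[i] += value
        self.n[y] += 1


class WindowedNN:
    """Hypothetical toy learner: 1-nearest-neighbour over a sliding window."""

    def __init__(self, size=50):
        self.window = deque(maxlen=size)

    def predict(self, x):
        if not self.window:
            return None
        nearest = min(
            self.window,
            key=lambda item: sum((a - b) ** 2 for a, b in zip(x, item[0])),
        )
        return nearest[1]

    def learn(self, x, y):
        self.window.append((tuple(x), y))


def majority_vote(predictions):
    """Return the most common non-None prediction, or None if nobody voted."""
    votes = Counter(p for p in predictions if p is not None)
    return votes.most_common(1)[0][0] if votes else None


def prequential_accuracy(stream, learners):
    """Test-then-train: each record is classified first, then used for learning."""
    correct = total = 0
    for x, y in stream:
        if majority_vote([m.predict(x) for m in learners]) == y:
            correct += 1
        total += 1
        for m in learners:
            m.learn(x, y)
    return correct / total if total else 0.0


def toy_stream(n):
    """Synthetic one-feature stream (assumption): values 5..9 are 'attack'."""
    for i in range(n):
        value = float(i % 10)
        yield (value,), ("attack" if value >= 5 else "normal")


if __name__ == "__main__":
    models = [MajorityClass(), NearestCentroid(), WindowedNN()]
    print(prequential_accuracy(toy_stream(200), models))
```

With three voters and two labels, a simple majority never ties once every learner has seen data, which is one reason an odd number of parallel classifiers is convenient. A real deployment would replace the toy learners with proper stream classifiers and the synthetic stream with records from a dataset such as NSL-KDD or CICIDS2017.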

Availability of Data and Materials

Data are available; the authors will be pleased to provide any data requested during the publication process. Both datasets, the results obtained, and the manipulated data are in a public repository [42].

References

Symantec (2019) Internet security threat report, vol 24. https://www.symantec.com/content/dam/symantec/docs/reports/istr-24-2019-en.pdf
Crowdstrike (2021) Global threat report. https://go.crowdstrike.com/rs/281-OBQ-266/images/Report2021GTR.pdf
Lopez MA, Lobato AGP, Duarte OCMB, Pujolle G (2018) An evaluation of a virtual network function for real-time threat detection using stream processing. In: 2018 Fourth international conference on mobile and secure services (MobiSecServ), Miami Beach, FL, pp 1–5. https://doi.org/10.1109/MOBISECSERV.2018.8311440
Wang F, Wang H, Xue L (2021) Research on data security in big data cloud computing environment. In: 2021 IEEE 5th advanced information technology, electronic and automation control conference (IAEAC), vol 5, pp 1446–1450. https://doi.org/10.1109/IAEAC50856.2021.9391025
Schuartz FC, Fonseca MSP, Munaretto A (2022) A distributed platform for intrusion detection system using data stream mining in a big data environment. In: 6th Cyber security in networking conference, Rio de Janeiro
Tavallaee M, Bagheri E, Lu W, Ghorbani AA (2009) NSL-KDD. https://www.unb.ca/cic/datasets/nsl.html
Tavallaee M, Bagheri E, Lu W, Ghorbani AA (2009) A detailed analysis of the KDD CUP 99 data set. Submitted to Second IEEE symposium on computational intelligence for security and defense applications (CISDA)
Chae H-S, Jo B-O, Choi S-H, Park T-K (2013) Feature selection for intrusion detection using NSL-KDD. Recent Adv Comput Sci 20132:184–187
Sharafaldin I, Lashkari AH, Ghorbani AA (2017) CICIDS2017. https://www.unb.ca/cic/datasets/ids-2017.html
Shone N, Ngoc TN, Phai VD, Shi Q (2018) A deep learning approach to network intrusion detection. IEEE Trans Emerg Top Comput Intell 2(1):41–50. https://doi.org/10.1109/TETCI.2017.2772792
Wang Z (2018) Deep learning-based intrusion detection with adversaries. IEEE Access 6:38367–38384. https://doi.org/10.1109/ACCESS.2018.2854599
Papamartzivanos D, Mármol FG, Kambourakis G (2019) Introducing deep learning self-adaptive misuse network intrusion detection systems. IEEE Access 7:13546–13560. https://doi.org/10.1109/ACCESS.2019.2893871
Al-Qatf M, Lasheng Y, Al-Habib M, Al-Sabahi K (2018) Deep learning approach combining sparse autoencoder with SVM for network intrusion detection. IEEE Access 6:52843–52856. https://doi.org/10.1109/ACCESS.2018.2869577
Schuartz FC, Fonseca MSP, Munaretto A (2019) Distributed system for threat detection in networks using machine learning. In: 1st Blockchain, robotics and ai for networking security conference - BRAINS 2019
Lopez MA, Mattos DMF, Duarte OCMB, Pujolle G (2019) Toward a monitoring and threat detection system based on stream processing as a virtual network function for big data. Concurr Comput Pract Exp 31(20). https://doi.org/10.1002/cpe.5344.
Viegas E, Santin A, Bessani A, Neves N (2019) BigFlow: real-time and reliable anomaly-based intrusion detection for high-speed networks. Future Gener Comput Syst 93:473–485. https://doi.org/10.1016/j.future.2018.09.051. ISSN 0167-739X
Alghushairy O, Alsini R, Ma X (2020) An efficient local outlier factor for data stream processing: a case study. In: 2020 International conference on computational science and computational intelligence (CSCI), pp 1525–1528. https://doi.org/10.1109/CSCI51800.2020.00282
Seth S, Singh G, Chahal KK (2021) Drift-based approach for evolving data stream classification in Intrusion detection system
ADWIN (2023) Jäger Computergesteuerte Messtechnik. https://www.adwin.de/index-us.html
Gadal S, Mokhtar R, Abdelhaq M, Alsaqour R, Ali ES, Saeed R (2022) Machine learning-based anomaly detection using K-Mean array and sequential minimal optimization. Electronics 11(14). https://doi.org/10.3390/electronics11142158
Jaradat AS, Barhoush MM, Easa RB (2022) Network intrusion detection system: machine learning approach. Indones J Electr Eng Comput Sci 25(2):1151–1158. https://doi.org/10.11591/ijeecs.v25.i2.pp1151-1158. ISSN: 2502-4752
Qazi EUH, Imran M, Haider N, Shoaib M, Razzak I (2022) An intelligent and efficient network intrusion detection system using deep learning. Comput Electr Eng 99:107764. https://doi.org/10.1016/j.compeleceng.2022.107764. ISSN 0045-7906
Kumar S, Pathak P, Agrawal K, Goswami V, Mahindru A (2023) Network intrusion detection system using machine learning. In: Noor A, Saroha K, Pricop E, Sen A, Trivedi G (eds) Proceedings of third emerging trends and technologies on intelligent systems. ETTIS 2023. Lecture notes in networks and systems, vol 730. Springer, Singapore. https://doi.org/10.1007/978-981-99-3963-3_56
Ansari S, Rajeev SG, Chandrashekar HS (2003) Packet sniffing: a brief introduction. IEEE Potentials 21(5):17–19. https://doi.org/10.1109/MP.2002.1166620
Hsu C-H, Wang S-D (2013) An embedded NIDS with multi-core aware packet capture. In: 2013 IEEE 16th International conference on computational science and engineering, pp 778–785. https://doi.org/10.1109/CSE.2013.119
Masud MM, Al-khateeb T, Khan L, Thuraisingham B, Hamlen KW (2008) Flow-based identification of botnet traffic by mining multiple log files. In: 2008 First international conference on distributed framework and applications, pp 200–206. https://doi.org/10.1109/ICDFMA.2008.4784437
Mahfouz AM, Venugopal D, Shiva SG (2020) Comparative analysis of ML classifiers for network intrusion detection. Fourth international congress on information and communication technology, pp 193–207. ISBN: 978-981-32-9343-4
Bhargava N, Sharma G, Bhargava R, Mathuria M (2013) Decision tree analysis on j48 algorithm for datamining. In: Proceedings of international journal of advanced research in computer science and software engineering, vol 3, Issue 6. ISSN: 2277 128X
Song YY, Lu Y (2015) Decision tree methods: applications for classification and prediction. Shanghai Arch Psychiatry 27(2):130–135. https://doi.org/10.11919/j.issn.1002-0829.215044. PMID: 26120265; PMCID: PMC4466856
Sun J (2010) Application of data mining for decision tree model of multi-variety discrete production and manufacture. 2010 Third international symposium on intelligent information technology and security informatics. Jinggangshan, pp 724–728. https://doi.org/10.1109/IITSI.2010.13
Aggarwal CC (2014) Data classification: algorithms and applications (1st ed.). Chapman & Hall/CRC. ISBN:1466586745 9781466586741
John GH, Langley P (1995) Estimating continuous distributions in Bayesian classifiers. In: Proceedings of the Eleventh conference on Uncertainty in artificial intelligence (UAI’95), Philippe Besnard and Steve Hanks (Eds.). Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, pp 338–345. ISBN: 1-55860-385-9
Popescu M-C, Balas V, Perescu-Popescu L, Mastorakis N (2009) Multilayer perceptron and neural networks. WSEAS Transactions on Circuits and Systems. 8
Karimi Z, Kashani MMR, Harounabadi A (2023) Feature ranking in intrusion detection dataset using combination of filtering methods. Int J Comput Appl 78:21–27. https://doi.org/10.5120/13478-1164
Hal Daume III (2020) A course in machine learning. http://ciml.info/dl/v0_99/ciml-v0_99-all.pdf
Kohavi R, John GH (1997) Wrappers for feature subset selection. Artif Intell 97(1):273–324. https://doi.org/10.1016/S0004-3702(97)00043-X
Alhaj TA, Siraj MM, Zainal A, Elshoush HT, Elhaj F (2016) Feature selection using information gain for improved structural-based alert correlation. Public Libr Sci 11(11):e0166017. https://doi.org/10.1371/journal.pone.0166017
Bereziński P, Jasiul B, Szpyrka M (2015) An entropy-based network anomaly detection method. Entropy 17(4):2367–2408. https://doi.org/10.3390/e17042367
Hastie T, Tibshirani R, Friedman J, Franklin J (2004) The elements of statistical learning: data mining, inference, and prediction. Math Intell 27:83–85. https://doi.org/10.1007/BF02985802
Kurniabudi, Stiawan D, Darmawijoyo, Idris MYB, Bamhdi AM, Budiarto R (2020) CICIDS-2017 Dataset feature analysis with information gain for anomaly detection. IEEE Access 8:132911–132921. https://doi.org/10.1109/ACCESS.2020.3009843
Sharafaldin I, Lashkari AH, Ghorbani AA (2018) Toward generating a new intrusion detection dataset and intrusion traffic characterization. Proceedings of the 4th international conference on information systems security and privacy - vol 1: ICISSP, pp 108–116. https://doi.org/10.5220/0006639801080116
Schuartz FC, Fonseca MSP, Munaretto A (2020) Data worked on the NSL-KDD and CICIDS2017 datasets. https://doi.org/10.6084/m9.figshare.25656966
Information, C. S. U. of California. Kddcup 1999 data (1999). http://kdd.ics.uci.edu/databases/kddcup99/kddcup99.html
Eid HF, Darwish A, Hassanien AE, Abraham A (2010) Principle components analysis and support vector machine based intrusion detection system. 2010 10th International conference on intelligent systems design and applications, pp 363–367. https://doi.org/10.1109/ISDA.2010.5687239
Tavallaee M, Bagheri E, Lu W, Ghorbani AA (2009) A detailed analysis of the KDD CUP 99 data set. 2009 IEEE symposium on computational intelligence for security and defense applications. Ottawa, ON, pp 1–6. https://doi.org/10.1109/CISDA.2009.5356528
Hasan MAM, Xu S, Kabir MMJ, Ahmad S (2016) Performance evaluation of different kernels for support vector machine used in intrusion detection system. Int J Comput Netw Commun 8:39–53. https://doi.org/10.5121/ijcnc.2016.8604
Bifet A, Holmes G, Pfahringer B, Frank E (2010) Fast perceptron decision tree learning from evolving data streams. Adv Knowl Discovery Data Mining, pp 299–310. ISBN: 978-3-642-13672-6
Gomes HM, Bifet A, Read J, Barddal JP, Enembreck F, Pfahringer B, Holmes G, Abdessalem T (2017) Adaptive random forests for evolving data stream classification. Mach Learn 106:1–27. https://doi.org/10.1007/s10994-017-5642-8
Bifet A, de Francisci Morales G, Read J, Holmes G, Pfahringer B (2015) Efficient online evaluation of big data stream classifiers. Proceedings of the 21th ACM SIGKDD International conference on knowledge discovery and data mining, pp 59–68. https://doi.org/10.1145/2783258.2783372
Apache Kafka, Apache Software Foundation. https://kafka.apache.org/
Sahu SK, Sarangi S, Jena SK (2014) A detail analysis on intrusion detection datasets. 2014 IEEE International advance computing conference (IACC). Gurgaon, pp 1348–1353. https://doi.org/10.1109/IAdCC.2014.6779523
Van NT, Thinh TN, Sach LT (2017) An anomaly-based network intrusion detection system using deep learning. 2017 International conference on system science and engineering (ICSSE), Ho Chi Minh City, pp 210–214. https://doi.org/10.1109/ICSSE.2017.8030867
Kim K, Aminanto ME (2017) Deep learning in intrusion detection perspective: overview and further challenges. 2017 International workshop on big data and information security (IWBIS). Jakarta pp 5–10. https://doi.org/10.1109/IWBIS.2017.8275095
Alom MZ, Taha TM (2017) Network intrusion detection for cyber security using unsupervised deep learning approaches. In: 2017 IEEE National aerospace and electronics conference (NAECON). Dayton, OH pp 63–69. https://doi.org/10.1109/NAECON.2017.8268746
Žliobaitė I, Bifet A, Read J et al (2015) Evaluation methods and decision theory for classification of streaming data with temporal dependence. Mach Learn 98:455–482. https://doi.org/10.1007/s10994-014-5441-4

Author information

Authors and Affiliations

Universidade Tecnológica Federal do Paraná, Curitiba, PR, Brazil
Fábio César Schuartz, Mauro Fonseca & Anelise Munaretto

Corresponding author

Correspondence to Fábio César Schuartz.

Ethics declarations

Conflict of Interest

The authors declare no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Schuartz, F.C., Fonseca, M. & Munaretto, A. A distributed platform for intrusion detection system using data stream mining in a big data environment. Ann. Telecommun. (2024). https://doi.org/10.1007/s12243-024-01046-0

Received: 02 February 2023
Accepted: 23 May 2024
Published: 08 June 2024
DOI: https://doi.org/10.1007/s12243-024-01046-0
