Abstract
Due to the increasing volume of network traffic and growing complexity of network environment, rapid identification of heavy hitters is quite challenging. To deal with the massive data streams in real-time, accurate and scalable solution is required. The traditional method to keep an individual counter for each host in the whole data streams is very resource-consuming. This paper presents a new data structure called FCM and its associated algorithms. FCM combines the count-min sketch with the stream-summary structure simultaneously for efficient TOP-K heavy hitter identification in one pass. The key point of this algorithm is that it introduces a novel filter-and-jump mechanism. Given that the Internet traffic has the property of being heavy-tailed and hosts of low frequencies account for the majority of the IP addresses, FCM periodically filters the mice from input streams to efficiently improve the accuracy of TOP-K heavy hitter identification. On the other hand, considering that abnormal events are always time sensitive, our algorithm works by adjusting its measurement window to the newly arrived elements in the data streams automatically. Our experimental results demonstrate that the performance of FCM is superior to the previous related algorithm. Additionally this solution has a good prospect of application in advanced network environment.
Similar content being viewed by others
References
Zhao Q, Kumar A, Xu J (2005) Joint data streaming and sampling techniques for detection of super sources and destinations. Proc 5th ACM SIGCOMM Conf Internet Measure: 7–7
Kompella R, Singh S, Varghese G (2004) On scalable attack detection in the network. Proc 4th ACM SIGCOMM Conf Internet Measure: 187–200
Akamai (2016) Akamai Q1 2016 State of the Internet Security Report [Online]. https://content.akamai.com/PG6292-SOTI-Security.html
Shapsough S, Qatan F, Aburukba R, Aloul F, Ali A (2015) Smart grid cyber security: challenges and solutions. Int Conf Smart Grid Clean Energy Technol (ICSGCE): 170–175
Yao Y, **ong S, Qi H, Liu Y, Tolbert L, Cao Q (2014) Efficient histogram estimation for smart grid data processing with the Loglog-bloom-filter. IEEE Trans Smart Grid 6(1):199–208
Procopiou A, Komninos N (2015) Current and future threats framework in smart grid domain. IEEE Int Conf Cyber Technol Auto Contrl Intell Syst(CYBER): 1852–1857
Homem N, Carvalho J (December 2010) Finding top- k elements in data streams. Inf Sci 180(24):4958–4974
Roesch M (1999) Snort–lightweight intrusion detection for networks. Proc USENIX LISA 1999:229–238
Plonka D (2000) FlowScan: a network traffic flow reporting and visualization tool. Proc USENIX LISA 2000:305–317
Estan C, Varghese G, Fiskin M (2003) Bitmap algorithms for counting active flows on high speed links. Proc 3rd ACM SIGCOMM Conf Internet Measure: 153–166
Wang P, Guan X, Gong W, Towsley D (2011) A new virtual indexing method for measuring host connection degrees. INFOCOM 2011:156–160
S. Venkataraman, D. Song, P. Gibbons, and A. Blum, (2005) New streaming algorithms for fast detection of Superspreaders. Proc Netwk Distributed Syst Security Sym (NDSS): 149–166
Bandi N, Agrawal D, El A (2007) Fast Algorithms for heavy distinct hitters using associative memories. Int Conf Distrib Comput Syst: 6–6
Dimitropoulos X, Hurley P, Kind A (January 2008) Probabilistic lossy counting: an efficient algorithm for finding heavy hitters. ACM SIGCOMM Comput Commun Rev 38(1):5–5
Karp R, Shenker S, Papadimitriou C (March 2003) A simple algorithm for finding frequent elements in streams and bags. ACM Transactions on Database Systems (TODS) 28(1):51–55
Metwally A, Agrawal D, El A (2005) Efficient computation of frequent and top-k elements in data streams. Int Conf Database Theory. Springer Berlin Heidelberg: 398–412
M. Charikar, K. Chen, and M. Farach-Colton, (2002) Finding frequent items in data streams. International colloquium on automata, languages, and programming. Springer berlin Heidelberg, pp. 693–703
Cormode G (November 2014) Count-min sketch. Encyclopedia Algorithms Springer US 29(1):1–6
Huang Q, Lee P (2014) LD-sketch: a distributed sketching design for accurate and scalable anomaly detection in network data streams. Int Conf Comput Commun: 1420–1428
Anceaume E, Busnel Y, Rivetti N (2015) Estimating the frequency of data items in massive distributed streams. IEEE Sym Netwrk Cloud Comput Appl (NCCA): 59–66
Roy P, Khan A, Alonso G (2016) Augmented sketch: faster and more accurate stream processing. Proc 2016 Int Conf Manag Data: 1449–1463
Pitel G, Fouquier G (2015) Count-min-log sketch: approximately counting with approximate counters. 1st Int Sym Web Algorithms
Ben-Basat R, Einziger G, Friedman R, Kassner Y (2016) Heavy hitters in streams and sliding windows. IEEE INFOCOM 2016:1–9
Roy P, Teubner J, Alonso G (2012) Efficient frequent item counting in multi-core hardware. Proc 18th ACM SIGKDD Int Conf Knowledge Discov Data Mining: 1451–1459
Das S, Antony S, Agrawal D, El A (2009) Thread cooperation in multicore architectures for frequency counting over multiple data streams. Proc VLDB Endowment 2(1):217–228
Einziger G, Friedman R (2016) Counting with TinyTable: every bit counts!. Proc Int Conf Distrib Comput Network (ICDCN 2016), Article No 27
Homem N, Carvalho J (2011) Finding top-k elements in a time-sliding window. Evol Syst 2(1):51–70
Zhang Z, Wang B, Lan J (2015) Identifying elephant flows in internet backbone traffic with bloom filters and LRU. Comput Commun 61:70–78
Cafaro M, Pulimeno M, Epicoco I, Aloisio G (2016) Mining frequent items in the time fading model. Inf Sci 370:221–238
Cormode G, Hadjieleftheriou M (2010) Methods for finding frequent items in data streams. VLDB J 19(1):3–20
Cormode G, Hadjieleftheriou M (2008) Finding frequent items in data streams. Proc VLDB Endowment 1(2):1530–1541
Acknowledgements
This work was supported in part by the Strategic Priority Research Program of the Chinese Academy of Sciences under Grant No. XDA06010306 and the National Natural Science Foundation of China under Grant No. 61303241. Furthermore, this work is done also with the support of Chinese Academy of Sciences project under Grant No. CXJJ-16 M119.
Author information
Authors and Affiliations
Corresponding author
Rights and permissions
About this article
Cite this article
Tang, H., Wu, Y., Li, T. et al. Efficient Identification of TOP-K Heavy Hitters over Sliding Windows. Mobile Netw Appl 24, 1732–1741 (2019). https://doi.org/10.1007/s11036-018-1051-x
Published:
Issue Date:
DOI: https://doi.org/10.1007/s11036-018-1051-x