Abstract
Advances in 5G and the Internet of Things allow more devices and sensors to be interconnected. Unlike traditional data, the large volume of data generated by these sensors and devices requires real-time analysis. Data objects in a stream change over time and can be accessed only once. Traditional methods therefore no longer meet the need for fast exploratory analysis of continuously generated data. Cluster tendency assessment is an effective way to determine the number of potential clusters. Recently, methods based on Visual Assessment of cluster Tendency (VAT) have been proposed for visualising cluster structures in streaming data using cluster heat maps. However, those heat maps rely on Euclidean distance, which does not consider the characteristics of the data distribution. Consequently, it is difficult to separate adjacent clusters of varied densities. In this paper, we discuss this issue for the latest inc-siVAT method and propose a data-dependent kernel method to overcome it when clustering streaming data. Extensive evaluation on 7 large synthetic and real-world datasets shows the superiority of kernel-based inc-siVAT over 4 recently published state-of-the-art online and offline clustering algorithms.
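To make the idea of a VAT cluster heat map concrete, the following is a minimal sketch of the classic batch VAT reordering on a Euclidean dissimilarity matrix. It is not inc-siVAT and not the kernel-based variant proposed in this paper; the function names `vat_order` and `pairwise_euclidean` and the toy data are assumptions introduced purely for illustration.

```python
import numpy as np

def pairwise_euclidean(X):
    """Dense Euclidean dissimilarity matrix for the rows of X (hypothetical helper)."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def vat_order(D):
    """Return the classic VAT ordering of a symmetric dissimilarity matrix D."""
    n = D.shape[0]
    # Start from one endpoint of the largest dissimilarity in D.
    start = int(np.unravel_index(np.argmax(D), D.shape)[0])
    order = [start]
    remaining = np.ones(n, dtype=bool)
    remaining[start] = False
    # min_dist[q] is the smallest dissimilarity from q to any ordered point.
    min_dist = D[start].astype(float).copy()
    for _ in range(n - 1):
        candidates = np.where(remaining)[0]
        # Prim-like step: append the unordered point closest to the ordered set.
        nxt = int(candidates[np.argmin(min_dist[candidates])])
        order.append(nxt)
        remaining[nxt] = False
        min_dist = np.minimum(min_dist, D[nxt])
    return np.array(order)

# Toy data (an assumption for illustration): two compact, well-separated clusters.
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(0.0, 0.1, size=(20, 2)),
    rng.normal(5.0, 0.1, size=(20, 2)),
])
labels = np.repeat([0, 1], 20)
perm = rng.permutation(len(X))      # shuffle so the input order hides the structure
X, labels = X[perm], labels[perm]

D = pairwise_euclidean(X)
order = vat_order(D)
D_star = D[np.ix_(order, order)]    # reordered matrix; dark diagonal blocks suggest clusters
```

Displaying `D_star` as an image (for example with Matplotlib's `imshow`) gives the heat map, where each dark block along the diagonal suggests one cluster. As the abstract describes it, the paper's proposal is to replace the Euclidean dissimilarity with one derived from a data-dependent kernel; that step is not shown here.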
Notes
1. The code of inc-IKiVAT is available at https://github.com/charles-cao/inc-IKiVAT.
Acknowledgements
This work is supported by the Natural Science Foundation of Heilongjiang Province under grant number LH2021F015 and by the National Foreign Cultural and Educational Expert Project under grant number G2021180008L.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Zhang, B. et al. (2023). An Improved Visual Assessment with Data-Dependent Kernel for Stream Clustering. In: Kashima, H., Ide, T., Peng, WC. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2023. Lecture Notes in Computer Science, vol 13935. Springer, Cham. https://doi.org/10.1007/978-3-031-33374-3_16
DOI: https://doi.org/10.1007/978-3-031-33374-3_16
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-33373-6
Online ISBN: 978-3-031-33374-3
eBook Packages: Computer Science, Computer Science (R0)