Abstract
Concept drift is a common phenomenon in evolving data streams across a wide range of applications, including credit card fraud protection, weather forecasting, and network monitoring. For online data streams it is difficult to determine a proper sliding-window size for detecting concept drift, which makes existing dataset-distance-based algorithms ineffective in practice. In this paper, we propose a novel framework, Density-based Concept Drift Detection (DCDD), that detects concept drift in data streams using density-based clustering on a sliding window whose size is adjusted dynamically. DCDD uses XGBoost (eXtreme Gradient Boosting) to predict the amount of data belonging to the same concept and adjusts the size of the sliding window based on the information collected about concept drift. To detect concept drift between two datasets, DCDD calculates the distance between them using a new detection formula that uses the time attribute as a weight for old data. The distance is calculated between the data in the current sliding window and all data in the current concept, rather than between two adjacent windows as in the existing work DCDA [2]. This yields an observable improvement in detection accuracy and a significant improvement in detection efficiency. Experimental results show that our framework detects concept drift more accurately and efficiently than the existing work.
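The detection loop described in the abstract can be illustrated with a minimal sketch. The abstract does not give the DCDD detection formula, so everything below is an assumption made for illustration: the exponential time decay, the centroid distance (a simple stand-in for the density-based measure), the threshold, and the function and parameter names are all hypothetical. The XGBoost predictor is replaced by a plain callback, `predict_window_size`, so that the example runs with the standard library only.

```python
import math
from typing import Callable, List, Sequence

Point = Sequence[float]


def time_weights(n: int, decay: float) -> List[float]:
    """Weights for n points ordered oldest first; the newest point gets weight 1.
    Exponential decay is an assumption, not the paper's weighting."""
    return [math.exp(-decay * (n - 1 - i)) for i in range(n)]


def weighted_centroid(points: List[Point], weights: List[float]) -> List[float]:
    """Weighted mean of the points, computed per dimension."""
    total = sum(weights)
    dim = len(points[0])
    return [sum(w * p[d] for p, w in zip(points, weights)) / total
            for d in range(dim)]


def window_to_concept_distance(concept: List[Point], window: List[Point],
                               decay: float = 0.01) -> float:
    """Distance between the current window and ALL data of the current concept,
    with older concept data down-weighted by age (stand-in for the paper's formula)."""
    c = weighted_centroid(concept, time_weights(len(concept), decay))
    w = weighted_centroid(window, [1.0] * len(window))
    return math.dist(c, w)


def detect_drift(stream: List[Point], threshold: float,
                 predict_window_size: Callable[[List[Point]], int],
                 min_size: int = 5, max_size: int = 200) -> List[int]:
    """Return the stream positions at which a drift is flagged."""
    drifts: List[int] = []
    concept: List[Point] = []
    pos = 0
    while pos < len(stream):
        # Hypothetical predictor (XGBoost in the paper) proposes the next window size.
        size = max(min_size, min(max_size, predict_window_size(concept)))
        window = stream[pos:pos + size]
        if concept and window_to_concept_distance(concept, window) > threshold:
            drifts.append(pos)
            concept = list(window)   # the window starts a new concept
        else:
            concept.extend(window)   # same concept: absorb the window
        pos += len(window)
    return drifts


if __name__ == "__main__":
    # Two synthetic concepts: 40 points at the origin, then 40 points at (10, 10).
    stream = [(0.0, 0.0)] * 40 + [(10.0, 10.0)] * 40
    print(detect_drift(stream, threshold=1.0,
                       predict_window_size=lambda concept: 10))
```

Comparing each window against the whole current concept, rather than only against the previous window, means a slow shift that looks small between adjacent windows can still accumulate into a detectable distance; the time weights keep very old data from dominating that comparison.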
References
Aggarwal, C.C., Yu, P.S., Han, J., Wang, J.: A framework for clustering evolving data streams. In: International Conference on Very Large Data Bases, pp. 81–92 (2003)
Cao, F., Liang, J., Bai, L., Zhao, X., Dang, C.: A framework for clustering categorical time-evolving data. IEEE Trans. Fuzzy Syst. 18(5), 872–882 (2010)
Chen, H.L., Chen, M.S., Lin, S.C.: Catching the trend: a framework for clustering concept-drifting categorical data. IEEE Trans. Knowl. Data Eng. 21(5), 652–665 (2009)
Chen, T., He, T., Benesty, M.: XGBoost: extreme gradient boosting (2015)
Chen, Y., Tu, L.: Density-based clustering for real-time stream data. In: Proceedings of the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 133–142 (2007)
Chi, Y., Song, X., Zhou, D., Hino, K., Tseng, B.L.: Evolutionary spectral clustering by incorporating temporal smoothness. In: ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Jose, pp. 153–162 (2007)
Cai, B., Hu, C., Ren, J.: Clustering over an evolving data stream based on grid density and correlation. ICIC Exp. Lett. 45(A), 1603–1609 (2010)
Corne, D., Handl, J., Knowles, J.: Evolutionary clustering. In: Twelfth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Philadelphia, pp. 332–337 (2006)
Cui, Z., Shen, H.: The framework of relative density-based clustering. In: Chen, G., Shen, H., Chen, M. (eds.) PAAP 2017. CCIS, vol. 729, pp. 343–352. Springer, Singapore (2017). https://doi.org/10.1007/978-981-10-6442-5_31
Gaber, M.M., Yu, P.S.: Detection and classification of changes in evolving data streams. Int. J. Inf. Technol. Decis. Mak. 5(5), 659–670 (2006)
Granger, R.H., Schlimmer, J.C.: Beyond incremental processing: tracking concept drift. In: Proceeding of the Twenty-Second International Conference on Very Large Databases, pp. 502–507 (1986)
Jia, C., Tan, C.Y., Yong, A.: A grid and density-based clustering algorithm for processing data stream. In: International Conference on Genetic and Evolutionary Computing, pp. 517–521 (2008)
Pawlak, Z.: Rough sets. Int. J. Comput. Inform. Sci. 11(5), 341–356 (1982)
Ren, J., Cai, B., Hu, C.: Clustering over data streams based on grid density and index tree. J. Converg. Inf. Technol. 6(1), 83–93 (2011)
Sander, J., Ester, M., Kriegel, H.P., Xu, X.: Density-based clustering in spatial databases: the algorithm GDBSCAN and its applications. Data Min. Knowl. Disc. 2(2), 169–194 (1998)
Souza, V.M.A., Chowdhury, F.A., Mueen, A.: Unsupervised drift detection on high-speed data streams. In: 2020 IEEE International Conference on Big Data (Big Data), pp. 102–111 (2020)
Tsymbal, A., Pechenizkiy, M., Cunningham, X.: Dynamic integration of classifiers for handling concept drift. Information Fusion 9(1), 56–68 (2008)
Tu, L., Chen, Y.: Stream data clustering based on grid density and attraction. ACM Trans. Knowl. Discov. Data 3(3), 167–176 (2009)
Wang, P., Jin, N., Fehringer, G.: Concept drift detection with false positive rate for multi-label classification in IoT data stream. In: 2020 International Conference on UK-China Emerging Technologies (UCET), pp. 1–4 (2020)
Acknowledgement
This work is supported by Macao Polytechnic University Research Grant RP/FCA-13/2022. The corresponding author is Hong Shen.
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Cui, Z., Tian, H., Shen, H. (2024). Effective Density-Based Concept Drift Detection for Evolving Data Streams. In: Park, J.S., Takizawa, H., Shen, H., Park, J.J. (eds) Parallel and Distributed Computing, Applications and Technologies. PDCAT 2023. Lecture Notes in Electrical Engineering, vol 1112. Springer, Singapore. https://doi.org/10.1007/978-981-99-8211-0_18
DOI: https://doi.org/10.1007/978-981-99-8211-0_18
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-8210-3
Online ISBN: 978-981-99-8211-0
eBook Packages: Computer Science, Computer Science (R0)