Effective Density-Based Concept Drift Detection for Evolving Data Streams

  • Conference paper
Parallel and Distributed Computing, Applications and Technologies (PDCAT 2023)

Part of the book series: Lecture Notes in Electrical Engineering (LNEE, volume 1112)

Abstract

Concept drift is a common phenomenon in evolving data streams across a wide range of applications, including credit card fraud protection, weather forecasting, and network monitoring. For online data streams, it is difficult to determine a proper sliding-window size for concept drift detection, which makes existing dataset-distance-based algorithms ineffective in practice. In this paper, we propose a novel framework, Density-based Concept Drift Detection (DCDD), which detects concept drifts in data streams using density-based clustering on a variable-size sliding window whose size is adjusted dynamically. DCDD uses XGBoost (eXtreme Gradient Boosting) to predict the amount of data belonging to the same concept and adjusts the size of the sliding window based on the collected information about concept drifting. To detect concept drift between two datasets, DCDD calculates the distance between them using a new detection formula that uses the time attribute as a weight for old data, and it measures the distance between the data in the current sliding window and all data in the current concept, rather than between two adjacent windows as in the existing work DCDA [2]. This yields an observable improvement in detection accuracy and a significant improvement in detection efficiency. Experimental results show that our framework detects concept drift more accurately and efficiently than the existing work.
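The idea of comparing the current window against all data in the current concept, with older data down-weighted by time, can be illustrated with a minimal sketch. This is not the detection formula from the paper, which is not given in the abstract. As labeled assumptions, the sketch swaps in a time-weighted centroid distance in place of density-based clustering, uses exponential decay as the time weight, and uses a fixed threshold; the names `time_weights`, `weighted_centroid`, `window_to_concept_distance`, `drift_detected`, `decay`, and `threshold` are all hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Sequence[float]


def time_weights(timestamps: Sequence[float], now: float, decay: float) -> List[float]:
    """Weight each point by its age; older points get smaller weights.

    ASSUMPTION: exponential decay is used for illustration only.
    """
    return [math.exp(-decay * (now - t)) for t in timestamps]


def weighted_centroid(points: Sequence[Point], weights: Sequence[float]) -> List[float]:
    """Weighted mean of the points, computed one coordinate at a time."""
    total = sum(weights)
    dim = len(points[0])
    return [sum(w * p[d] for p, w in zip(points, weights)) / total for d in range(dim)]


def window_to_concept_distance(
    concept: Sequence[Tuple[float, Point]],
    window: Sequence[Point],
    now: float,
    decay: float = 0.1,
) -> float:
    """Distance between the current window and ALL data in the current concept.

    ASSUMPTION: a centroid distance stands in for the paper's density-based measure.
    """
    timestamps = [t for t, _ in concept]
    points = [p for _, p in concept]
    concept_center = weighted_centroid(points, time_weights(timestamps, now, decay))
    window_center = weighted_centroid(window, [1.0] * len(window))
    return math.dist(concept_center, window_center)


def drift_detected(
    concept: Sequence[Tuple[float, Point]],
    window: Sequence[Point],
    now: float,
    threshold: float,
    decay: float = 0.1,
) -> bool:
    """Flag a drift when the distance exceeds a (hypothetical) fixed threshold."""
    return window_to_concept_distance(concept, window, now, decay) > threshold


# Usage: three timestamped points near the origin form the current concept.
concept = [(0.0, (0.0, 0.0)), (1.0, (0.2, 0.0)), (2.0, (0.0, 0.2))]
print(drift_detected(concept, [(0.1, 0.1), (0.0, 0.1)], now=3.0, threshold=1.0))
print(drift_detected(concept, [(5.0, 5.0), (5.2, 4.8)], now=3.0, threshold=1.0))
```

Comparing against the whole concept rather than only the previous window means a slow, gradual shift still accumulates distance from the concept's weighted center, while the time weights keep stale points from dominating that center.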

References

  1. Aggarwal, C.C., Yu, P.S., Han, J., Wang, J.: A framework for clustering evolving data streams. In: International Conference on Very Large Data Bases, pp. 81–92 (2003)

  2. Cao, F., Liang, J., Bai, L., Zhao, X., Dang, C.: A framework for clustering categorical time-evolving data. IEEE Trans. Fuzzy Syst. 18(5), 872–882 (2010)

  3. Chen, H.L., Chen, M.S., Lin, S.C.: Catching the trend: a framework for clustering concept-drifting categorical data. IEEE Trans. Knowl. Data Eng. 21(5), 652–665 (2009)

  4. Chen, T., He, T., Benesty, M.: Xgboost: extreme gradient boosting (2015)

  5. Chen, Y., Tu, L.: Density-based clustering for real-time stream data. In: Proceedings of the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 133–142 (2007)

  6. Chi, Y., Song, X., Zhou, D., Hino, K., Tseng, B.L.: Evolutionary spectral clustering by incorporating temporal smoothness. In: ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Jose, pp. 153–162 (2007)

  7. Cai, B., Hu, C., Ren, J.: Clustering over an evolving data stream based on grid density and correlation. ICIC Exp. Lett. 45(A), 1603–1609 (2010)

  8. Corne, D., Handl, J., Knowles, J.: Evolutionary clustering. In: Twelfth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Philadelphia, pp. 332–337 (2006)

  9. Cui, Z., Shen, H.: The framework of relative density-based clustering. In: Chen, G., Shen, H., Chen, M. (eds.) PAAP 2017. CCIS, vol. 729, pp. 343–352. Springer, Singapore (2017). https://doi.org/10.1007/978-981-10-6442-5_31

  10. Gaber, M.M., Yu, P.S.: Detection and classification of changes in evolving data streams. Int. J. Inf. Technol. Decis. Mak. 05(5), 659–670 (2006)

  11. Granger, R.H., Schlimmer, J.C.: Beyond incremental processing: tracking concept drift. In: Proceeding of the Twenty-Second International Conference on Very Large Databases, pp. 502–507 (1986)

  12. Jia, C., Tan, C.Y., Yong, A.: A grid and density-based clustering algorithm for processing data stream. In: International Conference on Genetic and Evolutionary Computing, pp. 517–521 (2008)

  13. Pawlak, Z.: Rough sets. Int. J. Comput. Inform. Sci. 11(5), 341–356 (1982)

  14. Ren, J., Cai, B., Hu, C.: Clustering over data streams based on grid density and index tree. J. Converg. Inf. Technol. 6(1), 83–93 (2011)

  15. Sander, J., Ester, M., Kriegel, H.P., Xu, X.: Density-based clustering in spatial databases: the algorithm gdbscan and its applications. Data Min. Knowl. Disc. 2(2), 169–194 (1998)

  16. Souza, V.M.A., Chowdhury, F.A., Mueen, A.: Unsupervised drift detection on high-speed data streams. In: 2020 IEEE International Conference on Big Data (Big Data), pp. 102–111 (2020)

  17. Tsymbal, A., Pechenizkiy, M., Cunningham, X.: Dynamic integration of classifiers for handling concept drift. Information Fusion 9(1), 56–68 (2008)

  18. Tu, L., Chen, Y.: Stream data clustering based on grid density and attraction. ACM Trans. Knowl. Discov. Data 3(3), 167–176 (2009)

  19. Wang, P., **, N., Fehringer, G.: Concept drift detection with false positive rate for multi-label classification in IoT data stream. In: 2020 International Conference on UK-China Emerging Technologies (UCET), pp. 1–4 (2020)

Acknowledgement

This work is supported by Macao Polytechnic University Research Grant RP/FCA-13/2022. The corresponding author is Hong Shen.

Corresponding author

Correspondence to Hong Shen.

Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Cui, Z., Tian, H., Shen, H. (2024). Effective Density-Based Concept Drift Detection for Evolving Data Streams. In: Park, J.S., Takizawa, H., Shen, H., Park, J.J. (eds) Parallel and Distributed Computing, Applications and Technologies. PDCAT 2023. Lecture Notes in Electrical Engineering, vol 1112. Springer, Singapore. https://doi.org/10.1007/978-981-99-8211-0_18

  • DOI: https://doi.org/10.1007/978-981-99-8211-0_18

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-99-8210-3

  • Online ISBN: 978-981-99-8211-0

  • eBook Packages: Computer Science, Computer Science (R0)
