An Incremental Data Stream Clustering Algorithm Based on Dense Units Detection

  • Conference paper
Advances in Knowledge Discovery and Data Mining (PAKDD 2005)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 3518)


Abstract

The data stream model of computation is often used for analyzing huge volumes of continuously arriving data. In this paper, we present a novel algorithm called DUCstream for clustering data streams. Our work is motivated by the need to develop a single-pass algorithm that is capable of detecting evolving clusters, yet requires little memory and computation time. To that end, we propose an incremental clustering method based on dense units detection. Evolving clusters are identified on the basis of the dense units, which contain a relatively large number of points. For efficiency reasons, a bitwise dense unit representation is introduced. Our experimental results demonstrate DUCstream’s efficiency and efficacy.
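The abstract does not spell out how dense units are defined or encoded, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes a uniform grid over non-negative coordinates, a fixed count threshold, and a single Python integer used as a bitmask with one bit per grid unit. The names `width`, `bins`, `min_points`, `unit_index`, `add_chunk`, and `clusters` are all hypothetical, and the bit layout is one possible encoding rather than the paper's.

```python
from collections import Counter


def unit_index(point, width, bins):
    """Flatten the grid unit containing `point` into one integer index.

    Assumption: every coordinate lies in [0, bins * width); values outside
    that range are clamped to the nearest border unit.
    """
    index = 0
    for x in point:
        cell = max(0, min(int(x // width), bins - 1))
        index = index * bins + cell
    return index


def add_chunk(counts, mask, chunk, width, bins, min_points):
    """Process one chunk of the stream in a single pass.

    `counts` holds the running number of points per unit. Bit i of the
    returned mask is set once unit i has reached `min_points` points.
    """
    for point in chunk:
        i = unit_index(point, width, bins)
        counts[i] += 1
        if counts[i] >= min_points:
            mask |= 1 << i
    return mask


def neighbors(index, bins, dims):
    """Yield the indices of units that share a face with unit `index`."""
    stride = 1
    for _ in range(dims):
        cell = (index // stride) % bins
        if cell > 0:
            yield index - stride
        if cell < bins - 1:
            yield index + stride
        stride *= bins


def clusters(mask, bins, dims):
    """Group the dense units in `mask` into connected components."""
    seen, result = 0, []
    for start in range(bins ** dims):
        if not (mask >> start) & 1 or (seen >> start) & 1:
            continue
        stack, component = [start], []
        seen |= 1 << start
        while stack:
            u = stack.pop()
            component.append(u)
            for v in neighbors(u, bins, dims):
                if (mask >> v) & 1 and not (seen >> v) & 1:
                    seen |= 1 << v
                    stack.append(v)
        result.append(sorted(component))
    return result


# Two small 2-D chunks arriving one after the other (illustrative data).
counts, mask = Counter(), 0
mask = add_chunk(counts, mask, [(0.1, 0.1), (0.2, 0.3), (1.5, 0.5)], 1.0, 4, 2)
mask = add_chunk(counts, mask, [(1.1, 0.9), (3.5, 3.5), (3.6, 3.9)], 1.0, 4, 2)
print(bin(mask), clusters(mask, bins=4, dims=2))
```

In this sketch, membership tests and updates on the set of dense units reduce to shifts and ORs on one integer, which is the kind of saving a bitwise representation is meant to provide. The sketch never clears a bit, so it does not model clusters that fade over time.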




Copyright information

© 2005 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Gao, J., Li, J., Zhang, Z., Tan, PN. (2005). An Incremental Data Stream Clustering Algorithm Based on Dense Units Detection. In: Ho, T.B., Cheung, D., Liu, H. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2005. Lecture Notes in Computer Science, vol 3518. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11430919_49


  • DOI: https://doi.org/10.1007/11430919_49

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-26076-9

  • Online ISBN: 978-3-540-31935-1

  • eBook Packages: Computer Science (R0)
