Abstract
We explore the use of instance-level and cluster-level constraints with agglomerative hierarchical clustering. Although previous work has illustrated the benefits of using constraints for non-hierarchical clustering, applying them to hierarchical clustering is not straightforward, for two primary reasons. First, some constraint combinations make the feasibility problem (does at least one feasible solution exist?) NP-complete. Second, some constraint combinations, when used with traditional agglomerative algorithms, can cause the dendrogram to stop prematurely in a dead-end solution even though other feasible solutions with significantly fewer clusters exist. For cases where the constraints lead to efficiently solvable feasibility problems and standard agglomerative algorithms do not give rise to dead-end solutions, we empirically illustrate the benefits of using constraints to improve cluster purity and average distortion. Furthermore, we introduce the new γ constraint and use it in conjunction with the triangle inequality to considerably improve the efficiency of agglomerative clustering.
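The dead-end behaviour described above can be reproduced on a toy example. The following is a minimal sketch, not the authors' implementation: the function name `constrained_agglomerate`, the choice of centroid linkage over 1-D points, and the data are all assumptions made for illustration. It greedily merges the closest pair of clusters that does not violate a cannot-link constraint and stops when no legal merge remains.

```python
from itertools import combinations


def constrained_agglomerate(points, cannot_link):
    """Greedy centroid-linkage merging over 1-D points (illustrative only).

    points      -- list of floats
    cannot_link -- iterable of index pairs that must not share a cluster
    Returns the final clusters as sorted lists of point indices.
    """
    forbidden = {frozenset(pair) for pair in cannot_link}
    clusters = [[i] for i in range(len(points))]

    def centroid(cluster):
        return sum(points[i] for i in cluster) / len(cluster)

    def legal(c1, c2):
        # A merge is legal only if no cannot-link pair would be joined.
        return all(frozenset((i, j)) not in forbidden for i in c1 for j in c2)

    while True:
        candidates = [
            (abs(centroid(c1) - centroid(c2)), a, b)
            for (a, c1), (b, c2) in combinations(enumerate(clusters), 2)
            if legal(c1, c2)
        ]
        if not candidates:
            break  # one cluster left, or every remaining merge is forbidden
        _, a, b = min(candidates)
        merged = clusters[a] + clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)]
        clusters.append(merged)

    return sorted(sorted(c) for c in clusters)


# Toy data (assumed): points 0 and 3 are closest, so they are merged first.
points = [0.0, 10.0, 20.0, 1.0]
cannot_link = [(0, 1), (1, 2), (2, 3)]
print(constrained_agglomerate(points, cannot_link))
```

On this data the greedy procedure halts with three clusters, because every remaining merge would violate a constraint, even though the two-cluster partition {0, 2}, {1, 3} satisfies all three cannot-link constraints.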
Keywords
- Hierarchical Cluster
- Triangle Inequality
- Feasibility Problem
- Agglomerative Cluster
- Hierarchical Cluster Algorithm
These keywords were added by machine and not by the authors.
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Davidson, I., Ravi, S.S. (2005). Agglomerative Hierarchical Clustering with Constraints: Theoretical and Empirical Results. In: Jorge, A.M., Torgo, L., Brazdil, P., Camacho, R., Gama, J. (eds) Knowledge Discovery in Databases: PKDD 2005. Lecture Notes in Computer Science, vol 3721. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11564126_11
DOI: https://doi.org/10.1007/11564126_11
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-29244-9
Online ISBN: 978-3-540-31665-7
eBook Packages: Computer Science, Computer Science (R0)