Abstract
Constrained clustering has attracted considerable interest in machine learning because it produces promising results in domains where only partial information about the solution is available. It can be viewed as a semi-supervised generalization of clustering, which is traditionally unsupervised: constraints encode additional information that guides the clustering process. This study focuses on instance-level must-link and cannot-link constraints. We propose an agglomerative constrained clustering algorithm that combines distance-based and clustering-engine-adapting methods to incorporate constraints into the partitioning process. It first computes a similarity measure from the distances in the dataset and the constraints in the constraint set, and then applies an agglomerative clustering method whose clustering engine has been adapted to consider both constraints and raw distances. We demonstrate its capability to produce quality results for the constrained clustering problem by comparing its performance with previous proposals on several datasets with increasing levels of constraint-based information.
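The abstract does not give the similarity measure or the adapted merge rule, so the following is only a minimal sketch of the general idea, not the algorithm proposed in the paper. It assumes a simple distance adjustment (must-link pairs set to distance 0, cannot-link pairs set beyond the largest raw distance) and an average-linkage engine that refuses any merge joining a cannot-link pair. The function name `constrained_agglomerative` and all parameters are hypothetical.

```python
import math
from itertools import combinations


def constrained_agglomerative(points, must_link, cannot_link, k):
    """Illustrative sketch (assumed design, not the paper's method):
    average-linkage agglomeration on constraint-adjusted distances."""
    n = len(points)
    # Raw pairwise Euclidean distances computed from the dataset.
    dist = [[math.dist(points[i], points[j]) for j in range(n)] for i in range(n)]
    far = max(max(row) for row in dist) + 1.0

    # Distance-based step (assumption): constraints overwrite raw distances.
    for i, j in must_link:
        dist[i][j] = dist[j][i] = 0.0
    for i, j in cannot_link:
        dist[i][j] = dist[j][i] = far

    forbidden = {frozenset(pair) for pair in cannot_link}
    clusters = [{i} for i in range(n)]

    def violates(a, b):
        # Engine-level step (assumption): never join a cannot-link pair.
        return any(frozenset((i, j)) in forbidden for i in a for j in b)

    def linkage(a, b):
        # Average linkage over the adjusted distances.
        return sum(dist[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > k:
        candidates = [
            (linkage(a, b), ia, ib)
            for (ia, a), (ib, b) in combinations(enumerate(clusters), 2)
            if not violates(a, b)
        ]
        if not candidates:
            break  # no admissible merge remains
        _, ia, ib = min(candidates)
        clusters[ia] |= clusters[ib]  # ia < ib, so deleting ib keeps ia valid
        del clusters[ib]

    labels = [0] * n
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels


# Two tight pairs of points; the constraints override the geometry.
pts = [(0.0, 0.0), (0.0, 1.0), (4.0, 0.0), (4.0, 1.0)]
unconstrained = constrained_agglomerative(pts, [], [], 2)
constrained = constrained_agglomerative(pts, [(1, 2)], [(0, 1)], 2)
```

In this toy example the unconstrained run groups the points by proximity, while the constrained run separates points 0 and 1 and places point 1 with point 2, as the constraints require. Constraint propagation and the weighting between constraints and distances are deliberately left out of the sketch.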
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
González-Almagro, G., Suarez, J.L., Luengo, J., Cano, J.R., García, S. (2020). Agglomerative Constrained Clustering Through Similarity and Distance Recalculation. In: de la Cal, E.A., Villar Flecha, J.R., Quintián, H., Corchado, E. (eds) Hybrid Artificial Intelligent Systems. HAIS 2020. Lecture Notes in Computer Science, vol 12344. Springer, Cham. https://doi.org/10.1007/978-3-030-61705-9_35
DOI: https://doi.org/10.1007/978-3-030-61705-9_35
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-61704-2
Online ISBN: 978-3-030-61705-9
eBook Packages: Computer Science, Computer Science (R0)