Abstract
We consider the problem of partitioning a set of \(N\) points in \(d\)-dimensional Euclidean space into two clusters minimizing the sum of the squared distances between each element and the center of the cluster to which it belongs. The center of the first cluster is its centroid (the geometric center). The center of the second cluster should be chosen among the points of the input set. We analyze the variant of the problem with given sizes (cardinalities) of the clusters; the sum of the sizes equals the cardinality of the input set. We prove that the problem is strongly NP-hard and there is no fully polynomial-time approximation scheme for it.
Similar content being viewed by others
REFERENCES
D. Aloise, A. Deshpande, P. Hansen, and P. Popat, “NP-hardness of Euclidean sum-of-squares clustering,” Mach. Learn. 75 (2), 245–248 (2009). https://doi.org/10.1007/s10994-009-5103-0
O. Kariv and S. L. Hakimi, “An algorithmic approach to network location problems. Part II: The \(p\)-medians,” SIAM J. Appl. Math. 37 (3), 513–538 (1979). https://doi.org/10.1137/0137041
E. Kh. Gimadi, A. V. Kel’manov, M. A. Kel’manova, and S. A. Khamidullin, “A posteriori detection of a quasiperiodic fragment with a given number of repetitions in a numerical sequence,” Sib. Zh. Industr. Mat. 9 (1(25)), 55–74 (2006).
E. Kh. Gimadi, A. V. Kel’manov, M. A. Kel’manova, and S. A. Khamidullin, “A posteriori detecting a quasiperiodic fragment in a numerical sequence,” Pattern Recogn. Image Anal. 18 (1), 30–42 (2008). https://doi.org/10.1134/S1054661808010057
A. E. Baburin, E. Kh. Gimadi, N. I. Glebov, and A. V. Pyatkin, “The problem of finding a subset of vectors with the maximum total weight,” J. Appl. Industr. Math. 2 (1), 32–38 (2008).
G. James, D. Witten, T. Hastie, and R. Tibshirani, An Introduction to Statistical Learning (Springer, New York, 2013).
C. M. Bishop, Pattern Recognition and Machine Learning (Springer, New York, 2006).
A. S. Shirkhorshidi, S. Aghabozorgi, T. Y. Wah, and T. Herawan, “Big data clustering: A review,” in Computational Science and Its Applications: Proceedings of 14th International Conference, Part V, Guimarães, Portugal, 2014 (Springer, Cham, 2014), Ser. Lecture Notes in Computer Science 8583, pp. 707–720. https://doi.org/10.1007/978-3-319-09156-3_49
C. C. Aggarwal, Data Mining: The Textbook (Springer, Cham, 2015). https://doi.org/10.1007/978-3-319-14142-8
A. W. F. Edwards and L. L. Cavalli-Sforza, “A method for cluster analysis,” Biometrics 21, 362–375 (1965). https://doi.org/10.2307/2528096
M. R. Garey and D. S. Johnson, Computers and Intractability: A Guide to the Theory of NP-Completeness (Freeman, San Francisco, 1979).
C. H. Papadimitriou, Computational Complexity (Addison-Wesley, New York, 1994).
V. V. Vazirani, Approximation Algorithms (Springer, Berlin, 2001).
A. V. Dolgushev and A. V. Kel’manov, “An approximation algorithm for solving a problem of cluster analysis,” J. Appl. Ind. Math. 5 (4), 551–558 (2011). https://doi.org/10.1134/S1990478911040107
A. V. Dolgushev, A. V. Kel’manov, and V. V. Shenmaier, “Polynomial-time approximation scheme for a problem of cluster analysis,” in Intelligent Data Processing: Theory and Applications: Proceedings of the 9th International Conference, Budva, Montenegero, 2012 (Torus, Moscow, 2012), pp. 242–244.
A. V. Kel’manov and V. I. Khandeev, “A randomized algorithm for two-cluster partition of a set of vectors,” Comp. Math. Math. Phys. 55 (2), 330–339 (2015). https://doi.org/10.1134/S096554251502013X
A. V. Kel’manov, A. V. Motkova, and V. V. Shenmaier, “An approximation scheme for a weighted two-cluster partition problem,” in Analysis of Images, Social Networks and Texts: Proceedings of the 6th International Conference, Moscow, Russia, 2017 (Springer, Cham, 2018), pp. 323–333. https://doi.org/10.1007/978-3-319-73013-4_30
Funding
This work was supported by the Russian Foundation for Basic Research (project nos. 19-01-00308 and 18-31-00398), by the Program for Fundamental Research of the Russian Academy of Sciences (project nos. 0314-2019-0014 and 0314-2019-0015), and by the Russian Academic Excellence Project of the Ministry of Education and Science of the Russian Federation.
Author information
Authors and Affiliations
Corresponding authors
Additional information
Translated from Trudy Instituta Matematiki i Mekhaniki UrO RAN, Vol. 25, No. 4, pp. 69 - 78, 2019 https://doi.org/10.21538/0134-4889-2019-25-4-69-78.
Translated by E. Vasil’eva
Rights and permissions
About this article
Cite this article
Kel’manov, A.V., Pyatkin, A.V. & Khandeev, V.I. Quadratic Euclidean 1-Mean and 1-Median 2-Clustering Problem with Constraints on the Size of the Clusters: Complexity and Approximability. Proc. Steklov Inst. Math. 313 (Suppl 1), S117–S124 (2021). https://doi.org/10.1134/S0081543821030123
Received:
Revised:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1134/S0081543821030123