Log in

Quadratic Euclidean 1-Mean and 1-Median 2-Clustering Problem with Constraints on the Size of the Clusters: Complexity and Approximability

  • Published:
Proceedings of the Steklov Institute of Mathematics Aims and scope Submit manuscript

Abstract

We consider the problem of partitioning a set of \(N\) points in \(d\)-dimensional Euclidean space into two clusters minimizing the sum of the squared distances between each element and the center of the cluster to which it belongs. The center of the first cluster is its centroid (the geometric center). The center of the second cluster should be chosen among the points of the input set. We analyze the variant of the problem with given sizes (cardinalities) of the clusters; the sum of the sizes equals the cardinality of the input set. We prove that the problem is strongly NP-hard and there is no fully polynomial-time approximation scheme for it.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Subscribe and save

Springer+ Basic
EUR 32.99 /Month
  • Get 10 units per month
  • Download Article/Chapter or Ebook
  • 1 Unit = 1 Article or 1 Chapter
  • Cancel anytime
Subscribe now

Buy Now

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Fig. 1
Fig. 2

Similar content being viewed by others

REFERENCES

  1. D. Aloise, A. Deshpande, P. Hansen, and P. Popat, “NP-hardness of Euclidean sum-of-squares clustering,” Mach. Learn. 75 (2), 245–248 (2009). https://doi.org/10.1007/s10994-009-5103-0

    Article  MATH  Google Scholar 

  2. O. Kariv and S. L. Hakimi, “An algorithmic approach to network location problems. Part II: The \(p\)-medians,” SIAM J. Appl. Math. 37 (3), 513–538 (1979). https://doi.org/10.1137/0137041

    Article  MathSciNet  MATH  Google Scholar 

  3. E. Kh. Gimadi, A. V. Kel’manov, M. A. Kel’manova, and S. A. Khamidullin, “A posteriori detection of a quasiperiodic fragment with a given number of repetitions in a numerical sequence,” Sib. Zh. Industr. Mat. 9 (1(25)), 55–74 (2006).

    MathSciNet  MATH  Google Scholar 

  4. E. Kh. Gimadi, A. V. Kel’manov, M. A. Kel’manova, and S. A. Khamidullin, “A posteriori detecting a quasiperiodic fragment in a numerical sequence,” Pattern Recogn. Image Anal. 18 (1), 30–42 (2008). https://doi.org/10.1134/S1054661808010057

    Article  MATH  Google Scholar 

  5. A. E. Baburin, E. Kh. Gimadi, N. I. Glebov, and A. V. Pyatkin, “The problem of finding a subset of vectors with the maximum total weight,” J. Appl. Industr. Math. 2 (1), 32–38 (2008).

    Article  MathSciNet  Google Scholar 

  6. G. James, D. Witten, T. Hastie, and R. Tibshirani, An Introduction to Statistical Learning (Springer, New York, 2013).

    Book  Google Scholar 

  7. C. M. Bishop, Pattern Recognition and Machine Learning (Springer, New York, 2006).

    MATH  Google Scholar 

  8. A. S. Shirkhorshidi, S. Aghabozorgi, T. Y. Wah, and T. Herawan, “Big data clustering: A review,” in Computational Science and Its Applications: Proceedings of 14th International Conference, Part V, Guimarães, Portugal, 2014 (Springer, Cham, 2014), Ser. Lecture Notes in Computer Science 8583, pp. 707–720. https://doi.org/10.1007/978-3-319-09156-3_49

    Chapter  Google Scholar 

  9. C. C. Aggarwal, Data Mining: The Textbook (Springer, Cham, 2015). https://doi.org/10.1007/978-3-319-14142-8

    Book  MATH  Google Scholar 

  10. A. W. F. Edwards and L. L. Cavalli-Sforza, “A method for cluster analysis,” Biometrics 21, 362–375 (1965). https://doi.org/10.2307/2528096

    Article  Google Scholar 

  11. M. R. Garey and D. S. Johnson, Computers and Intractability: A Guide to the Theory of NP-Completeness (Freeman, San Francisco, 1979).

    MATH  Google Scholar 

  12. C. H. Papadimitriou, Computational Complexity (Addison-Wesley, New York, 1994).

    MATH  Google Scholar 

  13. V. V. Vazirani, Approximation Algorithms (Springer, Berlin, 2001).

    MATH  Google Scholar 

  14. A. V. Dolgushev and A. V. Kel’manov, “An approximation algorithm for solving a problem of cluster analysis,” J. Appl. Ind. Math. 5 (4), 551–558 (2011). https://doi.org/10.1134/S1990478911040107

    Article  MathSciNet  MATH  Google Scholar 

  15. A. V. Dolgushev, A. V. Kel’manov, and V. V. Shenmaier, “Polynomial-time approximation scheme for a problem of cluster analysis,” in Intelligent Data Processing: Theory and Applications: Proceedings of the 9th International Conference, Budva, Montenegero, 2012 (Torus, Moscow, 2012), pp. 242–244.

    Google Scholar 

  16. A. V. Kel’manov and V. I. Khandeev, “A randomized algorithm for two-cluster partition of a set of vectors,” Comp. Math. Math. Phys. 55 (2), 330–339 (2015). https://doi.org/10.1134/S096554251502013X

    Article  MathSciNet  MATH  Google Scholar 

  17. A. V. Kel’manov, A. V. Motkova, and V. V. Shenmaier, “An approximation scheme for a weighted two-cluster partition problem,” in Analysis of Images, Social Networks and Texts: Proceedings of the 6th International Conference, Moscow, Russia, 2017 (Springer, Cham, 2018), pp. 323–333. https://doi.org/10.1007/978-3-319-73013-4_30

    Chapter  Google Scholar 

Download references

Funding

This work was supported by the Russian Foundation for Basic Research (project nos. 19-01-00308 and 18-31-00398), by the Program for Fundamental Research of the Russian Academy of Sciences (project nos. 0314-2019-0014 and 0314-2019-0015), and by the Russian Academic Excellence Project of the Ministry of Education and Science of the Russian Federation.

Author information

Authors and Affiliations

Authors

Corresponding authors

Correspondence to A. V. Pyatkin or V. I. Khandeev.

Additional information

Translated from Trudy Instituta Matematiki i Mekhaniki UrO RAN, Vol. 25, No. 4, pp. 69 - 78, 2019 https://doi.org/10.21538/0134-4889-2019-25-4-69-78.

Translated by E. Vasil’eva

Rights and permissions

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Kel’manov, A.V., Pyatkin, A.V. & Khandeev, V.I. Quadratic Euclidean 1-Mean and 1-Median 2-Clustering Problem with Constraints on the Size of the Clusters: Complexity and Approximability. Proc. Steklov Inst. Math. 313 (Suppl 1), S117–S124 (2021). https://doi.org/10.1134/S0081543821030123

Download citation

  • Received:

  • Revised:

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1134/S0081543821030123

Keywords

Navigation