A Stochastic Alternating Balance k-Means Algorithm for Fair Clustering

Liu, Suyun; Vicente, Luis Nunes

doi:10.1007/978-3-031-24866-5_6

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13621)

Included in the following conference series:

International Conference on Learning and Intelligent Optimization

Abstract

In the application of data clustering to human-centric decision-making systems, such as loan applications and advertisement recommendations, the clustering outcome might discriminate against people across different demographic groups, leading to unfairness. A natural conflict occurs between the cost of clustering (in terms of distance to cluster centers) and the balanced representation of all demographic groups across the clusters, leading to a bi-objective optimization problem that is nonconvex and nonsmooth. To determine the complete trade-off between these two competing goals, we design a novel stochastic alternating balance fair k-means (SAfairKM) algorithm, which alternates between classical mini-batch k-means updates and group swap updates. The number of k-means updates and the number of swap updates essentially parameterize the weight put on optimizing each objective function. Our numerical experiments show that the proposed SAfairKM algorithm is robust and computationally efficient in constructing well-spread and high-quality Pareto fronts on both synthetic and real datasets.

L. N. Vicente—Support for this author was partially provided by the Centre for Mathematics of the University of Coimbra under grant FCT/MCTES UIDB/MAT/00324/2020.
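The alternation described in the abstract can be illustrated with a toy sketch. This is not the authors' SAfairKM implementation (their code is in the repository cited in the Notes). The mini-batch k-means step below is the classical one with per-center counts, but the balance measure (`cluster_balance`), the swap rule (`swap_step`), and the driver (`alternate`) are hypothetical simplifications, with all names and parameters chosen for illustration only.

```python
import numpy as np

def minibatch_kmeans_step(X, centers, counts, labels, batch_idx):
    """Classical mini-batch k-means update with per-center step size 1/count."""
    for i in batch_idx:
        k = int(np.argmin(((centers - X[i]) ** 2).sum(axis=1)))
        labels[i] = k
        counts[k] += 1
        centers[k] += (X[i] - centers[k]) / counts[k]

def cluster_balance(labels, groups, k, n_groups):
    """Assumed balance measure: min/max group count inside cluster k."""
    c = np.bincount(groups[labels == k], minlength=n_groups)
    if c.sum() == 0:
        return 1.0  # assumption: an empty cluster is treated as balanced
    return c.min() / c.max()

def swap_step(X, centers, labels, groups, n_groups):
    """Toy swap: in the least balanced cluster, trade one majority-group
    point for one minority-group point from outside (centers left as-is)."""
    K = centers.shape[0]
    a = int(np.argmin([cluster_balance(labels, groups, k, n_groups)
                       for k in range(K)]))
    in_a = labels == a
    c = np.bincount(groups[in_a], minlength=n_groups)
    g_major, g_minor = int(np.argmax(c)), int(np.argmin(c))
    if g_major == g_minor:
        return False
    out_idx = np.flatnonzero(in_a & (groups == g_major))
    in_idx = np.flatnonzero(~in_a & (groups == g_minor))
    if out_idx.size == 0 or in_idx.size == 0:
        return False
    # Farthest majority point leaves; nearest outside minority point enters.
    i = out_idx[np.argmax(((X[out_idx] - centers[a]) ** 2).sum(axis=1))]
    j = in_idx[np.argmin(((X[in_idx] - centers[a]) ** 2).sum(axis=1))]
    labels[i], labels[j] = labels[j], a
    return True

def alternate(X, groups, K, n_kmeans, n_swap, n_iter=20, batch=8, seed=0):
    """Alternate n_kmeans cost-driven updates with n_swap balance-driven swaps."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), K, replace=False)].astype(float)
    labels = np.argmin(((X[:, None, :] - centers[None]) ** 2).sum(-1), axis=1)
    counts = np.zeros(K)
    n_groups = int(groups.max()) + 1
    for _ in range(n_iter):
        for _ in range(n_kmeans):
            idx = rng.choice(len(X), min(batch, len(X)), replace=False)
            minibatch_kmeans_step(X, centers, counts, labels, idx)
        for _ in range(n_swap):
            swap_step(X, centers, labels, groups, n_groups)
    return centers, labels
```

In this sketch, raising `n_kmeans` relative to `n_swap` favors clustering cost, and the reverse favors balance, which mirrors how the abstract says the two update counts weight the two objectives.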

Notes

1. Our implementation code is available at https://github.com/sul217/SAfairKM. All the experiments were conducted on a MacBook Pro with an Intel Core i5 processor.

References

1. Abbasi, M., Bhaskara, A., Venkatasubramanian, S.: Fair clustering via equitable group representations. In: Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency, pp. 504–514 (2021)
2. Abraham, S.S., Sundaram, S.S.: Fairness in clustering with multiple sensitive attributes. arXiv preprint arXiv:1910.05113 (2019)
3. Ahmadian, S., Epasto, A., Kumar, R., Mahdian, M.: Clustering without over-representation. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 267–275 (2019)
4. Arthur, D., Vassilvitskii, S.: $k$-means++: the advantages of careful seeding. In: Proceedings of the Eighteenth Annual ACM-SIAM Symposium on Discrete Algorithms, pp. 1027–1035 (2007)
5. Backurs, A., Indyk, P., Onak, K., Schieber, B., Vakilian, A., Wagner, T.: Scalable fair clustering. In: International Conference on Machine Learning, pp. 405–413. PMLR (2019)
6. Barocas, S., Selbst, A.D.: Big data’s disparate impact. Calif. Law Rev. 104, 671 (2016)
7. Bera, S., Chakrabarty, D., Flores, N., Negahbani, M.: Fair algorithms for clustering. In: Advances in Neural Information Processing Systems, pp. 4954–4965 (2019)
8. Berkhin, P.: A survey of clustering data mining techniques. In: Kogan, J., Nicholas, C., Teboulle, M. (eds.) Grouping Multidimensional Data. Springer, Berlin, Heidelberg (2006). https://doi.org/10.1007/3-540-28349-8_2
9. Bottou, L., Bengio, Y.: Convergence properties of the $k$-means algorithms. In: Advances in Neural Information Processing Systems, pp. 585–592 (1995)
10. Calders, T., Kamiran, F., Pechenizkiy, M.: Building classifiers with independency constraints. In: 2009 IEEE International Conference on Data Mining Workshops, pp. 13–18. IEEE (2009)
11. Chen, X., Fain, B., Lyu, L., Munagala, K.: Proportionally fair clustering. In: International Conference on Machine Learning, pp. 1032–1041 (2019)
12. Chierichetti, F., Kumar, R., Lattanzi, S., Vassilvitskii, S.: Fair clustering through fairlets. In: Advances in Neural Information Processing Systems, pp. 5029–5037 (2017)
13. Datta, A., Tschantz, M.C., Datta, A.: Automated experiments on ad privacy settings: a tale of opacity, choice, and discrimination. Proc. Priv. Enhancing Technol. 2015, 92–112 (2015)
14. Dua, D., Graff, C.: UCI Machine Learning Repository (2017). http://archive.ics.uci.edu/ml
15. Dwork, C., Hardt, M., Pitassi, T., Reingold, O., Zemel, R.: Fairness through awareness. In: Proceedings of the 3rd Innovations in Theoretical Computer Science Conference, pp. 214–226. ACM (2012)
16. Gan, G., Ma, C., Wu, J.: Data Clustering: Theory, Algorithms, and Applications. SIAM, Philadelphia (2020)
17. Gass, S., Saaty, T.: The computational algorithm for the parametric objective function. Nav. Res. Logist. Q. 2, 39–45 (1955)
18. Ghadiri, M., Samadi, S., Vempala, S.: Socially fair $k$-means clustering. In: Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency, pp. 438–448 (2021)
19. Hardt, M., Price, E., Srebro, N.: Equality of opportunity in supervised learning. In: Advances in Neural Information Processing Systems, pp. 3315–3323 (2016)
20. Huang, L., Jiang, S., Vishnoi, N.: Coresets for clustering with fairness constraints. In: Advances in Neural Information Processing Systems, pp. 7589–7600 (2019)
21. Kanungo, T., Mount, D.M., Netanyahu, N.S., Piatko, C.D., Silverman, R., Wu, A.Y.: A local search approximation algorithm for $k$-means clustering. Comput. Geom. 28, 89–112 (2004)
22. Kleindessner, M., Awasthi, P., Morgenstern, J.: Fair $k$-center clustering for data summarization. In: International Conference on Machine Learning, pp. 3448–3457. PMLR (2019)
23. Kleindessner, M., Awasthi, P., Morgenstern, J.: A notion of individual fairness for clustering. arXiv preprint arXiv:2006.04960 (2020)
24. Kleindessner, M., Samadi, S., Awasthi, P., Morgenstern, J.: Guarantees for spectral clustering with fairness constraints. In: International Conference on Machine Learning, pp. 3458–3467. PMLR (2019)
25. Kohavi, R.: Scaling up the accuracy of Naive-Bayes classifiers: a decision-tree hybrid. In: Proceedings of the Second International Conference on Knowledge Discovery and Data Mining, pp. 202–207. KDD1996, AAAI Press (1996)
26. Lloyd, S.: Least squares quantization in PCM. IEEE Trans. Inf. Theory 28, 129–137 (1982)
27. Mahabadi, S., Vakilian, A.: Individual fairness for $k$-clustering. In: Proceedings of the 37th International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 119, pp. 6586–6596. PMLR, Virtual (13–18 Jul 2020)
28. Moro, S., Cortez, P., Rita, P.: A data-driven approach to predict the success of bank telemarketing. Decis. Support Syst. 62, 22–31 (2014)
29. Rösner, C., Schmidt, M.: Privacy preserving clustering with constraints. In: 45th International Colloquium on Automata, Languages, and Programming. Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik (2018)
30. Schmidt, M., Schwiegelshohn, C., Sohler, C.: Fair coresets and streaming algorithms for fair $k$-means. In: Bampis, E., Megow, N. (eds.) WAOA 2019. LNCS, vol. 11926, pp. 232–251. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-39479-0_16
31. Selim, S.Z., Ismail, M.A.: $k$-means-type algorithms: a generalized convergence theorem and characterization of local optimality. IEEE Trans. Pattern Anal. Mach. Intell. PAMI-6(1), 81–87 (1984)
32. Ziko, I.M., Granger, E., Yuan, J., Ayed, I.B.: Variational fair clustering. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 35, pp. 11202–11209 (2021)

Author information

Authors and Affiliations

Department of Industrial and Systems Engineering, Lehigh University, Bethlehem, PA, 18015, USA
Suyun Liu & Luis Nunes Vicente

Authors

Suyun Liu
Luis Nunes Vicente

Corresponding author

Correspondence to Suyun Liu.

Editor information

Editors and Affiliations

SBA Research, Vienna, Austria
Dimitris E. Simos
Moscow Aviation Institute (National Research University), Moscow, Russia
Varvara A. Rasskazova
Università degli Studi di Milano-Bicocca, Milan, Italy
Francesco Archetti
Wilfrid Laurier University, Waterloo, ON, Canada
Ilias S. Kotsireas
University of Florida, Gainesville, FL, USA
Panos M. Pardalos

Appendices

A Description of an Existing Approach for Comparison

The authors in [32] measured the fairness error by the Kullback-Leibler (KL) divergence and added it as a penalty term to the classical clustering objective. When using the k-means clustering cost, the resulting problem takes the form:

$$\min ~ f_1(s) + \mu \sum_{k=1}^{K} \mathcal{D}_{KL}(U \Vert \mathbb{P}_k) \quad \text{s.t. } \sum_{k=1}^{K} s_{p,k} = 1, ~ \forall p \in [N], \tag{6}$$

where $\mathcal{D}_{KL}$ is the KL divergence between the desired demographic proportion $U = [u_j, j \in [J]]$ (usually specified by the demographic composition of the whole dataset) and the marginal probability $\mathbb{P}_k = [\mathbb{P}(j|k) = s_k^\top v_j / (e_N^\top s_k), j \in [J]]$. Reading the formula, each ratio is the fraction of the points in cluster $k$ that belong to demographic group $j$. The penalty coefficient $\mu$ associated with the fairness error controls the trade-off between the clustering cost and the clustering balance. To solve problem (6) for a fixed $\mu \ge 0$, the authors in [32] developed an optimization scheme based on a concave-convex decomposition of the fairness term.
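To make the penalized objective in (6) concrete, the following sketch evaluates it for a given hard assignment. It is an illustration under stated assumptions, not the solver from [32]: `S` is taken to be an $N \times K$ 0/1 assignment matrix with columns $s_k$, `V` an $N \times J$ 0/1 group-membership matrix with columns $v_j$, $f_1$ is assumed to be the plain sum of squared distances to the assigned centers (the paper may scale it differently), every cluster is assumed non-empty, and `eps` is a numerical guard introduced here.

```python
import numpy as np

def cluster_group_proportions(S, V):
    """P[k, j] = s_k^T v_j / (e_N^T s_k): share of group j inside cluster k."""
    counts = S.T @ V                 # (K, J) group counts per cluster
    sizes = S.sum(axis=0)            # (K,) cluster sizes, assumed non-zero
    return counts / sizes[:, None]

def kl_fairness_penalty(S, V, U, eps=1e-12):
    """Sum over clusters k of D_KL(U || P_k); eps avoids log(0)."""
    P = cluster_group_proportions(S, V)
    U = np.asarray(U, dtype=float)[None, :]
    return float(np.sum(U * (np.log(U + eps) - np.log(P + eps))))

def kmeans_cost(X, S, centers):
    """Assumed form of f_1: sum of squared distances to assigned centers."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)  # (N, K)
    return float((S * d2).sum())

def penalized_objective(X, S, centers, V, U, mu):
    """Objective of problem (6) for a fixed penalty coefficient mu >= 0."""
    return kmeans_cost(X, S, centers) + mu * kl_fairness_penalty(S, V, U)
```

When every cluster reproduces the target proportions `U`, the penalty is zero and the objective reduces to the clustering cost. Sweeping `mu` upward from zero shifts weight toward balance, which is the trade-off mechanism the paragraph above describes.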

B More Numerical Results

Cite this paper

Liu, S., Vicente, L.N. (2022). A Stochastic Alternating Balance k-Means Algorithm for Fair Clustering. In: Simos, D.E., Rasskazova, V.A., Archetti, F., Kotsireas, I.S., Pardalos, P.M. (eds) Learning and Intelligent Optimization. LION 2022. Lecture Notes in Computer Science, vol 13621. Springer, Cham. https://doi.org/10.1007/978-3-031-24866-5_6

DOI: https://doi.org/10.1007/978-3-031-24866-5_6
Published: 05 February 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-24865-8
Online ISBN: 978-3-031-24866-5
eBook Packages: Computer Science (R0)
