Abstract
In the application of data clustering to human-centric decision-making systems, such as loan applications and advertisement recommendations, the clustering outcome might discriminate against people across different demographic groups, leading to unfairness. A natural conflict occurs between the cost of clustering (in terms of distance to cluster centers) and the balanced representation of all demographic groups across the clusters, leading to a bi-objective optimization problem that is nonconvex and nonsmooth. To determine the complete trade-off between these two competing goals, we design a novel stochastic alternating balance fair k-means (SAfairKM) algorithm, which alternates between classical mini-batch k-means updates and group swap updates. The number of k-means updates and the number of swap updates essentially parameterize the weight put on optimizing each objective function. Our numerical experiments show that the proposed SAfairKM algorithm is robust and computationally efficient in constructing well-spread and high-quality Pareto fronts on both synthetic and real datasets.
L. N. Vicente: Support for this author was partially provided by the Centre for Mathematics of the University of Coimbra under grant FCT/MCTES UIDB/MAT/00324/2020.
Notes
1. Our implementation code is available at https://github.com/sul217/SAfairKM. All the experiments were conducted on a MacBook Pro with an Intel Core i5 processor.
References
Abbasi, M., Bhaskara, A., Venkatasubramanian, S.: Fair clustering via equitable group representations. In: Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency, pp. 504–514 (2021)
Abraham, S.S., Sundaram, S.S.: Fairness in clustering with multiple sensitive attributes. arXiv preprint arXiv:1910.05113 (2019)
Ahmadian, S., Epasto, A., Kumar, R., Mahdian, M.: Clustering without over-representation. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 267–275 (2019)
Arthur, D., Vassilvitskii, S.: \(k\)-means++: the advantages of careful seeding. In: Proceedings of the Eighteenth Annual ACM-SIAM Symposium on Discrete Algorithms, pp. 1027–1035 (2007)
Backurs, A., Indyk, P., Onak, K., Schieber, B., Vakilian, A., Wagner, T.: Scalable fair clustering. In: International Conference on Machine Learning, pp. 405–413. PMLR (2019)
Barocas, S., Selbst, A.D.: Big data's disparate impact. Calif. Law Rev. 104, 671 (2016)
Bera, S., Chakrabarty, D., Flores, N., Negahbani, M.: Fair algorithms for clustering. In: Advances in Neural Information Processing Systems, pp. 4954–4965 (2019)
Berkhin, P.: A survey of clustering data mining techniques. In: Kogan, J., Nicholas, C., Teboulle, M. (eds.) Grouping Multidimensional Data. Springer, Berlin, Heidelberg (2006). https://doi.org/10.1007/3-540-28349-8_2
Bottou, L., Bengio, Y.: Convergence properties of the \(k\)-means algorithms. In: Advances in Neural Information Processing Systems, pp. 585–592 (1995)
Calders, T., Kamiran, F., Pechenizkiy, M.: Building classifiers with independency constraints. In: 2009 IEEE International Conference on Data Mining Workshops, pp. 13–18. IEEE (2009)
Chen, X., Fain, B., Lyu, L., Munagala, K.: Proportionally fair clustering. In: International Conference on Machine Learning, pp. 1032–1041 (2019)
Chierichetti, F., Kumar, R., Lattanzi, S., Vassilvitskii, S.: Fair clustering through fairlets. In: Advances in Neural Information Processing Systems, pp. 5029–5037 (2017)
Datta, A., Tschantz, M.C., Datta, A.: Automated experiments on ad privacy settings: a tale of opacity, choice, and discrimination. Proc. Priv. Enhancing Technol. 2015, 92–112 (2015)
Dua, D., Graff, C.: UCI Machine Learning Repository (2017). http://archive.ics.uci.edu/ml
Dwork, C., Hardt, M., Pitassi, T., Reingold, O., Zemel, R.: Fairness through awareness. In: Proceedings of the 3rd Innovations in Theoretical Computer Science Conference, pp. 214–226. ACM (2012)
Gan, G., Ma, C., Wu, J.: Data Clustering: Theory, Algorithms, and Applications. SIAM, Philadelphia (2020)
Gass, S., Saaty, T.: The computational algorithm for the parametric objective function. Nav. Res. Logist. Q. 2, 39–45 (1955)
Ghadiri, M., Samadi, S., Vempala, S.: Socially fair \(k\)-means clustering. In: Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency, pp. 438–448 (2021)
Hardt, M., Price, E., Srebro, N.: Equality of opportunity in supervised learning. In: Advances in Neural Information Processing Systems, pp. 3315–3323 (2016)
Huang, L., Jiang, S., Vishnoi, N.: Coresets for clustering with fairness constraints. In: Advances in Neural Information Processing Systems, pp. 7589–7600 (2019)
Kanungo, T., Mount, D.M., Netanyahu, N.S., Piatko, C.D., Silverman, R., Wu, A.Y.: A local search approximation algorithm for \(k\)-means clustering. Comput. Geom. 28, 89–112 (2004)
Kleindessner, M., Awasthi, P., Morgenstern, J.: Fair \(k\)-center clustering for data summarization. In: International Conference on Machine Learning, pp. 3448–3457. PMLR (2019)
Kleindessner, M., Awasthi, P., Morgenstern, J.: A notion of individual fairness for clustering. arXiv preprint arXiv:2006.04960 (2020)
Kleindessner, M., Samadi, S., Awasthi, P., Morgenstern, J.: Guarantees for spectral clustering with fairness constraints. In: International Conference on Machine Learning, pp. 3458–3467. PMLR (2019)
Kohavi, R.: Scaling up the accuracy of Naive-Bayes classifiers: a decision-tree hybrid. In: Proceedings of the Second International Conference on Knowledge Discovery and Data Mining, pp. 202–207. KDD1996, AAAI Press (1996)
Lloyd, S.: Least squares quantization in PCM. IEEE Trans. Inf. Theory 28, 129–137 (1982)
Mahabadi, S., Vakilian, A.: Individual fairness for \(k\)-clustering. In: Proceedings of the 37th International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 119, pp. 6586–6596. PMLR, Virtual (13–18 Jul 2020)
Moro, S., Cortez, P., Rita, P.: A data-driven approach to predict the success of bank telemarketing. Decis. Support Syst. 62, 22–31 (2014)
Rösner, C., Schmidt, M.: Privacy preserving clustering with constraints. In: 45th International Colloquium on Automata, Languages, and Programming. Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik (2018)
Schmidt, M., Schwiegelshohn, C., Sohler, C.: Fair coresets and streaming algorithms for fair \(k\)-means. In: Bampis, E., Megow, N. (eds.) WAOA 2019. LNCS, vol. 11926, pp. 232–251. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-39479-0_16
Selim, S.Z., Ismail, M.A.: \(k\)-means-type algorithms: a generalized convergence theorem and characterization of local optimality. IEEE Trans. Pattern Anal. Mach. Intell. PAMI-6(1), 81–87 (1984)
Ziko, I.M., Granger, E., Yuan, J., Ayed, I.B.: Variational fair clustering. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 35, pp. 11202–11209 (2021)
Appendices
A Description of an Existing Approach for Comparison
The authors in [32] measured the fairness error by the Kullback-Leibler (KL) divergence and added it as a penalty term to the classical clustering objective. When using the k-means clustering cost, the resulting problem takes the form:
where \(\mathcal {D}_{KL}\) is the KL divergence between the desired demographic proportion \(U = [u_j, j \in [J]]\) (usually specified by the demographic composition of the whole dataset) and the marginal probability \(\mathbb {P}_k = [\mathbb {P}(j|k) = s_k^\top v_j/{e_N}^\top s_k, j \in [J]]\). The penalty coefficient \(\mu \) associated with the fairness error controls the trade-off between the clustering cost and the clustering balance. To solve problem (6) for a fixed \(\mu \ge 0\), the authors in [32] developed an optimization scheme based on a concave-convex decomposition of the fairness term.
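To make the roles of \(U\), \(\mathbb {P}_k\), and \(\mu \) concrete, the following Python sketch evaluates a penalized objective of the kind described above. It is only an illustration under stated assumptions: it assumes the objective is the k-means cost plus \(\mu \) times the sum over clusters of \(\mathcal {D}_{KL}(U \,\|\, \mathbb {P}_k) = \sum _j u_j \log (u_j/\mathbb {P}(j|k))\), with assignments stored as an \(N \times K\) matrix whose columns are the \(s_k\) and group memberships as an \(N \times J\) matrix whose columns are the \(v_j\). The function name, argument layout, and the `eps` safeguard are hypothetical and are not taken from [32] or from the SAfairKM code.

```python
import numpy as np

def kl_penalized_kmeans_objective(X, C, S, V, mu, U=None, eps=1e-12):
    """Evaluate an assumed KL-penalized k-means objective (illustrative only).

    X : (N, d) data points
    C : (K, d) cluster centers
    S : (N, K) assignment matrix, column k plays the role of s_k
    V : (N, J) group indicator matrix, column j plays the role of v_j
    mu: penalty coefficient on the fairness error (mu >= 0)
    U : (J,) desired demographic proportions; defaults to the
        demographic composition of the whole dataset
    """
    X, C = np.asarray(X, dtype=float), np.asarray(C, dtype=float)
    S, V = np.asarray(S, dtype=float), np.asarray(V, dtype=float)
    U = V.mean(axis=0) if U is None else np.asarray(U, dtype=float)

    # k-means cost: assignment-weighted squared distances to the centers.
    sq_dist = ((X[:, None, :] - C[None, :, :]) ** 2).sum(axis=2)  # (N, K)
    cost = float((S * sq_dist).sum())

    # P(j|k) = s_k^T v_j / e_N^T s_k, one row per cluster.
    cluster_mass = np.maximum(S.sum(axis=0), eps)                 # (K,)
    P = (S.T @ V) / cluster_mass[:, None]                         # (K, J)

    # KL(U || P_k); groups with u_j = 0 contribute nothing. The eps
    # floor is an assumption that keeps the value finite when a
    # cluster contains no member of some group.
    mask = U > 0
    kl_per_cluster = (U[mask] * np.log(U[mask] / np.maximum(P[:, mask], eps))).sum(axis=1)
    fairness_error = float(kl_per_cluster.sum())

    return cost + mu * fairness_error, cost, fairness_error
```

Evaluating this function for a fixed clustering at several values of `mu` shows how the penalty coefficient shifts the weight between the two terms: with `mu = 0` only the clustering cost matters, while a large `mu` makes any cluster whose group proportions deviate from `U` expensive.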
B More Numerical Results
© 2022 Springer Nature Switzerland AG
About this paper
Cite this paper
Liu, S., Vicente, L.N. (2022). A Stochastic Alternating Balance k-Means Algorithm for Fair Clustering. In: Simos, D.E., Rasskazova, V.A., Archetti, F., Kotsireas, I.S., Pardalos, P.M. (eds) Learning and Intelligent Optimization. LION 2022. Lecture Notes in Computer Science, vol 13621. Springer, Cham. https://doi.org/10.1007/978-3-031-24866-5_6
DOI: https://doi.org/10.1007/978-3-031-24866-5_6
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-24865-8
Online ISBN: 978-3-031-24866-5
eBook Packages: Computer Science, Computer Science (R0)