Preprocessing Matters: Automated Pipeline Selection for Fair Classification

González-Zelaya, Vladimiro; Salas, Julián; Prangle, Dennis; Missier, Paolo

doi:10.1007/978-3-031-33498-6_14


Part of the book series: Lecture Notes in Computer Science (LNAI, volume 13890)

Included in the following conference series:

International Conference on Modeling Decisions for Artificial Intelligence


Abstract

Improving fairness by manipulating the preprocessing stages of classification pipelines is an active area of research, closely related to AutoML. We propose a genetic optimisation algorithm, FairPipes, which optimises for user-defined combinations of fairness and accuracy and for multiple definitions of fairness, providing flexibility in the fairness-accuracy trade-off. FairPipes heuristically searches through a large space of pipeline configurations, achieving near-optimality efficiently, presenting the user with an estimate of the solutions’ Pareto front. We also observe that the optimal pipelines differ for different datasets, suggesting that no “universal best” pipeline exists and confirming that FairPipes fills a niche in the fairness-aware AutoML space.
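As a rough illustration of the kind of search the abstract describes, the sketch below runs a small genetic algorithm over preprocessing configurations, ranks them by a user-weighted mix of fairness and accuracy, and reports the non-dominated (Pareto) set among everything it evaluated. It is not the FairPipes implementation: the search space, the `evaluate` stand-in (which returns made-up scores instead of training a classifier), the weight `w`, and all function names are assumptions made for illustration only.

```python
import random

# Hypothetical search space: one choice per preprocessing stage.
SPACE = {
    "imputer": ["mean", "median", "drop"],
    "encoder": ["onehot", "ordinal"],
    "scaler": ["none", "standard", "minmax"],
    "sampler": ["none", "oversample", "undersample"],
}
STAGES = list(SPACE)


def evaluate(pipeline):
    """Toy stand-in returning (accuracy, fairness) for a pipeline.

    A real evaluation would fit the pipeline and a classifier and score them
    on held-out data; here the scores are pseudo-random but fixed per pipeline.
    """
    rng = random.Random("|".join(pipeline))
    return rng.uniform(0.6, 0.9), rng.uniform(0.5, 1.0)


def fitness(pipeline, w):
    """Scalarised objective: w weights fairness, (1 - w) weights accuracy."""
    acc, fair = evaluate(pipeline)
    return w * fair + (1 - w) * acc


def crossover(a, b, rng):
    """Uniform crossover: each stage is inherited from one of the parents."""
    return tuple(rng.choice(pair) for pair in zip(a, b))


def mutate(pipeline, rng, rate=0.2):
    """Resample each stage with probability `rate`."""
    return tuple(
        rng.choice(SPACE[stage]) if rng.random() < rate else gene
        for stage, gene in zip(STAGES, pipeline)
    )


def dominates(a, b):
    """True if score pair a is at least as good as b everywhere and differs."""
    return a[0] >= b[0] and a[1] >= b[1] and a != b


def pareto_front(scores):
    """Pipelines whose (accuracy, fairness) pair no other pair dominates."""
    return [
        p for p, s in scores.items()
        if not any(dominates(t, s) for t in scores.values())
    ]


def genetic_search(w=0.5, pop_size=8, generations=10, seed=0):
    rng = random.Random(seed)
    population = [
        tuple(rng.choice(SPACE[s]) for s in STAGES) for _ in range(pop_size)
    ]
    scores = {}  # every pipeline evaluated so far
    for _ in range(generations):
        for p in population:
            scores[p] = evaluate(p)
        # Keep the fitter half as parents, refill with mutated offspring.
        ranked = sorted(population, key=lambda p: fitness(p, w), reverse=True)
        parents = ranked[: pop_size // 2]
        children = [
            mutate(crossover(rng.choice(parents), rng.choice(parents), rng), rng)
            for _ in range(pop_size - len(parents))
        ]
        population = parents + children
    for p in population:
        scores[p] = evaluate(p)
    best = max(scores, key=lambda p: fitness(p, w))
    return best, pareto_front(scores), scores


if __name__ == "__main__":
    best, front, scores = genetic_search(w=0.5)
    print("best for w=0.5:", dict(zip(STAGES, best)), scores[best])
    print("estimated Pareto front size:", len(front))
```

In this sketch, changing `w` moves the preferred pipeline along the estimated front: values near 1 favour fairness, values near 0 favour accuracy. Only the pipelines the search happened to visit are considered, so the returned front is an estimate rather than the true Pareto front of the full space.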


Notes

1. FairPipes is available at https://github.com/vladoxNCL/fairPipes.


Author information

Authors and Affiliations

Universidad Panamericana, Facultad de Ciencias Económicas y Empresariales, Mexico City, Mexico
Vladimiro González-Zelaya
Internet Interdisciplinary Institute, Universitat Oberta de Catalunya, Barcelona, Spain
Julián Salas
University of Bristol, Institute for Statistical Science, Bristol, UK
Dennis Prangle
Newcastle University, School of Computing, Newcastle upon Tyne, UK
Paolo Missier


Corresponding author

Correspondence to Julián Salas.

Editor information

Editors and Affiliations

Umeå University, Umeå, Sweden
Vicenç Torra
Tamagawa University, Tokyo, Japan
Yasuo Narukawa


About this paper

Cite this paper

González-Zelaya, V., Salas, J., Prangle, D., Missier, P. (2023). Preprocessing Matters: Automated Pipeline Selection for Fair Classification. In: Torra, V., Narukawa, Y. (eds) Modeling Decisions for Artificial Intelligence. MDAI 2023. Lecture Notes in Computer Science, vol 13890. Springer, Cham. https://doi.org/10.1007/978-3-031-33498-6_14


DOI: https://doi.org/10.1007/978-3-031-33498-6_14
Published: 19 May 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-33497-9
Online ISBN: 978-3-031-33498-6
eBook Packages: Computer Science, Computer Science (R0)
