Abstract
As artificial intelligence (AI) continues to advance, one key challenge is ensuring that AI aligns with appropriate values. However, in today's diverse and democratic societies, reaching a normative consensus is complex. This paper delves into the methodological question of how AI ethicists can effectively determine which values AI should uphold. After reviewing the most influential methodologies, we detail an intuitionist research agenda that offers guidelines for aligning AI applications with a limited set of reliable moral intuitions, each underlying a refined cooperative view of AI. We discuss appropriate epistemic tools for collecting, filtering, and justifying moral intuitions with the aim of reducing cognitive and social biases. The proposed methodology facilitates broad collective participation in AI alignment while ensuring the reliability of the moral judgments considered.
Notes
For example, the latest version of ChatGPT (GPT-4) performs better than the average human in many academic and professional exams [43].
Jonker [31] calls this aspect "social alignment", while distinguishing it from "value alignment", which concerns the safety of AI. By contrast, we understand "value alignment" more broadly, as comprising social alignment.
In a similar vein, Morley and colleagues [39] distinguish two aspects in AI ethics: the "what", i.e., the ethical principles for good AI, and the "how", i.e., the identification of the tools and methods for applying those principles. Also, Gabriel [24] discerns the "technical" and "normative" aspects of value alignment and examines the connections between the two.
The alignment process is likely to be iterative [57]. Following value implementation, developers receive feedback from the use of the systems. This feedback may prompt a recalibration of value setting.
This means that the reliability of a methodology can ultimately be assessed by the long-term consequences that AI produces for society. In the meantime, philosophers can debate the question on the basis of rational expectations and predictions.
For example, the need to expand datasets to program fair, unbiased algorithms may conflict with individual privacy rights over personal information.
For example, in the ethics of autonomous vehicles, the principle of the Institute of Electrical and Electronics Engineers "to treat fairly all persons and to not engage in acts of discrimination based on race, religion, gender, disability, age, national origin, sexual orientation, gender identity, or gender expression" [29] has been challenged by the particular intuition that the young should be prioritized over the elderly when an unavoidable accident is presented [2, 20].
We disagree here with Huemer [28], who holds that general moral intuitions are less prone to biases.
A recent study investigating algorithmic interpretability and transparency corroborates this hypothesis [59]. In the study, participants were asked to justify the implementation of an algorithm allocating limited resources in different real-life scenarios; although they opted for different solutions, moral concepts like "fairness" or "rightness" mostly guided their decisions.
For instance, a person inclined to the "authority" foundation might approve extensive data collection for security purposes, while a subject sensitive to "liberty" might not see that as a sufficient reason for the privacy intrusion [54].
For example, moral intuitions about a self-driving car’s decisions vary according to the weight given to the car’s driving style and reliability (A), the compliance with traffic norms (D), and whether the action results in an accident (C) [10].
We agree with Savulescu et al. [51] that an "overlapping consensus" among intuitions from different sources is desirable and should be strongly considered in AI policy making. However, we assume here that consensus is not always possible, and our discussion focuses on cases of reasonable disagreement.
References
Anderson, M., Anderson, S.L.: Case-supported principle-based behavior paradigm. In: Trappl, R. (ed.) A Construction Manual for Robots’ Ethical Systems: Requirements, Methods, Implementations, pp. 155–168. Springer, Cham (2015)
Awad, E., Dsouza, S., Kim, R., Schulz, J., Henrich, J., Shariff, A., Bonnefon, J.-F., Rahwan, I.: The moral machine experiment. Nature 563(7729), 59–64 (2018)
Baase, S., Henry, T.M.: A Gift of Fire: Social, Legal, and Ethical Issues for Computing Technology. Pearson, New York (2018)
Bargh, J.A.: The ecology of automaticity: toward establishing the conditions needed to produce automatic processing effects. Am. J. Psychol. 105(2), 181–199 (1992)
Baumer, E.P.S., Polletta, F., Pierski, N., Gay, G.K.: A simple intervention to reduce framing effects in perceptions of global climate change. Environ. Commun. 11(3), 289–310 (2017)
Bengson, J.: The intellectual given. Mind 124(495), 707–760 (2015)
Bonnefon, J.-F., Shariff, A., Rahwan, I.: The moral psychology of ai and the ethical opt-out problem. In: Liao, S.M. (ed.) Ethics of Artificial Intelligence, pp. 109–126. Oxford University Press, Oxford (2020)
Bonnefon, J.-F., Shariff, A., Rahwan, I.: The social dilemma of autonomous vehicles. Science 352(6397), 36–37 (2016)
Cecchini, D.: Moral intuition, strength, and metacognition. Philos. Psychol. 36(1), 4–28 (2023)
Cecchini, D., Brantley, S., Dubljević, V.: Moral judgment in realistic traffic scenarios: moving beyond the trolley paradigm for ethics of autonomous vehicles. AI Soc. (2023). https://doi.org/10.1007/s00146-023-01813-y
Christian, B.: The Alignment Problem: Machine Learning and Human Values. W.W. Norton & Company, New York (2020)
Curry, O.S., Mullins, D.A., Whitehouse, H.: Is it good to cooperate? Testing the theory of morality-as-cooperation in 60 societies. Curr. Anthropol. 60(1), 47–69 (2019)
Curry, O.S., Alfano, M., Brandt, M.J., Pelican, C.: Moral molecules: morality as a combinatorial system. Rev. Philos. Psychol. 13, 1039–1058 (2021)
Dabbagh, H.: Intuitions about moral relevance—good news for moral intuitionism. Philos. Psychol. 34(7), 1047–1072 (2021)
Dasgupta, N.: Implicit attitudes and beliefs adapt to situations: a decade of research on the malleability of implicit prejudice, stereotypes, and the self-concept. In: Devine, P., Plant, A. (eds.) Advances in Experimental Social Psychology, vol. 47, pp. 233–279. Academic Press, Burlington (2013)
Dubljević, V., List, G., Milojevich, J., Ajmeri, N., Bauer, W.A., Singh, M.P., Bardaka, E., et al.: Toward a rational and ethical sociotechnical system of autonomous vehicles: a novel application of multi-criteria decision analysis. PLoS ONE 16(8), e0256224 (2021)
Dung, L.: Current cases of AI misalignment and their implications for future risks. Synthese 202, 138 (2023)
European Union: The EU Artificial Intelligence Act. https://artificialintelligenceact.eu/. Accessed May 2024 (2024)
Evans, J., Stanovich, K.: Dual-process theories of higher cognition: advancing the debate. Perspect. Psychol. Sci. 8(3), 223–241 (2013)
Faulhaber, A.K., Dittmer, A., Blind, F., Wächter, M.A., Timm, S., Sütfeld, L.R., Stephan, A., Pipa, G.: Human decisions in moral dilemmas are largely described by utilitarianism: virtual car driving study provides guidelines for autonomous driving vehicles. Sci. Eng. Ethics 25, 399–418 (2019)
Floridi, L.: The Ethics of Artificial Intelligence: Principles, Challenges, and Opportunities. Oxford University Press, Oxford (2023)
Floridi, L., Cowls, J., Beltrametti, M., et al.: AI4People—an ethical framework for a good AI society: opportunities, risks, principles, and recommendations. Minds Mach. 28, 689–707 (2018)
Forscher, P.S., Lai, C.K., Axt, J.R., Ebersole, C.R., Herman, M., Devine, P.G.: A meta-analysis of procedures to change implicit measures. J. Personal. Soc. Psychol. Attitudes Soc. Cognit. 117(3), 522–559 (2019)
Gabriel, I.: Artificial intelligence, values, and alignment. Minds Mach. 30, 411–437 (2020)
Hager, G.D., Drobnis, A., Fang, F., Ghani, R., Greenwald, A., Lyons, T., Parkes, D.C., Schultz, J., Saria, S., Smith, S.F.: Artificial intelligence for social good. arXiv:1901.05406 (2019)
Haidt, J.: The moral emotions. In: Davidson, R.J., Scherer, K.R., Goldsmith, H.H. (eds.) Handbook of Affective Sciences, pp. 852–870. Oxford University Press, Oxford (2003)
Hauser, M., Cushman, F., Young, L., Jin, R.K.-X., Mikhail, J.: A dissociation between moral judgments and justifications. Mind Lang. 22(1), 1–21 (2007)
Huemer, M.: Revisionary intuitionism. Soc. Philos. Policy 25(1), 368–392 (2007)
IEEE: IEEE code of ethics. https://www.ieee.org/about/corporate/governance/p7-8.html. Accessed Jun 2023 (2020)
Jobin, A., Ienca, M., Vayena, E.: The global landscape of AI ethics guidelines. Nat. Mach. Intell. 1, 389–399 (2019)
Jonker, J.D.: Automation, alignment, and the cooperative interface. J. Ethics (2023). https://doi.org/10.1007/s10892-023-09449-2
Kneer, M., Skoczen, I.: Outcome effects, moral luck and the hindsight bias. Cognition 232, 1–21 (2023)
Luetge, C., Rusch, H., Uhl, M.: Experimental Ethics: Toward an Empirical Moral Philosophy. Palgrave Macmillan, Houndmills, Basingstoke (2014)
Machery, E.: Philosophy Within Its Proper Bounds. Oxford University Press, Oxford (2017)
Mata, A.: Social metacognition in moral judgment: decisional conflict promotes perspective taking. J. Pers. Soc. Psychol. 117(6), 1061–1082 (2019)
May, J.: Regard for Reason in the Moral Mind. Oxford University Press, Oxford (2018)
Mercier, H., Sperber, D.: The Enigma of Reason. Harvard University Press, Cambridge (2017)
Mittelstadt, B.: Principles alone cannot guarantee ethical AI. Nat. Mach. Intell. 1, 501–507 (2019)
Morley, J., Elhalal, A., Garcia, F., Kinsey, L., Moekander, J., Floridi, L.: Ethics as a service: a pragmatic operationalisation of AI ethics. Minds Mach. 31, 239–256 (2021)
Dubljević, V., Douglas, S., Milojevich, J., Ajmeri, N.: Moral and social ramifications of autonomous vehicles: a qualitative study of the perceptions of professional drivers. Behav. Inf. Technol. 42, 1271–1278 (2023). https://doi.org/10.1080/0144929X.2022.2070078
Morling, B.: Research Methods in Psychology: Evaluating a World of Information. W.W. Norton & Company, New York (2018)
O’Neil, C.: Weapons of Math Destruction. Crown, New York (2016)
OpenAI: GPT-4 technical report. https://cdn.openai.com/papers/gpt-4.pdf. Accessed May 2023 (2023)
Pflanzer, M., Traylor, Z., Lyons, J.B., Dubljević, V., Nam, C.S.: Ethics in human-AI teaming: principles and perspectives. AI Ethics 3, 917–935 (2022)
Polonioli, A., Vega-Mendoza, M., Blankinship, B., Carmel, D.: Reporting in experimental philosophy: current standards and recommendations for future practice. Rev. Philos. Psychol. 12, 49–73 (2021)
Rahwan, I.: Society-in-the-loop: programming the algorithmic social contract. Ethics Inf. Technol. 20, 5–14 (2018)
Rini, R.: Debunking debunking: a regress challenge for psychological threats to moral judgment. Philos. Stud. 173, 675–697 (2016)
Rosenthal, J.: Experimental philosophy is useful—but not in a specific way. In: Luetge, C., Rusch, H., Uhl, M. (eds.) Experimental Ethics: Toward an Empirical Moral Philosophy, pp. 211–226. Palgrave Macmillan, Houndmills, Basingstoke (2014)
Russell, S.: Human Compatible: Artificial Intelligence and the Problem of Control. Penguin, New York (2019)
Sauer, H.: Moral Judgments as Educated Intuitions. MIT Press, Cambridge (2017)
Savulescu, J., Gyngell, C., Kahane, G.: Collective reflective equilibrium in practice (CREP) and controversial novel technologies. Bioethics 35, 652–663 (2021)
Seligman, M.E.P.: Flourish: A Visionary New Understanding of Happiness and Well-Being. Free Press, New York (2011)
Sterelny, K., Fraser, B.: Evolution and moral realism. Br. J. Philos. Sci. 68(4), 981–1006 (2016)
Telkamp, J.B., Anderson, M.H.: The implications of diverse human moral foundations for assessing the ethicality of artificial intelligence. J. Bus. Ethics 178, 961–976 (2022)
The White House: Blueprint for an AI bill of rights. https://www.whitehouse.gov/ostp/ai-bill-of-rights/. Accessed May 2024 (2022)
Thompson, V., Turner, J.P., Pennycook, G.: Intuition, reason and metacognition. Cogn. Psychol. 63, 107–140 (2011)
Umbrello, S., van de Poel, I.: Mapping value sensitive design onto AI for social good principles. AI Ethics 1, 283–296 (2021)
Wallach, W., Allen, C.: Moral Machines: Teaching Robots Right from Wrong. Oxford University Press, Oxford (2009)
Webb, H., Patel, M., Rovatsos, M., Davoust, A., Ceppi, S., Koene, A., Dowthwaite, L., Portillo, V.: "It would be pretty immoral to choose a random algorithm": opening up algorithmic interpretability and transparency. J. Inf. Commun. Ethics Soc. 17(2), 210–228 (2019)
Wong, D.: Moral Relativity. University of California Press, Berkeley (1984)
Wright, J.C.: Tracking instability in our philosophical judgments: is it intuitive? Philos. Psychol. 26(4), 485–501 (2013)
Funding
Funding was provided by the NSF Division of Social and Economic Sciences (grant no. 2043612, awarded to V.D.).
Author information
Contributions
D.C.: ideation, conceptualization, and draft preparation. M.P.: review and editing. V.D.: supervision, review, and editing.
Ethics declarations
Conflict of interest
The authors declare no conflict of interest.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Cecchini, D., Pflanzer, M. & Dubljević, V. Aligning artificial intelligence with moral intuitions: an intuitionist approach to the alignment problem. AI Ethics (2024). https://doi.org/10.1007/s43681-024-00496-5