Abstract
As artificial intelligence (AI) continues to advance, one key challenge is ensuring that AI aligns with appropriate values. However, in today's diverse and democratic societies, reaching a normative consensus is complex. This paper delves into the methodological question of how AI ethicists can effectively determine which values AI should uphold. After reviewing the most influential methodologies, we detail an intuitionist research agenda that offers guidelines for aligning AI applications with a limited set of reliable moral intuitions, each underlying a refined cooperative view of AI. We discuss appropriate epistemic tools for collecting, filtering, and justifying moral intuitions with the aim of reducing cognitive and social biases. The proposed methodology facilitates broad collective participation in AI alignment while ensuring the reliability of the moral judgments considered.
Notes
For example, the latest version of ChatGPT (GPT-4) performs better than the average human in many academic and professional exams [43].
Jonker [31] calls this aspect "social alignment", while distinguishing it from "value alignment", which concerns the safety of AI. By contrast, we understand "value alignment" more broadly, as comprising social alignment.
In a similar vein, Morley and colleagues [39] distinguish two aspects in AI ethics: the "what", i.e., the ethical principles for good AI, and the "how", i.e., the identification of the tools and methods for applying those principles. Also, Gabriel [24] discerns the "technical" and "normative" aspects of value alignment and examines the connections between the two.
The alignment process is likely to be iterative [57]. Following value implementation, developers receive feedback from the use of the systems. This feedback may prompt a recalibration of value setting.
This means that the reliability of a methodology can ultimately be assessed by the long-term consequences that AI produces for society. In the meantime, philosophers can debate the question on the basis of rational expectations and predictions.
For example, the need to expand datasets to program fair, unbiased algorithms may conflict with individual privacy rights over personal information.
For example, in the ethics of autonomous vehicles, the principle of the Institute of Electrical and Electronics Engineers "to treat fairly all persons and to not engage in acts of discrimination based on race, religion, gender, disability, age, national origin, sexual orientation, gender identity, or gender expression" [29] has been challenged by the particular intuition that the young should be prioritized over the elderly when an unavoidable accident is presented [2, 20].
We disagree here with Huemer [28], who holds that general moral intuitions are less prone to biases.
A recent study investigating algorithmic interpretability and transparency corroborates this hypothesis [59]. In the study, participants were asked to justify the implementation of an algorithm allocating limited resources in different real-life scenarios; although they opted for different solutions, moral concepts like "fairness" or "rightness" mostly guided their decisions.
For instance, a person inclined to the "authority" foundation might approve extensive data collection for security purposes, while a subject sensitive to "liberty" might not see that as a sufficient reason for the privacy intrusion [54].
For example, moral intuitions about a self-driving car’s decisions vary according to the weight given to the car’s driving style and reliability (A), the compliance with traffic norms (D), and whether the action results in an accident (C) [10].
We agree with Savulescu et al. [51] that an "overlapping consensus" among intuitions from different sources is desirable and should be strongly considered in AI policy making. However, we assume here that consensus is not always possible, and our discussion focuses on cases of reasonable disagreement.
References
Anderson, M., Anderson, S.L.: Case-supported principle-based behavior paradigm. In: Trappl, R. (ed.) A Construction Manual for Robots’ Ethical Systems: Requirements, Methods, Implementations, pp. 155–168. Springer, Cham (2015)
Awad, E., Dsouza, S., Kim, R., Schulz, J., Henrich, J., Shariff, A., Bonnefon, J.-F., Rahwan, I.: The moral machine experiment. Nature 563(7729), 59–64 (2018)
Baase, S., Henry, T.M.: A Gift of Fire: Social, Legal, and Ethical Issues for Computing Technology. Pearson, New York (2018)
Bargh, J.A.: The ecology of automaticity: toward establishing the conditions needed to produce automatic processing effects. Am. J. Psychol. 105(2), 181–199 (1992)
Baumer, E.P.S., Polletta, F., Pierski, N., Gay, G.K.: A simple intervention to reduce framing effects in perceptions of global climate change. Environ. Commun. 11(3), 289–310 (2017)
Bengson, J.: The intellectual given. Mind 124(495), 707–760 (2015)
Bonnefon, J.-F., Shariff, A., Rahwan, I.: The moral psychology of ai and the ethical opt-out problem. In: Liao, S.M. (ed.) Ethics of Artificial Intelligence, pp. 109–126. Oxford University Press, Oxford (2020)
Bonnefon, J.-F., Shariff, A., Rahwan, I.: The social dilemma of autonomous vehicles. Science 352(6397), 36–37 (2016)
Cecchini, D.: Moral intuition, strength, and metacognition. Philos. Psychol. 36(1), 4–28 (2023)
Cecchini, D., Brantley, S., Dubljević, V.: Moral judgment in realistic traffic scenarios: moving beyond the trolley paradigm for ethics of autonomous vehicles. AI Soc. (2023). https://doi.org/10.1007/s00146-023-01813-y
Christian, B.: The Alignment Problem: Machine Learning and Human Values. W.W. Norton & Company, New York (2020)
Curry, O.S., Mullins, D.A., Whitehouse, H.: Is it good to cooperate? Testing the theory of morality-as-cooperation in 60 societies. Curr. Anthropol. 60(1), 47–69 (2019)
Curry, O.S., Alfano, M., Brandt, M.J., Pelican, C.: Moral molecules: morality as a combinatorial system. Rev. Philos. Psychol. 13, 1039–1058 (2021)
Dabbagh, H.: Intuitions about moral relevance—good news for moral intuitionism. Philos. Psychol. 34(7), 1047–1072 (2021)
Dasgupta, N.: Implicit attitudes and beliefs adapt to situations: a decade of research on the malleability of implicit prejudice, stereotypes, and the self-concept. In: Devine, P., Plant, A. (eds.) Advances in Experimental Social Psychology, vol. 47, pp. 233–279. Academic Press, Burlington (2013)
Dubljević, V., List, G., Milojevich, J., Ajmeri, N., Bauer, W.A., Singh, M.P., Bardaka, E., et al.: Toward a rational and ethical sociotechnical system of autonomous vehicles: a novel application of multi-criteria decision analysis. PLoS ONE 16(8), e0256224 (2021)
Dung, L.: Current cases of AI misalignment and their implications for future risks. Synthese 202, 138 (2023)
European Union: The EU Artificial Intelligence Act. https://artificialintelligenceact.eu/. Accessed May 2024 (2024)
Evans, J., Stanovich, K.: Dual-process theories of higher cognition: advancing the debate. Perspect. Psychol. Sci. 8(3), 223–241 (2013)
Faulhaber, A.K., Dittmer, A., Blind, F., Wächter, M.A., Timm, S., Sütfeld, L.R., Stephan, A., Pipa, G.: Human decisions in moral dilemmas are largely described by utilitarianism: virtual car driving study provides guidelines for autonomous driving vehicles. Sci. Eng. Ethics 25, 399–418 (2019)
Floridi, L.: The Ethics of Artificial Intelligence: Principles, Challenges, and Opportunities. Oxford University Press, Oxford (2023)
Floridi, L., Cowls, J., Beltrametti, M., et al.: AI4People—an ethical framework for a good AI society: opportunities, risks, principles, and recommendations. Minds Mach. 28, 689–707 (2018)
Forscher, P.S., Lai, C.K., Axt, J.R., Ebersole, C.R., Herman, M., Devine, P.G.: A meta-analysis of procedures to change implicit measures. J. Personal. Soc. Psychol. Attitudes Soc. Cognit. 117(3), 522–559 (2019)
Gabriel, I.: Artificial intelligence, values, and alignment. Minds Mach. 30, 411–437 (2020)
Hager, G.D., Drobnis, A., Fang, F., Ghani, R., Greenwald, A., Lyons, T., Parkes, D.C., Schultz, J., Saria, S., Smith, S.F.: Artificial intelligence for social good. arXiv:1901.05406 (2019)
Haidt, J.: The moral emotions. In: Davidson, R.J., Scherer, K.R., Goldsmith, H.H. (eds.) Handbook of Affective Sciences, pp. 852–870. Oxford University Press, Oxford (2003)
Hauser, M., Cushman, F., Young, L., Jin, R.K.-X., Mikhail, J.: A dissociation between moral judgments and justifications. Mind Lang. 22(1), 1–21 (2007)
Huemer, M.: Revisionary intuitionism. Soc. Philos. Policy 25(1), 368–392 (2007)
IEEE: IEEE code of ethics. https://www.ieee.org/about/corporate/governance/p7-8.html. Accessed Jun 2023 (2020)
Jobin, A., Ienca, M., Vayena, E.: The global landscape of AI ethics guidelines. Nat. Mach. Intell. 1, 389–399 (2019)
Jonker, J.D.: Automation, alignment, and the cooperative interface. J. Ethics (2023). https://doi.org/10.1007/s10892-023-09449-2
Kneer, M., Skoczen, I.: Outcome effects, moral luck and the hindsight bias. Cognition 232, 1–21 (2023)
Luetge, C., Rusch, H., Uhl, M.: Experimental Ethics: Toward an Empirical Moral Philosophy. Palgrave Macmillan, Houndmills, Basingstoke (2014)
Machery, E.: Philosophy Within Its Proper Bounds. Oxford University Press, Oxford (2017)
Mata, A.: Social metacognition in moral judgment: decisional conflict promotes perspective taking. J. Pers. Soc. Psychol. 117(6), 1061–1082 (2019)
May, J.: Regard for Reason in the Moral Mind. Oxford University Press, Oxford (2018)
Mercier, H., Sperber, D.: The Enigma of Reason. Harvard University Press, Cambridge (2017)
Mittelstadt, B.: Principles alone cannot guarantee ethical AI. Nat. Mach. Intell. 1, 501–507 (2019)
Morley, J., Elhalal, A., Garcia, F., Kinsey, L., Moekander, J., Floridi, L.: Ethics as a service: a pragmatic operationalisation of AI ethics. Minds Mach. 31, 239–256 (2021)
Dubljević, V., Douglas, S., Milojevich, J., Ajmeri, N.: Moral and social ramifications of autonomous vehicles: a qualitative study of the perceptions of professional drivers. Behav. Inf. Technol. 42, 1271–1278 (2023). https://doi.org/10.1080/0144929X.2022.2070078
Morling, B.: Research Methods in Psychology: Evaluating a World of Information. W.W. Norton & Company, New York (2018)
O’Neil, C.: Weapons of Math Destruction. Crown, New York (2016)
OpenAI: GPT-4 technical report. https://cdn.openai.com/papers/gpt-4.pdf. Accessed May 2023 (2023)
Pflanzer, M., Traylor, Z., Lyons, J.B., Dubljević, V., Nam, C.S.: Ethics in human-AI teaming: principles and perspectives. AI Ethics 3, 917–935 (2022)
Polonioli, A., Vega-Mendoza, M., Blankinship, B., Carmel, D.: Reporting in experimental philosophy: current standards and recommendations for future practice. Rev. Philos. Psychol. 12, 49–73 (2021)
Rahwan, I.: Society-in-the-loop: programming the algorithmic social contract. Ethics Inf. Technol. 20, 5–14 (2018)
Rini, R.: Debunking debunking: a regress challenge for psychological threats to moral judgment. Philos. Stud. 173, 675–697 (2016)
Rosenthal, J.: Experimental philosophy is useful—but not in a specific way. In: Luetge, C., Rusch, H., Uhl, M. (eds.) Experimental Ethics: Toward an Empirical Moral Philosophy, pp. 211–226. Palgrave Macmillan, Houndmills, Basingstoke (2014)
Russell, S.: Human Compatible: Artificial Intelligence and the Problem of Control. Penguin, New York (2019)
Sauer, H.: Moral Judgments as Educated Intuitions. MIT Press, Cambridge (2017)
Savulescu, J., Gyngell, C., Kahane, G.: Collective reflective equilibrium in practice (CREP) and controversial novel technologies. Bioethics 35, 652–663 (2021)
Seligman, M.E.P.: Flourish: A Visionary New Understanding of Happiness and Well-Being. Free Press, New York (2011)
Sterelny, K., Fraser, B.: Evolution and moral realism. Br. J. Philos. Sci. 68(4), 981–1006 (2016)
Telkamp, J.B., Anderson, M.H.: The implications of diverse human moral foundations for assessing the ethicality of artificial intelligence. J. Bus. Ethics 178, 961–976 (2022)
The White House: Blueprint for an AI bill of rights. https://www.whitehouse.gov/ostp/ai-bill-of-rights/. Accessed May 2024 (2022)
Thompson, V., Turner, J.P., Pennycook, G.: Intuition, reason and metacognition. Cogn. Psychol. 63, 107–140 (2011)
Umbrello, S., van de Poel, I.: Mapping value sensitive design onto AI for social good principles. AI Ethics 1, 283–296 (2021)
Wallach, W., Allen, C.: Moral Machines: Teaching Robots Right from Wrong. Oxford University Press, Oxford (2009)
Webb, H., Patel, M., Rovatsos, M., Davoust, A., Ceppi, S., Koene, A., Dowthwaite, L., Portillo, V.: "It would be pretty immoral to choose a random algorithm": opening up algorithmic interpretability and transparency. J. Inf. Commun. Ethics Soc. 17(2), 210–228 (2019)
Wong, D.: Moral Relativity. University of California Press, Berkeley (1984)
Wright, J.C.: Tracking instability in our philosophical judgments: is it intuitive? Philos. Psychol. 26(4), 485–501 (2013)
Funding
Funding was provided by the NSF Division of Social and Economic Sciences (grant no. 2043612, awarded to V.D.).
Author information
Contributions
D.C.: ideation, conceptualization, and draft preparation. M.P.: review and editing. V.D.: supervision, review, and editing.
Ethics declarations
Conflict of interest
The authors declare no conflict of interest.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Cecchini, D., Pflanzer, M. & Dubljević, V. Aligning artificial intelligence with moral intuitions: an intuitionist approach to the alignment problem. AI Ethics (2024). https://doi.org/10.1007/s43681-024-00496-5