1 Introduction

Over the last decades, the rapid development and application of artificial intelligence (AI) have spawned a great deal of research focusing on various ethical aspects of AI (AI ethics) and on the prospects of implementing ethics into machines (machine ethics). The latter project can further be divided into theoretical debates on machine morality, conceptual work on hypothetical artificial moral agents (AMAs) (Malle 2016), and more technically oriented work on prototypical AMAs. Within the third branch, the vast majority of the technical work has centered on constructing agents based on deontology (Anderson and Anderson 2008; Noothigattu et al. 2018), consequentialism (Abel et al. 2016; Armstrong 2015), or hybrids of the two (Dehghani et al. 2008; Arkin 2007).

Fig. 1
Rough sketches of three basic ethical algorithms. a Consequentialism: Given an ethical dilemma E, a set of possible actions in E, and a way of determining the consequences of those actions and their resulting utility, the consequentialist algorithm will perform the action yielding the highest utility. b Deontology: Given an ethical dilemma E and a set of moral rules, the deontological algorithm will search for the appropriate rule for E and perform the action dictated by the rule. c Virtue ethics: By contrast, constructing an algorithm based on virtue ethics presents a seemingly intriguing puzzle

Virtue ethics has repeatedly been suggested as a promising blueprint for the creation of artificial moral agents (Berberich and Diepold 2018; Zagzebski 2010).

1.2 Previous work in artificial virtue

The various versions of virtue ethics have given rise to a rather diverse set of approaches to artificial virtuous agents, ranging from narrow applications and formalizations to more general and conceptual accounts. Of the work that explicitly considers virtue ethics in the context of AMAs, it is possible to identify five prominent themes: (1) the skill metaphor developed by Annas (2011), (2) the virtue-theoretic action guidance and decision procedure described by Hursthouse (1999), (3) learning from moral exemplars, (4) connectionism about moral cognition, and (5) the emphasis on function and role.

(1) The first theme is the idea that virtuous moral competence—including actions and judgments—is acquired and refined through active intelligent practice, similar to how humans learn and exercise practical skills such as playing the piano (Annas 2011; Dreyfus 2004). In a machine context, this means that the development and refinement of artificial virtuous cognition ought to be based on a continuous and interactive learning process, which emphasizes the “bottom-up” nature of moral development as opposed to a “top-down” implementation of principles and rules (Howard and Muntean 2017).

(2) The second theme, following Hursthouse (1999), is that virtue ethics can provide action guidance in terms of “v-rules” that express what virtues and vices command (e.g., “do what is just” or “do not do what is dishonest”), and offers a decision procedure in the sense that “An action is right iff it is what a virtuous agent would characteristically (i.e., acting in character) do in the circumstances” (Hursthouse (1999), p. 28). Hursthouse’s framework has been particularly useful as a response to the claim that virtue ethics is “uncodifiable” and does not provide a straightforward procedure or “moral code” that can be used for algorithmic implementation (Bauer 2020; Arkin 2007; Tonkens 2012; Gamez et al. 2020).

(3) The third theme is the recognition that moral exemplars provide an important source of moral education (Hursthouse 1999; Zagzebski 2010; Slote 1995). In turn, this has inspired a moral exemplar approach to artificial virtuous agents, which centers on the idea that artificial agents can become virtuous by imitating the behavior of excellent virtuous humans (Govindarajulu et al. 2019; Berberich and Diepold 2018).

(5) The fifth theme, exemplified by the work of Thornton et al. (2016), incorporates the virtue-theoretic emphasis on function and role in the design of automated vehicle control. Whereas consequentialism is used to determine vehicle goals in terms of costs (given some specified measure of utility), and deontology through constraints (e.g., in terms of rules), the weight of the applied constraints and costs is regulated by the vehicle’s “role morality.” In their view, role morality is a collection of behaviors that are morally permissible or impermissible in a particular context based on societal expectations (Radtke 2008). To illustrate, if an ambulance carries a passenger in a life-threatening condition, it is morally permissible for the vehicle to break certain traffic laws (e.g., to run a red light). However, although the work elegantly demonstrates how the moral role of artificial systems can be determined by the normative expectations related to their societal function, it ignores essential virtue-theoretic notions such as virtuous traits and learning.

To conclude our survey of previous work in artificial virtue, it remains relatively unclear how one could construct and implement AMAs based on virtue ethics. Besides numerous conceptual proposals (Stenseke 2021), applications in simple classification tasks (Howard and Muntean 2017), a formalization of learning from exemplars (Govindarajulu et al. 2019), and a virtue-theoretic interpretation of role morality (Thornton et al. 2016), no work has described in detail how core tenets of virtue ethics can be modeled and integrated in a framework for perception and action that in turn can be implemented in moral environments.

2 Materials and methods

2.1 Artificial virtuous agents

Fig. 2
Model of the presented artificial virtuous agents. Environmental input is classified by the input network and parsed to the relevant virtue network (\(V1-V5\)). The invoked virtue then determines what action the agent will perform. The outcome of the performed action is then evaluated by the eudaimonic value function, which in turn informs the phronetic learning system whether and how the invoked virtue should be reinforced. The figure also shows three additional pathways for learning feedback: training of the input network and the outcome network based on eudaimonic reward, and learning from moral exemplars. Figure adapted with permission from Stenseke (2021)

In this section, we describe a generic computational model that will guide the development of the AVAs used for experimental implementation. It draws on a combination of insights from previous work, in particular the weight analogy of Thornton et al. (2016), the classification method of Guarini (2006), the “dispositional functionalism” explored by Howard and Muntean (2017), and the eudaimonic approach to “android arete” proposed by Coleman (2001) and Stenseke (2021). Essentially, the model is based on the idea that dispositional virtues can be functionally carried out by artificial neural networks that, in the absence of a moral exemplar, learn from experience in light of a eudaimonic reward by means of a phronetic learning system. Virtues determine the agent’s action based on input from the environment and, in turn, receive learning feedback based on whether the performed action increased or decreased the agent’s eudaimonia. The entire model consists of six main components that are connected in the following way (Fig. 2), with a minimal code sketch of the resulting loop given after the list:

  1. (1)

    Input network—In the first step, environmental input is received and parsed by the input network. Its main purpose is to classify input in order to transmit it to the most appropriate virtue network. As such, it reflects the practical wisdom “understanding of situation”: to know how or whether a particular situation calls for a particular virtue. This includes the ability to know what situation calls for what virtue (e.g., knowing that a situation requires courage) and to know whether a situation calls for action or not. Since this capacity depends on the agent’s ability to acquire, process, and analyze sensory information from the environment, it can be understood as a form of “moral recognition” within a general network of perception (Berberich and Diepold (2018) have explored a similar capacity for “moral attention”).

  2. (2)

    Virtue networks—In the second step, the virtue network classifies the input so as to produce the most suitable action. In the minimal case, a virtue can be represented as a binary classifier (perceptron) that, given some linearly separable input, determines whether an agent acts in one way or another. In a more complex case, a virtue can be an entire network of nodes that outputs one of several possible actions based on more nuanced environmental input.

  3. (3)

    Action output—In the third step, the agent executes the action determined by the invoked virtue network. In principle, the output could be any kind of action—from basic acts of movement and communication to longer sequences of coordinated actions—depending on the environment and the agent’s overall functionality and purpose.

  4. (4)

    Outcome network—In the fourth step, the outcome network receives new environmental input in order to classify the outcome of the performed action. As such, it reflects the practical wisdom “understanding of outcome,” i.e., the ability to understand the results of a certain action. Practically, the outcome network might utilize the same mechanisms for perception as the input network, but whereas the latter focuses on understanding a situation prior to any action performed by the agent, the former focuses on understanding situations that follow action execution.

  5. (5)

    Eudaimonic reward system—In the fifth step, the classified outcome is evaluated by the eudaimonic reward system. To do so, we make a functional distinction between eudaimonic type (e-type) and eudaimonic value (e-value). E-type is defined as the values or goals the agent strives toward, and e-value is a quantitative measure of the amount of e-type an agent has obtained. For instance, an e-type can be modeled as a preference to decrease the blame and increase the praise an agent receives from other agents. For this to work functionally, the agent must thus (a) be able to get feedback on its actions and (b) be able to qualitatively identify that feedback as blame or praise. Given an e-type based on blame/praise, the agent’s e-value will decrease if it receives blame, and conversely, increase if it receives praise. A more detailed account of this process is provided in Sect. 2.3.

  6. (6)

    Phronetic learning system—Although phronesis (“practical wisdom”) is a fairly equivocal term in virtue ethics, in our model, we take it to broadly represent the learning that artificial agents acquire from experience. Based on our functional conception of virtues and eudaimonic reward, the central role of the phronetic learning system is to refine the virtues based on eudaimonic feedback. Simply put, if a certain action led to an increase in e-value, the virtue that produced the action will receive positive reinforcement; if e-value decreased, the virtue receives punishment. As a result, the learning feedback will either increase or reduce the probability that the agent performs the same action in the future (given that it faces a similar input).
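
To make the interplay between the six components concrete, the following minimal Python sketch wires them together in a single perception–action–learning loop. It is an illustrative skeleton rather than a description of the implementation used in our experiments: all class, method, and attribute names (including the `act` and `reinforce` methods of the virtue objects and the `environment.execute` call) are placeholders, and the classification, reward, and environment interfaces are left abstract.

```python
class AVA:
    """Skeleton of the six-component loop (all names are illustrative)."""

    def __init__(self, virtues):
        self.virtues = virtues   # e.g., {"courage": ..., "generosity": ..., "honesty": ...}
        self.e_value = 0.0       # cumulative eudaimonic value

    def classify_input(self, observation):
        """(1) Input network: decide which virtue the situation calls for."""
        raise NotImplementedError

    def classify_outcome(self, observation):
        """(4) Outcome network: interpret the state that follows the action."""
        raise NotImplementedError

    def eudaimonic_reward(self, outcome):
        """(5) Eudaimonic reward system: score the outcome in light of the e-type."""
        raise NotImplementedError

    def step(self, observation, environment):
        virtue_name = self.classify_input(observation)        # (1) moral recognition
        action = self.virtues[virtue_name].act(observation)   # (2) virtue selects an action
        new_observation = environment.execute(action)         # (3) action output
        outcome = self.classify_outcome(new_observation)      # (4) understanding of outcome
        reward = self.eudaimonic_reward(outcome)               # (5) eudaimonic evaluation
        self.e_value += reward
        self.virtues[virtue_name].reinforce(reward)            # (6) phronetic learning
        return action, reward
```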

In reinforcement learning (RL) terms, an AVA can be modeled as a discrete-time stochastic control process (e.g., a Markov decision process, MDP). In this sense, dispositions can be viewed as probabilistic tendencies to act in a certain way, and after training, a virtuous agent is a policy network that gives definite outputs given particular inputs. More formally, an AVA is a 5-tuple \(\langle S, A, \Phi , P, \gamma \rangle\), where:

  • S is a set of states (state space).

  • A is a set of actions (action space).

  • \(\Phi _a(s,s')\) is the eudaimonic reward the agent receives transitioning from state s to \(s'\) by performing action a.

  • \(P_a(s,s') = Pr(s' \mid s,a)\) is the probability of transitioning from \(s \in S\) to \(s' \in S\) given that the agent performs \(a \in A\).

  • \(\gamma\) is a discount factor (\(0 \le \gamma \le 1\)) that specifies whether the agent prefers short- or long-term rewards.

The goal of an AVA is to maximize the eudaimonic reward (\(\Phi\)) over some specific time horizon. To do so, it needs to learn a policy—a function \(\pi (s, a)\) that determines what action to perform given a specific state. For instance, if the agent should maximize the expected discounted reward from every state arbitrarily into the future, the optimal policy \(\pi ^*\) can be expressed as:

$$\begin{aligned} \pi ^* := {{\,\mathrm{argmax}\,}}_\pi E \left[ \sum _{t=0}^{\infty } \gamma ^{t}\Phi (s_{t},a_{t}) \Bigg | \pi \right] . \end{aligned}$$
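
In code, the optimization target above can be approximated by straightforward Monte Carlo rollouts. The sketch below assumes a generic environment interface (`reset`/`step`) and a policy function; both are illustrative assumptions rather than part of the formal model.

```python
def discounted_return(rewards, gamma):
    """Compute sum_t gamma^t * Phi_t for a single episode."""
    return sum((gamma ** t) * phi for t, phi in enumerate(rewards))


def estimate_policy_value(policy, env, gamma=0.95, episodes=100, horizon=200):
    """Monte Carlo estimate of E[ sum_t gamma^t Phi(s_t, a_t) | policy ].

    `env.reset()` and `env.step(action)` are assumed interfaces returning an
    initial state and a (next_state, eudaimonic_reward) pair, respectively.
    """
    total = 0.0
    for _ in range(episodes):
        state = env.reset()
        rewards = []
        for _ in range(horizon):
            action = policy(state)
            state, phi = env.step(action)
            rewards.append(phi)
        total += discounted_return(rewards, gamma)
    return total / episodes
```

Maximizing this estimate over a parameterized family of policies (e.g., via gradient ascent or evolutionary search) then approximates \(\pi ^*\).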

The purpose of the model is to provide a blueprint that can be simplified, augmented, or extended depending on the overall purpose of the artificial system and its environment. After all, there are many aspects of system design that are most suitably driven by practical considerations of the particular moral domain at hand, as opposed to more morally or theoretically oriented reasons. We will briefly address three practical considerations we prima facie believe to be critical for implementation success (additional considerations and extensions will be further discussed in Sect. 4).

Static vs dynamic—The first is to decide whether and to what extent certain aspects of the AVA should be static or dynamic. That is, what features should be “hard-coded” and remain unchanged, and what features should be able to change over time. At one extreme, the entire topology of the integrated network and all of its parameters could in principle evolve independently using randomized search methods such as evolutionary computation [as suggested by Howard and Muntean (2017)], or continuously develop through trial-and-error using deep RL or NEATs (Berner et al. 2005). Alternatively, the capacities could be carried out by designated subsystems, e.g., using actor–critic methods (Witten 1977; Grondman et al. 2012), where the input and outcome networks function as value-focused “critics” to the policy-focused “actor” (the virtue network). While the actor is driven by policy gradient methods (optimizing parameterized policies), which allows the agent to produce actions in a continuous action space, this might result in large variance in the policy gradients. To mitigate this variance, the critic evaluates the current policy devised by the actor, so as to update the parameters in the actor’s policy toward performance improvement. Beyond RL methods, designated input and outcome networks could also be trained through supervised learning, provided that there are labeled datasets of situations, outcomes, or even pre-given judgments of actions themselves related to the specified e-type. To illustrate, if e-type is construed as a preference to decrease blame and increase praise received from the environment, the networks could be trained on data labeled as “praise” or “blame” and, in turn, learn to appropriately classify the relevant situation and outcome states.
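
As an example of the supervised option just mentioned, the following sketch trains a simple perceptron-style outcome classifier on feature vectors labeled as praise (1) or blame (0). The feature encoding, function names, and hyperparameters are illustrative assumptions; any standard classifier could take its place.

```python
import numpy as np


def train_outcome_classifier(X, y, lr=0.1, epochs=50):
    """Perceptron mapping outcome features to praise (1) / blame (0).

    X: (n_samples, n_features) array of encoded outcome descriptions.
    y: array of 0/1 labels derived from "blame"/"praise" annotations.
    """
    rng = np.random.default_rng(0)
    w = rng.normal(scale=0.01, size=X.shape[1])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            pred = 1 if xi @ w + b >= 0 else 0
            error = yi - pred          # 0 if correct, +/-1 if wrong
            w += lr * error * xi       # standard perceptron update
            b += lr * error
    return w, b


def classify_outcome(x, w, b):
    return "praise" if x @ w + b >= 0 else "blame"
```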

Learning from observation and moral exemplars—Beyond the situations, outcomes, and actions of the agent itself, other valuable sources for learning can be found in moral exemplars and, more generally, in the behavior and experiences of others. Intuitively, a great deal of moral training can be achieved by observing others, whether they are exemplars or not. If an artificial agent can observe that the outcome of a specific action performed by another agent increased some recognizable e-value (given a specific situation), the observation could serve as a signal for the observer to reinforce the same behavior (given that a sufficiently similar situation appears in the future). Conversely, if the recognizable e-value decreased, the agent would learn to avoid repeating the same action. Learning from moral exemplars could also be achieved given that the agent has (i) some way of identifying and adopting suitable exemplars and (ii) some way of learning from them. Besides the solutions offered by Govindarajulu et al. (2019) and Berberich and Diepold (2018), Stenseke (2021) has suggested that exemplars can be identified through e-type and e-value. Specifically, an agent a takes another agent b as a moral exemplar iff (1) a and b share the same e-type and (2) b has a higher e-value than a. Condition (1) ensures that a and b share the same goals or values, while condition (2) means that agent b has been more successful in achieving those goals or values. Even if such conditions are difficult to model in the interaction between humans and artificial agents (humans do not normally have an easily formalizable e-type), they can be suitably incorporated in the interaction between different artificial agents.
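
The two exemplar conditions translate directly into code. The sketch below assumes agents carry `e_type`, `e_value`, and a dictionary of virtue values (illustrative names); copying the exemplar’s virtue values is the simple form of exemplar learning also used in our simulation cycle (Sect. 2.3.4).

```python
def adopt_exemplar(agent, candidate):
    """Adopt `candidate` as a moral exemplar iff (1) the e-types match and
    (2) the candidate has a strictly higher e-value. Learning from the
    exemplar is modeled here as copying its virtue values."""
    if candidate.e_type == agent.e_type and candidate.e_value > agent.e_value:
        agent.virtues = dict(candidate.virtues)  # copy, do not alias
        return True
    return False
```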

2.2 Ethical environment: BridgeWorld

Fig. 3
Illustration of BridgeWorld: a central home island connected by bridges to four surrounding food islands

In this section, we introduce BridgeWorld as the ethical environment used for our simulation experiments. BridgeWorld consists of five islands connected by four bridges: a central island where agents live and four surrounding islands where food grows (Fig. 3). Since there is no food on the central island, agents need to cross bridges to the surrounding islands and collect food in order to survive. However, every time an agent crosses a bridge, there is a chance that they fall into the ocean. The only way a fallen agent can survive is for another passing agent to rescue them, at the risk of falling into the water themselves. Besides drowning, agents in BridgeWorld can also die from starvation. An agent starves when their energy reserve (R) is less than 1 at the time of eating. Eating is imposed by the simulation, which means that agents cannot avoid starvation by simply not eating. To complicate things further, food only exists on one island at a given time. Agents are able to remember the current location of food (provided that they have found it) but have no chance of predicting where it will be once the food location is updated. Each day (or simulation cycle), the agents who inhabit BridgeWorld do the following four things (a minimal sketch of this daily loop is given after the list):

  1. (1)

    Move from the central island to one of the four surrounding islands

  2. (2)

    If an agent moved to the island where food currently grows, the agent receives food (R increases by one FoodValue, FV)

  3. (3)

    Move back to the central island

  4. (4)

    Eat (R decreases by 1)
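
The sketch below covers only these four basic steps; bridge crossings with falling risk and the three interactions are treated separately in Sect. 2.3. Agent attributes such as `reserve`, `known_food_island`, and `alive` are illustrative names.

```python
import random

FOOD_ISLANDS = [1, 2, 3, 4]  # island 0 is the central home island


def run_day(agents, food_island, food_value=1.0):
    """One simplified BridgeWorld day: move out, collect, move back, eat."""
    for agent in agents:
        if not agent.alive:
            continue
        # (1) Move out: go to the remembered food island, or guess at random.
        if agent.known_food_island is not None:
            target = agent.known_food_island
        else:
            target = random.choice(FOOD_ISLANDS)
        # (2) Collect food if the chosen island is the one where food grows.
        if target == food_island:
            agent.reserve += food_value
            agent.known_food_island = food_island
        # (3) Move back to the central island (no falling risk on the way back).
        # (4) Eat: starve if the reserve cannot cover the meal, otherwise consume 1.
        if agent.reserve < 1:
            agent.alive = False
        else:
            agent.reserve -= 1
```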

During each day, agents in BridgeWorld are also able to interact with each other in three ways:

Ask about food location—Before moving from the central island, agents can ask one other randomly selected agent about the food location, provided that they do not know the current location themselves. In turn, the asked agent can react in one of two ways: (i) tell where the food is (given that they know themselves) or (ii) lie about where the food is (regardless of whether they in fact know where the food is).

Call for help—If they happen to fall into the water while moving to one of the islands, agents can call for help from the other agents who crossed the same bridge on the same day. The asked agents can in turn react by either (i) trying to rescue the agent from drowning or (ii) ignoring the call. If an agent tries to rescue, there is a risk that both agents drown (given by the current StreamLevel, SL).

Beg for food—Before eating, agents can beg one other randomly selected agent for food. The asked agent can react by either (i) giving food to the begging agent or (ii) ignoring the request. If the agent chooses to give, their R is decreased by one FV while the begging agent’s R is increased by one FV.

BridgeWorld thus presents a virtual environment with three simple moral dilemmas that can give rise to complex behavioral dynamics, such as the “Tragedy of the Commons” phenomenon (Hardin 1968). If agents in BridgeWorld only act in their own self-interest—e.g., by never trying to rescue even if there is a small risk of drowning, or by never offering food to beggars even if they have a surplus—resources would not be utilized efficiently at the collective level (resulting in more overall deaths). At the other extreme, if agents always act selflessly, it could instead result in more overall casualties due to self-sacrificial behaviors, e.g., from rescue attempts without any regard for the ocean current, or from giving food to beggars only to be left without any food for oneself. Furthermore, in a mixed population, selfless agents would be exploited by self-interested freeloaders.

The motivation behind BridgeWorld is to create an environment where morality becomes a problem of cooperation among self-interested individuals. Essentially, in order for agents on BridgeWorld to prosper as a collective, they have to effectively balance altruism and self-interest while also having some way of suppressing exploiters. This view is best articulated in the work of Danielson (2002) and Leben (2018). According to Danielson, agents become moral when they “constrain their own actions for the sake of benefits shared with others” [Danielson (2002), p. 192]. In the same vein, Leben writes that morality emerged “in response to the problem of enforcing cooperative behavior among self-interested organisms” [Leben (2018), p. 5].

Table 1 Simplified payoff matrix for the three moral dilemmas presented in BridgeWorld: (1) Ask about food location, (2) Call for help, and (3) Beg for food. F stands for food and D stands for likelihood of drowning

BridgeWorld draws on a combination of game-theoretic models of strategic interaction, such as the prisoner’s dilemma (Axelrod and Hamilton 1981), hawk–dove (Smith and Price 1973), public goods (Szolnoki and Perc 2010), and the commonize costs–privatize profits (CC-PP) game (Hardin 1985; Loáiciga 2004). More precisely, each of the three possible interactions in BridgeWorld is a single-person decision game with unique payoffs for two players, where the payoffs represent different goods that are vital for self-preservation (Table 1). Alternatively, each interaction can be viewed as a two-player game (e.g., prisoner’s dilemma or hawk–dove) where only one player chooses to either help (cooperate) or not (defect), and the asking agent has no choice (e.g., having fallen into the water or having no food to eat). While the choosing agents have no rational self-interest to cooperate in the short term, they rely on the mercy of other agents if they were to ask for help themselves. The consequences of selfish behavior may therefore aggregate over time and generate a public tragedy of the commons (“public” in the sense that everyone suffers from the selfish behavior of everyone else). BridgeWorld thus differs from CC-PP and public goods games since results are not aggregated at every instance, but over many instances of single-person decisions over time.

For several decades, game theory and its many extensions (e.g., evolutionary game theory and spatial game theory) have provided a wealth of insight into the study of behavior in economics, biology, and social science (Nash et al. 1950; Holt and Roth 2004). Particularly relevant for the purpose of this work are the game-theoretic contributions to our understanding of how cooperation can emerge and persist among self-interested individuals (Axelrod and Hamilton 1981; Nowak 2006; Fletcher and Doebeli 2009; Helbing et al. 2010). A related branch of simulation-based research has explored the emergence and propagation of norms in multi-agent systems (Savarimuthu and Cranefield 2011; Morris-Martin et al. 2019; Santos et al. 2016; Pereira et al. 2017). More recently, the field of multi-agent reinforcement learning has investigated methods that can be used to foster collective good, even in the absence of explicit norms (Wang et al. 2020).

However, as opposed to conventional game-theoretic models, the behavior of virtuous agents in BridgeWorld does not depend on pre-given strategies which could be analyzed in terms of Nash equilibria for the different payoff matrices and goods. Instead, agent behavior depends on dispositional virtues, which may develop over time according to a certain conception of eudaimonia. Similarly, while the environment shares a common overarching aim with other simulation-based work on norms and social behavior, no previous work has focused on the concept of virtue, and the role it might play in the context of ethical behavior.

2.3 Virtuous agents in BridgeWorld

In this section, we describe how a version of the computational model is implemented in the BridgeWorld environment and provide the technical details of the experimental setup.

2.3.1 Virtues and virtuous action

As described in the previous section, agents in BridgeWorld are able to move, collect food, remember the food location, and eat. Besides these basic abilities, the agents are also able to initiate interactions with other agents in three ways and to react to each in two ways. Whether an agent initiates an interaction (ask about food location, call for help, beg for food) and how they respond to such initiations (tell/lie, help/ignore, give/ignore) are determined by their dispositional virtues. Each agent is equipped with three virtues that relate to the three types of interaction: courage, generosity, and honesty. Following Aristotle’s virtue-theoretic concept of the “golden mean” (NE VI), each virtue represents a balance between deficiency and excess, modeled as a value between \(-1\) (deficiency) and \(+1\) (excess). Simply put, courage is modeled as the mean between cowardice and recklessness; generosity as the mean between selfishness and selflessness; and honesty as the mean between deceitfulness and truthfulness. In technical terms, each virtue is a threshold function f(x) that determines, given some input x, weight w, and threshold \(\psi\), whether an agent acts in one way or another:

$$\begin{aligned} f(x) = \left\{ \begin{array}{ll} 1 &{} \text {if}\, wx \ge \psi \\ 0 &{} \text {if}\, wx < \psi \\ \end{array} \right. \end{aligned}$$
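
In code, such a binary virtue reduces to a one-line threshold check; the function below is a direct transcription of the definition above (the parameter names are illustrative).

```python
def virtue_decision(x, weight, threshold=0.0):
    """Binary virtue 'perceptron': act one way (1) if the weighted input
    reaches the threshold psi, otherwise act the other way (0)."""
    return 1 if weight * x >= threshold else 0
```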

The virtues determine the actions of agents in the following ways (a compact code sketch of these decision rules is given after the list):

  • Food location

    • Initiation: If an agent a (i) does not know the current location of food and (ii) is not too selfless (\(generosity < 0.5\)), they ask a randomly selected agent b about the location.

      • Tell truth: If agent b (i) is sufficiently truthful (\(honesty > 0\)) and (ii) knows where the food currently is, they tell agent a where the location is.

      • Lie: If agent b is sufficiently deceitful (\(honesty < 0\)), they give agent a the wrong location.

  • Drowning

    • Initiation: If an agent a has fallen into the water, it calls for help. Every agent that crossed the same bridge as a during the same day is asked—one at a time in a randomly shuffled order—to help a. If an agent ignores the call for help, the next agent in the shuffled order is asked. Calls for help continue until either one agent attempts to save a or everyone ignores a.

      • Try to save: If agent b is more courageous than the current stream level, they try to save a (i.e., \(courage > SL\)). For each fallen agent, SL is given as a random number between \(-1\) and \(+1\).

      • Ignore: If the courage of agent b is lower than the stream level (\(courage < SL\)), they ignore a.

  • Sharing food

    • Initiation: If the BegFactor (BF) of an agent a is higher than the BeggingThreshold (BT), they ask agent b (selected at random) for food. BF is determined by the hunger of the agent multiplied by its selfishness such that:

      $$\begin{aligned} BF = \left( 1 - \frac{R}{M\!R} \right) \left( \frac{generosity + 1}{2}\right) \end{aligned}$$

      where R is the current energy reserve of the agent and \(M\!R\) is the maximum reserve. In this way, the hungrier and more selfish the agent is, the more likely it is to beg for food.

      • Give food: If agent b is sufficiently selfless (\(generosity > 0\)), they give food to a. Consequently, b’s R decreases by one FV, while a’s R increases by the same amount.

      • Ignore: If agent b is sufficiently selfish (\(generosity < 0\)), they ignore the agent.
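
The decision rules above can be summarized in a short sketch. Agents are assumed to expose their three virtue values as attributes (`honesty`, `courage`, `generosity`) together with `reserve` and `known_food_island`; these names, and the handling of unspecified boundary cases (e.g., honesty exactly 0), are our own illustrative choices.

```python
def wants_to_ask_location(agent):
    """Ask about the food location iff the agent does not know it and is not
    too selfless (generosity < 0.5)."""
    return agent.known_food_island is None and agent.generosity < 0.5


def respond_food_location(responder, knows_location):
    """Tell the truth iff honesty > 0 and the location is known; lie iff honesty < 0."""
    if responder.honesty > 0 and knows_location:
        return "tell_truth"
    if responder.honesty < 0:
        return "lie"
    return "ignore"  # boundary case (honesty == 0, or truthful but uninformed)


def respond_call_for_help(responder, stream_level):
    """Attempt a rescue iff courage exceeds the current StreamLevel."""
    return "try_to_save" if responder.courage > stream_level else "ignore"


def beg_factor(agent, max_reserve):
    """BF = (1 - R/MR) * (generosity + 1) / 2, as in the formula above."""
    return (1 - agent.reserve / max_reserve) * (agent.generosity + 1) / 2


def wants_to_beg(agent, max_reserve, begging_threshold):
    return beg_factor(agent, max_reserve) > begging_threshold


def respond_beg_for_food(responder):
    """Give food iff the responder is on the selfless side of the scale."""
    return "give" if responder.generosity > 0 else "ignore"
```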

2.3.2 Phronetic learning system

The learning system is invoked each time an agent responds to an action (tell truth/lie, try to save/ignore, give food/ignore). The learning function takes three input parameters: (a) the action taken by the agent (e.g., save), (b) the relevant virtue affected (e.g., courage), and (c) the magnitude of the event (e.g., StreamLevel). It then evaluates whether the performed action increased or decreased the agent’s e-value in light of its e-type. If e-value increased, the relevant virtue weight is reinforced, i.e., pushed further in its current direction: if the weight is \(> 0\), it is increased, and if the weight is \(< 0\), it is decreased. If e-value decreased, the inverse mechanism occurs. The amount a weight changes at each learning event is given by the learning rate (LR). The learning algorithm updates the weight in the threshold function, with the aim of finding the optimal state–action policy that maximizes cumulative e-value (\(\Phi\)):

$$\begin{aligned} {{\,\mathrm{argmax}\,}}_\pi E \left[ \sum _{t=0}^{\infty }\Phi (s_{t},a_{t}) \Bigg | \pi \right] . \end{aligned}$$
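
One way to read this update rule is sketched below: the weight of the virtue that produced the action is pushed further toward its current extreme when e-value increased, and back toward the opposite extreme when it decreased, scaled by the learning rate and the magnitude of the event. The exact use of the magnitude term and the clamping to \([-1, 1]\) are our assumptions.

```python
def phronetic_update(virtues, virtue_name, delta_e, magnitude=1.0, learning_rate=0.05):
    """Reinforce or punish the virtue weight that produced the last action.

    virtues: dict mapping virtue names to weights in [-1, 1] (illustrative).
    delta_e: change in e-value caused by the action's outcome.
    """
    w = virtues[virtue_name]
    direction = 1.0 if w >= 0 else -1.0
    if delta_e > 0:
        w += learning_rate * magnitude * direction  # reinforce current disposition
    elif delta_e < 0:
        w -= learning_rate * magnitude * direction  # push toward the opposite extreme
    virtues[virtue_name] = max(-1.0, min(1.0, w))
```
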
Fig. 4
Flowchart of the virtuous agents used in the experiments. Three types of input are parsed into three corresponding virtue weights, each holding a value between two extremes (from \(-1\) to \(+1\)). The weight of the virtue determines whether the agent produces the action related to excess (\(> 0\)) or deficiency (\(< 0\)). For instance, if a drowning agent calls for help, an agent with \(courage > SL\) will attempt to save the agent in need. However, if its e-type seeks to maximize its own chance of survival, the courage virtue will receive punishment, meaning that the agent will eventually learn to avoid trying to save others

2.3.3 Eudaimonia

We now describe three different e-types that will be used in the experiments:

Selfish (S)—An agent with a selfish e-type rewards actions that are beneficial only for the agent itself and punishes actions that go against the self-interest of the agent. That is, if a selfish agent S either tries to save a fallen agent or gives food to a begging other, \(\Phi\) decreases. Conversely, \(\Phi\) increases if S ignores agents begging for food or calling for help. In other words, S seeks to maximize its own energy reserve and minimize its risk of drowning. S is neutral to other agents asking for the food location since neither telling the truth nor lying has a direct impact on its self-interest. The selfish e-type is included merely for illustration, to compare its performance with that of the other e-types; after all, selfish agents only become “virtuous” in the sense that they learn to maximize their selfish behavior (based on a rather inappropriate conception of eudaimonia).

Praise/blame (\(P\!/\!B\))—An agent with a praise/blame e-type rewards actions that maximize the praise and minimize the blame received from others. \(P\!/\!B\) agents are able to communicate praise and blame in reaction to action responses, e.g., by praising honest agents for telling the truth, courageous agents for saving drowning agents, and generous agents for sharing their food; and correspondingly blaming liars, cowards, and egocentrics. For agents with a \(P\!/\!B\) e-type, these “reactive attitudes” in turn serve as the foundation for eudaimonic evaluation, which positively reinforces actions that lead to praise (increasing \(\Phi\)) and negatively reinforces actions that lead to blame (decreasing \(\Phi\)). The rationale behind this e-type stems from the central role reactive attitudes and social sanctions play in moral practices, such as holding each other responsible, condemning selfish freeloaders, and enforcing cooperation norms (Strawson 2008; Fehr and Fischbacher 2004).
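
The two e-types can be expressed as simple reward functions, as sketched below; the unit reward scale and the action labels are illustrative choices rather than fixed parameters of the model.

```python
def selfish_reward(action):
    """Selfish e-type: costly helping is punished, ignoring requests is rewarded,
    and honesty-related responses are neutral."""
    if action in ("try_to_save", "give"):
        return -1.0
    if action in ("ignore_help", "ignore_beg"):
        return 1.0
    return 0.0  # tell_truth / lie: no direct impact on self-interest


def praise_blame_reward(praise_received, blame_received):
    """Praise/blame e-type: e-value tracks the reactive attitudes of others."""
    return float(praise_received) - float(blame_received)
```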

2.3.4 Simulation cycle

At initialization, the virtue values are assigned randomly, e-value set to 0, and e-type set to one of the defined e-types. The entire flowchart of the computational model of the implemented agents is illustrated in Fig. 4. In summary, the following phases occur at each simulation cycle after initialization:

  • Ask for food location—Unless the food location was just updated, agents ask other agents about the location (given the conditions described earlier). The food location is updated every xth cycle, where x is given by FoodUpdateFrequency (FUF).

  • Move to food—The agents move to one of the four islands. If they do not know where food currently grows, they move to a randomly selected island. Every time an agent moves to a surrounding island, there is a chance that they fall into the water, given by FallingChance (FC).

  • Help fallen—Each time an agent attempts to save an agent from drowning (i.e., if \(courage > SL\)), a random value RV is drawn (ranging from \(-1\) to \(+1\)). If the random value is higher than the stream level (\(RV > SL\)), the rescue attempt is successful and both agents survive. However, if \(RV < SL\), both agents die. In this way, a higher stream level means a smaller chance of a successful rescue. For instance, the agent has a 100% chance of success if \(SL = -1\), and a 100% chance of failure if \(SL = 1\).

  • Collect food—If the agent moved to the island where food grows, its energy reserve increases by one FV.

  • Move back—Agents move back to the central island. For simplicity, there is no risk of falling into the water while moving back.

  • Beg for food—Agents can ask another agent for food (given that \(BF > BT\)).

  • Eating and starving—Every agent consumes food equal to 1 energy (\(R = R - 1\)). Note that FV received from collecting or begging is not necessarily equal to the energy they consume. An agent dies from starvation if their energy reserve is less than 1 at the time of consumption.

  • Moral exemplar—We have also implemented an additional possibility to copy the virtue values of moral exemplars based on the conditions proposed by Stenseke (2021). Simply put, an agent a is paired with another randomly selected agent b. If (i) the e-type of a is the same as the e-type of b, and (ii) b’s e-value is higher than the e-value of a, then a copies the virtue values of b.

  • Death and rebirth—To make up for the ruthless nature of BridgeWorld (where agents would otherwise starve or drown until the entire population goes extinct), we have also implemented a simple system of reproduction. If an agent has died during the cycle, it is either reborn as a cross between two randomly selected agents (\(P_{1}\) and \(P_{2}\)) or mutates (given by MutationChance, \(M\!C\)). In the case of the former, the new agent (or child, C) inherits the arithmetic mean of the parents’ virtue values (\(P(V_{1}, V_{2}, ... V_{n})\)) such that \(C(V_{1}) = \frac{P_{1}(V_{1}) + P_{2}(V_{1})}{2}, C(V_{2}) = \frac{P_{1}(V_{2}) + P_{2}(V_{2})}{2}\), and so on. In the case of mutation, the virtue values are instead assigned randomly (between \(-1\) and 1); a minimal sketch of this rebirth rule is given below.
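
A minimal sketch of the rebirth rule follows. The dictionary representation of virtue values and the attribute names are illustrative; resetting the new agent’s e-value and e-type is handled as part of initialization.

```python
import random

VIRTUE_NAMES = ("courage", "generosity", "honesty")


def reborn_virtues(population, mutation_chance):
    """Virtue values for a dead agent's replacement: with probability
    MutationChance draw fresh random values, otherwise inherit the arithmetic
    mean of two randomly selected parents."""
    if random.random() < mutation_chance:
        return {v: random.uniform(-1, 1) for v in VIRTUE_NAMES}
    p1, p2 = random.sample(population, 2)
    return {v: (p1.virtues[v] + p2.virtues[v]) / 2 for v in VIRTUE_NAMES}
```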

3 Results

Table 2 Parameter values used in the experiments
Table 3 Results showing the mean death rate and standard deviation at iteration 1000 of ten repeated experiments in seven different conditions. \(N\!L =\) baseline without learning nor eudaimonia, \(S =\) selfish e-type, \(P\!/\!B =\) praise/blame e-type, \(S\!/\!S =\) selfish/selfless e-type. E denotes additional learning from exemplars
Fig. 5
Results showing the average virtue values of the entire population and death rate (red) in seven different experiments with 100 agents over 1000 iterations. Death rate is given by \(\frac{TD}{CI} \div 5\), where TD counts the total deaths, CI is the current iteration, and 5 is simply a factor used for easier comparison in the graphs

We conducted several simulation experiments based on the model. Besides the three e-types described in Sect. 2.3.3, additional experiments were performed with agents without any learning or e-type, so as to provide a baseline that only captures genetic drift (referred to as \(N\!L\) in Table 3). Each experiment ran for 1000 iterations with non-mixed populations fixed at 100 agents. The values of the experimental parameters are given in Table 2. Parameter values for FC (chance of falling) and FV (food value) were fine-tuned to create an equal balance between starvation and drowning. Figure 5 shows the average death rate and virtue values of the entire population over 1000 iterations in seven different experimental conditions: (a) baseline without learning, (b) selfish e-type, (c) selfish e-type with moral exemplars, (d) praise/blame e-type, (e) praise/blame e-type with moral exemplars, (f) selfish/selfless e-type, and (g) selfish/selfless e-type with moral exemplars. Table 3 shows the mean death rate and standard deviation at iteration 1000 of ten repeated experiments using the same conditions.

The results show that the selfish–selfless hybrid (\(S\!/\!S\)) was the most successful in terms of mean death rate and achieved an even lower death rate with the use of exemplars (\(S\!/\!S + E\)). The use of moral exemplars had, however, the opposite effect for agents with praise/blame (\(P\!/\!B\)) and selfish (S) e-types. While exemplars appear to have a stabilizing effect for \(S\!/\!S\), they generated more volatile fluctuations in the virtue values for \(P\!/\!B\) [Fig. 5 (e) and (g)]. As expected, the selfish e-type had the highest death rate, since the selfish agents did not try to rescue others regardless of risk, nor share any food regardless of surplus. A similar effect can be noticed in the evolutionary condition (\(N\!L\)). Somewhat more unexpected is the result from the repeated experiments (Table 3), showing that \(P\!/\!B\) only performed marginally better than agents who did not receive any learning at all (\(N\!L\)). This can partly be explained by the fact that \(P\!/\!B\) promotes sacrificial recklessness and selflessness in a relatively ruthless environment. For the \(N\!L\) agents, by contrast, the most sacrificial behavior disappears due to the evolutionary pressure of the simulation: the most sacrificial agents have a higher chance of dying and will therefore not reproduce as much (in Fig. 5, the average courage value for \(N\!L\) after 1000 iterations was \(-0.45\), and \(-0.28\) for generosity).

4 Discussion

The experimental results show how the AVAs learn to tackle cooperation problems while exhibiting some core features of their theoretical counterpart, including moral character (in terms of dispositional virtues), learning from experience (through phronetic learning feedback), the pursuit of eudaimonia (increasing e-value in light of e-type), and learning from moral exemplars (by copying the virtues of excellent others). More importantly, the results illustrate how virtue ethics can be used to conceptualize and implement traits that support and foster ethical behavior and moral learning in artificial moral agents. Beyond the presented implementation, we believe that the development of more refined and sophisticated systems based on virtue ethics can be guided by our framework. In essence, virtue ethics offers an appealing recipe for AMAs situated in complex and dynamic environments where learning provides the most suitable (or only) path to moral sensitivity, a sensitivity that cannot easily be captured in rules nor in the mere maximization of some utility.

Since the training of virtues is based on outcomes, one might argue that our model is in fact consequentialism in disguise (i.e., consequentialism with the addition of learning). However, while our eudaimonist version of virtue ethics depends on increasing some identified moral goods (evaluated through outcomes), the main focus is the dispositional features of agents (their moral character) and the learning that gives rise to it (moral development), which are in turn used to yield virtuous behavior. That is, although outcomes are used to guide learning, it is the agent’s character that produces actions. One could also argue that our model is in fact deontology in disguise; that it is built using a rule-adhering system (based on the view that all computational systems are essentially rule-following systems), or that it even employs conditional rules for moral behavior (“if \(courage > 0 \rightarrow\) save agent”). But this would also miss the point. The purpose of dispositional traits is not to generate rule-following behavior per se—although this would in many cases be an attractive feature—but instead, to yield context-sensitive behavior where conditional rules are not applicable. Furthermore, while the smallest components of artificial as well as biological neural networks may be rule-following in the strict sense (neurons or nodes), the complex behavior of entire networks is not. More importantly, we do not claim that virtue ethics is in some fundamental sense superior to consequentialism or deontology as a recipe for AMAs, but rather that it emphasizes features of morality that are often ignored in machine ethics. An artificial agent would, after all, only be virtuous if it respected important moral rules and was considerate of the outcomes of its actions. Following Tolmeijer et al. (2020), we believe that a hybrid approach to AMAs would be the most attractive option, since it could utilize the combined strengths of the three dominant theories. With that said, since virtue ethics paints a more comprehensive picture of what a moral character in fact is, we believe it offers an appealing blueprint for how different theories could be integrated into a unified whole.

However, many issues remain to be resolved before AVAs can enter the moral realms of our everyday lives. One might, for instance, question whether and to what extent our model and experiments bear any relevant resemblance to the complex moral domains humans face in the real world. The viability of our framework rests on a number of non-trivial assumptions: (i) that there are accurate mappings between input, virtues, action, outcome, and feedback (connections that can be immensely more difficult to establish in real-world applications); (ii) that conflicts between two or more virtues are dealt with in a satisfactory manner (an issue we ignore by only relating one type of input to one particular virtue); (iii) that virtues can be adequately represented as a mean between two extremes; and (iv) that relevant moral goods and values can be formalized and quantified in terms of e-types and e-values. One might also question the use of multi-agent simulations as a tool to develop and study artificial moral behavior. For instance, the evaluation of our experiments depends on death rate, a metric which can only be attained at the system level and not by the agents themselves. In reality, such system-level metrics might be hard to come by. Furthermore, to simply focus on seemingly crude utilitarian metrics such as death rate might lead one to ignore other important values—e.g., equality, freedom, respect, and justice—that deserve moral consideration on their own (and not in terms of how they support a low death rate). More generally, the game-theoretic setting also points to a more problematic meta-ethical issue regarding the function and evaluation of moral behavior, and implicitly, about the nature of morality itself. Since evolutionary game theory provides a functional explanation of morality as “solving the problem of cooperation among self-interested agents,” it naturally lends itself to system-level utilitarian evaluation (in the way evolution is driven by “survival of the fittest”). But it should be stressed that this explanation has critical limitations and is one of several possible alternatives. For instance, while evolutionary game theory can describe the “low-level” function of morality in deflationary terms, it fails to account for the specific cognitive resources that some species evolved to meet their species-specific challenges and coordination problems (e.g., social-psychological capacities such as emotions and empathy); nor does it by itself offer any clear insight into the moral resources and norms that cultures developed to meet their culture-specific challenges (e.g., norms involving guilt, shame, or justice). The main point is that, while the evaluation of our virtuous agents depends on system-level prosperity in utilitarian terms (in the sense that virtuous behavior supports system-level preservation), this is not the only possible route for evaluating virtuous behavior. For implementations intended to enact other dimensions of ethical behavior, it might, for a variety of reasons, be more suitable to use “human-in-the-loop” forms of evaluation.

A further limitation of our experiment is that it treats events and processes as discrete and sequential, as opposed to the continuous and parallel processes that characterize non-virtual reality. These types of issues echo more profound challenges in real-world applications of RL; while RL methods continue to excel in virtual settings, these advancements are often difficult to translate into real-world environments (Dulac-Arnold et al. 2021). Similarly, the performance of AVAs in multi-agent systems could be explored in various ways via methods that are explicitly aimed at fostering collective good (Wang et al. 2020). To investigate value alignment, AVAs could for instance be trained to face sequential moral dilemmas (Rodriguez-Soto et al. 2020), a special type of Markov game that is solved by finding ethically aligned Nash equilibria. Recent work on policy regularization points in a similar direction (Maree and Omlin 2015). Finally, regardless of whether sophisticated artificial morality will ever become reality, we might gain new insights into the nature of human morality by approaching ethics from a computational perspective (Pereira et al. 2021).

So what does the future hold for artificial virtuous agents? In the short term, we believe AVAs could find suitable experimental applications in various classification tasks such as the processing of moral data, in action selection and motor control in simple robotic settings, in the learning of situations and outcomes (through supervised or reinforcement learning), in hybrid models tackling the challenge of conflicting values and virtues, or in further multi-agent simulations and game-theoretic models used to study social and moral dilemmas. In the medium term, AVAs could potentially be used to tackle complex but domain-specific problems in real-world environments, e.g., to balance value priorities for self-driving vehicles, or to help social robots develop social sensitivity. In the long term, AVAs might be accepted as rightful members of our moral communities (as argued by Gamez et al. 2020). In the most optimistic vision, AVAs might help us to reveal novel forms of moral excellence and, as a result, become moral exemplars to humans (Giubilini and Savulescu 2018).

5 Conclusion

In this paper, we have shown how core tenets of virtue ethics can be implemented in artificial systems. As far as we know, it is the first computational implementation of AMAs that focuses solely on virtue ethics. Essentially, we believe virtue ethics provides a compelling path toward morally excellent machines that can eventually be incorporated into real-world moral domains. To that end, we hope that our work provides a starting point for further conceptual, technical, and experimental refinement of artificial virtuousness.