1 Introduction

Over the last decades, the rapid development and application of artificial intelligence (AI) have spawned a great deal of research focusing on various ethical aspects of AI (AI ethics) and on the prospects of implementing ethics into machines (machine ethics). The latter project can further be divided into theoretical debates on machine morality, conceptual work on hypothetical artificial moral agents (AMAs) (Malle 2016), and more technically oriented work on prototypical AMAs. Within the third branch, the vast majority of the technical work has centered on constructing agents based on deontology (Anderson and Anderson 2008; Noothigattu et al. 2018), consequentialism (Abel et al. 2016; Armstrong 2015), or hybrids of the two (Dehghani et al. 2008; Arkin 2007).

Fig. 1
Rough sketches of three basic ethical algorithms. a Consequentialism: Given an ethical dilemma E, a set of possible actions in E, and a way of determining the consequences of those actions and their resulting utility, the consequentialist algorithm will perform the action yielding the highest utility. b Deontology: Given an ethical dilemma E and a set of moral rules, the deontological algorithm will search for the appropriate rule for E and perform the action dictated by the rule. c Virtue ethics: By contrast, constructing an algorithm based on virtue ethics presents a seemingly intriguing puzzle

Virtue ethics has repeatedly been suggested as a promising blueprint for the creation of artificial moral agents (Berberich and Diepold 2018; Zagzebski 2010).

1.2 Previous work in artificial virtue

The various versions of virtue ethics have given rise to a rather diverse set of approaches to artificial virtuous agents, ranging from narrow applications and formalizations to more general and conceptual accounts. Of the work that explicitly considers virtue ethics in the context of AMAs, it is possible to identify five prominent themes: (1) the skill metaphor developed by Annas (2011), (2) the virtue-theoretic action guidance and decision procedure described by Hursthouse (1999), (3) learning from moral exemplars, (4) connectionism about moral cognition, and (5) the emphasis on function and role.

(1) The first theme is the idea that virtuous moral competence—including actions and judgments—is acquired and refined through active intelligent practice, similar to how humans learn and exercise practical skills such as playing the piano (Annas 2011; Dreyfus 2004). In a machine context, this means that the development and refinement of artificial virtuous cognition ought to be based on a continuous and interactive learning process, which emphasizes the “bottom-up” nature of moral development as opposed to a “top-down” implementation of principles and rules (Howard and Muntean 2017).

(2) The second theme, following Hursthouse (1999), is that virtue ethics can provide action guidance in terms of “v-rules” that express what virtues and vices command (e.g., “do what is just” or “do not do what is dishonest”), and offers a decision procedure in the sense that “An action is right iff it is what a virtuous agent would characteristically (i.e., acting in character) do in the circumstances” (Hursthouse (1999), p. 28). Hursthouse’s framework has been particularly useful as a response to the claim that virtue ethics is “uncodifiable” and does not provide a straightforward procedure or “moral code” that can be used for algorithmic implementation (Bauer 2020; Arkin 2007; Tonkens 2012; Gamez et al. 2020).

(3) The third theme is the recognition that moral exemplars provide an important source of moral education (Hursthouse 1999; Zagzebski 2010; Slote 1995). In turn, this has inspired a moral exemplar approach to artificial virtuous agents, which centers on the idea that artificial agents can become virtuous by imitating the behavior of excellent virtuous humans (Govindarajulu et al. 2019; Berberich and Diepold 2018).

(5) The fifth theme, exemplified by the work of Thornton et al. (2016), incorporates the virtue-theoretic emphasis on function and role in the design of automated vehicle control. Whereas consequentialism is used to determine vehicle goals in terms of costs (given some specified measure of utility), and deontology through constraints (e.g., in terms of rules), the weight of the applied constraints and costs is regulated by the vehicle’s “role morality.” In their view, role morality is a collection of behaviors that are morally permissible or impermissible in a particular context based on societal expectations (Radtke 2008). To illustrate, if an ambulance carries a passenger in a life-threatening condition, it is morally permissible for the vehicle to break certain traffic laws (e.g., to run a red light). However, although the work elegantly demonstrates how the moral role of artificial systems can be determined by the normative expectations related to their societal function, it ignores essential virtue-theoretic notions such as virtuous traits and learning.

To conclude our survey of previous work in artificial virtue, it remains relatively unclear how one could construct and implement AMAs based on virtue ethics. Besides numerous conceptual proposals (Stenseke 2021), applications in simple classification tasks (Howard and Muntean 2017), a formalization of learning from exemplars (Govindarajulu et al. 2019), and a virtue-theoretic interpretation of role morality (Thornton et al. 2016), no work has described in detail how core tenets of virtue ethics can be modeled and integrated in a framework for perception and action that in turn can be implemented in moral environments.

2 Materials and methods

2.1 Artificial virtuous agents

Fig. 2
Model of the presented artificial virtuous agents. Environmental input is classified by the input network and parsed to the relevant virtue network (\(V1-V5\)). The invoked virtue then determines what action the agent will perform. The outcome of the performed action is then evaluated by the eudaimonic value function, which in turn informs the phronetic learning system whether and how the invoked virtue should be reinforced. The figure also shows three additional pathways for learning feedback: training of the input network and the outcome network based on eudaimonic reward, and learning from moral exemplars. Figure adapted with permission from Stenseke (2021)

In this section, we describe a generic computational model that will guide the development of the AVAs used for experimental implementation. It draws on a combination of insights from previous work, in particular the weight analogy of Thornton et al. (2016), the classification method of Guarini (2006), the “dispositional functionalism” explored by Howard and Muntean (2017), and the eudaimonic approach to “android arete” proposed by Coleman (2001) and Stenseke (2021). Essentially, the model is based on the idea that dispositional virtues can be functionally carried out by artificial neural networks that, in the absence of a moral exemplar, learn from experience in light of a eudaimonic reward by means of a phronetic learning system. Virtues determine the agent’s action based on input from the environment and, in turn, receive learning feedback based on whether the performed action increased or decreased the agent’s eudaimonia. The entire model consists of six main components that are connected in the following way (Fig. 2), with a minimal code sketch of the resulting loop given after the list:

  1. (1)

    Input network—In the first step, environmental input is received and parsed by the input network. Its main purpose is to classify input in order to transmit it to the most appropriate virtue network. As such, it reflects the practical wisdom “understanding of situation”: to know how or whether a particular situation calls for a particular virtue. This includes the ability to know what situation calls for what virtue (e.g., knowing that a situation requires courage) and to know whether a situation calls for action or not. Since this capacity depends on the agent’s ability to acquire, process, and analyze sensory information from the environment, it can be understood as a form of “moral recognition” within a general network of perception (Berberich and Diepold (2018) have explored a similar capacity for “moral attention”).

  2. (2)

    Virtue networks—In the second step, the virtue network classifies the input so as to produce the most suitable action. In the minimal case, a virtue can be represented as a binary classifier (perceptron) that, given some linearly separable input, determines whether an agent acts in one way or another. In a more complex case, a virtue can be an entire network of nodes that outputs one of several possible actions based on more nuanced environmental input.

  3. (3)

    Action output—In the third step, the agent executes the action determined by the invoked virtue network. In principle, the output could be any kind of action—from basic acts of movement and communication to longer sequences of coordinated actions—depending on the environment and the agent’s overall functionality and purpose.

  4. (4)

    Outcome network—In the fourth step, the outcome network receives new environmental input in order to classify the outcome of the performed action. As such, it reflects the practical wisdom “understanding of outcome,” i.e., the ability to understand the results of a certain action. Practically, the outcome network might utilize the same mechanisms for perception as the input network, but whereas the latter focuses on understanding a situation prior to any action performed by the agent, the former focuses on understanding situations that follow action execution.

  5. (5)

    Eudaimonic reward system—In the fifth step, the classified outcome is evaluated by the eudaimonic reward system. To do so, we make a functional distinction between eudaimonic type (e-type) and eudaimonic value (e-value). E-type is defined as the values or goals the agent strives toward, and e-value is a quantitative measure of the amount of e-type an agent has obtained. For instance, an e-type can be modeled as a preference to decrease the blame and increase the praise an agent receives from other agents. For this to work functionally, the agent must thus (a) be able to get feedback on its actions and (b) be able to qualitatively identify that feedback as blame or praise. Given an e-type based on blame/praise, the agent’s e-value will decrease if it receives blame, and conversely, increase if it receives praise. A more detailed account of this process is provided in Sect. 2.3.

  6. (6)

    Phronetic learning system—Although phronesis (“practical wisdom”) is a fairly equivocal term in virtue ethics, in our model, we take it to broadly represent the learning that artificial agents acquire from experience. Based on our functional conception of virtues and eudaimonic reward, the central role of the phronetic learning system is to refine the virtues based on eudaimonic feedback. Simply put, if a certain action led to an increase in e-value, the virtue that produced the action will receive positive reinforcement; if e-value decreased, the virtue receives punishment. As a result, the learning feedback will either increase or reduce the probability that the agent performs the same action in the future (given that it faces a similar input).
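
To make the interplay between the six components concrete, the following minimal Python sketch wires them together in a single perception–action–learning loop. It is an illustrative skeleton rather than a description of the implementation used in our experiments: all class, method, and attribute names (including the `act` and `reinforce` methods of the virtue objects and the `environment.execute` call) are placeholders, and the classification, reward, and environment interfaces are left abstract.

```python
class AVA:
    """Skeleton of the six-component loop (all names are illustrative)."""

    def __init__(self, virtues):
        self.virtues = virtues   # e.g., {"courage": ..., "generosity": ..., "honesty": ...}
        self.e_value = 0.0       # cumulative eudaimonic value

    def classify_input(self, observation):
        """(1) Input network: decide which virtue the situation calls for."""
        raise NotImplementedError

    def classify_outcome(self, observation):
        """(4) Outcome network: interpret the state that follows the action."""
        raise NotImplementedError

    def eudaimonic_reward(self, outcome):
        """(5) Eudaimonic reward system: score the outcome in light of the e-type."""
        raise NotImplementedError

    def step(self, observation, environment):
        virtue_name = self.classify_input(observation)        # (1) moral recognition
        action = self.virtues[virtue_name].act(observation)   # (2) virtue selects an action
        new_observation = environment.execute(action)         # (3) action output
        outcome = self.classify_outcome(new_observation)      # (4) understanding of outcome
        reward = self.eudaimonic_reward(outcome)               # (5) eudaimonic evaluation
        self.e_value += reward
        self.virtues[virtue_name].reinforce(reward)            # (6) phronetic learning
        return action, reward
```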

In reinforcement learning (RL) terms, an AVA can be modeled as a discrete-time stochastic control process (e.g., a Markov decision process, MDP). In this sense, dispositions can be viewed as probabilistic tendencies to act in a certain way, and after training, a virtuous agent is a policy network that gives definite outputs given particular inputs. More formally, an AVA is a 5-tuple \(\langle S, A, \Phi , P, \gamma \rangle\), where:

  • S is a set of states (state space).

  • A is a set of actions (action space).

  • \(\Phi _a(s,s')\) is the eudaimonic reward the agent receives transitioning from state s to \(s'\) by performing action a.

  • \(P_a(s,s') = Pr(s' \mid s,a)\) is the probability of transitioning from \(s \in S\) to \(s' \in S\) given that the agent performs \(a \in A\).

  • \(\gamma\) is a discount factor (\(0 \le \gamma \le 1\)) that specifies whether the agent prefers short- or long-term rewards.

The goal of an AVA is to maximize the eudaimonic reward (\(\Phi\)) over some specific time horizon. To do so, it needs to learn a policy—a function \(\pi (s, a)\) that determines what action to perform given a specific state. For instance, if the agent should maximize the expected discounted reward from every state arbitrarily into the future, the optimal policy \(\pi ^*\) can be expressed as:

$$\begin{aligned} \pi ^* := {{\,\mathrm{argmax}\,}}_\pi E \left[ \sum _{t=0}^{\infty } \gamma ^{t}\Phi (s_{t},a_{t}) \Bigg | \pi \right] . \end{aligned}$$
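
In code, the optimization target above can be approximated by straightforward Monte Carlo rollouts. The sketch below assumes a generic environment interface (`reset`/`step`) and a policy function; both are illustrative assumptions rather than part of the formal model.

```python
def discounted_return(rewards, gamma):
    """Compute sum_t gamma^t * Phi_t for a single episode."""
    return sum((gamma ** t) * phi for t, phi in enumerate(rewards))


def estimate_policy_value(policy, env, gamma=0.95, episodes=100, horizon=200):
    """Monte Carlo estimate of E[ sum_t gamma^t Phi(s_t, a_t) | policy ].

    `env.reset()` and `env.step(action)` are assumed interfaces returning an
    initial state and a (next_state, eudaimonic_reward) pair, respectively.
    """
    total = 0.0
    for _ in range(episodes):
        state = env.reset()
        rewards = []
        for _ in range(horizon):
            action = policy(state)
            state, phi = env.step(action)
            rewards.append(phi)
        total += discounted_return(rewards, gamma)
    return total / episodes
```

Maximizing this estimate over a parameterized family of policies (e.g., via gradient ascent or evolutionary search) then approximates \(\pi ^*\).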

The purpose of the model is to provide a blueprint that can be simplified, augmented, or extended depending on the overall purpose of the artificial system and its environment. After all, there are many aspects of system design that are most suitably driven by practical considerations of the particular moral domain at hand, as opposed to more morally or theoretically oriented reasons. We will briefly address three practical considerations we prima facie believe to be critical for implementation success (additional considerations and extensions will be further discussed in Sect. 4).

Static vs dynamic—The first is to decide whether and to what extent certain aspects of the AVA should be static or dynamic. That is, what features should be “hard-coded” and remain unchanged, and what features should be able to change over time. At one extreme, the entire topology of the integrated network and all of its parameters could in principle evolve independently using randomized search methods such as evolutionary computation [as suggested by Howard and Muntean (2017)], or continuously develop through trial-and-error using deep RL or NEATs (Berner et al. 2005). Alternatively, the capacities could be carried out by designated subsystems, e.g., using actor–critic methods (Witten 1977; Grondman et al. 2012), where the input and outcome networks function as value-focused “critics” to the policy-focused “actor” (the virtue network). While the actor is driven by policy gradient methods (optimizing parameterized policies), which allows the agent to produce actions in a continuous action space, this might result in large variance in the policy gradients. To mitigate this variance, the critic evaluates the current policy devised by the actor, so as to update the parameters in the actor’s policy toward performance improvement. Beyond RL methods, designated input and outcome networks could also be trained through supervised learning, provided that there are labeled datasets of situations, outcomes, or even pre-given judgments of actions themselves related to the specified e-type. To illustrate, if e-type is construed as a preference to decrease blame and increase praise received from the environment, the networks could be trained on data labeled as “praise” or “blame” and, in turn, learn to appropriately classify the relevant situation and outcome states.
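
As an example of the supervised option just mentioned, the following sketch trains a simple perceptron-style outcome classifier on feature vectors labeled as praise (1) or blame (0). The feature encoding, function names, and hyperparameters are illustrative assumptions; any standard classifier could take its place.

```python
import numpy as np


def train_outcome_classifier(X, y, lr=0.1, epochs=50):
    """Perceptron mapping outcome features to praise (1) / blame (0).

    X: (n_samples, n_features) array of encoded outcome descriptions.
    y: array of 0/1 labels derived from "blame"/"praise" annotations.
    """
    rng = np.random.default_rng(0)
    w = rng.normal(scale=0.01, size=X.shape[1])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            pred = 1 if xi @ w + b >= 0 else 0
            error = yi - pred          # 0 if correct, +/-1 if wrong
            w += lr * error * xi       # standard perceptron update
            b += lr * error
    return w, b


def classify_outcome(x, w, b):
    return "praise" if x @ w + b >= 0 else "blame"
```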

Learning from observation and moral exemplars—Beyond the situations, outcomes, and actions of the agent itself, other valuable sources for learning can be found in moral exemplars and, more generally, in the behavior and experiences of others. Intuitively, a great deal of moral training can be achieved by observing others, whether they are exemplars or not. If an artificial agent can observe that the outcome of a specific action performed by another agent increased some recognizable e-value (given a specific situation), the observation could serve as a signal for the observer to reinforce the same behavior (given that a sufficiently similar situation appears in the future). Conversely, if the recognizable e-value decreased, the agent would learn to avoid repeating the same action. Learning from moral exemplars could also be achieved given that the agent has (i) some way of identifying and adopting suitable exemplars and (ii) some way of learning from them. Besides the solutions offered by Govindarajulu et al. (2019) and Berberich and Diepold (2018), Stenseke (2021) has suggested that exemplars can be identified through e-type and e-value. Specifically, an agent a takes another agent b as a moral exemplar iff (1) a and b share the same e-type and (2) b has a higher e-value than a. Condition (1) ensures that a and b share the same goals or values, while condition (2) means that agent b has been more successful in achieving those goals or values. Even if such conditions are difficult to model in the interaction between humans and artificial agents (humans do not normally have an easily formalizable e-type), they can be suitably incorporated in the interaction between different artificial agents.
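
The two exemplar conditions translate directly into code. The sketch below assumes agents carry `e_type`, `e_value`, and a dictionary of virtue values (illustrative names); copying the exemplar’s virtue values is the simple form of exemplar learning also used in our simulation cycle (Sect. 2.3.4).

```python
def adopt_exemplar(agent, candidate):
    """Adopt `candidate` as a moral exemplar iff (1) the e-types match and
    (2) the candidate has a strictly higher e-value. Learning from the
    exemplar is modeled here as copying its virtue values."""
    if candidate.e_type == agent.e_type and candidate.e_value > agent.e_value:
        agent.virtues = dict(candidate.virtues)  # copy, do not alias
        return True
    return False
```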

2.2 Ethical environment: BridgeWorld

Fig. 3
Illustration of BridgeWorld: a central home island connected by bridges to four surrounding food islands

In this section, we introduce BridgeWorld as the ethical environment used for our simulation experiments. BridgeWorld consists of five islands connected by four bridges: a central island where agents live and four surrounding islands where food grows (Fig. 3). Since there is no food on the central island, agents need to cross bridges to the surrounding islands and collect food in order to survive. However, every time an agent crosses a bridge, there is a chance that they fall into the ocean. The only way a fallen agent can survive is for another passing agent to rescue them, at the risk of falling into the water themselves. Besides drowning, agents in BridgeWorld can also die from starvation. An agent starves when their energy reserve (R) is less than 1 at the time of eating. Eating is imposed by the simulation, which means that agents cannot avoid starvation by simply not eating. To complicate things further, food only exists on one island at a given time. Agents are able to remember the current location of food (provided that they have found it) but have no chance of predicting where it will be once the food location is updated. Each day (or simulation cycle), the agents who inhabit BridgeWorld do the following four things (a minimal sketch of this daily loop is given after the list):

  1. (1)

    Move from the central island to one of the four surrounding islands

  2. (2)

    If an agent moved to the island where food currently grows, the agent receives food (R increases by one FoodValue, FV)

  3. (3)

    Move back to the central island

  4. (4)

    Eat (R decreases by 1)
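
The sketch below covers only these four basic steps; bridge crossings with falling risk and the three interactions are treated separately in Sect. 2.3. Agent attributes such as `reserve`, `known_food_island`, and `alive` are illustrative names.

```python
import random

FOOD_ISLANDS = [1, 2, 3, 4]  # island 0 is the central home island


def run_day(agents, food_island, food_value=1.0):
    """One simplified BridgeWorld day: move out, collect, move back, eat."""
    for agent in agents:
        if not agent.alive:
            continue
        # (1) Move out: go to the remembered food island, or guess at random.
        if agent.known_food_island is not None:
            target = agent.known_food_island
        else:
            target = random.choice(FOOD_ISLANDS)
        # (2) Collect food if the chosen island is the one where food grows.
        if target == food_island:
            agent.reserve += food_value
            agent.known_food_island = food_island
        # (3) Move back to the central island (no falling risk on the way back).
        # (4) Eat: starve if the reserve cannot cover the meal, otherwise consume 1.
        if agent.reserve < 1:
            agent.alive = False
        else:
            agent.reserve -= 1
```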

During each day, agents in BridgeWorld are also able to interact with each other in three ways:

Ask about food location—Before moving from the central island, agents can ask one other randomly selected agent about the food location, provided that they do not know the current location themselves. In turn, the asked agent can react in one of two ways: (i) tell where the food is (given that they know themselves) or (ii) lie about where the food is (regardless of whether they in fact know where the food is).

Call for help—If they happen to fall into the water while moving to one of the islands, agents can call for help from the other agents who crossed the same bridge on the same day. The asked agents can in turn react by either (i) trying to rescue the agent from drowning or (ii) ignoring the call. If an agent tries to rescue, there is a risk that both agents drown (given by the current StreamLevel, SL).

Beg for food—Before eating, agents can beg one other randomly selected agent for food. The asked agent can react by either (i) giving food to the begging agent or (ii) ignoring the request. If the agent chooses to give, their R is decreased by one FV while the begging agent’s R is increased by one FV.

BridgeWorld thus presents a virtual environment with three simple moral dilemmas that can give rise to complex behavioral dynamics, such as the “Tragedy of the Commons” phenomenon (Hardin 1968). If agents in BridgeWorld only act in their own self-interest—e.g., by never trying to rescue even if there is a small risk of drowning, or by never offering food to beggars even if they have a surplus—resources would not be utilized efficiently at the collective level (resulting in more overall deaths). At the other extreme, if agents always act selflessly, it could instead result in more overall casualties due to self-sacrificial behaviors, e.g., from rescue attempts without any regard for the ocean current, or from giving food to beggars only to be left without any food for oneself. Furthermore, in a mixed population, selfless agents would be exploited by self-interested freeloaders.

The motivation behind BridgeWorld is to create an environment where morality becomes a problem of cooperation among self-interested individuals. Essentially, in order for agents on BridgeWorld to prosper as a collective, they have to effectively balance altruism and self-interest while also having some way of suppressing exploiters. This view is best articulated in the work of Danielson (2002) and Leben (2018). According to Danielson, agents become moral when they “constrain their own actions for the sake of benefits shared with others” [Danielson (2002), p. 192]. In the same vein, Leben writes that morality emerged “in response to the problem of enforcing cooperative behavior among self-interested organisms” [Leben (2018), p. 5].

Table 1 Simplified payoff matrix for the three moral dilemmas presented in BridgeWorld: (1) Ask about food location, (2) Call for help, and (3) Beg for food. F stands for food and D stands for likelihood of drowning

BridgeWorld draws on a combination of game-theoretic models of strategic interaction, such as the prisoner’s dilemma (Axelrod and Hamilton 1981), hawk–dove (Smith and Price 1973), public goods (Szolnoki and Perc 2010), and the commonize costs–privatize profits (CC-PP) game (Hardin 1985; Loáiciga 2004). More precisely, each of the three possible interactions in BridgeWorld is a single-person decision game with unique payoffs for two players, where the payoffs represent different goods that are vital for self-preservation (Table 1). Alternatively, each interaction can be viewed as a two-player game (e.g., prisoner’s dilemma or hawk–dove) where only one player chooses to either help (cooperate) or not (defect), and the asking agent has no choice (e.g., having fallen into the water or having no food to eat). While the choosing agents have no rational self-interest to cooperate in the short term, they rely on the mercy of other agents if they were to ask for help themselves. The consequences of selfish behavior may therefore aggregate over time and generate a public tragedy of the commons (“public” in the sense that everyone suffers from the selfish behavior of everyone else). BridgeWorld thus differs from CC-PP and public goods games since results are not aggregated at every instance, but over many instances of single-person decisions over time.

For several decades, game theory and its many extensions (e.g., evolutionary game theory and spatial game theory) have provided a wealth of insight into the study of behavior in economics, biology, and social science (Nash et al. 1950; Holt and Roth 2004). Particularly relevant for the purpose of this work are the game-theoretic contributions to our understanding of how cooperation can emerge and persist among self-interested individuals (Axelrod and Hamilton 1981; Nowak 2006; Fletcher and Doebeli 2009; Helbing et al. 2010). A related branch of simulation-based research has explored the emergence and propagation of norms in multi-agent systems (Savarimuthu and Cranefield 2011; Morris-Martin et al. 2019; Santos et al. 2016; Pereira et al. 2017). More recently, the field of multi-agent reinforcement learning has investigated methods that can be used to foster collective good, even in the absence of explicit norms (Wang et al. 2020).

However, as opposed to conventional game-theoretic models, the behavior of virtuous agents in BridgeWorld does not depend on pre-given strategies which could be analyzed in terms of Nash equilibria for the different payoff matrices and goods. Instead, agent behavior depends on dispositional virtues, which may develop over time according to a certain conception of eudaimonia. Similarly, while the environment shares a common overarching aim with other simulation-based work on norms and social behavior, no previous work has focused on the concept of virtue, and the role it might play in the context of ethical behavior.

2.3 Virtuous agents in BridgeWorld

In this section, we describe how a version of the computational model is implemented in the BridgeWorld environment and provide the technical details of the experimental setup.

2.3.1 Virtues and virtuous action

As described in the previous section, agents in BridgeWorld are able to move, collect food, remember the food location, and eat. Besides these basic abilities, the agents are also able to initiate interactions with other agents in three ways and to react to each in two ways. Whether an agent initiates an interaction (ask about food location, call for help, beg for food) and how they respond to such initiations (tell/lie, help/ignore, give/ignore) are determined by their dispositional virtues. Each agent is equipped with three virtues that relate to the three types of interaction: courage, generosity, and honesty. Following Aristotle’s virtue-theoretic concept of the “golden mean” (NE VI), each virtue represents a balance between deficiency and excess, modeled as a value between \(-1\) (deficiency) and \(+1\) (excess). Simply put, courage is modeled as the mean between cowardice and recklessness; generosity as the mean between selfishness and selflessness; and honesty as the mean between deceitfulness and truthfulness. In technical terms, each virtue is a threshold function f(x) that determines, given some input x, weight w, and threshold \(\psi\), whether an agent acts in one way or another:

$$\begin{aligned} f(x) = \left\{ \begin{array}{ll} 1 &{} \text {if}\, wx \ge \psi \\ 0 &{} \text {if}\, wx < \psi \\ \end{array} \right. \end{aligned}$$
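
In code, such a binary virtue reduces to a one-line threshold check; the function below is a direct transcription of the definition above (the parameter names are illustrative).

```python
def virtue_decision(x, weight, threshold=0.0):
    """Binary virtue 'perceptron': act one way (1) if the weighted input
    reaches the threshold psi, otherwise act the other way (0)."""
    return 1 if weight * x >= threshold else 0
```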

The virtues determine the actions of agents in the following ways (a compact code sketch of these decision rules is given after the list):

  • Food location

    • Initiation: If an agent a (i) does not know the current location of food and (ii) is not too selfless (\(generosity < 0.5\)), they ask a randomly selected agent b about the location.

      • Tell truth: If agent b (i) is sufficiently truthful (\(honesty > 0\)) and (ii) knows where the food currently is, they tell agent a where the location is.

      • Lie: If agent b is sufficiently deceitful (\(honesty < 0\)), they give agent a the wrong location.

  • Drowning

    • Initiation: If an agent a has fallen into the water, it calls for help. Every agent that crossed the same bridge as a during the same day is asked—one at a time in a randomly shuffled order—to help a. If an agent ignores the call for help, the next agent in the shuffled order is asked. Calls for help continue until either one agent attempts to save a or everyone ignores a.

      • Try to save: If agent b is more courageous than the current stream level, they try to save a (i.e., \(courage > SL\)). For each fallen agent, SL is given as a random number between \(-1\) and \(+1\).

      • Ignore: If the courage of agent b is lower than the stream level (\(courage < SL\)), they ignore a.

  • Sharing food

    • Initiation: If the BegFactor (BF) of an agent a is higher than the BeggingThreshold (BT), they ask agent b (selected at random) for food. BF is determined by the hunger of the agent multiplied by its selfishness such that:

      $$\begin{aligned} BF = \left( 1 - \frac{R}{M\!R} \right) \left( \frac{generosity + 1}{2}\right) \end{aligned}$$

      where R is the current energy reserve of the agent and \(M\!R\) is the maximum reserve. In this way, the hungrier and more selfish the agent is, the more likely it is to beg for food.

      • Give food: If agent b is sufficiently selfless (\(generosity > 0\)), they give food to a. Consequently, b’s R decreases by one FV, while a’s R increases by the same amount.

      • Ignore: If agent b is sufficiently selfish (\(generosity < 0\)), they ignore the agent.
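
The decision rules above can be summarized in a short sketch. Agents are assumed to expose their three virtue values as attributes (`honesty`, `courage`, `generosity`) together with `reserve` and `known_food_island`; these names, and the handling of unspecified boundary cases (e.g., honesty exactly 0), are our own illustrative choices.

```python
def wants_to_ask_location(agent):
    """Ask about the food location iff the agent does not know it and is not
    too selfless (generosity < 0.5)."""
    return agent.known_food_island is None and agent.generosity < 0.5


def respond_food_location(responder, knows_location):
    """Tell the truth iff honesty > 0 and the location is known; lie iff honesty < 0."""
    if responder.honesty > 0 and knows_location:
        return "tell_truth"
    if responder.honesty < 0:
        return "lie"
    return "ignore"  # boundary case (honesty == 0, or truthful but uninformed)


def respond_call_for_help(responder, stream_level):
    """Attempt a rescue iff courage exceeds the current StreamLevel."""
    return "try_to_save" if responder.courage > stream_level else "ignore"


def beg_factor(agent, max_reserve):
    """BF = (1 - R/MR) * (generosity + 1) / 2, as in the formula above."""
    return (1 - agent.reserve / max_reserve) * (agent.generosity + 1) / 2


def wants_to_beg(agent, max_reserve, begging_threshold):
    return beg_factor(agent, max_reserve) > begging_threshold


def respond_beg_for_food(responder):
    """Give food iff the responder is on the selfless side of the scale."""
    return "give" if responder.generosity > 0 else "ignore"
```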

2.3.2 Phronetic learning system

The learning system is invoked each time an agent responds to an action (tell truth/lie, try to save/ignore, give food/ignore). The learning function takes three input parameters: (a) the action taken by the agent (e.g., save), (b) the relevant virtue affected (e.g., courage), and (c) the magnitude of the event (e.g., StreamLevel). It then evaluates whether the performed action increased or decreased the agent’s e-value in light of its e-type. If e-value increased, the relevant virtue weight is reinforced, i.e., pushed further in its current direction: if the weight is \(> 0\), it is increased, and if the weight is \(< 0\), it is decreased. If e-value decreased, the inverse mechanism occurs. The amount a weight changes at each learning event is given by the learning rate (LR). The learning algorithm updates the weight in the threshold function, with the aim of finding the optimal state–action policy that maximizes cumulative e-value (\(\Phi\)):

$$\begin{aligned} {{\,\mathrm{argmax}\,}}_\pi E \left[ \sum _{t=0}^{\infty }\Phi (s_{t},a_{t}) \Bigg | \pi \right] . \end{aligned}$$
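
One way to read this update rule is sketched below: the weight of the virtue that produced the action is pushed further toward its current extreme when e-value increased, and back toward the opposite extreme when it decreased, scaled by the learning rate and the magnitude of the event. The exact use of the magnitude term and the clamping to \([-1, 1]\) are our assumptions.

```python
def phronetic_update(virtues, virtue_name, delta_e, magnitude=1.0, learning_rate=0.05):
    """Reinforce or punish the virtue weight that produced the last action.

    virtues: dict mapping virtue names to weights in [-1, 1] (illustrative).
    delta_e: change in e-value caused by the action's outcome.
    """
    w = virtues[virtue_name]
    direction = 1.0 if w >= 0 else -1.0
    if delta_e > 0:
        w += learning_rate * magnitude * direction  # reinforce current disposition
    elif delta_e < 0:
        w -= learning_rate * magnitude * direction  # push toward the opposite extreme
    virtues[virtue_name] = max(-1.0, min(1.0, w))
```
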
Fig. 4
Flowchart of the virtuous agents used in the experiments. Three types of input are parsed into three corresponding virtue weights, each holding a value between two extremes (from \(-1\) to \(+1\)). The weight of the virtue determines whether the agent produces the action related to excess (\(> 0\)) or deficiency (\(< 0\)). For instance, if a drowning agent calls for help, an agent with \(courage > SL\) will attempt to save the agent in need. However, if its e-type seeks to maximize its own chance of survival, the courage virtue will receive punishment, meaning that the agent will eventually learn to avoid trying to save others

2.3.3 Eudaimonia

We now describe three different e-types that will be used in the experiments:

Selfish (S)—An agent with a selfish e-type rewards actions that are beneficial only for the agent itself and punishes actions that go against the self-interest of the agent. That is, if a selfish agent S either tries to save a fallen agent or gives food to a begging other, \(\Phi\) decreases. Conversely, \(\Phi\) increases if S ignores agents begging for food or calling for help. In other words, S seeks to maximize its own energy reserve and minimize its risk of drowning. S is neutral to other agents asking for the food location since neither telling the truth nor lying has a direct impact on its self-interest. The selfish e-type is included merely for illustration, to compare its performance with that of the other e-types; after all, selfish agents only become “virtuous” in the sense that they learn to maximize their selfish behavior (based on a rather inappropriate conception of eudaimonia).

Praise/blame (\(P\!/\!B\))—An agent with a praise/blame e-type rewards actions that maximize the praise and minimize the blame received from others. \(P\!/\!B\) agents are able to communicate praise and blame in reaction to action responses, e.g., by praising honest agents for telling the truth, courageous agents for saving drowning agents, and generous agents for sharing their food; and correspondingly blaming liars, cowards, and egocentrics. For agents with a \(P\!/\!B\) e-type, these “reactive attitudes” in turn serve as the foundation for eudaimonic evaluation, which positively reinforces actions that lead to praise (increasing \(\Phi\)) and negatively reinforces actions that lead to blame (decreasing \(\Phi\)). The rationale behind this e-type stems from the central role reactive attitudes and social sanctions play in moral practices, such as holding each other responsible, condemning selfish freeloaders, and enforcing cooperation norms (Strawson 2008; Fehr and Fischbacher 2004).
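
The two e-types can be expressed as simple reward functions, as sketched below; the unit reward scale and the action labels are illustrative choices rather than fixed parameters of the model.

```python
def selfish_reward(action):
    """Selfish e-type: costly helping is punished, ignoring requests is rewarded,
    and honesty-related responses are neutral."""
    if action in ("try_to_save", "give"):
        return -1.0
    if action in ("ignore_help", "ignore_beg"):
        return 1.0
    return 0.0  # tell_truth / lie: no direct impact on self-interest


def praise_blame_reward(praise_received, blame_received):
    """Praise/blame e-type: e-value tracks the reactive attitudes of others."""
    return float(praise_received) - float(blame_received)
```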

2.3.4 Simulation cycle

At initialization, the virtue values are assigned randomly, e-value set to 0, and e-type set to one of the defined e-types. The entire flowchart of the computational model of the implemented agents is illustrated in Fig. 4. In summary, the following phases occur at each simulation cycle after initialization:

  • Ask for food location—Unless the food location was just updated, agents ask other agents about the location (given the conditions described earlier). The food location is updated every xth cycle, where x is given by FoodUpdateFrequency (FUF).

  • Move to food—The agents move to one of the four islands. If they do not know where food currently grows, they move to a randomly selected island. Every time an agent moves to a surrounding island, there is a chance that they fall into the water, given by FallingChance (FC).

  • Help fallen—Each time an agent attempts to save an agent from drowning (i.e., if \(courage > SL\)), a random value RV is drawn (ranging from \(-1\) to \(+1\)). If the random value is higher than the stream level (\(RV > SL\)), the rescue attempt is successful and both agents survive. However, if \(RV < SL\), both agents die. In this way, a higher stream level means a smaller chance of a successful rescue. For instance, the agent has a 100% chance of success if \(SL = -1\), and a 100% chance of failure if \(SL = 1\).

  • Collect food—If the agent moved to the island where food grows, its energy reserve increases by one FV.

  • Move back—Agents move back to the central island. For simplicity, there is no risk of falling into the water while moving back.

  • Beg for food—Agents can ask another agent for food (given that \(BF > BT\)).

  • Eating and starving—Every agent consumes food equal to 1 energy (\(R = R - 1\)). Note that FV received from collecting or begging is not necessarily equal to the energy they consume. An agent dies from starvation if their energy reserve is less than 1 at the time of consumption.

  • Moral exemplar—We have also implemented an additional possibility to copy the virtue values of moral exemplars based on the conditions proposed by Stenseke (2021). Simply put, an agent a is paired with another randomly selected agent b. If (i) the e-type of a is the same as the e-type of b, and (ii) b’s e-value is higher than the e-value of a, then a copies the virtue values of b.

  • Death and rebirth—To make up for the ruthless nature of BridgeWorld (where agents would otherwise starve or drown until the entire population goes extinct), we have also implemented a simple system of reproduction. If an agent has died during the cycle, it is either reborn as a cross between two randomly selected agents (\(P_{1}\) and \(P_{2}\)) or mutates (given by MutationChance, \(M\!C\)). In the case of the former, the new agent (or child, C) inherits the arithmetic mean of the parents’ virtue values (\(P(V_{1}, V_{2}, ... V_{n})\)) such that \(C(V_{1}) = \frac{P_{1}(V_{1}) + P_{2}(V_{1})}{2}, C(V_{2}) = \frac{P_{1}(V_{2}) + P_{2}(V_{2})}{2}\), and so on. In the case of mutation, the virtue values are instead assigned randomly (between \(-1\) and 1); a minimal sketch of this rebirth rule is given below.
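
A minimal sketch of the rebirth rule follows. The dictionary representation of virtue values and the attribute names are illustrative; resetting the new agent’s e-value and e-type is handled as part of initialization.

```python
import random

VIRTUE_NAMES = ("courage", "generosity", "honesty")


def reborn_virtues(population, mutation_chance):
    """Virtue values for a dead agent's replacement: with probability
    MutationChance draw fresh random values, otherwise inherit the arithmetic
    mean of two randomly selected parents."""
    if random.random() < mutation_chance:
        return {v: random.uniform(-1, 1) for v in VIRTUE_NAMES}
    p1, p2 = random.sample(population, 2)
    return {v: (p1.virtues[v] + p2.virtues[v]) / 2 for v in VIRTUE_NAMES}
```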

3 Results

Table 2 Parameter values used in the experiments
Table 3 Results showing the mean death rate and standard deviation at iteration 1000 of ten repeated experiments in seven different conditions. \(N\!L =\) baseline without learning nor eudaimonia, \(S =\) selfish e-type, \(P\!/\!B =\) praise/blame e-type, \(S\!/\!S =\) selfish/selfless e-type. E denotes additional learning from exemplars
Fig. 5
Results showing the average virtue values of the entire population and death rate (red) in seven different experiments with 100 agents over 1000 iterations. Death rate is given by \(\frac{TD}{CI} \div 5\), where TD counts the total deaths, CI is the current iteration, and 5 is simply a factor used for easier comparison in the graphs

We conducted several simulation experiments based on the model. Besides the three e-types described in Sect. 2.3.3, additional experiments were performed with agents without any learning or e-type, so as to provide a baseline that only captures genetic drift (referred to as \(N\!L\) in Table 3). Each experiment ran for 1000 iterations with non-mixed populations fixed at 100 agents. The values of the experimental parameters are given in Table 2. Parameter values for FC (chance of falling) and FV (food value) were fine-tuned to create an equal balance between starvation and drowning. Figure 5 shows the average death rate and virtue values of the entire population over 1000 iterations in seven different experimental conditions: (a) baseline without learning, (b) selfish e-type, (c) selfish e-type with moral exemplars, (d) praise/blame e-type, (e) praise/blame e-type with moral exemplars, (f) selfish/selfless e-type, and (g) selfish/selfless e-type with moral exemplars. Table 3 shows the mean death rate and standard deviation at iteration 1000 of ten repeated experiments using the same conditions.

The results show that the selfish–selfless hybrid (\(S\!/\!S\)) was the most successful in terms of mean death rate and achieved an even lower death rate with the use of exemplars (\(S\!/\!S + E\)). The use of moral exemplars had, however, the opposite effect for agents with praise/blame (\(P\!/\!B\)) and selfish (S) e-types. While exemplars appear to have a stabilizing effect for \(S\!/\!S\), they generated more volatile fluctuations in the virtue values for \(P\!/\!B\) [Fig. 5 (e) and (g)]. As expected, the selfish e-type had the highest death rate, since the selfish agents did not try to rescue others regardless of risk, nor share any food regardless of surplus. A similar effect can be noticed in the evolutionary condition (\(N\!L\)). Somewhat more unexpected is the result from the repeated experiments (Table 3), showing that \(P\!/\!B\) only performed marginally better than agents who did not receive any learning at all (\(N\!L\)). This can partly be explained by the fact that \(P\!/\!B\) promotes sacrificial recklessness and selflessness in a relatively ruthless environment. For the \(N\!L\) agents, by contrast, the most sacrificial behavior disappears due to the evolutionary pressure of the simulation: the most sacrificial agents have a higher chance of dying and will therefore not reproduce as much (in Fig. 5, the average courage value for \(N\!L\) after 1000 iterations was \(-0.45\), and \(-0.28\) for generosity).

4 Discussion

The experimental results show how the AVAs learn to tackle cooperation problems while exhibiting some core features of their theoretical counterpart, including moral character (in terms of dispositional virtues), learning from experience (through phronetic learning feedback), the pursuit of eudaimonia (increasing e-value in light of e-type), and learning from moral exemplars (by copying the virtues of excellent others). More importantly, the results illustrate how virtue ethics can be used to conceptualize and implement traits that support and foster ethical behavior and moral learning in artificial moral agents. Beyond the presented implementation, we believe that the development of more refined and sophisticated systems based on virtue ethics can be guided by our framework. In essence, virtue ethics offers an appealing recipe for AMAs situated in complex and dynamic environments where learning provides the most suitable (or only) path to moral sensitivity, a sensitivity that cannot easily be captured in rules nor in the mere maximization of some utility.

Since the training of virtues is based on outcomes, one might argue that our model is in fact consequentialism in disguise (i.e., consequentialism with the addition of learning). However, while our eudaimonist version of virtue ethics depends on increasing some identified moral goods (evaluated through outcomes), the main focus is the dispositional features of agents (their moral character) and the learning that gives rise to it (moral development), which are in turn used to yield virtuous behavior. That is, although outcomes are used to guide learning, it is the agent’s character that produces actions. One could also argue that our model is in fact deontology in disguise; that it is built using a rule-adhering system (based on the view that all computational systems are essentially rule-following systems), or that it even employs conditional rules for moral behavior (“if \(courage > 0 \rightarrow\) save agent”). But this would also miss the point. The purpose of dispositional traits is not to generate rule-following behavior per se—although this would in many cases be an attractive feature—but instead, to yield context-sensitive behavior where conditional rules are not applicable. Furthermore, while the smallest components of artificial as well as biological neural networks may be rule-following in the strict sense (neurons or nodes), the complex behavior of entire networks is not. More importantly, we do not claim that virtue ethics is in some fundamental sense superior to consequentialism or deontology as a recipe for AMAs, but rather that it emphasizes features of morality that are often ignored in machine ethics. An artificial agent would, after all, only be virtuous if it respected important moral rules and was considerate of the outcomes of its actions. Following Tolmeijer et al. (2020), we believe that a hybrid approach to AMAs would be the most attractive option, since it could utilize the combined strengths of the three dominant theories. With that said, since virtue ethics paints a more comprehensive picture of what a moral character in fact is, we believe it offers an appealing blueprint for how different theories could be integrated into a unified whole.

However, many issues remain to be resolved before AVAs can enter the moral realms of our everyday lives. One might, for instance, question whether and to what extent our model and experiments bear any relevant resemblance to the complex moral domains humans face in the real world. The viability of our framework rests on a number of non-trivial assumptions: (i) that there are accurate mappings between input, virtues, action, outcome, and feedback (connections that can be immensely more difficult to establish in real-world applications); (ii) that conflicts between two or more virtues are dealt with in a satisfactory manner (an issue we ignore by only relating one type of input to one particular virtue); (iii) that virtues can be adequately represented as a mean between two extremes; and (iv) that relevant moral goods and values can be formalized and quantified in terms of e-types and e-values. One might also question the use of multi-agent simulations as a tool to develop and study artificial moral behavior. For instance, the evaluation of our experiments depends on death rate, a metric which can only be attained at the system level and not by the agents themselves. In reality, such system-level metrics might be hard to come by. Furthermore, to simply focus on seemingly crude utilitarian metrics such as death rate might lead one to ignore other important values—e.g., equality, freedom, respect, and justice—that deserve moral consideration on their own (and not in terms of how they support a low death rate). More generally, the game-theoretic setting also points to a more problematic meta-ethical issue regarding the function and evaluation of moral behavior, and implicitly, about the nature of morality itself. Since evolutionary game theory provides a functional explanation of morality as “solving the problem of cooperation among self-interested agents,” it naturally lends itself to system-level utilitarian evaluation (in the way evolution is driven by “survival of the fittest”). But it should be stressed that this explanation has critical limitations and is one of several possible alternatives. For instance, while evolutionary game theory can describe the “low-level” function of morality in deflationary terms, it fails to account for the specific cognitive resources that some species evolved to meet their species-specific challenges and coordination problems (e.g., social-psychological capacities such as emotions and empathy); nor does it by itself offer any clear insight into the moral resources and norms that cultures developed to meet their culture-specific challenges (e.g., norms involving guilt, shame, or justice). The main point is that, while the evaluation of our virtuous agents depends on system-level prosperity in utilitarian terms (in the sense that virtuous behavior supports system-level preservation), this is not the only possible route for evaluating virtuous behavior. For implementations intended to enact other dimensions of ethical behavior, it might, for a variety of reasons, be more suitable to use “human-in-the-loop” forms of evaluation.

A further limitation of our experiment is that it treats events and processes as discrete and sequential, as opposed to the continuous and parallel processes that characterize non-virtual reality. These types of issues echo more profound challenges in real-world applications of RL; while RL methods continue to excel in virtual settings, these advancements are often difficult to translate into real-world environments (Dulac-Arnold et al. 2021). Similarly, the performance of AVAs in multi-agent systems could be explored in various ways via methods that are explicitly aimed at fostering collective good (Wang et al. 2020). To investigate value alignment, AVAs could for instance be trained to face sequential moral dilemmas (Rodriguez-Soto et al. 2020), a special type of Markov game that is solved by finding ethically aligned Nash equilibria. Recent work on policy regularization points in a similar direction (Maree and Omlin 2015). Finally, regardless of whether sophisticated artificial morality will ever become reality, we might gain new insights into the nature of human morality by approaching ethics from a computational perspective (Pereira et al. 2021).

So what does the future hold for artificial virtuous agents? In the short term, we believe AVAs could find suitable experimental applications in various classification tasks such as the processing of moral data, in action selection and motor control in simple robotic settings, in the learning of situations and outcomes (through supervised or reinforcement learning), in hybrid models tackling the challenge of conflicting values and virtues, or in further multi-agent simulations and game-theoretic models used to study social and moral dilemmas. In the medium term, AVAs could potentially be used to tackle complex but domain-specific problems in real-world environments, e.g., to balance value priorities for self-driving vehicles, or to help social robots develop social sensitivity. In the long term, AVAs might be accepted as rightful members of our moral communities (as argued by Gamez et al. 2020). In the most optimistic vision, AVAs might help us to reveal novel forms of moral excellence and, as a result, become moral exemplars to humans (Giubilini and Savulescu 2018).

5 Conclusion

In this paper, we have shown how core tenets of virtue ethics can be implemented in artificial systems. As far as we know, it is the first computational implementation of AMAs that focuses solely on virtue ethics. Essentially, we believe virtue ethics provides a compelling path toward morally excellent machines that can eventually be incorporated into real-world moral domains. To that end, we hope that our work provides a starting point for further conceptual, technical, and experimental refinement of artificial virtuousness.