Preferences, risk neutrality and risk-sensitive MDPs

Alexander, James; Sobel, Matthew J.

doi:10.1007/s10479-024-06020-6

Original Research
Open access
Published: 06 May 2024

Annals of Operations Research

Abstract

A binary preference relation on a real vector space satisfying four (natural) axioms is shown to induce a utility function composed of a linear function to the reals and a weakly monotonic function. The key axiom is decomposition, and the utility function can be taken to be linear if and only if this axiom’s converse is also satisfied. Important consequences follow for risk-sensitive discounted Markov decision processes, decision trees, and the discounted utility model in economics. Since the four axioms imply that preferences correspond to discounting, the four axioms without the converse imply that preferences are consistent with discounting without risk neutrality.

1 Introduction

John von Neumann and Oskar Morgenstern (vN-M) presented the expected utility theorem in Theory of Games and Economic Behavior (von Neumann & Morgenstern, 1953), which is one of the most important scientific books of the twentieth century. The theorem showed that if a decision-maker (DM) has preferences which are consistent with a few axioms regarding a set V of scalar-valued random variables (r.v.s), and if the DM must choose one r.v. in V, then there is a real-valued function U (a “von Neumann-Morgenstern utility function”) such that the DM chooses an element of $\arg \max \{{\mathbb {E}}[U(v)]: v\in V\}$ in which ${\mathbb {E}}$ denotes expected value. The extent of curvature of U is important in applications of the expected utility theorem. In particular, if U is affine and ${\mathbb {E}}(v)={\mathbb {E}}(v')$, then v and $v'$ are preference-equivalent and the DM is said to be risk neutral. The presence or absence of curvature is a major consideration in this paper.

vN-M intended to construct a theory of the foundations of economic behavior, and indeed the book precipitated a transformation of economics. However, the impacts of the expected utility theorem (and of the book in general) have also been extensive outside economics. Other affected areas include operations research, mathematics, statistics, political science and much else. The two-volume handbook of utility theory (Barbera et al., 1998) is a good snapshot at the end of the twentieth century. A generalization (which the handbook does not address in detail) replaces scalar-valued r.v.s with vector-valued r.v.s to encompass preferences among entities with multiple attributes. See Rothblum (1975), which was written midway between von Neumann and Morgenstern (1953) and Barbera et al. (1998), and the more recent Miyamoto and Wakker (1996).

The present paper is concerned with a generalization of the expected utility theorem in which the elements of set V, instead of being scalar-valued r.v.s, are an abstraction of vector-valued r.v.s with time-indexed components which can themselves be vectors, i.e., vector-valued stochastic processes. Such a generalization arises naturally with Markov decision processes (MDPs) and raises the question of whether risk preference and intertemporal preference are related. This paper answers a version of this question in an abstract framework that encompasses MDPs, decision trees and the discounted utility model in economics.

Each decision rule and initial state in an MDP induces a real-valued stochastic process of successive single-period rewards. A decision rule induces a vector-valued stochastic process of single-period rewards in which the components of the vectors correspond to different initial states. Therefore, a preference relation among vector-valued stochastic processes induces a selection criterion among decision rules.

Confining attention temporarily to a specific initial state, let $X_1, X_2, \ldots $ be the sequence of real-valued random variables that are single-period rewards; the present value of the rewards is $B:=\sum _{t=1}^{\infty } \beta _t X_t$ in which $\beta _1, \beta _2, \ldots $ is a sequence of discount factors. Most of the fifty-year literature on risk-sensitive discounted MDPs seeks a decision rule that maximizes E[W(B)] in which W is an increasing function that is generally nonlinear, and E denotes expected value. If W is affine, then preferences are said to be risk neutral. Frequently, $W(b)=-\exp (\lambda b)$ ($\lambda < 0$) and $\beta _t=\beta ^{t-1}$ for each t ($\beta \in (0,1]$), which are called exponential utility and geometric discounting, respectively, but we do not impose either restriction. We ask a generalization of this question: if preferences among stochastic processes correspond to comparisons of present values, then what further assumptions imply the existence of nonlinear W?
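To make the criterion E[W(B)] concrete, the following minimal sketch evaluates it for a finite set of reward paths under exponential utility and geometric discounting. The two-period paths, the probabilities, and the values of `beta` and `lam` are hypothetical and chosen only so that a sure path and a gamble have the same expected present value.

```python
import math

def present_value(rewards, beta):
    """B = sum_t beta^(t-1) * x_t for a finite reward path (t starts at 1)."""
    return sum(beta ** t * x for t, x in enumerate(rewards))

def exp_utility(b, lam):
    """W(b) = -exp(lam * b), which is increasing in b when lam < 0."""
    return -math.exp(lam * b)

def score(paths, probs, beta, lam):
    """E[W(B)] over finitely many reward paths with the given probabilities."""
    return sum(p * exp_utility(present_value(x, beta), lam)
               for x, p in zip(paths, probs))

# Hypothetical data: a sure path versus a gamble with equal expected present value.
beta, lam = 0.9, -0.5
sure_paths, sure_probs = [[1.0, 1.0]], [1.0]
gamble_paths, gamble_probs = [[0.0, 0.0], [2.0, 2.0]], [0.5, 0.5]

mean_pv_sure = sum(p * present_value(x, beta) for x, p in zip(sure_paths, sure_probs))
mean_pv_gamble = sum(p * present_value(x, beta) for x, p in zip(gamble_paths, gamble_probs))

score_sure = score(sure_paths, sure_probs, beta, lam)
score_gamble = score(gamble_paths, gamble_probs, beta, lam)
```

Because this W is concave, the sure path scores higher than the gamble even though the expected present values coincide; an affine W would rank the two equally.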

The answer is significant too for the analysis of decision trees, and for the discounted utility model of economics, which has an even longer history. Although the axiomatic foundations of that model are deterministic (Koopmans, 1960; Koopmans et al., 1964; Williams & Nassar, 1966; Koopmans, 1972), it is widely employed in stochastic settings. It is useful to imbed questions about MDPs, decision trees, and the discounted utility model in a general abstract setting and, thus, to obtain answers that encompass multiple types of models. In particular, we ask (and answer): Under what conditions in these models do preferences over time and under risk correspond to discounting without risk neutrality?

We concern ourselves with preference orderings on sets, with particular attention to the presence or absence of risk neutrality, and the resulting representation of utility functions. We suppose a weak ordering on a real vector space with minimal basic properties. If the weak ordering can be represented by a utility function U, various definitions of risk neutrality are equivalent to the affineness of U (Keeney & Raiffa, 1976). However, we follow Miyamoto and Wakker (1996) in emphasizing the weak ordering as the primary object rather than the utility function. This approach elucidates a number of points. We develop results in a context where there are infinitely many utility functions corresponding to any ordering, some of which may be risk neutral and some not. This is another justification for the widely held belief that preference relations are a more basic concept than utility functions. Utility functions that are not risk neutral have been used since Daniel Bernoulli’s treatment of the St. Petersburg paradox; for examples see Garber and Phelps (1997), Krysiak and Krysiak (2006), Markowitz (1959), Pliskin et al. (1980), and Rubinstein (1976).

In our context a utility function that is not risk neutral can be replaced by one that is risk neutral and yet be consistent with the same preference relation. A simple consequence of our treatment is to clarify the situation completely, and to characterize those preference relations for which there is a risk-neutral utility function. Roughly speaking, a preference relation (i.e., a weak ordering) has a risk-neutral utility function if and only if it does not have any “indifference regions.” We make no topological assumptions on the space of preferences, yet we obtain sufficient conditions for the existence of a utility function. Similar results could be derived, mutatis mutandis, for mixture spaces (Hausner, 1954; Herstein & Milnor, 1953).

In the next section we set the abstract framework and state the key result, and in the following sections, explicate the concrete implementations for risk-sensitive MDPs, decision trees, and discounted utility. The proof of the key result follows a discussion section.

2 Axioms, definitions, and main result

Our main purpose is to elicit the structure of a utility function when a binary preference relation $\succeq $ on a real vector space V (with elements X, Y, etc.) with zero $\textbf{0}$ satisfies the following axioms:

(A1)
rationality: $\succeq $ is a weak ordering (preorder; reflexive, transitive, complete) on V; $\succ $ denotes the associated strong ordering and $\sim $ the associated equivalence,
(A2)
decomposition: $X-Y\succeq (\preceq )\; \textbf{0}$ implies $X\succeq (\preceq )\; Y$,
(A3)
continuity: for any X, $Y\in V$, the sets $\{\alpha \in \Re :\alpha X -Y \succeq (\preceq )\; \textbf{0}\}$ are closed,
(A4)
non-triviality: there exists an element $X_0\in V$ such that $X_0\succ \textbf{0}$.

A key concern is the presence or absence of the converse of decomposition:

(A2$^{\,c}$)
converse of decomposition: $X\succeq (\preceq )\; Y$ implies $X-Y\succeq (\preceq )\;\textbf{0}$.

A pseudo-utility function is a function $u:V\rightarrow \Re $ such that $u(X)\ge u(Y)$ implies $X\succeq Y$, or contrapositively, $X\succ Y$ implies $u(X)>u(Y)$. (This definition differs slightly from those in Candeal et al. (1998), Peleg (1970) and Subiza and Peris (1998).) A pseudo-utility function can be overly discriminating; that is, it may be that $X\sim Y$, but $u(X)\ne u(Y)$. A utility function is a pseudo-utility function $U:V\rightarrow \Re $ such that $X\succeq Y $ if and only if $U(X)\ge U(Y)$.

Theorem 1

For every ordering satisfying (A1)–(A4), there exists a utility function of the form $U=f\circ u:V\rightarrow \Re $ in which $u:V\rightarrow \Re $ is a linear pseudo-utility function. Also, $f:\Re \rightarrow \Re $ is weakly monotonic and can be taken to be linear if and only if (A2$^{\,c}$) holds.

There is a partial converse, discussed below. The immediate decision-theoretic consequence of the theorem is that preferences satisfying the four axioms are risk neutral if and only if the converse (A2$^{\,c}$) of decomposition is satisfied too. Only sufficiency in a restricted setting had previously been established (Sobel, 2013).

3 Decision trees and risk-sensitive discounted MDPs

The decision tree model dates at least from Raiffa and Schlaifer (1961). Raiffa (1968) notes that the computational analysis of a decision tree model is a dynamic programming algorithm, and there is a chapter on models in which W is nonlinear. Nonlinearity became standard practice in the field of decision analysis (cf. Kirkwood, 2014).

The literature on risk-sensitive MDPs was initiated by Howard and Matheson (1972). See Bäuerle and Rieder (2014) and Denardo and Rothblum (2006) for many citations to the subsequent literature.

Let I denote the natural numbers and let V be the set of stochastic sequences $X=(X_1,X_2,\ldots )$ defined on a probability space $(\Omega ,F,P)$ with $X_t(\omega ) \in \Re ^M$ for all $(t,\omega ) \in I\times \Omega $. For $X \in V$, $Y \in V$, and $b \in \Re $, define $X+Y \in V$ and $bX \in V$ with component-wise addition and multiplication, respectively. Let $\theta $ be the zero vector in $\Re ^M$, so V is a real vector space with zero element $\textbf{0}=(\theta ,\theta ,\ldots ) \in V$. Let S be the set of random vectors taking values in $\Re ^M$, and for $C \in S$ denote $(C,\theta ,\theta ,\ldots )\in V$ as $(C,\textbf{0})$. Then $(V,\succeq )$ induces a preference relation $\underline{>}\!\!\!\!>$ on S: $A \underline{>}\!\!\!\!>B$ if and only if $(A,\textbf{0}) \succeq (B,\textbf{0})$ ($(V,\succeq )$ induces many other preference relations on S as well). Let $e_m$ be the $m^{th}$ unit vector in $\Re ^M$ and $e_{mt} \in V$ be the perturbation of $\textbf{0}$ in which $e_m$ replaces the $t^{th}$ $\theta $.

Let $W:S\rightarrow \Re $ and $\beta _t\in \Re $ for all $t\in I$. In a risk-sensitive discounted MDP, the score of a decision rule is $E[W(\sum _{t=1}^{\infty } \beta _tX_t)]$, which is an important two-fold reduction in complexity. First, comparisons of stochastic processes are reduced to comparisons of elements of S (present values are elements of S). In an MDP with a given initial state, let X and Y be the stochastic processes of rewards induced by two decision rules. The first reduction is

$$\begin{aligned} X \succeq Y \Leftrightarrow \sum _{t=1}^{\infty } \beta _tX_t \ \underline{>}\!\!\!\!>\sum _{t=1}^{\infty } \beta _tY_t \end{aligned}$$

for which the weakest known sufficient conditions are (A1)–(A4) (Sobel, 2013).

The second reduction of complexity compares elements in S via comparisons of real vectors (the expectations at various initial states): for all $A, C \in S$ for which E[W(A)] and E[W(C)] exist,

$$\begin{aligned} A \ \underline{>}\!\!\!\!>C \Leftrightarrow E[W(A)] \ge E[W(C)]. \end{aligned}$$

Given sufficient conditions (A1)–(A4) for the first reduction, it follows from the theorem that the second reduction can occur with W nonlinear if and only if $(A2^c)$ fails to hold.

4 Discounted utility model

The discounted utility model is

$$\begin{aligned} X \succeq Y \Leftrightarrow E\Bigl [\sum _t \beta _tg(X_t)\Bigr ] \ge E\Bigl [\sum _t \beta _tg(Y_t)\Bigr ] \end{aligned} \tag{1}$$

where $X_t=(X_{1t},\ldots ,X_{Mt}) \in S$, $g:\Re ^M\rightarrow \Re $ is an intra-period utility function, and the $\beta _t \in \Re $ are discount factors. An intra-period utility function satisfies the definition of a utility function except that $(\underline{>}\!\!\!\!>,S)$ replaces $(\succeq ,V)$. It is standard practice in economics applications to use this model to encompass both time preference and attitude towards risk. It was suggested in Samuelson (1937) (which was foreshadowed by Ramsey, 1928) in a deterministic setting, where it was axiomatized by Koopmans (1960, 1972) and Koopmans et al. (1964), whose postulates include (A1)–(A4) and $(A2^c)$. In the stochastic case, the model emerges from results concerning multiattribute preference orderings and utility functions that are unified and generalized by the axiomatization in Miyamoto and Wakker (1996), which includes (A1)–(A4) and $(A2^c)$. Thus, it follows from the theorem that these axiomatizations of the discounted utility model imply that there is a utility function of the form $U=f\circ u:V\rightarrow \Re $ in which f is affine, i.e., preferences among stochastic processes are risk neutral. Specifically, given (A1)–(A4), g in the discounted utility model is non-affine if and only if $(A2^c)$ is not valid.
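As an illustration of comparison (1), the sketch below evaluates $E[\sum _t \beta _tg(X_t)]$ from the per-period marginal distributions of two processes in the scalar case M = 1. The processes, the discount factors, and both choices of g are hypothetical; the point is only that an affine g ranks processes with equal per-period means as equivalent, whereas a concave g does not.

```python
import math

def discounted_utility(marginals, betas, g):
    """E[sum_t beta_t * g(X_t)], given each period's marginal as (value, prob) pairs."""
    return sum(b * sum(p * g(x) for x, p in dist)
               for b, dist in zip(betas, marginals))

# Hypothetical two-period data (M = 1): X is deterministic, Y is a fair gamble
# with the same mean in every period.
betas = [1.0, 0.9]
X = [[(1.0, 1.0)], [(1.0, 1.0)]]
Y = [[(0.0, 0.5), (2.0, 0.5)], [(0.0, 0.5), (2.0, 0.5)]]

def g_affine(x):
    return 3.0 * x + 1.0      # an affine intra-period utility

g_concave = math.sqrt         # a non-affine (concave) intra-period utility

affine_X = discounted_utility(X, betas, g_affine)
affine_Y = discounted_utility(Y, betas, g_affine)
concave_X = discounted_utility(X, betas, g_concave)
concave_Y = discounted_utility(Y, betas, g_concave)
```

With the affine g the two values agree, so X and Y are ranked as equivalent; with the concave g the deterministic process X is ranked strictly higher.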

The question asked in §1 was: Under what conditions do preferences over time and under risk correspond to discounting without risk neutrality? Our answer is: axioms (A1)–(A4) without (A2$^{\,c}$).

The comparison in (1) is additively separable, so the separability literature (Blackorby et al., 1998) is relevant but does not address the structure of the separable terms. Extensive and largely unrelated literatures either take the intra-period utility function as given and discuss the choice of a discount factor (e.g., Arrow et al., 1995; Lind et al., 1982; Portney & Weyant, 1999), or investigate the existence and properties of the intra-period utility function (e.g., Mehta, 1998). The relationship of the issues in this paper to the intertemporal resolution of uncertainty (e.g., Johnson & Donaldson, 1985; Kreps & Porteus, 1979; Machina, 1989) is unclear.

5 Discussion

Here we note some aspects of the axioms, and discuss examples, applications, and variations.

5.1 Topology

Some natural applications involve infinite-dimensional V, so we make no dimensional assumption. Past axiomatic constructions of utility functions on uncountable outcome sets have followed two routes (Fishburn, 1979). Some endow $(V, \succeq )$ with a topology and posit a continuity axiom (Bridges & Mehta, 1995). Another approach, if the dimension of V is at least two, is algebraic and is based on the assumption that outcomes that differ in some dimensions can be offset with compensating differences in other dimensions (Luce & Tukey, 1964). We make no assumption regarding compensating differences, and the continuity axiom makes no topological assumptions. Note that (A3) refers only to the topology of the real numbers, and V is not endowed with a topology. Of course, any finite-dimensional vector space over $\Re $ carries a unique natural (product) topology, but this topology is not pertinent to our discussion.

5.2 Utility functions

At least two definitions of a utility function are current in the literature. Let Q and T be binary relations on sets V and B, respectively. A function $g: V \rightarrow B$ is an order homomorphism if $XQY \Rightarrow g(X)Tg(Y)$ for all $X,Y \in V$, and it is an order isomorphism if $XQY \Leftrightarrow g(X)Tg(Y)$ for all $X,Y \in V$. When $B=\Re $, a utility function has recently been defined as an order homomorphism (e.g., Bridges & Mehta, 1995, page 5; Mehta, 1998; Vind, 2003) and as an order isomorphism (e.g., Bridges & Mehta, 1995, page 27; Ok, 2007). The extant existence proofs with both definitions use topological properties of V (cf. references cited in this paragraph). In this paper, we say that preferences are risk neutral if there is a utility function $U=f\circ u:V\rightarrow \Re $ in which $u:V\rightarrow \Re $ is a linear pseudo-utility function and $f:\Re \rightarrow \Re $ is linear.

5.3 Partial orderings

A real vector space with a binary relation is said to be partially ordered if it has a cone property that $x \succeq y$ implies $\alpha x \succeq \alpha y$ for all $\alpha \ge 0$, and it satisfies antisymmetry, (A1), (A2), and (A2$^{\,c}$). There exists a linear pseudo-utility function $u:V\rightarrow \Re $ if the vector space V is partially ordered and has additional properties (Hausner, 1954; Hausner & Wendel, 1952). However, we do not assume that V is partially ordered, and the effect of the presence or absence of (A2$^{\,c}$) is a major point of interest. The theorem yields the existence of a linear pseudo-utility function without requiring (A2$^{\,c}$). It follows from part 4 of the lemma in the proof that the cone property is redundant if a partially ordered vector space (Aliprantis et al., 1989; Hausner, 1954; Hausner & Wendel, 1952; Peressini, 1967) satisfies (A3).

5.4 Examples

Note that a non-trivial linear function $u:V\rightarrow \Re $ defines an order $\succeq _u$, via $X\succeq _u Y$ if $u(X)\ge u(Y)$. This order satisfies (A1)–(A4) and also (A2$^{\,c}$). On the other hand, consider the following examples.

Let $V=\Re $ and define $X\succeq Y$ if $X\ge Y$ or if $X\ge 1$. This order satisfies (A1)–(A4), but not (A2$^{\,c}$). However, $U(X)=\min \{X,1\}$ is a utility function.
This second example is the same as above, except also $1\prec X$ for all $X>1$. It too satisfies (A1)–(A4), but not (A2$^{\,c}$), and it has the following utility function which jumps at 1: $U(X)=X$ if $X\le 1$, and $U(X)=2$ if $X>1$.
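The claims about the first example can be spot-checked numerically. The sketch below tests (A2), (A2$^{\,c}$), and the representation by $U(X)=\min \{X,1\}$ on a finite grid of reals; the grid and the function names are illustrative assumptions, and a finite check of this kind illustrates the claims rather than proving them.

```python
from itertools import product

def weakly_prefers(x, y):
    """First example on V = R: X is weakly preferred to Y if X >= Y or X >= 1."""
    return x >= y or x >= 1

def U(x):
    """Candidate utility function for the first example."""
    return min(x, 1)

# Grid -3.0, -2.75, ..., 3.0 (exact binary fractions, so differences are exact).
grid = [k / 4 for k in range(-12, 13)]
pairs = list(product(grid, repeat=2))

# (A2): X - Y weakly preferred to 0 implies X weakly preferred to Y, and dually.
a2_holds = all(
    (not weakly_prefers(x - y, 0) or weakly_prefers(x, y))
    and (not weakly_prefers(0, x - y) or weakly_prefers(y, x))
    for x, y in pairs
)

# (A2^c): collect pairs where X is weakly preferred to Y but X - Y is not
# weakly preferred to 0.
a2c_violations = [(x, y) for x, y in pairs
                  if weakly_prefers(x, y) and not weakly_prefers(x - y, 0)]

# U represents the order: X weakly preferred to Y if and only if U(X) >= U(Y).
u_represents = all(weakly_prefers(x, y) == (U(x) >= U(y)) for x, y in pairs)
```

For instance, the pair X = 2, Y = 3 appears among the violations of (A2$^{\,c}$): 2 and 3 are equivalent under the order, yet $X-Y=-1\prec 0$.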

We see below that these examples are prototypical. In particular, how an order fails (A2$^{\,c}$) is made apparent. In fact, the second example is somewhat exotic and does not occur if (A3) is strengthened slightly as follows:

(A3’)
the sets $\{\alpha \in \Re :\alpha X -Y \succeq (\preceq )\; Z\}$ are closed for any X, Y, $Z\in V$.

The second example above does not satisfy (A3’) with $Y=0$, $X=1$, $Z=2$; the set in (A3’) is the open half-line $(1,\infty )$.
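The failure of (A3’) can be seen by testing membership of $\alpha $ in the set $\{\alpha :\alpha X-Y\succeq Z\}$ as $\alpha $ decreases to 1. The following sketch does this for both examples with $X=1$, $Y=0$, $Z=2$; the encoding of the two orders as Python predicates is our reading of their definitions, and the function names are illustrative.

```python
def prefers_first(x, y):
    """First example: X is weakly preferred to Y if X >= Y or X >= 1."""
    return x >= y or x >= 1

def prefers_second(x, y):
    """Second example: as the first, except that 1 is strictly below every X > 1."""
    return x >= y or x > 1

def in_set(prefers, alpha, X=1.0, Y=0.0, Z=2.0):
    """Membership of alpha in {alpha : alpha*X - Y is weakly preferred to Z}."""
    return prefers(alpha * X - Y, Z)

# A sequence of alphas decreasing towards the limit point 1.
approach = [1.0 + 10.0 ** -k for k in range(1, 10)]

second_members = all(in_set(prefers_second, a) for a in approach)
second_limit_in = in_set(prefers_second, 1.0)   # limit point excluded: set is not closed
first_limit_in = in_set(prefers_first, 1.0)     # limit point included in the first example
```

In the second example every $\alpha >1$ tested belongs to the set while the limit point 1 does not, which is consistent with the set being the open half-line $(1,\infty )$; in the first example the limit point is included.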

5.5 Variants

D. Turcic (unpublished) has suggested replacing (A2) and/or (A2$^{\,c}$) with

(A$\tilde{2}$)
monotonicity: for all X, $Y\in V$ such that $X-Y\succeq \textbf{0}$ and $\alpha $, $\beta \in [0,1]$, $\alpha \ge \beta $ if and only if
$$\begin{aligned} \alpha X+(1-\alpha )Y\succeq \beta X+(1-\beta ) Y. \end{aligned} \tag{2}$$

We note that (A$\tilde{2}$) is stronger than the combination of (A2) with $(A2^c)$ (let $\alpha =1$, $\beta =0$). Indeed, (A$\tilde{2}$) is strictly stronger than the combination; the first example above satisfies (A2) but not (A$\tilde{2}$) (let $X=2$, $Y=0$, $\alpha =.9<1=\beta $). On the other hand, (A1)–(A4) and (A2$^{\,c}$) imply (A$\tilde{2}$). For by our theorem, (A1)–(A4) and (A2$^{\,c}$) imply there is a linear utility function $U:V\rightarrow \Re $, in which case (2) is equivalent to $(\alpha -\beta )U(X-Y)\ge 0$. We note that the following variant of (A$\tilde{2}$) implies (A2$^{\,c}$):

(A$\tilde{\tilde{2}}$)
for all X, $Y\in V$ such that $X-Y\succ \textbf{0}$ and $\alpha $, $\beta \in [0,1]$, $\alpha >\beta $ implies
$$\begin{aligned} \alpha X+(1-\alpha )Y\succ \beta X+(1-\beta ) Y. \end{aligned}$$

This variant of (A$\tilde{2}$) implies (A2$^{\,c}$) because (A2$^{\,c}$) is equivalent to its contrapositive: $X-Y\succ 0\Rightarrow X\succ Y$, and again, let $\alpha =1$, $\beta =0$.

6 Proof of main result

For use below, we restate (from Sobel, 2013) some elementary technical properties of the ordering.

Lemma

Axioms (A1), (A2), and (A3) imply the following:

1.
$W\succeq (\preceq ,\sim )\;\textbf{0}$ implies $W+Z\succeq (\preceq ,\sim )\; Z$ for all $Z\in V$,
2.
if $X\succeq (\preceq ,\sim )\;\textbf{0}$ $(\mathrm{resp. }X\succ (\prec )\;\textbf{0})$ and $Y\succeq (\preceq ,\sim )\;\textbf{0}$, then $X+Y\succeq (\preceq ,\sim )\;\textbf{0}$ $(X+Y\succ (\prec )\;\textbf{0})$,
3.
if $X\succeq (\preceq ,\sim )\;\textbf{0}$ $(\mathrm{resp. }X\succ (\prec )\;\textbf{0})$, then $-X\preceq (\succeq ,\sim )\;\textbf{0}$ $(-X\prec (\succ )\;\textbf{0})$,
4.
if $X\succeq (\preceq ,\sim )\;\textbf{0}$, then $bX\succeq (\preceq ,\sim )\;\textbf{0}$ for all real $b> 0$.

Proof

For part 1: use axiom (A2) with $X=W+Z$ and $Y=Z$. For $\sim $, combine the cases $\succeq $ and $\preceq $. For part 2: part 1 implies $X+Y\succeq X\succeq (\succ )\;\textbf{0}$, and similarly for the opposite order. The third follows immediately. For part 4: use induction on part 2 to see that $nX\succeq (\preceq )\;\textbf{0}$ for all non-negative integers n. Thus $(n/m)X\succeq (\preceq )\;\textbf{0}$ for all non-negative rationals n/m. Finally, use (A3) (with $Y=\textbf{0}$) to establish that $bX\succeq (\preceq )\;\textbf{0}$ for all non-negative b. This proves the lemma.

Proof of theorem

Given an order satisfying (A1)–(A4), we first construct a linear pseudo-utility function u. We claim that for any Y, there exists a unique real c such that $cX_0-Y\sim \textbf{0}$. First note that there is at most one such c for any Y. For if $cX_0-Y\sim \textbf{0}$, $c'X_0-Y\sim \textbf{0}$ then (Lemma, parts 3 and 2), $(c-c')X_0\sim \textbf{0}$, and (Lemma, part 4), $c-c'=0$.

Next we establish the existence of c. Any $Y\in V$ satisfies exactly one of $Y\succ \textbf{0}$, $Y\sim \textbf{0}$ or $Y\prec \textbf{0}$. If $Y\sim \textbf{0}$, set $c=0$. If $Y\succ \textbf{0}$, consider the set $A=\{\alpha : \alpha X_0-Y\succ \textbf{0}\}$. We claim $A\ne \emptyset $. For if $X_0-(1/\alpha )Y\preceq \textbf{0}$ for all large $\alpha $, then by Lemma, part 3 and (A3), $X_0\preceq \textbf{0}$, which contradicts (A4). Moreover, if $\alpha <0$, (Lemma, parts 4, 3 and 2) $\alpha \not \in A$. Thus let $c=\inf \{\alpha \in A\}\ge 0$. By (A3) $cX_0-Y\succeq \textbf{0}$. If $\alpha <c$, then $\alpha X_0-Y\preceq \textbf{0}$, so (A3 again) $cX_0-Y\preceq \textbf{0}$. Thus $cX_0- Y\sim \textbf{0}$. If $Y\prec \textbf{0}$, use the previous argument on $-Y$.

We can thus define $u(Y)=c$. We next establish that u is linear. If $cX_0-Y\sim \textbf{0}$, then (Lemma, parts 3 and 4) $bcX_0-bY\sim \textbf{0}$ for all real b, so that $u(bY)=bu(Y)$. Also, if $cX_0-Y\sim \textbf{0}$ and $c'X_0-Y'\sim \textbf{0}$, then (Lemma, part 2) $(c+c')X_0-(Y+Y')\sim \textbf{0}$, so that $u(Y+Y')=u(Y)+u(Y')$. Hence u is linear. We also establish that u is a pseudo-utility. If $Y\succ Y'$, then (contrapositive of A2), $\textbf{0}\prec Y-Y'=(cX_0-Y')-(cX_0-Y)$. Since $cX_0-Y\sim \textbf{0}$, (Lemma, part 2) $cX_0-Y'\succ \textbf{0}$, so $u(Y')<c$. Thus u is a pseudo-utility.
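The construction of u uses the order only through comparisons with $\textbf{0}$, so it can be sketched as a bisection on $\alpha $ against a preference oracle. The example below assumes a hypothetical order on $\Re ^2$ induced by $\min \{w\cdot X,1\}$ with $w=(2,1)$ and takes $X_0=(1,0)$; the oracle, the bracketing by doubling, and the iteration count are illustrative choices and not part of the proof.

```python
def make_strict_order(w, cap=1.0):
    """Hypothetical order on R^n: X is strictly preferred to Y iff min(w.X, cap) > min(w.Y, cap)."""
    def value(x):
        return min(sum(wi * xi for wi, xi in zip(w, x)), cap)
    def strictly_prefers(x, y):
        return value(x) > value(y)
    return strictly_prefers

def pseudo_utility(y, x0, strictly_prefers, iterations=100):
    """Approximate c = inf{a : a*x0 - y is strictly preferred to 0} by bisection."""
    zero = [0.0] * len(y)
    def positive(a):
        return strictly_prefers([a * p - q for p, q in zip(x0, y)], zero)
    lo, hi = -1.0, 1.0
    while positive(lo):        # widen until lo lies outside the set
        lo *= 2.0
    while not positive(hi):    # widen until hi lies inside the set
        hi *= 2.0
    for _ in range(iterations):
        mid = (lo + hi) / 2.0
        if positive(mid):
            hi = mid
        else:
            lo = mid
    return hi

strictly_prefers = make_strict_order(w=(2.0, 1.0))
x0 = (1.0, 0.0)
y1, y2 = (3.0, 4.0), (-1.0, 0.5)
y_sum = tuple(a + b for a, b in zip(y1, y2))

u1 = pseudo_utility(y1, x0, strictly_prefers)
u2 = pseudo_utility(y2, x0, strictly_prefers)
u_sum = pseudo_utility(y_sum, x0, strictly_prefers)
```

In this hypothetical setting the computed values agree with $u(Y)=w\cdot Y/2$ and are additive up to rounding, even though the order itself has an indifference region where $w\cdot X\ge 1$.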

Finally, we construct a utility function U. This amounts to showing that a certain quotient of $\Re $ is again a copy of $\Re $. This seems like it should be a standard known result; however, we could not locate a reference, and include the construction for completeness. Also, we give an explicit construction, which could be useful in explicit applications.

Define an equivalence relation on $\Re $ by $x\equiv y$ if there are $X\in V$ and $Y\in V$ with $u(X)=x$, $u(Y)=y$ and $X\sim Y$. Suppose $X\sim Y$ with $u(X)\ge u(Z)\ge u(Y)$. Then $X\succeq Z\succeq Y\sim X$, so that (A1) $X\sim Z\sim Y$. That is, all elements with u values between those of X and Y, inclusive, are equivalent under the order. Thus the relation $\equiv $ is well-defined (does not depend on the choices of X and Y), and moreover, for any x, the set of y equivalent to x is a single point or an interval $I=I_x$. The idea is to define a monotonic function $f:\Re \rightarrow \Re $ such that $f(x)=f(y)$ if and only if $x\equiv y$. Then $U=f\circ u:V\rightarrow \Re $ is the desired utility function.

For numbers $x\in \Re $, denote $x\succ y$, etc., if for any X, Y with $u(X)=x$, $u(Y)=y$, $X\succ Y$, etc. By the above, this ordering is well-defined. For any interval I, let $x_I^\ell $ (for left) and $x_I^r$ (for right) denote the end points of I. Call an end point $x_I^{\ell ,r}$ regular if $x_I^{\ell ,r}\sim x$ for $x\in I^\circ $ (the interior of I), and irregular otherwise (in the examples above, $x=1$ is regular in the first case and irregular in the second). Note that a point can be the right boundary point of one interval and the left boundary point of another, and irregular with respect to either or both intervals.

We begin with the simplest case, and the only one really relevant for applications. Assume there are no irregular points and only finitely many intervals I on any bounded set of $\Re $, equivalently the lengths of the intervals I are bounded away from zero. A mathematical point is that the end points of the intervals I do not have any accumulation points. Define $g:\Re \rightarrow \Re $ by $g=0$ on the union of the intervals $I^\circ $ and $g=1$ otherwise. The function g is the density of an absolutely continuous measure with distribution function $f(x)=\int _0^x g(y)\,dy$, and $U=f\circ u$ is the desired utility function.
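In this simplest case f can be computed in closed form, since integrating g amounts to subtracting from x the length of the indifference intervals lying between 0 and x. The sketch below assumes finitely many disjoint intervals, none containing 0, supplied as (left, right) pairs; the function names are illustrative.

```python
import math

def make_f(intervals):
    """f(x) = integral from 0 to x of g, with g = 0 on the given intervals and g = 1 elsewhere."""
    def flat_length(a, b):
        # Total length of [a, b] covered by the (disjoint) indifference intervals.
        return sum(max(0.0, min(b, right) - max(a, left))
                   for left, right in intervals)
    def f(x):
        if x >= 0:
            return x - flat_length(0.0, x)
        return -((-x) - flat_length(x, 0.0))
    return f

# First example of Sect. 5.4: the single indifference interval is [1, infinity).
f_first = make_f([(1.0, math.inf)])
values_first = [f_first(x) for x in (-2.0, 0.0, 0.5, 1.0, 3.0)]

# A hypothetical bounded indifference interval [2, 3].
f_bounded = make_f([(2.0, 3.0)])
```

For the first example the construction gives $f(x)=\min \{x,1\}$, which matches the utility function stated there because u is the identity.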

More generally, we define the function f in terms of Lebesgue integrals of non-negative measures on $\Re $. For g defined as above, let $\mu _1=g\,dx$, where dx is the usual Lebesgue measure. Let $\delta (x)$ denote the delta function (as a measure) at x. Let $\mu _2=\sum \ell (I)\,\delta (x_I^{\ell ,r})$, where the sum is over all irregular end points of intervals I and $\ell (I)$ is the length of I.

Before defining $\mu _3$, an example: Let $V=\Re $. Write any real number x between 0 and 1 in ternary notation $x=\sum _{i=1}^\infty a_i 3^{-i}$, where $a_i=0$, 1 or 2. Suppose the $I^\circ $ are precisely the open intervals of numbers whose every ternary expansion contains a ‘1’. The $I^\circ $ are the middle-third intervals in the construction of the Cantor set, and the complement of $\cup I^\circ $ on the interval [0, 1] is the standard Cantor set. For convenience, we suppose all the intervals I are contained in a bounded set of $\Re $. For the full case, one performs the construction below on a sequence of bounded sets; the details are left to the reader. For any (small) $L>0$, let $N(L)<\infty $ be the number of intervals with $\ell (I)\ge L$ (it is here we require the intervals to be in a bounded set). Define

$$\begin{aligned} \mu _3=\lim _{L\rightarrow 0}2^{-N(L)-1}\sum _{\ell (I)\ge L} \bigl (\delta (x_I^\ell )+\delta (x_I^r)\bigr ). \end{aligned}$$

The measure $\mu _3$ is supported on the set of accumulation points of the end points of the intervals I. In the example above, $\mu _3$ is the standard Cantor measure, and the Lebesgue integral $\int _0^x\,d\mu _3$ is the standard Cantor function on [0, 1].
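For the Cantor example, the integral $\int _0^x\,d\mu _3$ can be evaluated numerically with the standard ternary-digit algorithm for the Cantor function. The sketch below uses that algorithm rather than the limit formula for $\mu _3$; the digit count is an arbitrary truncation.

```python
def cantor_function(x, digits=40):
    """Approximate the standard Cantor function on [0, 1] from the ternary digits of x."""
    if x <= 0:
        return 0.0
    if x >= 1:
        return 1.0
    total, scale = 0.0, 0.5
    for _ in range(digits):
        x *= 3
        d = int(x)          # next ternary digit: 0, 1, or 2
        x -= d
        if d == 1:
            # x lies in a removed middle third, where the function is constant.
            return total + scale
        total += scale * (d // 2)   # digit 2 contributes a binary digit 1
        scale /= 2
    return total

flat_values = [cantor_function(x) for x in (0.4, 0.5, 0.6)]   # all inside (1/3, 2/3)
quarter = cantor_function(0.25)                               # 1/4 lies in the Cantor set
```

The function is constant on each middle-third interval (for example, it equals 1/2 throughout (1/3, 2/3)), which is the behaviour required of f on the indifference intervals $I^\circ $.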

Define $\mu =\mu _1+\mu _2+\mu _3$, and let

$$\begin{aligned} f(x)=\frac{1}{2}\Bigl (\int _0^{x+}\!\!d\mu +\int _0^{x-}\!\!d\mu \Bigr )=\frac{1}{2}\Bigl (\lim _{y\downarrow x}\int _0^y\!d\mu +\lim _{y\uparrow x}\int _0^y\!d\mu \Bigr ). \end{aligned}$$

It is routine to establish that f has the desired property, namely $f(x)=f(y)$ if and only if $x\sim y$. $\square $

7 Notes

If x is not an irregular endpoint of an interval, then f is the Lebesgue integral $f(x)=\int _0^x d\mu $. Thus if there are no irregular points, the function f is continuous. At each irregular point, the function f has a jump of size $\ell (I)/2$ on each side (thus the utility function of the proof is slightly different than in the example).
If there are no irregular points, and the lengths $\ell (I)$ are bounded away from zero, the function f can be made smooth ($C^\infty $), by redefining g. Details are left to the reader.
The function f is defined as a Lebesgue integral, but can be defined as a Stieltjes integral. If $\mu _2=\mu _3=0$, it can be defined as a Riemann integral.
The pseudo-utility function u is unique up to scale. It is determined up to scale by the set $\{X:X\succeq \textbf{0}\}$.
For the two examples of Sect. 5.4, the pseudo-utility functions are both the identity from $\Re $ to $\Re $.
In the construction of U from u, the number 0 is not in any interval $I_x$, otherwise (A2) would be violated. Conversely, given u, any set of disjoint intervals $I_x$, none of which contains 0, determines, via the resulting U, an order satisfying (A1)–(A4).
If (A2$^{\,c}$) obtains, then the equivalence relation $\equiv $ is trivial; $x\equiv y$ if and only if $x=y$. In this case, the utility function can be assumed linear. Conversely, if the utility function is linear, the equivalence relation $\equiv $ is trivial, and (A2$^{\,c}$) obtains.

References

Aliprantis, C. D., Brown, D. J., & Burkinshaw, O. (1989). Existence and optimality of competitive equilibria. Springer-Verlag.
Arrow, K. J., Cline, W. R., Mäler, K.-G., & Munasinghe, M. (1995). Intertemporal equity, discounting, and economic efficiency. In M. Munasinghe (Ed.), Global climate change: Economic and policy issues (pp. 1–32). The World Bank.
Barbera, S., Hammond, P. J., & Seidl, C. (Eds.). (1998). Handbook of utility theory, Volume 1: Principles (1998); Volume 2 (2004). Kluwer Academic Publishers.
Bäuerle, N., & Rieder, U. (2014). More risk-sensitive Markov decision processes. Mathematics of Operations Research, 39, 105–120.
Blackorby, C., Primont, D., & Russell, R. R. (1998). Separability: A survey. In S. Barbera, P. J. Hammond, & C. Seidl (Eds.), Handbook of utility theory, Volume 1: Principles (pp. 51–92). Kluwer Academic Publishers.
Bridges, D. S., & Mehta, G. B. (1995). Representations of preference orderings. Springer-Verlag.
Candeal, J. C., Indurain, E., & Olóriz, E. (1998). Existence of additive utility on positive semigroups: An elementary proof. Annals of Operations Research, 80, 269–279.
Denardo, E. V., & Rothblum, U. G. (2006). A turnpike theorem for a risk-sensitive Markov decision process with stopping. SIAM Journal on Control and Optimization, 45, 414–431.
Fishburn, P. C. (1979). Utility theory for decision making. Robert E. Krieger Publishing Co.
Garber, A. M., & Phelps, C. E. (1997). Economic foundations of cost-effectiveness analysis. Journal of Health Economics, 16, 1–31.
Article Google Scholar
Hausner, M. (1954). Multidimensional utilities. In R. M. Thrall, C. H. Coombs, & R. L. Davis (Eds.), Decision processes (pp. 167–180). John Wiley & Sons, Inc.
Hausner, M., & Wendel, J. G. (1952). Ordered vector spaces. Proceedings of the American Mathematical Society, 3, 977–982.
Article Google Scholar
Herstein, I. N., & Milnor, J. (1953). An axiomatic approach to measurable utility. Econometrica, 21, 291–297.
Article Google Scholar
Howard, R. A., & Matheson, J. E. (1972). Risk-sensitive Markov decision processes. Management Science, 18, 356–369.
Johnson, T. H., & Donaldson, J. B. (1985). The structure of intertemporal preferences under uncertainty and time consistent plans. Econometrica, 53, 1451–1458.
Keeney, R. L., & Raiffa, H. (1976). Decisions with multiple objectives. John Wiley & Sons Inc.
Kirkwood, C. W. (2014). Decision Tree Primer, 2002. Available http://www.public.asu.edu/~kirkwood/DAStuff/decisiontrees/index.html.
Koopmans, T. C. (1960). Stationary ordinal utility and impatience. Econometrica, 28, 287–309.
Koopmans, T. C., Diamond, P. A., & Williamson, R. E. (1964). Stationary utility and time perspective. Econometrica, 32, 82–100.
Koopmans, T. C. (1972). Representation of preference orderings over time. In C. B. McGuire & R. Radner (Eds.), Decisions and organizations (pp. 79–100). North-Holland.
Kreps, D. M., & Porteus, E. L. (1979). Temporal von Neumann–Morgenstern and induced preferences. Journal of Economic Theory, 20, 81–109.
Krysiak, F. C., & Krysiak, D. (2006). Sustainability with uncertain future preferences. Environmental and Resource Economics, 33, 511–531.
Lind, R. C., Arrow, K. J., Corey, G. R., Dasgupta, P., Sen, A. K., Stauffer, T., Stiglitz, J. E., Stockfisch, J. A., & Wilson, R. (1982). Discounting for time and risk in energy policy. Resources for the Future Inc.
Luce, R. D., & Tukey, J. W. (1964). Simultaneous conjoint measurement: A new type of fundamental measurement. Journal of Mathematical Psychology, 1, 1–27.
Machina, M. J. (1989). Dynamic consistency and non-expected utility models of choice. Journal of Economic Literature, 27, 1622–1668.
Markowitz, H. M. (1959). Portfolio selection: Efficient diversification of investments. John Wiley & Sons.
Mehta, G. B. (1998). Preference and utility. In S. Barbera, P. J. Hammond, & C. Seidl (Eds.), Handbook of utility theory, Volume 1: Principles (pp. 1–47). Kluwer Academic Publishers.
Miyamoto, J. M., & Wakker, P. P. (1996). Multiattribute utility theory without expected utility foundations. Operations Research, 44, 313–326.
von Neumann, J., & Morgenstern, O. (1953). Theory of games and economic behavior. Princeton University Press.
Ok, E. A. (2007). Real analysis with economic applications. Princeton University Press.
Peleg, B. B. (1970). Utility functions for partially ordered topological spaces. Econometrica, 38, 93–96.
Peressini, A. L. (1967). Ordered topological vector spaces. Harper & Row.
Pliskin, J. S., Shepard, D. S., & Weinstein, M. C. (1980). Utility functions for life years and health status. Operations Research, 28, 206–224.
Portney, P. R., & Weyant, J. P. (Eds.). (1999). Discounting and intergenerational equity. Resources for the Future Inc.
Raiffa, H. (1968). Decision analysis: Introductory lectures on choices under uncertainty. Addison-Wesley.
Raiffa, H., & Schlaifer, R. O. (1961). Applied statistical decision theory. Division of Research, Harvard Business School.
Ramsey, F. P. (1928). A mathematical theory of saving. Economic Journal, 38, 543–559.
Rothblum, U. G. (1975). Multivariate constant risk posture. Journal of Economic Theory, 10, 309–332.
Rubinstein, M. (1976). The strong case for the generalized logarithmic utility model as the premier model of financial markets. The Journal of Finance, 31, 1797–1818.
Samuelson, P. A. (1937). A note on measurement of utility. The Review of Economic Studies, 4, 155–161.
Sobel, M. J. (2013). Discounting axioms imply risk neutrality. Annals of Operations Research, 208, 417–432.
Subiza, B., & Peris, J. E. (1998). Non-trivial pseudo utility functions. Journal of Mathematical Economics, 29, 67–73.
Vind, K. (2003). Independence, additivity, uncertainty. Springer-Verlag.
Williams, A. C., & Nassar, J. I. (1966). Financial measurement of capital investments. Management Science, 12, 851–864.

Acknowledgements

The authors have benefited from comments and suggestions by a number of interested individuals. We particularly acknowledge Professors Vera Tilson and Danko Turcic. The present paper is the result of revisions based on such feedback.

Author information

Authors and Affiliations

Departments of Mathematics and Cognitive Science, Case Western Reserve University, Cleveland, USA
James Alexander
Department of Operations, Weatherhead School of Management, Case Western Reserve University, 10900 Euclid Avenue, Cleveland, OH, 44106-7235, USA
Matthew J. Sobel

Corresponding author

Correspondence to Matthew J. Sobel.

Ethics declarations

Conflict of interest

The authors have no financial or non-financial interests that are directly or indirectly related to this work.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

James Alexander died May 19, 2021. His humor, wit, intelligence and friendship are greatly missed.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Alexander, J., Sobel, M.J. Preferences, risk neutrality and risk-sensitive MDPs. Ann Oper Res (2024). https://doi.org/10.1007/s10479-024-06020-6

Received: 28 November 2023
Accepted: 18 April 2024
Published: 06 May 2024
DOI: https://doi.org/10.1007/s10479-024-06020-6
