1 Foreword

This manuscript is an expanded, updated, and more detailed version of the authors’ earlier article Following the Trail of the Operator Geometric Mean [28]. Over twenty years ago the authors published an article [22] in the Monthly that treated in some depth the two-variable matrix geometric mean, an article that has been widely read and cited. In this paper we seek to describe the significant advances and broad generalizations of the theory that have taken place in these twenty-plus years.

2 Introduction

The problem of “squaring a rectangle” is the problem of constructing the side of a square that has the same area as a given rectangle. Such a construction is given by Euclid in Book II of the Elements. If the sides of the rectangle are a and b, then the side of the square has length \(\sqrt{ab}\), the geometric mean of a and b. Because of their interest in proportions and musical ratios, the Greeks defined some 2500 years ago at least eleven different means, with the best known ones being the arithmetic, geometric, harmonic, and golden. Thus the subject of (binary) means for positive numbers or line segments has a rich mathematical lineage dating back into antiquity. The study of various means on the positive reals and their properties has continued off and on throughout the history of mathematics up to the present day.

The appropriate definition of the geometric mean for two positive definite matrices of the same size seems to have first appeared in 1975 in a paper of Pusz and Woronowicz [40]. Ando [2] provided the first systematic development of many of its basic properties and gave equivalent characterizations and also applications to matrix inequalities that are otherwise difficult to prove.

In an article [22] appearing over twenty years ago in the Monthly the authors presented eight characterizations or prominent properties of the classical geometric mean and showed how each extended to the matrix geometric mean setting, providing convincing documentation that the name “matrix geometric mean” was most appropriate. As hinted above, that theory has now advanced to a multivariable setting for both positive matrices and operators and beyond, as we will trace out.

Positive definite matrices have become fundamental computational objects in many areas of engineering, statistics, quantum information, applied mathematics, and elsewhere. They appear as “data points” in a diverse variety of settings: covariance matrices in statistics, elements of the search space in convex and semidefinite programming, and kernels in machine learning. A variety of computational algorithms have arisen for approximation, interpolation, filtering, estimation, and averaging. Our interest focuses on the last named, the process of finding an average or mean, which is again positive definite. This problem appears, for instance, in applications to elasticity [35], radar signal processing [5, 29], medical imaging [6, 12] and image processing [39].

In recent years it has been increasingly recognized that the Euclidean distance is often not the most suitable for the space \(\mathbb P\) of positive definite matrices and that working with the appropriate geometry does matter in computational problems; see e.g. [4, 36]. The matrix geometric mean grows out of the geometric structure of \(\mathbb P\), which makes it a particularly suitable averaging tool in a variety of settings.

3 Positive definite matrices

Let \({\mathcal {M}}_m({\mathbb C})\), or simply \({\mathcal {M}}_m\), denote the set of \(m\times m\) complex matrices. We may identify \({\mathcal {M}}_m\) with the set of linear operators on \({\mathbb C}^m\), where we consider \({\mathbb C}^m\) to be a complex Hilbert space of column vectors with the usual hermitian inner product. Denoting the conjugate transpose of \(A\in {\mathcal {M}}_m\) by \(A^*\), we recall that A is hermitian if \(A=A^*\) and unitary if \(A^*=A^{-1}\). The hermitian matrix A is positive definite if \(\forall u\ne 0,\,\langle u,Au\rangle >0\). These notions readily generalize to \({\mathcal {B}}(H)\), the algebra of linear operators on an arbitrary Hilbert space.

The following are well-known equivalences for a hermitian matrix A to be positive definite (with the definition appearing first):

(1) \(\langle Ax,x\rangle >0\) for all \(0\ne x,\) where \(\langle \cdot ,\cdot \rangle \) is the Hilbert space inner product on \({{\mathbb {C}}}^m\).

(2) \(A=BB^*\) for some invertible B.

(3) A has only positive eigenvalues.

(4) \(A=\exp B=\sum _{k=0}^\infty B^k/k!\) for some (unique) hermitian B.

(5) \(A=UDU^*\) for some unitary U and diagonal D with positive diagonal entries.

The positive definite \(m\times m\)-matrices form an open cone in \({\mathbb {H}}_m\), the space of \(m\times m\) hermitian matrices, with closure the positive semidefinite matrices (equivalently, \(\langle Ax,x\rangle \ge 0\) for all x). We denote the open cone of positive definite matrices by \(\mathbb P\) (or \(\mathbb P_m\) if we need to distinguish the dimension). The exponential map \(\exp :{\mathbb {H}}\rightarrow \mathbb P\) is an analytic diffeomorphism with inverse analytic diffeomorphism \(\log : \mathbb P\rightarrow {\mathbb {H}}\).

Every positive definite (resp. hermitian) matrix has a unique spectral decomposition

$$\begin{aligned} A=\sum _{i=1}^k\lambda _iE_i, \end{aligned}$$

where the \(\lambda _i>0\) (resp. \(\lambda _i\in {\mathbb R})\) range over the distinct eigenvalues of A and \(E_i\) is the orthogonal projection onto the eigenspace of \(\lambda _i\). One then has

$$\begin{aligned} A^p=\sum _{i=1}^k \lambda _i^p E_i, \end{aligned}$$

from which one can easily deduce that every positive definite matrix has a unique positive definite \(p^{th}\)-root. We also note that the exponential map is given alternatively by

$$\begin{aligned} \exp A=\exp \left( \sum _{i=1}^k \lambda _i E_i\right) =\sum _{i=1}^k e^{\lambda _i}E_i, \end{aligned}$$

from which we can quickly deduce the equivalence of items (3) and (4) in the previous list of equivalent characterizations of positive definite matrices.
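
These spectral formulas translate directly into computation. The following sketch is our own illustration (not part of the original development); it assumes NumPy, and the helper name spectral_apply is ours. It computes \(A^{1/2}\) and \(\exp A\) through the spectral decomposition:

```python
import numpy as np

def spectral_apply(A, f):
    """Apply f to a hermitian matrix A through A = sum_i lam_i E_i."""
    lam, U = np.linalg.eigh(A)           # spectral decomposition
    return (U * f(lam)) @ U.conj().T     # U diag(f(lam)) U^*

rng = np.random.default_rng(0)
B = rng.standard_normal((4, 4))
A = B @ B.T + 4 * np.eye(4)              # positive definite by construction

R = spectral_apply(A, np.sqrt)           # the positive definite square root
E = spectral_apply(A, np.exp)            # exp(A) = sum_i e^{lam_i} E_i

assert np.allclose(R @ R, A)             # R is indeed a square root of A
assert np.all(np.linalg.eigvalsh(E) > 0) # exp carries hermitian to positive definite
```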

We define a partial order (sometimes called the Loewner order) on the vector space \({\mathbb {H}}_m\) of hermitian matrices by \(A\le B\) if \(B-A\) is positive semidefinite. We note that \(0\le A\) iff A is positive semidefinite, and we write \(0<A\) iff \(A\in \mathbb P\), i.e., iff A is positive definite. The matrix A is sometimes called strictly positive in this setting.

For any invertible \(M\in {\mathcal {M}}_m({\mathbb {C}})\), the congruence transformation \(\Gamma _M(X)=MXM^*\) is an invertible linear map on \({\mathcal {M}}_m\) that carries each of \({\mathbb {H}}\), \(\mathbb P\), and \({\overline{\mathbb P}}\), the convex cone of positive semidefinite matrices, onto itself. It follows that congruence transformations preserve the Loewner order on \({\mathbb {H}}\): \(A\le B\) implies \(MAM^* \le MBM^*\) for M invertible. We note, in particular, for each \(M\in \mathbb P\), \(\Gamma _M(X)=MXM\) is a congruence transformation on \(\mathbb P\). Matrix inversion \(A\mapsto A^{-1}\) maps \({\mathbb {P}}\) onto itself and reverses the Loewner order, as we shall see later.

The geometry of \(\mathbb P\) will be crucial in what follows. One important approach to geometry is that of Felix Klein’s Erlangen Program, which emphasized the importance of the group of transformations or “symmetries” that preserved basic geometric properties. For the study of \(\mathbb P\) this group, denoted \(G(\mathbb P)\), is the one generated by the congruence transformations and the inversion map, which acts as the point reflection through the identity matrix.

Remark 3.1

For \(A,B\in \mathbb P\), by the previous item (5), there exists a unitary U such that \(U(A^{-1/2}BA^{-1/2})U^*=D\), a diagonal matrix, and hence \(\Gamma _{UA^{-1/2}}\) carries A to I and B to D. This observation allows various results about \(A,B\in \mathbb P\) to be reduced to the case that \(A=I\) and B is a diagonal matrix.
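
The reduction of Remark 3.1 is easy to carry out numerically. The following is a minimal sketch of ours (NumPy assumed; the helper name pd_power is our own):

```python
import numpy as np

def pd_power(A, p):
    """A**p for hermitian positive definite A via the spectral decomposition."""
    lam, U = np.linalg.eigh(A)
    return (U * lam ** p) @ U.conj().T

rng = np.random.default_rng(1)
X, Y = rng.standard_normal((2, 3, 3))
A, B = X @ X.T + np.eye(3), Y @ Y.T + np.eye(3)

Ah = pd_power(A, -0.5)                      # A^{-1/2}
lam, U = np.linalg.eigh(Ah @ B @ Ah)        # diagonalize A^{-1/2} B A^{-1/2}
M = U.conj().T @ Ah                         # the congruence Gamma_M, M = U^* A^{-1/2}

assert np.allclose(M @ A @ M.conj().T, np.eye(3))     # A is carried to I
assert np.allclose(M @ B @ M.conj().T, np.diag(lam))  # B is carried to a diagonal D
```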

The arithmetic and harmonic means readily extend from \({\mathbb {R}}^{>0}:=(0,\infty )\) to the set \(\mathbb P\) of positive definite matrices:

$$\begin{aligned} {\mathcal {A}}(A,B)=\frac{1}{2}(A+B); ~~~~~~{\mathcal {H}}(A,B)=2(A^{-1}+B^{-1})^{-1}. \end{aligned}$$

The geometric mean is not so obvious (e.g., \(\sqrt{AB},\) the square root of AB with positive eigenvalues, need not be positive definite for AB positive definite). One approach is to rewrite the equation \(x^2=ab\) (which has positive solution the geometric mean of a and b) in its appropriate form in the noncommutative setting and solve for X:

$$\begin{aligned} XA^{-1}X= & {} B\\ A^{-1/2}XA^{-1/2}A^{-1/2}XA^{-1/2}= & {} A^{-1/2}BA^{-1/2}\\ A^{-1/2}XA^{-1/2}= & {} (A^{-1/2}BA^{-1/2})^{1/2}\\ X= & {} A^{1/2}(A^{-1/2}BA^{-1/2})^{1/2}A^{1/2}. \end{aligned}$$

Definition 3.2

The matrix geometric mean \(A\# B\) of \(A,B\in \mathbb P\) is given by

$$\begin{aligned} A\# B=A^{1/2}(A^{-1/2}BA^{-1/2})^{1/2}A^{1/2}. \end{aligned}$$

Alternatively it can be characterized as the unique positive definite solution X of the elementary Riccati equation \(XA^{-1}X=B\).
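
As a quick numerical illustration (a sketch of ours, assuming NumPy; geometric_mean and pd_power are our own helper names), one can compute \(A\# B\) from the formula and confirm the Riccati characterization and the commutativity noted below:

```python
import numpy as np

def pd_power(A, p):
    lam, U = np.linalg.eigh(A)
    return (U * lam ** p) @ U.conj().T

def geometric_mean(A, B):
    """A # B = A^{1/2} (A^{-1/2} B A^{-1/2})^{1/2} A^{1/2}."""
    Ah, Aih = pd_power(A, 0.5), pd_power(A, -0.5)
    return Ah @ pd_power(Aih @ B @ Aih, 0.5) @ Ah

rng = np.random.default_rng(2)
X, Y = rng.standard_normal((2, 4, 4))
A, B = X @ X.T + np.eye(4), Y @ Y.T + np.eye(4)

G = geometric_mean(A, B)
assert np.allclose(G @ np.linalg.inv(A) @ G, B)   # Riccati equation X A^{-1} X = B
assert np.allclose(G, geometric_mean(B, A))       # commutativity A # B = B # A
```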

By inverting both sides of \(XA^{-1}X=B\) and multiplying through by X on the right and left, one obtains \(XB^{-1}X=A\). Since the second equation is equivalent to the first, we see \(A\#B=B\#A\). Similarly one can use the Riccati equation to show that the matrix geometric mean is invariant under congruence mappings and inversion, i.e., \(M(A\#B)M^* =(MAM^*)\# (MBM^*)\) for M invertible and \((A\#B)^{-1}=A^{-1}\#B^{-1}.\) By the Riccati equation \((A\#B)(A^{-1}-B^{-1})(A\#B)=B-A\), and this along with order invariance under congruence maps shows that the inversion map is order reversing. We collect these properties together with other basic properties that can be deduced by elementary arguments; see [22].

Proposition 3.3

The following hold in \(\mathbb P\):

(i) (Commutativity) \(A\#B=B\# A\).

(ii) (Congruence Invariance) \(M(A\# B)M^*=MAM^*\#MBM^*\) for M invertible.

(iii) (Inversion Invariance) \((A\# B)^{-1}=A^{-1}\#B^{-1}\).

(iv) (Monotonicity) If \(C\le A\) and \(D\le B\), then \(C\#D\le A\#B\).

(v) For \(AB=BA\), \(A\#B=A^{1/2}B^{1/2}=(AB)^{1/2}\); in particular \(A\#I=A^{1/2}\).

(vi) (AGH Inequality) \(2(A^{-1}+B^{-1})^{-1}\le A\#B\le (1/2)(A+B)\).

By the Loewner–Heinz inequality \(A\#I=A^{1/2}\le B^{1/2}=B\#I\) for \(0<A\le B\). From this, using congruence invariance, one obtains the important property (iv). As mentioned in the introduction, other formulations and connections between the matrix geometric mean and the one for positive real numbers may be found in [22].

4 The Riemannian metric

The tools needed for extending the binary geometric mean on \(\mathbb P\) to a multivariable one have relied on the metric and geometric structure of \(\mathbb P\). Such considerations had already begun in the binary setting; see [22, Section 4]. We briefly overview the necessary tools in this section; see [22, Section 4] and [7, Chapter 6] for details.

We equip the space \({\mathbb {H}}\) of hermitian matrices of some fixed dimension m with the Frobenius inner product \(\langle A,B\rangle =\) Tr(AB), the trace of AB, which makes \({\mathbb {H}}\) a Hilbert space. The corresponding norm \(\Vert A\Vert _2=\langle A,A\rangle ^{1/2}\) is called the Frobenius or Hilbert–Schmidt norm. We can write \(A=UDU^*\) for some unitary U, where D is a diagonal matrix with entries the eigenvalues of A. We now observe that

$$\begin{aligned} \Vert A\Vert _2=({\textrm{Tr}}(UDU^*UDU^*))^{1/2}=({\textrm{Tr}}D^2)^{1/2}=\left( \sum _{i=1}^m \lambda _i^2(A)\right) ^{1/2}, \end{aligned}$$
(4.1)

where \(\{\lambda _i\}_{i=1}^m\) is the set of eigenvalues of A.

We define the Riemannian distance \(\delta \) on \(\mathbb P\) by

$$\begin{aligned} \delta (A,B)=\Vert \log (A^{-1/2}BA^{-1/2})\Vert _2=\left( \sum _{i=1}^m \log ^2\lambda _i(A^{-1/2}BA^{-1/2})\right) ^{1/2}, \end{aligned}$$
(4.2)

where \(\{\lambda _i\}\) are the eigenvalues of \(A^{-1/2}BA^{-1/2}\). In the last expression we may replace \(A^{-1/2}BA^{-1/2}\) by \(BA^{-1}\), since the two are similar and hence have the same eigenvalues.
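
A sketch of the distance computation (ours, assuming NumPy; helper names are our own). It also checks numerically that congruence transformations and inversion preserve \(\delta \), anticipating Proposition 4.1(1):

```python
import numpy as np

def pd_power(A, p):
    lam, U = np.linalg.eigh(A)
    return (U * lam ** p) @ U.conj().T

def delta(A, B):
    """Riemannian distance: l2-norm of the log-eigenvalues of A^{-1/2} B A^{-1/2}."""
    Aih = pd_power(A, -0.5)
    lam = np.linalg.eigvalsh(Aih @ B @ Aih)   # same spectrum as B A^{-1}
    return np.sqrt(np.sum(np.log(lam) ** 2))

rng = np.random.default_rng(3)
X, Y = rng.standard_normal((2, 4, 4))
A, B = X @ X.T + np.eye(4), Y @ Y.T + np.eye(4)
M = rng.standard_normal((4, 4))              # almost surely invertible

assert np.isclose(delta(A, B), delta(M @ A @ M.T, M @ B @ M.T))  # congruence isometry
assert np.isclose(delta(A, B), delta(np.linalg.inv(A), np.linalg.inv(B)))  # inversion
```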

We list basic properties of the Riemannian metric \(\delta \).

Proposition 4.1

The Riemannian distance \(\delta \) is a metric making \(\mathbb P\) a complete metric space exhibiting the following properties:

(1) For \(M\in \textbf{GL}_m({\mathbb {C}})\), the group of \(m\times m\) invertible matrices, the congruence transformation \(\Gamma _M:(\mathbb P,\delta )\rightarrow (\mathbb P,\delta )\) defined by \(\Gamma _M(X)=MXM^*\) is an isometry. The inversion map \(A\mapsto A^{-1}\) is also an isometry on \(\mathbb P\).

(2) The exponential map \(\exp : {\mathbb {H}}\rightarrow \mathbb P\) is expansive, that is, \(\delta (e^A,e^B)\ge \Vert B-A\Vert _2\) for all \(A,B\in {\mathbb {H}}\).

(3) The exponential map restricted to any one-dimensional subspace of \({\mathbb {H}}\) is an isometry. Furthermore, any metric on \(\mathbb P\) that has this property and is invariant under congruence transformations must agree with \(\delta \).

(4) For \(A,B\in \mathbb P\), \(A\#B\) is the unique metric midpoint between A and B.

Proof

To give some flavor of the preceding, we prove parts of (1), (3) and (4). First for (1) we observe for \(A,B\in \mathbb P\) and M invertible that \((MBM^*)(MAM^*)^{-1}= M(BA^{-1})M^{-1}\), which is similar to \(BA^{-1}\), and hence to \(A^{-1/2}BA^{-1/2}\), and thus may replace it in the first part of Eq. (4.2).

For (3), let \(A\in {\mathbb {H}}\). Then

$$\begin{aligned} \delta (e^{sA}, e^{tA})= & {} \delta (e^{(-s/2)A}e^{sA}e^{(-s/2)A},e^{(-s/2)A}e^{tA}e^{(-s/2)A})\\= & {} \delta (I,e^{(t-s)A}) =\Vert \log (e^{(t-s)A})\Vert _2=\Vert tA-sA\Vert _2. \end{aligned}$$

Now let \(d(\cdot ,\cdot )\) be a metric satisfying the two properties of (3). For \(A,B\in \mathbb P\),

$$\begin{aligned} d(A,B)= & {} d(A^{-1/2}AA^{-1/2}, A^{-1/2}BA^{-1/2})\\= & {} d(I,A^{-1/2}BA^{-1/2})\\= & {} \Vert \log (A^{-1/2}BA^{-1/2})\Vert _2=\delta (A,B). \end{aligned}$$

For the midpoint property in (4), since congruence mappings preserve \(\delta \) and \(\#\), it suffices to consider the case \(A\#B=I\) (otherwise first apply \(\Gamma _{(A\#B)^{-1/2}}\)). Then \(B=IA^{-1}I=A^{-1}\). Applying \(\log \) we obtain \(\log B=-\log A\), so restricting \(\exp \) to the one-dimensional subspace \({\mathbb R}\cdot \log A\) yields the result by (3).

A proof of (2) is sketched in [22], and a shorter and more elegant proof appears in [7, Chapter 6].

For the triangle inequality, consider first the case of \(A,B,C\) with \(A=I\). Then from (3) \(\delta (I,B)=\Vert \log B\Vert _2\), \(\delta (I,C)=\Vert \log C\Vert _2\) and from (2) \(\Vert \log (C)-\log (B)\Vert _2\le \delta (B,C)\), so

$$\begin{aligned} \delta (I,C)=\Vert \log C\Vert _2\le \Vert \log B\Vert _2+\Vert \log (C)-\log (B)\Vert _2\le \delta (I,B)+\delta (B,C). \end{aligned}$$

The general case follows using (1) and the congruence transformation \(X\mapsto A^{-1/2}XA^{-1/2}\) and its inverse.

By (2) and continuity of the exponential map, the metric \(\delta \) is complete. \(\square \)

For a metric space \((X,d)\) the metric is said to satisfy the semiparallelogram law if for all \(x_1,x_2\in X\), there exists \(m\in X\) such that for any \(x\in X\),

$$\begin{aligned} d^2(x_1,x_2)+4d^2(x,m)\le 2d^2(x,x_1)+2d^2(x,x_2)\qquad \qquad (NPC) \end{aligned}$$

One can show that the point \(m=m(x_1,x_2)\) is uniquely determined and is precisely the metric midpoint between \(x_1\) and \(x_2\). If the inequality is replaced by an equality, one obtains a version of the parallelogram law holding in Hilbert spaces. The semiparallelogram law is a metric version of nonpositive curvature (NPC). We define a Hadamard space to be a complete metric space satisfying the semiparallelogram law. These spaces have been and continue to be widely studied and appear under the alternative names of global NPC-spaces or CAT(0)-spaces.

Using Proposition 4.1 it is straightforward to show that \((\mathbb P,\delta )\) is a Hadamard space. One first considers the case that \(A\#B=I\) and another point C. Then the parallelogram law holds in the Hilbert space \({\mathbb {H}}\) for \(\log A\), \(\log B=-\log A\), and \(\log C\), and one uses Proposition 4.1(2) to obtain the semiparallelogram law for \(A,B,C\). Via Proposition 4.1(1) the general case reduces to this one. See, for example, [22] for further details.

Corollary 4.2

The space \((\mathbb P,\delta )\) is a Hadamard space.

Remark 4.3

A more sophisticated approach to the results of this section is the path of Riemannian geometry. The open cone \(\mathbb P_m\) of \(m\times m\) positive definite matrices becomes a well-known Riemannian manifold when equipped with the trace Riemannian metric: \(\langle X,Y\rangle _A ={\textrm{tr}}\, A^{-1}XA^{-1}Y,\) where \(A\in \mathbb P_m\) and X, Y are \(m\times m\) hermitian matrices. The corresponding distance metric on \(\mathbb P_m\) is precisely our metric \(\delta \), and this is the source of the name “Riemannian metric.” The distance metric of a simply connected Riemannian manifold satisfies (NPC) iff the manifold has nonpositive curvature in the usual sense. See [17, Chapter XII] for more details.

5 Means of several variables

Formally a mean of order n, or n-mean for short, on a set X is a function \(\gamma :X^n \rightarrow X\) satisfying the idempotency condition: \(\forall x\in X,\, \gamma (x,x,\ldots , x)=x.\) It is frequently assumed in the definition of a mean that it is symmetric, that is, invariant under any permutation of variables. When we speak of an omnivariable mean \(\gamma =\{\gamma _n\},\) we are referring to one defined for all \(X^n\), \(n \ge 1\). (For \(n=1\), \(\gamma _1\) is the identity, and is thus frequently ignored.)

The mean \(\gamma :X^n\rightarrow X\) is continuous or a topological mean if X is a topological space and \(\gamma \) is continuous. Frequently a mean represents some type of averaging operator.

Ando et al. [3] gave the first extension of the binary geometric mean to an omnivariable mean on \(\mathbb P\), which over time was abbreviated to the ALM mean. For three variables \(A,B,C\), they first formed the new three-point set \(\{A_1:=B\#C, B_1:=A\#C, C_1:=A\#B\}\) consisting of the geometric mean (midpoint) of each pair, then repeated this construction on the new three-point set and continued repeating the operation inductively. They showed the triples approached a common point, their mean for the case \(n=3\). They extended it inductively to an n-variable mean for all \(n>2\). We note that Lawson and Lim [23] later extended the ALM construction to a rather wide class of metric spaces.
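
The three-variable construction is short enough to sketch in code (our illustration, assuming NumPy; the fixed iteration count is an arbitrary choice rather than a tuned stopping rule, and the two-variable mean is the one from Section 3):

```python
import numpy as np

def pd_power(A, p):
    lam, U = np.linalg.eigh(A)
    return (U * lam ** p) @ U.conj().T

def gmean(A, B):
    Ah, Aih = pd_power(A, 0.5), pd_power(A, -0.5)
    return Ah @ pd_power(Aih @ B @ Aih, 0.5) @ Ah

def alm3(A, B, C, iters=60):
    """ALM mean of three matrices: repeatedly replace each pair by its midpoint."""
    for _ in range(iters):
        A, B, C = gmean(B, C), gmean(A, C), gmean(A, B)
    return A    # A, B, C have (numerically) reached a common point

rng = np.random.default_rng(4)
Xs = rng.standard_normal((3, 3, 3))
A, B, C = (X @ X.T + np.eye(3) for X in Xs)
G = alm3(A, B, C)
assert np.allclose(G, alm3(B, C, A))   # permutation invariance, property (P3) below
```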

Ando, Li, and Mathias made two important contributions in their paper. First of all, they identified axiomatic properties that an omnivariable geometric mean \(\gamma \) should satisfy. They then established that the ALM mean they had defined satisfied all these properties. The proofs typically involved extending from the known case of \(n=2\) by induction.

Theorem 5.1

Let \({\mathbb {A}}=(A_1,\ldots , A_n),{\mathbb {B}}=(B_1,\ldots ,B_n)\in \mathbb P^n\), and let \(\gamma \) be the ALM mean. The following properties hold.

(P1) (Consistency with scalars) \( \gamma ({\mathbb A})=(A_{1}\cdots A_{n})^{1/n}\) if the \(A_{i}\)’s commute;

(P2) (Joint homogeneity) \( \gamma (a_{1}A_{1},\dots ,a_{n}A_{n})= (a_{1}\cdots a_{n})^{1/n}{\gamma ({\mathbb {A}})}\) for \(a_1,\ldots , a_n>0;\)

(P3) (Permutation invariance) \(\gamma ({\mathbb {A}}_{\sigma }) =\gamma ({\mathbb {A}}),\) where \({\mathbb {A}}_{\sigma }=(A_{\sigma (1)},\dots ,A_{\sigma (n)})\) for all index permutations \(\sigma ;\)

(P4) (Monotonicity) If \(A_{i}\le B_{i}\) for all \(1\le i\le n,\) then \( \gamma ({\mathbb {A}})\le \gamma ({\mathbb {B}});\)

(P5) (Continuity) \(\gamma \) is continuous;

(P6) (Congruence invariance) \(\gamma (M{\mathbb {A}}M^*)= M\gamma ({\mathbb {A}})M^*\) for M invertible, where \(M(A_{1},\dots ,A_{n})M^{*}=(MA_{1}M^{*},\dots ,MA_{n}M^{*});\)

(P7) (Joint concavity) \(\gamma (\lambda {\mathbb {A}}+(1-\lambda ){\mathbb {B}})\ge \lambda \gamma ({\mathbb {A}})+(1-\lambda )\gamma ({\mathbb {B}})\) for \(0\le \lambda \le 1\);

(P8) (Self-duality) \(\gamma (A_{1}^{-1},\dots , A_{n}^{-1})^{-1}= \gamma (A_{1},\dots , A_{n});\)

(P9) (Determinantal identity) \( {\textrm{Det}}\,\gamma ({\mathbb {A}})= \prod _{i=1}^{n}({\textrm{Det}}A_{i})^{1/n}\); and

(P10) (AGH mean inequalities) \( n(\sum _{i=1}^{n}A_{i}^{-1})^{-1}\le \gamma ({\mathbb {A}}) \le \frac{1}{n}\sum _{i=1}^{n}A_{i}.\) (AGH is short for arithmetic–geometric–harmonic.)

Bini et al. [11] later gave a variant of the ALM mean that retained its properties, but was much more computationally efficient.

A weight \({\textbf{w}}\) of length n is an n-tuple \((w_1,\ldots ,w_n)\) where \(0< w_i\le 1\) for each i and \(\sum _{k=1}^n w_k=1\), and a weighted n-tuple of a set X is a pair \(({\textbf{w}},{\textbf{x}})\), where \({\textbf{w}}\) is a weight of length n and \({\textbf{x}}=(x_1,\ldots ,x_n)\in X^n\). We think of this as convenient notation for an ordered n-tuple of weighted points \((x_i,w_i)\). An n-variable weighted mean \(\gamma \) on a set X assigns to each weighted n-tuple \(({\textbf{w}},{\textbf{x}})\) some \(\gamma ({\textbf{w}},{\textbf{x}})\in X\) with the extra condition that \(\gamma ({\textbf{w}},(x,x,\ldots ,x))=x\). We may think of \(\gamma ({\textbf{w}},{\textbf{x}})\) as assigning a “center of mass.”

Remark 5.2

The weighted geometric mean \(A\#_tB\) (with \({\textbf{w}}=(1-t,t)\) ) is given by

$$\begin{aligned} A\#_t B=A^{1/2}(A^{-1/2}BA^{-1/2})^t A^{1/2}. \end{aligned}$$

The map \(\alpha :[0,\delta (A,B)]\rightarrow \mathbb P\) defined by \(\alpha (s)=A\#_{s/\delta (A,B)} B\) is an isometry onto the geodesic arc between A and B; in particular \(\delta (A,A\#_tB)=t\,\delta (A,B)\) for \(0\le t\le 1\).
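
A sketch of \(A\#_t B\) and a numerical check of the geodesic property \(\delta (A,A\#_tB)=t\,\delta (A,B)\) (ours, assuming NumPy; helper names are our own):

```python
import numpy as np

def pd_power(A, p):
    lam, U = np.linalg.eigh(A)
    return (U * lam ** p) @ U.conj().T

def wgmean(A, B, t):
    """Weighted geometric mean A #_t B = A^{1/2} (A^{-1/2} B A^{-1/2})^t A^{1/2}."""
    Ah, Aih = pd_power(A, 0.5), pd_power(A, -0.5)
    return Ah @ pd_power(Aih @ B @ Aih, t) @ Ah

def delta(A, B):
    Aih = pd_power(A, -0.5)
    return np.sqrt(np.sum(np.log(np.linalg.eigvalsh(Aih @ B @ Aih)) ** 2))

rng = np.random.default_rng(5)
X, Y = rng.standard_normal((2, 3, 3))
A, B = X @ X.T + np.eye(3), Y @ Y.T + np.eye(3)

t = 0.3
assert np.isclose(delta(A, wgmean(A, B, t)), t * delta(A, B))  # geodesic property
```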

Rather obvious variants of the axiomatic properties (P1)–(P10) exist in the weighted mean setting and were introduced and studied in [21]. There a weighted version of the mean given by Bini et al. was introduced, which could also be extended to more general metric spaces.

Closely related to the notion of an omnivariable weighted mean is that of a barycentric map. Denote by \({\mathcal {P}}^{<\infty }(X)\) the set of all finitely supported probability measures on X, measures of the form \(\sum _{k=1}^n w_k\delta _{x_k}\), where \((w_1,\ldots ,w_n)\) is a weight and \(\delta _x\) is the unit point mass at x. A barycentric map in its simplest manifestation is a map \(\beta :{\mathcal {P}}^{<\infty }(X)\rightarrow X\) such that \(\beta (\delta _x)=x\) for each \(x\in X\). A barycentric map \(\beta \) gives rise to a corresponding multivariable weighted mean defined by

$$\begin{aligned}\gamma _\beta ({\textbf{w}};x_1,\ldots ,x_n)=\beta \left( \sum _{k=1}^n w_k \delta _{x_k}\right) ,\end{aligned}$$

where \({\textbf{w}}=(w_1,\ldots ,w_n)\) is a weight. We note, however, that it is not the case that all weighted means arise from barycentric maps. If we restrict to uniform weights (\(w_i=1/n\) for each i), we are essentially in the setting of non-weighted means.

Soon after the introduction of the ALM mean, an alternative, in many ways better, candidate for the multivariable matrix geometric mean was put forth, to which we now turn. Let \((M,d)\) be a metric space. The least squares mean \(\Lambda (a_1,\ldots ,a_n)\) is defined as the solution to the optimization problem of minimizing the sum of squared distances

$$\begin{aligned} \Lambda (a_1,\ldots ,a_n)=\underset{x\in M}{{{\,\mathrm{\mathrm {arg\, min}}\,}}}\sum _{i=1}^n d^2(x,a_i), \end{aligned}$$

and the weighted least squares mean \(\Lambda ({\textbf{w}};a_1,\ldots ,a_n)\) for \({\textbf{w}}=(w_1,\ldots , w_n)\) by

$$\begin{aligned} \Lambda ({\textbf{w}};a_1,\ldots ,a_n)=\underset{x\in M}{{{\,\mathrm{\mathrm {arg\, min}}\,}}}\sum _{i=1}^nw_{i} d^2(x,a_i), \end{aligned}$$

provided the solution uniquely exists in each of the respective cases. Note that the least squares mean is equal to the weighted least squares mean for the uniform weight with all entries 1/n. The unique solution of the minimizing problem exists in Hadamard spaces [42, Proposition 1.7], since the non-negative function defined by \(x\mapsto \sum _{i=1}^n w_i d^2(x,a_i)\) is uniformly convex in this case. (We recall \(F:M\rightarrow {\mathbb R}\) is uniformly convex if there exists a strictly increasing \(\varphi : [0,\infty )\rightarrow [0,\infty )\) such that for all \(x,y\in M\),

$$\begin{aligned} F(m(x,y))\le \frac{1}{2} F(x)+\frac{1}{2}F(y)-\varphi (d(x,y)), \end{aligned}$$

where \(m(x,y)\) is the unique midpoint between x and y.)
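
To see concretely why the least squares mean deserves the name geometric mean, consider the scalar case (a standard worked computation, included here for orientation): on \(\mathbb P_1={\mathbb R}^{>0}\) the metric (4.2) reduces to \(\delta (a,b)=|\log b-\log a|\), so writing \(s=\log x\) the objective is a quadratic in s, and

$$\begin{aligned} 0=\frac{d}{ds}\sum _{i=1}^n w_i(s-\log a_i)^2 \Rightarrow s=\sum _{i=1}^n w_i\log a_i \Rightarrow \Lambda ({\textbf{w}};a_1,\ldots ,a_n)=\prod _{i=1}^n a_i^{w_i}. \end{aligned}$$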

E. Cartan considered such “barycenters” in the case of Riemannian manifolds, where they exist uniquely on manifolds of nonpositive curvature and exist locally much more generally, and Fréchet [13] considered them in more general metric spaces. Thus the least squares mean is also called the Cartan mean or Fréchet mean. Such means also frequently go by the name “Karcher mean,” but Karcher’s approach from differential geometry involved finding the solution of the “Karcher equation” that is satisfied at this extremum [16]. We return to this in a later section.

Moakher [34] first introduced and studied the least squares mean for the set of positive definite matrices \(\mathbb P\) equipped with the Riemannian metric as an omnivariable generalization of the two-variable geometric mean. Independently Bhatia and Holbrook [8, 9] introduced and studied the least squares mean in the weighted setting. These authors established its (unique) existence and verified most of the axiomatic properties (P1)–(P10) satisfied by the Ando–Li–Mathias geometric mean: consistency with scalars, joint homogeneity, permutation invariance, congruence invariance, and self-duality (the last two being true since congruence transformations and inversion are isometries). Further, based on computational experimentation, Bhatia and Holbrook conjectured monotonicity for the least squares mean (property (P4) in the earlier list), but this was left as an open problem.

6 The inductive mean, random variables, and monotonicity

One other mean played an important role in what followed, one that we shall call the inductive mean, following the terminology of Sturm [42]. It appeared elsewhere in the work of Sagae and Tanabe [41] and Ahn et al. [1]. It is defined inductively for Hadamard spaces (or more generally for metric spaces with weighted binary means \(x\#_t y\)) by \(S_2(x,y)=x\# y\) and, for \(k\ge 3\), \(S_{k}(x_1,\ldots , x_{k})=S_{k-1}(x_1,\ldots , x_{k-1})\#_{\frac{1}{k}}x_{k}\). (Here, for \(0\le t\le 1\), \(x\#_t y\) is the unique point z on the geodesic from x to y satisfying \(d(x,z)=t\,d(x,y)\) and \(d(z,y)=(1-t)\,d(x,y)\).) Note that this mean at each stage is defined from the previous stage by taking the appropriate two-variable weighted mean, which is monotone for the Hadamard space \({\mathbb {P}}\). Thus the inductive mean is monotone (property (P4)).
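
A sketch of the inductive mean (ours, assuming NumPy; helper names are our own). The scalar check at the end uses the fact that for commuting arguments the inductive mean collapses to the ordinary geometric mean:

```python
import numpy as np

def pd_power(A, p):
    lam, U = np.linalg.eigh(A)
    return (U * lam ** p) @ U.conj().T

def wgmean(A, B, t):
    Ah, Aih = pd_power(A, 0.5), pd_power(A, -0.5)
    return Ah @ pd_power(Aih @ B @ Aih, t) @ Ah

def inductive_mean(Xs):
    """S_k(x_1,...,x_k): fold in x_k with weight 1/k at step k."""
    S = Xs[0]
    for k, X in enumerate(Xs[1:], start=2):
        S = wgmean(S, X, 1.0 / k)
    return S

a, b, c = (np.array([[x]]) for x in (2.0, 5.0, 11.0))
S = inductive_mean([a, b, c])
assert np.isclose(S[0, 0], (2.0 * 5.0 * 11.0) ** (1 / 3))  # scalar geometric mean
```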

Let \((M,d)\) be a Hadamard space and \({\textbf{w}}=(w_1,\ldots , w_n)\) be a weight. Set \(\mathbb N_n=\{1,2,\ldots , n\}\) and assign to \(k\in \mathbb N_n\) the probability \(w_k\). Let \({\textbf{x}}=(x_1,\ldots ,x_n)\in M^n\). For each \(\omega \in \prod _{j=1}^\infty \mathbb N_n\), a countable product, define a sequence \(\sigma _{\omega ,{\textbf{x}}}\) in M by \(\sigma _{\omega ,{\textbf{x}}}(1)=x_{\omega (1)}\), \(\sigma _{\omega ,{\textbf{x}}}(k)=S_k(x_{\omega (1)},\ldots ,x_{\omega (k)})\), where \(S_k\) is the inductive mean. The sequence \(\sigma _{\omega ,{\textbf{x}}}\) may be viewed as a “walk” starting at \(\sigma _{\omega ,{\textbf{x}}}(1)=x_{\omega (1)}\) and moving from \(\sigma _{\omega ,{\textbf{x}}}(k-1)\) toward \(x_{\omega (k)}\) a distance of \((1/k)d(\sigma _{\omega ,{\textbf{x}}}(k-1),x_{\omega (k)})\) to obtain \(\sigma _{\omega ,{\textbf{x}}}(k)\). Alternatively we may give \(\prod _{j=1}^\infty \mathbb N_n\) the product probability, making it a probability space, and define a family of i.i.d. random variables \(\{X_k\}\) by \(X_k(\omega )=x_{\omega (k)}\). We replace the traditional sum of the first k random variables by \(\sigma _{\omega ,{\textbf{x}}}(k)\) and take for the expected value \(\Lambda ({\textbf{w}};x_1,\ldots ,x_n)\). From this viewpoint we have the following special case of Sturm’s Law of Large Numbers for Hadamard spaces [42, Theorem 4.7]:

6.1 Sturm’s theorem

Giving \(\prod _{j=1}^\infty {\mathbb {N}}_n\) the product probability, the set

$$\begin{aligned}\left\{ \omega \in \prod _{j=1}^\infty \mathbb N_n:\lim _{k\rightarrow \infty } \sigma _{\omega ,{\textbf{x}}}(k)= \Lambda ({\textbf{w}};x_1,\ldots ,x_n)\right\} \end{aligned}$$

has measure 1, i.e., \(\sigma _{\omega ,{\textbf{x}}}(k)\rightarrow \Lambda ({\textbf{w}};x_1,\ldots ,x_n)\) as \(k\rightarrow \infty \) for almost all \(\omega \).

We shall briefly return to the theory of random variables on a probability space taking values in \(\mathbb P\), or more generally in a Hadamard space, at a later point.

Using the preceding version of Sturm’s Theorem, Lawson and Lim [25] provided a positive solution to the earlier mentioned conjecture of Bhatia and Holbrook about the monotonicity of the least squares mean.

Theorem 6.1

Let \(\mathbb P\) be the open cone of positive definite matrices of some fixed dimension, and let \(n\ge 3\).

(1) The [weighted] least squares mean \(\Lambda \) on \(\mathbb P\) is monotone: \(A_i\le B_i\) for \(1\le i\le n\) implies \(\Lambda (A_1,\ldots , A_n)\le \Lambda (B_1,\ldots, B_n)\) \([\Lambda ({\textbf{w}};A_1,\ldots , A_n)\le \Lambda ({\textbf{w}};B_1,\ldots, B_n)]\).

(2) The other nine (weighted) ALM axioms hold for \(\Lambda \).

Proof

Assume for \({\mathbb {A}}=(A_1,\ldots ,A_n)\) and \({\mathbb {B}}=(B_1,\ldots ,B_n)\) that \(A_i\le B_i\) for \(1\le i\le n\). Let \({\textbf{w}}\) be a weight. By Sturm’s theorem applied to the “walks” for \({\mathbb {A}}\) and for \({\mathbb {B}}\), we have

$$\begin{aligned}\sigma _{\omega ,{\mathbb {A}}}(k)\rightarrow \Lambda ({\textbf{w}};A_1,\ldots ,A_n)~~\textrm{and}~~ \sigma _{\omega ,{\mathbb {B}}}(k)\rightarrow \Lambda ({\textbf{w}};B_1,\ldots ,B_n)\end{aligned}$$

as \(k\rightarrow \infty \) for almost all \(\omega \in \prod _{j=1}^\infty \mathbb N_n\) (since the intersection of two sets of measure 1 has measure 1). Fixing any such \(\omega \), we obtain part (1), since the partial order relation is closed and since, by the monotonicity of the inductive mean, for each k

$$\begin{aligned}\sigma _{\omega ,{\mathbb {A}}}(k)=S_k(A_{\omega (1)},\ldots ,A_{\omega (k)})\le S_k(B_{\omega (1)},\ldots ,B_{\omega (k)})=\sigma _{\omega ,{\mathbb {B}}}(k).\end{aligned}$$

\(\square \)

Later Bhatia and Karandikar [10] proved the monotonicity property using, in place of Sturm’s theorem, some elementary counting arguments and basic inequalities for the metric \(\delta \). Meanwhile Holbrook [15] identified a specific \(\omega \in \prod _{j=1}^\infty \mathbb N_n\) that always yielded a “walk” converging to \(\Lambda (A_1,\ldots , A_n)\) for any choice of \(A_1,\ldots , A_n\in \mathbb P\). He referred to his result as the “no dice” theorem since it eliminated probabilistic reasoning. Later Lim and Pálfia [31] obtained more general results about deterministic walks yielding the least squares mean in general Hadamard spaces.

Example 6.2

As a basic example we obtain a deterministic walk for \((A_1,\ldots ,A_n)\in \mathbb P^n\), all points equally weighted, that converges to \(\Lambda (A_1,\ldots ,A_n)\) by defining \(C_k\) for each \(k\in \mathbb N\) to be \(A_j\), where \(k\equiv j\) mod n, and defining \(Y_k\) to be the inductive mean of \((C_1,\ldots , C_k)\). Then the sequence \(\{Y_k\}\) converges to \(\Lambda (A_1,\ldots ,A_n)\); see [31]. The example generalizes to Hadamard spaces.
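
A sketch of this deterministic walk (ours, assuming NumPy; the step count is an arbitrary choice, and since the walk converges slowly it is an illustration rather than a practical algorithm). In the scalar case the walk is, in logarithmic coordinates, just the running average of the cyclic sequence, so it returns the geometric mean exactly when the number of steps is a multiple of n:

```python
import numpy as np

def pd_power(A, p):
    lam, U = np.linalg.eigh(A)
    return (U * lam ** p) @ U.conj().T

def wgmean(A, B, t):
    Ah, Aih = pd_power(A, 0.5), pd_power(A, -0.5)
    return Ah @ pd_power(Aih @ B @ Aih, t) @ Ah

def cyclic_walk(As, steps=3000):
    """Inductive mean along the cycle A_1, A_2, ..., A_n, A_1, ...;
    converges to the least squares mean Lambda(A_1, ..., A_n)."""
    Y = As[0]
    for k in range(2, steps + 1):
        Y = wgmean(Y, As[(k - 1) % len(As)], 1.0 / k)
    return Y

a = [np.array([[x]]) for x in (2.0, 5.0, 11.0)]
Y = cyclic_walk(a, steps=3000)                    # 3000 is a multiple of 3
assert np.isclose(Y[0, 0], (2.0 * 5.0 * 11.0) ** (1 / 3))
```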

7 A uniqueness condition

The ALM mean is typically distinct from the least squares mean for \(n\ge 3\). Thus the ALM axioms do not characterize a unique mean. The latter fact had already been noted by Bini et al., who observed that their BMP variant of the ALM mean was different from it [11]. In [43] Yamazaki established the following important inequality for the Karcher mean:

$$\begin{aligned} \sum _{k=1}^nw_k\log A_k\le 0\Leftrightarrow \Lambda (\textbf{w};A_1,\dots ,A_n)\le I\Leftrightarrow \Lambda (\textbf{w};A_1^r,\dots ,A_n^r)\le I,\quad r\ge 1 \end{aligned}$$
(7.1)

In his paper Yamazaki showed how the equivalence of his inequalities grew out of the Ando–Hiai inequality in the two-variable setting.

In [30] Lim and Pálfia used the work of Yamazaki to characterize the Karcher mean.

Theorem 7.1

As an omnivariable mean, the Karcher mean is uniquely determined by congruence invariance (P6), self-duality (P8), and the Yamazaki inequality

$$\begin{aligned}\sum _{k=1}^nw_k\log A_k\le 0\Rightarrow \Lambda (\textbf{w};A_1,\dots ,A_n)\le I.\end{aligned}$$

8 Symmetric cones

At this point in the survey we have traced the generalization of the geometric mean from the positive real numbers to pairs of positive definite matrices to the weighted least squares mean on finite weighted tuples of positive definite matrices. We may well pause to ask whether there are important classes of Hadamard spaces or Riemannian manifolds of nonpositive curvature carrying a structure that satisfies the ALM axioms for the least squares mean. And indeed there is, namely the class of positive symmetric cones of finite-dimensional Euclidean Jordan algebras. This has been partially carried out in [25], with more details given in [24]. The class of positive matrix cones is one important example of these symmetric cones.

One would have to say that the theory of the least squares geometric mean in the setting of symmetric cones is not nearly so well-developed as in the setting of the cone of positive matrices.

8.1 Open problem

Extend the theory (to the extent possible) of the Karcher mean on \({\mathbb {P}}\) to the setting of symmetric cones. For example, is there an extension of the Yamazaki inequality characterization (at the end of the preceding section) for symmetric cones?

9 The Hilbert space setting

Let \({\mathcal {B}}(E)\) be the \(C^*\)-algebra of bounded linear operators on an infinite-dimensional Hilbert space E, equipped with the operator norm. Let \({\mathbb {H}}(E)\) denote the closed subspace of hermitian operators, and let \(\mathbb P_E\) be the cone of invertible positive hermitian operators, an open cone in \({\mathbb {H}}(E)\). Again \(\exp \) and \(\log \) are analytic inverses between \({\mathbb {H}}(E)\) and \(\mathbb P_E\). We equip \(\mathbb P_E\) with the Thompson metric defined by \(d_T(A,B)=\Vert \log (A^{-1/2}BA^{-1/2})\Vert \), where \(\Vert \cdot \Vert \) is the operator norm. The metric \(d_T\) retains the four properties of \(\delta \) listed in Proposition 4.1 for the finite-dimensional setting, except that \(A\#B\) is no longer the unique midpoint between A and B [37]. Additionally \(\mathbb P_E\) is no longer a Hadamard space, basically because \({\mathbb {H}}(E)\) is no longer a Hilbert space. The definition and basic properties of the geometric mean \(A\# B\), in particular those of Proposition 3.3, remain valid, and it still gives a distinguished metric midpoint (no longer unique [37]) between A and B. The smallest closed set containing A and B and closed under taking \(\#\)-midpoints yields a distinguished metric geodesic connecting A and B, consisting of all \(A\#_t B\), \(0\le t\le 1\). (One should note, however, that metric geodesics between two given points are not unique.)

10 The Karcher equation

The uniform convexity of the Riemannian metric \(\delta \) on \(\mathbb P\) yields that the least squares mean is the unique critical point of the function \(X\mapsto \sum _{k=1}^n w_k\delta ^2(X,A_k)\). The least squares mean is thus characterized by the vanishing of the gradient, which is equivalent to being a solution of the following Karcher equation:

$$\begin{aligned} \sum _{k=1}^{n}w_{k}\log (X^{-1/2}A_{k}X^{-1/2})=0. \end{aligned}$$
(10.1)

The Karcher equation (10.1) can also be used to define a mean on the cone \(\mathbb P\) of positive invertible bounded operators on an infinite-dimensional Hilbert space (where one no longer has a Hadamard space), called the Karcher mean. As just noted, restricted to the matrix setting it yields the least squares mean. Thus it is reasonable to continue to denote it by \(\Lambda ({\textbf{w}};A_1,\ldots ,A_n)\).
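
One natural numerical approach in the matrix setting is fixed-point iteration on the Karcher equation itself (a sketch of ours, assuming NumPy; the initial guess, unit step, iteration count, and tolerance are our choices, and a damped step may be needed for badly conditioned data):

```python
import numpy as np

def spectral_apply(A, f):
    lam, U = np.linalg.eigh(A)
    return (U * f(lam)) @ U.conj().T

def karcher_residual(X, As, ws):
    """Left-hand side of the Karcher equation (10.1) at X."""
    Xih = spectral_apply(X, lambda lam: lam ** -0.5)
    return sum(w * spectral_apply(Xih @ A @ Xih, np.log) for w, A in zip(ws, As))

def karcher_mean(As, ws, iters=300, tol=1e-13):
    X = sum(w * A for w, A in zip(ws, As))       # start at the arithmetic mean
    for _ in range(iters):
        S = karcher_residual(X, As, ws)          # gradient direction at X
        if np.linalg.norm(S) < tol:
            break
        Xh = spectral_apply(X, np.sqrt)
        X = Xh @ spectral_apply(S, np.exp) @ Xh  # step along the geodesic
    return X

rng = np.random.default_rng(7)
As = [X @ X.T + np.eye(3) for X in rng.standard_normal((3, 3, 3))]
L = karcher_mean(As, [1 / 3] * 3)
assert np.linalg.norm(karcher_residual(L, As, [1 / 3] * 3)) < 1e-8
```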

Power means for positive definite matrices were introduced by Lim and Pálfia [30].

Theorem

Let \(A_{1},\dots ,A_{n}\in \mathbb P\) and let \({\textbf{w}}=(w_{1},\dots ,w_{n})\) be a weight. Then for each \( t\in (0,1],\) the following equation has a unique positive definite solution \(X=P_t({\textbf{w}};A_1,\ldots ,A_n)\), called the t-weighted power mean:

$$\begin{aligned} X=\sum _{k=1}^{n}w_{k}(X\#_{t}A_{k}). \end{aligned}$$

When restricted to the positive real numbers, the power mean reduces to the usual power mean

$$\begin{aligned} P_t({\textbf{w}};a_1,\ldots ,a_n)=\left( w_{1}a_{1}^{t}+\cdots +w_{n}a_{n}^{t}\right) ^{\frac{1}{t}}. \end{aligned}$$
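
The defining equation suggests computing \(P_t\) by straightforward fixed-point iteration on the map \(X\mapsto \sum _{k=1}^n w_k(X\#_t A_k)\) (a sketch of ours, assuming NumPy; the iteration count is an arbitrary choice):

```python
import numpy as np

def pd_power(A, p):
    lam, U = np.linalg.eigh(A)
    return (U * lam ** p) @ U.conj().T

def wgmean(A, B, t):
    Ah, Aih = pd_power(A, 0.5), pd_power(A, -0.5)
    return Ah @ pd_power(Aih @ B @ Aih, t) @ Ah

def power_mean(As, ws, t, iters=200):
    """Fixed-point iteration for X = sum_k w_k (X #_t A_k)."""
    X = sum(w * A for w, A in zip(ws, As))
    for _ in range(iters):
        X = sum(w * wgmean(X, A, t) for w, A in zip(ws, As))
    return X

# scalar sanity check: t = 1 gives the weighted arithmetic mean
a, b = np.array([[2.0]]), np.array([[8.0]])
P = power_mean([a, b], [0.5, 0.5], t=1.0)
assert np.isclose(P[0, 0], 5.0)
```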

In 2014 Lawson and Lim showed that the preceding notion of power mean extended to the setting of bounded operators on a Hilbert space [26] and established that the power means decrease as the parameter decreases: \(s<t\) implies \(P_s(\cdot ;\,\cdot )\le P_t(\cdot ;\,\cdot )\). Using power means Lawson and Lim were able to establish the existence and uniqueness of the Karcher mean in the \(C^*\)-algebra of bounded operators on a Hilbert space.

Theorem 10.1

In the strong operator topology

$$\begin{aligned} \Lambda (\cdot ;\,\cdot )=\lim _{t\rightarrow 0^+} P_t(\cdot ;\,\cdot )=\inf _{t>0}P_t(\cdot ;\,\cdot ), \end{aligned}$$

where \(\Lambda \) is the Karcher mean, the unique solution of the Karcher equation

$$\begin{aligned} X=\Lambda (\textbf{w};A_1,\ldots ,A_n)\Leftrightarrow \sum _{k=1}^n w_k\log (X^{-1/2}A_kX^{-1/2})=0. \end{aligned}$$

Via this machinery many of the axiomatic properties of the least squares mean in the finite-dimensional setting were extended to the corresponding Karcher mean in the infinite-dimensional setting.

Recent work by Lim and Pálfia [32] and independently by Lawson [19] shows that the preceding constructions and results remain valid for the open cone of positive invertible elements in any unital \(C^*\)-algebra.

11 Barycenters

A Borel probability measure on a metric space \((X,d)\) is a countably additive non-negative measure \(\mu \) on the Borel algebra \({\mathcal {B}}(X)\), the smallest \(\sigma \)-algebra containing the open sets, such that \(\mu (X)=1\). We denote the set of all probability measures on \((X,{\mathcal {B}}(X))\) by \({{\,\mathrm{\mathcal {P}}\,}}(X)\). Let \({\mathcal P}_{0}(X)\) be the set of all uniform finitely supported probability measures, i.e., all \(\mu \in \mathcal {P}(X)\) of the form \(\mu =\frac{1}{n}\sum _{k=1}^{n}\delta _{x_{k}}\) for some \(n\in {\mathbb N},\) where \(\delta _x\) is the point measure of mass 1 at x.

A measure \(\mu \in {{\,\mathrm{\mathcal {P}}\,}}(X)\) is said to be integrable if for some (and hence all) \(x\in X\)

$$\begin{aligned} \int _{X}d(x,y)d\mu (y)<\infty . \end{aligned}$$

The set of integrable measures is denoted by \({{\,\mathrm{\mathcal {P}}\,}}^1(X)\).

The Wasserstein distance (alternatively, Kantorovich–Rubinstein distance) \(d^W\) on \(\mathcal {P}^{1}(X)\) is a standard metric for probability measures. It is known that \(d^W\) is a complete metric on \(\mathcal {P}^{1}(X)\) whenever X is a complete metric space and that \(\mathcal {P}_{0}(X)\) is \(d^W\)-dense in \(\mathcal {P}^{1}(X)\).

Definition 11.1

([42]). A map \(\beta :({{\,\mathrm{\mathcal {P}}\,}}^1(X),d^W)\rightarrow (X,d)\) is a contractive barycentric map if (i) \(\beta (\delta _x)=x\) for all \(x\in X\), and (ii) \(d(\beta (\mu ),\beta (\nu ))\le d^W(\mu ,\nu )\) for all \(\mu ,\nu \in {{\,\mathrm{\mathcal {P}}\,}}^1(X)\).

Suppose \(G=\{G_n:X^n\rightarrow X: n\ge 2\}\) is an omnivariable mean on a metric space \((X,d)\). The mean G is said to be iterative if for all \(n,k\ge 2\) and \({\textbf{x}}=(x_1,\ldots ,x_n)\in X^n\),

$$\begin{aligned} G_n({\textbf{x}})=G_{nk}({\textbf{x}}^k), \end{aligned}$$

where \({\textbf{x}}^k=(x_1,\ldots ,x_n, x_1,\ldots ,x_n,\ldots , x_1,\ldots ,x_n)\in X^{nk}\) consists of k copies of \({\textbf{x}}\). The next proposition is a key result from [27].

Proposition 11.2

Suppose G is an omnivariable mean on a complete metric space \((X,d)\). If G is symmetric, iterative, and satisfies for all n, all \({\textbf{x}}=(x_1,\ldots ,x_n)\), \({\textbf{y}}=(y_1,\ldots ,y_n)\),

$$\begin{aligned} d(G({\textbf{x}}),G({\textbf{y}}))\le \frac{1}{n}\sum _{k=1}^n d(x_k,y_k), \end{aligned}$$
(11.1)

then there exists a unique contractive barycentric map \(\beta :{{\,\mathrm{\mathcal {P}}\,}}^1(X)\rightarrow X\) satisfying \(G(x_1,\ldots , x_n)=\beta \left( \sum _{k=1}^n (1/n)\delta _{x_k}\right) \).

The Wasserstein distance between \(\sum _{k=1}^n \frac{1}{n}\delta _{x_k}\) and \(\sum _{k=1}^n \frac{1}{n}\delta _{y_k}\) is the minimum over all permutations of \(\{y_k\}_{k=1}^n\) of the sum on the right in Eq. (11.1), so the inequality implies \(\beta \) is contractive on \({{\,\mathrm{\mathcal {P}}\,}}_0(X)\), a dense subset, and hence extends uniquely to a contractive map on \({{\,\mathrm{\mathcal {P}}\,}}^1(X)\).

The hypotheses of the preceding proposition are satisfied by the Karcher mean \(\Lambda \), and thus the mean “extends” to a contractive barycentric map \(\beta _\Lambda :{{\,\mathrm{\mathcal {P}}\,}}^1(\mathbb P)\rightarrow \mathbb P\), called the Karcher barycentric map. It is characterized by

$$\begin{aligned} X=\beta _\Lambda (\mu ) \Leftrightarrow \int _\mathbb P\log (X^{-1/2} A X^{-1/2}) d\mu (A)=0. \end{aligned}$$

The existence, basic theory, and properties of the Karcher barycentric map can be found in [27]. Many of the basic properties of the least squares mean in Theorem 5.1 have analogues for the Karcher barycentric map. As one example, the partial order on \(\mathbb P\) extends to one on \({{\,\mathrm{\mathcal {P}}\,}}^1(\mathbb P)\) called the stochastic order. It is defined by

$$\begin{aligned} \mu \le \nu \quad \text {if}~\mu (U)\le \nu (U)~\text {for all open upper sets}~U=\uparrow U=\{y\in \mathbb P:\exists x\in U,~x\le y\}. \end{aligned}$$

An important result that generalizes the monotonicity of the Karcher mean is that the Karcher barycentric map is monotonic with respect to the stochastic order; see [14, 18].

12 \(\mathbb P\)-valued random variables

The Karcher barycentric map makes possible a theory of \(\mathbb P\)-valued random variables. Let \((Z,{\mathcal {A}},q)\) be a probability space, where Z is a set, \({\mathcal {A}}\) is a \(\sigma \)-algebra, and q is a probability measure on \({\mathcal {A}}\). Let \(X:Z\rightarrow \mathbb P\) be measurable. With respect to X there is a push-forward measure \(X_*(q)\) on \(\mathbb P\) defined by \(X_*(q) (Q)=q(X^{-1}(Q))\) for any Borel subset Q of \(\mathbb P\). This measure is called the distribution of X. We are particularly interested in the \(L^1\) random variables, those for which \(X_*(q)\in {{\,\mathrm{\mathcal {P}}\,}}^1(\mathbb P)\). In this case we can define the expectation \(E(X)=\beta (X_*(q))\), where \(\beta \) is the Karcher barycentric map. These ideas can be worked out in Hadamard spaces to a rather full-blown theory, as has been done by Sturm [42], and we refer the reader to that source for details. However, the infinite-dimensional \(\mathbb P\) is not a Hadamard space, and so requires new approaches; see [32].

13 Lie groups

Computational anatomy involves the development and application of mathematical and computational techniques to analyze and model anatomical structures. One important aspect is shape analysis, analyzing the shape and geometry of anatomical structures. Lie groups appear to be well-suited for certain aspects of shape analysis, and this has motivated the search for means on Lie groups that employ and reflect their geometric structure. For the study of Karcher means on Lie groups in this arena see the work of Pennec and Arsigny, principally in [33, 38], and the references found in their work.

Pennec and Arsigny consider a connected Lie group G equipped with the well-known canonical (Cartan) connection. A key motivation for introducing the canonical connection is the desire to consider the Karcher mean on sufficiently small “geodesically convex” open subsets of a Lie group, where one can derive existence and uniqueness results, as is done in [38]. However, we restrict our attention here to the global setting, which allows us to generalize the previous setting and drop the need for the connection, while retaining the other main ingredients.

Our setting will be that of connected finite-dimensional Lie groups G with identity e, although many of our considerations easily generalize to the Banach–Lie group setting. The tangent space \(T_e G\) at e carries the structure of the Lie algebra of G and there is an exponential map \(\exp : T_eG\rightarrow G\) with standard well-known properties. The one-parameter subgroups of G consist of the smooth homomorphisms \(\gamma _v:({\mathbb R},+)\rightarrow G\) defined by \(\gamma _v(t)=\exp (tv)\) for each \(v\in T_eG\). We define the (maximal) geodesics of G to be left translations by members of G of the one-parameter subgroups. Note the geodesics are invariant under left translation, but also under right translation since \(g\gamma _v(t)=({{\,\textrm{Ad}\,}}(g)\gamma _v(t))g\), and thus are invariant under inversion. We then have for \(v\in T_g G\),

$$\begin{aligned} \exp _g(v)= \gamma _{g,v }(1)= g\exp (dL_{g^{-1}}(v)), \end{aligned}$$

where \(\gamma _{g,v}:{\mathbb R}\rightarrow G\) is the unique maximal geodesic with \(\gamma _{g,v}(0)=g\), \(\dot{\gamma }_{g,v}(0)=v\) and \(dL_{g^{-1}}\) is the derivative of the function \(L_{g^{-1}}:G\rightarrow G\), left translation by \(g^{-1}\).

We turn to the weighted Karcher mean in the context of Lie groups. Let G be a connected Lie group, let \(\{g_1,\ldots , g_n\}\) be a finite subset of G, and let \(\{w_1,\ldots, w_n\}\) be a set of weights in the sense that \(0<w_k<1\) for all k and \(\sum _{k=1}^n w_k=1\). Then m is a Karcher mean for \(({\textbf{w}},{\textbf{g}})\), where \({\textbf{w}}=(w_1,\ldots , w_n)\) and \({\textbf{g}}=(g_1,\ldots ,g_n)\), if there exist an open set U containing m and \(\{g_1,\ldots ,g_n\}\) and a smooth inverse \(\log _m: U\rightarrow T_mG\) to \(\exp _m\) such that \(\sum _{k=1}^n w_k\log _m (g_k)=0\). If \(\log _m\) is defined globally, we can alternatively write \(\sum _{k=1}^n w_k\log (m^{-1} g_k)=0\).

In [38] the authors propose the following algorithm for computing the Karcher mean, which they call the bi-invariant mean. For \({\textbf{w}}=(w_1,\ldots , w_n)\) and \({\textbf{g}}=(g_1,\ldots ,g_n)\), pick \(m_1=g_1\) and define

$$\begin{aligned} m_{k+1}= m_k\exp \left( \sum _{i=1}^n w_i \log (m_k^{-1} g_i)\right) . \end{aligned}$$
(13.1)

Remark 13.1

If we rewrite the iteration equation in the alternative form

$$\begin{aligned} m^*= \exp _m\left( \sum _{j=1}^n w_j \log _m g_j\right) , \end{aligned}$$

we can give an intuitive meaning to the formula. We represent locally the geodesic from m to \(g_j\) by the vector \(\log _m g_j\), the vector tangent to the geodesic at m. We take the weighted sum of these n vectors, a local approximation to taking a “weighted sum” of the geodesics, and then follow the geodesic determined by this local approximation.

In Theorem 12 of [38] the authors show that the sequence generated by the given algorithm (13.1) converges to a unique Karcher mean in sufficiently small neighborhoods of a point. However, they do not have global theorems for convergence of the algorithm and existence of the Karcher mean. Although the results are quite limited at this stage, the following example has been recently established [20].

Example 13.2

Let \(\textrm{UP}_n\) be the unipotent matrix Lie group over \({\mathbb R}\) or \({\mathbb C}\) of \(n\times n\) upper triangular matrices with 1’s along the diagonal. Let \(\textbf{up}_n\) denote its (nilpotent) Lie algebra of strictly upper triangular matrices. We recall the well-known fact that the exponential and logarithm maps are inverse diffeomorphisms in this setting. Given matrices \(A_1,\ldots , A_N\in \textrm{UP}_n\) and weights \(w_1,\ldots ,w_N > 0\) such that \(\sum _{i=1}^N w_i=1\), it turns out that the iteration

$$\begin{aligned} M_{k+1}=M_k\exp \left( \sum _{i=1}^N w_i \log (M_k^{-1} A_i)\right) \end{aligned}$$
(13.2)

converges to the unique Karcher mean \(\Lambda ({\textbf{w}}, {\mathbb {A}})\) in \(n-1\) iterations for any initial point \(M_1\in \textrm{UP}_n\).
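
A sketch of iteration (13.2) on \(\textrm{UP}_3\) (ours, assuming NumPy; function names are our own). Since \(M_k^{-1}A_i\) is unipotent and the step is nilpotent, \(\log \) and \(\exp \) here are finite polynomial series, so no general matrix-function routine is needed:

```python
import numpy as np

def nilpotent_log(U):
    """log of a unipotent U: finite series log(I + N), N = U - I nilpotent."""
    n = U.shape[0]
    N = U - np.eye(n)
    L, P = np.zeros_like(N), np.eye(n)
    for k in range(1, n):
        P = P @ N
        L += (-1) ** (k + 1) * P / k
    return L

def nilpotent_exp(S):
    """exp of a nilpotent (strictly upper triangular) S: finite series."""
    n = S.shape[0]
    E, P = np.eye(n), np.eye(n)
    for k in range(1, n):
        P = P @ S / k
        E += P
    return E

def karcher_unipotent(As, ws):
    """Iteration (13.2); per Example 13.2 it converges in n-1 steps on UP_n."""
    n = As[0].shape[0]
    M = np.eye(n)                         # any initial point in UP_n works
    for _ in range(n - 1):
        step = sum(w * nilpotent_log(np.linalg.inv(M) @ A) for w, A in zip(ws, As))
        M = M @ nilpotent_exp(step)
    return M

A1 = np.array([[1.0, 2.0, 3.0], [0, 1, 4], [0, 0, 1]])
A2 = np.array([[1.0, -1.0, 0.5], [0, 1, 2], [0, 0, 1]])
M = karcher_unipotent([A1, A2], [0.5, 0.5])
residual = sum(0.5 * nilpotent_log(np.linalg.inv(M) @ A) for A in (A1, A2))
assert np.linalg.norm(residual) < 1e-10   # the Karcher equation holds at M
```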

13.1 Open problem

For various connected Lie groups G, particularly those for which the exponential function is a diffeomorphism, determine the convergence properties of the algorithm (13.2). A more specific problem: given a matrix Lie group G, fixed \(A_1,\ldots , A_N\in G\), and weights \(w_1,\ldots ,w_N > 0\) such that \(\sum _{i=1}^N w_i=1\), does there exist \(t>0\) such that the iteration

$$\begin{aligned} M_{k+1}=M_k\exp \left( t\sum _{i=1}^N w_i \log (M_k^{-1} A_i)\right) \end{aligned}$$
(13.3)

converges to the Karcher mean.

14 Summary

In the preceding we have attempted to outline the high points of the striking development of the theory of the matrix/operator geometric mean on the cone of positive matrices/operators in the past twenty-plus years. Over this period of time it has evolved from a two-variable matrix mean to an omnivariable matrix mean (the least squares, Cartan, or Fréchet mean) to an operator mean in the setting of unital \(C^*\)-algebras (the Karcher mean) and finally to a barycentric map on the space of integrable Borel probability measures. At each stage of the evolution significant new insights and developments were necessary. The theory has drawn heavily from matrix and operator theory and at the same time from geometric notions. And along the way we have seen a variety of characterizations of this mean: the least squares mean, the probabilistic or deterministic characterization as the limit of a “walk” built from the inductive mean, the solution of the Karcher equation, the mean satisfying the Yamazaki inequality, and the limit/infimum of the power means \(\{P_t\}\) as \(t\searrow 0\). Whatever future developments may hold, it is clear that a substantial theory has already emerged.