Abstract
We study optimal mass transport problems between two measures with respect to a non-traditional cost function, i.e. a cost c which can attain the value \(+\infty \). We define the notion of c-compatibility and strong c-compatibility of two measures, and prove that if there is a finite-cost plan between the measures then the measures must be c-compatible, and if in addition the two measures are strongly c-compatible, then there is an optimal plan concentrated on a c-subgradient of a c-class function. This function is the so-called potential of the plan. We give two proofs of this theorem, under slightly different assumptions. In the first we utilize the notion of c-path-boundedness, showing that strong c-compatibility implies a strong connectivity result for a directed graph associated with an optimal map. Strong connectivity of the graph implies that the c-cyclic monotonicity of the support set (which follows from classical reasoning) guarantees its c-path-boundedness, implying, in turn, the existence of a potential. We also give a constructive proof, in the case when one of the measures is discrete. This approach adopts a new notion of ‘Hall polytopes’, which we introduce and study in depth, to which we apply a version of Brouwer’s fixed point theorem to prove the existence of a potential in this case.
Similar content being viewed by others
Notes
All considered measures are Borel measures on Polish spaces, which are complete, separable metric spaces equipped with their Borel \(\sigma \)-algebra (see also the beginning of Sect. 2).
When referring to a function on \(X \times Y\) as “measurable” we assume it is both measurable with respect to the product \(\sigma \)-algebra and its fibers \(f(\cdot ,y)\) and \(f(x,\cdot )\) are measurable functions on X and Y respectively, for any \(x\in X\) and \(y\in Y\).
References
Ambrosio, L., Pratelli, A.: Existence and stability results in the \({L}^1\) theory of optimal transportation. Optimal transportation and applications, pp. 123–160. Springer, Berlin, Heidelberg (2003)
Artstein-Avidan, S., Barel, H., Rubinstein, Y., Sadovsky, S., Wyczesany, K.: Transportation induced by the polarity transform, in preparation
Artstein-Avidan, S., Milman, V.: Hidden structures in the class of convex functions and a new duality transform. J. Eur. Math. Soc. 13(4), 975–1004 (2011)
Artstein-Avidan, S., Rubinstein, Y.A.: Differential analysis of polarity: Polar Hamilton-Jacobi, conservation laws, and Monge Ampère equations. J. d’Anal. Math. 132(1), 133–156 (2017)
Artstein-Avidan, S., Sadovsky, S., Wyczesany, K.: A Rockafellar-type theorem for non-traditional costs. Adv. Math. 395, 108157 (2022)
Ball, K.: An elementary introduction to monotone transportation. In: Geometric aspects of functional analysis, Lecture Notes in Mathematics, pp. 41–52. Springer, Berlin, Heidelberg (2004)
Barel, H.: Optimal transportation problem for polar cost, Master’s thesis, Tel Aviv University (2019)
Beiglböck, M., Goldstern, M., Maresch, G., Schachermayer, W.: Optimal and better transport plans. J. Funct. Anal. 256(6), 1907–1927 (2009)
Bianchini, S., Caravenna, L.: On optimality of c-cyclically monotone transference plans. Comptes Rendus Math. 348(11–12), 613–618 (2010)
Brenier, Y.: Polar factorization and monotone rearrangement of vector-valued functions. Commun. Pure Appl. Math. 44(4), 375–417 (1991)
Caffarelli, L.A.: The regularity of map**s with a convex potential. J. Am. Math. Soc. 5(1), 99–104 (1992)
Gangbo, W., McCann, R.J.: The geometry of optimal transportation. Acta Math. 177(2), 113–161 (1996)
Kantorovich, L.V.: On the transfer of masses. Dokl. Acad. Sci. SSSR 37, 227–229 (1942)
Kantorovich, L.V.: On a problem of Monge. Uspekhi Mat. Nauk. 3, 225–226 (1948)
McCann, R.J.: A convexity theory for interacting gases and equilibrium crystals, Ph.D. thesis, Princeton University (1994)
Rochet, J.C.: A necessary and sufficient condition for rationalizability in a quasi-linear context. J. Math. Econ. 16(2), 191–200 (1987)
Rockafellar, R.T.: Characterization of the subdifferentials of convex functions. Pac. J. Math. 17(3), 497–510 (1966)
Rüschendorf, L.: On c-optimal random variables. Stat. Probab. Lett. 27(3), 267–270 (1996)
Smith, C., Knott, M.: On Hoeffding-Fréchet bounds and cyclic monotone relations. J. Multivar. Anal. 40(2), 328–334 (1992)
Strassen, V.: The existence of probability measures with given marginals. Ann. Math. Stat. 36(2), 423–439 (1965)
Trudinger, N.S., Wang, X.: On strict convexity and continuous differentiability of potential functions in optimal transportation. Arch. Ration. Mech. Anal. 192(3), 403–418 (2009)
Villani, C.: Topics in optimal transportation, vol. 58. Graduate Studies in Mathematics American Mathematical Society, Ann Arbor (2003)
Villani, C.: Optimal transport: old and new, vol. 338. Springer Science & Business Media, Springer, New York, (2008)
Wyczesany, K.: Topics in high-dimensional geometry and optimal transport, Ph.D. thesis, University of Cambridge (2020)
Acknowledgements
The authors were partially supported by the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (Grant agreement No. 770127). The first named author was partially supported by ISF Grant No. 784/20. The second named author is grateful to the Azrieli foundation for the award of an Azrieli fellowship. We thank the anonymous referee whose comments helped improve and clarify this manuscript.
Author information
Authors and Affiliations
Corresponding author
Additional information
Communicated by A. Mondino.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendix A: c-subgradients and polar subgradients
Appendix A: c-subgradients and polar subgradients
Since c-subgradients play such an important role in this theory, we gather here some relevant information regarding them but which we did not include in the main text so as not to disturb its flow.
Let us recall that given a function \(\varphi \) in the c-class, its c-subgradient is defined by
Denoting by \(\partial ^c\varphi (x)\) the set of points \(y\in Y\) for which \((x,y)\in \partial ^c \varphi \), we have by the definition that
Notice that \(y\in \partial ^c\varphi (x)\) if and only if the function \(c(\cdot ,y)-\varphi ^c(y)\) is above \(\varphi \) and coincides with it at x. This provides the first simple but useful way to think about c-subgradients, summarized in Lemma A.1. Given a function \(\varphi \) in the c-class, it is the image, under the c-transform, of another c-class function \(\psi = \varphi ^c\) and therefore, it can be written as an infimum over basic functions as follows:
All the functions on the right hand side lie above \(\varphi \). If any one of the basic functions (indexed by y) on the right hand side is equal to \(\varphi \) at the point x, then the pair (x, y) belongs to \(\partial ^c \varphi \), and \(y \in \partial ^c \varphi (x)\).
Lemma A.1
Let \(\varphi \) be a c-class function, and \(x\in X\) and assume that \(\varphi (x)<\infty \). Then \(y_0\in \partial ^c \varphi (x)\) if and only if \(c(x,y_0)<\infty \) and the function \(\ell (z)=c(z,y_0)-c(x,y_0)+\varphi (x)\) satisfies
Proof
By the definition we have that \(y_0\in \partial ^c \varphi (x)\) if and only if
Using the definition of the c-transform we see that
which holds if and only if for all z we have \(c(z,y_0)-c(x,y_0)+\varphi (x) \ge \varphi (z)\). \(\square \)
It is useful to understand the structure of the c-subgradient of the basic functions. In parallel to the classical case, where the linear functions have constant subgradient, we show that under mild assumptions the same is true for c-subgradients of basic functions. This was, of course, our motivation for using the specific candidates for the potential functions in Sect. 5.
Lemma A.2
Let X, Y be Polish spaces and let \(c:X\times Y \rightarrow (-\infty , \infty ]\) be a measurable cost function. Consider a basic function \(\varphi (x) = c(x,y_0)+ t\) for some \(y_0 \in Y\). If \(c(x,y_0)<\infty \), then \(y_0\in \partial ^c\varphi (x)\). If, in addition, for any \(y_1\ne y_0\) we have that \(\inf _z\left( c(z,y_1) - c(z,y_0)\right) \) is not attained at x (for example, if the infimum is \(-\infty \), or bounded but not attained at all) then \(\{y_0 \}= \partial ^c\varphi (x)\).
Proof
Indeed, let \(\varphi \) be as in the statement. From the definition it follows that \(y\in \partial ^c\varphi (x)\) if and only if \(c(x,y)<\infty \) and
which can be reformulated as
Plugging in the definition of \(\varphi \) we get
We see that \(y=y_0\) always satisfies the equality, so that \(y_0\in \partial ^c \varphi (x)\). Clearly for \(y_1\ne y_0\), such an inequality means precisely that the infimum is attained at x. \(\square \)
An important and motivating first example is the one coming from the clasical cost function \(c(x,y) = -\langle x,y \rangle \).
Example A.3
For the cost function \(c(x,y)=-\langle x,y \rangle \), whose transport plans and maps coincide with those associated to the quadratic cost, the c-subgradient coincides, up to a minus sign, with the well known subgradient. More formally, a function \(\varphi \) is in the c-class if and only if \(-\varphi \in \textrm{Cvx}({\mathbb {R}}^n)\), namely is convex and lower semi-continuous. Denoting \(\psi = -\varphi \) and using the definition of the c-transform we see that \((x,y)\in \partial ^c \varphi \) if and only if for all \(z\in X\) we have
Plugging in the cost we indeed get that \(y\in \partial ^c \varphi (x)\) if for all z it holds that \( \psi (x) + \langle z-x,y \rangle \le \psi (z)\), namely \(y\in \partial \psi (x)\).
The second motivating example, which is our main point of interest, is that of the polar cost \(p:{\mathbb {R}}^n \times {\mathbb {R}}^n \rightarrow (-\infty , \infty ]\), which we once again recall
It was shown in [7] that for the polar cost the p-class consists of all functions of the form \(-\ln (\varphi )\), where \(\varphi \) is a geometric convex function, that is, a lower semi-continuous non-negative convex function with \(\varphi (0)=0\). We denote this class by \(\textrm{Cvx}_0({\mathbb {R}}^n)\). The associated cost transform is linked with the \({{\mathcal {A}}}\)-transform defined in [3] and given by
More precisely, one may easily verify that \(- \ln ({{\mathcal {A}}}\varphi )=(-\ln (\varphi ))^p\). Further, the p-subgradient of the function \(-\ln (\varphi )\) can be rewritten as the polar subgradient \(\partial ^\circ \), introduced in [4], of the function \(\varphi \in \textrm{Cvx}_0({\mathbb {R}}^n)\). Indeed, we have that
This convenient form is a reason for us to sometimes consider a “multiplicative” setting, where the basic functions are of the form
The next lemma, which is a version of [4, Lemma 3.3], describes the connection between the polar subgradient and the classical subgradient. We will also use the following notation for the zero set \(Z_\varphi = \{ x: \varphi (x) = 0\}\) and \(\textrm{dom}( \varphi ) = \{x: \varphi (x) <\infty \}\) for the domain where \(\varphi \) is finite.
Lemma A.4
Let \(\varphi \in \textrm{Cvx}_0 ({\mathbb {R}}^n)\) and let \(x\in { \mathrm dom}(\varphi ){\setminus } Z_\varphi \). Then there is a bijection between the set \(\{ z\in \partial \varphi (x): \, \langle x,z\rangle \ne \varphi (x)\}\) and the set \(\partial ^\circ \varphi (x)\) given by \(z \mapsto y=\frac{z}{ \langle x,z \rangle - \varphi (x)}\). (Note that these sets might be empty.)
Proof
We first show that the map** has the desired range. Let \(z\in \partial \varphi (x)\) with \(\langle x,z\rangle \ne \varphi (x)\), which means that for every w we have \(\langle w,z \rangle -\varphi (w)\le \langle x,z \rangle -\varphi (x)\). In particular, \(\langle x,z\rangle -\varphi (x)> 0\). Hence, letting \(y=\frac{z}{ \langle x,z \rangle - \varphi (x)}\) we have that \(\langle x,y\rangle >1\).
To show that \(y \in \partial ^\circ \varphi (x)\), it remains to show that \( \varphi (x){{\mathcal {A}}}\varphi (y) = \langle x,y \rangle -1\). According to the definition of \({{\mathcal {A}}}\), this holds if for every w with \(\langle w,y \rangle >1\) and \(\varphi (w)>0\), we have
Plugging in the formula for y and rearranging gives
Using that \(\langle x,z \rangle -\varphi (x)> 0\), the above inequality is equivalent to our initial assumption \(\langle w,z \rangle -\varphi (w)\le \langle x,z \rangle -\varphi (x)\).
Having established that the map** is indeed into \(\partial ^{\circ }\varphi (x)\), we proceed to show that it is surjective. Given \(y \in \partial ^\circ \varphi (x)\) it follows from the definition that \(\langle x,y \rangle >1\). Consider
which is well defined. Taking scalar product with x we see that \(\langle z,x \rangle = \frac{\langle y,x \rangle \varphi (x)}{\langle y,x \rangle -1}>0\), which can be rearranged to \(\langle y,x \rangle (\langle z,x \rangle -\varphi (x)) = \langle z,x \rangle >0\) so, in particular, \(\langle z,x \rangle \ne \varphi (x)\). Solving for \(\langle x,y \rangle \) and plugging back into the Eq. (18) we get that \(y=\frac{z}{ \langle x,z \rangle - \varphi (x)}\). To prove surjectivity, it remain to show that \(z\in \partial \varphi (x)\). To this end, as before, we use that if \(y\in \partial ^\circ \varphi (x)\) then for any w with \(\langle w,y \rangle >1\) and \(\varphi (w) >0\) we have \( \frac{\langle w,y \rangle -1}{\varphi (w)}\le \frac{\langle x,y \rangle -1}{\varphi (x)}\). Plugging in the formula for y and rearranging, we get that
holds for any w such that \(\langle w,z \rangle >\langle x,z \rangle -\varphi (x)\) and \(\varphi (w)>0\). In the case when w is such that \(\langle w,z \rangle \le \langle x,z \rangle -\varphi (x)\), this actually means that \( \varphi (x) + \langle w-x,z \rangle \le 0\) and since the geometric convex functions are non-negative the desired inequality trivially follows. Finally, we consider the case when \(w\in Z_\varphi \), i.e. when \(\varphi (w)=0\). Then, plugging in z as defined in (18), we have that the inequality defining the subgradient of \(\varphi \) at x becomes simply
That is, we need to show that \(\partial ^\circ \varphi (x)\) is contained in the polar set of \(Z_\varphi \). Indeed, \(y\in \partial ^\circ \varphi (x)\) implies, in particular, that \(y \in \textrm{dom}\,({{\mathcal {A}}}\varphi )\) (since the value of \({{\mathcal {A}}}\varphi (y) = \frac{\langle x,y \rangle -1}{\varphi (x)} <\infty \)) and it follows from the definition of \({{\mathcal {A}}}\) that \(\textrm{dom}\,({{\mathcal {A}}}\varphi )\subset Z_\varphi ^\circ \), which completes the proof of surjectivity.
To show injectivity, assume that for \(z,w\in \partial \varphi (x)\) we have that \( y=\frac{z}{ \langle x,z \rangle - \varphi (x)} = \frac{w}{ \langle x,w \rangle - \varphi (x)}\). This means that these vectors are parallel. Taking scalar product of each of them with x and simplifying we get that \(\langle x,z \rangle = \langle x,w \rangle \), and plugging this back into the equality we get \(z = w\), as claimed. \(\square \)
We end this appendix with one explicit example of a function and its p-subgradient. More examples and applications can be found in [7, 24] and in the forthcoming [2].
Example A.5
Let \(\varphi (x) = \Vert x\Vert ^2/2\), in which case \({{{\mathcal {A}}}}\varphi (y)= \Vert y\Vert ^2/2\) and the supremum in the definition of \({{\mathcal {A}}}\varphi \) is satisfied for \(x = 2y/\Vert y\Vert ^2\). Hence, \(\partial ^\circ \varphi (x) = \frac{2x}{\Vert x\Vert ^2}\). Note that the map** \(x\mapsto \partial ^\circ \varphi (x)\) in this case is a (rescaled) spherical inversion.
Rights and permissions
Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Artstein-Avidan, S., Sadovsky, S. & Wyczesany, K. Optimal measure transportation with respect to non-traditional costs. Calc. Var. 62, 35 (2023). https://doi.org/10.1007/s00526-022-02362-w
Received:
Accepted:
Published:
DOI: https://doi.org/10.1007/s00526-022-02362-w