Abstract
We introduce a stochastic model of diffeomorphisms, whose action on a variety of data types descends to stochastic evolution of shapes, images and landmarks. The stochasticity is introduced in the vector field which transports the data in the large deformation diffeomorphic metric mapping framework for shape analysis and image registration. The stochasticity thereby models errors or uncertainties of the flow in following the prescribed deformation velocity. The approach is illustrated in the example of finite-dimensional landmark manifolds, whose stochastic evolution is studied both via the Fokker–Planck equation and by numerical simulations. We derive two approaches for inferring parameters of the stochastic model from landmark configurations observed at discrete time points. The first of the two approaches matches moments of the Fokker–Planck equation to sample moments of the data, while the second approach employs an expectation-maximization based algorithm using a Monte Carlo bridge sampling scheme to optimise the data likelihood. We derive and numerically test the ability of the two approaches to infer the spatial correlation length of the underlying noise.
1 Introduction
In this work, we aim at modelling variability of shapes using a theory of stochastic perturbations consistent with the action of the diffeomorphism group underlying the large deformation diffeomorphic metric mapping framework (LDDMM, see [65]). In applications, such variability arises and can be observed, for example, when human organs are influenced by disease processes, as analysed in computational anatomy [66]. Spatially independent white noise contains insufficient information to describe these large-scale variabilities of shapes. In addition, the coupling of the spatial correlations of the noise must be adapted to a variety of transformation properties of the shape spaces. The theory developed here addresses this problem by introducing spatially correlated transport noise which respects the geometric structure of the data. This method provides a new way of characterizing stochastic variability of shapes using spatially correlated noise in the context of the standard LDDMM framework.
We will show that this specific type of noise can be used for all the data structures to which the LDDMM framework applies. The LDDMM theory was initiated by [6, 12, 19, 46, 60] based on the pattern theory of [23]. LDDMM models the dynamics of shapes by the action of diffeomorphisms (smooth invertible transformations) on shape spaces. It gives a unified approach to shape modelling and shape analysis that is valid for a range of structures such as landmarks, curves, surfaces, images, densities or even tensor-valued images. For any such data structure, the optimal shape deformations are described via the Euler–Poincaré equation of the diffeomorphism group, usually referred to as the EPDiff equation [26, 27, 66]. In this work, we will show how to obtain a stochastic EPDiff equation valid for any data structure, and in particular for the finite-dimensional spaces of landmarks. For this, we will follow the LDDMM derivation in [8] based on geometric mechanics [24, 43]. This view is based on the existence of momentum maps, which are characterized by the transformation properties of the data structures for images and shapes. These momentum maps persist in the process of introducing noise into the EPDiff equation, and they thereby preserve most of the technology developed for shape analysis in the deterministic context and in computational anatomy.
This work is not the first to consider stochastic evolutions in LDDMM. Indeed, [61, 64] and more recently [44] have already investigated the possibility of stochastic perturbations of landmark dynamics. In these works, the noise is introduced into the momentum equation, as though it was an external random force acting on each landmark independently. In [44], an extra dissipative force was added to balance the energy input from the noise and to make the dynamics correspond to a certain type of heat bath used in statistical physics. Refs. [55, 56] considered evolutions on the landmark manifold with stochastic parts being Brownian motion with respect to a Riemannian metric and estimated parameters of the models from observed data. Here, we will introduce Eulerian noise directly into the reconstruction relation used to find the deformation flows from the velocity fields, which are solutions of the EPDiff equation [26, 65]. As we will see, this derivation of stochastic models is compatible with variational principles, preserves the momentum map structure and yields a stochastic EPDiff equation with a novel type of multiplicative noise, depending on the gradient of the solution, as well as its magnitude. This model is based on the previous works [2, 25], where, respectively, stochastic perturbations of infinite- and finite-dimensional mechanical systems were considered. The Eulerian nature of the noise discussed here implies that the noise correlation depends on the image position and not, as for example in [44, 61], on the landmarks themselves. Consequently, the present method for the introduction of noise is compatible with any data structure, for any choice of its spatial correlation. We also mention the conference paper [3] in which the basic theory underlying the present work was applied to shape transformations of the corpus callosum. 
In the conclusion of the paper, we discuss possibilities for including Lagrangian noise advected with the flow, in contrast to the present Eulerian case, and for including nonstationary correlation statistics that respond to the evolution of advected quantities.
To illustrate this framework and give an immediate demonstration of stochastic landmark dynamics, we display in Fig. 1 three experiments which compare the proposed model with a stochastic forcing model, of the type studied in [61]. The proposed model introduces the following stochastic Hamiltonian system for the positions of the landmarks, \({\mathbf {q}}_i\), and their canonically conjugate momenta, \(\mathbf p_i\),
In (1.1), the \(\sigma _l\) are prescribed functions of space which represent the spatial correlations of the noise. In Fig. 1, the \(\sigma _l\) fields are Gaussians whose variance is equal to twice their separation distance and whose locations are indicated by black dots. We compare this model with the system,
where \(\sigma \) is a constant. In this case, the noise corresponds to a stochastic force acting on the landmarks, whose corresponding Brownian motion is different for each landmark. We show on the first panel of Fig. 1 that for a small number of landmarks and a large range of spatial correlations of the noise, both types of stochastic deformations in (1.1) and (1.2) visually coincide. This is shown for a simple experiment in translating a circle (from the black circle to the black dashed circle). By doubling the number of landmarks (middle panel), the dynamics of (1.2) results in small-scale noise correlation (magenta), whereas the proposed model (blue) remains equivalent to the first experiment. This figure illustrates shape evolution when the noise is Eulerian and independent of the data structure. Indeed, the limit of a large number of landmarks corresponds to a certain continuum limit, in this case corresponding to curve dynamics. Finally, in the right-most panel, we reduce the range of the spatial correlation of the noise by adding more noise fields. This arrangement allows us to qualitatively reproduce the dynamics of the equation (1.2) with the same number of landmarks as the amount of noise and its spatial correlation is similar in both cases. Indeed, the spatial correlations are dictated by the Eulerian functions \(\sigma _l\) defined in fixed space for our model, and by the density of landmarks in the stochastically forced landmark model.
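For reference, the two systems compared above can be sketched as follows. This is a form consistent with the verbal description and with the stochastic Hamilton equations of Sect. 3, not a quotation: the landmark Hamiltonian \(h\) and the per-landmark Wiener processes \(W_t^i\) in the second system are assumptions inferred from the surrounding prose.

```latex
% Proposed model, cf. (1.1): Eulerian noise fields \sigma_l shared by all landmarks
\mathrm{d}\mathbf{q}_i = \frac{\partial h}{\partial \mathbf{p}_i}\,\mathrm{d}t
    + \sum_l \sigma_l(\mathbf{q}_i)\circ \mathrm{d}W_t^l ,
\qquad
\mathrm{d}\mathbf{p}_i = -\frac{\partial h}{\partial \mathbf{q}_i}\,\mathrm{d}t
    - \sum_l \frac{\partial}{\partial \mathbf{q}_i}
      \big(\mathbf{p}_i\cdot\sigma_l(\mathbf{q}_i)\big)\circ \mathrm{d}W_t^l .

% Stochastic forcing model, cf. (1.2): one Brownian motion per landmark, constant \sigma
\mathrm{d}\mathbf{q}_i = \frac{\partial h}{\partial \mathbf{p}_i}\,\mathrm{d}t ,
\qquad
\mathrm{d}\mathbf{p}_i = -\frac{\partial h}{\partial \mathbf{q}_i}\,\mathrm{d}t
    + \sigma\,\mathrm{d}W_t^i .
```

In the first system the index \(l\) runs over noise fields fixed in space, whereas in the second the index \(i\) of the Wiener process is that of the landmark, which is why refining the landmark discretization changes the effective noise correlation only in the second case.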
Modelling large-scale shape variability with noise is of interest for applications in computational anatomy, in which sources of variability include natural ageing, the influence of diseases such as Alzheimer’s disease, and intra-subject population scale variations. In the LDDMM context, these effects are sometimes modelled using the random orbit model [45]. The random orbit approach models variability in the observed data by using an ensemble of initial velocities in matching a template to a set of observations via geodesic flows, see [62]. The randomness is confined to the initial velocity as opposed to the evolving stochastic processes used in the present work. A prior can be defined by assuming a distribution of the initial velocities, and Bayesian approaches can then be used for inference of the template shape as well as additional unknown parameters [1, 41, 67]. The stochastic model developed here can also be applied to model random warps and to generate distributions used in Bayesian shape modelling, and for coupling warps and functional variations such as those in [40, 51]. Indeed, because the proposed probabilistic approach assigns a likelihood to random deformations, the model can be used for general likelihood-based inference tasks.
In the present model, the observed shape variability indicates the required spatial correlation of the noise, which must be specified or inferred for each application. As this correlation is generally unknown, estimating the parameters of the correlation structure becomes an important part of the framework. We will address the problem of inferring the noise parameters by considering two different methods in the context of the representation of shapes by landmarks. The first method is based on estimating the time evolution of the probability distribution of each landmark. We will derive a set of differential equations approximating the time evolution of the complete distribution via its first moments. We can then solve the inverse problem of estimating the noise correlation from known initial and final distributions of landmarks by minimizing a certain cost function with a genetic algorithm. The second method is based on an expectation-maximization (EM) algorithm which can infer unknown parameters for a parametric statistical model from observed data. In this context, since only initial and final landmark positions are observed, the full stochastic trajectories are regarded as missing information. For this algorithm, we need to estimate the likelihood of stochastic paths connecting sets of observed landmarks. We achieve this by adapting the theory of diffusion bridges to the stochastic landmark equation. As discussed in the concluding remarks, inference methods for other data structures, in particular for infinite-dimensional shape representations, are not treated in this paper and are left as outstanding problems for future work.
Finally, we wish to mention that multiple additional approaches for shape analysis exist outside the LDDMM context, particularly exemplified by the Kendall shape spaces [37], see also [18]. We focus this paper on the LDDMM framework, leaving possibilities for extending the presented methods to include stochastic dynamics and noise inference in other shape analysis approaches to future work.
1.1 Plan of this Work
We begin by developing a general theory of stochastic perturbations for inexact matching in Sect. 2. We then focus on exact landmark matching in Sect. 3, which is the simplest example of this theory. In particular, we derive the Fokker–Planck equation in Sect. 3.2 and diffusion bridge simulation in Sect. 3.3. In Sect. 4, we describe the two methods we use for estimating parameters of the noise from observations. The Fokker–Planck based method is discussed in Sect. 4.2 and the expectation-maximization algorithm is treated in Sect. 4.3. We end the paper with numerical examples in Sect. 5, in which we investigate the effect of the noise on landmark dynamics and compare the two methods for estimating the noise amplitude.
2 Stochastic Large Deformation Matching
In this section, we will first review the geometrical framework of LDDMM, following [8], and then introduce noise following [25] to preserve the geometrical structure of LDDMM. The key ingredient for both topics is the momentum map, which we will use as the main tool for reducing the infinite-dimensional equation on the diffeomorphism group to equations on shape spaces.
2.1 The Deterministic LDDMM Model
Here, we will briefly review the theory of reduction by symmetry, as applied to the LDDMM context, following the presentation of [8]. We detail the proof of the formulas below in the next section when we include noise. Define an energy functional E by
where \(I_0,I_1\in V\) are shapes represented in a vector space V on which the diffeomorphism group \(\mathrm {Diff}({\mathbb {R}}^d)\) acts, \(u_t\) is a time-dependent vector field, and \(\lambda \) is a weight, or tolerance, which allows the matching to be inexact. The flow \(g_t\in \mathrm {Diff}({\mathbb {R}}^d)\) corresponding to \(u_t\) is found by solving the reconstruction relation
and \(I_0\) is matched against \(I_1\) through the action \(g_1.I_0\) of \(g_1\) on \(I_0\). The vector field \(u_t\) can be considered an element of the Lie algebra \({\mathfrak {X}}({\mathbb {R}}^d)\). In the case of \(I_0,I_1\) being images \(I: {\mathbb {R}}^d\rightarrow {\mathbb {R}}\), the action is by push-forward, \(g.I=I\circ g^{-1}\), and when I represents N landmarks with positions \({\mathbf {q}}_i\in {\mathbb {R}}^d\), the action is by evaluation \(g.{\mathbf {q}} = (g({\mathbf {q}}_1),\ldots , g(\mathbf q_N))\) (see [8] for more details). The group elements can act on various additional shape structures such as tensor fields.
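As a sketch of the setting just described, the energy functional and the reconstruction relation can be written in the following form. The exact weighting of the matching term by \(\lambda \) and the choice of norm on V are assumptions here; they are chosen so that the matching term disappears as \(\lambda \rightarrow \infty \), in line with the later statement that this limit leaves only the kinetic term.

```latex
% Energy functional, cf. (2.1): kinetic term plus inexact-matching penalty (weighting assumed)
E(u_t) = \int_0^1 l(u_t)\,\mathrm{d}t
       + \frac{1}{2\lambda^2}\,\big\| g_1.I_0 - I_1 \big\|_V^2

% Reconstruction relation, cf. (2.2): the flow g_t is generated by the vector field u_t
\partial_t g_t = u_t \circ g_t , \qquad g_0 = \mathrm{id}
```

The second relation is equivalent to \(\partial _t g_t\, g_t^{-1} = u_t\), which is the form in which noise is later added.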
Remark 2.1
(Nonlinear shape structures) This framework can be extended to structures that are not represented by a vector space V, such as curves or surfaces. We leave this extension for future work.
Using the calculus of variations for the functional (2.1) results in the equation of motion for \(u_t\) of the form
which is called the Euler–Poincaré equation. The operation \(\mathrm {ad}^*\) is the coadjoint action of the Lie algebra of vector fields associated with the diffeomorphism group. The operation \(\mathrm {ad}^*\) acts on the variations \({\delta l}/{\delta u}\), which are 1-form densities, in the dual of the Lie algebra of vector fields, under the \(L^2\) pairing. When l(u) is a norm, this equation is the geodesic equation for that norm, in the case that \(\lambda =\infty \); that is, with exact matching. We will focus on this case later in Sect. 3 when discussing landmark dynamics. Here, the inexact matching term constrains the form of the momentum \(m= \frac{\partial l}{\partial u}\) to depend on the geodesic path. Following the notation of [8], the momentum map is defined as
where \(g_{t,s}\) is the solution of (2.2) at time t with initial conditions at time s, while \(J_t^0 = g_{t,0} I_0\) and \(J_t^1= g_{t,1} I_1\). The value \(J_1^0\) corresponds to the initial shape, pushed forward to time \(t=1\), and \(J_1^1= I_1\) is the target shape.
The operations \(\diamond \) and \(\flat \) in the momentum map formula (2.4) are defined as follows. The Lagrangian l in (2.1) may be taken as kinetic energy, which defines a scalar product and norm \(l(u) = \langle u,L u\rangle _{L^2}= \Vert u\Vert ^2_{L^2}\) on the space of vector fields \({\mathfrak {X}}(\mathbb R^d)\). The quantity \(Lu={\delta l}/{\delta u}\) may then be regarded as the momentum conjugate to the velocity u. Similarly, for the image data space V, we define the dual space \(V^*\) with the \(L^2\) pairing \(\langle f,I\rangle = \int _\Omega f(x)I(x) \mathrm{d}x\), where \(f\in V^*\) and \(\Omega \subset {\mathbb {R}}^d\) is the image domain. This identification defines the \(\flat \) operator as \(\flat : V\rightarrow V^*\). When an element \(g_t\) of the diffeomorphism group acts on V by push-forward, \(I_t=g_t.I_0 = (g_t)_*I_0\), the corresponding infinitesimal action of the velocity u in the Lie algebra of vector fields \(u\in {\mathfrak {X}}({\mathbb {R}}^d)\) is given by \(u.I:= [g_t^*\frac{d}{dt}(g_t)_*I_0]_{t=0}\). In terms of this infinitesimal action, we can then define the operation \(\diamond :V\times V^*\rightarrow {\mathfrak {g}}^*\) as
A detailed derivation of this formula for the momentum map can be found in [8].
Remark 2.2
(Solving this equation) We will just add here the important remark that the relation (2.4) introduces nonlocality into the problem, as the momentum implicitly depends on the value of the group at later times. This is exactly what is needed in order to solve the boundary value problem coming from the matching of images \(I_1\) and \(I_0\). The optimal vector field can be found with a shooting method or a gradient descent algorithm on the energy functional (2.1), see [6]. For more information about the relation of the momentum map approach of [8] to the LDDMM approach of [6], see [9].
2.2 Stochastic Reduction Theory
The aim here is to introduce noise in the Euler–Poincaré equation (2.3) while preserving the momentum map (2.4); so that the noise descends to the shape spaces. Following [25], we introduce noise in the reconstruction relation (2.2) and proceed with the theory of reduction by symmetry. We will focus on a noise described by a set of J real-valued independent Wiener processes \(W^i_t\) together with J associated vector fields \(\sigma _i\in {\mathfrak {X}}({\mathbb {R}}^d)\) on the data domain. We will later discuss particular forms of these fields and methods for estimating unknown parameters of the fields in the context of landmark matching.
Remark 2.3
(Dimension of the noise) We proceed here with a finite number J of vector fields, and hence with finite-dimensional noise, leaving a possible extension to infinite-dimensional noise, such as that of [64], for later work.
We replace the reconstruction relation (2.2) by the following stochastic process
where \(\circ \) denotes Stratonovich integration. That is, the Lie group trajectory \(g_t\) is now a stochastic process. With this noise construction, the previous derivations of (2.3) and (2.4) in [8] still apply and we obtain the following result for the stochastic vector field, \(u_t\).
Proposition 2.4
Under stochastic perturbations of the form (2.6), the momentum map (2.4) persists, and the Euler–Poincaré equation takes the form
Proof
We first show that the momentum map formula (2.4) persists in the presence of noise. The key step in its computation is to prove the formula in Lemma 2.5 of [8] which is given by \(\partial _t (g^{-1} \delta g ) = \mathrm {Ad}_g\delta u\), where \(\mathrm {Ad}\) is the adjoint action of the diffeomorphism group on its Lie algebra. We first compute the variations of (2.6)
and then prove this formula by a direct computation
This key formula is the same as in [6] and [8] for the deterministic case. In particular, it does not explicitly depend on the Wiener processes \(W_t^l\). This ensures that the momentum map formula (2.4) remains the same as in the deterministic case. The last step of the proof is to derive the stochastic Euler–Poincaré equation (2.7). This is done by computing the stochastic evolution of the momentum, given by
The only time dependence is in the coadjoint action, and, by the standard formula
we obtain the result
where we have used the stochastic reconstruction relation (2.6) in the form
\(\square \)
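For orientation, the stochastic reconstruction relation and the resulting stochastic Euler–Poincaré equation can be sketched as below. The first line follows the form \(\mathrm{d}g g^{-1}= u \mathrm{d}t + \sum _l \sigma _l \circ \mathrm{d}W_t^l\) used later in the text; the second is obtained by replacing the deterministic transport velocity with this stochastic vector field, and it assumes the sign convention in which the deterministic equation reads \(\frac{d}{dt}\frac{\delta l}{\delta u} + \mathrm {ad}^*_{u}\frac{\delta l}{\delta u} = 0\).

```latex
% Stochastic reconstruction relation, cf. (2.6)
\mathrm{d}g_t\, g_t^{-1} = u_t\,\mathrm{d}t + \sum_{l=1}^{J} \sigma_l \circ \mathrm{d}W_t^l

% Stochastic Euler--Poincare equation, cf. (2.7) (sign convention assumed)
\mathrm{d}\frac{\delta l}{\delta u}
  + \mathrm{ad}^*_{u_t}\frac{\delta l}{\delta u}\,\mathrm{d}t
  + \sum_{l=1}^{J} \mathrm{ad}^*_{\sigma_l}\frac{\delta l}{\delta u}\circ \mathrm{d}W_t^l = 0
```

The noise term is multiplicative: through \(\mathrm {ad}^*_{\sigma _l}\) it involves both the momentum and its spatial gradient, as noted in the Introduction.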
In summary, this stochastic perturbation of the LDDMM framework preserves the form of momentum map (2.4), although it does affect the reconstruction relation (2.6) and the Euler–Poincaré equation (2.7). As shown in [8], various data structures fit into this framework including landmarks, images, shapes, and tensor fields. In practice, for inexact matching, a gradient descent algorithm can be used to minimize the energy functional (2.1). The noise will only appear in the evaluation of the matching cost via the reconstruction relation. The algorithm of [6] then directly applies, provided the stochastic reconstruction relation can be integrated with enough accuracy. We will not treat the full inexact matching problem here. Instead, we will study the simpler case of exact matching, where the energy functional consists only of the kinetic term.
The exact matching problem in computational anatomy possesses many parallels with the geometric approach to classical mechanics and ideal fluid dynamics. In particular, Poincaré’s fundamental paper in 1901, which started the field of geometric mechanics in finite dimensions, has recently been generalized to the stochastic case [14]. In addition, the fundamental analytical properties of Euler’s fluid equations have been shown to extend to the stochastic case in [13].
We expect that these advances in the analysis of SPDEs occurring in fluid dynamics and other parallel fields will inform computational anatomy, and eventually will apply to infinite-dimensional representations of shape. One reason for our optimism is that the fundamental analytical properties of incompressible Euler fluid dynamics in three dimensions have already been found in [13] to persist under the introduction of the present type of stochasticity. Namely, the properties of local-in-time existence and uniqueness, as well as the Beale–Kato–Majda criterion for blow-up for the deterministic 3D Euler fluid motion equations, all persist in detail for stochastic Euler fluid motion, under the introduction of the type of stochastic Lie transport by cylindrical Stratonovich noise that we have proposed here for stochastic shape analysis.
The persistence of deterministic analytical properties in passing to the SPDEs governing stochastic 3D incompressible continuum fluid dynamics is a type of infinite-dimensional result that has not yet been proven for the evolution of shapes. The corresponding results in the analysis of SPDEs for embeddings, immersions and curves representing data structures for shape evolution, for example, have not yet been discovered, and they remain now as outstanding open problems. However, we believe that the prospects for successfully performing the necessary analysis are hopeful because the type of noise we propose here preserves the fundamental properties of diffeomorphic flow for both continuum fluids and shapes. For example, the momentum maps for the deterministic and stochastic evolution of shapes of any data structure are identical. Thus, the only difference in the present approach from the deterministic case is that the diffeomorphic time evolution of the various shape momentum maps proceeds by the action of Lie derivative by a stochastic vector field, instead of a deterministic one. Since the stochastic part of the vector field is as smooth as we wish, we are hopeful that the analytical properties for the deterministic evolution of a large class of infinite-dimensional representations of shape (such as smooth embeddings) will also persist under the introduction of the type of stochastic transport proposed here. For the remainder of the paper, we restrict ourselves to the treatment of stochastic landmark dynamics.
3 Exact Stochastic Landmark Matching
In this section, we apply the previous ideas of stochastic deformation of LDDMM to exact matching with landmark dynamics. This is the simplest data structure in the LDDMM framework, and it will serve to give interesting insights into the effect of the noise in this context. Since exact matching means that the energy functional contains only a kinetic energy, the optimal vector field is found from a boundary value problem with the Euler–Poincaré equation (2.3). For exact matching, the singular momentum map for landmarks takes the simple familiar form for the reduction of the EPDiff equation (see [11, 26])
for N landmarks with momenta \({\mathbf {p}}_i\) and positions \(\mathbf q_i\), with \(i=1,2,\dots ,N\). A direct substitution of \(u= K*m\) into the stochastic Euler–Poincaré equation (2.7) gives the stochastic landmark equations in (3.6). Here, K is a given kernel corresponding to the Green’s function of the differential operator L used to construct the Lagrangian. Below, we take a different approach and proceed from a variational principle in which the stochastic landmark dynamics is constrained. We refer the interested reader to, e.g., [34] for a detailed exposition of this derivation in the deterministic context.
3.1 Stochastic Landmarks Dynamics
Recall that for N landmarks in \({\mathbb {R}}^d\), the diffeomorphism group elements g act on the landmarks by evaluation of their position \(g.{\mathbf {q}}= (g(q_1),\ldots , g(q_N))\), and the associated momentum map is (3.1). The original action functional (2.1) can be equivalently written as a constrained variational principle where the \({\mathbf {p}}_i\) play the role of Lagrange multipliers enforcing the stochastic reconstruction relation (2.6). This procedure is based on the Clebsch action principle, which for landmark dynamics has been studied for one-dimensional motion of landmarks on the real line in [32]
Notice that only the Lagrangian depends on the spatial (Eulerian) variable \({\mathbf {x}}\) on the image domain. We now use the singular momentum map (3.1) which provides us with the relation
This relation reduces the action functional (3.2) to the finite-dimensional space of landmarks. We arrive at the action integral
where the Hamiltonian only depends on the landmark variables, as
The action integral in (3.3) involves the phase space Lagrangian (3.4) and the stochastic potential, given by
Taking free variations of (3.3) gives the stochastic Hamilton equations in the form
Explicitly, we have
In coordinates, the stochastic equations (3.6) become
where \(\alpha , \beta \) run through the domain directions, \(\alpha ,\beta =1,\ldots ,d\).
In order to have a unique strong solution of this stochastic differential equation, we need the drift and volatility to be Lipschitz functions with a linear growth condition after converting to Itô form, and for the volatility to be uniformly bounded, see [36]. This requirement is achieved when the functions \(\sigma _l\) are twice continuously differentiable and uniformly bounded in the position variable. The latter property will hold with these functions being \(C^2\) kernel functions. The particular form of the stochastic potential in (3.5) arises from the Legendre transformation of (3.2). The solutions of (3.8) represent the singular solutions of the stochastic EPDiff equation, corresponding to a stochastic path in the diffeomorphism group. In previous works such as [44, 61, 64], noise has been introduced additively and only in the momentum equation, corresponding to a stochastic force. Also, the noise has typically been taken to be different for each landmark, and one can interpret it having been attached to each landmark. In the present case, the noise is not additive and the Wiener processes are not related to the landmarks, but to the domain of the image. Nearby landmarks will thus be affected by a similar noise, controlled by the spatial correlations of the noise. We refer to Fig. 1 in the Introduction for a numerical experiment demonstrating this effect.
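The stochastic Hamilton equations discussed above can be explored numerically with a short simulation. The following is a minimal sketch under stated assumptions, not the scheme used in the paper: it assumes the Hamiltonian \(h = \frac{1}{2}\sum _{i,j} {\mathbf {p}}_i\cdot {\mathbf {p}}_j\, K({\mathbf {q}}_i-{\mathbf {q}}_j)\) with a Gaussian kernel of width `alpha`, Gaussian noise fields \(\sigma _l({\mathbf {x}}) = \lambda _l \exp (-|{\mathbf {x}}-{\mathbf {c}}_l|^2/(2r^2))\) with hypothetical centres `centres` and amplitude vectors `amps`, and a Heun predictor-corrector step, which is a standard choice for Stratonovich equations. All function and parameter names are illustrative.

```python
import numpy as np

def drift(q, p, alpha=1.0):
    """Hamiltonian drift for h = 1/2 sum_ij (p_i.p_j) K(q_i - q_j), Gaussian K (assumed)."""
    diff = q[:, None, :] - q[None, :, :]                    # (N, N, d)
    K = np.exp(-np.sum(diff**2, axis=-1) / (2 * alpha**2))  # (N, N)
    dq = K @ p                                              # dh/dp_i = sum_j K_ij p_j
    # dh/dq_i = sum_j (p_i.p_j) grad K(q_i - q_j) = -sum_j (p_i.p_j) K_ij (q_i - q_j) / alpha^2
    dh_dq = -np.einsum("ij,ijk->ik", (p @ p.T) * K, diff) / alpha**2
    return dq, -dh_dq

def noise(q, p, dW, centres, amps, r=1.0):
    """Noise increments generated by phi_l = sum_i p_i . sigma_l(q_i)."""
    diff = q[:, None, :] - centres[None, :, :]              # (N, J, d)
    g = np.exp(-np.sum(diff**2, axis=-1) / (2 * r**2))      # (N, J) scalar profiles
    dq = (g * dW) @ amps                                    # sum_l sigma_l(q_i) dW^l
    # -d/dq_i (p_i . sigma_l(q_i)) dW^l = (p_i . lambda_l) g_l(q_i) (q_i - c_l) / r^2 dW^l
    dp = np.einsum("il,ild->id", (p @ amps.T) * g * dW, diff) / r**2
    return dq, dp

def heun_step(q, p, dt, dW, centres, amps, alpha=1.0, r=1.0):
    """One Heun (predictor-corrector) step, consistent with Stratonovich calculus."""
    fq, fp = drift(q, p, alpha)
    gq, gp = noise(q, p, dW, centres, amps, r)
    qe, pe = q + fq * dt + gq, p + fp * dt + gp             # Euler predictor
    fq2, fp2 = drift(qe, pe, alpha)
    gq2, gp2 = noise(qe, pe, dW, centres, amps, r)
    q_new = q + 0.5 * (fq + fq2) * dt + 0.5 * (gq + gq2)
    p_new = p + 0.5 * (fp + fp2) * dt + 0.5 * (gp + gp2)
    return q_new, p_new

def simulate(q0, p0, centres, amps, T=1.0, n_steps=200, seed=0, **kw):
    """Integrate from t=0 to t=T; returns final positions and momenta."""
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    q, p = np.array(q0, dtype=float), np.array(p0, dtype=float)
    for _ in range(n_steps):
        dW = rng.normal(0.0, np.sqrt(dt), size=len(centres))  # one increment per noise field
        q, p = heun_step(q, p, dt, dW, centres, amps, **kw)
    return q, p
```

Because the Wiener increments `dW` are indexed by noise field rather than by landmark, nearby landmarks receive nearly the same perturbation. In the limit of a very large noise width `r` and zero momenta, every landmark is translated by the same random vector.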
Remark 3.1
(Geometric noise) The geometric origin of the Hamiltonian stochastic equations in (3.6) deserves a bit more explanation. In the position equation (3.6), the noise arises as the infinitesimal transformation by the action of the stochastic vector field in (2.6), namely \(\mathrm{d}g g^{-1}= u \mathrm{d}t + \sum _l \sigma _l \circ \mathrm{d}W_t^l\), on the manifold of positions of the landmarks, which is generated by the J stochastic potentials, \(\Phi _l(\mathbf q_i,{\mathbf {p}}_i):= {\mathbf {p}}_i \cdot \sigma _l({\mathbf {q}}_i)\). Since this stochastic Hamiltonian is linear in the canonical momenta, the noise perturbing the evolution of the landmark positions is independent of the landmark momenta. On the other hand, the noise in the momentum equations arises as the cotangent lift of the action of the stochastic vector field \(\mathrm{d}g g^{-1}\) on the positions of the landmarks. This cotangent lift determines the action on the momentum fibres attached to the perturbed position of each of the landmarks in phase space. The cotangent lift transformation is given explicitly by the product of the momentum and the gradient of the spatial fields \(\sigma _l\) with respect to the position \({\mathbf {q}}_i\) of the i-th landmark. This difference increases the effect of the noise in regions where the \(\sigma _l\) fields have large spatial gradients, provided the landmarks are moving rapidly enough for their momenta to be nonnegligible. We will see in the examples that in certain cases this balance in the product of the momentum and the spatial gradient of the noise parameters can significantly affect the dynamics of the landmarks.
3.2 The Fokker–Planck Equation
In this section, we study the evolution of the probability density function of the stochastic landmarks by using the Fokker–Planck equation. This study is possible in the case of landmarks because the associated phase space is finite-dimensional.
We will denote the probability density by \({\mathbb {P}}(\mathbf q,{\mathbf {p}},t)\), on the phase space \({\mathbb {R}}^{2dN}\) at time t. The Fokker–Planck equation can be computed using standard procedures and is given in the following proposition.
Proposition 3.2
The Fokker–Planck equation associated with the stochastic process (3.6) for the probability distribution \({\mathbb {P}}:\mathbb R^{2dN}\times {\mathbb {R}}\rightarrow {\mathbb {R}}\) is given by
where \(\{F,G\}_\mathrm {can} = \nabla F^T {\mathbb {J}} \nabla G\) is the canonical Poisson bracket with \({\mathbb {J}}=\left( \begin{matrix} 0 & 1\\ -1 & 0 \end{matrix}\right) \) and \(\phi _l({\mathbf {q}},{\mathbf {p}})= \sum _i {\mathbf {p}}_i\cdot \sigma _l({\mathbf {q}}_i)\) are the stochastic potentials. This formula also defines the forward Kolmogorov operator, \({\mathscr {L}}^*\).
Proof
The proof follows the standard derivation of the Fokker–Planck equation, by taking into account the geometrical structure of the stochastic process (3.6). The time evolution of an arbitrary function \(f:{\mathbb {R}}^{2dN}\rightarrow {\mathbb {R}}\) can be written as
as both drift and volatility have the same Hamiltonian form in the Stratonovich formulation. We then compute the Itô correction of this stochastic process, which can be written in double Poisson bracket form; namely, \(\frac{1}{2} \sum _l \{\{ f,\phi _l\}_\mathrm {can},\phi _l\}_\mathrm {can}\mathrm{d}t\). The Itô correction is the quadratic variation of the Stratonovich term in the stochastic differential equation, which equals the nonstochastic part of one half of the time derivative of the volatility (where the square of a Brownian increment becomes \(\mathrm{d}t\)). We refer to [2, 14] for a more detailed derivation of this formula in a general setting. Taking the expectation of the Itô process then removes the noise term and defines the backward Kolmogorov operator \(\mathscr {L}\) such that \(\dot{f} = \mathscr {L} f\). By pairing this formula with the density function \(\mathbb P({\mathbf {q}},{\mathbf {p}},t)\) over the phase space \(({\mathbf {q}},\mathbf p)\) by using the usual \(L^2\) pairing, as
we obtain the Fokker–Planck equation \(\dot{ {\mathbb {P}}}= {\mathscr {L}}^* {\mathbb {P}}\), which is explicitly given by (3.9) as the double bracket term is self-adjoint and the advection term anti-self-adjoint. Notice that here we have used a special property of the Poisson bracket; namely, that the Poisson bracket is also a symplectic 2-form, which is exact and whose integral over the whole phase space vanishes, provided we choose suitable boundary conditions. We again refer to [2, 14] for more details about this derivation. \(\square \)
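The steps of this proof can be summarized in the following worked chain. It is a sketch that assumes the bracket convention \(\{F,G\}_\mathrm {can} = \nabla F^T {\mathbb {J}} \nabla G\) stated in Proposition 3.2, under which Hamilton's equations read \(\mathrm{d}f = \{f,h\}_\mathrm {can}\mathrm{d}t\), and boundary conditions for which \(\int \{A,B\}_\mathrm {can}\,\mathrm{d}{\mathbf {q}}\,\mathrm{d}{\mathbf {p}} = 0\).

```latex
% Stratonovich form, then Ito form with the double-bracket correction
\mathrm{d}f = \{f,h\}_\mathrm{can}\,\mathrm{d}t + \sum_l \{f,\phi_l\}_\mathrm{can}\circ\mathrm{d}W_t^l
            = \Big( \{f,h\}_\mathrm{can}
              + \tfrac12 \sum_l \{\{f,\phi_l\}_\mathrm{can},\phi_l\}_\mathrm{can} \Big)\mathrm{d}t
              + \sum_l \{f,\phi_l\}_\mathrm{can}\,\mathrm{d}W_t^l

% Generator obtained after taking expectations
\mathscr{L} f = \{f,h\}_\mathrm{can}
              + \tfrac12 \sum_l \{\{f,\phi_l\}_\mathrm{can},\phi_l\}_\mathrm{can}

% Integration by parts: \int P \{f,h\} = -\int f \{P,h\}, \quad
% \int P \{\{f,\phi_l\},\phi_l\} = \int f \{\{P,\phi_l\},\phi_l\}
\partial_t \mathbb{P} = \mathscr{L}^* \mathbb{P}
  = \{h,\mathbb{P}\}_\mathrm{can}
  + \tfrac12 \sum_l \{\phi_l,\{\phi_l,\mathbb{P}\}_\mathrm{can}\}_\mathrm{can}
```

The sign flip in the first term and its absence in the second are exactly the anti-self-adjointness of the advection term and the self-adjointness of the double bracket term mentioned in the proof.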
Of course, the direct study of this equation is not possible, even numerically, because of its high dimensionality. The main use here of the Fokker–Planck equation will be to understand the time evolution of uncertainties around each landmark. Indeed, for each landmark \({\mathbf {q}}_i\), the corresponding marginal distribution (integrating \({\mathbb {P}}\) over all the other variables) will represent the time evolution of the error on the mean trajectory of this landmark. We will show in the next section how to approximate the Fokker–Planck equation with a finite set of ordinary differential equations which describe the dynamics of the first moments of the distribution \({\mathbb {P}}\). This will then be used to estimate parameters of the noise fields \(\sigma _l\) for given sets of initial and final landmarks.
Remark 3.3
(On ergodicity) The question of ergodicity of the process (3.6) is not relevant here, as we will only consider this process for finite times, usually between \(t=0\) and \(t=1\). The existence of stationary measures of the Fokker–Planck equation via Hörmander’s theorem is thus not needed. Nevertheless, we will rely on a notion of reachability in the landmark position in the next section, where we will show how to sample diffusion bridges for landmarks with fixed initial and final positions. This ensures that there exists a noise realization which can bring any set of landmarks to any other set of landmarks. This property is weaker than the Hörmander condition and was introduced in [58].
3.3 Diffusion Bridges
The transition probability and solution to the Fokker–Planck equation \({\mathbb {P}}({\mathbf {q}},{\mathbf {p}},t)\) can also be estimated by Monte Carlo sampling of diffusion bridges. This approach will, in particular, be natural for maximum likelihood estimation of parameters of landmark processes using the expectation-maximization (EM) algorithm that will involve expectation over unobserved landmark trajectories, or for direct optimization of the data likelihood. The EM estimation approach will be used in Sect. 4.3. Here, we develop a theory of conditioned bridge processes for landmark dynamics which we will employ in the estimation. The approach is based on the method of [15] with two main modifications. The scheme and its modifications will be detailed after a short description of the general situation. Alternative methods for simulating conditioned diffusion bridges can be found in, e.g. [7, 50, 52].
In [15], a Girsanov formula [22], generalized to account for unbounded drifts, is used to show that when the diffusion field \(\Sigma ({\mathbf {x}},t)\) of an \({\mathbb {R}}^d\)-valued diffusion process
is uniformly invertible, the corresponding process conditioned on hitting a point \({\mathbf {v}}\in {\mathbb {R}}^d\) at time \(T>0\) is absolutely continuous with respect to an explicitly constructed unconditioned process \(\hat{{\mathbf {x}}}\) that will hit \({\mathbf {v}}\) at time T a.s. The modified process \(\hat{{\mathbf {x}}}\) is constructed by adding an extra drift term that forces the process towards the target \({\mathbf {v}}\). In [15], this process is constructed as a modification of (3.10)
Letting \(P_{{\mathbf {x}}|{\mathbf {v}}}\) denote the law of \({\mathbf {x}}\) conditioned on hitting \({\mathbf {v}}\) with corresponding expectation \({\mathbb {E}}_{{\mathbf {x}}|{\mathbf {v}}}\), the Cameron–Martin–Girsanov theorem implies that \(P_{{\mathbf {x}}|{\mathbf {v}}}\) is absolutely continuous with respect to \(P_{\hat{{\mathbf {x}}}}\), see for example [49] and the discussion in [50]. An explicit expression for the Radon–Nikodym derivative \(\mathrm{d}P_{{\mathbf {x}}|\mathbf v}/\mathrm{d}P_{\hat{{\mathbf {x}}}}\) can be computed, and this derivative is central for using simulations of the process \(\hat{{\mathbf {x}}}\) to compute expectations over the conditioned process \({\mathbf {x}}|\mathbf v\). In particular, as shown in [15], the conditioned process \({\mathbf {x}}|{\mathbf {v}}\) and the modified process \(\hat{{\mathbf {x}}}\) are related by
where \(\varphi (\hat{{\mathbf {x}}})\) is a correction factor applied to each stochastic bridge \(\hat{{\mathbf {x}}}\). Notice here that f is a real-valued function of the stochastic path from \(t=0\) to \(t=T\).
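The guided construction can be illustrated numerically. The following is a minimal sketch, assuming the guiding drift \(-(\hat{x}-v)/(T-t)\) that is standard for an invertible diffusion field; the function name `simulate_guided_bridge`, the scalar toy drift and the constant diffusion coefficient are hypothetical choices made for illustration and are not the landmark model.

```python
import numpy as np

def simulate_guided_bridge(x0, v, T, n_steps, drift, sigma, rng):
    """Euler-Maruyama simulation of a scalar guided process pulled towards v.

    Assumption: the guiding drift is -(x - v) / (T - t); `drift` and `sigma`
    are user-supplied callables (x, t) -> float, purely illustrative.
    """
    dt = T / n_steps
    x = np.empty(n_steps + 1)
    x[0] = x0
    for k in range(n_steps):
        t = k * dt
        guide = -(x[k] - v) / (T - t)       # grows as t -> T, forcing x -> v
        dW = rng.normal(0.0, np.sqrt(dt))   # Brownian increment
        x[k + 1] = x[k] + (drift(x[k], t) + guide) * dt + sigma(x[k], t) * dW
    return x

rng = np.random.default_rng(0)
path = simulate_guided_bridge(
    x0=0.0, v=1.0, T=1.0, n_steps=1000,
    drift=lambda x, t: -0.5 * x,   # toy drift
    sigma=lambda x, t: 0.5,        # constant, invertible diffusion coefficient
    rng=rng,
)
```

On the last Euler step the guiding term cancels the remaining distance to the target, so the discrete path ends within one noise increment of \(v\). The correction factor \(\varphi \) is not computed in this sketch.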
Returning to landmark evolutions in the phase space \(\mathbb R^{2dN}\), the process (3.6) has two vector variables \(({\mathbf {q}},{\mathbf {p}})\) that typically will be conditioned on hitting a fixed set of landmark positions \({\mathbf {v}}\) at time T. The conditioning thus happens only in the \({\mathbf {q}}\) variables by requiring \({\mathbf {q}}_T={\mathbf {v}}\). To construct bridges with an approach similar to the scheme of [15], we need to find an appropriate extra drift term and handle the fact that the diffusion field may not be invertible in general. Recall first that the landmark process (3.6) has diffusion field
where \(\sigma _j({\mathbf {q}})\) denotes the vector \((\sigma _j(q_1),\ldots ,\sigma _j(q_N))^T\). Notice that this matrix is not square and has dimension \(2dN\times J\) so that \(\Sigma (\mathbf q,{\mathbf {p}}) \circ \mathrm{d}W_t\) with \(\mathrm{d}W_t\) a J-vector corresponds to the stochastic term of (3.6). Though \(\Sigma ({\mathbf {q}},\mathbf p)\) couples the \({\mathbf {q}}\) and \({\mathbf {p}}\) equations, when the number of noise fields J is sufficiently large, the \({\mathbf {q}}\) part \(\Sigma _{{\mathbf {q}}}({\mathbf {q}})\) will often be surjective as a linear map \({\mathbb {R}}^J\rightarrow {\mathbb {R}}^{dN}\). In this situation, by letting \(\Sigma _{{\mathbf {q}}}({\mathbf {q}})^\dagger \) denote the Moore–Penrose pseudo-inverse of \(\Sigma _{{\mathbf {q}}}({\mathbf {q}})\), we can construct a guiding drift term as
This term, when added to the process (3.6) and when measures are taken to control the unbounded drift of (3.6), will ensure that the modified process hits \(\mathbf q_T\) a.s. at time T. The drift term (3.14) is a direct generalization of the term added in (3.11). If \(\Sigma \) had been invertible then \(\Sigma \Sigma ^\dagger =\mathrm{Id}\) resulting in the guiding term of [15] used in equation (3.11). In the current noninvertible case, \(\Sigma \Sigma _{\mathbf {q}}^\dagger ({\mathbf {q}}-{\mathbf {v}})\) uses the difference \({\mathbf {q}}-{\mathbf {v}}\) which only involves the landmark position but affects both the position and the momentum equations. We stress here the fact that the introduction of noise in the \({\mathbf {q}}\) equation in (3.6) is essential in our present approach. When conditioning on the \({\mathbf {q}}\) variable, a guided process could not directly be constructed in this way, if the noise had been introduced only in the \({\mathbf {p}}\) equation, as in [44, 61, 64]. The fact that this term is weighted by \(\Sigma \Sigma ^\dagger \) is also important as it allows the guiding term to be more efficient in the noisy regions of the image, where there is more freedom to deviate from the deterministic path. The guiding term can be interpreted as originating from a time-rescaled gradient flow, and with the guiding term added, the diffusion process can be seen as a stochastically perturbed gradient flow, see [3].
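The role of the pseudo-inverse can be checked on a small example. The sketch below assembles only the position block \(\Sigma _{{\mathbf {q}}}\) for Gaussian noise fields and evaluates the \({\mathbf {q}}\)-rows of the guiding term; the grid of kernel centres, the amplitudes, the length scale and all function names are hypothetical, and the momentum rows of \(\Sigma \) are omitted.

```python
import numpy as np

def gaussian_kernel(x, r):
    return np.exp(-np.sum(x**2, axis=-1) / (2.0 * r**2))

def sigma_q(q, centers, amplitudes, r):
    """Position block of the diffusion field as a (d*N, J) matrix.

    q: (N, d) landmarks; centers: (J, d) kernel positions;
    amplitudes: (J, d) noise amplitudes. Column l stacks sigma_l(q_i) over i.
    """
    k = gaussian_kernel(q[:, None, :] - centers[None, :, :], r)   # (N, J)
    fields = k[:, :, None] * amplitudes[None, :, :]               # (N, J, d)
    return fields.transpose(0, 2, 1).reshape(q.size, centers.shape[0])

def guiding_drift_q(q, v, t, T, S_q):
    """q-rows of the guiding term: -S_q S_q^+ (q - v) / (T - t)."""
    S_pinv = np.linalg.pinv(S_q)          # Moore-Penrose pseudo-inverse
    return -(S_q @ S_pinv @ (q - v).ravel()) / (T - t)

# Hypothetical setup: 3x3 grid of centres, two orthogonal amplitudes per centre.
xs = np.linspace(0.0, 1.0, 3)
grid = np.array([(x, y) for x in xs for y in xs])     # (9, 2)
centers = np.repeat(grid, 2, axis=0)                  # (18, 2)
amplitudes = np.tile(0.1 * np.eye(2), (9, 1))         # (18, 2)

q = np.array([[0.3, 0.4], [0.7, 0.6]])                # current landmarks
v = np.array([[0.35, 0.5], [0.6, 0.65]])              # target landmarks
S = sigma_q(q, centers, amplitudes, r=0.5)
drift = guiding_drift_q(q, v, t=0.5, T=1.0, S_q=S)
```

When \(\Sigma _{{\mathbf {q}}}\) is surjective, as in this setup with \(J=18\) and \(dN=4\), the product \(\Sigma _{{\mathbf {q}}}\Sigma _{{\mathbf {q}}}^\dagger \) is the identity and the \({\mathbf {q}}\)-rows reduce to \(-({\mathbf {q}}-{\mathbf {v}})/(T-t)\).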
The guiding term (3.14) is, in practice, not always appropriate for landmarks. Because the correction is dependent only on the difference to the target in the position equation, a phenomenon of over-shooting is often observed. In such cases, the landmarks travel too fast initially due to a large momentum, strengthened by the guiding term that forces the landmarks towards \({\mathbf {v}}\). The high initial speed is only corrected when the time approaches T and the guiding term brings the landmarks back to their final positions. This effect is illustrated in Fig. 4 in Sect. 5.2 and results in low values of the correction factor \(\varphi (\mathbf q,{\mathbf {p}})\) used to compute the expectation in (3.12). This effect tends to produce inefficient samples when approximating (3.12) by Monte Carlo sampling. As an alternative, upon letting \(b({\mathbf {q}},{\mathbf {p}})\) denote the drift term of (3.6), we employ a guided diffusion process of the form
for some appropriately chosen function \(\phi _{t,T}:\mathbb R^{2dN}\rightarrow {\mathbb {R}}^{dN}\) that gives an estimate of the value of \(\hat{{\mathbf {q}}}_T\) using the value of the modified stochastic process \((\hat{{\mathbf {q}}}_t,\hat{{\mathbf {p}}}_t)\) at time t. The hat denotes the solution of the process (3.15), which is different from the original dynamics of the process (3.6) written without the hats. The choice \(\phi _{t,T}(\hat{{\mathbf {q}}},\hat{{\mathbf {p}}}):=\hat{{\mathbf {q}}}\) recovers the guiding term (3.14). It would be natural to define \(\phi _{t,T}(\hat{ {\mathbf {q}}},\hat{\mathbf p}):={\mathbb {E}}_{( {\mathbf {q}},{\mathbf {p}})}({\mathbf {q}}_T| ({\mathbf {q}}_t, {\mathbf {p}}_t)=(\hat{ {\mathbf {q}}},\hat{ {\mathbf {p}}}))\). The resulting guiding term will only be driven by the expected amount needed at the endpoint, not from the value at time t. A similar but easier-to-handle choice is to let \(\phi _{t,T}(\hat{{\mathbf {q}}},\hat{\mathbf p})\) be the solution at time T of the original deterministic landmark dynamics (2.3), obtained from the initial conditions \((\hat{{\mathbf {q}}}_t,\hat{{\mathbf {p}}}_t)=(\hat{{\mathbf {q}}},\hat{\mathbf p})\). We will use this latter choice with a modification to ensure its time derivative is bounded. The approach is visualized in Fig. 4. To ensure convergence of \(\hat{{\mathbf {q}}}_t\rightarrow {\mathbf {v}}\) for \(t\rightarrow T\), a bounded approximation \({\tilde{b}}\) will be chosen to replace the original unbounded drift b in (3.15). As it turns out, this choice has little influence in practice.
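The deterministic endpoint prediction can be sketched as follows. The code assumes that the deterministic dynamics are the standard landmark Hamiltonian equations with \(h({\mathbf {q}},{\mathbf {p}})=\frac{1}{2}\sum _{i,j} p_i\cdot p_j\, K(q_i-q_j)\) and a Gaussian kernel \(K\) of width `alpha`; the function names, the kernel width and the RK4 integrator are illustrative choices, and the modification that bounds the time derivative is not included.

```python
import numpy as np

def hamiltonian_rhs(q, p, alpha):
    """Right-hand side of deterministic landmark dynamics (Gaussian kernel).

    Assumption: h(q, p) = 1/2 sum_ij p_i . p_j K(q_i - q_j),
    K(x) = exp(-|x|^2 / (2 alpha^2)). q and p have shape (N, d).
    """
    diff = q[:, None, :] - q[None, :, :]                        # q_i - q_j
    K = np.exp(-np.sum(diff**2, axis=-1) / (2.0 * alpha**2))    # (N, N)
    dq = K @ p                                                  # dh/dp_i
    pp = p @ p.T                                                # p_i . p_j
    dp = np.sum((pp * K)[:, :, None] * diff, axis=1) / alpha**2 # -dh/dq_i
    return dq, dp

def predict_endpoint(q, p, t, T, alpha=0.5, n_steps=50):
    """phi_{t,T}(q, p): flow (q, p) from time t to T with RK4, return q(T)."""
    dt = (T - t) / n_steps
    for _ in range(n_steps):
        k1q, k1p = hamiltonian_rhs(q, p, alpha)
        k2q, k2p = hamiltonian_rhs(q + 0.5 * dt * k1q, p + 0.5 * dt * k1p, alpha)
        k3q, k3p = hamiltonian_rhs(q + 0.5 * dt * k2q, p + 0.5 * dt * k2p, alpha)
        k4q, k4p = hamiltonian_rhs(q + dt * k3q, p + dt * k3p, alpha)
        q = q + dt * (k1q + 2 * k2q + 2 * k3q + k4q) / 6.0
        p = p + dt * (k1p + 2 * k2p + 2 * k3p + k4p) / 6.0
    return q

# A single landmark moves on a straight line with constant momentum.
q_single = np.array([[0.0, 0.0]])
p_single = np.array([[1.0, 2.0]])
q_pred = predict_endpoint(q_single, p_single, t=0.25, T=1.0)
```

At \(t=T\) the prediction returns \(\hat{{\mathbf {q}}}\) itself, so the guiding term still vanishes exactly when the target is reached.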
The matrix \(\Sigma (\hat{{\mathbf {q}}},\hat{{\mathbf {p}}})\Sigma _{\mathbf q}(\hat{{\mathbf {q}}})^\dagger \) in (3.15) only accounts for the \({\mathbf {q}}\) dynamics in the pseudo-inverse \(\Sigma _{\mathbf q}(\hat{{\mathbf {q}}})^\dagger \). When the momentum is high and the noise fields \(\sigma _j\) have high gradients, this fact can again lead to improbable sample paths. In such cases, the scheme can be further generalized by using a guiding term of the form
The matrix \( D_h\big ( \phi _{t,T}(\Sigma (\hat{{\mathbf {q}}},\hat{\mathbf p}){\mathbf {h}}) \big )|_{{\mathbf {h}}={\mathbf {0}}} \) is a linear approximation of the expected endpoint dynamics as a function of the noise vector \({\mathbf {h}}\in {\mathbb {R}}^J\). Again, with \(\phi _{t,T}(\hat{{\mathbf {q}}},\hat{{\mathbf {p}}}):=\hat{{\mathbf {q}}}\), the original guiding term (3.14) is recovered, and the term is close to the guiding term of (3.15) when the momentum or gradients of \(\sigma _j\) are low. We use this term for the experiments in Sect. 5.2 involving high momentum dynamics, e.g. Fig. 6.
The following result is an extension of [15, Theorem 5] and [42, Theorem 3] to the modified guided SDE (3.15). It is the basis for the EM approach for estimating the parameters of the landmark processes developed in Sect. 4.3. Please note that the Girsanov theorem [15, Thm. 1], which relates the modified and original processes, does not assume that \(\Sigma \) is invertible. The main analytic consequence of the noninvertibility is that the process is semi-elliptic and the transition density, therefore, cannot be bounded by Aronson’s estimate [4]. Instead, we here assume continuity and boundedness of the density of \({\mathbf {q}}\) in small intervals of (0, T] in the sense of the assumption below. We write \({\mathbb {P}}({\mathbf {q}}_0, {\mathbf {p}}_0; {\mathbf {q}}, {\mathbf {p}}, t)\) for the transition density at time t of a solution \(({\mathbf {q}},{\mathbf {p}})\) to (3.6) started at \(({\mathbf {q}}_0, {\mathbf {p}}_0)\). Similarly, when conditioning only on \({\mathbf {q}}\), we write \({\mathbb {P}}({\mathbf {q}}_0, {\mathbf {p}}_0;\mathbf q,t) = \int _{{\mathbb {R}}^{dN}} {\mathbb {P}}({\mathbf {q}}_0, {\mathbf {p}}_0; {\mathbf {q}},{\mathbf {p}},t)\mathrm{d}{\mathbf {p}}\).
Assumption 1
For any \(({\mathbf {q}}_0,{\mathbf {p}}_0)\) and \(({\mathbf {q}},{\mathbf {p}})\), the process \(({\mathbf {q}}_t,{\mathbf {p}}_t)\) has a density \({\mathbb {P}}({\mathbf {q}}_0, {\mathbf {p}}_0; {\mathbf {q}},\mathbf p,t)\) and the map \(({\mathbf {q}},t)\mapsto \int _{\mathbb R^{dN}}g_0({\mathbf {q}}_0,{\mathbf {p}}_0){\mathbb {P}}({\mathbf {q}}_0,\mathbf p_0; {\mathbf {q}},t)\mathrm{d}({\mathbf {q}}_0,{\mathbf {p}}_0)\) is continuous in t and \({\mathbf {q}}\) and bounded on sets \(\{({\mathbf {q}},t)|s-\epsilon \le t\le s\}\) for \(s\in (0,T]\), sufficiently small \(\epsilon >0\), and any integrable function \(g_0\).
The interpretation of Assumption 1 is that, given any distribution of initial conditions \(({\mathbf {q}}_0,\mathbf p_0)\) with density \(g_0\), the resulting \({\mathbf {q}}\)-transition density of the process is continuous and bounded in \({\mathbf {q}}\) and t. As shown in Lemma A.2, Assumption 1 can be slightly weakened if Theorem 3.4 is only used to approximate the transition density at time T as opposed to expectations \(\mathbb E[f({\mathbf {q}},{\mathbf {p}})|{\mathbf {q}}_T={\mathbf {v}}]\) for arbitrary measurable functions f.
We let \(W({\mathbb {R}}^{2dN})\) denote the Wiener space of continuous paths \([0,T]\rightarrow {\mathbb {R}}^{2dN}\).
Theorem 3.4
Assume \(\Sigma _{{\mathbf {q}}}({\mathbf {q}}):{\mathbb {R}}^J\rightarrow \mathbb R^{dN}\) is surjective for all \({\mathbf {q}}\) with \(\Sigma _{\mathbf q}({\mathbf {q}})^\dagger \) bounded, and that \(\Sigma \) is \(C^{1,2}\), bounded, and with bounded derivatives. Let \({\tilde{b}}_{\mathbf {q}}\) be a bounded approximation of the \({\mathbf {q}}\)-part of the drift b, and set \({\tilde{b}}=b+\Sigma ({\mathbf {q}},{\mathbf {p}})\Sigma _\mathbf q({\mathbf {q}})^\dagger ({\tilde{b}}_{\mathbf {q}}-b_{\mathbf {q}})\). Let \({\mathbf {v}}\in {\mathbb {R}}^{dN}\) be a point with \({\mathbb {P}}({\mathbf {q}}_0,\mathbf p_0; {\mathbf {v}},t)\) positive, and let \(P_{({\mathbf {q}},\mathbf p)|{\mathbf {v}}}\) be the law of \(({\mathbf {q}},{\mathbf {p}})\,|\,\mathbf q_T={\mathbf {v}}\). Let \((\hat{{\mathbf {q}}},\hat{{\mathbf {p}}})\) be a solution to (3.15) with \((\hat{{\mathbf {q}}}_0,\hat{\mathbf p}_0)=({\mathbf {q}}_0,{\mathbf {p}}_0)\) and with \(\phi _{t,T}:\mathbb R^{2dN}\rightarrow {\mathbb {R}}^{dN}\) a map such that \(\frac{\phi _{t,T}({\mathbf {q}},{\mathbf {p}})-{\mathbf {q}}}{T-t}\) is bounded on [0, T). Then, for positive measurable \(f:W(\mathbb R^{2dN})\rightarrow {\mathbb {R}}\),
with
where \(A({{\mathbf {q}}})=\big (\Sigma _{{\mathbf {q}}}({\mathbf q})\Sigma _{{\mathbf {q}}}({{\mathbf {q}}})^T\big )^{-1}\). In addition,
In the theorem, \([\cdot ,\cdot ]\) denotes the quadratic variation of semimartingales. As mentioned above, a bounded approximation \({\tilde{b}}\) must be used to replace the original drift term b in (3.15). The last integrals in the expression for \(\log \varphi ({\mathbf {q}},{\mathbf {p}},t)\) are results of this approximation and the use of the map \(\phi _{t,T}\).
The result is proved in “Appendix A”. If \(\Sigma \) had been invertible and if the guidance scheme (3.11) were used, the result of [15] would imply that the right-hand side limit of (3.17) would equal
Extending the convergence argument to the present noninvertible case is nontrivial, and we postpone investigating this to future work. For numerical computations, \(\varphi (\hat{\mathbf q},\hat{{\mathbf {p}}},t)\) can be approximated by finite differences. As described later in the paper, we do this using a framework that allows symbolic evaluation of gradients and thus subsequent optimization for parameters of the processes.
4 Estimating the Spatial Correlation of the Noise
We now assume a set of n observed landmark configurations \(\mathbf q^1,\ldots ,{\mathbf {q}}^n\) at time T, i.e. the observations are considered realizations of the stochastic process at some positive time T. From this data, we aim at inferring parameters of the model. These can be both parameters of the noise fields \(\sigma _l\) and parameters for the initial configuration \(({\mathbf {q}}(0),\mathbf p(0))\). The initial configuration can be deterministic with fixed known or unknown parameters, or it can be randomly chosen from a distribution with known or unknown parameters. We develop two different strategies for performing the inference. The first inference method in Sect. 4.2 is a shooting method based on solving the evolution of the first moments of the probability distribution of the landmark positions while the second method in Sect. 4.3 is based on the expectation-maximization (EM) algorithm. The discussion is here in the context of landmarks, although these ideas may also apply in the more general context of Sect. 2.
4.1 The Noise Fields
We start by discussing the form of the J unknown noise fields \(\sigma _l\). To estimate them from a finite amount of observed data, we are forced to require the fields to be specified by a finite number of parameters. A possible choice for a family of noise fields is to select J linearly independent elements \(\sigma _1,\ldots ,\sigma _J\) from a dense subset of \(C^1(\mathbb R^d,{\mathbb {R}}^d)\). We here use a kernel k with length scale \(r_l\) and a noise amplitude \(\lambda _l\in {\mathbb {R}}^d\), that is
where \(\delta _l\) denotes the kernel positions. Possible choices for the kernel include Gaussians \(k_{r_l}(x)=e^{-x^2/(2r_l^2)}\), or cubic B-splines \(k_{r_l}(x)=S_3(x/r_l)\). The Gaussian kernel has the advantage of simplifying calculations of the moment equations, whereas the B-spline representation is compactly supported and gives a partition of unity when used in a regular grid. Other interesting choices may include a cosine or a polynomial basis of the image domain.
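The two kernel choices can be written down directly. The sketch below assumes that a noise field takes the form \(\sigma _l(x)=\lambda _l\, k_{r_l}(\Vert x-\delta _l\Vert )\) and uses the centred cardinal cubic B-spline for \(S_3\); the function names and the one-dimensional partition-of-unity check are illustrative.

```python
import numpy as np

def gaussian_kernel(x, r):
    """k_r(x) = exp(-x^2 / (2 r^2))."""
    return np.exp(-np.asarray(x, dtype=float) ** 2 / (2.0 * r**2))

def cubic_bspline(x):
    """Centred cardinal cubic B-spline S_3, supported on [-2, 2]."""
    ax = np.abs(np.asarray(x, dtype=float))
    inner = 2.0 / 3.0 - ax**2 + 0.5 * ax**3      # |x| < 1
    outer = (2.0 - ax) ** 3 / 6.0                # 1 <= |x| < 2
    return np.where(ax < 1.0, inner, np.where(ax < 2.0, outer, 0.0))

def noise_field(x, center, amplitude, r, kernel="gaussian"):
    """Assumed form sigma_l(x) = lambda_l * k_r(|x - delta_l|); x has shape (M, d)."""
    dist = np.linalg.norm(x - center, axis=-1)
    k = gaussian_kernel(dist, r) if kernel == "gaussian" else cubic_bspline(dist / r)
    return k[:, None] * amplitude[None, :]

# One-dimensional check: integer shifts of S_3 sum to one on a regular grid.
xs = np.linspace(0.0, 1.0, 11)
partition = sum(cubic_bspline(xs - k) for k in range(-3, 5))
```

The check is one-dimensional; on a regular grid in \({\mathbb {R}}^d\) a partition of unity would typically be built from tensor products of such splines rather than from the radial form used in `noise_field`.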
In principle, the methods below allow all parameters of the noise fields to be estimated given a sufficient amount of data. However, for simplicity, we will fix the length scale and the position of the kernels. The unknown parameters for the noise can then be specified in a single vector variable \(\theta =(\lambda _1,\ldots ,\lambda _J)\). The aim of the next sections will be to estimate this vector, possibly in addition to the initial configuration \((\mathbf q(0),{\mathbf {p}}(0))\), from data using the method of moments in Sect. 4.2 and EM in Sect. 4.3, respectively.
Remark 4.1
For the bridge simulation scheme, we required \(\Sigma _{\mathbf q}({\mathbf {q}})\) to be surjective as a linear map \(\mathbb R^J\rightarrow {\mathbb {R}}^{dN}\). This assumption can be satisfied when the number of landmarks is low relative to the number of noise fields having nonzero support in the area where the landmarks reside. On the other hand, if the number of landmarks is increased while the number of noise fields is fixed, the assumption eventually cannot be satisfied. Intuitively, in such cases, the extra drift added to the bridge SDE must guide through a nonlinear submanifold of the phase space to ensure the landmarks will hit the target point \({\mathbf {v}}\) exactly. This limitation can be handled in three ways: (1) The method of moments as described below avoids matching individual point configurations, and it can, therefore, be used in situations where the surjectivity condition is not satisfied. (2) As discussed in Remark 2.3, the noise can be made infinite dimensional. This can be done while keeping the correlation structure similar to the case with finite J. See also [3] for a discussion of noise in the form of a Gaussian process. (3) The bridge matching can be made inexact, mimicking the inexact matching pursued in deterministic LDDMM. This could potentially relax the requirements on the extra drift term to only ensure convergence towards a given distance of the target. Inexact observations of stochastic processes are for example treated in [63].
4.2 Method of Moments
We describe here our first method for estimating the parameters \(\theta \) by solving a shooting problem on the space of first and second-order moments. Given an estimate of the endpoint distributions \({\mathbb {P}}({\mathbf {q}},{\mathbf {p}}, T)\), we will solve the inverse problem which consists in using the Fokker–Planck equation (3.9) to find the values of \(\theta \) such that we can reproduce the observed final distribution. Solving the Fokker–Planck equation directly is infeasible due to its high dimensionality. Instead, we will derive a set of finite-dimensional equations approximating the solution of the Fokker–Planck equation (3.9) for the probability distribution \({\mathbb {P}}\) in terms of its first moments. This approach has been developed in the field of plasma physics for the Liouville equation, which is similar to the Fokker–Planck equation (3.9).
Remark 4.2
(Geometric moment equation) As the Fokker–Planck equation (3.9) is written in terms of the canonical bracket, we could expect to be able to apply a geometrical version of the method of moments such as the one developed by [28]. Although this method seems to fit the present geometric derivation of the stochastic equations, we will not use it, as it is not practically useful in our case. Indeed, it requires the expansion of the Hamiltonian functions in terms of the moments, but we cannot obtain here a valid expansion with a finite number of terms. This is due to the fact that the LDDMM kernel and the noise kernels cannot generally be globally approximated by finite polynomials with bounded approximation error for large distances. This would, in turn, produce spurious strong interactions between distant landmarks.
The method for approximating the Fokker–Planck equation that we will use here is the following. We first define the moments
where we have written only two possible moments, although any combinations of p and q at any order are possible. In this work, we will only consider moments up to the second order, that is the moments \({\langle q_i^\alpha \rangle ,\langle p_i^\alpha \rangle ,\langle q_i^\alpha q_j^\beta \rangle ,\langle q_i^\alpha p_j^\beta \rangle }\) and \({\langle p_i^\alpha p_j^\beta \rangle }\). Notice that the first moments are (1, 1)-tensors, and the second moments are (2, 2)-tensors, although we will only use index notation here.
We illustrate this method with the first moment \({\langle q_i^\alpha \rangle }\), which represents the mean position of the landmarks. We compute its time derivative and use the property of the Kolmogorov operator \({\mathscr {L}}\) defined in (3.9) to obtain
We thus first need to apply the Kolmogorov operator \({\mathscr {L}} \) to \(q_i^\alpha \) to obtain
which corresponds to the q part of the drift of the stochastic process with Itô correction. Similarly, for the momentum evolution, we obtain
Most of the terms on the right-hand side of (4.5) and (4.6) are nonlinear; so their expected value cannot be written in terms of only the first moments. This is the usual closure problem of moment equations, such as the BBGKY problem arising in many-body problems in quantum mechanics. The solution to this problem is to truncate the hierarchy of moments for a given order and consider the system of ODEs as an approximation of the complete Fokker–Planck solution. Here, we will apply the so-called cluster expansion method described in [38]. We refer to “Appendix B.1” for more details about this method.
Apart from the first approximation \({\langle q_i^\alpha q_j^\beta \rangle \rightarrow \langle q_i^\alpha \rangle \langle q_j^\beta \rangle }\), the next order of approximation is to keep track of the correlations
This quantity is also called a centred second moment as for \(i=j\) it corresponds to the covariance of the probability distribution for the landmark i. In general, it corresponds to the correlation between the positions of two landmarks. The dynamical equation for this correlation is found from the equation of the second moment, which gives
where \((i\leftrightarrow j) \) stands for the same term but with i and j exchanged. This equation is interesting to study in more detail, as it already gives us information about the nature of the dynamics for the spatial covariance of landmarks. Indeed, we have three types of terms with the following effects.
(1) The \(\sigma _l\)-dependent terms. This first term is quadratic in the \(\sigma \)’s, not proportional to any linear or quadratic polynomial in q or p. This term is a direct contribution from the noise in the q equation and will have the effect of almost linearly increasing the centred covariance, wherever a \(\sigma _l>0\).
(2) The h-dependent terms. From the form of this term, we expect it to be proportional to a correlation. It will thus have an exponential effect on the dynamics, triggered by the linear contribution of the first term. Notice that this term only depends on the Hamiltonian, and, thus, on the interaction between landmarks. If two landmarks interact, we expect their covariance to be averaged. This term will capture their averaged covariance.
(3) The \(\nabla _q\sigma _l\)-dependent terms. These terms are related to the noise in the p equation and will account for the effect on the landmark position of the interaction of the momentum of the landmark with the gradients of the noise.
Notice that the last two types of terms describe second order effects with respect to the spatial covariance of the landmarks, as they depend linearly on the correlations. In the expansion of these nonlinear terms, the other correlations involving p will appear. This means that all of the possible second-order correlations must be computed. This computation is done in “Appendix B”, where we also approximate the expected value of the kernels as \({\langle K({\mathbf {q}}) \rangle \approx K(\langle {\mathbf {q}} \rangle )}\). As we will see in the numerical examples in Sect. 5, these approximations can give a reliable estimate of the landmark covariance, but this should be rigorously justified to obtain a precise estimate of the errors. Such a study is beyond the scope of this work and is left open.
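The quality of the approximation \({\langle K({\mathbf {q}}) \rangle \approx K(\langle {\mathbf {q}} \rangle )}\) can be gauged in a simple special case. The sketch below assumes a scalar Gaussian variable \(q\sim N(\mu ,s^2)\) and a Gaussian kernel of width \(r\), for which the expectation has a closed form; this setting and the function names are illustrative and do not reproduce the landmark computation of the appendix.

```python
import numpy as np

def expected_gaussian_kernel(mu, s, r):
    """Exact E[k_r(q)] for q ~ N(mu, s^2) and k_r(x) = exp(-x^2 / (2 r^2))."""
    return r / np.sqrt(r**2 + s**2) * np.exp(-mu**2 / (2.0 * (r**2 + s**2)))

def kernel_of_mean(mu, r):
    """Closure approximation: the kernel evaluated at the mean."""
    return np.exp(-mu**2 / (2.0 * r**2))

mu, r = 0.5, 1.0
for s in (0.05, 0.2, 0.5):
    exact = expected_gaussian_kernel(mu, s, r)
    approx = kernel_of_mean(mu, r)
    # Relative error of replacing <K(q)> by K(<q>) at noise level s.
    print(f"s={s:4.2f}  exact={exact:.4f}  approx={approx:.4f}  "
          f"rel.err={(approx - exact) / exact:.3%}")
```

In this toy case the error is controlled by the ratio \(s^2/r^2\), which is consistent with the approximation being reliable when the spread of the landmark distribution is small compared with the kernel length scale.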
Given the equations for the moment evolution, we can estimate the parameters \(\theta \) by minimizing the cost function
where \(\gamma _1\) and \(\gamma _2\) are weights. We denote by \({\langle {\mathbf {q}} \rangle }\) and \({\Delta _2\langle \mathbf {qq} \rangle }\) the target first and second moments and by \(\langle {\mathbf {q}} \rangle (T)\) and \(\Delta _2\langle \mathbf {qq} \rangle (T)\) the estimated moments which implicitly depend on the parameters of the noise and the initial momentum. The choice of the norm is free here, and we chose a norm which only considers \(i=j\) and normalizes each term to 1 so that all the covariance of the landmarks contribute equally to the cost. Other choices could be made, depending on applications. Also, the cost function may depend on other parameters, but this would make its minimization more difficult.
To minimize the cost (4.8), we can use gradient-based methods such as the BFGS algorithm. Such methods require the evaluation of the Jacobian of C with respect to all of its arguments. Usually, for the estimation of the initial momentum, a linear adjoint equation is used. However, the derivative with respect to the parameters of the noise cannot be computed in this way. We will evaluate the gradients symbolically by using the Theano library in Python [59]. To improve the efficiency of the algorithm, we first match the mean final position, by only updating the initial momentum. Then, with this initial condition, we match for both first and second moments and update the initial momentum as well as the parameters \(\lambda _l\). As we will see in the numerical experiments in Sect. 5, gradient-based methods are not optimal, and genetic algorithms, such as the differential evolution algorithm of [57] designed for global minimizations, turn out to perform better.
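The target moments and a cost of this type can be sketched as follows. The code estimates the first moments and the \(i=j\) centred second moments from observed configurations and evaluates a weighted squared mismatch; the array layout, the function names and the specific normalization by the target covariance norm are assumptions made for illustration and need not coincide with the norm used in (4.8).

```python
import numpy as np

def sample_moments(Q):
    """Q: (n, N, d) array of n observed configurations of N landmarks.

    Returns the mean positions (N, d) and, for each landmark (i = j),
    the centred second moment as a (d, d) covariance, shape (N, d, d).
    """
    mean = Q.mean(axis=0)
    centred = Q - mean
    cov = np.einsum("nia,nib->iab", centred, centred) / Q.shape[0]
    return mean, cov

def moment_cost(mean_est, cov_est, mean_target, cov_target, gamma1=1.0, gamma2=1.0):
    """Weighted squared mismatch of first and (i = j) centred second moments.

    Each landmark's covariance term is divided by the squared norm of its
    target covariance so that all landmarks contribute on the same scale
    (an illustrative normalisation).
    """
    c1 = np.sum((mean_est - mean_target) ** 2)
    num = np.sum((cov_est - cov_target) ** 2, axis=(1, 2))
    den = np.sum(cov_target ** 2, axis=(1, 2))
    return gamma1 * c1 + gamma2 * np.sum(num / den)

# Synthetic stand-in for observed landmark configurations.
rng = np.random.default_rng(1)
Q = rng.normal(size=(500, 3, 2))
mean_target, cov_target = sample_moments(Q)
```

In an estimation loop, `mean_est` and `cov_est` would be produced by integrating the moment equations for a candidate \(\theta \), and `moment_cost` would be handed to the optimizer of choice.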
4.3 Maximum Likelihood and Expectation-Maximization
We now describe how to estimate the unknown parameters collected in the variable \(\theta \) by a maximum likelihood estimation based on the expectation-maximization (EM) algorithm of [16]. The likelihood of n independent observations \(({\mathbf {q}}^1,\ldots ,{\mathbf {q}}^{n})\) at time T given parameters \(\theta \) takes the form
The parameters \(\theta \) can be estimated by maximizing the likelihood, that is by letting
For this, the likelihood could be directly computed by numerical approximation of \({\mathbb {P}}_\theta ({\mathbf {q}}^i,T)\) using an approximation of the Fokker–Planck equation (3.9). Alternatively, the fact that the stochastic process is only sampled at time T suggests a missing data approach that marginalizes out the unobserved trajectories up to time T. Let \(({\mathbf {q}},\mathbf p; \theta )\) denote the stochastic landmark process with parameters \(\theta \), and let \(P({\mathbf {q}},{\mathbf {p}}; \theta )\) denote its law. Let \({\mathcal {L}}({\mathbf {q}},{\mathbf {p}}; \theta )\) denote the likelihood of the entire stochastic path for a given realization of the noise, and computed with respect to the parameter \(\theta \). Notice that this likelihood is only defined for finite time discretizations of the process and there is no notion of path density for the infinite-dimensional process. We thus proceed formally, while noting that the approach can be justified rigorously, see e.g. [17]. An alternative approach is to optimize the likelihood (4.9) directly using (3.18). This is pursued in [55].
The EM algorithm finds a sequence of parameter estimates \(\{\theta _k\}\) converging to a \({\hat{\theta }}\) by iterating over the following two steps:
(1) Expectation: Compute the expected value of the log-likelihood given the previous parameter estimate \(\theta _{k-1}\):
$$\begin{aligned} \begin{aligned} Q(\theta | \theta _{k-1})&:= {\mathbb {E}}_{\theta _{k-1}} [ \log {\mathcal {L}}({\mathbf {q}},{\mathbf {p}};\theta ) \,|\, {\mathbf {q}}^1,\ldots ,{\mathbf {q}}^{n} ] \\&= \sum _{i=1}^{n} {\mathbb {E}}_{\theta _{k-1}} [ \log {\mathcal {L}} ({\mathbf {q}},{\mathbf {p}}; \theta |{\mathbf {q}}^i) ] \, . \end{aligned} \end{aligned}$$ (4.10)
The expectation (4.10) over the process conditioned on the observations \({\mathbf {q}}^i\) integrates the likelihood over all sample paths reaching \({\mathbf {q}}^i\). For this, we employ the bridge simulation approach developed in Sect. 3.3. For each \({\mathbf {q}}^i\), we thus exchange \(({\mathbf {q}}_t,{\mathbf {p}}_t; \theta )\) with a guided process \((\hat{{\mathbf {q}}},\hat{{\mathbf {p}}}; \theta , {\mathbf {q}}^i)\) and use the equality (3.17) from Theorem 3.4. The expectation on the right-hand side of (3.17) can be approximated by drawing samples from the guided process. Note that the correction factor \(\varphi (\mathbf q,{\mathbf {p}}|\theta _{k-1}, {\mathbf {q}}^i)\) makes the approach equal to importance sampling of the conditioned process with the guided process as proposal distribution.
(2) Maximization: Find the new parameter estimate
$$\begin{aligned} \theta _k= \mathrm {argmax}_\theta \, Q(\theta |\theta _{k-1})\, . \end{aligned}$$ (4.11)
The maximization step can be approximated by updating \(\theta _k\) such that it increases \(Q(\theta | \theta _{k-1})\) instead of maximizing it. This is the approach of the generalized EM algorithm [48]. The update of \(\theta \) is thus computed by taking a gradient step
$$\begin{aligned} \theta _k=\theta _{k-1}+\epsilon \nabla _\theta Q(\theta | \theta _{k-1}), \end{aligned}$$ (4.12)
where \(\epsilon >0\). The gradient, which is evaluated for each of the sampled paths, can be computed symbolically using the Theano library [59]. Theano allows the entire computational chain from the definition of the Hamiltonian and noise fields to the time-discrete stochastic integration to be specified symbolically. The framework can therefore automatically derive gradients symbolically before the expressions are compiled to efficient numerical code. See also [39] for more details on the use of Theano for differential geometric and stochastic computations.
The resulting estimation algorithm is listed in Algorithm 1. For each \({\mathbf {q}}^i\), the expectation \({\mathbb {E}}_{\theta _{k-1}}[\log {\mathbb {P}}_\theta ({\mathbf {q}},\mathbf p|{\mathbf {q}}^i)]\) is estimated by sampling \(N_{\mathrm {bridges}}\) bridges. The algorithm can perform a fixed number K of updates to the estimate \(\theta _k\) or stop at convergence.
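The importance-sampling average in the expectation step and the gradient update of the generalized EM step can be sketched as below. The sketch uses self-normalized weights computed from log correction factors; the function names, the array layout and the toy Gaussian check are hypothetical, and the simulation of guided bridges and the evaluation of \(\log \varphi \) and of the per-path gradients are assumed to be available elsewhere.

```python
import numpy as np

def weighted_expectation(values, log_phi):
    """Self-normalised importance-sampling estimate sum_i w_i * values_i.

    values: (n_samples, ...) quantity evaluated on each guided sample path;
    log_phi: (n_samples,) log correction factors. The weights are normalised
    after subtracting the maximum log weight for numerical stability.
    """
    w = np.exp(log_phi - np.max(log_phi))
    w /= w.sum()
    return np.tensordot(w, values, axes=1)

def gem_step(theta, grads, log_phi, step):
    """One generalised-EM update: move theta along the weighted mean gradient.

    grads: (n_samples, n_params) gradients of the path log-likelihood with
    respect to theta, one row per guided sample path.
    """
    return theta + step * weighted_expectation(grads, log_phi)

# Toy check: proposal N(0, 1), target N(1, 1), density ratio exp(x - 1/2).
rng = np.random.default_rng(0)
x = rng.normal(0.0, 1.0, size=200_000)
estimate = weighted_expectation(x, x - 0.5)   # approximates the target mean
```

With self-normalized weights, any constant factor in \(\varphi \) cancels, so the correction factor only needs to be known up to a constant that does not depend on the sample path.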
5 Numerical Examples
We now present several numerical tests of the stochastic perturbation of the landmark dynamics. In particular, we want to illustrate aspects of the effect of the noise on the landmarks and test the algorithms for estimation of the spatial correlation of the noise. We will focus here on synthetic examples and refer to [3] for an application of the methods on a dataset of Corpora Callosa shapes represented by 77 landmarks. The numerical simulations of this work have been done in Python, using the symbolic computation framework Theano [6]. Studying the effects of the stochastic model on other nonlinear data structures such as curves or surfaces would also be of great interest for future work.
As a second topic, we raised the issue of determining the noise correlation from data sets, which would allow the theory of stochastic deformations to be used with observed data. We developed two independent methods, which we implemented and applied to several test examples. First, the moment equation allows matching of the sample moments. It is deterministic, making optimization of the noise parameters stable and efficient, and it does not require special conditions on the noise fields. Its accuracy depends on the approximation order in the moment equation. Scaling the moment equation to a large number of landmarks or to continuous shapes such as curves may be challenging, as may optimizing over a large number of unknown parameters. In the landmark experiments we presented above, this approach allowed us to reliably estimate the underlying noise, but an extension of this method to infinite-dimensional representations of shapes is not possible unless a discretized version of the equations is used. For this method, we also made two approximations that could possibly be improved elsewhere. One of them is the truncation to retain only second-order terms in the moment equation, and the other is to approximate the expectation of a kernel function as the kernel of the expected values. Both approximations were shown to work well for cases with small enough noise, which would be the case in most applications. Finally, it is also important to notice that we did not use the freedom of the initial value of the variance of the momentum and the position/momentum correlation. These parameters could either be inferred using this scheme (with a larger parameter space) or be obtained by using other information about the data.
The second method is the MLE optimization, a Monte Carlo method which evaluates expectations over conditioned stochastic trajectories. The bridge sampling scheme we used requires the noise fields to span the entire \({\mathbf {q}}\)-space to allow guiding the landmarks towards their target. With high nonlinearity as may happen with large initial momentum and high gradients of the noise fields, guiding the trajectories towards their target with high-probability bridges can be challenging. In general, the stochastic nature of the algorithm makes it harder to control than the matching provided by the moment equation. The bridge sampling scheme can be interpreted as a gradient flow, as discussed in [3] when applied to images. It allows the likelihood of observed images to be evaluated without a prior image registration step. The method may thus be applicable to image analysis problems, and more generally for inexact matching of shapes in which case the requirement of the noise to span the \({\mathbf {q}}\)-space may be relaxed.
The inference of noise parameters treated here can be extended to more general statistical inference problems on shape spaces. Inferring the initial \({\mathbf {q}}_0\) positions can be regarded as estimating a most-likely mean, thereby drawing similarities to the Fréchet mean [20] and to means defined by the maximum likelihood of probability distributions in nonlinear spaces [54]. When generalized to images, the approach can be used for simultaneous estimation of template images [35], possible time-dependent transformations in the momentum as caused for example by disease processes [47], and population variation in the spatial noise correlation.
It is possible to generalize the stochastic equations we have introduced here to allow for time-dependent noise amplitude, as done in [21] for fluid dynamics. In this case, the noise fields could be advected by the diffeomorphism and only the initial condition of the noise field would have to be inferred. This requires the choice of a meaningful advection scheme. By construction of its metric, LDDMM is right-invariant, and the flow energy is therefore measured in Eulerian coordinates. This leads us to define stochastic flows that are compatible with this right-invariant geometry, thus giving noise in Eulerian coordinates. In the deterministic setting, left-invariant metrics [53] provide a Lagrangian view of the metric that, in a medical context, follows the advected anatomy. We leave it as an open and very relevant problem to consider advected, or left-invariant Lagrangian, noise.
Extending the inference methods presented here to other data structures, in particular to infinite-dimensional shape spaces, would again constitute an interesting future direction. As discussed in detail at the end of Sect. 2, we believe that the methods presented here can, with suitable modifications, also be applied to infinite-dimensional representations of shapes, and that additional methods could be introduced, such as stochastic filtering for further data assimilation of the results in infinite-dimensional cases, see e.g. [5].
References
Stephanie Allassonnière, Yali Amit, and Alain Trouvé, Towards a coherent statistical framework for dense deformable template estimation, Journal of the Royal Statistical Society: Series B (Statistical Methodology) 69 (2007), no. 1, 3–29.
Alexis Arnaudon, Alex L Castro, and Darryl D Holm, Noise and dissipation on coadjoint orbits, Journal of Nonlinear Science 28 (2018), no. 1, 91–145.
Alexis Arnaudon, Darryl D Holm, Akshay Pai, and Stefan Sommer, A stochastic large deformation model for computational anatomy, Information Processing for Medical Imaging (IPMI), 2017.
D. G. Aronson, Bounds for the fundamental solution of a parabolic equation, Bulletin of the American Mathematical Society 73 (1967), no. 6, 890–896. MR0217444
Alan Bain and Dan Crisan, Fundamentals of stochastic filtering, Vol 3, Springer.
M Faisal Beg, Michael I Miller, Alain Trouvé, and Laurent Younes, Computing large deformation metric mappings via geodesic flows of diffeomorphisms, International Journal of Computer Vision 61 (2005), no. 2, 139–157.
Mogens Bladt, Samuel Finch, and Michael Sørensen, Simulation of multivariate diffusion bridges, Journal of the Royal Statistical Society: Series B (Statistical Methodology) 78 (2016), no. 2, 343–369.
Martins Bruveris, François Gay-Balmaz, Darryl D Holm, and Tudor S Ratiu, The momentum map representation of images, Journal of Nonlinear Science 21 (2011), no. 1, 115–150.
Martins Bruveris and Darryl D Holm, Geometry of image registration: The diffeomorphism group and momentum maps, Geometry, mechanics, and dynamics, 2015, pp. 19–56.
Zdzisław Brzeźniak, Franco Flandoli, and Mario Maurelli, Existence and uniqueness for stochastic 2D Euler flows with bounded vorticity, Archive for Rational Mechanics and Analysis 221 (2016), no. 1, 107–142.
Roberto Camassa and Darryl D Holm, An integrable shallow water equation with peaked solitons, Physical Review Letters 71 (1993), 1661–1664.
Gary E. Christensen, Richard Rabbitt, and Michael I. Miller, Deformable templates using large deformation kinematics, IEEE Transactions on Image Processing 5 (1996), no. 10.
Dan Crisan, Franco Flandoli, and Darryl D Holm, Solution properties of a 3D stochastic Euler fluid equation, arXiv:1704.06989 (2017).
Ana Bela Cruzeiro, Darryl D Holm, and Tudor S Ratiu, Momentum maps and stochastic Clebsch action principles, Communications in Mathematical Physics (2017), 1–40.
Bernard Delyon and Ying Hu, Simulation of conditioned diffusion and application to parameter estimation, Stochastic Processes and their Applications 116 (2006), no. 11, 1660–1675.
A. P. Dempster, N. M. Laird, and D. B. Rubin, Maximum likelihood from incomplete data via the EM algorithm, Journal of the Royal Statistical Society: Series B 39 (1977), no. 1, 1–38.
Sophie Donnet and Adeline Samson, Parametric inference for mixed models defined by stochastic differential equations, ESAIM: Probability and Statistics 12 (2008), 196–218.
Ian L Dryden and Kanti V Mardia, Statistical shape analysis: With applications in R, John Wiley & Sons, 2016.
Paul Dupuis, Ulf Grenander, and Michael I. Miller, Variational Problems on Flows of Diffeomorphisms for Image Matching, Quarterly of Applied Mathematics (1998).
Maurice Fréchet, Les éléments aléatoires de nature quelconque dans un espace distancié, Ann. Inst. H. Poincaré 10 (1948), 215–310.
François Gay-Balmaz and Darryl D Holm, Stochastic geometric models with non-stationary spatial correlations in Lagrangian fluid flows, Journal of Nonlinear Science (2018).
I. Girsanov, On Transforming a Certain Class of Stochastic Processes by Absolutely Continuous Substitution of Measures, Theory of Probability & Its Applications 5 (1960), no. 3, 285–301.
Ulf Grenander, General Pattern Theory: A Mathematical Study of Regular Structures, Oxford University Press, USA, 1994.
Darryl D. Holm, Geometric Mechanics - Part I: Dynamics and Symmetry, 2 edition, Imperial College Press, London : Hackensack, NJ, 2011.
Darryl D Holm, Variational principles for stochastic fluid dynamics, Proceedings of the Royal Society of London A: Mathematical, Physical and Engineering Sciences 471 (2015), no. 2176, 20140963.
Darryl D. Holm and Jerrold E Marsden, Momentum maps and measure-valued solutions (peakons, filaments, and sheets) for the EPDiff equation, The breadth of symplectic and Poisson geometry, 2005, pp. 203–235.
Darryl D Holm, Jerrold E Marsden, and Tudor S Ratiu, The Euler–Poincaré equations and semidirect products with applications to continuum theories, Advances in Mathematics 137 (1998), no. 1, 1–81.
Darryl D. Holm, Vakhtang Putkaradze, and Cesare Tronci, Geometric dissipation in kinetic equations, Comptes Rendus Mathematique 345 (2007), no. 5, 297–302.
Darryl D Holm, Vakhtang Putkaradze, and Cesare Tronci, Double-bracket dissipation in kinetic theory for particles with anisotropic interactions, Proceedings of the Royal Society of London A: Mathematical, Physical and Engineering Sciences (2010).
Darryl D Holm and Cesare Tronci, Multiscale turbulence models based on convected fluid microstructure, Journal of Mathematical Physics 53 (2012), no. 11, 115614.
Darryl D Holm and Tomasz M Tyranowski, Stochastic discrete Hamiltonian variational integrators, arXiv:1609.00463 (2016).
Darryl D Holm and Tomasz M Tyranowski, Variational principles for stochastic soliton dynamics, Proc. R. Soc. A 472 (2016), no. 2187, 20150827.
DD Holm, WP Lysenko, and JC Scovel, Moment invariants for the Vlasov equation, Journal of Mathematical Physics 31 (1990), no. 7, 1610–1615.
Henry O Jacobs and Stefan Horst Sommer, Higher-order spatial accuracy in diffeomorphic image registration, Geometry, Imaging and Computing (2014).
Sarang Joshi, Brad Davis, Matthieu Jomier, and Guido Gerig, Unbiased diffeomorphic atlas construction for computational anatomy, NeuroImage 23 (2004), 151–160.
Ioannis Karatzas and Steven E Shreve, Brownian Motion and Stochastic Calculus, Vol. 113, Springer Science & Business Media, 1991.
David G Kendall, Shape manifolds, procrustean metrics, and complex projective spaces, Bulletin of the London Mathematical Society 16 (1984), no. 2, 81–121.
Mackillo Kira and Stephan W Koch, Semiconductor quantum optics, Cambridge University Press, 2011.
Line Kühnel, Alexis Arnaudon, and Stefan Sommer, Differential geometry and stochastic dynamics with deep learning numerics, ar** Effects in Image Registration, SIAM Journal on Imaging Sciences 10 (2017), no. 2, 578–601.
Jun Ma, Michael I. Miller, Alain Trouvé, and Laurent Younes, Bayesian template estimation in computational anatomy, NeuroImage 42 (2008), no. 1, 252–261.
Jean-Louis Marchand, Conditioning diffusions with respect to partial observations, arXiv:1105.1608 (2011).
Jerrold E Marsden and Tudor S Ratiu, Introduction to Mechanics and Symmetry, Texts in Applied Mathematics, Vol 17, Springer, New York, New York, NY, 1999.
Stephen Marsland and Tony Shardlow, Langevin equations for landmark image registration with uncertainty, SIAM Journal on Imaging Sciences 10 (2017), no. 2, 782–807.
M. Miller, A. Banerjee, G. Christensen, S. Joshi, N. Khaneja, U. Grenander, and L. Matejic, Statistical methods in computational anatomy, Statistical Methods in Medical Research 6 (1997), no. 3, 267–299.
Michael I Miller, Alain Trouvé, and Laurent Younes, On the metrics and Euler–Lagrange equations of computational anatomy, Annual Review of Biomedical Engineering 4 (2002), 375–405.
Prasanna Muralidharan and P. Thomas Fletcher, Sasaki Metrics for Analysis of Longitudinal Data on Manifolds, Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR), 2012, 1027–1034.
Radford M Neal and Geoffrey E Hinton, A view of the EM algorithm that justifies incremental, sparse, and other variants, Learning in graphical models, 1998, pp. 355–368.
Bernt Øksendal, Stochastic Differential Equations: An Introduction with Applications, Springer Science & Business Media, 2003.
Omiros Papaspiliopoulos and Gareth O. Roberts, Importance sampling techniques for estimation of diffusion models, Statistical Methods for Stochastic Differential Equations, 2012.
Lars Lau Raket, Stefan Sommer, and Bo Markussen, A nonlinear mixed-effects model for simultaneous smoothing and registration of functional data, Pattern Recognition Letters 38 (2014), 1–7.
Moritz Schauer, Frank van der Meulen, and Harry van Zanten, Guided proposals for simulating multi-dimensional diffusion bridges, Bernoulli 23 (2017), no. 4A, 2917–2950.
Tanya Schmah, Laurent Risser, and François-Xavier Vialard, Diffeomorphic Image Matching with Left-Invariant Metrics, Geometry, Mechanics, and Dynamics, 2015, pp. 373–392.
Stefan Sommer, Anisotropic Distributions on Manifolds: Template Estimation and Most Probable Paths, Information Processing in Medical Imaging, 2015, pp. 193–204.
Stefan Sommer, Alexis Arnaudon, Line Kuhnel, and Sarang Joshi, Bridge Simulation and Metric Estimation on Landmark Manifolds, Graphs in Biomedical Image Analysis, Computational Anatomy and Imaging Genetics, 2017, pp. 79–91.
V. Staneva and L. Younes, Learning Shape Trends: Parameter Estimation in Diffusions on Shape Manifolds, 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), 2017, pp. 717–725.
Rainer Storn and Kenneth Price, Differential evolution – a simple and efficient heuristic for global optimization over continuous spaces, Journal of Global Optimization 11 (1997), no. 4, 341–359.
Héctor J Sussmann, Orbits of families of vector fields and integrability of distributions, Transactions of the American Mathematical Society 180 (1973), 171–188.
Theano Development Team, Theano: A Python framework for fast computation of mathematical expressions, arXiv e-prints (2016).
Alain Trouvé, An infinite dimensional group approach for physics based models in pattern recognition, preprint (1995).
Alain Trouvé and François-Xavier Vialard, Shape splines and stochastic shape evolutions: a second order point of view, Quarterly of Applied Mathematics 70 (2012), no. 2, 219–251.
M. Vaillant, M.I. Miller, L. Younes, and A. Trouvé, Statistics on diffeomorphisms via tangent space representations, NeuroImage 23 (2004), no. Supplement 1, S161–S169.
Frank van der Meulen and Moritz Schauer, Continuous-discrete smoothing of diffusions, arXiv:1712.03807 (2017).
François-Xavier Vialard, Extension to infinite dimensions of a stochastic second-order model associated with shape splines, Stochastic Processes and their Applications 123 (2013), no. 6, 2110–2157.
Laurent Younes, Shapes and Diffeomorphisms, Springer, 2010.
Laurent Younes, Felipe Arrate, and Michael I. Miller, Evolutions equations in computational anatomy, NeuroImage 45 (2009), no. 1, Supplement 1, S40–S50.
Miaomiao Zhang, Nikhil Singh, and P Thomas Fletcher, Bayesian estimation of regularization and atlas building in diffeomorphic image registration, International Conference on Information Processing in Medical Imaging, (2013), pp. 37–48.
Acknowledgements
We are grateful to M. Bruveris, M. Bauer, A. Pai, N. Ganaba, C. Tronci, M.R. Schauer, T. Tyranowski and F. Van Der Meulen for helpful discussions of this material and to the anonymous referees for thoughtful comments which improved the exposition of this paper. AA and DH were partially supported by the European Research Council Advanced Grant 267382 FCCA held by DH. DH is also grateful for support from EPSRC Grant EP/N023781/1. AA acknowledges partial support from an Imperial College London Roth Award and the EPSRC through award EP/N014529/1 funding the EPSRC Centre for Mathematics of Precision Healthcare. SS is partially supported by the CSGB Centre for Stochastic Geometry and Advanced Bioimaging funded by a grant from the Villum Foundation. The research was partially completed while the authors were visiting the Institute for Mathematical Sciences, National University of Singapore in 2016.
Additional information
Communicated by Endre Suli.
Appendices
Appendix A: Bridge Sampling
We here follow [15] and the later paper [42] to argue for the almost sure hitting of a target \({\mathbf {v}}\) for the guided process (3.15) and to find the correction term \(\varphi (\mathbf q,{\mathbf {p}},t)\). For completeness, we will explicitly derive the correction term following the program in [15, Theorem 5]. The guided SDE (3.15) differs from the previous schemes in using the function \(\phi _{t,T}:\mathbb R^{2dN}\rightarrow {\mathbb {R}}^{dN}\) to predict the endpoint, and, importantly, in that the diffusion field \(\Sigma \) is not invertible resulting in a semi-elliptic diffusion. We handle the first issue by repeated application of the Girsanov theorem. This also accounts for the unboundedness of the drift term \(b(\hat{{\mathbf {q}}},\hat{\mathbf p})\) coming from the momentum of the landmarks in the same way as [42]. We do not here argue for the \(t\rightarrow T\) limit of the expectation of the correction term that in the elliptic case follows from [15, 42].
Let \(b({\mathbf {q}},{\mathbf {p}})\) be the drift in (3.6) in Itô form. Because this drift is unbounded, we construct \({\tilde{b}}({\mathbf {q}},{\mathbf {p}})\) in Theorem 3.4 to be an approximation so that the \({\mathbf {q}}\)-part \({\tilde{b}}_{\mathbf q}\) is bounded on \({\mathbb {R}}^{dN}\). To construct a map \(\phi _{t,T}:{\mathbb {R}}^{2dN}\rightarrow {\mathbb {R}}^{dN}\) satisfying the conditions of the theorem, let \(\phi _{t,T}\) be the \({\mathbf {q}}\)-part of the time T solution to the ODE \(\partial _t({\mathbf {q}}_t,\mathbf p_t)={\tilde{b}}({\mathbf {q}}_t,{\mathbf {p}}_t)\) started at time t with initial conditions \(({\mathbf {q}},{\mathbf {p}})\). This ODE corresponds to the deterministic ODE (2.3), however using the drift approximation to ensure \(\partial _t\phi _{t,T}({\mathbf {q}},{\mathbf {p}})\) is bounded. Then the process \(\frac{{\tilde{\phi }}_{t,T}(\hat{\mathbf q},\hat{{\mathbf {p}}})-\hat{{\mathbf {q}}}}{T-t}\) is defined, bounded and continuous on [0, T].
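The effect of a guiding term of the form \((\text{target}-\text{state})/(T-t)\) can be seen in a much simpler setting than the landmark case. The sketch below simulates the classical guided proposal of [15] for a scalar, elliptic toy SDE with bounded drift, using an Euler–Maruyama discretization. It is an illustration under assumptions only: the drift \(\sin (x)\), the constant noise amplitude and the function name `guided_bridge` are hypothetical, the endpoint predictor \(\phi _{t,T}\) is replaced by the current state, and the semi-elliptic structure treated in this appendix is not represented.

```python
import numpy as np


def guided_bridge(x0, v, T=1.0, n_steps=1000, sigma=0.5, n_paths=100, seed=0):
    """Euler-Maruyama simulation of dx = [b(x) + (v - x)/(T - t)] dt + sigma dW.

    Returns the simulated states at time T for all paths.
    """
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    x = np.full(n_paths, float(x0))
    for k in range(n_steps):
        t = k * dt                   # T - t >= dt > 0, so no division by zero
        drift = np.sin(x)            # bounded drift b(x) of the toy SDE
        guide = (v - x) / (T - t)    # guiding term pulling the path towards v
        noise = sigma * np.sqrt(dt) * rng.standard_normal(n_paths)
        x = x + (drift + guide) * dt + noise
    return x


target = 1.5
end_points = guided_bridge(x0=0.0, v=target)
max_miss = np.abs(end_points - target).max()
print(max_miss)
```

In the discretized scheme the endpoints miss the target only by a term of the order of the last noise increment, which shrinks with the step size. The guided paths are samples from a proposal, not from the conditioned process itself, which is why a correction term such as \(\varphi \) is needed when they are used to evaluate expectations.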
The SDE
differs from the Itô form of the SDE (3.15) by
As argued by [42], (A.1) has a unique solution satisfying \(\lim _{t\rightarrow T}\hat{{\mathbf {q}}}={\mathbf {v}}\) a.s., and the processes (A.1) and (3.15) are absolutely continuous with respect to each other. The correction term \(\varphi ({\mathbf {q}},{\mathbf {p}},t)\) can be derived from [42, Theorem 3] and the difference (A.2). For completeness, we give the derivation in the landmark case that proves Theorem 3.4.
Proof of Theorem 3.4
Let \(f:W({\mathbb {R}}^{2dN})\rightarrow {\mathbb {R}}\) be a nonnegative measurable function on [0, t], \(t<T\). Following [15], we define
noting that in the present case, we use the pseudo-inverse \(\Sigma _{{\mathbf {q}}}({\mathbf {q}})^\dagger \) in h since \(\Sigma _{{\mathbf {q}}}({\mathbf {q}})\) is not invertible. Let now \((\tilde{{\mathbf {q}}},\tilde{{\mathbf {p}}})\) be a solution to the SDE
From the Girsanov theorem with unbounded drift [15, Thm.1], we have
where
We now define an intermediate function
and compute
Applying the product rule, we obtain for the second term
Writing the process \(\frac{1}{2} g(\tilde{{\mathbf {q}}},t)\) in integral form,
Note that the first term \(\int _0^t\frac{(\tilde{{\mathbf {q}}}-\mathbf v)^T A(\tilde{{\mathbf {q}}})(\tilde{{\mathbf {q}}}-{\mathbf {v}})}{2(T-s)^2}\mathrm{d}s\) of the right-hand side is the negative of the term \(-\frac{1}{2}\int _0^t\Vert h(\tilde{{\mathbf {q}}},{\mathbf {p}},s)\Vert ^2\mathrm{d}s\) in (A.5). The second term expands to
where the second term of the right-hand side is the negative of \(\int _0^th^T(\tilde{{\mathbf {q}}},{\mathbf {p}},s)\mathrm{d}W\). Rearranging terms and inserting in (A.5),
We can now use the Girsanov theorem again to change the drift from (A.3) to (A.1). For this, we define
Then let \(\varphi ({\mathbf {q}},{\mathbf {p}},t)\) be the function satisfying
Now (A.4) and the definition of \(\log \varphi \) gives
with \((\hat{{\mathbf {q}}},\hat{{\mathbf {p}}})\) a solution to (3.6). Thus
where convergence of the right-hand side limit to the conditioned process follows from Lemma A.1. The limit expression (3.18) for the density follows from Lemma A.2 and (A.8). \(\square \)
Lemmas A.1 and A.2 follow [15] and [42] with minor modifications to clarify where Assumption 1 on the density of the process \(({\mathbf {q}}_t,{\mathbf {p}}_t)\) is needed in the semi-elliptic case.
Lemma A.1
Let \(({\mathbf {q}},{\mathbf {p}})\) be a solution to (3.6) satisfying Assumption 1 and the conditions of Theorem 3.4. Let \(0<t_1<t_2<\cdots<t_n<T\) be a finite set of time points in [0, T] and let \(f\in C_b({\mathbb {R}}^{n2dN})\). Let \(\psi _t\) be the process
with g as defined above. Then
Proof
Following [15], we write
with
Note that \(\Phi _f\) is continuous by assumption. We now apply a change of variable \({\mathbf {q}}={\mathbf {v}}+(T-t)^{\frac{1}{2}}{\mathbf {q}}'\) to get
From Assumption 1, \(\Phi _f(t,{\mathbf {q}})\) is continuous and bounded. Because \(\Sigma \) is bounded, \(e^{-\frac{\Vert \Sigma _{{\mathbf {q}}}({\mathbf {v}}+(T-t)^{\frac{1}{2}}\mathbf q')^\dagger ({\mathbf {q}}')\Vert ^2}{2}} \le e^{-\frac{c\Vert \mathbf q'\Vert ^2}{2}}\) for some constant c, and the dominated convergence theorem implies the limit. We conclude that
The result now follows from the definition of \(\Phi _f\), see [15]. \(\square \)
If only the density \({\mathbb {P}}({\mathbf {q}}_0, {\mathbf {p}}_0;\mathbf q,t)\) is of interest, the following result holds assuming only continuity and boundedness of \({\mathbb {P}}({\mathbf {q}}_0, \mathbf p_0;{\mathbf {q}},t)\) for fixed initial conditions \(({\mathbf {q}}_0,\mathbf p_0)\).
Lemma A.2
Let \(({\mathbf {q}},{\mathbf {p}})\) be a solution to (3.6) with the conditions of Theorem 3.4 and assume the process has a density \({\mathbb {P}}({\mathbf {q}}_0,{\mathbf {p}}_0;{\mathbf {q}},\mathbf p,t)\) and that \({\mathbb {P}}({\mathbf {q}}_0,{\mathbf {p}}_0; {\mathbf {q}},t)\) is continuous in t and \({\mathbf {q}}\) and bounded on \(\{(\mathbf q,t)|T-\epsilon \le t\le T\}\). Let \(\psi _t\) be the process defined above. Then
Proof
The result follows from the convergence of \(\Phi _1(t,{\mathbf {q}})\) in Lemma A.1. \(\square \)
Appendix B: Moment Equation for Stochastic Landmark
1.1 Cluster Expansion Method
We explain the basics of this method, which can be found in more detail in, for example, [38] with application in the context of semiconductor physics. This method is used when one seeks the dynamics of the expected value of N particles, which we will write here as \(\left\langle N \right\rangle \). One cannot solve the complete system, especially if the number of particles is large, thus we want to approximate the expected value of products in terms of only a few independent variables. For this, we apply the cluster expansion, which begins by writing
The next decomposition is
and so on and so forth. We then only compute the dynamics for the singlets \(\langle 1 \rangle \) and the correlations, up to some chosen order. In the sequel, we will only consider the doublet correlations \(\Delta _2\), and in this case, we have the general decomposition
In the context of quantum mechanics, where the particle operators do not commute, extra care is needed especially for the sign of the term. Here we will consider \(q_i^\alpha \) and \(p_i^\alpha \) as our particles, and as they commute, the expansions are simpler than in [38]. We directly compute two of them for illustration, up to quadratic order,
This sort of expansion can fit a more geometrical framework, where the final equations for the first moment will preserve the original structure of the equations. This was developed first in [33] and later in [28, 29]. We will not use this method here, for a reason related to the form of the equations. A key step in these papers is to expand the expected value of the Hamiltonian in terms of a finite number of moments, to enable computation of the equation of motion. In our case, the Hamiltonian has a kernel function, which generally cannot be expanded in a finite sum of polynomial terms. By doing the computations directly, we will be able to make another approximation for the kernels, that is, we will assume that they commute with the operation of expectation. A more refined approximation can be made using the Heaviside function, but it gives a much larger number of terms in the expansion without a clear improvement of the solution, see “Appendix B.6”.
To perform this expansion on the Fokker–Planck equation associated with the landmark dynamics, we will use several simplifications:
- Gaussian noise fields \(\sigma _l\) in (4.1),
- for a kernel K(x), we will assume that \(\langle K(x) \rangle \approx K(\langle x \rangle )\) and
- only the second-order correlations \(\Delta _2\) will be considered in this expansion.
These assumptions can be relaxed but the resulting equation may be difficult to compute.
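The truncation to doublet correlations can be checked numerically in a case where it is exact. For commuting, jointly Gaussian variables \(a, b, c\) the triplet correlation vanishes, so that \(\langle abc \rangle = \langle a \rangle \langle b \rangle \langle c \rangle + \langle a \rangle \Delta \langle bc \rangle + \langle b \rangle \Delta \langle ac \rangle + \langle c \rangle \Delta \langle ab \rangle \), where \(\Delta \langle ab \rangle = \langle ab \rangle - \langle a \rangle \langle b \rangle \). The following sketch compares both sides by Monte Carlo sampling. The means and covariances are arbitrary values chosen for illustration and the variables are generic; they are not the landmark positions and momenta of this paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Arbitrary (hypothetical) means and a positive definite covariance matrix.
mean = np.array([1.0, -0.5, 2.0])
cov = np.array([[0.30, 0.10, 0.05],
                [0.10, 0.20, -0.04],
                [0.05, -0.04, 0.25]])

x = rng.multivariate_normal(mean, cov, size=400_000)
a, b, c = x.T

# Singlets <a>, <b>, <c> and doublet correlations Delta<ab>, etc.
m = x.mean(axis=0)
d = np.cov(x, rowvar=False, bias=True)

# Left-hand side: direct Monte Carlo estimate of the triplet <abc>.
triplet_mc = np.mean(a * b * c)

# Right-hand side: cluster expansion truncated after the doublets.
triplet_trunc = (m[0] * m[1] * m[2]
                 + m[0] * d[1, 2]
                 + m[1] * d[0, 2]
                 + m[2] * d[0, 1])

print(triplet_mc, triplet_trunc)
```

For non-Gaussian variables the two values differ by the neglected triplet correlation, which is the error introduced by retaining only \(\Delta _2\).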
1.2 First Moments
Recall the backward Kolmogorov operator on \(q_i^\alpha \)
which is used to compute the time evolution of the singlet
In this case, the equation only depends on the singlet of the momentum variable. We thus compute
which similarly gives the time evolution of the momentum singlet in two terms as
where
We then expand \(A_p\) further using the cluster expansion method on the triplet to get
Already this term depends on the mixed correlations which we will compute shortly, but we first compute the position correlation.
1.3 \(\langle qq \rangle \) Correlation
Recall the formula of the Kolmogorov operator applied to \(q_i^\alpha q_j^\beta \),
which together with (4.5) gives the time evolution of the position correlation in the form
where
We will denote by A the terms corresponding to the drift, by B the terms which are not present in the first moments equation, and by C the other terms which only depend on the noise and the derivative of the noise fields. We proceed by first approximating the expectation of the kernels to get
where we also used the explicit form of \(\sigma _l\) as a Gaussian and its derivative. We will now approximate the \(A_{qq}\) term to get
It is now clear that the B term will linearly increase the position correlation, which will then exponentially increase by the C term and be affected by the momentum-position correlation by the A term. We now proceed by computing the momentum correlation.
1.4 \(\left\langle pp \right\rangle \) Correlation
We compute the Kolmogorov operator on \(p_i^\alpha p_j^\beta \) to get
and using (B.5), we obtain the time evolution of the correlation in three terms as
where
We first approximate
The last two terms cancel as they are symmetric under the transpose operation because of the sum on the free indices, thus the C term is
We proceed with the \(B_{pp}\) term, which is also symmetric under the transpose operation, thus giving the approximation
We treat the two Hamiltonian terms separately by first writing them explicitly as
We expand the first term to arrive at
and the second term
This term cancels the terms of the \(A_{pp}^1\) proportional to \(\langle p_i^\alpha \rangle \) to give the approximation
We end this computation by approximating the dynamics of the mixed correlation.
1.5 \(\left\langle pq \right\rangle \) Correlation
We compute
Then, using (4.5) and (B.5) we obtain the time evolution of \(\Delta \langle p_i^\alpha q_j^\beta \rangle \) as
where
We first approximate
For the Hamiltonian term we obtain
1.6 Kernel Approximation
One of the approximations we did in the previous derivation of the moment equation is to replace the expectation of a kernel by the kernel of the expected values.
If we expand the kernel in powers of its argument, we could compute the errors up to any order. For example for a Gaussian kernel, the first few terms are
The main problem with this approximation with polynomials is that the approximation to any order corresponds to having a kernel with unbounded values for large arguments. This results in nonphysical and large interactions of particles far away, which should normally not interact. Obtaining a reliable expansion of a kernel function is thus a difficult task in the moment approximation.
Nevertheless, one could consider the following higher order approximation of the expected value of a Gaussian kernel
where the function \(\theta ({\mathbf {x}})\) is given by
and the coefficient \(f_\alpha \) is found such that this approximation is the best fit to the Gaussian. In practice, we have \(f_\alpha \approx 0.6\), but this value depends on \(\alpha \) in general. This cutoff function \(\theta \) is necessary here, otherwise, this approximation will not be bounded, leading to large errors in the dynamics. The expected value of this approximation assumes that the \(\theta \) function commutes with it, and only takes into account the approximation of the quadratic term. It turned out that for all our experiments, these correction terms did not substantially improve the result, thus we did not include them in the equations.
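The size of the error made by replacing \(\langle K(x) \rangle \) with \(K(\langle x \rangle )\) can be gauged in a scalar example where the expectation is known in closed form. As an assumption for this sketch, the kernel is taken to be \(K(x)=e^{-x^2/(2\alpha ^2)}\) and \(x\) to be Gaussian with mean \(\mu \) and standard deviation \(s\), in which case \(\langle K(x) \rangle = \frac{\alpha }{\sqrt{\alpha ^2+s^2}}\, e^{-\mu ^2/(2(\alpha ^2+s^2))}\). The parametrization of the kernel, the function names and the numerical values are illustrative choices and need not match those used in the experiments.

```python
import numpy as np


def gaussian_kernel(x, alpha=1.0):
    """Gaussian kernel K(x) = exp(-x^2 / (2 alpha^2)) (assumed parametrization)."""
    return np.exp(-x**2 / (2.0 * alpha**2))


def expected_kernel(mu, s, alpha=1.0):
    """Closed form of <K(x)> for scalar x ~ N(mu, s^2)."""
    var = alpha**2 + s**2
    return alpha / np.sqrt(var) * np.exp(-mu**2 / (2.0 * var))


mu = 0.5

# Error of the approximation <K(x)> ~ K(<x>) for small and for large noise.
err_small = abs(expected_kernel(mu, 0.1) - gaussian_kernel(mu))
err_large = abs(expected_kernel(mu, 1.0) - gaussian_kernel(mu))

# Monte Carlo check of the closed form in the large-noise case.
rng = np.random.default_rng(0)
samples = rng.normal(mu, 1.0, size=200_000)
mc_large = gaussian_kernel(samples).mean()

print(err_small, err_large, mc_large)
```

The error is small when the standard deviation is small compared with the kernel width \(\alpha \) and grows once the two become comparable, in line with the observation that the approximation works well for small enough noise.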
Rights and permissions
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
About this article
Cite this article
Arnaudon, A., Holm, D.D. & Sommer, S. A Geometric Framework for Stochastic Shape Analysis. Found Comput Math 19, 653–701 (2019). https://doi.org/10.1007/s10208-018-9394-z
DOI: https://doi.org/10.1007/s10208-018-9394-z
Keywords
- Shape analysis
- Stochastic flows of diffeomorphisms
- Stochastic landmark dynamics
- Stochastic geometric mechanics