1 Introduction

Consider a family of weighted particles (carrying a mass, or a size) which (informally) merge according to the following rule: given some non-negative symmetric collision kernel K, each pair of particles with masses x and y collides at rate \(K(x,y)\), upon which they coalesce to form a single new particle of mass \(x+y\) (later on, this is sometimes referred to as a cluster). A mean-field model is provided by Smoluchowski’s equations [34], which consist of an infinite system of ordinary differential equations characterising the joint evolution of the densities of particles of each mass as time passes. These systems have only been solved explicitly in some special cases, among which one may cite the cases when the kernel is either additive, \(K(x,y)=x+y\), or multiplicative, \(K(x,y)=xy\) ([6, 9], see [17] for more recent and general results).

Arguably, one of the objectives in the field of coalescent processes is to tend towards models of physical systems that would be more realistic “at the particle level”, even if many of the features of real systems are still ignored, starting with the positions in space and the energies of the particles. For an overview of the literature on these issues, and of the relation between coalescence processes and Smoluchowski’s equations, we refer the interested reader to Aldous’ survey [6], Pitman [31], or Bertoin [9].

When the number of particles is finite, it is rather easy to define rigorously a Markov process having the dynamics discussed in the first paragraph above. One possible construction is the so-called Marcus–Lushnikov [25, 27] coalescence process. Informally, consider the masses as vertices of a complete graph, and equip the edges between vertices i and j with random exponential clocks with parameter \(K(x_i,x_j)\). When the clock between i and j rings, replace the masses \(x_i\) and \(x_j\) on the nodes i and j by \(x_i+x_j\) and 0, respectively, and update the parameters of the clocks involving i and j so that the rates remain given by the kernel. When the number of masses is infinite, the definition of coalescence processes is much more involved. These issues are discussed and solved for important classes of kernels in Evans and Pitman [16] and in Fournier and Löcherbach [18] (see also references therein, and [3, 6]).

In this paper we focus on the additive and multiplicative kernels. Together with Kingman’s coalescent (for which \(K(x,y)=1\)), the associated coalescence processes are among the simplest, but also some of the most important. This is mostly because of their manifestations in fundamental discrete models that we will call hereafter combinatorial coalescence processes. These “incarnations” are forest-like or graph-like structures modelling the coalescence at a finer level which (partially) keep track of the history of the coalescing events [3, 4, 8–10, 14, 16, 30, 31].

The main functionals of interest concern the maximal masses at any time t. In this context there are two main themes:

  • Fixed time asymptotics. Here one tries to understand the asymptotics for the maximal clusters of particle systems of increasing size at some fixed time t. At the risk of oversimplifying, let us only outline one major technique that consists in viewing the clusters as the connected components of some random graph (e.g., the classical Erdős–Rényi random graph in the critical regime), and then trying to reveal the information about the masses of the connected components using explorations of the graph such as breadth-first or depth-first search. To this aim, one associates a walk (or random real-valued function) to the exploration, where typically the walk dips below its past minimum when a connected component has been fully explored. The lengths of the excursions of the walk above its past minima describe the sizes of the connected components. One then shows that, when suitably rescaled, these walks converge to stochastic processes (e.g., a Brownian motion with parabolic drift), and tries to prove that the (rescaled) lengths of the maximal excursions of the walk above its past minimum converge to the maximal excursions of the stochastic process away from its past minimum.

  • Evolution as a function of time. In this context, one tries to understand various questions about the evolution of the process, including the explicit constructions of processes from standard objects such as Brownian motion, questions about the entrance boundary of the associated Markov process, as well as the natural spaces for these objects (\(\ell ^2\) for the multiplicative, and \(\ell ^1\) for the additive).

Our approach combines these two themes in that we try to understand the evolution in time using asymptotics for large particle systems. The following paragraphs describe at a rather high level our main ideas and contributions.

Invasion percolation and linear representations. Most importantly for us, both the additive and the multiplicative coalescents started with unit-mass particles admit a graphical representation as (a time change of) the process of level sets of some weighted graph: there exists some (random) graph and random weights such that at each time t, the clusters are the connected components of the graph consisting of all edges of weight at most t. We call such processes percolation systems. For the multiplicative coalescent, the graph is simply the complete graph weighted by independent and identically distributed (i.i.d.) random variables uniform on [0, 1]; the additive case arises when taking the graph to be a uniformly random labelled tree, and the weights to be i.i.d. uniform random variables that are also independent of the tree. The idea underlying our work relies on invasion percolation [13, 24], or equivalently on Prim’s algorithm [20, 32], to obtain an order on the vertices of such percolation systems, which we refer to as the Prim order, and which is consistent with the coalescent in a sense that we make clear shortly. Given a connected graph whose edges are marked with non-negative and distinct weights, and a starting node, say \(v_1\), Prim’s algorithm grows a connected component from \(v_1\), each time adding the endpoint of the lightest edge leaving the current component (see Sect. 4). The Prim order “linearises” the coalescent in a consistent way: at all times, the clusters are intervals of the Prim order, so that, in particular, the clusters that coalesce are always “adjacent” (Proposition 13). Furthermore, it is remarkable that this definition of an alternate (random) order makes the consistency in time transparent and exact at the combinatorial level. We believe that this new point of view should lead to further advances in the study of coalescence processes.

Aside from this new unifying idea, which is interesting in its own right, our main contributions on the multiplicative and additive coalescents are the following:

Multiplicative coalescent. We prove that the representation of the asymptotic cluster masses in terms of the excursion lengths of a functional of a Brownian motion with parabolic drift, which Aldous [3] proved valid for some fixed time (convergence of a marginal), can be extended to a convergence as a time-indexed process (convergence as a random function). This answers in particular Question 6.5.3, p. 851 of Aldous [3]. The combinatorial coalescence process of interest is the percolation process on the complete graph, which is nothing else than the classical (Erdős–Rényi) random graph process \((G(n,p), p\in [0,1])\) (see Sect. 3.1) seen around \(p=1/n + O(n^{-4/3})\). Furthermore, we also construct a version of the standard augmented multiplicative coalescent of Bhamidi et al. [11] using only a Brownian motion and a Poisson point process. This process had been constructed as the scaling limit of the sequence of cluster sizes and excesses of critical random graphs, see Sect. 2 for more details.

Additive coalescent. Here the central combinatorial model is the percolation process on a uniformly random labelled tree, hereafter referred to as \(\mathsf{CP}_+^{\textsc {ap}}\), which was initially built using random forests. The construction due to Pitman [30] (see also references there for a complete and long history of the problem) leads to a continuous representation of the standard additive coalescent in terms of the time reversal of a fragmentation process (the logging process) of the Brownian continuum random tree (see Aldous and Pitman [4]). We introduce a slight modification, which we call \(\mathsf{CP}_+\), of the parking model \(\mathsf{CP}_+^{\textsc {cl}}\) constructed by Chassaing and Louchard [14] as an approximation of the additive coalescence process. Our model \(\mathsf{CP}_+\) is equivalent to \(\mathsf{CP}_+^{\textsc {cl}}\) up to a random time change. Again \(\mathsf{CP}_+\) is a one-dimensional model in which only consecutive blocks merge as time evolves. Our contribution here is to unify these results by showing that the model \(\mathsf{CP}_+\) can be used to encode \(\mathsf{CP}_+^{\textsc {ap}}\). Similarly to what is done in [14], the blocks (resp. limiting blocks) have a representation in terms of the excursion lengths of some associated random walks (resp. of a functional of the normalised Brownian excursion) indexed by a two-dimensional domain (space and time). In this case, the limiting process is the standard additive coalescent and its construction using a Brownian excursion was already known ([8, 14]).

Note 1

In the entire paper, we only discuss the standard additive and multiplicative coalescents, that is the processes that arise in homogeneous settings, from discrete constructions where one initially has a finite number of unit masses. We believe that our ideas may be used in inhomogeneous settings (the birthday \(\mathbf{p}\)-trees [4, 5] for the additive case, and the inhomogeneous random graphs of Norros and Reittu [29] in the multiplicative case), but this direction is not pursued here.

2 Main results about additive and multiplicative coalescents

We present here the consequences of our work in terms of coalescence processes. Write \(\ell ^p_\downarrow \) for the set of non-increasing sequences of non-negative real numbers belonging to \(\ell ^p\), equipped with the standard \(\ell ^p\) norm, \(\Vert x\Vert _p=\left(\sum _i |x_i|^p\right)^{1/p}\). As explained in Evans and Pitman [16], \(\ell ^p_\downarrow \) is a convenient space to describe coalescence processes. Consider an element \(\mathbf{x}=(x_i, i\ge 1)\) of this space as a configuration, \(x_i\) being the mass of particle i. When two particles with masses \(x_i\) and \(x_j\) merge, their masses are removed from \(\mathbf{x}\), and replaced by one mass \(x_i+x_j\) and another one with mass zero, inserted at the positions that ensure that the resulting configuration remains a non-increasing sequence of masses.

The Marcus–Lushnikov ([25, 27], see also [3]) definition of the finite-mass multiplicative (resp. additive) coalescent can be extended to sequences of masses in \(\ell ^2\) (resp. \(\ell ^1\)) (see [3, 4]). More precisely, Aldous [3, Proposition 5] (resp. Evans and Pitman [16, Theorem 2]) proved that there exists a Feller Markov process taking values in \(\ell ^2_\downarrow \) (resp. \(\ell ^1_\downarrow \)) which has the dynamics of the multiplicative (resp. additive) coalescent.

Let \(\widetilde{\varvec{\gamma }}^+(n,\,\cdot \,)\) be the additive coalescent process started at time 0 in the state \((1/n,\dots ,1/n,0,0,\dots )\in \ell ^1_\downarrow \), a configuration with n particles each having mass \(1/n\). Evans and Pitman [16] (see also Aldous and Pitman [4, Proposition 2]) proved that

$$\begin{aligned} \Bigg (\widetilde{\varvec{\gamma }}^+\Bigg (n,t+\frac{1}{2} \log n\Bigg )\Bigg )_{-\infty < t <+\infty } \xrightarrow [n]{(d)}\left(\widetilde{\varvec{\gamma }}^+(t)\right)_{-\infty < t <+\infty } \end{aligned}$$

for the Skorokhod topology on \({\mathbb {D}}((-\infty ,+\infty ),\ell ^1_\downarrow )\), the space of càdlàg functions from \((-\infty ,+\infty )\) taking values in \(\ell ^1_\downarrow \), where the limiting process \(\widetilde{\varvec{\gamma }}^+\) is also an additive coalescent, called the standard additive coalescent (see also Sect. 3.2.1). (We use a \(\tilde{\ }\) in the notation because this process is a time-changed version of the process \(\varvec{\gamma }^+\) we use more consistently later on.)

In the multiplicative case, Aldous [3, Proposition 4] proves that starting with a configuration with n particles of mass \(n^{-2/3}\), when the parameters of the exponential clocks between clusters are the product of their masses, then the sorted sequence of cluster sizes present at time \(n^{1/3}+t\) (for a fixed t) converges in distribution in \(\ell ^2_\downarrow \) to some sequence \(\varvec{\gamma }^\times (t)\) (described below). In [3, Corollary 24], he shows that there exists a Markov process, called the standard multiplicative coalescent, whose distribution at time t coincides with that of \(\varvec{\gamma }^\times (t)\), and whose evolution is that of the multiplicative (Marcus–Lushnikov) coalescent. Nevertheless, with the construction he proposes, he is not able to prove that, as a process, \(\varvec{\gamma }^\times \) is the standard multiplicative coalescent.

The marginals of these standard coalescents both possess a representation using Brownian-like processes. Let \(\mathsf{e}\) be a normalised Brownian excursion (with unit length) and let \(\mathsf{B}\) be a standard Brownian motion. Define

$$\begin{aligned} \mathsf{y}^{(\lambda )}_{\times }(x)&= \mathsf{B}(x)+\lambda x - x^2/2,\quad x\ge 0,\,\, \lambda \in {\mathbb {R}}\\ \mathsf{y}^{(\lambda )}_{+}(x)&= \mathsf{e}(x)-\lambda x ,\quad x\in [0,1],\,\, \lambda \ge 0 \end{aligned}$$

and consider the operator \(\Psi \) on the set of continuous functions \(f:\Lambda \rightarrow {\mathbb {R}}\) defined by

$$\begin{aligned} \Psi f(x)= f(x)-\min \{f(y)~: y\le x\}, \quad x \in \Lambda \end{aligned}$$
(1)

where \(\Lambda =[0,+\infty )\) or \(\Lambda =[0,1]\) is the domain of f. An interval \(I=[a,b]\) is said to be an excursion of f (resp. of f above its minimum) if \(f(a)=f(b)=0\) and \(b=\inf \{t>a, f(t)=0\}\) (resp. if \(f(a)=f(b)=\min \{f(t)~:t\le b\}\) and \(b=\inf \{t>a: f(t)=f(a)\}\)). An important property of \(\Psi \) is the following immediate lemma, which is illustrated in Fig. 1.

Lemma 2

If \(f(0)=0\) and \(g=\Psi f\), then \(I=[a,b]\) is an excursion of f above its minimum if and only if I is an excursion of g above 0. As a consequence, when these are well-defined, the multiset of the k largest excursion sizes of f above its minimum and of g above 0 coincide.
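To make this concrete, here is a minimal numerical sketch in Python (the function names are ours) of the operator (1) and of Lemma 2 on a finite grid: applying \(\Psi \) to a walk and measuring the maximal intervals where the result is positive yields the excursion lengths of the walk above its past minimum.

```python
import numpy as np

def psi(f):
    """Psi f(x) = f(x) - min{f(y) : y <= x}, computed on a finite grid."""
    return f - np.minimum.accumulate(f)

def excursion_lengths(g):
    """Lengths (in grid steps) of the maximal intervals where g > 0,
    sorted in decreasing order."""
    lengths, run = [], 0
    for v in g:
        if v > 0:
            run += 1
        else:
            if run:
                lengths.append(run)
            run = 0
    if run:
        lengths.append(run)
    return sorted(lengths, reverse=True)

rng = np.random.default_rng(0)
# a simple walk with f(0) = 0, as required by Lemma 2
f = np.concatenate(([0.0], np.cumsum(rng.choice([-1.0, 1.0], size=999))))
# Lemma 2: the largest excursions of f above its past minimum are exactly
# the largest excursions of Psi f above 0.
print(excursion_lengths(psi(f))[:5])
```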

Fig. 1
A simulation of the processes \(\mathsf{y}_\times ^{(\lambda ),n}\) (bottom) and \(\Psi \mathsf{y}_\times ^{(\lambda ),n}\) (top), with the green line marking the infimum process. Observe the correspondence between the excursions above 0 in the top picture and those above the minimum in the bottom one (colour figure online)

Let \(\varvec{\gamma }^+(\lambda ):=(\gamma ^+_i(\lambda ))_{i\ge 1}\) and \(\varvec{\gamma }^\times (\lambda ):=(\gamma ^\times _i(\lambda ))_{i\ge 1}\) be the sequences of lengths of the excursions of \(\Psi \mathsf{y}_+^{(\lambda )}\) and of \(\Psi \mathsf{y}_\times ^{(\lambda )}\), respectively, sorted in decreasing order. Clearly, for any \(\lambda \ge 0\), \(\varvec{\gamma }^+(\lambda )\in \ell _\downarrow ^1\) and, by Aldous [3, Lemma 25], for any \(\lambda \in {\mathbb {R}}\), \(\varvec{\gamma }^\times (\lambda )\in \ell _\downarrow ^2\). Then, it is known that for any integer k and real numbers \(\lambda _1<\lambda _2<\cdots < \lambda _k\) the vectors

$$\begin{aligned} (\varvec{\gamma }^+(-\ln (\lambda _1)), \dots , \varvec{\gamma }^+(-\ln (\lambda _k))) \quad \text {and} \quad (\varvec{\gamma }^\times (\lambda _1), \dots , \varvec{\gamma }^\times (\lambda _k)) \end{aligned}$$

are distributed as the marginals at times \((\lambda _1,\dots , \lambda _k)\) of the standard additive and multiplicative coalescent, respectively (for the additive case, see Bertoin [8] and Chassaing–Louchard [14]; for the multiplicative case, see Aldous [3] for the marginal convergence, and Bhamidi et al. [11] for the finite-dimensional distributions).

Bertoin [8] also proved that the process \((\varvec{\gamma }^+(-\ln (\lambda )))_{\lambda \ge 0}\) is a version of the standard additive coalescent. A similar statement has been announced by Armendariz [7] for \((\varvec{\gamma }^\times (\lambda ))_{\lambda \in {\mathbb {R}}}\), but has never been published. Both [8] and [7] argue directly in the continuum (Chassaing and Louchard [14, Theorem 4.2] proceeded from a parking scheme, see Sect. 3.2.2, and proved only convergence of marginals). The main purpose of this paper is to give a simple and unified proof of these results based on discrete versions of the coalescents. The objects involved are, as we said earlier, a parking scheme in the additive case and the random graph process \((G(n,p))_p\) in the multiplicative one. More precisely, our approach relies on encodings of these objects using discrete analogues of \(\mathsf{y}_+^{(\lambda )}\) and \(\mathsf{y}_\times ^{(\lambda )}\), denoted by \(\mathsf{y}_+^{n,(\lambda )}\) and \(\mathsf{y}_\times ^{n,(\lambda )}\) (defining these encodings precisely now would lead us astray; for precise definitions, see (5) and (7)). The associated processes

$$\begin{aligned} \varvec{\gamma }^{+,n}(\lambda ):=(\gamma ^{+,n}_i(\lambda ))_{i\ge 1} \quad \text {and} \quad \varvec{\gamma }^{\times ,n}(\lambda ):=(\gamma ^{\times ,n}_i(\lambda ))_{i\ge 1} \end{aligned}$$

will be seen (and this is standard) to coincide with the lengths of the excursions of these encodings above their respective past minima (up to some details, see Note 8).

Using the Prim order alluded to above, the strength of these encodings will appear to be that the lengths of the excursions of \(\Psi \mathsf{y}_+^{(\lambda ),n}\) (resp. \( \Psi \mathsf{y}_\times ^{(\lambda ),n}\)) correspond, up to a time change and a normalisation, to the cluster sizes in an additive (resp. multiplicative) coalescent process, as a time-indexed process (\(\lambda \) plays the role of time). In particular, as \(\lambda \) grows, only successive excursions of \(\Psi \mathsf{y}_+^{(\lambda ),n}\) (and of \(\Psi \mathsf{y}_\times ^{(\lambda ),n}\)) merge, which reflects the fact that the Prim order linearises the additive and multiplicative processes, in the sense that it makes them consistent with a linear order.

Again, the construction in the additive case is close to that of Chassaing–Louchard [14] where the same property holds. As developed in Sect. 3.2.4, the novelty here is that our combinatorial additive coalescent corresponds to the linearisation of the time reversal of a fragmentation of a uniform Cayley tree defined by Pitman (see Sect. 3.2.1). We show that in a suitable space

$$\begin{aligned} \mathsf{y}^{(\lambda ),n}_+(x)\xrightarrow [n]{(d)}\mathsf{y}^{(\lambda )}_+(x) \end{aligned}$$

as a process indexed by \((\lambda ,x)\) (see Theorem 10).

The linearisation in the multiplicative case is new and allows us to prove the convergence of \(\mathsf{y}^{(\lambda ),n}_\times (x)\) to \(\mathsf{y}^{(\lambda )}_\times (x)\) as a process indexed by \((\lambda ,x)\) (see Theorem 7).

Using the properties of \(\Psi \) and of the operator “extraction of excursion sizes”, we prove:

Theorem 3

We have

$$\begin{aligned} (\varvec{\gamma }^{+,n}(\lambda ): \lambda \ge 0) \xrightarrow [n]{(d)}(\varvec{\gamma }^{+}(\lambda ): \lambda \ge 0) \end{aligned}$$
(2)

and

$$\begin{aligned} (\varvec{\gamma }^{\times ,n}(\lambda ): \lambda \in {\mathbb {R}}) \xrightarrow [n]{(d)}(\varvec{\gamma }^{\times }(\lambda ): \lambda \in {\mathbb {R}}) \end{aligned}$$

in the sense of Skorokhod convergence on \({\mathbb {D}}({\mathbb {R}}, \ell ^1_\downarrow )\) and \({\mathbb {D}}({\mathbb {R}}, \ell _\downarrow ^2)\), respectively.

As a corollary, using a correspondence with coalescence (which in the additive case amounts to clarifying the time change), we establish the following.

Corollary 4

The processes \((\varvec{\gamma }^+(e^{-t}))_{t\in {\mathbb {R}}}\mathrel {\mathop {=}\limits ^{(d)}}(X^{\infty }_+(t))_{t\in {\mathbb {R}}}\) and \((\varvec{\gamma }^\times (\lambda ))_{\lambda \in {\mathbb {R}}}\) are versions of the additive and multiplicative coalescent, respectively.

Here, the statement means that \((\varvec{\gamma }^+(e^{-t}))_{t\in {\mathbb {R}}}\) is a Markov process taking values in \(\ell ^1_\downarrow \) such that for every t, \(\varvec{\gamma }^+(e^{-t})\) is distributed as follows [4]. Consider a Brownian continuum random tree \({\mathcal {T}}\) [2] with mass measure \(\mu \) and length measure l on its skeleton \({\text {Sk}}({\mathcal {T}})\). Consider a Poisson point process \({\mathcal {P}}\) of intensity measure \(l\otimes ds\) on \({\text {Sk}}({\mathcal {T}})\times [0,\infty )\). At time s, split \({\mathcal {T}}\) at the marks u such that \((u,t)\in {\mathcal {P}}\) and \(t\le s\), and denote by \(\mathbf{F}(s):=({\mathcal {F}}_1(s), {\mathcal {F}}_2(s),\dots )\) the sequence of the \(\mu \)-masses of the connected components (subtrees) obtained, sorted in decreasing order. Then, for every \(s\ge 0\), we have \(\mathbf{F}(s)\in \ell ^1_\downarrow \) and \(\Vert \mathbf{F}(s)\Vert _1=1\). With this setting, \((\varvec{\gamma }^+(s))_{s\ge 0}\) and \((\mathbf{F}(s))_{s\ge 0}\) have the same distribution, a result which is originally due to Bertoin [8].

In the multiplicative case, this means that \((\varvec{\gamma }^\times (\lambda ))_{\lambda \in {\mathbb {R}}}\) is a Markov coalescent process taking values in \(\ell ^2_\downarrow \) such that for every \(\lambda \in {\mathbb {R}}\), the vector \(\varvec{\gamma }^\times (\lambda )\) is distributed as the limit rescaled component sizes of the random graph \(G(n,p_\lambda (n))\) for

$$\begin{aligned} p_\lambda (n)=\frac{1}{n}+\frac{\lambda }{n^{4/3}}. \end{aligned}$$
(3)

The existence of such a process, the standard multiplicative coalescent, was proved by Aldous [3, Corollary 24] by resorting to Kolmogorov’s extension theorem. Here, we provide an explicit construction of the process from a single Brownian motion. The fact that the coalescing rates are multiplicative is a direct consequence of the weak convergence used in the construction. The proofs of Theorem 3 and of Corollary 4 are postponed until Sect. 7.

Remark

We have learned after the completion of this article that a construction had also been proposed independently by Martin and Ráth [28]; their approach allows the construction of the multiplicative coalescent from any initial state, and they take an angle that is rather different from the one here, focusing on applications to dynamic versions that allow deletions of clusters, related to the forest-fire model on random graphs introduced by Ráth and Tóth [33].

In the multiplicative case, we also construct a version of the standard augmented multiplicative coalescent of Bhamidi et al. [11] as a “decorated” process of \(\varvec{\gamma }^\times \). For a connected graph, let the excess be the minimum number of edges that one must remove in order to obtain a tree. Then, the augmented multiplicative coalescent is the scaling limit of the sizes and excesses of the connected components of \(G(n,p_\lambda (n))\), that is of \((\varvec{\gamma }^{\times ,n}(\lambda ),\mathbf{s}^n(\lambda ))\) where \(\mathbf{s}^n(\lambda )=(s_i^n,i\ge 1)\) and \(s_i^n\) is the excess of the ith largest connected component of \(G(n,p_\lambda (n))\). The zero-set \(\{x \ge 0: \Psi \mathsf{y}_\times ^{(\lambda )}(x)=0\}\) separates the half-line \({\mathbb {R}}^+\) into countably many open intervals \((I_i(\lambda ))_{i\ge 1}\) whose lengths are precisely the components of the vector \(\varvec{\gamma }^\times (\lambda )\). Let \(\Xi \) be a Poisson point process with unit rate on \({\mathbb {R}}^+\times {\mathbb {R}}^+\). Then, for each \(\lambda \in {\mathbb {R}}\) and for each \(i\ge 1\), let \(s_i(\lambda )\) denote the number of points of \(\Xi \) falling under the graph of \(\Psi \mathsf{y}_\times ^{(\lambda )}\) on the interval \(I_i(\lambda )\), the interval corresponding to the ith longest excursion of \(\Psi \mathsf{y}_\times ^{(\lambda )}\):

$$\begin{aligned} s_i(\lambda ):=\#\{(x,w)\in \Xi : x\in I_i(\lambda ), w\le \Psi \mathsf{y}_\times ^{(\lambda )}(x)\}. \end{aligned}$$

Then write \(\mathbf{s}(\lambda )=(s_i(\lambda ))_{i\ge 1}\). The state space of interest is now \({\mathbb {U}}_\downarrow \) defined by

$$\begin{aligned} {\mathbb {U}}_\downarrow :=\bigg \{ (\mathbf{x},\mathbf{s})\in \ell ^2_\downarrow \times {\mathbb {N}}^\infty : \sum _{i\ge 1} x_i s_i <\infty \,\, \text { and }\,\, s_i=0\,\, \text { whenever }\,\, x_i=0\bigg \} \end{aligned}$$

endowed with the metric

$$\begin{aligned} {\text {d}}_{\mathbb {U}}( (\mathbf{x},\mathbf{s}), (\mathbf{x}',\mathbf{s}') ) := \Bigg (\sum _{i\ge 1} |x_i-x_i'|^2 \Bigg )^{1/2} + \sum _{i\ge 1} |x_i s_i- x_i' s_i'|. \end{aligned}$$

Theorem 5

The following convergence

$$\begin{aligned} ((\varvec{\gamma }^{\times ,n}(\lambda ),\mathbf{s}^n(\lambda )): \lambda \in {\mathbb {R}}) \xrightarrow [n]{(d)}((\varvec{\gamma }^\times (\lambda ), \mathbf{s}(\lambda )): \lambda \in {\mathbb {R}}) \end{aligned}$$

holds in \({\mathbb {D}}({\mathbb {R}}, {\mathbb {U}}_\downarrow )\). In particular, \((\varvec{\gamma }^\times (\lambda ), \mathbf{s}(\lambda ))_{\lambda \in {\mathbb {R}}}\) is a version of the standard augmented multiplicative coalescent.

The metric structure of the connected components at fixed \(\lambda \) has been obtained in [1] using a similar representation by functions. One could wonder whether the Prim order, which allows us to derive a representation of the sizes of connected components that is consistent as \(\lambda \) evolves, could also give much more, and in particular could provide a consistent representation of the evolution of the metric. However, the connection between the metric and the encoding relies (at least in [1]) on a depth-first order, and the relationship seems to be ruined when one sorts the nodes using the random Prim order. A careful look at Sect. 4 should suffice to convince the reader that the very idea of obtaining a representation that is consistent in \(\lambda \) (fixing some order of nodes once and for all, independently of \(\lambda \)) is incompatible with tracking the internal structure of connected components.

Organization of the rest of the paper. In Sect. 3, we introduce formally the combinatorial coalescence processes that we use to represent the multiplicative and additive coalescent processes with finitely many particles: the multiplicative case (Sect. 3.1) relies on critical random graphs; the additive case requires more time, since one of our main objectives is to exhibit the interplay between different representations (Sect. 3.2). Section 4 is devoted to the Prim order and its properties. We then prove the convergence results for the additive (Sect. 5) and multiplicative (Sect. 6) coalescent processes that yield Theorems 10 and 7 about the asymptotics for rescaled walks with vertices sorted according to the Prim order. The transfer of these results to asymptotics for the lengths of the largest excursions above the past minima in spaces of sequences of masses is the topic of Sect. 7.

3 Combinatorial coalescence processes and their encodings

3.1 The multiplicative case: critical random graphs

The aim of this part is to present some elements concerning the multiplicative coalescence processes, our new approach, and the main steps of the proofs of Theorem 3 and Corollary 4.

We first define the random graph process on the vertex set \([n]:=\{1,2,\dots , n\}\), for a positive integer n. Let \(E^n=\{\{i,j\}, i\ne j, i,j \in [n]\}\) denote the set of pairs of elements of [n], the set of edges. Let \((U_e)_{e\in E^n}\) be a collection of i.i.d. uniform random variables on [0, 1]. Let G(n, p) be the graph on [n] consisting of the edges \(e\in E^n\) for which \(U_e\le p\). Then, \((G(n,p))_{p\in [0,1]}\) is the classical random graph process [12, 19]. It is a Markov process but not a time-homogeneous one (as it would have been had we used exponential random variables instead of uniform ones). The ordered sequence of sizes of connected components \((|C^n_i(p)|)_{i\ge 1}\) is also a Markov process, whose initial state is \((1,1,\dots ,1)\), and the components of the vector coalesce at a rate proportional to the product of their values. Indeed, conditionally on G(n, p), the next edge to be added is equally likely among the ones which are not already present, so that the probability that it joins a vertex of \(C_i^n(p)\) to one of \(C_j^n(p)\) is proportional to \(|C_i^n(p)|\times |C_j^n(p)|\). Thus, up to a time change, the connected components in G(n, p) behave as the multiplicative coalescent.
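As an illustration of this percolation-system point of view, the following Python sketch (our own encoding, not code from the references; a union–find structure replaces an explicit graph search) draws a single family of uniform edge weights and reads off the component sizes of G(n, p) for several values of p; since the weights are fixed once and for all, the components automatically coalesce as p grows.

```python
import itertools, random

def component_sizes(n, weights, p):
    """Sorted component sizes of G(n, p) built from a fixed family of
    edge weights: keep exactly the edges e with weights[e] <= p."""
    parent = list(range(n))
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x
    for (i, j), w in weights.items():
        if w <= p:
            ri, rj = find(i), find(j)
            if ri != rj:
                parent[ri] = rj
    sizes = {}
    for v in range(n):
        r = find(v)
        sizes[r] = sizes.get(r, 0) + 1
    return sorted(sizes.values(), reverse=True)

n = 200
rng = random.Random(1)
weights = {e: rng.random() for e in itertools.combinations(range(n), 2)}
# The same weights serve for every p: clusters can only merge as p grows.
for p in (0.5 / n, 1.0 / n, 2.0 / n):
    print(component_sizes(n, weights, p)[:3])
```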

To obtain a limit theorem for these connected component sizes as a time-indexed process, our approach uses ideas from the proof by Aldous [3] of the convergence at a fixed time. He encodes the connected components into a discrete random real-valued process whose convergence implies the convergence of the sizes of the connected components. To get a suitable limit theorem, the probability p has to be chosen inside the critical window, that is of the form \(p=p_\lambda (n)\), as defined in (3). The method of Aldous relies on a breadth-first traversal of the graph \(G(n,p_\lambda (n))\). It is easily seen that, in the context of the random graph G(n, p), the following “smallest-label-first” traversal has the same distribution, so that the results of Aldous [3] apply when using this modified algorithm. In the following, we call neighbourhood of a set of vertices S the collection of nodes that have an edge to a node in S, but are not themselves in S.

Algorithm 1

(Standard traversal) Traverse the vertices of a graph on [n] as follows:

  • Start at step \(k=1\) with node \(v_1=1\) and set \(S_1=\{v_1\}\).

  • At step \(k+1\in \{2,\dots , n\}\), the nodes \(v_1,\dots , v_k\) are already known, and we have \(S_k=\{v_1,\dots , v_k\}\). Let \(v_{k+1}\) be the node with smallest label among the neighbours of \(S_k\), or if the neighbourhood of \(S_k\) is empty, \(v_{k+1}\) is the node with smallest label in \([n]{\setminus } S_k\).

Denote by \(Z_k^{n,p_\lambda (n)}\) the size of the neighbourhood of \(S_k\) and set

$$\begin{aligned} Y_k^{n,p_\lambda (n)}= Z_k^{n,p_\lambda (n)} -\#\{j \le k:\, Z_j^{n,p_\lambda (n)}=0\}. \end{aligned}$$

Then, the sizes of the connected components of \(G(n,p_\lambda (n))\) are precisely the lengths of the intervals between the zeros of \((Z_k^{n,p_\lambda (n)}, 1\le k\le n)\) (see Section 1.3 of [3] and Lemma 14). Now define

$$\begin{aligned} y^{n,(\lambda )}(x):=\frac{Y^{n,p_\lambda (n)}(n^{2/3}x)}{n^{1/3}} \quad \text {and}\quad z^{n,(\lambda )}(x):=\frac{Z^{n,p_\lambda (n)}(n^{2/3}x)}{n^{1/3}}. \end{aligned}$$

Aldous [3] proved that, for any fixed \(\lambda \in {\mathbb {R}}\),

$$\begin{aligned} y^{n,(\lambda )} \xrightarrow [n]{(d)}\mathsf{y}^{(\lambda )}_\times \quad \text {and}\quad z^{n,(\lambda )}\xrightarrow [n]{(d)}\Psi \mathsf{y}^{(\lambda )}_\times , \end{aligned}$$
(4)

where the convergence holds for the topology of uniform convergence on every compact set.
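For concreteness, here is a short Python sketch of Algorithm 1 (our own encoding; the key function `rank` stands for the total order being used). It returns the neighbourhood sizes \(Z_k\); by Lemma 14, the component sizes are then the gaps between the zeros of \((0, Z_1,\dots ,Z_n)\).

```python
def traversal_Z(adj, rank):
    """Algorithm 1 on a graph given as a list of adjacency sets over
    vertices 0..n-1: visit the minimal unvisited vertex (for `rank`) of
    the neighbourhood of the visited set, or the minimal unvisited
    vertex overall when that neighbourhood is empty.  Returns the
    sequence Z = (Z_1, ..., Z_n) of successive neighbourhood sizes."""
    n = len(adj)
    visited, frontier, Z = set(), set(), []
    for _ in range(n):
        if frontier:
            v = min(frontier, key=rank)
        else:
            v = min(set(range(n)) - visited, key=rank)
        visited.add(v)
        frontier.discard(v)
        frontier |= adj[v] - visited
        Z.append(len(frontier))
    return Z

# Example: two triangles 0-1-2 and 3-4-5, with the standard label order.
adj = [{1, 2}, {0, 2}, {0, 1}, {4, 5}, {3, 5}, {3, 4}]
print(traversal_Z(adj, rank=lambda v: v))   # [2, 1, 0, 2, 1, 0]
```

The two zeros in the output delimit the two components, each of size 3.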

We propose to modify slightly the traversal of the graph in Algorithm 1: instead of using the order on labels to define the traversal, use the Prim order (see Sect. 4 for more details); that is, proceed as in Algorithm 1 but replace the two instances of “the node with smallest label” by “the node with smallest Prim rank”. Observe that the Prim order on G(n, p) is defined using the weights \((U_e)_{e\in E^n}\) only, and thus does not depend on p, unlike the order given by the standard traversal used by Aldous. In the following, we add the subscript “\(\times \)” in the notation for the random variables defined using this modified Prim traversal, and we set

$$\begin{aligned} \mathsf{y}^{n,(\lambda )}_\times (x) :=\frac{Y^{n,p_\lambda (n)}_\times (n^{2/3}x)}{n^{1/3}} \end{aligned}$$
(5)

where \(Y_\times \) is assumed to be interpolated between integer points. Observe that the superscript \((\lambda )\) in \(\mathsf{y}^{n,(\lambda )}_\times \) corresponds to the parameter \(p_\lambda (n)\) defined in (3). In the following, the processes \(\lambda \mapsto \mathsf{y}^{n,(\lambda )}_\times \) and \(\lambda \mapsto \mathsf{y}_\times ^{(\lambda )}\) are denoted more simply by \(\mathsf{y}_\times ^{n}\) and \(\mathsf{y}_\times \).

Note 6

The Prim order is always defined on connected weighted graphs. Given a set of (distinct) weights, there is a unique such order, which is total. It can then be used as any other total order on the vertices, and in particular, can be used to explore graphs which are not connected. For instance, in the case of G(np), the order is computed using the weights on the complete graph (which is connected), and is then used to replace the standard order in Algorithm 1. In particular, there is no ambiguity in deciding which node to explore next when a connected component is finished: just take the next in the Prim order (Fig. 2).

Fig. 2
A realisation of the weighted complete graph G on \([6]=\{1,2,\dots , 6\}\) where only the edges with weight at most 0.9 are drawn. On the right, only the edges e such that \(w_e\le 0.5\) are kept (this is \(G_{0.5}\)), and the \(u_i\)’s are the nodes sorted according to the Prim order. Notice that the connected components are intervals in the Prim order. In this example, \({\mathcal {Z}}^{0.5}(1)=\{u_2\}, {\mathcal {Z}}^{0.5}(2)=\{u_3\}, {\mathcal {Z}}^{0.5}(3)=\varnothing , {\mathcal {Z}}^{0.5}(4)=\{u_5\}, {\mathcal {Z}}^{0.5}(5)=\varnothing , {\mathcal {Z}}^{0.5}(6)=\varnothing \)

Theorem 7

The following convergence holds in \({\mathbb {D}}({\mathbb {R}}, {\mathbb {C}}([0,\infty ),{\mathbb {R}}))\),

$$\begin{aligned} \mathsf{y}^{n }_\times \xrightarrow [n]{(d)}\mathsf{y}_\times \end{aligned}$$
(6)

where \({\mathbb {C}}([0,\infty ),{\mathbb {R}})\) is the set of continuous functions from \([0,\infty )\) with values in \({\mathbb {R}}\).

The proof is postponed until Sect. 6 (and more details on the distribution of \((\mathsf{y}^{n,(\lambda )}_\times , \lambda \in {\mathbb {R}})\) are given in Sect. 6.1).

Observe that for a fixed \(\lambda \), the convergence (4) obtained by Aldous [3] implies that \(\mathsf{y}^{n,(\lambda )}_\times \rightarrow \mathsf{y}_\times ^{(\lambda )}\) in distribution, provided that we additionally prove that \(\mathsf{y}^{n,(\lambda )}_\times \) and \(y^{n,(\lambda )}\) have the same distribution, a fact that we prove in Lemma 15. We also provide a direct proof of the fixed-time convergence in Sect. 6.2.

Note 8

When we are talking about interpolated discrete processes and discrete coalescence, a slight modification in the definition of excursions has to be made in order to obtain an exact correspondence between the cluster sizes and the excursion sizes. For the excursions away from zero, “\(f(a)=f(b)=0\) and \(b=\inf \{t>a: f(t)=0\}\)” has to be replaced by “\(f(a)=f(b)=0\) and \(b=\inf \{t>a+\alpha _n: f(t)=0\}\)”, where \(\alpha _n\) is the size of a rescaled discrete step. The discrete excursions above the current minimum are defined by \(a=\min \{t: f(t)=f(a)\}\) and \(b=\min \{t: f(t)=f(b)\}\) with \(f(b)=f(a)-\beta _n\), where \(\beta _n\) is the space normalisation.

Note 9

In order to obtain exactly the (time-homogeneous) Markovian coalescent from the random graph process, one only needs to consider a new time parameter given by \(t=-\ln (1-p_\lambda (n))\). However, as \(n\rightarrow \infty \), \(-\ln (1-p_\lambda (n))\) and \(p_\lambda (n)\) behave similarly (to second order), and the study of the coalescent can be done using \(p_\lambda (n)\). We use \(p_\lambda (n)\) in order to stay closer to the random graph model, as did Aldous [3].

3.2 Additive coalescence processes

In the next three subsections, we treat the different combinatorial coalescence processes related to the additive coalescent. The main references here are [4, 8, 14, 16, 30, 31].

3.2.1 The combinatorial coalescence process \(\mathsf{CP}_+^{\textsc {ap}}\)

The following discussion relies on the results by Aldous and Pitman [4], see also Pitman [30, (ii)’, p. 170]. We define a process of random forests of unrooted labelled trees F(n, s), \(s\ge 0\), as follows. At time \(s=0\), the forest F(n, 0) consists of n trees \(t_1,\dots ,t_n\) where each \(t_i\) consists of the single vertex i. When the number of trees is m, wait an exponential time with parameter \(m-1\), then pick a pair of trees \((t_i,t_j)\) with \(1\le i<j\le m\) with probability \((|t_i|+|t_j|)/(n(m-1))\), and add an edge between a uniform node in \(t_i\) and a uniform node in \(t_j\). Considering only the rescaled tree sizes \(C^{n,s}=(C_i^{n,s}/n,i\ge 1)\) of the forest F(n, s) (sorted and completed by an infinite sequence of 0), we have

$$\begin{aligned} (C^{n,s},s\ge 0)\mathrel {\mathop {=}\limits ^{(d)}}(X_+^n(s), s\ge 0). \end{aligned}$$

Since any pair of trees coalesces with probability proportional to the sum of their sizes, we just need to check that the same time-scale arises in the additive coalescent. This is indeed the case, since in the additive coalescent, if m particles with masses \(x_i\), \(1\le i\le m\), are present, and if \(\sum _{1\le i\le m}x_i=1\), the first coalescence occurs after a time equal to the minimum of independent exponential random variables with parameters \(K(x_i,x_j)\), and \(\sum _{1\le i<j \le m}K(x_i,x_j)=\sum _{1\le i<j \le m}(x_i+x_j) =m-1.\) Thus in the present coalescent, if one takes \((E_i,1\le i\le n-1)\) a sequence of i.i.d. exponential random variables with parameter 1, then the number of coalescences before time t is

$$\begin{aligned} M^n(t)=\max \bigg \{m\ge 0: \sum _{j=1}^m \frac{E_j}{n-j}\le t\bigg \} \end{aligned}$$

and

$$\begin{aligned} n^{-1/2}\left(n-M^n\left(s+\frac{1}{2}\log n\right)\right)\rightarrow e^{-s}, \end{aligned}$$

where the convergence holds in \({\mathbb {D}}((-\infty ,+\infty ),{\mathbb {R}})\) (the convergence holds in fact uniformly on any compact \([-\lambda _\star ,\lambda ^\star ]\)).

3.2.2 The combinatorial coalescence process \(\mathsf{CP}_+^{\textsc {cl}}\)

We now present quickly the model and results of Chassaing and Louchard [14]. Assume n cars park on a circular parking lot, identified with \({\mathbb {Z}}/n{\mathbb {Z}}\), according to the following algorithm. Let \((\mathsf{Ch}_i,1\le i\le n)\) be a family of i.i.d. random variables uniform on \({\mathbb {Z}}/n{\mathbb {Z}}\). The cars park successively. When the first \(i-1\) cars have already parked, car i chooses place \(\mathsf{Ch}_i\) and parks at the first available place in the list \(\mathsf{Ch}_i\), \(\mathsf{Ch}_{i}+1\mod n\), \(\mathsf{Ch}_{i}+2\mod n,\dots \) Assume that m cars are parked and call a maximal sequence of adjacent occupied places a block. As explained in [14, Section 8], to get a suitable relation with the additive coalescent, the correct notion of size for a block is the number of cars consecutively parked plus one. This model coincides exactly with the Marcus–Lushnikov additive process up to a random time change in the following sense: the sorted sequence of block sizes in the parking model at time t has the same distribution as the sorted sequence of cluster sizes of the additive coalescent starting with n particles after t coalescences, and this is true as a process indexed by t.
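The parking dynamics is elementary to simulate. The following Python sketch (our own minimal encoding) parks m cars on \({\mathbb {Z}}/n{\mathbb {Z}}\) and extracts the block sizes with the “number of cars consecutively parked plus one” convention of [14, Section 8].

```python
import random

def park(n, choices):
    """Circular parking on Z/nZ: car i first tries place choices[i], then
    scans to the right (mod n) until it finds a free place."""
    occupied = [False] * n
    for c in choices:
        while occupied[c]:
            c = (c + 1) % n
        occupied[c] = True
    return occupied

def block_sizes(occupied):
    """Block sizes, with the convention 'number of consecutively parked
    cars plus one'; assumes at least one place is still empty."""
    n = len(occupied)
    start = occupied.index(False)      # scan starting from an empty place
    sizes, run = [], 0
    for k in range(n):
        if occupied[(start + k) % n]:
            run += 1
        elif run:
            sizes.append(run + 1)
            run = 0
    if run:
        sizes.append(run + 1)
    return sorted(sizes, reverse=True)

n, lam = 400, 2.0
rng = random.Random(7)
m = n - int(lam * n ** 0.5)            # m = n - lambda * sqrt(n) cars
choices = [rng.randrange(n) for _ in range(m)]
print(block_sizes(park(n, choices))[:5])
```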

However, since only blocks of cars that are separated by a single empty place may merge, the parking evolution is different from the Marcus–Lushnikov process in which any pair of blocks may merge. The link holds only at the level of block sizes, and may be understood as follows: by a simple counting/symmetry argument, it may be proved that at any time t, the distribution of the sequence of block sizes in the parking lot, sorted now according to their position around the circle, is exchangeable. To understand the dynamics of the sorted sequence of block sizes after t coalescences, one may condition on the block size sequence \((C_i,1\le i \le n-t)\) at time t. Under this conditioning, the distribution of the successive blocks in the parking lot is a uniform random permutation of the sequence \((C_i,1\le i \le n-t)\). From this observation, it may be proved that the following coalescence will be between a block of size i and a block of size j with probability proportional to \(K(i,j)=i+j\) multiplied by \(n_in_j\), where \(n_i\) is the number of blocks of size i. The exchangeability is not lost after a coalescence, which implies the coincidence of the Markov processes of the cluster sizes in the Marcus–Lushnikov additive process on the one hand, and the block sizes in the parking lot on the other.

When \(m=m(n)=\lfloor n-\lambda \sqrt{n}\rfloor \) cars are parked, the large n asymptotic evolution of the sizes of these blocks (sorted in decreasing order)

$$\begin{aligned} B^{n,\lambda }:=\frac{1}{n}(B^{n,\lambda }_i,i\ge 0) \end{aligned}$$

is given by the standard additive coalescent up to a time change. Let us be more precise about this time change: the time \(\lfloor n-\lambda \sqrt{n}\rfloor \) coincides with the number of coalescences performed. From what we said above about the additive coalescent, this occurs at a random time of order \(t+(1/2)\log (n)\) for t such that \(\exp (-t)=\lambda \), so that for a fixed \(\lambda >0\) one can prove

$$\begin{aligned} B^{n,\lambda }\xrightarrow [n]{(d)}X^{\infty }(-\ln (\lambda )). \end{aligned}$$

More precisely, Chassaing and Louchard obtained in [14, Theorem 1.3] the convergence of the sizes of the k largest blocks to that of the k largest excursions of \(\Psi \mathsf{y}_+^{(\lambda )}\).

3.2.3 The new combinatorial coalescence process \(\mathsf{CP}_+\)

Our new combinatorial coalescence process is at the same time very close to that of Chassaing and Louchard [14] and to that of Pitman (Sect. 3.2.1 above). The new idea of Prim’s order makes the connection between these two models very clear, in a way that is both different from the one discussed in [14, Section 8], and similar to our approach to the multiplicative coalescent.

Although the intuition comes from the percolation model on the uniformly random labelled tree, it is convenient for the proofs to construct the process \(\mathsf{CP}_+\) as follows. The connections with the parking model and the fragmentation on trees are made later on in Sect. 3.2.4. Consider a sequence of i.i.d. Poisson random variables \((X(i),1\le i \le n)\) with parameter one, and associate to this sequence the random walk

$$\begin{aligned} Y(m)=\sum _{j=1}^m (X(j)-1),\quad 0\le m \le n. \end{aligned}$$

Denote by \(\tau _{-1}=\inf \{ m~:~Y(m)=-1\}\) the hitting time of \(-1\).

Further, denote by \((Y_+^n(m),0\le m \le n)\) the random walk Y conditioned on \(\tau _{-1}=n\), and by \((X_+^{n}(i),1\le i \le n)\) the corresponding increments plus one (that is, the X(i) under the condition that \(\tau _{-1}=n\)). Now introduce an array \((U_{k}(\ell ),1\le k \le n, 1\le \ell \le n)\) of i.i.d. \(\mathsf{uniform}[0,1]\) random variables and, conditionally on the family \((X^{n}_+(i),1\le i \le n)\), define the family \((X^{n,(t)}_+(i), 0\le t \le 1, 1\le i \le n)\) by

$$\begin{aligned} X_+^{n,(t)}(i)=\sum _{j=1}^{X_+^{n}(i)} \mathbf{1}_{U_{i}(j)\le t}, \quad {\text { for any }}\,\, t\in [0,1],\,\, 1\le i \le n. \end{aligned}$$

Hence, \(X^{n,(0)}_+(i)=0\), \(X^{n,(1)}_+(i)=X^{n}_+(i)\) and for any i, \(t\mapsto X_+^{n,(t)}(i)\) is non-decreasing. Define

$$\begin{aligned} Y^{n,(t)}_+(m)= \sum _{j=1}^m (X^{n,(t)}_+(j)-1),\,\,0\le m \le n, \,\,t\in [0,1]. \end{aligned}$$

From now on, consider that \((Y^{n,(t)}_+(m),0\le m \le n)\) is a continuous process in the variable m, obtained by linear interpolation between integer points. Set

$$\begin{aligned} \mathsf{y}_+^{n,(\lambda )}(x)=\frac{Y^{n, (1-\lambda /\sqrt{n})}_+(nx)}{\sqrt{n}}, \quad x\in [0,1],\,\, \lambda \ge 0. \end{aligned}$$
(7)

The processes \((\lambda ,x)\mapsto \mathsf{y}^{n,(\lambda )}_+(x)\) and \((\lambda ,x)\mapsto \mathsf{y}_+^{(\lambda )}(x)\) are denoted more simply by \(\mathsf{y}_+^{n}\) and \(\mathsf{y}_+\) in the sequel. They are seen as random variables taking their values in \({\mathbb {D}}({\mathbb {R}}^+,{\mathbb {C}}([0,1],{\mathbb {R}}))\). In words, for fixed \(\lambda \), \(\mathsf{y}_+^{(\lambda )}\) is a random variable in \({\mathbb {C}}([0,1],{\mathbb {R}})\); seen as a process in \(\lambda \), it is right-continuous with left limits. From what we said earlier, \(\lambda \mapsto \mathsf{y}_+^{n,(\lambda )}\) is non-increasing in \(\lambda \).
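The family \((Y^{n,(t)}_+)\) is straightforward to simulate from these definitions. In the Python sketch below, the conditioning on \(\tau _{-1}=n\) is implemented by rejection on the total sum combined with the classical cycle lemma (a standard device which we use here only for illustration; it is not the route taken in the proofs), and the thinning by the uniforms \(U_i(j)\) is then applied directly.

```python
import math, random

def poisson1(rng):
    """Poisson(1) sample: count the uniforms whose product stays above 1/e."""
    k, p = 0, rng.random()
    while p > math.exp(-1):
        p *= rng.random()
        k += 1
    return k

def conditioned_increments(n, rng):
    """(X(1),...,X(n)) i.i.d. Poisson(1) conditioned on tau_{-1} = n:
    reject until the sum is n - 1, then rotate the increments just after
    the first minimum of the walk (cycle lemma)."""
    while True:
        X = [poisson1(rng) for _ in range(n)]
        if sum(X) == n - 1:
            break
    s, best, m_star = 0, 0, 0
    for m, x in enumerate(X, start=1):
        s += x - 1
        if s < best:
            best, m_star = s, m
    return X[m_star:] + X[:m_star]

def thinned_walk(X, t, U):
    """Y_+^{n,(t)}: keep each of the X(i) units with probability t, using
    the array U of uniforms, and return the walk of the (X^{(t)}(i) - 1)."""
    walk, s = [0], 0
    for i, x in enumerate(X):
        s += sum(1 for j in range(x) if U[i][j] <= t) - 1
        walk.append(s)
    return walk

n, rng = 50, random.Random(3)
X = conditioned_increments(n, rng)
U = [[rng.random() for _ in range(max(X))] for _ in range(n)]
print(thinned_walk(X, 1.0, U)[-1])    # always -1: at t = 1 we recover Y_+^n
```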

For \(\lambda =0\), \(Y_+^{n,(0)}\) is just a random walk conditioned to hit \(-1\) at time n. By a generalisation of Donsker’s invariance principle [15], see for instance [21, 26], we have

$$\begin{aligned} \mathsf{y}^{n,(0)}_+\xrightarrow [n]{(d)}\mathsf{y}^{(0)}_{+}, \end{aligned}$$
(8)

in \({\mathbb {C}}([0,1],{\mathbb {R}})\) equipped with the topology of uniform convergence (recall that \(\mathsf{y}^{(0)}_{+}=\mathsf{e}\)). The proof of the next theorem is postponed until Sect. 5.

Theorem 10

The following convergence holds in \({\mathbb {D}}({\mathbb {R}}^+, {\mathbb {C}}([0,1],{\mathbb {R}}))\),

$$\begin{aligned} \mathsf{y}^{n}_+ \xrightarrow [n]{(d)}\mathsf{y}_+. \end{aligned}$$

3.2.4 Between a parking scheme and fragmentation of Cayley trees

The parking scheme point of view on \(\mathsf{CP}_+\). The construction in Sect. 3.2.3 may be interpreted as follows in terms of a parking scheme: X(i) is the number of cars whose first choice is place i and that park at the first empty place to the right of i. The condition \(\tau _{-1}=n\) amounts to saying that, in the end, the place n is still empty (see [14] for more details). The random variable \(X_+^{n,(t)}(i)\) represents the number of cars that have chosen place i by time t. Observe that conditionally on \(\tau _{-1}=n\), \(\sum _{i=1}^n X(i)=n-1\), so that \(\sum _{i=1}^n X^{n,(t)}_+(i)\) is binomial with parameters \(n-1\) and t. In particular, this is random, unlike in [14]. Hence, at time \(t=1-{\lambda }/{\sqrt{n}}\) the number of coalescences that have already occurred, denoted by \(N^{n,\lambda }\), is binomial with parameters \(n-1\) and \(1-\lambda /{\sqrt{n}}\), and it follows that

$$\begin{aligned} W_{n,\lambda }:=\frac{n-N^{n,\lambda }}{\sqrt{n}}\xrightarrow [n]{(d)}\lambda , \end{aligned}$$

the convergence holding in distribution in \({\mathbb {D}}([0,\lambda _\star ], {\mathbb {R}})\) for any \(\lambda _\star \), since the convergence is uniform on any compact set. Indeed one may check that, as a process on \([0,\lambda _\star ]\), we have \(W_{n,\lambda }=n^{-1/2}\sum _{i=1}^{n-1} 1_{U^{(i)}\le \lambda /\sqrt{n}}\) for some \((U^{(i)}, i\ge 1)\) i.i.d. uniform on [0, 1]. The process \(\lambda \mapsto W_{n,\lambda }\) is non-decreasing, and its finite-dimensional distributions converge to those of the deterministic process \((\lambda , \lambda \ge 0)\) (the mean converges, and the variance goes to 0), and thus the convergence holds almost surely (a.s.) on any compact set (see [26, Appendix] if more details are needed).

The convergence of this time change between our model and the discrete coalescence process, together with the convergence of the excursion sizes (as a process in \(\lambda \)), are the main tools to obtain the convergence to the additive coalescent.

The percolation point of view. Consider a uniform Cayley tree with n vertices (a uniformly random labelled tree on [n]), and root it at the vertex labelled 1. For a node in [n], let its out-degree be the number of its neighbours that are further from the root. Then, it is folklore that the sequence of node out-degrees \((d_i,1\le i\le n)\), where the nodes are sorted according to the breadth-first order, is distributed as \((X(i),1\le i \le n)\) conditionally on \(\tau _{-1}=n\), as described above. The random walk \(Y^{n,(0)}_+\) appears in the literature as the Łukasiewicz walk associated with a uniform Cayley tree (see, e.g., [23]). Now, equip the edges of the Cayley tree with i.i.d. uniform weights \((U_e,e \in E)\) (independently of the tree) and keep the edges with weight smaller than t, discarding the others. One then obtains a forest. In this forest \(F_t\), let \(d_i(t)\) denote the out-degree of the node that had rank i in the original Cayley tree (so that \(d_i(1)=d_i\)). Then clearly,

$$\begin{aligned} (d_i(t),~1\le i \le n)_{t\in [0,1]}\mathrel {\mathop {=}\limits ^{(d)}}(X^{n,(t)}_+(i),~1\le i \le n)_{t\in [0,1]}. \end{aligned}$$
(9)
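A quick way to experiment with the identity (9) is to generate a uniform Cayley tree from a random Prüfer sequence, percolate its edges, and read the out-degrees in breadth-first order; the Python sketch below (encodings are ours) does exactly this.

```python
import heapq, random
from collections import deque

def cayley_tree(n, rng):
    """Uniform labelled tree on {0,...,n-1} decoded from a random
    Pruefer sequence (n >= 2); returns the list of n-1 edges."""
    seq = [rng.randrange(n) for _ in range(n - 2)]
    deg = [1] * n
    for v in seq:
        deg[v] += 1
    leaves = [v for v in range(n) if deg[v] == 1]
    heapq.heapify(leaves)
    edges = []
    for v in seq:
        leaf = heapq.heappop(leaves)
        edges.append((leaf, v))
        deg[v] -= 1
        if deg[v] == 1:
            heapq.heappush(leaves, v)
    edges.append((heapq.heappop(leaves), heapq.heappop(leaves)))
    return edges

def outdegrees_bfs(n, edges, weights, t):
    """Out-degrees d_i(t): for each node, taken in breadth-first order
    of the full tree rooted at 0, count its children joined to it by an
    edge of weight <= t."""
    adj = [[] for _ in range(n)]
    for (u, v), w in zip(edges, weights):
        adj[u].append((v, w)); adj[v].append((u, w))
    parent, order, queue = {0: None}, [], deque([0])
    while queue:
        u = queue.popleft()
        order.append(u)
        for v, w in adj[u]:
            if v not in parent:
                parent[v] = u
                queue.append(v)
    kept = [0] * n   # kept[u] = children of u whose edge has weight <= t
    for (u, v), w in zip(edges, weights):
        if w <= t:
            kept[u if parent.get(v) == u else v] += 1
    return [kept[u] for u in order]

n, rng = 10, random.Random(4)
edges = cayley_tree(n, rng)
weights = [rng.random() for _ in edges]
d = outdegrees_bfs(n, edges, weights, t=0.5)
print(d, sum(x - 1 for x in d))   # at t = 1 the second value would be -1
```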

The following proposition is a consequence of Lemma 16 (and Lemma 15):

Proposition 11

Let T be a uniform Cayley tree on n vertices whose edges are (independently) equipped with i.i.d. uniform [0, 1] weights. Let \((u_i,1\le i \le n)\) be the nodes sorted according to the Prim order (when the root node is \(u_1=1\)), and let \(X^t(i)\) be the number of edges between \(u_i\) and its children that have a weight at most t. Then

$$\begin{aligned} (X^t(1),\dots ,X^t(n))_{t\in [0,1]}\mathrel {\mathop {=}\limits ^{(d)}} (X_+^{n,(t)}(1),\dots ,X_+^{n,(t)}(n))_{t\in [0,1]}. \end{aligned}$$

As a consequence the collection of excursion sizes of \(\Psi Y^{n,(t)}_+\) at time t evolves, up to a time change, as the additive coalescent.

It is classical that a tree (or a forest) can be encoded by a Łukasiewicz walk, which records the sequence of node out-degrees: \(Y^{n,(t)}(j)=\sum _{k=1}^j (d_k(t)-1)\). As a consequence, the sequence of sizes of the trees in \(F_t\), sorted using the Prim order, corresponds to the sequence of excursion sizes of \(\Psi Y^{n,(t)}_+\), and this property holds as a process indexed by t. This makes a connection between the results by Aldous and Pitman [4] and our representation of the additive coalescent, and explains again the fact that the additive coalescent can be linearised.

4 Prim’s order and linear representations of coalescents

This part presents the main new idea underlying this work.

4.1 Prim’s algorithm and coalescents

In this section, we assume that an integer \(n\ge 2\) is fixed. Let \(G=([n],E)\) be any connected graph whose edges are marked by weights \(\mathbf{w}=(w_e,e \in E)\in [0,1]^{\#E}\). The pair \((G,\mathbf{w})\) is said to be properly weighted if the weights are distinct and positive.

Prim’s algorithm (or the Prim–Jarník algorithm) is an algorithm which associates with any properly weighted graph \((G,\mathbf{w})\) its unique minimum spanning tree, the spanning tree of G that minimises the sum of the weights of its edges. It also defines a total order \(\prec \) on the set of vertices. Let us describe the nodes \(u_1,\dots ,u_n\) satisfying \(u_1\prec u_2\prec \dots \prec u_n\). We will use below the notation \(V_{i}\) for the set \(\{u_1,\dots ,u_i\}\).

First set \(u_1=1\) and \(V_1=\{u_1\}\). Assume that for some \(1\le i \le n-1\), the nodes \(u_1,\dots ,u_i\) have been defined. Consider the set of weights \(\{w_{\{a,b\}} ~|~a\in V_i, b\notin V_i\}\) of edges between a vertex of \(V_i\) and another outside of \(V_i\). Since all weights are distinct, the minimum is reached at a single pair \((a^\star ,b^\star )\in V_i \times \complement V_i\) (where \(\complement V_i\) denotes the complement of \(V_i\)). Set \(u_{i+1}=b^\star \). This iterative procedure completely determines the Prim order \(\prec \). If one sets additionally \(\pi _{i+1}=a^\star \), a classical result (not used in the paper) is that the minimum spanning tree is the tree on [n] with set of edges \(\{(\pi _i, u_i): 2\le i\le n\}\).
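The iterative procedure just described is a small variant of the textbook Prim algorithm. The following Python sketch (with vertex set \(\{0,\dots ,n-1\}\) instead of [n], and helper names of our own) returns the vertices in Prim order.

```python
import itertools, random

def prim_order(n, w):
    """Prim order on a properly weighted connected graph on {0,...,n-1}:
    `w` maps frozenset({a, b}) to the (distinct) weight of edge {a, b}.
    Starting from vertex 0, repeatedly add the endpoint of the lightest
    edge leaving the current set V_i."""
    order = [0]
    # best[b] = lightest weight among edges between b and the current set
    best = {b: w.get(frozenset({0, b}), float("inf")) for b in range(1, n)}
    while best:
        b_star = min(best, key=best.get)   # endpoint of the lightest outgoing edge
        order.append(b_star)
        del best[b_star]
        for b in best:
            e = frozenset({b_star, b})
            if e in w and w[e] < best[b]:
                best[b] = w[e]
    return order

rng = random.Random(5)
n = 6
w = {frozenset(e): rng.random() for e in itertools.combinations(range(n), 2)}
print(prim_order(n, w))   # the analogue of u_1, ..., u_6 in Fig. 2, as 0..5
```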

Definition 12

We say that a set of nodes \(\{v_1,\dots ,v_\ell \}\) forms a Prim interval if \(\{v_1,\dots ,v_\ell \}=\{u_{i}, i \in \llbracket a,a+\ell -1\rrbracket \}\) for some a, that is, if their Prim ranks are consecutive. Any Prim interval can be written as \(V_j{\setminus } V_i\) for some pair (i, j).

Given a properly weighted graph \((G,\mathbf{w})\), for any \(t \in [0,1]\), set \(E_t(\mathbf{w})=\{e\in E: w_e\le t\}\) and let \(G_t(\mathbf{w})=([n],E_t)\) be the graph whose edges are the edges of E with weight at most t. The next proposition, which seems to be folklore in graph theory, is of prime importance to us (see an illustration in Fig. 2). In the sequel we write \(E_t\) and \(G_t\) for short, the weights being clear from the context.

Proposition 13

Let \((G,\mathbf{w})\) be a properly weighted graph. For any \(t\in [0,1]\), all the connected components of \(G_t\) are Prim intervals. As a consequence, the coalescence of connected components arising when t increases corresponds to coalescence of consecutive Prim intervals.

Proof

Only the first statement needs to be proved. The graph \(G_t\) is non-decreasing for \(t\in [0,1]\), and since by hypothesis the weights are distinct and non-zero, we have \(E_0=\varnothing \) and \(E_1=E\). The finite set of weights/times \(\{w_e, e\in E\}\) contains the jump times of the function \((G_t, 0\le t \le 1)\), and exactly \(n-1\) of these times \(t_1,\dots ,t_{n-1}\) modify the number of connected components. So the result needs only be checked at these times. For \(t=t_0=0\) the result holds. Assume that, at time \(t_k\) for some \(0\le k \le n-2\), all the connected components are consecutive intervals \((I_1,\dots ,I_\ell )\) with \(I_j=[a_j,b_j]\) and \(a_{j+1}=b_j+1\). Denote by e the edge which is added at time \(t_{k+1}\). By hypothesis, adding it decreases the number of connected components, and so its endpoints lie in two distinct intervals \(I_{x}=[a_x,b_x]\) and \(I_{y}=[a_y,b_y]\) for some \(x<y\). If \(y=x+1\) we are done, so assume for a contradiction that \(y>x+1\). The weight \(w_e\) is smaller than all those of the edges still missing at time \(t_{k+1}\); in particular, it is smaller than all the weights of the edges between \(\cup _{i\le x} I_i\) and \(I_{x+1}\). But this is impossible, since it contradicts the fact that the vertices are sorted according to the Prim order: indeed, by definition, the extremity of the lightest edge out of \(\cup _{i\le x} I_i\) is \(a_{x+1}\in I_{x+1}\). \(\square \)

Denote by \(\mathsf{Neigh}^G(v)=\{u~: \{u,v\}\in E\}\) the set of neighbours of v in G. For a set of nodes S, let also \(\mathsf{Neigh}^G(S)=(\bigcup _{v \in S} \mathsf{Neigh}^G(v) ){\setminus } S\) be the set of neighbours of S (outside of S). The study by Aldous [3] of the multiplicative coalescent relies on an exploration of the graph and an encoding of the process \(i\mapsto \#\mathsf{Neigh}^G(S_i)\), for an increasing collection of sets \((S_i)_{1\le i\le n}\) that are built by a breadth-first search algorithm. The modified Algorithm 1 uses the standard order on the nodes instead of breadth-first search, but here we investigate the influence of the order on [n], hereafter denoted by \(<\), that is used in building the sets \((S_i)_{1\le i\le n}\). The exploration is as follows.

The first visited node \(v_1\) is the smallest one for the order \(<\). Assume now that we have visited \(S_k=\{v_1,\dots , v_{k}\}\) at some time \(1\le k\le n\). Then two cases arise:

  • if \(\mathsf{Neigh}^G(S_k)\ne \varnothing \), then \(v_{k+1}\) is the smallest node for \(<\) in \(\mathsf{Neigh}^G(S_k)\), or

  • if \(\mathsf{Neigh}^G(S_k)= \varnothing \), then \(v_{k+1}\) is the smallest node for \(<\) in \([n]{\setminus } S_k\).

In the exploration used by Aldous, the labels of the nodes \(1,2,\dots ,n\) are compared using the standard order \(<\) on \({\mathbb {N}}\). The exploration clearly depends on the order \(<\), and the notation should reflect this fact. For example, we could write \(v_{k}(<)\) and \(S_k(<)\) instead of \(v_k\) and \(S_k\). We will sometimes use this enriched notation later on, and for \(0\le k \le n+1\) we also use the more compact notation

$$\begin{aligned} Z^{g}_{<}(k)=\#\mathsf{Neigh}^{g}(S_k(<)) \end{aligned}$$

where by convention \(Z^{g}_{<}(0)=Z^{g}_{<}(n+1)=0\).

The proof of the following lemma is immediate. For a graph \(g=([n],E)\), denote by \(\mathsf{CC}(g)\) the set of connected components of g, seen as a partition of [n].

Lemma 14

Let \(g=([n],E)\) be a graph (connected or not). For any total order \(<\) on the set of nodes [n],

$$\begin{aligned} \#\{k \in \{0,\dots ,n\}~:Z^g_{<}(k)=0\}=\#\mathsf{CC}(g), \end{aligned}$$

and the successive sizes of the connected components ordered by the exploration coincide with the distances between successive zeros in the sequence \((Z^g_{<}(k),0\le k \le n+1)\).

In the following we will call Prim exploration the exploration based on the Prim order \(\prec \). Unlike the standard exploration (the exploration defined in terms of the standard order \(<\) on \(\mathbb {N}\)), it is defined only on properly weighted graphs \((G,\mathbf{w})\).

4.2 Prim traversal versus standard traversal

Take a random graph \(\mathbf{G}=([n],E)\) whose edges are equipped with i.i.d. uniform [0, 1] weights. Let us examine the similarities and the differences between the standard exploration (using the order \(<\)) and the Prim exploration of the random graph \(\mathbf{G}_t\). By Lemma 14, the multisets of excursion lengths of \(Z^{\mathbf{G}_t}_<\) and \(Z^{\mathbf{G}_t}_{\prec }\) are the same, but in general the paths \(Z^{\mathbf{G}_t}_<\) and \(Z^{\mathbf{G}_t}_{\prec }\) do not have the same distribution. We already said that the distributions of the corresponding processes in t were different (by Proposition 13), but this is also true for fixed t (even if the distribution of \(\mathbf{G}\) is invariant under random permutation of the node labels). The reason is that during the exploration, the Prim order favours the nodes with many edges towards the current explored set, since the order is defined using the weights of the edges. Here is an example illustrating this. Consider G the graph with vertices \(\{1,a_1,a_2,a_3\}\) and edges \((1,a_1), (1,a_2), (1,a_3), (a_2,a_3)\). Conditionally on \(\mathbf{G}_t=G\), one sees that for the standard exploration, under a random labelling preserving 1, one visits \(1,a_2,a_3,a_1\) in that order with probability 1/6. However, under the Prim exploration, it is easy to check that, among the 24 possible orderings of the weights \((w_e,e\in E)\), six of them give this order, so that the probability is 1/4.
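This count is small enough to verify by brute force; here is a possible Python enumeration of the 24 weight orderings (our encoding of the example).

```python
from itertools import permutations

# Edges of the example graph on {1, a1, a2, a3} (nodes encoded as strings).
edges = [("1", "a1"), ("1", "a2"), ("1", "a3"), ("a2", "a3")]

def prim_exploration(weights):
    """Prim exploration of the (connected) example graph started at "1"."""
    seen, order = {"1"}, ["1"]
    while len(order) < 4:
        # lightest edge with exactly one endpoint in the explored set
        cand = [(w, e) for e, w in weights.items()
                if len(set(e) & seen) == 1]
        _, (u, v) = min(cand)
        nxt = v if u in seen else u
        seen.add(nxt)
        order.append(nxt)
    return order

count = sum(prim_exploration(dict(zip(edges, p))) == ["1", "a2", "a3", "a1"]
            for p in permutations([1, 2, 3, 4]))
print(count, "/ 24")   # prints 6 / 24, i.e. probability 1/4
```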

There are, however, some special cases, including when \(\mathbf{G}\) is a uniform Cayley tree or the complete graph, for which the distributions of \(Z^{\mathbf{G}_t}_<\) and \(Z^{\mathbf{G}_t}_{\prec }\) are the same.

Lemma 15

Let \((\mathbf{G},W)=(([n],E),W)\) be a rooted weighted random graph. Assume that, for any k, the distribution of \(\#\mathsf{Neigh}^{\mathbf{G}_t}(S_{k+1}(\prec ))\) knowing \((S_k,\mathsf{Neigh}^{\mathbf{G}_t}(S_k(\prec )))\) is

  • independent of the weights \(w_e\) on the edges between \(S_k\) and \(\mathsf{Neigh}^{\mathbf{G}_t}(S_k(\prec ))\),

  • the same as the distribution of \(\#\mathsf{Neigh}^{\mathbf{G}_t}(S_{k+1}(\prec ))\) knowing \((S_k,\#\mathsf{Neigh}^{\mathbf{G}_t}(S_k(\prec )))\),

then \(Z^{\mathbf{G}_t}_\prec \) and \(Z^{\mathbf{G}_t}_{<}\) have the same distribution.

Proof

We prove by induction on \(k\ge 0\) that, under the conditions of the lemma, for any \(t\in [0,1]\), \((Z_\prec ^{\mathbf {G}_t}(i))_{0\le i\le k}\) and \((Z_<^{\mathbf {G}_t}(i))_{0\le i\le k}\) have the same distribution. The base case \(k=0\) is clear. Suppose now that this holds up to some integer k. Then, by Skorokhod’s representation theorem, we can find a coupling for which \((Z_\prec ^{\mathbf {G}_t}(i))_{0\le i\le k}\) and \((Z_<^{\mathbf {G}_t}(i))_{0\le i\le k}\) are a.s. equal. Now, the distribution of \(\#\mathsf{Neigh}(S_{k+1})\) conditional on \((S_k,\#\mathsf{Neigh}(S_k))\) is the same as the one conditional on \((S_k, \mathsf{Neigh}(S_k))\), for both orders. Furthermore, since this distribution is independent of the weights between \(S_k\) and \(\mathsf{Neigh}(S_k)\), it is not affected when we modify them in such a way that the end point of the lightest edge is the node of minimum label in \(\mathsf{Neigh}(S_k)\). But in this modified version we have \(Z_\prec ^{\mathbf {G}_t}(k+1)=Z_<^{\mathbf {G}_t}(k+1)\) with probability one, so that \((Z_\prec ^{\mathbf {G}_t}(i))_{0\le i\le k+1}\) and \((Z_<^{\mathbf {G}_t}(i))_{0\le i\le k+1}\) have the same distribution, which completes the induction step. \(\square \)

Lemma 16

The following models both satisfy the hypotheses (and hence the conclusion) of Lemma 15:

(a):

The complete graph \(\mathbf{G}=K_n\) on [n] whose edges are weighted with i.i.d. uniforms on [0, 1].

(b):

\(\mathbf{G}\) a uniform Cayley tree on n nodes whose edges are equipped with i.i.d. weights uniform on [0, 1].

Case (b) can easily be extended to Galton–Watson trees conditioned to have size n, but we omit the details (see the proof below).

Proof

Throughout the proof there is no risk of confusion, so we write \(S_k\) instead of \(S_k(\prec )\) and drop the superscript referring to the graph we are working on.

(a) The case of the complete graph is straightforward: whatever the node \(v_{k+1}\), the distribution of \(\#\mathsf{Neigh}(S_{k+1})\) given \((S_k, \mathsf{Neigh}(S_k))\) is always the same and is that of

$$\begin{aligned} \#\mathsf{Neigh}(S_k)-\mathbf {1}_{ \{ \#\mathsf{Neigh}(S_k)>0 \} } + \mathsf{Bin}\left(n-k-\mathbf {1}_{ \{ \#\mathsf{Neigh}(S_k)=0 \} }-\#\mathsf{Neigh}(S_k),\, t\right). \end{aligned}$$

In particular, it is independent of the weights on the edges between \(S_k\) and \(\mathsf{Neigh}(S_k)\). Also, this distribution is the same conditionally on \((S_k, \#\mathsf{Neigh}(S_k))\).

(b) The case of the Cayley tree is based on the invariances of the distribution of the tree. Observe first that the percolated tree \(\mathbf {G}_t\) is distributed as a forest of Cayley trees, which may each be seen as rooted at the node of smallest label. Now, condition on \((S_k, \#\mathsf{Neigh}(S_k))\). If \(\#\mathsf{Neigh}(S_k)=0\), then the claim clearly holds. Otherwise, the distribution of \(\#\mathsf{Neigh}(S_{k+1})\) is that of

$$\begin{aligned} \#\mathsf{Neigh}(S_k)-1 + \mathsf{Bin}(D_{k+1}, t), \end{aligned}$$

where \(D_{k+1}+1\) denotes the degree of \(v_{k+1}\), the next node to be visited. However, the subtrees rooted at the nodes of \(\mathsf{Neigh}(S_k)\) are exchangeable, and since the weights are independent of the tree, we see that the distribution of \(D_{k+1}\) is independent of the weights between \(S_k\) and \(\mathsf{Neigh}(S_k)\). Furthermore, the distribution conditionally on \(S_k\) and \(\mathsf{Neigh}(S_k)\) is the same, since we may permute the sets of children of the nodes in \(\mathsf{Neigh}(S_k)\) without altering the conditional distribution. \(\square \)

5 Encoding the additive coalescent: proof of Theorem 10

5.1 Finite-dimensional distributions

We start with the proof of the convergence of the finite-dimensional distributions (fdd).

Lemma 17

For any integers \(k\ge 1\) and \(\ell \ge 1\), and for any \(s_1,s_2, \dots , s_k\in [0,1]\) and \(\lambda _1,\lambda _2, \dots , \lambda _\ell \ge 0\), we have

$$\begin{aligned} (\mathsf{y}^{n,(\lambda _j)}_+(s_i))_{1\le i\le k, 1\le j\le \ell } \xrightarrow [n]{(d)}(\mathsf{y}^{(\lambda _j)}_+(s_i))_{1\le i\le k, 1\le j\le \ell }. \end{aligned}$$

We start with two bounds that will be used throughout the proof. First, for any \(\varepsilon >0\),

$$\begin{aligned} {\mathbb {P}}\left(\max _{1\le i \le n} X^{n,(1)}_+(i)\ge n^{\varepsilon }\right)=O( \exp (-n^{\varepsilon /2})), \end{aligned}$$
(10)

as \(n\rightarrow \infty \). To see this, observe that the \(X^{n,(1)}_+(i)\) are Poisson(1) random variables conditioned to satisfy \(\tau _{-1}=n\), and we have \({\mathbb {P}}(\tau _{-1}=n) \sim c n^{-3/2}\) as \(n\rightarrow \infty \). So the conditioning may be removed at the expense of a factor \((cn^{-3/2})^{-1}\), the union bound brings another factor n, and then \({\mathbb {P}}(\mathsf{Poisson}(1)\ge n^{\varepsilon })\le \min \{ e^{-1+e^{s}-s n^{\varepsilon }}~:s>0\}\) by the standard Chernoff bounding method; the computation is spelled out below. The case \(\lambda =1\) provides the bound in (10). Furthermore, as a consequence of the weak convergence in \({\mathbb {C}}([0,1],{\mathbb {R}})\) stated in (8), we have

$$\begin{aligned} \Vert \mathsf{y}^{n,(0)}_+\Vert _\infty \rightarrow \Vert \mathsf{y}^{(0)}_+\Vert _\infty , \end{aligned}$$
(11)

and \(\Vert \mathsf{y}^{(0)}_+\Vert _\infty <\infty \) with probability one.
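For completeness, the Chernoff step behind (10) may be spelled out as follows (an elementary computation; the choice \(s=1\) in the infimum is already enough):

$$\begin{aligned} {\mathbb {P}}(\mathsf{Poisson}(1)\ge n^{\varepsilon })\le \inf _{s>0} e^{-1+e^{s}-sn^{\varepsilon }}\le e^{e-1}e^{-n^{\varepsilon }}, \qquad \text{so that}\qquad \frac{n}{cn^{-3/2}}\, e^{e-1}e^{-n^{\varepsilon }}=O(\exp (-n^{\varepsilon /2})), \end{aligned}$$

since \(n^{5/2}\exp (-n^{\varepsilon })\le \exp (-n^{\varepsilon /2})\) for all n large enough.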

Proof of Lemma 17

By the Skorokhod representation theorem, there exists a probability space on which the convergence stated in (8) is almost sure. In the following, we work on this space, and keep the same notation for the version of \(\mathsf{y}^{n,(0)}_+\) which converges a.s., and still denote the discrete increments by \((X_+^n(i),1\le i \le n)\). Conditionally on \((X_+^n(i),1\le i \le n)\), we have

$$\begin{aligned} \mathsf{y}^{n,(\lambda )}_+(s)=n^{-1/2}\sum _{m=1}^{\lfloor ns\rfloor } \left( -1+\sum _{\ell =1}^{X^n_+(m)} \mathbf {1}_{ \{ U_m(\ell )\le 1-\lambda n^{-1/2} \} }\right) +\varepsilon _{n,s,\lambda }, \end{aligned}$$
(12)

where \(\varepsilon _{n,s,\lambda }\) is the term coming from the fact that ns is not an integer, so that the sum misses a contribution corresponding to the portion between \(\lfloor ns\rfloor \) and ns. We have the simple bound

$$\begin{aligned} \sup _{0\le s\le 1,\lambda \in {\mathbb {R}}}|\varepsilon _{n,s,\lambda }|\le n^{-1/2}\max _{1\le i\le n} |X^n_+(i)-1|. \end{aligned}$$
(13)

Note that, by (10), the bound in (13) goes to 0 in probability. Using the fact that \(\sum _{m=1}^{\lfloor ns\rfloor }X^n_+(m)=\lfloor ns\rfloor +\sqrt{n}\mathsf{y}^{n,(0)}_+(s)\), we may rewrite \(\mathsf{y}^{n,(\lambda )}_+(s)\) as:

$$\begin{aligned} \mathsf{y}^{n,(\lambda )}_+(s)&= n^{-1/2}\sum _{m=1}^{\lfloor ns\rfloor } \sum _{\ell =1}^{X_+^n(m)} (\mathbf {1}_{ \{ U_m(\ell )\le 1-\lambda n^{-1/2} \} }-(1-\lambda n^{-1/2})) \nonumber \\&\quad -\,n^{-1/2}\lfloor ns\rfloor +(1-{\lambda }n^{-1/2}) n^{-1/2}\left(\lfloor ns\rfloor +\sqrt{n}\mathsf{y}^{n,(0)}_+(s)\right)\nonumber \\&\quad +\,\varepsilon _{n,s,\lambda }. \end{aligned}$$
(14)

Now, for \(\lambda \in {\mathbb {R}}\) fixed, the first term goes to 0 in probability (conditionally on the \(X^n_+(i)\)). To see this, observe that the number of terms in the sum is \(\lfloor ns\rfloor +\sqrt{n}\mathsf{y}^{n,(0)}_+(s)\), and that the terms \(\mathbf {1}_{ \{ U_m(\ell )\le 1-\lambda n^{-1/2} \} }-(1-\lambda n^{-1/2})\) are independent and each have variance bounded by \(\lambda n^{-1/2}\). So the total variance of this first term is \(O(n^{-1/2})\). Since \(\Vert \mathsf{y}^{n,(0)}_+\Vert _\infty \) converges a.s. on the probability space we are working on, and since each term is centred, Chebyshev’s inequality guarantees the convergence to zero in probability.

The term in the second line of the right-hand side of (14) is

$$\begin{aligned} (1-{\lambda }n^{-1/2}) \mathsf{y}^{n,(0)}_+(s) -\lambda \frac{\lfloor ns\rfloor }{n}&\xrightarrow [n]{(d)}&\mathsf{y}^{(0)}_{+}(s) -s\lambda . \end{aligned}$$
(15)

Finally, we see that, on the probability space we are working on, for any \(s,\lambda \),

$$\begin{aligned} \mathsf{y}^{n,(\lambda )}_+(s) \xrightarrow [n]{(proba.)}\mathsf{y}_+^{(0)}(s)-\lambda s. \end{aligned}$$

This completes the proof of convergence of the fdd. \(\square \)
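The thinning representation (12) and the limiting drift \(-\lambda s\) are easy to visualise by simulation. The sketch below is ours; it uses i.i.d. \(\mathsf{Poisson}(1)\) increments instead of the increments conditioned on \(\tau _{-1}=n\), which changes the limit of \(\mathsf{y}^{n,(0)}_+\) but not the relation \(\mathsf{y}^{n,(\lambda )}_+(s)\approx \mathsf{y}^{n,(0)}_+(s)-\lambda s\).

```python
import math
import random

def thinned_paths(n, lam, seed=0):
    """Simulate (12): return the paths y^{n,(0)} and y^{n,(lam)} on the
    grid k/n, built from the same Poisson(1) increments and uniforms."""
    rng = random.Random(seed)

    def poisson1():
        # Knuth's product-of-uniforms sampler for Poisson(1)
        k, prod = 0, rng.random()
        while prod > math.exp(-1.0):
            k += 1
            prod *= rng.random()
        return k

    y0, ylam = [0.0], [0.0]
    s0 = slam = 0
    thr = 1.0 - lam / math.sqrt(n)   # keep each item w.p. 1 - lam*n^{-1/2}
    for _ in range(n):
        x = poisson1()
        kept = sum(1 for _ in range(x) if rng.random() <= thr)
        s0 += x - 1
        slam += kept - 1
        y0.append(s0 / math.sqrt(n))
        ylam.append(slam / math.sqrt(n))
    return y0, ylam

n, lam = 200_000, 2.0
y0, ylam = thinned_paths(n, lam)
k = n // 2                           # s = 1/2
print(ylam[k], y0[k] - lam * 0.5)    # the two values nearly agree
```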

5.2 Tightness of the sequence \((\mathsf{y}_+^n)_{n\ge 1}\)

It suffices to show that the sequence \((\lambda \mapsto \mathsf{y}^{n, (\lambda )}_+)_{n\ge 1}\) is tight on \({\mathbb {D}}([0,a], {\mathbb {C}}([0,1],{\mathbb {R}}))\) for each \(a> 0\). So fix \(a>0\). According to Kallenberg [22, Theorem 14.10], considering the modulus of continuity

$$\begin{aligned} \omega _{\delta }(\mathsf{y}^{n}_+):= \sup \{ \Vert \mathsf{y}^{n,(\lambda _1)}_+-\mathsf{y}_+^{n,(\lambda _2)} \Vert _\infty ~: |\lambda _1-\lambda _2|\le \delta , 0\le \lambda _1,\lambda _2 \le a\}, \end{aligned}$$

it suffices to show that for any \(\varepsilon ,\varepsilon '>0\), there exists \(\delta >0\) such that for all n large enough, \({\mathbb {P}}(\omega _{\delta }(\mathsf{y}^{n}_+)\ge \varepsilon ) \le \varepsilon '\). Observe that this modulus of continuity is a priori not adapted to convergence in a space of cadlag functions, but we take advantage of the continuity of the limit, and simply show that the convergence is uniform on \([0,a]\times [0,1]\). So fix \(\varepsilon ,\varepsilon '>0\). Considering again (10) and (11), there exist constants \(b\in (0,1/4)\) and \(b'>0\) such that the following event

$$\begin{aligned} B_n:=\{\max _{1\le i \le n} X^{n,(1)}_+(i)\le n^{b}, \Vert \mathsf{y}^{n,(0)}_+\Vert _\infty \le b'\} \end{aligned}$$

has probability at least \(1-\varepsilon '/10\) for all n large enough.

From (14) and the discussion just below it, taking \(\lambda _1 < \lambda _2\), we have

$$\begin{aligned} \mathsf{y}^{n,(\lambda _1)}_+(s)-\mathsf{y}_+^{n,(\lambda _2)}(s)&= n^{-1/2}\sum _{m=1}^{\lfloor ns\rfloor } \sum _{\ell =1}^{X_+^n(m)} \left(\mathbf {1}_{ \{ \lambda _1 n^{-1/2} <1-U_m(\ell )\le \lambda _2 n^{-1/2} \} }-\frac{\lambda _2-\lambda _1}{n^{1/2}}\right) \nonumber \\&\quad +\, s({-}\lambda _1+\lambda _2) + O\bigg ( \frac{\Vert y^{n,(0)}\Vert _\infty }{n^{1/2}}+\frac{1}{n} +\frac{\max _i |X^n_+(i)-1|}{n^{1/2}}\bigg ), \end{aligned}$$
(16)

where the \(O(\,\cdot \,)\) term above is uniform in \(0\le \lambda _1,\lambda _2\le a\) and \(0\le s\le 1\). Up to this \(O(\,\cdot \,)\) term, we thus have \( \omega _{\delta }(\mathsf{y}^{n}_+)\le \omega _{\delta }(\mathsf{v}^{n})+ O(\delta ),\) where

$$\begin{aligned} \mathsf{v}^{n,(\lambda )}(s)=n^{-1/2}\sum _{m=1}^{\lfloor ns\rfloor } \sum _{\ell =1}^{X_+^n(m)} (\mathbf {1}_{ \{ U_m(\ell )\le 1-\lambda n^{-1/2} \} }-(1-\lambda n^{-1/2})). \end{aligned}$$

On \(B_n\), the \(O(\,\cdot \,)\) term in (16) goes to zero in probability. Furthermore, since \({\mathbb {P}}(\complement B_n)<\varepsilon '/10\), in order to complete the proof it suffices to show the following lemma:

Lemma 18

For every \(\varepsilon ,\varepsilon '>0\) there exists \(\delta >0\) such that for every n large enough

$$\begin{aligned} {\mathbb {P}}(\omega _{\delta }(\mathsf{v}^{n})\ge \varepsilon /2) \le \varepsilon '/2. \end{aligned}$$

In fact, for fixed \((s,\lambda )\), \(\mathsf{v}^{n,(\lambda )}(s)\) is very small: it is \(n^{-1/2}\) times a sum of \(ns + \sqrt{n}\mathsf{y}_+^{n,(0)}(s)\) centred random variables, each with variance of order \(n^{-1/2}\), so that it is itself a centred random variable with variance of order \(n^{-1/2}\); however, the fluctuations along the two-dimensional rectangle are more difficult to handle.

Proof

We discretise the parameter \(\lambda \) and consider \(\lambda _j= j a /\sqrt{n}\) for \(j=1,\dots ,n\) (only the indices with \(\lambda _j\le a\) are needed to cover [0, a], but keeping n of them costs nothing in the union bounds below). Then,

$$\begin{aligned} \mathsf{v}^{n,(\lambda _{j+1})}(s)-\mathsf{v}^{n,(\lambda _j)}(s) =n^{-1/2}\sum _{m=1}^{\lfloor ns\rfloor } \sum _{\ell =1}^{X_+^n(m)}\left(\mathbf {1}_{ \{ 1-\frac{ (j+1)a}{n}\le U_m(\ell )\le 1-\frac{ja}{n} \} } -\frac{a}{n}\right). \end{aligned}$$

Now, \(\mathsf{v}^{n,(\lambda )}\) does not fluctuate much between any two successive \(\lambda _j\), \(j=1,\dots ,n\):

$$\begin{aligned}&\sup _{1\le j\le n}\;\sup _{\lambda \in [\lambda _{j},\lambda _{j+1}]}| \mathsf{v}^{n,(\lambda )}(s)-\mathsf{v}^{n,(\lambda _j)}(s)| \le \left(\lfloor ns\rfloor +\sqrt{n}\mathsf{y}_+^{n,(0)}(s)\right)\frac{a}{n^{3/2}}\\&\quad +\,\sup _{1\le j\le n}\frac{\#\left\{ 1-U_m(\ell )\in \left[\frac{ ja}{ n},\frac{ (j+1)a}{ n}\right], m\in \llbracket 1,n\rrbracket , \ell \in \llbracket 1,X_+^{n}(m)\rrbracket \right\} }{n^{1/2}}. \end{aligned}$$

The first term goes to 0. As for the second one, for a single j, on \(B_n\) the term is dominated by \(n^{-1/2}\mathsf{Bin}(n+n^{1+b},a/n)\), where \(\mathsf{Bin}(n,p)\) denotes a binomial r.v. with parameters n and p. Now, using Bernstein’s inequality, one can show that, for some \(c>0\) and n large enough,

$$\begin{aligned} {\mathbb {P}}\left(\frac{\mathsf{Bin}(n+n^{1+b},a/n)}{n^{1/2}}>\varepsilon /4\right)\le \exp (-c\varepsilon ^2n^{1/2}), \end{aligned}$$

so that the union bound then suffices to control the supremum on \(j\in \{1,2,\dots ,n\}\). It remains to bound the fluctuations restricted to the \(\lambda _j\), \(1\le j\le n\).

Now, let \((\lambda ,s)\mapsto \mathsf{w}^{n,(\lambda )}(s)\) be the continuous process (in \({\mathbb {C}}([0,a]\times [0,1],{\mathbb {R}})\)) which interpolates \(\mathsf{v}^{n}\) as follows: for any \(1\le j \le n\), \(\mathsf{w}^{n,(\lambda _j)}= \mathsf{v}^{n,(\lambda _j)}\); each \(\mathsf{w}^{n,(\lambda _j)}\) is linearly interpolated between the points k / n and \((k+1)/n\) (this corresponds to the discrete increments), and the interpolation from \(\mathsf{v}^{n,(\lambda _j)}(s)\) to \(\mathsf{v}^{n,(\lambda _{j+1})}(s)\) is then also linear for each s. To end the proof of tightness, we will show that \(\mathsf{w}^{n}\) is tight in \({\mathbb {C}}([0,a]\times [0,1],{\mathbb {R}})\). We use the criterion given in Corollary 14.9, p. 261 of [22]. It suffices to show that \((\mathsf{w}^{n,(0)}(x))_{x\in [0,1]}\) is tight (which is the case since \(\mathsf{y}_+^{n,(0)}\) is tight), and that, for some constant C, some \(\alpha ,\beta >0\), for any \((\lambda _1,s_1)\) and \((\lambda _2,s_2)\) in \([0,a]\times [0,1]\), and any n large enough,

$$\begin{aligned} {\mathbb {E}}\left[ \left|\mathsf{w}^{n,(\lambda _1)}({s_1})-\mathsf{w}^{n,(\lambda _2)}(s_2)\right|^\alpha \right] \le C\Vert (s_1,\lambda _1)-(s_2,\lambda _2)\Vert ^{2+\beta }. \end{aligned}$$
(17)

We will show this conditionally on \(B_n\) only, which is sufficient too. As usual, it suffices to get the inequality at the discretisation points.

$$\begin{aligned}&{\mathbb {E}}\bigg [\left|\mathsf{w}^{n,(\lambda _{j_1})}\bigg (\frac{k_1}{n}\bigg )- \mathsf{w}^{n,(\lambda _{j_2})}\bigg (\frac{k_2}{n}\bigg )\right|^p\bigg ]\\&\quad = {\mathbb {E}}\bigg [\bigg | n^{-1/2}\sum _{m=k_1+1}^{k_2} \sum _{\ell =1}^{X_+^n(m)} \left(\mathbf {1}_{ \{ j_1 a/n <1-U_m(\ell )\le j_2 a/ n \} }-a\frac{j_2-j_1}{n}\right)\bigg |^p\bigg ] \end{aligned}$$

Using the fact that, for any sequence of independent and centred random variables \((A_i)_{i\ge 1}\), we have \({\mathbb {E}}[|\sum _{k=1}^m A_k|^p]\le C_p\,{\mathbb {E}}[(\sum _{k=1}^m A_k^2)^{p/2}]\) for some constant \(C_p\), we obtain

$$\begin{aligned}&{\mathbb {E}}\bigg [\left|\mathsf{w}^{n,(\lambda _{j_1})}\bigg (\frac{k_1}{n}\bigg )- \mathsf{w}^{n,(\lambda _{j_2})}\bigg (\frac{k_2}{n}\bigg )\right|^p~\bigg | ~\mathsf{y}_+^{n,(0)}\bigg ]\\&\quad \le {C_p} \left\{ \frac{k_2-k_1}{n}+n^{-1/2}\left(\mathsf{y}_+^{n,(0)}\bigg (\frac{k_2}{n}\bigg ) -\mathsf{y}_+^{n,(0)}\bigg (\frac{k_1}{n}\bigg )\right)\right\} ^{p/2}\\&\qquad \times \bigg (\frac{(j_2-j_1)a}{\sqrt{n}}\bigg )^{p/2}\frac{1}{\sqrt{n}^{p/2}} \end{aligned}$$

This is much smaller than needed: conditionally on \(B_n\), this is at most

$$\begin{aligned} \frac{C_1}{n^{p/2}} (\lambda _{j_1}-\lambda _{j_2})^{p/2}\left(\frac{k_2-k_1}{n}\right)^{p/2} + {C_2} \left(\lambda _{j_1}{-}\lambda _{j_2}\right)^{p/2} \left(\frac{\mathsf{y}_+^{n,(0)}(k_2/n){-}\mathsf{y}_+^{n,(0)}(k_1/n)}{n}\right)^{p/2}. \end{aligned}$$

Finally, since \(\Vert \mathsf{y}_+^{n,(0)}\Vert _\infty \le b'\) on \(B_n\), we have, with \(M=2b'\) and using that \(k_2-k_1\ge 1\),

$$\begin{aligned} \left(\frac{\mathsf{y}_+^{n,(0)}(k_2/n)-\mathsf{y}_+^{n,(0)}(k_1/n)}{n}\right)^{p/2} \le \left(\frac{M}{n}\right)^{p/2} \le C'\left(\frac{k_2-k_1}{n}\right)^{p/2}, \end{aligned}$$

which completes the proof of the lemma. \(\square \)

6 Encoding the multiplicative coalescent: proof of Theorem 7

Theorem 7 states the convergence of \(\mathsf{y}^{n}_\times \). Before proving this convergence, we need to carefully establish the distribution of the sequence \(\mathsf{y}^n_\times \). This is the aim of the next subsection. We then move on to the proof of convergence in the sense of finite-dimensional distributions in Sect. 6.3, and of tightness of the sequence \((\mathsf{y}^n_\times )_{n\ge 1}\) in Sect. 6.4.

6.1 Distribution of \(\mathsf{y}^{n }_\times \)

In this part, we work on \(\mathbf{G}=K_n\) equipped with the weights \(\mathbf{w}=(W(i,j), 1\le i<j\le n)\) on its edges, a family of i.i.d. uniform r.v. on [0, 1]. Let \(u_1,\dots ,u_n\) be the list of nodes of [n] sorted according to Prim’s order on \((\mathbf{G},\mathbf{w})\). We now let \(U_{k}(\ell )= W(u_k,u_\ell )\). In this case \(\mathbf{G}_t\) coincides with \(G(n,t)\), and Lemmas 15 and 16 apply.

For \(i\ge 1\) and \(t\in [0,1]\), define

$$\begin{aligned} {\mathcal {Z}}^t(i)&:= \{ u_k : i< k\le n, \exists j\le i, U_j(k) \le t \}\\ {\mathcal {S}}^t(i)&:=\{ u_k : i< k\le n, U_i(k)\le t\} \cap {\mathcal {Z}}^t(i-1), \end{aligned}$$

and let \(Z^t(i)=\#{\mathcal {Z}}^t(i)\) and \(S^t(i)=\#{\mathcal {S}}^t(i)\). The set \({\mathcal {Z}}^t(i)\) is the set of nodes with Prim index greater than i that share an edge in \(\mathbf{G}_t\) with a node with index at most i. The set \({\mathcal {S}}^t(i)\) records the neighbours of \(u_i\) in \(\mathbf{G}_t\) with Prim index larger than i that are already neighbours of some \(u_j\) with \( j < i\). Observe that, for every i, the map \(t\mapsto {\mathcal {Z}}^t(i)\) is non-decreasing (for inclusion), and that \({\mathcal {Z}}^t(0)=\varnothing \) for every \(t\in [0,1]\). Then define \(\Delta Z^t(i):=Z^t(i)-Z^t(i-1)\); in \(\mathbf{G}_t\), this is the number of nodes in \(\{u_{i+1},\ldots ,u_n\}\) that are adjacent to \(u_i\) but are not adjacent to any of \(u_1,\dots ,u_{i-1}\), minus the number of nodes in \(\{u_{i},\dots ,u_{n}\}\) with an edge from \(u_1,\dots ,u_{i-1}\) but not from \(u_i\) (only \(u_i\) itself may be such a node). (If we were to consider the Prim order as an exploration, then \(\Delta Z^t(i)\) is the number of unexplored neighbours of \(u_i\); however, we point out the fact that, in general, there is no reason why \(u_{j+1}\) should be adjacent to \(u_j\), even if they lie in the same connected component.) Hence \(\Delta Z^t(i)\ge -1\), since \(u_i\) is the only node that can be in \({\mathcal {Z}}^t(i-1)\) but not in \({\mathcal {Z}}^t(i)\).
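For concreteness, here is a short simulation sketch of ours that samples the weights on \(K_n\), computes the Prim order started at node 1, and evaluates \(Z^t(i)\) directly from the definition (an \(O(n^3)\) transcription, for illustration only):

```python
import random

def prim_Z(n, t, seed=None):
    """Sample i.i.d. uniform weights W(i,j) on K_n, compute the Prim
    order (u_1,...,u_n) started at node 1, and return (Z^t(i), 0<=i<=n),
    where Z^t(i) counts the nodes u_k, k > i, linked to {u_1,...,u_i}
    by an edge of weight at most t."""
    rng = random.Random(seed)
    W = {}
    for i in range(1, n + 1):
        for j in range(i + 1, n + 1):
            W[i, j] = W[j, i] = rng.random()
    order, rest = [1], set(range(2, n + 1))
    while rest:   # Prim: nearest unvisited node to the visited set
        v = min(rest, key=lambda u: min(W[u, w] for w in order))
        order.append(v)
        rest.remove(v)
    Z = [0]
    for i in range(1, n + 1):
        Z.append(sum(1 for k in range(i, n)        # u_{i+1}, ..., u_n
                     if any(W[order[k], order[j]] <= t for j in range(i))))
    return order, Z
```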

For a vector \(\mathbf{x}=(x_1,x_2,\dots , x_k)\) of distinct real numbers, we write \(\mathbf{x}^{\downarrow }\) for the vector consisting of the elements \(x_1,\dots , x_k\) sorted in decreasing order.

Lemma 19

For every \(i\in [n]\), conditionally on \(({\mathcal {Z}}^t(i-1))_{t\in [0,1]}\), we have

(a):

\((U_i(j))^{\downarrow }_{i< j\le n}\) is the reordering in decreasing order of \(n-i\) i.i.d. [0, 1]-uniform random variables,

(b):

\((\Delta Z^t(i))_{t\in [0,1]} = (\#\{k : U_i(k)\le t, i<k\le n, u_k\not \in {\mathcal {Z}}^t(i-1) \} -\mathbf {1}_{ \{ {\mathcal {Z}}^t(i-1)\ne \varnothing \} })_{t\in [0,1]}\), and

(c):

\((S^t(i))_{t\in [0,1]} = (\#\{k : U_i(k)\le t, i<k\le n, u_k\in {\mathcal {Z}}^t(i-1) \})_{t\in [0,1]}.\)

Proof

We define the sets \(({\mathcal {Z}}^t(i),1\le i \le n)\) and the Prim ordering \((u_1,u_2,\dots , u_n)\) simultaneously. For \(i\ge 1\), \(V_i\) denotes the set of nodes \(\{u_1,\dots , u_i\}\). We proceed by induction on \(i\ge 1\) and prove (a), (b) and (c) simultaneously for every \(t\in [0,1]\). Initially, we have \({\mathcal {Z}}^t(0)=\varnothing \) and \(i=1\), \(u_1=1\), \(V_1=\{u_1\}\). The weights \((W(u_1,k))_{1<k\le n}\) of the \(n-1\) edges which have \(u_1\) as an end point are independent and uniformly distributed on [0, 1], and by definition of \(\mathbf{G}_t\), for every \(t\in [0,1]\),

$$\begin{aligned} \Delta Z^{t}(1)&= \#\{v \in [n]{\setminus } V_1: W(1,v)\le t\}\\&= \#\{k \in [n]{\setminus } V_1: U_1(k)\le t, k\not \in {\mathcal {Z}}^{t}(0)\}, \end{aligned}$$

which proves the base case since \({\mathcal {Z}}^t(0)=\varnothing \), for every \(t\in [0,1]\). Furthermore, \({\mathcal {S}}^t(1)\subseteq {\mathcal {Z}}^t(0) = \varnothing \). This proves all three claims for every \(t\in [0,1]\), in the case that \(i=1\).

Suppose now that the claims (a), (b) and (c) hold true for all \(j\in \{1,2,\dots , i\}\), and consider Prim’s algorithm just after \(u_{i}\) has been defined. By definition, the node \(u_{i+1}\) is the node v in \([n]{\setminus } V_i\) for which the distance \(d(v, V_{i})\) is minimised. In particular, conditional on \(({\mathcal {Z}}^t(i))_{t\in [0,1]}\),

$$\begin{aligned} d(u_{i+1},V_i)=\inf \{t\ge 0: {\mathcal {Z}}^t(i)\ne \varnothing \}. \end{aligned}$$

Observe that the choice of \(u_{i+1}\) is done by looking at the weights of the edges \(W(u_j,v)\) for \(j\le i\) and \(v\not \in V_i\). In particular, this choice is independent of the weights \((W(u_{i+1},k): k\not \in V_{i+1})\), so that these weights are also i.i.d. uniform on [0, 1], conditionally on the identities of the nodes \(u_1,\dots ,u_{i+1}\). This is even true conditionally on \(({\mathcal {Z}}^t(i))_{t\in [0,1]}\), since these random variables depend only on the weights \(W(u_j,v)\), \(1\le j\le i\), \(v\in [n]\). Observe that once \(u_{i+1}\) is defined, the collection (the set) of weights to its neighbours in \([n]{\setminus } V_{i+1}\) is fixed, even if we do not know yet the precise Prim order induced on \([n]{\setminus } V_{i+1}\); so (a) follows readily.

To prove (b), observe that we have the following disjoint union (denoted by \(\bigsqcup \)),

$$\begin{aligned} {\mathcal {Z}}^t(i+1)&= ( {\mathcal {Z}}^t(i) {\setminus } V_{i+1} ) \bigsqcup \,\, ( \{v\in [n]{\setminus } V_{i+1} : W(u_{i+1}, v)\le t\} {\setminus } {\mathcal {Z}}^t(i) )\nonumber \\&= ( {\mathcal {Z}}^t(i) {\setminus } V_{i+1} ) \bigsqcup \,\, \{k: U_{i+1}(k)\le t, i+1<k\le n, k\not \in {\mathcal {Z}}^t(i)\}, \end{aligned}$$
(18)

which expresses the fact that we canonically assign the elements v of \({\mathcal {Z}}^t(i)\) to the first node \(u_j\in V_i\) for which \(W(u_j,v)\le t\). The first set of (18) is easy to deal with. Indeed, by Proposition 13, we have:

  • if \({\mathcal {Z}}^t(i)=\varnothing \) none of the nodes in \(V_i\) is connected to any of the nodes in \([n]{\setminus } V_i\) by an edge of \(E_t\);

  • if \({\mathcal {Z}}^t(i)\ne \varnothing \) some nodes of \(V_i\) are connected to some nodes in \([n]{\setminus } V_i\). But then, by Proposition 13, \(u_{i+1}\in {\mathcal {Z}}^t(i)\) and there are \(Z^t(i)-1\ge 0\) nodes of \([n]{\setminus } V_{i+1}\) which are connected to some node of \(V_i\).

So, in any case, there are \((Z^t(i)-1)_+\) nodes of the set \([n]{\setminus } V_{i+1}\) which are already connected to nodes in \(V_i\) by edges of \(E_t\), and we have

$$\begin{aligned} |{\mathcal {Z}}^t(i) {\setminus } V_{i+1}| = (Z^t(i)-1)_+. \end{aligned}$$
(19)

The representation of (b) follows immediately, and (c) is straightforward from the definition. \(\square \)

Corollary 20

There exists a collection \((U_i^\star (j))_{1\le i<j\le n}\) of i.i.d. random variables uniformly distributed on [0, 1] such that, for every \(i\in [n]\), the family \((U^\star _i(j))_{i< j\le n}\) is independent of \((Z^t(i-1))_{t\in [0,1]}\). Furthermore, conditionally on \((Z^t(i-1))_{t\in [0,1]}\), we have for every \(t\in [0,1]\)

(a):

\(\Delta Z^t(i) = \#\{k : U^\star _i(k)\le t, i+(Z^t(i-1)-1)_+<k\le n \} -\mathbf {1}_{ \{ Z^t(i-1)>0 \} }\), and

(b):

\(S^t(i) = \#\{k : U^\star _i(k)\le t, i<k\le i+(Z^t(i-1)-1)_+\}.\)

Proof

The proof consists simply in observing that \(\#({\mathcal {Z}}^t(i-1){\setminus } V_{i})=(Z^t(i-1)-1)_+\) by (19), and in reordering the random variables \((U_i(j))_{i<j\le n}\), for every fixed i, in such a way that they are hit by the set \({\mathcal {Z}}^t(i-1)\) in increasing order of index as time increases. So fix \(i\ge 2\) and define \(t^i_{i+1}=\inf \{t\ge 0: \#{\mathcal {Z}}^t(i-1)\ge 1\}\). Note that, a.s., \({\mathcal {Z}}^{t^i_{i+1}}(i-1)\) contains a single element, which we denote by \(\pi _i(i+1)\). Then, for \(i<j<n\), let \(t^i_{j+1}=\inf \{t> t^i_{j}: \#{\mathcal {Z}}^t(i-1)>\#{\mathcal {Z}}^{t-}(i-1)\}\) and define \(\pi _i(j+1)\) to be the a.s. unique element of \({\mathcal {Z}}^{t^i_{j+1}}(i\!-\!1){\setminus } {\mathcal {Z}}^{t^i_{j+1}-}(i\!-\!1)\). For \(i<j\le n\), define \(U^\star _i(j)=U_i(\pi _i(j))\). In order to conclude the proof, observe that the permutation \(\pi _i\) depends only on \(({\mathcal {Z}}^t(i-1))_{t\in [0,1]}\), and is therefore independent of \((U_i(j))_{i<j\le n}\). \(\square \)

Using Lemma 14, the process \((Z^t(i),i\ge 0)\) seems to be a nice tool to study \(\mathsf{CC}(\mathbf{G}_t)\), but it is not, since its convergence is not sufficient to entail the convergence of the sizes of the excursions. To circumvent this problem, the idea, already exploited by [3] and [14], is to use a companion process Y which has a drift and for which the lengths of the excursions above the current minimum converge.

For this, we use the setting of Corollary 20, and for \(0\le i \le n, t\in [0,1]\) we set

$$\begin{aligned} X^{n,t}_\times (i)&= \#\{k : U^\star _i(k)\le t, i+(Z^t(i-1)-1)_+<k\le n \}\end{aligned}$$
(20)
$$\begin{aligned} Y_\times ^{n,t}(i)&= \sum _{j=1}^i (X^{n,t}_\times (j)-1). \end{aligned}$$
(21)

Lemma 21

For every \(t\in [0,1]\), we have \(Z^{n,t}_\times =\Psi Y^{n,t}_\times \). In particular, by Lemma 2, the excursions of \(Y_\times ^{n,t}\) above its current minimum coincide with the excursions of \(Z^{n,t}_\times \) away from zero.

Proof

The processes \(Z_\times ^{n,t}\) and \(Y^{n,t}_\times \) have the same increments, except when \(Z^{t}(i-1)=0\), in which case \(\Delta Y^{n,t}_\times (i)=\Delta Z_\times ^{n,t}(i)-1\). The conclusion follows easily. \(\square \)
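For the reader's convenience, the increment relation in the proof above determines \(\Psi \) explicitly: \(\Psi Y(0)=0\) and \(\Psi Y(i)=Y(i)-\min _{0\le j\le i-1}Y(j)+1\) for \(i\ge 1\), a path that vanishes exactly when Y reaches a new strict minimum. (We have not restated Lemma 2, so this closed form is our reconstruction, consistent with the proof above.) A short sketch of \(\Psi \) and of the extraction of the excursion lengths:

```python
def Psi(Y):
    """Z(0) = 0 and Z(i) = Y(i) - min(Y(0),...,Y(i-1)) + 1 for i >= 1;
    Z vanishes exactly when Y reaches a new strict minimum."""
    Z, running_min = [0], Y[0]
    for i in range(1, len(Y)):
        Z.append(Y[i] - running_min + 1)
        running_min = min(running_min, Y[i])
    return Z

def excursion_lengths(Z):
    """Distances between successive zeros of Z (the component sizes)."""
    lengths, last_zero = [], 0
    for k in range(1, len(Z)):
        if Z[k] == 0:
            lengths.append(k - last_zero)
            last_zero = k
    return lengths
```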

We are now ready to prove the convergence of \(\mathsf{y}^{n }_\times \). The proof is quite similar to that of Theorem 10. Again, it is sufficient to show the weak convergence in \({\mathbb {D}}([\lambda _\star ,\lambda ^\star ],{\mathbb {C}}([0,a],{\mathbb {R}}))\) for \(-\infty <\lambda _\star <\lambda ^\star <+\infty \) and \(a>0\) fixed. We start by revisiting Aldous’ proof [3] of (4). Again, Lemma 15 entails that \(\mathsf{y}^{n,(\lambda )}_\times \) has the same distribution as Aldous’ version \(y^{n,(\lambda )}\) associated with the breadth-first order.

6.2 Proof of convergence of \(\mathsf{y}^{n,(\lambda )}_\times \) for a fixed \(\lambda \)

Here \(\lambda \) is fixed, and we obtain the convergence in \({\mathbb {C}}([0,a],{\mathbb {R}})\) of \(\mathsf{y}^{n,(\lambda )}_\times \). Aldous proved (4) using the approximation of the Markov chain \(Y^{n,(\lambda )}\) by a diffusion, the convergence taking place in \({\mathbb {D}}([0,1],{\mathbb {R}})\); here, we provide a proof that is simpler and more easily extendable. For brevity, we use the following notation:

$$\begin{aligned} X^{(\lambda )}(i)&=X^{n,p_\lambda (n)}_\times (i)\\ Y^{(\lambda )}(i)&=Y^{n,p_\lambda (n)}_\times (i)\\ Z^{(\lambda )}(i)&=Z^{n,p_\lambda (n)}_\times (i). \end{aligned}$$

Define also \(\bar{X}^{(\lambda )}(i)\) and \(\underline{X}^{(\lambda )}(i)\) by

$$\begin{aligned} \bar{X}^{(\lambda )}(i)&=\#\{k : U^\star _i(k)\le p_\lambda (n), i <k\le n \} \sim \mathsf{Bin}(n-i,p_\lambda (n))\\ \underline{X}^{(\lambda )}(i)&=\#\{k : U^\star _i(k)\le p_\lambda (n), i+n^{1/2} <k\le n \} \sim \mathsf{Bin}(n-i-n^{1/2},p_\lambda (n)). \end{aligned}$$

(The term \(n^{1/2}\) in the definition of \(\underline{X}^{(\lambda )}(i)\) may be replaced by \(n^{\alpha }\), for any \(\alpha \in [1/3,2/3]\).) Here, in Sect. 6.2, \(\lambda \) is fixed and we further shorten the notation by dropping the superscripts indicating its value. Then define \(\bar{Y}(i)\) and \(\bar{Z}(i)\) (resp. \(\underline{Y}(i)\) and \(\underline{Z}(i)\)) with \(\bar{X}\) (resp. \(\underline{X}\)) in the same way that Y and Z are defined with X in (20) and (21). The random variables X(i), \(\bar{X}(i)\) and \(\underline{X}(i)\) are all defined with the same uniforms, and, as long as \(Z(i)\le n^{1/2}\), we have

$$\begin{aligned} \underline{X}(i)\le X(i) \le \bar{X}(i). \end{aligned}$$

In particular, this inequality holds at least until the first time when Z(i) exceeds \(n^{1/2}\), which is no earlier than the first time i when \(\bar{Z}(i)\ge n^{1/2}\), which we denote by \(\tau _{n^{1/2}}(\bar{Z})\). Now, consider some times \(t_0:=0<t_1<\dots <t_\kappa \le a\), for some \(\kappa \ge 1\), and let us investigate the convergence of \((\Gamma (n^{2/3}t_j),1\le j \le \kappa )\) for \(\Gamma =\bar{Y}\) and \(\Gamma =\underline{Y}\). Set \(\Delta t_j=t_j-t_{j-1}\) and \(\Delta _2 t_j= t_j^2-t_{j-1}^2\). The increments \((\Delta \Gamma (n^{2/3}t_j):=\Gamma (n^{2/3}t_j)-\Gamma (n^{2/3}t_{j-1}),1\le j \le \kappa )\) are independent, and for \(1\le j \le \kappa \),

$$\begin{aligned} \Delta \bar{Y}(n^{2/3}t_j)&=\#\{k : U^\star _m(k)\le p_\lambda (n), m <k\le n, n^{2/3}t_{j-1}<m\le n^{2/3}t_j \} {-}n^{2/3}\Delta t_j \\&\sim \mathsf{Bin}\bigg (n^{5/3}\Delta t_j{-}\frac{n^{4/3}}{2}\Delta _2 t_j+\varepsilon _1,p_\lambda (n)\bigg ) {-}n^{2/3}\Delta t_j \end{aligned}$$

and

$$\begin{aligned} \Delta \underline{Y}(n^{2/3}t_j)&=\#\{k : U^\star _m(k)\le p_\lambda (n),\ m+n^{1/2}<k\le n,\ n^{2/3}t_{j-1}<m\le n^{2/3}t_j \} -n^{2/3}\Delta t_j\\&\sim \mathsf{Bin}\bigg (n^{5/3}\Delta t_j-\frac{n^{4/3}}{2}\Delta _2 t_j -n^{2/3+1/2}\Delta t_j+\varepsilon _2,p_\lambda (n)\bigg )-n^{2/3}\Delta t_j, \end{aligned}$$

where \(\varepsilon _1\) and \(\varepsilon _2\) account for the error made by replacing \(k(k+1)/2\) by \(k^2/2\) and by the approximation of the fractional parts; in particular, \(\varepsilon _1,\varepsilon _2=O(n^{2/3})\) and they are eventually negligible. Recall that \(p_\lambda (n)=1/n+\lambda n^{-4/3}\). By the central limit theorem, for every \(j\in \{1,\dots , \kappa \}\) we have

$$\begin{aligned} \frac{\Delta \bar{Y}(n^{2/3}t_j) -\Delta \underline{Y}(n^{2/3}t_j) }{n^{1/3}}\xrightarrow [n]{(d)}0 \end{aligned}$$

and

$$\begin{aligned} \frac{\Delta \bar{Y}(n^{2/3}t_j)}{n^{1/3}}\xrightarrow [n]{(d)}{\mathcal {N}}\bigg (\lambda (t_j-t_{j-1}) -\frac{(t_j^2-t_{j-1}^2)}{2},t_j-t_{j-1}\bigg ) \end{aligned}$$

where \({\mathcal {N}}(\mu ,\sigma ^2)\) denotes the normal distribution with mean \(\mu \) and variance \(\sigma ^2\). This provides the convergence of the fdd of \(\bar{Y}\) and \(\underline{Y}\) to those of the process \((\mathsf{y}^{(\lambda )}_{\times }(x), x\ge 0)\). This also implies that \(\tau _{n^{1/2}}(\bar{Z})/n^{2/3}\rightarrow +\infty \). Then, the tightness of Y follows from that of \(\bar{Y}\) and \(\underline{Y}\), and these are consequences of a simple control of the fourth moment of

$$\begin{aligned} \frac{\Delta \bar{Y}(n^{2/3}t_j)}{n^{1/3}} -\bigg (\lambda (t_j-t_{j-1})-\frac{(t_j^2-t_{j-1}^2)}{2}\bigg ) \end{aligned}$$

which is \(3(t_j-t_{j-1})^2+O(n^{-1/3})\), the \(O(\,\cdot \,)\) term being independent of \(0\le t_{j-1},t_j\le a\) (we just used the fact that the fourth centred moment of \(\mathsf{Bin}(n,q)\) is \(nq(1-q)(1+(3n-6)(q-q^2))\)). The same estimate with a different \(O(\,\cdot \,)\) term holds for \(\underline{Y}\). This completes the proof of (4).
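The mean and variance computations behind the central limit theorem step above are elementary. With \(N_j:=n^{5/3}\Delta t_j-\frac{n^{4/3}}{2}\Delta _2 t_j\) and \(p_\lambda (n)=1/n+\lambda n^{-4/3}\), we have

$$\begin{aligned} \frac{N_j\,p_\lambda (n)-n^{2/3}\Delta t_j}{n^{1/3}}=\lambda \Delta t_j-\frac{\Delta _2 t_j}{2}+O(n^{-1/3}), \qquad \frac{N_j\,p_\lambda (n)(1-p_\lambda (n))}{n^{2/3}}=\Delta t_j+O(n^{-1/3}), \end{aligned}$$

which identifies the drift and the variance of the Gaussian limit.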

We now slightly adapt the proof to get the full convergence of the bi-dimensional process, as stated in Theorem 7. We will prove the convergence in \({\mathbb {D}}([\lambda _\star ,\lambda ^\star ]\times [0,a], {\mathbb {R}})\), which will be sufficient to conclude.

6.3 Proof of the convergence of the fdd of \(\mathsf{y}^{n}_\times \) to those of \(\mathsf{y}_\times \)

In the additive case, we used three main ingredients: the convergence of \(\mathsf{y}^{n,(0)}_+\) to \(\mathsf{y}^{(0)}_+\) (stated in (8)), a global bound on the increments of \(\mathsf{y}^{n,(\lambda )}_+\) (the bound in (10)), and a global bound on \((\mathsf{y}^{n,(0)}_+,n\ge 0)\) (in (11)). The proof proceeded by comparing \(\mathsf{y}^{n,(\lambda )}_+\) with \(\mathsf{y}^{n,(0)}_+\), and then expressing the limiting behaviour of \(\mathsf{y}^{n,(\lambda )}_+\) in terms of \(\mathsf{y}^{(0)}_+\), the limit of \(\mathsf{y}^{n,(0)}_+\).

Here we rely on the same ingredients. The first one is (4), taken at \(\lambda ^\star \):

$$\begin{aligned} \mathsf{y}^{n,(\lambda ^\star )}_\times \xrightarrow [n]{(d)}\mathsf{y}^{(\lambda ^\star )}_\times , \end{aligned}$$
(22)

and we will express the limit of \(\mathsf{y}^{n,(\lambda )}_\times \) for \(\lambda \in [\lambda _\star ,\lambda ^\star ]\) in terms of \(\mathsf{y}^{(\lambda ^\star )}_\times \). The convergence (22) implies that

$$\begin{aligned} \sup _{x\in [0,a]}|\mathsf{y}^{n,(\lambda ^\star )}_\times (x)| \xrightarrow [n]{(d)}\sup _{x\in [0,a]}|\mathsf{y}^{(\lambda ^\star )}_\times (x)|<_{a.s.}+\infty . \end{aligned}$$

We rely on the approximations \(\bar{X}^{(\lambda )}\) and \(\underline{X}^{(\lambda )}\) introduced in the previous section, and recall that the variables \(X,\bar{X},\underline{X}\) are all defined on the same probability space (the one carrying the \(U^\star \)).

Clearly, we have

$$\begin{aligned} \sup _{0\le i \le an^{2/3}} X^{(\lambda )}(i)\le \max _{0\le i \le an^{2/3}} \bar{X}^{(\lambda ^\star )}(i), \end{aligned}$$

where the \(\bar{X}^{(\lambda ^\star )}(i)\), \(1\le i\le an^{2/3}\), are independent \(\mathsf{Bin}(n-i,p_{\lambda ^\star }(n))\) random variables. Thus, for \(\varepsilon >0\) small,

$$\begin{aligned} {\mathbb {P}}\bigg (\sup _{\lambda \in [\lambda _\star ,\lambda ^\star ]} \sup _{0\le i \le a n^{2/3}} X^{(\lambda )}(i) \ge n^{\varepsilon }\bigg )&\le a n^{2/3} \cdot {\mathbb {P}}(\mathsf{Bin}(n,p_{\lambda ^\star }(n))\ge n^{\varepsilon })\\&=O(\exp (-n^{\varepsilon /2})). \end{aligned}$$

Again, by the Skorokhod representation theorem, we may work on a space where the convergence (22) holds with probability one. Let us prove that, for \(\lambda \in [\lambda _\star ,\lambda ^\star ]\),

$$\begin{aligned} \sup _{0\le x \le a} |\mathsf{y}^{n,(\lambda ^\star )}_\times (x)-\mathsf{y}^{n,(\lambda )}_\times (x) -(\lambda ^\star -\lambda )x|\xrightarrow [n]{(proba.)}0, \end{aligned}$$
(23)

which suffices to get the convergence of the fdd. It suffices to prove the same result with \(\bar{\mathsf{y}}^{n,(\lambda ^\star )}_\times (x)-{\bar{\mathsf{y}}}^{n,(\lambda )}_\times (x)\) and \(\underline{\mathsf{y}}^{n,(\lambda ^\star )}_\times (x)-{\underline{\mathsf{y}}}^{n,(\lambda )}_\times (x)\) instead (where \(\bar{\mathsf{y}}\) and \(\underline{\mathsf{y}}\) are defined from \(\bar{Y}\) and \(\underline{Y}\), respectively, in the same way that \(\mathsf{y}\) is defined from Y). But for these processes, this follows from the fact that

$$\begin{aligned} \bar{Y}^{(\lambda ^\star )}(n^{2/3}x)-\bar{Y}^{(\lambda )}(n^{2/3}x) \sim \mathsf{Bin}\bigg (n^{5/3}x-\frac{n^{4/3}}{2}x^2+\varepsilon _n,p_{\lambda ^\star }(n)-p_{\lambda }(n)\bigg ). \end{aligned}$$

The same argument for \({\underline{Y}}^{(\lambda )}(n^{2/3}x)\) allows one to conclude.

6.4 Tightness of the sequence \((\mathsf{y}^{n}_\times )_{n\ge 1}\)

We mimic what is done for the proof of the tightness of \((\mathsf{y}^{n}_+)_{n\ge 1}\) in Sect. 5. We consider here the modulus of continuity

$$\begin{aligned} w_{\delta }(\mathsf{y}^{n}_\times )=\sup \{ \Vert \mathsf{y}^{n,(\lambda _1)}_\times -\mathsf{y}_\times ^{n,(\lambda _2)} \Vert _\infty ~: |\lambda _1-\lambda _2|\le \delta , \lambda _\star \le \lambda _1,\lambda _2 \le \lambda ^\star \}, \end{aligned}$$

where \(\Vert f\Vert _\infty =\sup \{|f(x)|:0\le x\le a\}\). Again, for any \(\varepsilon >0\), any small fixed \(b>0\), the event

$$\begin{aligned} B_n:=\{\max _{1\le i \le n} X^{n,(\lambda ^\star )}_\times (i)\le n^{b}, \Vert \mathsf{y}^{n,(\lambda ^\star )}_\times \Vert _\infty \le n^{2/3}\} \end{aligned}$$

has probability at least \(1-\varepsilon \) for n large enough.

As in the proof of the tightness of \(\mathsf{y}_+^{n,(\lambda )}\), one discretises the time parameter \(p_{\lambda }(n)\). For \(1\le j\le n\), we define

$$\begin{aligned} \lambda _j = \lambda _j(n) = \lambda _{\star }+\frac{j}{n}(\lambda ^\star -\lambda _\star ). \end{aligned}$$

When j goes from 0 to n, \(p_{\lambda _j}(n)=1/n+\lambda _j n^{-4/3}\) goes from \(p_{\lambda _\star }(n)\) to \(p_{\lambda ^\star }(n)\). For \(j\in \{0,\ldots ,n\}\), define \(\mathsf{y}^{n,[\lambda _j]}_\times =\mathsf{y}^{n,(\lambda _j)}_\times \); each one of the processes \(\mathsf{y}^{n,[\lambda _j]}_\times \) is already continuous on [0, a]. In order to obtain a process that is continuous for \(\lambda \in [\lambda _j,\lambda _{j+1}]\) and \(x\in [0,a]\), \(\mathsf{y}^{n,[\lambda ]}_\times (x)\) is defined by linear interpolation between \(\mathsf{y}^{n,[\lambda _j]}_\times (x)\) and \(\mathsf{y}^{n,[\lambda _{j+1}]}_\times (x)\). Instead of proving the tightness of \(\lambda \mapsto (x\mapsto \mathsf{y}^{n,(\lambda )}_\times (x))\) in \({\mathbb {D}}([\lambda _\star ,\lambda ^\star ],{\mathbb {C}}([0,a],{\mathbb {R}}))\), we prove the tightness of \((\lambda ,x)\mapsto \mathsf{y}^{n,[\lambda ]}_\times (x)\) in \({\mathbb {C}}([\lambda _\star ,\lambda ^\star ]\times [0,a],{\mathbb {R}})\). For this we use a moment criterion, that is, a bound of the type

$$\begin{aligned} {\mathbb {E}}[| \mathsf{y}^{n,[\lambda _2]}_\times (s_2)-\mathsf{y}^{n,[\lambda _1]} _\times (s_1)|^{\alpha }] \le C \Vert (s_1,\lambda _1)-(s_2,\lambda _2)\Vert ^{2+\beta }, \end{aligned}$$

for some norm \(\Vert \,\cdot \,\Vert \), where \(\alpha ,\beta >0\) and C is a constant. Again, we get these bounds for \(\bar{\mathsf{y}}^{n}_\times \) and \(\underline{\mathsf{y}}^{n}_\times \) instead, which is again sufficient to conclude.

Now, for \(\lambda _2\ge \lambda _1\) and \(0\le t_1,t_2\le a\), we have

$$\begin{aligned} \bar{\mathsf{y}}_\times ^{n,[\lambda _2]}(t_2)-\bar{\mathsf{y}}_\times ^{n,[\lambda _1]}(t_1)\sim \mathsf{Bin}(N,q) \end{aligned}$$

with

$$\begin{aligned} N&= n^{5/3}(t_1\vee t_2-t_1\wedge t_2)-\frac{n^{4/3}}{2}((t_1\vee t_2)^2-(t_1\wedge t_2)^2)\\ q&= p_{\lambda _2}(n)-p_{\lambda _1}(n)=\frac{\lambda _2-\lambda _1}{n^{4/3}}. \end{aligned}$$

Again, a simple control of the fourth moment, the same as that of Sect. 6.2, suffices to conclude. The control of \(\underline{\mathsf{y}}^{n}_\times \) is done along the same lines. This suffices to conclude since the limits of \(\bar{\mathsf{y}}^{n,[\lambda ^\star ]}\) and \(\underline{\mathsf{y}}^{n,[\lambda ^\star ]}\) are the same, as are those of \(\bar{\mathsf{y}}^{n,[\lambda _\star ]}\) and \(\underline{\mathsf{y}}^{n,[\lambda _\star ]}\), and since \(\underline{\mathsf{y}}^{n}\le \mathsf{y}^{n}\le \bar{\mathsf{y}}^{n}\).

7 Standard coalescents: proofs of Theorems 3 and 5

7.1 Proof of Theorem 3

Recall that a subset K of \(\ell ^p\) has a compact closure if the following conditions hold:

(i):

for any i, \(\{x_i ~: x \in K\}\) is bounded, and

(ii):

for any \(\varepsilon >0\), there exists \(n\ge 1\) such that for all \(x \in K\), \(\sum _{k\ge n} |x_k|^p\le \varepsilon \).

Here, we deal with sequences of excursion lengths that are non-negative by nature. So from now on, we focus on the subspace \(\ell ^p_{\ge 0}=\ell ^p \cap \left\{ x=(x_i,i\ge 1)~: x_i\ge 0, \forall i\right\} \). Define the operator \(S^p:\ell ^p_{\ge 0}\rightarrow [0,+\infty )^{\mathbb {N}}\) by

$$\begin{aligned} S^p(x)=\left(\sum _{j=1}^k x_j^p, k\ge 1\right) \end{aligned}$$

the map associating to x the sequence of its p-partial sums. The map \(S^p\) is continuous and injective. Write \(K'=S^p(K)\). If K satisfies (i) and (ii), then:

(i’):

for any \(k\ge 1\), \(\{y_k: y\in K'\}\) is bounded;

(ii’):

for any \(y\in K'\), the sequence \((y_k)_{k\ge 1}\) is non-negative, non-decreasing and convergent, and for any \(\varepsilon >0\), there exists \(n\ge 1\) such that for any \(y\in K'\) and any \(m\ge n\), \(|y_m-\lim _k y_k|\le \varepsilon \).

Conversely, if \(K'\) satisfies (i’) and (ii’), then \((S^p)^{-1}(K')\) has a compact closure. Indeed, notice that \(S^p(\ell ^p_{\ge 0})\) is exactly the set of non-negative, non-decreasing sequences with a finite limit. For such a sequence \(y=(y_i,i\ge 1)\), \((S^p)^{-1}(y)=((y_i-y_{i-1})^{1/p},i\ge 1)\), with \(y_{0}=0\); one deduces that if \(K'\) satisfies (i’) and (ii’), then \((S^p)^{-1}(K')\) satisfies (i) and (ii).
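In concrete terms (a two-function sketch of ours), \(S^p\) and its inverse by differencing read:

```python
def S(x, p):
    """S^p(x)_k = x_1^p + ... + x_k^p (partial sums of p-th powers)."""
    out, acc = [], 0.0
    for xi in x:
        acc += xi ** p
        out.append(acc)
    return out

def S_inv(y, p):
    """Inverse on non-negative, non-decreasing sequences (y_0 = 0)."""
    return [(b - a) ** (1.0 / p) for a, b in zip([0.0] + y[:-1], y)]

x = [0.5, 0.3, 0.2, 0.1]
assert max(abs(a - b) for a, b in zip(S_inv(S(x, 2), 2), x)) < 1e-9
```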

We now come back to our model: \(\lambda \mapsto \varvec{\gamma }^{+,n}(\lambda )\) and \(\lambda \mapsto \varvec{\gamma }^{\times ,n}(\lambda )\) can be considered as elements of \(\ell _{\ge 0}^1\) and \(\ell _{\ge 0}^2\), respectively. An important property is that, for any k and any n, \(\lambda \mapsto S^1(\varvec{\gamma }^{+,n}(\lambda ))_k\) and \(\lambda \mapsto S^2(\varvec{\gamma }^{\times ,n}(\lambda ))_k\) are both non-decreasing (recall that the sequences \(\varvec{\gamma }\) are sorted). The second point comes from the fact that \((x_i+x_j)^2\ge x_i^2+x_j^2\), which implies that the coalescence of any two clusters can never decrease \(S^2(x)_k\).

Lemma 22

Let \(I=[\lambda _\star ,\lambda ^\star ]\subset {\mathbb {R}}\), and let \(p\ge 1\). Let \({\mathcal {K}}\) be a subset of \({\mathbb {D}}(I,\ell ^p_{\ge 0})\). Assume that the following conditions hold:

(a):

for any \(f \in {\mathcal {K}}\), any integer \(k\ge 1\), \(\lambda \mapsto S^p(f(\lambda ))_k\) is non-decreasing,

(b):

for any \(k\ge 1\), \(\{f(\lambda ^\star )_k~:f \in {\mathcal {K}}\}\) is bounded, and

(c):

for any \(\epsilon >0\), there exists \(M\ge 0\) such that for any \(m\ge M\), for any \(f\in {\mathcal {K}}\) we have

$$\begin{aligned} \inf _{\lambda \in I}S^p(f(\lambda ))_m \ge \sup _{\lambda \in I} \lim _{k\rightarrow \infty } S^p(f(\lambda ))_k -\epsilon . \end{aligned}$$

Then \({\mathcal {K}}\) is relatively compact.

Proof

Let \((f_n)\) be a sequence of functions in \({\mathcal {K}}\). One may extract from \((f_n)\) a subsequence \((f_{n[k]},k \ge 0)\) such that \(\lambda \mapsto S^p(f_{n[k]}(\lambda ))_1\) converges in \({\mathbb {D}}(I,{\mathbb {R}})\) (any sequence of bounded non-decreasing functions on I has an accumulation point in \({\mathbb {D}}(I,{\mathbb {R}})\)). By a diagonal procedure, one may further extract a subsequence along which \(\lambda \mapsto (S^p(f_{n[k]}(\lambda ))_j)_{j\le K}\) converges in \({\mathbb {D}}(I,{\mathbb {R}}^K)\) for every K. Condition (c) provides the necessary tightness and allows one to conclude. \(\square \)

Lemma 22 allows us to complete the proof of Theorem 3. We start with the additive case. Note that in this case \(\Vert \varvec{\gamma }^{+,n}(\lambda )\Vert _1=1\) for every \(\lambda \), so (b) is satisfied. Condition (a) also clearly holds. Furthermore, for every integer \(m\ge 1\), the tail

$$\begin{aligned} \Vert \varvec{\gamma }^{+,n}(\lambda )\Vert _1-S^1(\varvec{\gamma }^{+,n}(\lambda ))_m=\sum _{i> m} \gamma ^{+,n}_i(\lambda ) \end{aligned}$$

is largest at \(\lambda =\lambda _\star \), and for Condition (c) in Lemma 22 to be satisfied, it suffices that there exists m such that for any \(f\in {\mathcal {K}}\), we have

$$\begin{aligned} S^1(f(\lambda _\star ))_m \ge \lim _{k\rightarrow \infty } (S^1(f(\lambda _\star ))_k)-\varepsilon . \end{aligned}$$

Let \(I=[\lambda _\star ,\lambda ^\star ]\subset (-\infty ,0]\). By Theorem 10, \(\mathsf{y}^{n}_+ \rightarrow \mathsf{y}_+\) in distribution in \({\mathbb {D}}(I,{\mathbb {C}}([0,1],{\mathbb {R}}))\), which is separable. By the Skorokhod representation theorem, there exists a probability space on which this convergence holds a.s. Let us work on this space and use the same names for the variables \(\mathsf{y}^{n}_+\) and \(\mathsf{y}_+\). The sum of the excursion lengths above the minimum of \(\mathsf{y}_+(\lambda _\star )\) is 1; so for any \(\varepsilon >0\), there exists \(k\ge 1\) such that \(S^1(\varvec{\gamma }^+(\lambda _\star ))_k\ge 1-\varepsilon /2\). Hence, a.s. for all n large enough, \(S^1(\varvec{\gamma }^{+,n}(\lambda _\star ))_k\ge 1-\varepsilon \), so that for any \(\lambda \in [\lambda _\star ,\lambda ^\star ]\), we have \(S^1(\varvec{\gamma }^{+,n}(\lambda ))_k\ge 1-\varepsilon \), which proves that (c) holds. By Lemma 22, the family of functions \((\varvec{\gamma }^{+,n}(\lambda ))_{\lambda \in [\lambda _\star ,\lambda ^\star ]}\), \(n\ge 1\), is tight in \({\mathbb {D}}([\lambda _\star ,\lambda ^\star ], \ell ^1_\downarrow )\). Besides, on the space on which we work, a.s. for any \(\lambda _\star \le \lambda _1<\cdots <\lambda _\ell \le \lambda ^\star \) and any \(1\le j \le \ell \), we have \(\max _{i\le k}|S^1(\varvec{\gamma }^{+,n}(\lambda _j))_i-S^1(\varvec{\gamma }^{+}(\lambda _j))_i|\rightarrow 0\), so that we have convergence in distribution.

In the multiplicative case, conditions (a) and (b) easily follow from the monotonicity and the convergence in distribution of \(\varvec{\gamma }^{\times ,n}(\lambda ^\star )\) as \(n\rightarrow \infty \). Here, we are working in \(\ell ^2\), and since the norm grows with \(\lambda \), the tail of \(\varvec{\gamma }^{\times ,n}(\lambda )\) is a priori not monotonic in \(\lambda \). Thus, we cannot bound the supremum over \(\lambda \in I\) by the value at \(\lambda _\star \) as we did in the additive case, and (c) is a little more delicate. However, for any \(m\ge 1\), the only way for the tail

$$\begin{aligned} \Vert \varvec{\gamma }^{\times ,n}(\lambda )\Vert _2^2 - S^2(\varvec{\gamma }^{\times ,n}(\lambda ))_m = \sum _{i>m} \gamma ^{\times ,n}_i(\lambda )^2 \end{aligned}$$

to decrease is that some mass gets moved to some component of index at most m, because of a coalescence event that involves at least one component of index smaller than m. In other words, if we suppress the m largest components and let only the components of the tail evolve on their own according to the dynamics of the multiplicative coalescent, then their 2-mass is monotonic (and their \(\Vert \cdot \Vert _2\) norm is larger than their contribution to the \(\ell ^2\) norm in the presence of the m first ones). Hence, one may build on the same probability space a coupling of the initial coalescent and of the one with the m largest components suppressed.

For \(m\ge 1\), define the m-tail to be the sequence

$$\begin{aligned} \varvec{\tau }^{[m],n}(\lambda ):=(\gamma ^{\times ,n}_{m+1} (\lambda ),\gamma ^{\times ,n}_{m+2}(\lambda ),\dots ) \in \ell ^2_\downarrow . \end{aligned}$$
(24)

Let \(\mathbf{T}^{[m],n}(\lambda )\) denote the state of the multiplicative coalescent at time \(\lambda \) when started from \(\mathbf{T}^{[m],n}(\lambda _\star )=\varvec{\tau }^{[m],n}(\lambda _\star )\). Then, by the previous discussion, we have (on the probability space carrying the coupling), for \(\lambda \in [\lambda _\star ,\lambda ^\star ]\),

$$\begin{aligned} \Vert \varvec{\tau }^{[m],n}(\lambda )\Vert _2^2 \le \Vert \mathbf{T}^{[m],n}(\lambda )\Vert _2^2. \end{aligned}$$
(25)

Now, in order to prove that (c) in Lemma 22 holds, we rely on the Feller property. First note that, by the triangle inequality, for every \(m,n\ge 1\),

$$\begin{aligned} \Vert \varvec{\tau }^{[m],n}(\lambda _\star )\Vert _2 \le \Vert \varvec{\tau }^{[m]}(\lambda _\star )\Vert _2 + \Vert \varvec{\gamma }^{\times ,n}(\lambda _\star )-\varvec{\gamma }^{\times }(\lambda _\star )\Vert _2. \end{aligned}$$

So the starting point of \(\mathbf{T}^{[m],n}\) can be made arbitrarily small, with high probability, by choosing m and n large enough: \(\Vert \varvec{\tau }^{[m],n}(\lambda _\star )\Vert _2\rightarrow 0\) in distribution as \(n,m\rightarrow \infty \). Thus, by the (spatial) Feller property, we obtain, at time \(\lambda ^\star \),

$$\begin{aligned} \Vert \mathbf{T}^{[m],n}(\lambda ^\star )\Vert _2\rightarrow 0, \end{aligned}$$

in distribution as \(m,n\rightarrow \infty \). Together with (25), this yields the desired uniform bound on \(\varvec{\tau }^{[m],n}\) on \([\lambda _\star ,\lambda ^\star ]\). It follows that (c) is satisfied with high probability, which is sufficient to prove tightness of the family of functions \((\varvec{\gamma }^{\times ,n}(\lambda ))_{\lambda \in I}\), \(n\ge 1\), in \({\mathbb {D}}(I, \ell ^2_\downarrow )\). To complete the proof of the convergence, it suffices now to prove convergence of the fdd.

To this aim, use Theorem 7 and the Skorokhod representation theorem in order to obtain a sequence of functions \((\mathsf{y}_\times ^n)_{n\ge 1}\) that converges a.s. in \({\mathbb {D}}({\mathbb {R}}^+\times {\mathbb {R}}, {\mathbb {R}})\). Then, in particular, for any \(\ell \ge 1\) and any \(\lambda _\star \le \lambda _1<\lambda _2<\cdots <\lambda _\ell \le \lambda ^\star \), we have

$$\begin{aligned} (\mathsf{y}_\times ^{n,(\lambda _1)},\dots , \mathsf{y}_\times ^{n,(\lambda _\ell )}) \rightarrow (\mathsf{y}_\times ^{(\lambda _1)},\dots , \mathsf{y}_\times ^{(\lambda _\ell )}) \end{aligned}$$

a.s. as \(n\rightarrow \infty \). The result follows from a simple adaptation of the arguments leading to Lemma 7 of [3, Section 2.3].

Proof of Corollary 4

Since the finite-dimensional distributions (fdd) characterise the law of the process, it suffices to verify that the fdd of \((\varvec{\gamma }^+(e^{-t}))_{t\in {\mathbb {R}}}\) and \((\varvec{\gamma }^\times (\lambda ))_{\lambda \in {\mathbb {R}}}\) coincide with those of the standard additive and multiplicative coalescents, respectively.

Consider any natural number \(k\ge 1\) and any \(t_1<t_2<\dots <t_k\). Then, for fixed \(n\ge 1\), the vector \((\varvec{\gamma }^{+,n}(e^{-t_1}), \dots , \varvec{\gamma }^{+,n}(e^{-t_k}))\) is distributed as the values at times \(t_1,t_2,\dots , t_k\) of the additive coalescent started at time \(t_1\) in the state \(\varvec{\gamma }^{+,n}(e^{-t_1})\). Since

$$\begin{aligned} (\varvec{\gamma }^{+,n}(e^{-t_1}), \dots , \varvec{\gamma }^{+,n}(e^{-t_k})) \rightarrow (\varvec{\gamma }^{+}(e^{-t_1}), \dots , \varvec{\gamma }^{+}(e^{-t_k})) \end{aligned}$$

in distribution as \(n\rightarrow \infty \), the Feller property of the additive coalescent implies that if the coalescent is started at time \(t_1\) in state \(\varvec{\gamma }^{+}(e^{-t_1})\), then the distributions at times \(t_1,t_2,\dots , t_k\) are given by

$$\begin{aligned} (\varvec{\gamma }^{+}(e^{-t_1}), \dots , \varvec{\gamma }^{+}(e^{-t_k})). \end{aligned}$$

In other words, the fdds of \((\varvec{\gamma }^+(e^{-t}))_{t\in {\mathbb {R}}}\) are those of an additive coalescent, so that it is in fact an additive coalescent. It is the standard additive coalescent since, for a single fixed time t, \(\varvec{\gamma }^+(e^{-t})\) is the scaling limit of a percolated Cayley tree. The proof in the multiplicative case is similar: one only needs to use Note 9 to make the process time-homogeneous for fixed n, and to identify the standard multiplicative coalescent because \(\varvec{\gamma }^\times (\lambda )\) is the scaling limit of the cluster sizes of \(G(n,p_\lambda (n))\); the details are omitted. \(\square \)

7.2 Proof of Theorem 5

The representation we have of the process also yields a construction of a standard version of the augmented multiplicative coalescent of Bhamidi et al. [11], which also keeps track of the surplus of the connected components (the minimum number of edges one must remove in order to obtain a tree). We start with Corollary 20. Consider the random variables \(U_i^\star (k)\), \(i<k\le n\), and arrange them geometrically in a field \((P^n_i(j))\) on the first quadrant \({\mathbb {N}}\times {\mathbb {N}}\) by setting \(P_i^n(j):=U^\star _{i+1}(i+1+j)\) for \(1\le i\le n-1\) and \(1\le j \le n-i-1\). Then, the number of extra edges in \(G(n,t)\) of a connected component corresponding to an interval I is precisely the number of \(P^n_i(j)\), \(i\in I\), lying below the graph of \(Z^t\) and whose value is at most t (see Fig. 3).

Fig. 3: \(Z^t\) and the field \((P^n_i(j))\), represented by the bullets; the black bullets are those whose value is at most t. Here the graph represented has two connected components, each having one extra edge.

Proof of Theorem 5

For every \(t=1/n+\lambda n^{-4/3}\), the Bernoulli point set \(\{(i n^{-2/3}, jn^{-1/3}): P^n_i(j)\le t\}\) converges to a Poisson point process with intensity one on \({\mathbb {R}}^+\times {\mathbb {R}}^+\). Furthermore, the limit point set is independent of \(\lambda \in {\mathbb {R}}\).

The convergence of the finite-dimensional distributions follows from arguments similar to the ones used in the proof of Theorem 3, which rely on Skorokhod’s representation theorem. So it suffices to prove convergence of the marginals. For fixed t, note that the random variables of the Bernoulli point set that are used to define \(Z^t\) and the ones used to define the surplus are distinct. It follows that, for fixed \(\lambda \), and on a space on which \(\Psi \mathsf{y}^{n,(\lambda )}_\times \) converges a.s., we have

$$\begin{aligned} ((\gamma _i^{\times ,n}(\lambda ))_{1\le i\le k}, (s_i^n(\lambda ))_{1\le i\le k}) \rightarrow ((\gamma _i^{\times }(\lambda ))_{1\le i\le k},(s_i(\lambda ))_{1\le i\le k}), \end{aligned}$$

for every integer \(k\ge 1\). It now remains to prove that \((\varvec{\gamma }^{\times ,n}(\lambda ),\mathbf{s}^n(\lambda ))_{n\ge 1}\) is tight in \({\mathbb {U}}_\downarrow \). Since \(\varvec{\gamma }^{\times ,n}(\lambda )\) converges in \(\ell ^2\), it suffices to verify that for any \(\epsilon >0\), there exists K such that

$$\begin{aligned} \sup _{n\ge 1} {{\mathbb {P}}}\left( \sum _{i\ge 1} \gamma _i^{\times ,n}(\lambda ) \cdot s_i^n(\lambda ) \ge K \right) \le \epsilon . \end{aligned}$$

Let \(I_i(\lambda )\) denote the interval corresponding to the excursion of \(\Psi \mathsf{y}^{n,(\lambda )}_\times \) whose length is recorded in \(\gamma _i^{\times ,n}(\lambda )\). Let also \(a_i^n(\lambda )\) denote the number of integer points lying (strictly) between the horizontal axis and the graph of \(\Psi \mathsf{y}^{n,(\lambda )}_\times \) on the interval \(I_i(\lambda )\). Then,

$$\begin{aligned} {\mathbb {E}} [\gamma _i^{\times ,n}(\lambda )\cdot s_i^n(\lambda )] = p_\lambda (n) \cdot {\mathbb {E}} [\gamma _i^{\times ,n}(\lambda ) \cdot a_i^n(\lambda )]. \end{aligned}$$

However, \(a_i^n(\lambda )\) is the area of the tree associated to the interval \(I_i(\lambda )\) in the sense of [1, Section 2]. In particular, given \(\gamma _i^{\times ,n}(\lambda )n^{2/3}=m\), \(a_i^n(\lambda )\) is distributed as \(a(\tilde{T}^p_m)\) there. More precisely,

$$\begin{aligned} {{\mathbb {P}}}(\tilde{T}^p_m \in {\mathcal {B}}) = \frac{{\mathbb {E}} [\mathbf {1}_{ \{ T_m\in {\mathcal {B}} \} } (1-p_\lambda (n))^{-a(T_m)}]}{{\mathbb {E}} [(1-p_\lambda (n))^{-a(T_m)}]}, \end{aligned}$$

where \(T_m\) is a uniformly random tree on [m], but we shall only use that \(a(\tilde{T}^p_m) \le m h(\tilde{T}^p_m)\), where h(T) denotes the height of the labelled tree T, and refer to some bounds on the height of \(\tilde{T}_m^p\) proved in [1]. By Lemma 25 there and Jensen’s inequality, there is a constant C such that, for all \(1\le m\le n\), we have

$$\begin{aligned} {\mathbb {E}} [a(\tilde{T}_m^p)]\le m {\mathbb {E}} [h(\tilde{T}_m^p)] \le C \max \{m^{3/2}/n, 1\} \cdot m^{3/2}. \end{aligned}$$
(26)

Note that here, m is \(n^{2/3}\gamma _i^{\times ,n}\) for some \(i\ge 1\), and \(m^{3/2}/n\le (\gamma _1^{\times ,n})^{3/2}\le \Vert \varvec{\gamma }^{\times ,n}\Vert ^{3/2}_2\). Since for n large enough we have \(p_\lambda (n)\le 2/n\), it follows that

$$\begin{aligned}&{\mathbb {E}}\left[ \sum _{i\ge 1} \gamma _i^{\times ,n}(\lambda )\cdot s_i^n(\lambda ) \,\bigg | \,\Vert \varvec{\gamma }^{\times ,n}(\lambda )\Vert _2\right] = p_\lambda (n) {\mathbb {E}}\left[ \sum _{i\ge 1} \gamma _i^{\times ,n}(\lambda )\cdot a_i^n(\lambda ) \,\Bigg | \,\Vert \varvec{\gamma }^{\times ,n}(\lambda )\Vert _2\right] \\&\quad \le 2C {\mathbb {E}} \left[ \max \{\Vert \varvec{\gamma }^{\times ,n}(\lambda )\Vert ^{3/2}_2, 1\} \cdot \Vert \varvec{\gamma }^{\times ,n}(\lambda )\Vert _{5/2}^{5/2} \, \big |\,\Vert \varvec{\gamma }^{\times ,n}(\lambda )\Vert _2 \right] \\&\quad \le 2C \max \{\Vert \varvec{\gamma }^{\times ,n}(\lambda )\Vert _2^{4}, \Vert \varvec{\gamma }^{\times ,n}(\lambda )\Vert _2^{5/2}\}. \end{aligned}$$

Since \((\varvec{\gamma }^{\times ,n}(\lambda ))_{n\ge 1}\) is tight in \(\ell ^2\), it follows that \((\varvec{\gamma }^{\times ,n}(\lambda ), \mathbf{s}^{n}(\lambda ))_{n\ge 1}\) is tight in \({\mathbb {U}}_\downarrow \).

The tightness of the sequence of processes \((\varvec{\gamma }^{\times ,n}(\lambda ), \mathbf{s}^n(\lambda ))_{\lambda \in [\lambda _\star ,\lambda ^\star ] }\), \(n\ge 1\), in \({\mathbb {D}}([\lambda _\star ,\lambda ^\star ],{\mathbb {U}}_\downarrow )\) follows from the almost Feller property of the augmented multiplicative coalescent [11], as in the proof of Theorem 3. It suffices to consider the modified cumulative operator \(\Sigma : {\mathbb {U}}_\downarrow \rightarrow \ell ^2_{\ge 0}\) that associates to \((\mathbf{x},\mathbf{s})\in {\mathbb {U}}_\downarrow \) the sequence

$$\begin{aligned} \Sigma (\mathbf{x},\mathbf{s}):=\left(\sum _{i=1}^k x_i^2 + \sum _{i=1}^k x_i s_i, k\ge 1\right). \end{aligned}$$

Then relative compactness in \({\mathbb {D}}([\lambda _\star ,\lambda ^\star ], {\mathbb {U}}_\downarrow )\) reduces to the three conditions in Lemma 22 with \(S^p\) replaced by \(\Sigma \). The proof uses no idea that is not already present in the proof of Theorem 3 that we detailed earlier, so we omit the details. \(\square \)