1 Introduction

In applied data, especially from ecology or reliability, missing values occur rather often. Unfortunately, there is no simple rule for handling missing data in the context of multivariate distributions. General strategies for statistical inference in the presence of missing data are discussed in the popular book by Little and Rubin (2019); see also Graham (2009). In the maximum-likelihood methodology, the EM-algorithm is the method of choice and provides good results in many cases. Many approaches to missing data work by replacing the missing values with plausible ones. These imputation methods are discussed in the survey paper by Van Buuren et al. (2006). Imputation is typically based on the idea of predicting the missing components from their conditional distribution given the observed components.

As is well known, the copula together with the (one-dimensional) marginal distribution functions determines the multivariate distribution function. A vast number of papers deal with the estimation of one-dimensional distributions. In this paper, the focus is on the parametric estimation of the copula. Here we do not consider imputation methods. We regard the data as given in a special structure, where the set of multivariate data is divided into a certain number of subsets with the same pattern of missingness. This approach corresponds to the MCAR model (missing completely at random), in which the missingness of data items is independent of the data values. The various models for datasets with missing data are explained in Van Buuren et al. (2006) and Little and Rubin (2019), among others. For the data structure under consideration, an adapted Cramér–von Mises statistic is constructed which serves as an approximation measure and describes the discrepancy between the model and the data. Each pattern of missing data leads to a certain Cramér–von Mises statistic, and the final statistic is established as a linear combination of these partial statistics. The Cramér–von Mises statistic has numerical advantages over the Kullback–Leibler statistic (maximum-likelihood estimation) since the computation of the density is not required. The aim of the paper is to prove theorems on the almost sure convergence and on the asymptotic normality of the minimum distance estimators for the copula.

Because of the complexity of the multivariate distribution, we cannot expect that the underlying distribution of the sample vectors coincides with the hypothesis distribution. Thus, it makes sense to assume that the underlying copula does not belong to the parametric family under consideration. The reader finds an extensive discussion of goodness-of-approximation in the one-dimensional case in Liebscher (2014). Considering approximate estimators is another aspect of this paper. Since, as a rule, there is no explicit formula for the estimators, the copula parameters have to be evaluated by a numerical algorithm, and the estimator is obtained as a solution of an optimization problem only up to a certain (small) error. These approximate estimators are the subject of the considerations in Sect. 5.

Concerning the estimation of the parameters of the copula, two types of estimators are studied most frequently in the literature: maximum pseudo-likelihood estimators and minimum distance estimators. In our approach, minimum distance estimators based on the Cramér–von Mises divergence are the appropriate choice. In the case of complete data, minimum distance estimators for the parameters of copulas were examined in the papers by Tsukahara (2005) and by the author (2009). The asymptotic behaviour of likelihood estimators was investigated in papers by Genest and Rivest (1993), Genest et al. (1995), Chen and Fan (2005), and Hofert et al. (2012), among others. Joe (2005) published results on the asymptotic behaviour of two-stage estimation procedures. An application of the EM-algorithm to fitting Gaussian copulas in the presence of missing data can be found in the paper by Kertel and Pauly (2022). Rather few papers deal with the estimation of copula parameters in the context of missing data. Di Lascio et al. (2015) studied an imputation method for parametric copula estimation. Hamori et al. (2019) considered the estimation of copula parameters in the missing at random model using only the complete cases to estimate the parameter. In Wang et al. (2014), the estimation of the parameter of Gaussian copulas was examined under the missing completely at random assumption, using a method tailored specifically to this family.

The paper is organized as follows: In Sect. 2 we introduce the data structure and the distribution functions of the subsets. Section 3 introduces the empirical marginal distribution functions appropriate for this data structure and provides a law of the iterated logarithm for them. The Cramér–von Mises divergence and its estimator are considered in Sect. 4. Section 5 provides the definition of approximate minimum distance estimators in our context. Moreover, we give the results on almost sure convergence and on asymptotic normality of the estimators of the copula parameters. The problem of goodness of approximation is discussed there, too. Section 6 contains a small simulation study. Section 7 provides the computational results for a data example. The asymptotic normality result for the Cramér–von Mises divergence can be found in Sect. 8. The proofs of the results are located in Sect. 9.

2 Data structure

Let \({\textbf{X}}=(X^{(1)},\ldots ,X^{(d)})^{T}\) be a d-dimensional random vector representing the data without missing values. In the case of a complete observation, we denote the joint distribution function by H and the marginal distribution function of \(X^{(j)}\) by \(F_{j}\) (\(j=1,\ldots ,d\)). Assume that \(F_{j}\) is continuous (\(j=1,\ldots ,d\)). According to Sklar’s theorem (Sklar 1959), we have

$$\begin{aligned} H(x_{1},\ldots ,x_{d})=C(F_{1}(x_{1}),\ldots ,F_{d}(x_{d}))\quad \text {for } x_{i}\in {\mathbb {R}}\text {,} \end{aligned}$$

where \(C:[0,1]^{d}\rightarrow [0,1]\) is the uniquely determined d-dimensional copula. The reader can find the theory of copulas in the popular monographs by Joe (1997) and by Nelsen (2006).

Next we describe the structure of the data including missing values. The sample breaks down into m subsets of data items with the same pattern of missing data, where m does not depend on the sample size n. Every pattern is modeled as a binary nonrandom vector \({\textbf{b}}=(b_{1},\ldots ,b_{d})^{T}\in \{0,1\}^{d}\), called the missing indicator vector, which has at least two components equal to 1. \(b_{j}=1\) means that the j-th component is observed, whereas \(b_{j}=0\) means that the j-th component is missing. Now let \( {\textbf{b}}^{(1)},\ldots ,{\textbf{b}}^{(m)}\in \{0,1\}^{d}\) be the pattern vectors of the data subsets. \(J_{\mu }=\{l:b_{l}^{(\mu )}=1\}\) denotes the set of indices of the observed (non-missing) components of subset \(\mu \). To give an example, the pattern \({\textbf{b}}^{(\mu )}=(0,1,0,1)^{T}\) of data subset \(\mu \) means that the data items of this subset have a non-missing second and fourth component, whereas components 1 and 3 are missing (\(J_{\mu }=\{2,4\}\)). Let \({\textbf{1}}=(1,\ldots ,1)^{T}\in {\mathbb {R}}^{d}\).

\(n_{\mu }=n_{\mu }(n)\) is the non-random number of sample items in subset \(\mu \). The crucial assumption in this paper is that, for every data subset, the distribution function of its data items coincides with the corresponding multivariate marginal distribution function derived from H. More precisely, we assume that H is the underlying distribution function of the data, and therefore the distribution function of data subset \(\mu \) is given by

$$\begin{aligned} H_{\mu }(y_{j},j\in J_{\mu })=H({\bar{\textbf{y}}})\text { for }{\bar{\textbf{y}}} \in {\mathbb {R}}^{d}, \end{aligned}$$
(1)

where \({\bar{\textbf{y}}}_{l}=y_{l}\) for \(l\in J_{\mu }\), and \({\bar{\textbf{y}}} _{l}=\infty \) for \(l\notin J_{\mu }\). \((y_{j},j\in J_{\mu })\) denotes a vector of components of \({\textbf{y}}\in {\mathbb {R}}^{d}\) with ascending indices from \( J_{\mu }\). The data structure and the subset distribution functions are then as follows:

Table 1 Structure of the data

\(d_{\mu }\) is the dimension of the data in the subset \(\mu \). Let C be the copula of distribution function H. The copulas of the subsets are determined by

$$\begin{aligned} C_{\mu }(u_{j},j\in J_{\mu })=C({\textbf{u}}\odot {\textbf{b}}^{(\mu )}+\mathbf { 1-b}^{(\mu )})\quad (\mu =1,\ldots ,m) \end{aligned}$$

for \({\textbf{u}}=(u_{1},\ldots ,u_{d})^{T}\in [0,1]^{d}\), where \( {\textbf{a}}\odot {\textbf{b}}=\hbox {diag}({\textbf{a}})~{\textbf{b}}\) is the Hadamard product of vectors \({\textbf{a}},{\textbf{b}}\in {\mathbb {R}}^{d}\). Define \({\bar{F}} ({\textbf{x}})=(F_{1}(x_{1}),\ldots ,F_{d}(x_{d}))^{T}\) for \({\textbf{x}} =(x_{1},\ldots ,x_{d})^{T}\). Then we have

$$\begin{aligned} H_{\mu }(y_{j},j \in J_{\mu })= & {} C({\bar{F}}({\textbf{y}})\odot {\textbf{b}}^{(\mu )}+{\textbf{1}}-{\textbf{b}}^{(\mu )}) \nonumber \\= & {} C_{\mu }(F_{j}(y_{j}),j\in J_{\mu })\quad (\mu =1,\ldots ,m). \end{aligned}$$
(2)

In the case \({\textbf{b}}^{(\mu )}\ge {\textbf{b}}^{(\nu )}\), i.e. \(b_{j}^{(\mu )}\ge b_{j}^{(\nu )}\) for \(j=1,\ldots ,d\), the function \(\psi _{\nu \mu }\) selects the components of subset \(\mu \), which are also present in subset \( \nu \): \(\psi _{\nu \mu }(y_{j},j\in J_{\mu })=(y_{j},j\in J_{\nu })\). The function \({\bar{\psi }}_{l\mu }\) selects the component l of the corresponding full data vector from data of subset \(\mu \): \({\bar{\psi }}_{l\mu }(y_{j},j\in J_{\mu })=y_{l}\).

The usage of the missing indicators and the sets \(J_{\mu }\) is illustrated by the following example.

Example

We consider \(d=5\) and \({\textbf{b}}^{(\mu )}=(1,0,1,1,0)^{T}\), so that \(J_{\mu }=\{1,3,4\}\) and

$$\begin{aligned} C_{\mu }(u_{1},u_{3},u_{4})=C(u_{1},1,u_{3},u_{4},1). \end{aligned}$$

Further let \(\nu =2,\mu =3,J_{2}=\{2,4\},J_{3}=\{2,3,4\}\). Then \(\psi _{23}((y_{2},y_{3},y_{4})^{T})=(y_{2},y_{4})^{T}\), \({\bar{\psi }} _{43}((y_{2},y_{3},y_{4})^{T})=y_{4}\).
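To make this bookkeeping concrete, the following small Python sketch (purely illustrative; the data layout and all function names are our own and not part of the estimation procedure) encodes the indicator vectors of the example as boolean masks and implements the selection maps \(\psi _{\nu \mu }\) and \({\bar{\psi }}_{l\mu }\).

import numpy as np

# Patterns of the example: d = 5, b^(mu) = (1,0,1,1,0)^T, and the patterns of
# subsets nu = 2 and mu = 3 with J_2 = {2,4}, J_3 = {2,3,4} (1-based indices).
b_example = np.array([1, 0, 1, 1, 0], dtype=bool)
b2 = np.array([0, 1, 0, 1, 0], dtype=bool)
b3 = np.array([0, 1, 1, 1, 0], dtype=bool)

def J(b):
    """Indices of the observed components (1-based, as in the text)."""
    return [j + 1 for j in np.flatnonzero(b)]

def psi(b_nu, b_mu, y_mu):
    """psi_{nu mu}: select from a subset-mu data item y_mu those components
    that are also observed in subset nu; requires b^(mu) >= b^(nu)."""
    assert np.all(b_mu >= b_nu)
    return np.asarray(y_mu)[b_nu[b_mu]]

def psi_bar(l, b_mu, y_mu):
    """psi_bar_{l mu}: component l (1-based) of the full vector, read off y_mu."""
    assert b_mu[l - 1]
    return np.asarray(y_mu)[J(b_mu).index(l)]

y = np.array([0.2, 0.5, 0.9])      # a data item of subset 3, i.e. (y_2, y_3, y_4)
print(J(b_example))                # [1, 3, 4]
print(psi(b2, b3, y))              # [0.2 0.9]  = (y_2, y_4)
print(psi_bar(4, b3, y))           # 0.9        = y_4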

The requirements on the data following the MCAR model for missing data are summarized in Assumption \({\mathcal {A}}_{\text {MCAR}}\):

Assumption

\({\mathcal {A}}_{MCAR}\): The data structure of Table 1 is given. Moreover (1) is valid where \(J_{\mu }=\{l:b_{l}^{(\mu )}=1\}\).

This data structure is also present in the situation where m samples with the same underlying distribution are given; these samples may originate from several sources.

3 Empirical distribution functions

In this section we consider the empirical marginal distribution functions and their convergence properties. Let \({\tilde{n}}_{j}\) be the number of data items where the j-th component is present:

$$\begin{aligned} {\tilde{n}}_{j}=\sum _{\mu :1\le \mu \le m,j\in J_{\mu }}n_{\mu }. \end{aligned}$$

We introduce \({\bar{n}}_{\nu }\) as the number of data items where at least the non-missing components of data subset \(\nu \) are present:

$$\begin{aligned} {\bar{n}}_{\nu }=\sum _{\mu :1\le \mu \le m,{\textbf{b}}^{(\mu )}\ge {\textbf{b}}^{(\nu )}}n_{\mu }. \end{aligned}$$

The inequality \({\textbf{b}}^{(\mu )}\ge {\textbf{b}}^{(\nu )}\) means that data subset \(\mu \) has at least the non-missing components of data subset \(\nu \).

Let \({\textbf{Y}}_{\mu 1},\ldots ,{\textbf{Y}}_{\mu n_{\mu }}\) denote the observations of data subset \(\mu \) (cf. Table 1). Notice that \({\bar{\psi }}_{j\mu }({\textbf{Y}}_{\mu i})\) and \(\psi _{\nu \mu }({\textbf{Y}}_{\mu i})\) have the distribution functions \(F_{j}\) and \(H_{\nu }\), respectively. Next we consider estimators for the marginal distribution functions and the joint distribution functions:

$$\begin{aligned} {\hat{F}}_{nj}(z)= & {} \frac{1}{{\tilde{n}}_{j}}\sum _{\mu :1\le \mu \le m,j\in J_{\mu }}\sum _{i=1}^{n_{\mu }}{\textbf{1}}\left\{ {\bar{\psi }}_{j\mu }({\textbf{Y}} _{\mu i})\le z\right\} ,\\ {\hat{H}}_{n\nu }({\textbf{y}})= & {} \frac{1}{{\bar{n}}_{\nu }}\sum _{\mu :1\le \mu \le m,{\textbf{b}}^{(\mu )}\ge {\textbf{b}}^{(\nu )}}\sum _{i=1}^{n_{\mu }} {\textbf{1}}\left\{ \psi _{\nu \mu }({\textbf{Y}}_{\mu i})\le {\textbf{y}}\right\} \end{aligned}$$

for \(z\in {\mathbb {R}},{\textbf{y}}\in {\mathbb {R}}^{d_{\nu }},\nu =1,\ldots ,m\). We pose the following assumption on \(n_{\mu }\).

Assumption

\({\mathcal {A}}_{n}\): For \(\mu =1,\ldots ,m\),

$$\begin{aligned} \lim _{n\rightarrow \infty }\frac{n_{\mu }(n)}{n}=\gamma _{\mu } \end{aligned}$$

with constants \(\gamma _{\mu }\in (0,1]\). \(\square \)
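As an illustration of the pooled estimators \({\hat{F}}_{nj}\) and \({\hat{H}}_{n\nu }\), a minimal Python sketch could look as follows; the data layout (one boolean pattern and one \(n_{\mu }\times d_{\mu }\) array per subset) and all names are assumptions made for this example, and component indices are 0-based here, unlike in the text.

import numpy as np

# subsets = [(b_mu, Y_mu), ...]: b_mu is a boolean pattern of length d,
# Y_mu an (n_mu x d_mu) array holding the observed components of subset mu.
def F_hat(subsets, j, z):
    """Pooled empirical marginal d.f. at z: use every subset with component j observed."""
    num, den = 0, 0
    for b, Y in subsets:
        if b[j]:
            col = list(np.flatnonzero(b)).index(j)   # position of component j within the subset
            num += np.sum(Y[:, col] <= z)
            den += Y.shape[0]
    return num / den

def H_hat(subsets, b_nu, y):
    """Empirical joint d.f. of pattern b_nu at y: pool all subsets whose pattern dominates b_nu."""
    num, den = 0, 0
    for b, Y in subsets:
        if np.all(b >= b_nu):
            keep = b_nu[b]                           # columns of the subset that belong to J_nu
            num += np.sum(np.all(Y[:, keep] <= np.asarray(y), axis=1))
            den += Y.shape[0]
    return num / den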

For the empirical distribution functions, the following law of the iterated logarithm holds true (cf. Kiefer 1961, for example).

Proposition 3.1

Suppose that Assumptions \({\mathcal {A}}_{MCAR}\) and \({\mathcal {A}}_{n}\) are fulfilled.

  1. (a)

    Then we have

    $$\begin{aligned} \max _{j=1,\ldots ,d}\sup _{t\in {\mathbb {R}}}\left| {\hat{F}}_{nj}(t)-F_{j}(t)\right| =O\left( \sqrt{\frac{\ln \ln n}{n}}\right) \ \ a.s. \end{aligned}$$
  2. (b)

    Moreover,

    $$\begin{aligned} \max _{\mu =1,\ldots ,m}\sup _{{\textbf{y}}\in {\mathbb {R}}^{d_{\mu }}}\left| {\hat{H}}_{n\mu }({\textbf{y}})-H_{\mu }({\textbf{y}})\right| =O\left( \sqrt{ \frac{\ln \ln n}{n}}\right) \ a.s. \end{aligned}$$

    for \(n\rightarrow \infty \).

4 Cramér–von Mises divergence

Let \({\mathcal {F}}=\{{\mathcal {C}}(\cdot \mid \theta )\}_{\theta \in \Theta }\) be a parametric family of copulas, where \(\Theta \subset {\mathbb {R}}^{q}\) is the parameter space. In this paper we want to approximate the underlying copula C by the family \({\mathcal {F}}\). For this purpose, we consider the Cramér–von Mises divergence as a measure of the discrepancy between the copula C and \({\mathcal {F}}\). Define the model copula for subset \(\mu \):

$$\begin{aligned} {\mathcal {C}}_{\mu }(u_{j},j\in J_{\mu }\mid \theta )={\mathcal {C}}({\textbf{u}} \odot {\textbf{b}}^{(\mu )}+\mathbf {1-b}^{(\mu )}\mid \theta ) \end{aligned}$$

for \({\textbf{u}}\in [0,1]^{d},\theta \in \Theta ,\mu =1,\ldots ,m\). Let \({\bar{F}}_{\mu }^{*}(y_{j},j\in J_{\mu })=(F_{j}(y_{j}))_{j\in J_{\mu }}\), and \({\check{F}}_{n\mu }^{*}(y_{j},j\in J_{\mu })=({\hat{F}}_{nj}(y_{j}))_{j\in J_{\mu }}\). \({\bar{F}}_{\mu }^{*}\) is the vector of the marginal distribution functions in subset \(\mu \), and \({\check{F}}_{n\mu }^{*}\) is its empirical counterpart. We introduce the population version of the divergence as

$$\begin{aligned} {\mathcal {D}}(C,{\mathcal {C}}(\cdot \mid \theta ))= & {} \sum _{\mu =1}^{m}\int _{[0,1]^{d_{\mu }}}\left( C_{\mu }({\bar{\textbf{u}}})-{\mathcal {C}} _{\mu }({\bar{\textbf{u}}}\mid \theta )\right) ^{2}w_{\mu }({\bar{\textbf{u}}})~ \text {d}C_{\mu }({\bar{\textbf{u}}}) \nonumber \\= & {} \sum _{\mu =1}^{m}\int _{{\mathbb {R}}^{d_{\mu }}}\left( H_{\mu }({\textbf{y}})- {\mathcal {C}}_{\mu }({\bar{F}}_{\mu }^{*}({\textbf{y}})\mid \theta )\right) ^{2}w_{\mu }({\bar{F}}_{\mu }^{*}({\textbf{y}}))~\text {d}H_{\mu }({\textbf{y}} ), \end{aligned}$$
(3)

where \(H_{\mu }\) is as in (2). We pose the following assumption on \(w_{\mu }\):

Assumption

\({\mathcal {A}}_{W}\): Assume that \(w_{\mu }:[0,1]^{d_{\mu }}\rightarrow [0,+\infty ),\mu =1,\ldots ,m\) are Lipschitz-continuous weight functions for data subsets \(\mu \). \(\square \)

By assumption \({\mathcal {A}}_{W}\), the functions \(w_{\mu }\) are bounded. An example of such a weight function is given by

$$\begin{aligned} w_{\mu }({\textbf{u}})=\frac{{\bar{w}}_{\mu }}{a+\prod \nolimits _{j=1}^{d_{\mu }}u_{j}(1-u_{j})}\quad \text {for }{\textbf{u}}\in [0,1]^{d_{\mu }}, \end{aligned}$$
(4)

where \(a,{\bar{w}}_{\mu }>0\) are constants. The divergence \({\mathcal {D}}\) is the weighted sum of the squared discrepancies between the underlying copula and the parametric model copula within the respective data subsets. In general, smaller values of the divergence \({\mathcal {D}}\) indicate a better approximation by \({\mathcal {F}}\). Observe that \(C_{\mu }({\bar{\textbf{u}}})-{\mathcal {C}}_{\mu }({\bar{\textbf{u}}}\mid \theta )=0\) for \({\bar{\textbf{u}}}\in {\mathcal {B}}:=\{ {\textbf{u}}:u_{j}=0\) for at least one j, or \(u_{j}=1\) for all j except one \(\}\). To put more emphasis on the fit in the boundary regions in the neighbourhood of \({\mathcal {B}}\), the weight functions can be defined in a suitable way similarly to (4).

The concept of a weighted divergence has already been applied by several authors. For instance, Rodriguez and Viollaz (1995) studied the asymptotic distribution of the weighted Cramér–von Mises divergence in the one-dimensional case. Medovikov (2016) examined weighted Cramér–von Mises tests for independence, employing the weighted Cramér–von Mises statistic with the independence copula as the model copula. We refer to the thorough discussion of weights in Medovikov’s paper, where further references can also be found. The \(L^{p}\)-distance and the Kolmogorov–Smirnov distance are alternatives to \({\mathcal {D}}\); see Liebscher (2015), for example.

Next, we construct an estimator for \({\mathcal {D}}(C,{\mathcal {C}}(\cdot \mid \theta ))\) in the situation where the data structure is as introduced in Sect. 2:

$$\begin{aligned} \widehat{{\mathcal {D}}}_{n}(\theta )=\sum _{\mu =1}^{m}\frac{1}{n_{\mu }} \sum _{i=1}^{n_{\mu }}\left( {\hat{H}}_{n\mu }({\textbf{Y}}_{\mu i})-{\mathcal {C}} _{\mu }({\check{F}}_{n\mu }^{*}({\textbf{Y}}_{\mu i})\mid \theta )\right) ^{2}w_{\mu }({\check{F}}_{n\mu }^{*}({\textbf{Y}}_{\mu i})) \end{aligned}$$
(5)

for \(\theta \in \Theta \), where \({\hat{H}}_{n\mu }\) is defined in Sect. 3. This estimator has the advantage of being just a sum and does not require the computation of an integral. Genest et al. (2009) found that the use of the Cramér–von Mises statistic leads to more powerful goodness-of-fit tests in comparison with other test statistics such as the Kolmogorov–Smirnov one. The next section is devoted to the estimation of the parameter \(\theta \) using the divergence (5).
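For illustration only, the estimator (5) could be evaluated along the lines of the following Python sketch. The Clayton family serves as a stand-in model copula, the data-dependent inputs (the values of \({\hat{H}}_{n\mu }\) and \({\check{F}}_{n\mu }^{*}\) at the data points) are assumed to be precomputed, and all names are our own.

import numpy as np

def weight(u, a=0.1, w_bar=1.0):
    """Weight function (4): w(u) = w_bar / (a + prod_j u_j (1 - u_j)), rowwise."""
    u = np.asarray(u)
    return w_bar / (a + np.prod(u * (1.0 - u), axis=-1))

def clayton(u, theta):
    """Clayton model copula C(u | theta), theta > 0, evaluated rowwise."""
    u = np.asarray(u)
    return (np.sum(u ** (-theta), axis=-1) - u.shape[-1] + 1.0) ** (-1.0 / theta)

def D_hat(theta, C_model, H_emp, U_emp, w_bars, a=0.1):
    """Estimated CvM divergence (5).
    H_emp[mu]: values of the empirical joint d.f. at the data points of subset mu (length n_mu),
    U_emp[mu]: the corresponding pseudo-observations (n_mu x d_mu),
    w_bars[mu]: constant factor of the weight function for subset mu."""
    total = 0.0
    for H_mu, U_mu, wb in zip(H_emp, U_emp, w_bars):
        resid = H_mu - C_model(U_mu, theta)
        total += np.mean(resid ** 2 * weight(U_mu, a=a, w_bar=wb))
    return total

# toy call with fabricated pseudo-observations (illustration only)
rng = np.random.default_rng(0)
U_emp = [rng.uniform(0.05, 0.95, (50, 3)), rng.uniform(0.05, 0.95, (50, 2))]
H_emp = [clayton(U, 2.0) for U in U_emp]       # stand-ins for the empirical joint d.f. values
print(D_hat(1.5, clayton, H_emp, U_emp, w_bars=[0.3, 0.7]))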

5 Parameter estimation by the minimum distance method

Let the data structure be as in Sect. 2. Consider the family \({\mathcal {F}}=({\mathcal {C}}(\cdot \mid \theta ))_{\theta \in \Theta }\) of copulas with the parameter set \(\Theta \subset {\mathbb {R}}^{q}\). Throughout the paper, we assume that \(C\notin {\mathcal {F}}\). Many authors refer to this case as misspecification. In our opinion, this term is not appropriate for the situation considered here: for multivariate data, C typically does not belong to any parametric family. In this section the aim is to estimate the parameter \(\theta _{0}\) which gives the best approximation of the copula, provided that the minimizer of \({\mathcal {D}}\) defined in (3) is unique:

$$\begin{aligned} \theta _{0}=\arg \min _{\theta \in \Theta }{\mathcal {D}}(C,{\mathcal {C}}(\cdot \mid \theta )), \end{aligned}$$

with \({\mathcal {D}}(C,{\mathcal {C}}(\cdot \mid \theta ))\) as in the previous section. It should be highlighted that, in general, \(\theta _{0}\) depends on the choice of the discrepancy measure; there is no “true parameter”. Considering \({\mathcal {D}}(C,{\mathcal {C}}(\cdot \mid \theta ))\), \(\theta _{0}\) depends on the weight functions \(w_{\mu }\), and these functions should be chosen prior to the analysis. In the case of constant weight functions, it seems reasonable to choose the weights \(w_{\mu }\) such that the summands in (3), evaluated at the estimator of \(\theta _{0}\), are roughly equal.

The estimator \({\hat{\theta }}_{n}\) is referred to as an approximate minimum distance estimator (AMDE) if

$$\begin{aligned} \widehat{{\mathcal {D}}}_{n}({\hat{\theta }}_{n})\le \min _{\theta \in \Theta } \widehat{{\mathcal {D}}}_{n}(\theta )+\varepsilon _{n} \end{aligned}$$

holds true (\(\widehat{{\mathcal {D}}}_{n}\) as in the previous section), where \(\{\varepsilon _{n}\}\) is a sequence of random variables with \(\varepsilon _{n}\rightarrow 0\ a.s.\) Note that \({\hat{\theta }}_{n}\) is an approximate minimizer of \(\theta \mapsto \widehat{{\mathcal {D}}}_{n}(\theta )\). We refer to Liebscher (2009), where this kind of estimator was introduced. In the case of a unique \(\theta _{0}\), \({\hat{\theta }}_{n}\) is an estimator for \(\theta _{0}\). Tsukahara (2005) examined properties of a similar (non-approximate) minimum distance estimator.
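Numerically, the AMDE is what a bounded optimizer returns when it is run on \(\theta \mapsto \widehat{{\mathcal {D}}}_{n}(\theta )\) up to its tolerance, which then plays the role of \(\varepsilon _{n}\). A minimal sketch for a one-dimensional parameter, assuming the estimated divergence is available as a Python callable and using a stand-in objective for the demonstration, might look as follows.

import numpy as np
from scipy.optimize import minimize_scalar

def amde(D_hat_n, bounds, xatol=1e-6):
    """Approximate minimizer of the estimated divergence over a compact interval;
    the optimizer tolerance xatol plays the role of the error epsilon_n."""
    res = minimize_scalar(D_hat_n, bounds=bounds, method="bounded",
                          options={"xatol": xatol})
    return res.x, res.fun

# stand-in objective (illustration only); for a q-dimensional parameter one would
# use scipy.optimize.minimize or a global method such as differential_evolution
theta_hat, d_min = amde(lambda t: (t - 2.0) ** 2 / (1.0 + t ** 2) + 0.05,
                        bounds=(0.1, 10.0))
print(theta_hat, d_min)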

Let \(\Vert \cdot \Vert \) be the Euclidean norm, and \(d(x,A)=\inf _{y\in A}\Vert x-y\Vert \) for \(x\in {\mathbb {R}}^{q}\) and subsets \(A\subset {\mathbb {R}} ^{q}\). The following theorem provides the result about the consistency of the AMDE including the case of sets of minimizers of \({\mathcal {D}}\).

Theorem 5.1

Assume that Assumptions \({\mathcal {A}}_{MCAR}\), \({\mathcal {A}}_{n}\) and \({\mathcal {A}}_{W}\) are satisfied. Let \(\theta \mapsto {\mathcal {C}}({\textbf{u}}\mid \theta )\) be continuous on \(\Theta \) for every \({\textbf{u}}\in [0,1]^{d}\). Suppose that \(\Theta \) is compact.

  1. (a)

    Then

    $$\begin{aligned} \lim _{n\rightarrow \infty }d({\hat{\theta }}_{n},\Psi )=0\quad a.s., \end{aligned}$$

    where \(\Psi =\arg \min _{\theta \in \Theta }{\mathcal {D}}(C,{\mathcal {C}}(\cdot \mid \theta ))\subset {\mathbb {R}}^{q}\).

  2. (b)

    If in addition, the condition

    $$\begin{aligned} {\mathcal {D}}(C,{\mathcal {C}}(\cdot \mid \theta ))>{\mathcal {D}}(C,{\mathcal {C}} (\cdot \mid \theta _{0}))\text { \ for all }\theta \in \Theta \backslash \{\theta _{0}\} \end{aligned}$$
    (6)

    (i.e. \(\Psi =\{\theta _{0}\}\)) is satisfied, then

    $$\begin{aligned} \lim _{n\rightarrow \infty }{\hat{\theta }}_{n}=\theta _{0}\quad a.s. \end{aligned}$$

Part (a) of Theorem 5.1 gives sufficient conditions for the almost sure convergence of the AMDE \({\hat{\theta }}_{n}\) to the set of minimizers of \({\mathcal {D}}\) w.r.t. \(\theta \), whereas part (b) is the ordinary consistency result. The proof is based on a result from Lachout et al. (2005). The assumption that \(\Theta \) is compact is not as problematic as it seems. In many cases with unbounded \(\Theta \), a continuous bijective function can be used to transform the parameter onto a finite interval. Then it can be verified that the consistency holds for the transformed parameter, and hence for the original parameter on suitable intervals. The assumption of compactness of \(\Theta \) is posed to reduce the technical effort in the proofs.

The next Theorem 5.2 states that \({\hat{\theta }}_{n}\) is asymptotically normally distributed in the case \(\Psi =\{\theta _{0}\}\) under appropriate assumptions. The following assumption on partial derivatives of the copula is needed in this theorem.

Assumption \({\mathcal {A}}_{C}\): \({\bar{\mathcal {C}}}_{k}(\cdot \mid \theta ),{\bar{\mathcal {C}}}_{kl}(\cdot \mid \theta ),{\tilde{\mathcal {C}}}_{j}({\textbf{u}}\mid \cdot ),{\tilde{\mathcal {C}}}_{jk}({\textbf{u}}\mid \theta )\) denote the partial derivatives \(\frac{\partial }{\partial \theta _{k}}{\mathcal {C}}(\cdot \mid \theta ),\frac{\partial ^{2}}{\partial \theta _{k}\partial \theta _{l}}{\mathcal {C}}(\cdot \mid \theta ),\frac{\partial }{\partial u_{j}}{\mathcal {C}}({\textbf{u}}\mid \cdot ),\frac{\partial ^{2}}{\partial \theta _{k}\partial u_{j}}{\mathcal {C}}({\textbf{u}}\mid \theta )\), respectively. We assume that these derivatives exist, and that, for \(k,l=1,\ldots ,q,\ j=1,\ldots ,d\), the functions \(({\textbf{u}},t)\longmapsto {\bar{\mathcal {C}}}_{kl}({\textbf{u}}\mid t)\), \(({\textbf{u}},t)\longmapsto {\tilde{\mathcal {C}}}_{jk}({\textbf{u}}\mid t)\) are continuous on \([0,1]^{d}\times U(\theta _{0})\), where \(U(\theta _{0})\subset \Theta \) is a neighbourhood of \(\theta _{0}\) and \(\theta _{0}\) is an interior point of \(\Theta \). Moreover, the partial derivatives of \(w_{\mu }:[0,1]^{d_{\mu }}\rightarrow [0,+\infty )\) are denoted by \(w_{\mu l},l\in J_{\mu }\), and are assumed to exist and to be continuous. \(\square \)

If Assumption \({\mathcal {A}}_{C}\) is satisfied, then we use the notations

$$\begin{aligned} {\bar{\mathcal {C}}}_{\mu k}^{{{}^\circ } }(u_{\lambda },\lambda \in J_{\mu }\mid \cdot )= & {} {\bar{\mathcal {C}}}_{k}( {\tilde{\textbf{u}}}_{\mu }\mid \cdot ),\quad {\bar{\mathcal {C}}}_{\mu kl}^{ {{}^\circ } }(u_{\lambda },\lambda \in J_{\mu }\mid \cdot )={\bar{\mathcal {C}}}_{kl}( {\tilde{\textbf{u}}}_{\mu }\mid \cdot ), \\ {\tilde{\mathcal {C}}}_{\mu j}^{ {{}^\circ } }(u_{\lambda },\lambda \in J_{\mu }\mid \cdot )= & {} {\tilde{\mathcal {C}}}_{j}( {\tilde{\textbf{u}}}_{\mu }\mid \cdot ),\quad {\tilde{\mathcal {C}}}_{\mu jk}^{ {{}^\circ } }(u_{\lambda },\lambda \in J_{\mu }\mid \cdot )={\tilde{\mathcal {C}}}_{jk}( {\tilde{\textbf{u}}}_{\mu }\mid \cdot ) \end{aligned}$$

for \(k,l=1,\ldots ,q\), \(j\in J_{\mu }\), where \({\tilde{\textbf{u}}}_{\mu }= {\textbf{u}}\odot {\textbf{b}}^{(\mu )}+\mathbf {1-b}^{(\mu )},{\textbf{u}}\in [0,1]^{d}\). Define \({\mathcal {H}}=({\mathcal {H}}_{kl})_{k,l=1,\ldots ,q}\) as the Hessian matrix of \(\theta \longmapsto {\mathcal {D}}(C,{\mathcal {C}}(\cdot \mid \theta ))\) at \(\theta =\theta _{0}\):

$$\begin{aligned} {\mathcal {H}}_{kl}{} & {} =-2\sum _{\mu =1}^{m}\int _{{\mathbb {R}}^{d_{\mu }}}\left( \left( H_{\mu }({\textbf{y}})-{\mathcal {C}}_{\mu }({\bar{F}}_{\mu }^{*}( {\textbf{y}})\mid \theta _{0})\right) {\bar{\mathcal {C}}}_{\mu kl}^{ {{}^\circ } }({\bar{F}}_{\mu }^{*}({\textbf{y}})\mid \theta _{0})\right. \\{} & {} \quad \left. -{\bar{\mathcal {C}}}_{\mu k}^{ {{}^\circ } }({\bar{F}}_{\mu }^{*}({\textbf{y}})\mid \theta _{0}){\bar{\mathcal {C}}}_{\mu l}^{ {{}^\circ } }({\bar{F}}_{\mu }^{*}({\textbf{y}})\mid \theta _{0})\right) w_{\mu }({\bar{F}} _{\mu }^{*}({\textbf{y}}))~\text {d}H_{\mu }({\textbf{y}}). \end{aligned}$$

Now we give the theorem:

Theorem 5.2

Assume that \(\varepsilon _{n}=o_{{\mathbb {P}}}(n^{-1})\) and that the matrix \({\mathcal {H}}\) is positive definite. Suppose that Assumption \({\mathcal {A}}_{C}\) and the assumptions of Theorem 5.1(b) are satisfied. Then

$$\begin{aligned} \sqrt{n}({\hat{\theta }}_{n}-\theta _{0})\overset{{\mathcal {D}}}{\longrightarrow } {\mathcal {N}}(0,\Sigma ). \end{aligned}$$

Here \(\Sigma ={\mathcal {H}}^{-1}\Sigma _{D}{\mathcal {H}}^{-1}\),

$$\begin{aligned} {\textbf{Z}}_{\mu \nu }{} & {} =\left( H_{\mu }({\textbf{Y}}_{\mu 1})-{\mathcal {C}} _{\mu }({\bar{F}}_{\mu }^{*}({\textbf{Y}}_{\mu 1})\mid \theta _{0})\right) \nabla _{\theta }{\mathcal {C}}_{\mu }({\bar{F}}_{\mu }^{*}({\textbf{Y}}_{\mu 1})\mid \theta _{0})w_{\mu }({\bar{F}}_{\mu }^{*}({\textbf{Y}}_{\mu 1})) \\{} & {} \quad \left. +\frac{{\textbf{1}}\left( {\textbf{b}}^{(\nu )}\ge {\textbf{b}}^{(\mu )}\right) }{{\bar{\gamma }}_{\mu }}\int _{{\mathbb {R}}^{d_{\mu }}}{\textbf{1}} \left\{ \psi _{\mu \nu }({\textbf{Y}}_{\mu 1})\le {\textbf{z}}\right\} \nabla _{\theta }{\mathcal {C}}_{\mu }({\bar{F}}_{\mu }^{*}({\textbf{z}})\mid \theta _{0})w_{\mu }({\bar{F}}_{\mu }^{*}({\textbf{z}}))~\text {d}H_{\mu }({\textbf{z}} )\right. \\{} & {} \quad +\sum _{l=1}^{d}\frac{1}{{\tilde{\gamma }}_{l}}b_{l}^{(\mu )}b_{l}^{(\nu )}\int _{{\mathbb {R}}^{d_{\mu }}}\left( -{\tilde{\mathcal {C}}}_{\mu l}^{ {{}^\circ } }({\bar{F}}_{\mu }({\textbf{z}})\mid \theta _{0})\nabla _{\theta }{\mathcal {C}} _{\mu }({\bar{F}}_{\mu }^{*}({\textbf{z}})\mid \theta _{0})w_{\mu }({\bar{F}}_{\mu }^{*}({\textbf{z}}))\right. \\{} & {} \quad \quad \quad +\left( H_{\mu }({\textbf{z}})-{\mathcal {C}}_{\mu }({\bar{F}}_{\mu }({\textbf{z}})\mid \theta _{0})\right) \cdot \\{} & {} \quad \quad \quad \left. \cdot \left( {\tilde{\mathcal {C}}}_{\mu lk}^{ {{}^\circ } }({\bar{F}}_{\mu }^{*}({\textbf{z}})\mid \theta _{0})w_{\mu }({\bar{F}}_{\mu }^{*}({\textbf{z}}))+{\bar{\mathcal {C}}}_{\mu k}^{ {{}^\circ } }({\bar{F}}_{\mu }^{*}({\textbf{z}})\mid \theta _{0})w_{\mu l}({\bar{F}}_{\mu }^{*}({\textbf{z}}))\right) _{k=1,\ldots ,q}\right) \\{} & {} \quad \quad \quad {\textbf{1}}\left\{ {\bar{\psi }}_{l\nu }({\textbf{Y}}_{\mu 1})\le {\bar{\psi }}_{l\mu }({\textbf{z}})\right\} ~\text {d}H_{\mu }({\textbf{z}}),\\ \Sigma _{D}{} & {} = 4\sum _{\nu =1}^{m}\gamma _{\nu }\sum _{\mu =1}^{m}\sum _{{\bar{\mu }} =1}^{m}\text {cov}({\textbf{Z}}_{\mu \nu },{\textbf{Z}}_{{\bar{\mu }}\nu }). \end{aligned}$$

Here \(\text {cov}(\cdot ,\cdot )\) is the cross-covariance matrix.

In Liebscher (2009) this result was proved in the case \(m=1\) and for complete data. Theorem 5.2 corrects some typos in the formula for \(\Sigma \) in the author’s 2009 paper. Tsukahara (2005) proved consistency and asymptotic normality for his minimum distance estimator in the case where the copula C of \(X_{i}\) belongs to a small neighbourhood of a member of the parametric family. The covariance structure of the estimator \({\hat{\theta }}_{n}\) is rather complicated. One potential approach is to estimate \(\Sigma \) by substituting the distribution functions with their empirical counterparts and \(\theta _{0}\) by \({\hat{\theta }}_{n}\). In view of the sophisticated structure of this estimator, one may use alternative techniques like the bootstrap to obtain approximate values for the covariances.
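One simple resampling scheme that respects the data structure of Sect. 2 is to bootstrap within every pattern subset, so that the subset sizes \(n_{\mu }\) stay fixed, and to refit the model on each replication. The validity of such a bootstrap for the AMDE is not established in this paper, so the following Python sketch (with an artificial stand-in for the fitting routine) is only meant to indicate the mechanics.

import numpy as np

def bootstrap_cov(subsets, fit, B=500, seed=None):
    """Bootstrap covariance of an estimator: resample rows with replacement
    within every missing-data pattern subset, refit, and take the empirical
    covariance of the replicated estimates."""
    rng = np.random.default_rng(seed)
    est = []
    for _ in range(B):
        resampled = [Y[rng.integers(0, len(Y), size=len(Y))] for Y in subsets]
        est.append(fit(resampled))
    return np.cov(np.asarray(est), rowvar=False)

# toy usage: 'fit' is a stand-in estimator (illustration only); in practice it
# would return the AMDE computed from the resampled subsets
rng = np.random.default_rng(0)
subsets = [rng.normal(size=(200, 3)), rng.normal(size=(150, 2))]
toy_fit = lambda subs: np.array([subs[0].mean(), subs[1].mean()])
print(bootstrap_cov(subsets, toy_fit, B=200, seed=1))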

To compare the various fitting results, we introduce the approximation coefficient

$$\begin{aligned} {\hat{\rho }}=1-\frac{\widehat{{\mathcal {D}}}_{n}({\hat{\theta }}_{n})}{\widehat{ {\mathcal {D}}}_{n}^{0}}, \end{aligned}$$

where

$$\begin{aligned} \widehat{{\mathcal {D}}}_{n}^{0}=\sum _{\mu =1}^{m}\frac{1}{n_{\mu }} \sum _{i=1}^{n_{\mu }}\left( {\hat{H}}_{n\mu }({\textbf{Y}}_{\mu i})-\Pi (\check{F }_{n\mu }^{*}({\textbf{Y}}_{\mu i}))\right) ^{2}w_{\mu }({\check{F}}_{n\mu }^{*}({\textbf{Y}}_{\mu i})). \end{aligned}$$

\(\widehat{{\mathcal {D}}}_{n}^{0}\) is the Cramér–von Mises divergence when the independence copula \(\Pi \) is used for fitting; \(\Pi \) is the reference copula in the definition of \({\hat{\rho }}\). The approximation coefficient is defined in analogy to the coefficient of determination in regression and describes the degree of improvement of the fit by the model \({\mathcal {F}}\) in comparison with the independence copula. Obviously, we have \({\hat{\rho }}\le 1\). In the case where the independence copula is included in the model family \({\mathcal {F}}\), the inequality \(0\le {\hat{\rho }}\le 1\) is fulfilled. Values of \({\hat{\rho }}\) close to 1 indicate that the approximation is good. In the case of models with very weak dependence, the value of \({\hat{\rho }}\) can be close to zero; then it is recommended to use another reference copula instead of \(\Pi \).
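For reference, \({\hat{\rho }}\) can be computed from the same precomputed quantities as the divergence estimator itself; the following Python sketch (with illustrative names and the weight (4)) shows \(\widehat{{\mathcal {D}}}_{n}^{0}\) with the independence copula \(\Pi \) and the resulting coefficient.

import numpy as np

def indep_copula(u):
    """Independence copula Pi(u) = prod_j u_j, evaluated rowwise."""
    return np.prod(np.asarray(u), axis=-1)

def D_hat_0(H_emp, U_emp, w_bars, a=0.1):
    """Divergence (5) with the independence copula as the model copula."""
    total = 0.0
    for H_mu, U_mu, wb in zip(H_emp, U_emp, w_bars):
        w_mu = wb / (a + np.prod(U_mu * (1.0 - U_mu), axis=-1))   # weight (4)
        total += np.mean((H_mu - indep_copula(U_mu)) ** 2 * w_mu)
    return total

def rho_hat(D_at_theta_hat, H_emp, U_emp, w_bars):
    """Approximation coefficient: 1 minus the ratio of the fitted divergence to D_hat_0."""
    return 1.0 - D_at_theta_hat / D_hat_0(H_emp, U_emp, w_bars)

# toy call with fabricated values (illustration only)
rng = np.random.default_rng(0)
U = [rng.uniform(0.05, 0.95, (40, 3))]
H = [indep_copula(u) ** 0.9 for u in U]
print(rho_hat(0.001, H, U, w_bars=[1.0]))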

6 A small simulation study

The aim of the simulation study is to show that the estimation method studied in this paper leads to a reasonable performance of the estimators. Here we simulated the following data, with 100,000 repetitions:

three-dimensional data vectors \({\textbf{Y}}_{\mu j}\) having copula C and marginal normal distributions N(1, 0.6), N(2, 0.3), N(3, 0.8), respectively. Copula C is determined by

$$\begin{aligned} C({\textbf{u}})=0.5\cdot C^{(1)}({\textbf{u}})+0.5\cdot C^{(2)}({\textbf{u}})\quad ({\textbf{u}}\in [0,1]^{3}), \end{aligned}$$

where \(C^{(1)}\) is the Clayton copula with parameter 3 and \(C^{(2)}\) is the Frank copula with parameter 2 (for the formulas, see Nelsen 2006).

$$\begin{aligned} {\textbf{b}}^{(1)}=(1,1,1)^{T},\ {\textbf{b}}^{(2)}=(1,1,0)^{T},\ n_{1}=n_{2}=n/2. \end{aligned}$$

Half of the data are complete and the remaining data vectors have a missing third component. As the model family \({\mathcal {F}}\), we considered the Frank, Clayton, Joe and Gumbel–Hougaard copulas. Since the second summand in \(\widehat{{\mathcal {D}}}_{n}\) (\(\mu =2\)) is expected to be smaller than the first one (smaller dimension!), we chose the weights \(w_{1}=0.3\) and \(w_{2}=0.7\). The results are summarized in Table 2.

Table 2 Simulation results (sd...standard deviation, mse...mean square error)

The values of \(\theta _{0}\) were computed using the computer algebra system Mathematica. The results in Table 2 indicate that the optimization leads to a reasonable approximation of the copula C for the considered copula families \({\mathcal {F}}\). The approximation becomes more precise as n increases; i.e. the average lies closer to \(\theta _{0}\) and the standard deviation is smaller. Unfortunately, comparisons with results for other data structures are not very useful, since the divergence is constructed in accordance with the data scheme of Sect. 2. Note that \(\theta _{0}\) depends on the choice of the divergence. Further computations have revealed that the summands in \(\widehat{{\mathcal {D}}}_{n}\) for the several subsets differ only slightly when the weights \(w_{\mu }\) are selected as above.
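For completeness, one replication of the simulation design above could be generated as in the following sketch. It relies on the Marshall–Olkin frailty representations of the Clayton and Frank copulas, interprets the second parameter of N(·, ·) as the standard deviation, and uses illustrative function names throughout.

import numpy as np
from scipy import stats

def sample_clayton(n, d, theta, rng):
    """Clayton copula sample via a gamma frailty (Marshall-Olkin algorithm)."""
    V = rng.gamma(shape=1.0 / theta, scale=1.0, size=(n, 1))
    E = rng.exponential(size=(n, d))
    return (1.0 + E / V) ** (-1.0 / theta)

def sample_frank(n, d, theta, rng):
    """Frank copula sample (theta > 0) via a logarithmic-series frailty."""
    V = stats.logser.rvs(1.0 - np.exp(-theta), size=(n, 1), random_state=rng)
    E = rng.exponential(size=(n, d))
    return -np.log1p(np.exp(-E / V) * (np.exp(-theta) - 1.0)) / theta

def simulate(n, rng):
    """One replication: copula 0.5*Clayton(3) + 0.5*Frank(2), normal marginals,
    complete pattern b^(1) = (1,1,1) and pattern b^(2) = (1,1,0), n_1 = n_2 = n/2."""
    mix = rng.random(n) < 0.5
    U = np.where(mix[:, None], sample_clayton(n, 3, 3.0, rng),
                 sample_frank(n, 3, 2.0, rng))
    Y = stats.norm.ppf(U) * np.array([0.6, 0.3, 0.8]) + np.array([1.0, 2.0, 3.0])
    return Y[: n // 2], Y[n // 2:, :2]   # complete subset, subset with missing third component

Y1, Y2 = simulate(1000, np.random.default_rng(1))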

7 A data example

Here we consider a dataset from the TRY plant trait database, see Kattge et al. (2020). This dataset was already used for modelling and fitting in Liebscher et al. (2022); see this paper for a detailed description of the dataset. Here we restrict the considerations to three variables (see Table 3) and to 9 herb species: ‘Ac.mi’, ‘Be.pe’, ‘Ce.ja’, ‘Ga.mo’, ‘Ga.ve’, ‘Pl.la’, ‘Ra.ac’, ‘Ra.bu’, ‘Ve.ch’.

Table 3 Used variables

A first analysis of the dataset yields the frequencies of the missing data patterns provided in Table 4.

Table 4 Missing data pattern

Two patterns are discarded because they contain too few data items. Therefore, we consider only two missing data patterns (\(m=2\)) with equal weights \(w_{\mu }=0.5\). Now we want to fit product copulas to the ecological dataset (variables RCNC, LCNC, HW). Let \(C^{(1)},C^{(2)},C^{(3)}\) be given copulas taken from parametric Archimedean copula families such as the Frank, Clayton and Gumbel families (for the formulas, see Nelsen 2006). The product copulas are defined by

$$\begin{aligned} C({\textbf{u}}){} & {} =C^{(1)}(u_{1}^{\alpha _{1}},\ldots ,u_{d}^{\alpha _{d}}\mid t_{1})C^{(2)}(u_{1}^{1-\alpha _{1}},\ldots ,u_{d}^{1-\alpha _{d}}\mid t_{2})\text {, } \\{} & {} \text {in short }C^{(1)}*C^{(2)}\text { with parameter vector } (t_{1},t_{2},\alpha _{1},\ldots ,\alpha _{d})^{T}, \\ C({\textbf{u}}){} & {} =C^{(1)}(u_{1}^{\alpha _{1}},\ldots ,u_{d}^{\alpha _{d}}\mid t_{1})C^{(2)}(u_{1}^{(1-\alpha _{1})\beta _{1}},\ldots ,u_{d}^{(1-\alpha _{d})\beta _{d}}\mid t_{2}) \\{} & {} \left. C^{(3)}(u_{1}^{(1-\alpha _{1})(1-\beta _{1})},\ldots ,u_{d}^{(1-\alpha _{d})(1-\beta _{d})}\mid t_{3})\text {,}\ \text {in short }C^{(1)}*C^{(2)}*C^{(3)}\right. \\{} & {} \text { with parameter vector }(t_{1},t_{2},t_{3},\alpha _{1},\ldots ,\alpha _{d},\beta _{1},\ldots ,\beta _{d})^{T}. \end{aligned}$$

In these formulas, \(t_{j}\) is the parameter of \(C^{(j)}\), and \(\alpha _{j},\beta _{j}\in [0,1]\). Product copulas were studied in detail in Liebscher (2008); a small sketch of how such a product copula can be evaluated is given below. Table 5 summarizes the fitting results.
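A minimal Python sketch of how such a two-factor product copula can be evaluated is the following; the Clayton and Frank factors and the parameter values are purely illustrative.

import numpy as np

def clayton(u, t):
    """Clayton copula with parameter t > 0, evaluated rowwise."""
    u = np.asarray(u, dtype=float)
    return (np.sum(u ** (-t), axis=-1) - u.shape[-1] + 1.0) ** (-1.0 / t)

def frank(u, t):
    """Frank copula with parameter t > 0, evaluated rowwise."""
    u = np.asarray(u, dtype=float)
    num = np.prod(np.expm1(-t * u), axis=-1)
    return -np.log1p(num / np.expm1(-t) ** (u.shape[-1] - 1)) / t

def product_copula(u, C1, t1, C2, t2, alpha):
    """C^(1)*C^(2): C(u) = C1(u_1^a_1,...,u_d^a_d | t1) * C2(u_1^(1-a_1),...,u_d^(1-a_d) | t2)."""
    u, alpha = np.asarray(u, dtype=float), np.asarray(alpha, dtype=float)
    return C1(u ** alpha, t1) * C2(u ** (1.0 - alpha), t2)

# evaluation of a C*F model at one point (illustrative parameters)
print(product_copula([0.3, 0.6, 0.8], clayton, 2.0, frank, 1.5, alpha=[0.4, 0.7, 0.5]))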

Table 5 Fitting results (abbreviations for copula families: C...Clayton, F...Frank, N...Nelsen #13)

From Table 5 and further computational results, we see that the approximation is fairly good for the model C*F in the case of the product of two copulas, and for the model C*N*N in the case of the product of three copulas. In the latter case, a slightly better approximation coefficient is obtained in comparison with the product of two copulas.

8 Convergence of the CvM-divergence

In this section we give the theorem on asymptotic normality of the CvM-divergence \(\widehat{{\mathcal {D}}}_{n}({\hat{\theta }}_{n})\).

Theorem 8.1

Assume that the assumptions of Theorem 5.2 are satisfied. Then

$$\begin{aligned} \sqrt{n}\left( \widehat{{\mathcal {D}}}_{n}({\hat{\theta }}_{n})-{\mathcal {D}}(C,{\mathcal {C}}(\cdot \mid \theta _{0}))\right) \overset{{\mathcal {D}}}{\longrightarrow }{\mathcal {N}}(0,\Sigma _{0}). \end{aligned}$$

The formula for \(\Sigma _{0}\) is given in the proof.

In the case \(m=1\) and for complete data, this theorem was already proven in Liebscher (2015). Theorem 8.1 can be used to construct tests about the divergence \({\mathcal {D}}(C,{\mathcal {C}}(\cdot \mid \theta ))\). We refer to the discussion in the author’s paper (2015).

9 Proofs

9.1 Proof of Theorem 5.1

Let \(\Phi _{n}:\Theta \rightarrow {\mathbb {R}}\) be a random function, and \( {\hat{\theta }}_{n}\) be an estimator satisfying

$$\begin{aligned} \Phi _{n}({\hat{\theta }}_{n})\le \min _{\theta \in \Theta }\Phi _{n}(\theta )+\varepsilon _{n}. \end{aligned}$$

Here \(\{\varepsilon _{n}\}\) is a sequence of random variables with \(\varepsilon _{n}\rightarrow 0\ a.s.\) Theorem 2.2 of Lachout et al. (2005) leads to the following proposition.

Proposition 9.1

Assume that \(\Theta \) is compact, and \(\lim _{n\rightarrow \infty }\sup _{t\in \Theta }\left| \Phi _{n}(t)-\Phi (t)\right| =0\) a.s. holds for a continuous function \(\Phi \).

  1. (a)

    Then

    $$\begin{aligned} \lim _{n\rightarrow \infty }d({\hat{\theta }}_{n},\Psi )=0\quad a.s., \end{aligned}$$

    where \(\Psi =\text {argmin}_{t\in \Theta }\Phi (t)\subset {\mathbb {R}}^{q}\), \(d(\cdot ,\cdot )\) as above.

  2. (b)

    Moreover, if in addition, \(\Phi (\theta )>\Phi (\theta _{0})\) holds for all \(\theta \in \Theta \backslash \{\theta _{0}\}\), then

    $$\begin{aligned} \lim _{n\rightarrow \infty }{\hat{\theta }}_{n}=\theta _{0}\quad a.s. \end{aligned}$$

Let \(\Phi _{n}(\theta )=\widehat{{\mathcal {D}}}_{n}(\theta )\) and \(\Phi (\theta )={\mathcal {D}}(C,{\mathcal {C}}(\cdot \mid \theta ))\). In this section, the aim is to apply Proposition 9.1 in order to prove the strong consistency result for \({\hat{\theta }}_{n}\). The following lemma establishes the uniform convergence required in Proposition 9.1.

Lemma 9.2

Assume that assumptions of Theorem 5.1 are fulfilled. Then

$$\begin{aligned} \lim _{n\rightarrow \infty }\sup _{\theta \in \Theta }\left| \widehat{ {\mathcal {D}}}_{n}(\theta )-{\mathcal {D}}(C,{\mathcal {C}}(\cdot \mid \theta ))\right| =0\ \ a.s. \end{aligned}$$

Proof

Notice that

$$\begin{aligned} \left| a^{2}-b^{2}\right| \le 2\left| a-b\right| \text { for }a,b\in [0,1]. \end{aligned}$$

Utilizing the Lipschitz continuity of copulas with Lipschitz constant 1 and the triangle inequality, we obtain

$$\begin{aligned}{} & {} \sup _{\theta \in \Theta }\left| \widehat{{\mathcal {D}}}_{n}(\theta )-{\mathcal {D}}(C,{\mathcal {C}}(\cdot \mid \theta ))\right| \nonumber \\{} & {} \quad =\sup _{\theta \in \Theta }\left| \sum _{\mu =1}^{m}\frac{1}{n_{\mu }} \sum _{i=1}^{n_{\mu }}\left( {\hat{H}}_{n\mu }({\textbf{Y}}_{\mu i})-{\mathcal {C}} _{\mu }({\check{F}}_{n\mu }^{*}({\textbf{Y}}_{\mu i})\mid \theta )\right) ^{2}w_{\mu }({\check{F}}_{n\mu }^{*}({\textbf{Y}}_{\mu i}))\right. \nonumber \\{} & {} \qquad \left. -\int _{{\mathbb {R}}^{d_{\mu }}}\left( H_{\mu }(y)- {\mathcal {C}}_{\mu }({\bar{F}}_{\mu }^{*}(y)\mid \theta )\right) ^{2}w_{\mu }({\bar{F}}_{\mu }^{*}(y))~\text {d}H_{\mu }(y)\right| \nonumber \\{} & {} \quad \le 2m\max _{\mu :1\le \mu \le m}\left( \left( \sup _{{\textbf{y}}\in {\mathbb {R}}^{d_{\mu }}}\left| {\hat{H}}_{n\mu }({\textbf{y}})-H_{\mu }( {\textbf{y}})\right| +\sup _{\theta \in \Theta }\sup _{{\textbf{x}}\in \mathbb { R}^{d_{\mu }}}\left| {\mathcal {C}}_{\mu }({\check{F}}_{n\mu }^{*}( {\textbf{x}})\mid \theta )-{\mathcal {C}}_{\mu }({\bar{F}}_{\mu }^{*}({\textbf{x}} )\mid \theta )\right| \right) \right. \nonumber \\{} & {} \qquad \left. \cdot \sup _{{\textbf{x}}\in [0,1]^{d_{\mu }}}w_{\mu }({\textbf{x}})\right) +Q_{n}+2R_{n} \nonumber \\{} & {} \quad \le 2m\max _{\mu :1\le \mu \le m}\left( \left( \sup _{{\textbf{y}}\in {\mathbb {R}}^{d_{\mu }}}\left| {\hat{H}}_{n\mu }({\textbf{y}})-H_{\mu }( {\textbf{y}})\right| +\sum \limits _{j=1}^{d}\sup _{x\in {\mathbb {R}} }\left| F_{nj}(x)-F_{j}(x)\right| \right) \cdot \sup _{{\textbf{x}}\in [0,1]^{d_{\mu }}}w_{\mu }({\textbf{x}})\right) \nonumber \\{} & {} \qquad +Q_{n}+2R_{n}, \end{aligned}$$
(7)

where

$$\begin{aligned} Q_{n}{} & {} =\sum _{\mu =1}^{m}\sup _{\theta \in \Theta }\frac{1}{n_{\mu }}\left| \sum _{i=1}^{n_{\mu }}\left( {\hat{H}}_{n\mu }({\textbf{Y}}_{\mu i})-{\mathcal {C}} _{\mu }({\check{F}}_{n\mu }^{*}({\textbf{Y}}_{\mu i})\mid \theta )\right) ^{2}\left( w_{\mu }({\check{F}}_{n\mu }^{*}({\textbf{Y}}_{\mu i}))-w_{\mu }( {\bar{F}}_{\mu }^{*}({\textbf{Y}}_{\mu i}))\right) \right| ,\\ R_{n}{} & {} =\sum _{\mu =1}^{m}\sup _{\theta \in \Theta }\left| \frac{1}{ n_{\mu }}\sum _{i=1}^{n_{\mu }}\left( H_{\mu }({\textbf{Y}}_{\mu i})-{\mathcal {C}} _{\mu }({\bar{F}}_{\mu }^{*}({\textbf{Y}}_{\mu i})\mid \theta )\right) ^{2}w_{\mu }({\bar{F}}_{\mu }^{*}({\textbf{Y}}_{\mu i}))\right. \\{} & {} \left. -\int _{{\mathbb {R}}^{d_{\mu }}}\left( H_{\mu }({\textbf{y}})- {\mathcal {C}}_{\mu }({\bar{F}}_{\mu }^{*}(\textbf{y})\mid \theta )\right) ^{2}w_{\mu }({\bar{F}}_{\mu }^{*}(\textbf{y}))~\text {d}H_{\mu }(\textbf{y})\right| . \end{aligned}$$

Since \(\theta \longmapsto \left( H_{\mu }(\textbf{y})-{\mathcal {C}}_{\mu }({\bar{F}}_{\mu }^{*}(\textbf{y})\mid \theta )\right) ^{2}w_{\mu }({\bar{F}}_{\mu }^{*}(\textbf{y}))\) is continuous for \(\textbf{y}\in {\mathbb {R}}^{d_{\mu }}\) by the assumptions, and the envelope function \(w_{\mu }({\bar{F}}_{\mu }^{*}(\cdot ))\) is integrable, the strong Glivenko–Cantelli theorem (see van der Vaart (1998), Theorem 19.4 and Example 19.8) implies

$$\begin{aligned} R_{n}\rightarrow 0\ \ \ a.s. \end{aligned}$$

as \(n\rightarrow \infty \). Further, by Assumption \({\mathcal {A}}_{W}\) (L denotes the Lipschitz constant of \(w_{\mu }\)),

$$\begin{aligned} Q_{n}\le & {} \sum _{\mu =1}^{m}\frac{1}{n_{\mu }}\sum _{i=1}^{n_{\mu }}\left| w_{\mu }({\check{F}}_{n\mu }^{*}({\textbf{Y}}_{\mu i}))-w_{\mu }({\bar{F}}_{\mu }^{*}({\textbf{Y}}_{\mu i}))\right| \\\le & {} L\sum _{\mu =1}^{m}\frac{1}{n_{\mu }}\sum _{i=1}^{n_{\mu }}\left\| {\check{F}}_{n\mu }^{*}({\textbf{Y}}_{\mu i})-{\bar{F}}_{\mu }^{*}({\textbf{Y}}_{\mu i})\right\| \\\le & {} Lm\sum \limits _{j=1}^{d}\sup _{x\in {\mathbb {R}}}\left| {\hat{F}}_{nj}(x)-F_{j}(x)\right| \rightarrow 0\ \ \ a.s. \end{aligned}$$

An application of Proposition 3.1 leads to the lemma. \(\square \)

Proof of Theorem 5.1

Theorem 5.1 is a direct consequence of Proposition 9.1 and Lemma 9.2. \(\square \)

9.2 Auxiliary statements

The following lemma about the convergence of the marginal empirical distribution functions can be stated.

Lemma 9.3

$$\begin{aligned} \max _{j=1,\ldots ,d}\sup _{z\in {\mathbb {R}}}\left| {\hat{F}}_{nj}(z)-F_{j}(z)\right| =O_{{\mathbb {P}}}\left( \frac{1}{\sqrt{n}}\right) . \end{aligned}$$

Proof

The assertion follows from the Dvoretzky–Kiefer–Wolfowitz inequality, see van der Vaart (1998, p. 268). \(\square \)

In the following we derive central limit theorems. First we consider

$$\begin{aligned} W_{n}=\sum _{\mu =1}^{m}\sqrt{\frac{n_{\mu }}{n}}W_{n}^{(\mu )},\quad W_{n}^{(\mu )}=\frac{1}{\sqrt{n_{\mu }}}\sum _{i=1}^{n_{\mu }}\left( g_{\mu }( {\textbf{Y}}_{\mu i})-{\mathbb {E}}g_{\mu }({\textbf{Y}}_{\mu i})\right) \end{aligned}$$

with functions \(g_{\mu }:{\mathbb {R}}^{d_{\mu }}\rightarrow {\mathbb {R}}^{\kappa }\) and provide a central limit theorem for \(W_{n}\).

Proposition 9.4

Suppose that assumption \({\mathcal {A}}_{n}\) is fulfilled, and \( {\mathbb {E}}\left\| g_{\mu }(\textbf{Y}_{\mu 1})\right\| ^{2}<\infty \) for \(\mu =1,\ldots ,m\). Then we have

$$\begin{aligned} W_{n}\overset{{\mathcal {D}}}{\longrightarrow }{\mathcal {N}}(0,\Sigma _{W}), \end{aligned}$$

where \(\Sigma _{W}=\sum _{\mu =1}^{m}\gamma _{\mu }\text {cov}(g_{\mu }({\textbf{Y}}_{\mu 1}))\), and \(\text {cov}(Z)\) denotes the covariance matrix of a random vector Z.

Proof

Applying the multivariate central limit theorem (see Serfling 1980, Theorem 1.9.1B), we obtain that \(W_{n}^{(1)},\ldots ,W_{n}^{(m)}\) are asymptotically normally distributed. Since these summands \(W_{n}^{(\mu )}\) of \(W_{n}\) are independent, we can conclude the asymptotic normality of \( W_{n}\). For the covariance matrix of \(W_{n}\), we obtain

$$\begin{aligned} \frac{1}{n}\sum _{\mu =1}^{m}\sum _{i=1}^{n_{\mu }}\text {cov}(g_{\mu }( {\textbf{Y}}_{\mu i}))=\frac{1}{n}\sum _{\mu =1}^{m}n_{\mu }\text {cov} (g_{\mu }({\textbf{Y}}_{\mu 1}))\rightarrow \sum _{\mu =1}^{m}\gamma _{\mu }\text {cov} (g_{\mu }({\textbf{Y}}_{\mu 1})) \end{aligned}$$

as \(n\rightarrow \infty \). \(\square \)

Let \(\Lambda _{\mu \nu }:{\mathbb {R}}^{d_{\mu }}\times {\mathbb {R}}^{d_{\nu }}\rightarrow {\mathbb {R}}^{\kappa }\) be measurable functions for \(\mu ,\nu =1,\ldots ,m\), \(\Lambda _{\mu \nu }=(\Lambda _{\mu \nu }^{(1)},\ldots ,\Lambda _{\mu \nu }^{(\kappa )})^{T}\). Next we derive a central limit theorem for the U-statistic

$$\begin{aligned} U_{n}=\frac{1}{n\sqrt{n}}\sum _{\mu =1}^{m}\sum _{i=1}^{n_{\mu }}\sum _{\nu =1}^{m}\sum _{j=1}^{n_{\nu }}\left( \Lambda _{\mu \nu }({\textbf{Y}}_{\mu i}, {\textbf{Y}}_{\nu j})-{\mathbb {E}}\Lambda _{\mu \nu }({\textbf{Y}}_{\mu i},\textbf{ Y}_{\nu j})\right) . \end{aligned}$$

Let \(\theta _{\mu \nu }={\mathbb {E}}\Lambda _{\mu \nu }({\textbf{Y}}_{\mu 1}, {\textbf{Y}}_{\nu 2})\) for all \(\mu ,\nu \). We introduce

$$\begin{aligned} {\tilde{h}}_{\mu \nu }({\textbf{y}})={\mathbb {E}}\Lambda _{\mu \nu }({\textbf{Y}} _{\mu 1},{\textbf{y}})+{\mathbb {E}}\Lambda _{\nu \mu }({\textbf{y}},{\textbf{Y}} _{\mu 1})-\theta _{\mu \nu }-\theta _{\nu \mu }\text { for }{\textbf{y}}\in {\mathbb {R}}^{d_{\nu }}, \end{aligned}$$

\({\tilde{h}}_{\mu \nu }=({\tilde{h}}_{\mu \nu }^{(1)},\ldots ,{\tilde{h}}_{\mu \nu }^{(\kappa )})^{T}\). Note that \({\mathbb {E}}{\tilde{h}}_{\mu \nu }({\textbf{Y}} _{\nu 1})={\mathbb {E}}{\tilde{h}}_{\nu \mu }({\textbf{Y}}_{\mu 1})=0\) for all \(\mu ,\nu \). Proposition 9.5 provides the central limit theorem for \( U_{n}\). In the proof we use Hájek’s projection principle.

Proposition 9.5

Suppose that \({\mathbb {E}}\Lambda _{\mu \nu }^{(L)}({\textbf{Y}} _{\mu 1},{\textbf{Y}}_{\nu j})^{2}<+\infty \) for all \(\mu ,\nu ,L=1,\ldots ,\kappa ,j=1,2\). We have

$$\begin{aligned} U_{n}\overset{{\mathcal {D}}}{\longrightarrow }{\mathcal {N}}(0,\Sigma _{U}), \end{aligned}$$

where \(\Sigma _{U}=\sum _{\nu =1}^{m}\gamma _{\nu }\sum _{\mu =1}^{m}\sum _{ {\bar{\mu }}=1}^{m}\gamma _{\mu }\gamma _{{\bar{\mu }}}{\mathbb {E}}{\tilde{h}}_{\mu \nu }({\textbf{Y}}_{\nu 1}){\tilde{h}}_{{\bar{\mu }}\nu }^{T}({\textbf{Y}}_{\nu 1})\).

Proof

Define

$$\begin{aligned} U_{n}^{ {{}^\circ } }=\frac{1}{n\sqrt{n}}\sum _{\mu =1}^{m}\sum _{i=1}^{n_{\mu }}\left( \Lambda _{\mu \mu }({\textbf{Y}}_{\mu i},{\textbf{Y}}_{\mu i})-{\mathbb {E}}\Lambda _{\mu \mu }({\textbf{Y}}_{\mu i},{\textbf{Y}}_{\mu i})\right) . \end{aligned}$$

We obtain (\({\mathbb {V}}\) denotes the variance)

$$\begin{aligned} {\mathbb {E}}\left\| U_{n}^{ {{}^\circ } }\right\| ^{2}= & {} n^{-3}\sum _{l=1}^{\kappa }{\mathbb {E}}\left( \sum _{\mu =1}^{m}\sum _{i=1}^{n_{\mu }}\left( \Lambda _{\mu \mu }^{(l)}({\textbf{Y}}_{\mu i},{\textbf{Y}}_{\mu i})-{\mathbb {E}}\Lambda _{\mu \mu }^{(l)}({\textbf{Y}}_{\mu i},{\textbf{Y}}_{\mu i})\right) \right) ^{2} \\\le & {} n^{-3}\sum _{l=1}^{\kappa }\sum _{\mu =1}^{m}\sum _{i=1}^{n_{\mu }}\mathbb { V}\left( \Lambda _{\mu \mu }^{(l)}({\textbf{Y}}_{\mu i},{\textbf{Y}}_{\mu i})\right) \\= & {} O(n^{-2}). \end{aligned}$$

Hence \(U_{n}^{ {{}^\circ } }\overset{{\mathbb {P}}}{\longrightarrow }0\) holds, and

$$\begin{aligned} U_{n}{} & {} =n^{-3/2}\sum _{\mu =1}^{m}\sum _{i=1}^{n_{\mu }}\sum _{\nu =\mu }^{m}\sum _{j=1+\delta _{\nu \mu }i}^{n_{\nu }}\left( \Lambda _{\mu \nu }( {\textbf{Y}}_{\mu i},{\textbf{Y}}_{\nu j})+\Lambda _{\nu \mu }({\textbf{Y}}_{\nu j},{\textbf{Y}}_{\mu i})-\theta _{\mu \nu }-\theta _{\nu \mu }\right) \nonumber \\{} & {} \quad +o_{{\mathbb {P}}}(1), \end{aligned}$$
(8)

where \(\delta _{\mu \mu }=1\) and \(\delta _{\nu \mu }=0\) for \(\nu \ne \mu \). Now we split the sum in (8) into two parts. Define

$$\begin{aligned} {\tilde{U}}_{n\mu \nu }{} & {} =n^{-3/2}\sum _{i=1}^{n_{\mu }}\sum _{j=1+\delta _{\nu \mu }i}^{n_{\nu }}{\tilde{\Lambda }}_{\mu \nu }({\textbf{Y}}_{\mu i},{\textbf{Y}} _{\nu j})\text {, and} \\ {\tilde{\Lambda }}_{\mu \nu }({\textbf{Y}}_{\mu i},{\textbf{Y}}_{\nu j}){} & {} = \Lambda _{\mu \nu }({\textbf{Y}}_{\mu i},{\textbf{Y}}_{\nu j})+\Lambda _{\nu \mu }( {\textbf{Y}}_{\nu j},{\textbf{Y}}_{\mu i})-{\tilde{h}}_{\mu \nu }({\textbf{Y}}_{\nu j})\\{} & {} \quad \ -{\tilde{h}}_{\nu \mu }({\textbf{Y}}_{\mu i})-\theta _{\mu \nu }-\theta _{\nu \mu }. \end{aligned}$$

Notice that \({\mathbb {E}}{\tilde{\Lambda }}_{\mu \nu }({\textbf{Y}}_{\mu i},\textbf{ Y}_{\nu j})=0\). Then we have

$$\begin{aligned} U_{n}= & {} {\tilde{U}}_{n}+{\bar{U}}_{n}+o_{{\mathbb {P}}}(1)\text {, where}\\ {\tilde{U}}_{n}= & {} \sum _{\mu =1}^{m}\sum _{\nu =\mu }^{m}{\tilde{U}}_{n\mu \nu },\quad {\bar{U}}_{n}=n^{-3/2}\sum _{\mu =1}^{m}\sum _{i=1}^{n_{\mu }}\sum _{\nu =\mu }^{m}\sum _{j=1+\delta _{\nu \mu }i}^{n_{\nu }}\left( {\tilde{h}}_{\mu \nu }({\textbf{Y}}_{\nu j})+{\tilde{h}}_{\nu \mu }({\textbf{Y}}_{\mu i})\right) .\nonumber \end{aligned}$$
(9)

The next step is to show \({\tilde{U}}_{n}=o_{{\mathbb {P}}}(1)\). Later we prove asymptotic normality of \({\bar{U}}_{n}\). Observe that

$$\begin{aligned} {\mathbb {E}}\left( {\tilde{\Lambda }}_{\mu \nu }({\textbf{Y}}_{\mu i},{\textbf{Y}} _{\nu j})\mid {\textbf{Y}}_{\mu i}\right)= & {} {\tilde{h}}_{\nu \mu }({\textbf{Y}} _{\mu i})-{\tilde{h}}_{\nu \mu }({\textbf{Y}}_{\mu i})=0,\ \ {\mathbb {E}}\left( {\tilde{\Lambda }}_{\mu \nu }({\textbf{Y}}_{\mu i},{\textbf{Y}}_{\nu j})\right) =0,\\ {\mathbb {E}}\left( {\tilde{\Lambda }}_{\mu \nu }({\textbf{Y}}_{\mu i},{\textbf{Y}} _{\nu j})\mid {\textbf{Y}}_{\nu j}\right)= & {} {\tilde{h}}_{\mu \nu }({\textbf{Y}} _{\nu j})-{\tilde{h}}_{\mu \nu }({\textbf{Y}}_{\nu j})=0,\ \ {\mathbb {E}}\left( {\tilde{\Lambda }}_{\mu \nu }({\textbf{Y}}_{\mu i},{\textbf{Y}}_{\nu j})\right) =0 \end{aligned}$$

for \(i\ne j\) or \(\mu \ne \nu \). Therefore, identity

$$\begin{aligned}{} & {} {\mathbb {E}}{\tilde{\Lambda }}_{\mu \nu }^{(L)}({\textbf{Y}}_{\mu i},{\textbf{Y}} _{\nu j}){\tilde{\Lambda }}_{\mu \nu }^{(L)}({\textbf{Y}}_{\mu i},{\textbf{Y}}_{\nu l})\\{} & {} \quad ={\mathbb {E}}\left( {\mathbb {E}}\left( {\tilde{\Lambda }}_{\mu \nu }^{(L)}( {\textbf{Y}}_{\mu i},{\textbf{Y}}_{\nu j})\mid {\textbf{Y}}_{\mu i}\right) \mathbb { E}\left( {\tilde{\Lambda }}_{\mu \nu }^{(L)}({\textbf{Y}}_{\mu i},Y_{\nu l})\mid {\textbf{Y}}_{\mu i}\right) \right) =0 \end{aligned}$$

holds for \(l\ne j\). Thus, by this equation and similar identities, we have

$$\begin{aligned} {\mathbb {E}}{\tilde{\Lambda }}_{\mu \nu }^{(L)}({\textbf{Y}}_{\mu i},{\textbf{Y}} _{\nu j}){\tilde{\Lambda }}_{\mu \nu }^{(L)}({\textbf{Y}}_{\mu k},{\textbf{Y}}_{\nu l})=0 \end{aligned}$$

for \((i=k,j\ne l)\vee (j=l,i\ne k)\vee (\mu =\nu ,i=l,j\ne k)\vee (\mu =\nu ,j=k,i\ne l)\). Obviously, this equation also holds when the indices \(i,j,k,l\) are pairwise distinct. On the other hand, we have

$$\begin{aligned} {\mathbb {E}}{\tilde{h}}_{\mu \nu }^{(L)}({\textbf{Y}}_{\nu 1})^{2}= & {} {\mathbb {E}} \left( {\mathbb {E}}\left( \Lambda _{\mu \nu }^{(L)}({\textbf{Y}}_{\mu 1},\textbf{ Y}_{\nu 1})+\Lambda _{\nu \mu }^{(L)}({\textbf{Y}}_{\nu 1},{\textbf{Y}}_{\mu 1})-\theta _{\mu \nu }-\theta _{\nu \mu }\mid {\textbf{Y}}_{\nu 1}\right) ^{2}\right) \\\le & {} {\mathbb {E}}\left( {\mathbb {E}}\left( \left( \Lambda _{\mu \nu }^{(L)}( {\textbf{Y}}_{\mu 1},{\textbf{Y}}_{\nu 1})+\Lambda _{\nu \mu }^{(L)}({\textbf{Y}} _{\nu 1},{\textbf{Y}}_{\mu 1})-\theta _{\mu \nu }-\theta _{\nu \mu }\right) ^{2}\mid {\textbf{Y}}_{\nu 1}\right) \right) \\\le & {} 2\left( {\mathbb {E}}\Lambda _{\mu \nu }^{(L)}({\textbf{Y}}_{\mu 1}, {\textbf{Y}}_{\nu 1})^{2}+{\mathbb {E}}\Lambda _{\nu \mu }^{(L)}({\textbf{Y}}_{\nu 1},{\textbf{Y}}_{\mu 1})^{2}\right) . \end{aligned}$$

Consequently, for \(\mu \ne \nu \), it can be derived

$$\begin{aligned} {\mathbb {E}}\left\| {\tilde{U}}_{n\mu \nu }\right\| ^{2}= & {} n^{-3}\sum _{L=1}^{\kappa }\sum _{i=1}^{n_{\mu }}\sum _{j=1}^{n_{\nu }}\sum _{k=1}^{n_{\mu }}\sum _{l=1}^{n_{\nu }}{\mathbb {E}}{\tilde{\Lambda }}_{\mu \nu }^{(L)}({\textbf{Y}}_{\mu i},{\textbf{Y}}_{\nu j}){\tilde{\Lambda }}_{\mu \nu }^{(L)}({\textbf{Y}}_{\mu k},{\textbf{Y}}_{\nu l}) \\= & {} n^{-3}\sum _{L=1}^{\kappa }\sum _{i=1}^{n_{\mu }}\sum _{j=1}^{n_{\nu }} {\mathbb {E}}{\tilde{\Lambda }}_{\mu \nu }^{(L)}({\textbf{Y}}_{\mu i},{\textbf{Y}}_{\nu j})^{2} \\\le & {} 4n^{-1} \\{} & {} \sum _{L=1}^{\kappa }\left( {\mathbb {E}}\Lambda _{\mu \nu }^{(L)}({\textbf{Y}} _{\mu 1},{\textbf{Y}}_{\nu 1})^{2}+{\mathbb {E}}\Lambda _{\nu \mu }^{(L)}(\textbf{ Y}_{\nu 1},{\textbf{Y}}_{\mu 1})^{2}+{\mathbb {E}}{\tilde{h}}_{\mu \nu }^{(L)}( {\textbf{Y}}_{\nu 1})^{2}\right. \\{} & {} \left. +{\mathbb {E}}{\tilde{h}}_{\nu \mu }^{(L)}({\textbf{Y}}_{\mu 1})^{2}\right) \\= & {} O(n^{-1}). \end{aligned}$$

In a similar way, we obtain

$$\begin{aligned} {\mathbb {E}}\left\| {\tilde{U}}_{n\mu \mu }\right\| ^{2}= & {} n^{-3}\sum _{L=1}^{\kappa }\left( \sum _{i=1}^{n_{\mu }}\sum _{j=1+i}^{n_{\nu }}{\mathbb {E}}{\tilde{\Lambda }}_{\mu \mu }^{(L)}(\textbf{Y}_{\mu i},{\textbf{Y}}_{\mu j})^{2}\right. \\{} & {} \left. +\sum _{i=1}^{n_{\mu }}\sum _{j=1+i}^{n_{\nu }}{\mathbb {E}}\tilde{ \Lambda }_{\mu \mu }^{(L)}({\textbf{Y}}_{\mu i},{\textbf{Y}}_{\mu j})\tilde{ \Lambda }_{\mu \mu }^{(L)}({\textbf{Y}}_{\mu j},{\textbf{Y}}_{\mu i})\right) \\\le & {} 8n^{-1}\sum _{L=1}^{\kappa }\left( {\mathbb {E}}\Lambda _{\mu \mu }^{(L)}( {\textbf{Y}}_{\mu 1},{\textbf{Y}}_{\mu 2})^{2}+{\mathbb {E}}{\tilde{h}}_{\mu \mu }^{(L)}({\textbf{Y}}_{\mu 1})^{2}\right) \\= & {} O(n^{-1}). \end{aligned}$$

Hence \({\tilde{U}}_{n}\overset{{\mathbb {P}}}{\longrightarrow }0\) holds true. Notice that

$$\begin{aligned} {\mathbb {E}}\left\| n^{-3/2}\sum _{\mu =1}^{m}\sum _{i=1}^{n_{\mu }}{\tilde{h}} _{\mu \mu }({\textbf{Y}}_{\mu i})\right\| ^{2}= & {} n^{-3}\sum _{L=1}^{\kappa } {\mathbb {E}}\left( \sum _{\mu =1}^{m}\sum _{i=1}^{n_{\mu }}{\tilde{h}}_{\mu \mu }^{(L)}({\textbf{Y}}_{\mu i})\right) ^{2} \\= & {} n^{-3}\sum _{L=1}^{\kappa }\sum _{\mu =1}^{m}\sum _{i=1}^{n_{\mu }}{\mathbb {E}}{\tilde{h}}_{\mu \mu }^{(L)}({\textbf{Y}}_{\mu i})^{2} \\\le & {} 4n^{-2}\sum _{L=1}^{\kappa }\sum _{\mu =1}^{m}{\mathbb {E}}\Lambda _{\mu \mu }^{(L)}({\textbf{Y}}_{\mu 1},{\textbf{Y}}_{\mu 2})^{2} \\= & {} O(n^{-1}). \end{aligned}$$

On the other hand, it follows that

$$\begin{aligned} {\bar{U}}_{n}= & {} n^{-3/2}\sum _{\mu =1}^{m}\sum _{i=1}^{n_{\mu }}\sum _{\nu =1}^{m}\sum _{j=1}^{n_{\nu }}{\tilde{h}}_{\mu \nu }({\textbf{Y}}_{\nu j})\textbf{1 }\left( \nu \ne \mu \vee i\ne j\right) \\= & {} n^{-3/2}\left( \sum _{\mu =1}^{m}n_{\mu }\sum _{\nu =1}^{m}\sum _{j=1}^{n_{\nu }}{\tilde{h}}_{\mu \nu }({\textbf{Y}}_{\nu j})-\sum _{\mu =1}^{m}\sum _{i=1}^{n_{\mu }}{\tilde{h}}_{\mu \mu }({\textbf{Y}} _{\mu i})\right) \\= & {} U_{n}^{(1)}+U_{n}^{(2)}+o_{{\mathbb {P}}}(1), \end{aligned}$$

where

$$\begin{aligned} U_{n}^{(1)}= & {} n^{-1/2}\sum _{\nu =1}^{m}\sum _{j=1}^{n_{\nu }}g_{\nu }( {\textbf{Y}}_{\nu j}),\ \ g_{\nu }({\textbf{y}})=\sum _{\mu =1}^{m}\gamma _{\mu } {\tilde{h}}_{\mu \nu }({\textbf{y}}), \\ U_{n}^{(2)}= & {} n^{-1/2}\sum _{\mu =1}^{m}\left( \frac{n_{\mu }}{n}-\gamma _{\mu }\right) \sum _{\nu =1}^{m}\sum _{j=1}^{n_{\nu }}{\tilde{h}}_{\mu \nu }( {\textbf{Y}}_{\nu j}). \end{aligned}$$

An application of Proposition 9.4 to the sum \(U_{n}^{(1)}\) leads to \(U_{n}^{(1)}\overset{{\mathcal {D}}}{\longrightarrow }{\mathcal {N}}(0,\Sigma _{U})\), where

$$\begin{aligned} \Sigma _{U}= & {} \sum _{\nu =1}^{m}\gamma _{\nu }{\mathbb {E}}g_{\nu }({\textbf{Y}}_{\nu 1})g_{\nu }({\textbf{Y}}_{\nu 1})^{T} \\= & {} \sum _{\nu =1}^{m}\gamma _{\nu }\sum _{\mu =1}^{m}\sum _{{\bar{\mu }} =1}^{m}\gamma _{\mu }\gamma _{{\bar{\mu }}}{\mathbb {E}}{\tilde{h}}_{\mu \nu }( {\textbf{Y}}_{\nu 1}){\tilde{h}}_{{\bar{\mu }}\nu }^{T}({\textbf{Y}}_{\nu 1}). \end{aligned}$$

Analogously, one shows that

$$\begin{aligned} \sum _{\nu =1}^{m}\sum _{j=1}^{n_{\nu }}{\tilde{h}}_{\mu \nu }({\textbf{Y}}_{\nu j})\overset{{\mathcal {D}}}{\longrightarrow }{\mathcal {N}}(0,\Sigma _{0}) \end{aligned}$$

for \(\mu =1,\ldots ,m\) with an appropriate \(\Sigma _{0}\), which implies \( U_{n}^{(2)}\overset{{\mathbb {P}}}{\longrightarrow }0\). Therefore, we have \( {\bar{U}}_{n}\overset{{\mathcal {D}}}{\longrightarrow }{\mathcal {N}}(0,\Sigma _{U})\). By (9) and \({\tilde{U}}_{n}\overset{{\mathbb {P}}}{\longrightarrow } 0\), the proof is complete. \(\square \)

9.3 Proof of Theorem 5.2

Throughout this section we suppose that the assumptions of Theorem 5.1(b) and Assumption \({\mathcal {A}}_{C}\) are satisfied. Here \(\theta _{0}\) is the unique minimizer of \({\mathcal {D}}\) defined in (3). We introduce \({\mathcal {H}}_{n}(\theta )=({\mathcal {H}}_{nkl}(\theta ))_{k,l=1,\ldots ,q}\) as the Hessian matrix of \(\widehat{{\mathcal {D}}}_{n}(\theta )\):

$$\begin{aligned} {\mathcal {H}}_{nkl}(\theta )= & {} \sum _{\mu =1}^{m}\frac{2}{n_{\mu }} \sum _{i=1}^{n_{\mu }}\left( -\left( {\hat{H}}_{n\mu }({\textbf{Y}}_{\mu i})- {\mathcal {C}}_{\mu }({\check{F}}_{n\mu }^{*}({\textbf{Y}}_{\mu i})\mid \theta )\right) {\bar{\mathcal {C}}}_{\mu kl}^{\circ }({\check{F}}_{n\mu }^{*}({\textbf{Y}}_{\mu i})\mid \theta )\right. \\{} & {} \left. +{\bar{\mathcal {C}}}_{\mu k}^{\circ }({\check{F}}_{n\mu }^{*}(\textbf{ Y}_{\mu i})\mid \theta ){\bar{\mathcal {C}}}_{\mu l}^{\circ }({\check{F}}_{n\mu }^{*}( {\textbf{Y}}_{\mu i})\mid \theta )\right) w_{\mu }({\check{F}}_{n\mu }^{*}( {\textbf{Y}}_{\mu i})). \end{aligned}$$

Here \(\nabla _{\theta }\psi (\theta )\) denotes the gradient of the function \(\psi \) w.r.t. \(\theta \), and \(\nabla _{\theta }\psi (\theta _{0})\) is an abbreviation for \(\left. \nabla _{\theta }\psi (\theta )\right| _{\theta =\theta _{0}}\). Observe that

$$\begin{aligned} \nabla _{\theta }\widehat{{\mathcal {D}}}_{n}(\theta )= & {} -\sum _{\mu =1}^{m}\frac{2 }{n_{\mu }}\sum _{i=1}^{n_{\mu }}\left( {\hat{H}}_{n\mu }({\textbf{Y}}_{\mu i})- {\mathcal {C}}_{\mu }({\check{F}}_{n\mu }^{*}({\textbf{Y}}_{\mu i})\mid \theta )\right) \\{} & {} \nabla _{\theta }{\mathcal {C}}_{\mu }({\check{F}}_{n\mu }^{*}( {\textbf{Y}}_{\mu i})\mid \theta )w_{\mu }({\check{F}}_{n\mu }^{*}({\textbf{Y}} _{\mu i})). \end{aligned}$$

Let \({\tilde{\theta }}_{n}\) be a minimizer of \(\widehat{{\mathcal {D}}}_{n}(\cdot )\). Since \(\nabla _{\theta }\widehat{{\mathcal {D}}}_{n}({\tilde{\theta }}_{n})=0\) and \(\theta _{0}\) is an interior point of \(\Theta \), the Taylor formula yields

$$\begin{aligned} \nabla _{\theta }\widehat{{\mathcal {D}}}_{n}(\theta _{0})=\nabla _{\theta } \widehat{{\mathcal {D}}}_{n}(\theta _{0})-\nabla _{\theta }\widehat{{\mathcal {D}}}_{n}({\tilde{\theta }}_{n})=-{\mathcal {H}}_{n}^{*}\left( {\tilde{\theta }} _{n}-\theta _{0}\right) , \end{aligned}$$
(10)

where \(t_{nk}^{*}=\theta _{0}+\eta _{nk}\left( {\tilde{\theta }}_{n}-\theta _{0}\right) \) and \({\mathcal {H}}_{n}^{*}=({\mathcal {H}}_{nkl}(t_{nk}^{*}))_{k,l=1,\ldots ,q}\). Here \(\eta _{nk}\in (0,1)\) is a random variable, \( k=1,\ldots ,q\).
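
To spell out how (10) arises, one may apply the mean value theorem to each component of the gradient separately; the following is a minimal sketch of this componentwise argument:

$$\begin{aligned} \left( \nabla _{\theta }\widehat{{\mathcal {D}}}_{n}(\theta _{0})-\nabla _{\theta }\widehat{{\mathcal {D}}}_{n}({\tilde{\theta }}_{n})\right) _{k}=-\left( \nabla _{\theta }\frac{\partial \widehat{{\mathcal {D}}}_{n}}{\partial \theta _{k}}(t_{nk}^{*})\right) ^{T}\left( {\tilde{\theta }}_{n}-\theta _{0}\right) =-\left( {\mathcal {H}}_{n}^{*}\left( {\tilde{\theta }}_{n}-\theta _{0}\right) \right) _{k}\qquad (k=1,\ldots ,q), \end{aligned}$$

so that the \(k\)-th row of \({\mathcal {H}}_{n}^{*}\) is evaluated at its own intermediate point \(t_{nk}^{*}\).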

Taking identity (10) into account, we prove Theorem 5.2 in three steps: we show the asymptotic normality of \(\nabla _{\theta }\widehat{{\mathcal {D}}}_{n}(\theta _{0})\), we prove that \({\mathcal {H}}_{n}^{*}\) converges in probability to a certain matrix, and we show that \({\tilde{\theta }}_{n}-{\hat{\theta }}_{n}\) is \(o_{{\mathbb {P}}}(n^{-1/2})\). The following lemma establishes the first step.

Lemma 9.6

We have

$$\begin{aligned} \sqrt{n}\nabla _{\theta }\widehat{{\mathcal {D}}}_{n}(\theta _{0})\overset{ {\mathcal {D}}}{\longrightarrow }{\mathcal {N}}(0,\Sigma _{D}), \end{aligned}$$

with \(\Sigma _{D}\) as in Theorem 5.2.

Proof

We decompose \(-\nabla _{\theta }\widehat{{\mathcal {D}}} _{n}(\theta _{0})\) and obtain

$$\begin{aligned} -\nabla _{\theta }\widehat{{\mathcal {D}}}_{n}(\theta _{0})=A_{1n}+A_{2n}+A_{3n}+A_{4n}, \end{aligned}$$
(11)
$$\begin{aligned} A_{1n}= & {} \sum _{\mu =1}^{m}\frac{2}{n_{\mu }}\sum _{i=1}^{n_{\mu }}\left( H_{\mu }({\textbf{Y}}_{\mu i})-{\mathcal {C}}_{\mu }({\bar{F}}_{\mu }^{*}( {\textbf{Y}}_{\mu i})\mid \theta _{0})\right) \nabla _{\theta }{\mathcal {C}} _{\mu }({\bar{F}}_{\mu }^{*}({\textbf{Y}}_{\mu i})\mid \theta _{0})w_{\mu }( {\bar{F}}_{\mu }^{*}({\textbf{Y}}_{\mu i})), \\ A_{2n}= & {} \sum _{\mu =1}^{m}\frac{2}{n_{\mu }}\sum _{i=1}^{n_{\mu }}\left( {\hat{H}}_{n\mu }({\textbf{Y}}_{\mu i})-{\mathcal {C}}_{\mu }({\check{F}}_{n\mu }^{*}({\textbf{Y}}_{\mu i})\mid \theta _{0})-H_{\mu }({\textbf{Y}}_{\mu i})+ {\mathcal {C}}_{\mu }({\bar{F}}_{\mu }^{*}({\textbf{Y}}_{\mu i})\mid \theta _{0})\right) \\{} & {} \quad \left( \nabla _{\theta }{\mathcal {C}}_{\mu }({\check{F}}_{n\mu }^{*}({\textbf{Y}}_{\mu i})\mid \theta _{0})w_{\mu }({\check{F}}_{n\mu }^{*}( {\textbf{Y}}_{\mu i}))-\nabla _{\theta }{\mathcal {C}}_{\mu }({\bar{F}}_{\mu }^{*}({\textbf{Y}}_{\mu i})\mid \theta _{0})w_{\mu }({\bar{F}}_{\mu }^{*}({\textbf{Y}}_{\mu i}))\right) , \\ A_{3n}= & {} \sum _{\mu =1}^{m}\frac{2}{n_{\mu }}\sum _{i=1}^{n_{\mu }}\left( {\hat{H}}_{n\mu }({\textbf{Y}}_{\mu i})-{\mathcal {C}}_{\mu }({\check{F}}_{n\mu }^{*}({\textbf{Y}}_{\mu i})\mid \theta _{0})-H_{\mu }({\textbf{Y}}_{\mu i})+ {\mathcal {C}}_{\mu }({\bar{F}}_{\mu }^{*}({\textbf{Y}}_{\mu i})\mid \theta _{0})\right) \\{} & {} \quad \nabla _{\theta }{\mathcal {C}}_{\mu }({\bar{F}}_{\mu }^{*}(\textbf{Y }_{\mu i})\mid \theta _{0})w_{\mu }({\bar{F}}_{\mu }^{*}({\textbf{Y}}_{\mu i})), \\ A_{4n}= & {} \sum _{\mu =1}^{m}\frac{2}{n_{\mu }}\sum _{i=1}^{n_{\mu }}\left( H_{\mu }({\textbf{Y}}_{\mu i})-{\mathcal {C}}_{\mu }({\bar{F}}_{\mu }^{*}( {\textbf{Y}}_{\mu i})\mid \theta _{0})\right) \\{} & {} \quad \left( \nabla _{\theta }{\mathcal {C}}_{\mu }({\check{F}}_{n\mu }^{*}({\textbf{Y}}_{\mu i})\mid \theta _{0})w_{\mu }({\check{F}}_{n\mu }^{*}( {\textbf{Y}}_{\mu i}))-\nabla _{\theta }{\mathcal {C}}_{\mu }({\bar{F}}_{\mu }^{*}({\textbf{Y}}_{\mu i})\mid \theta _{0})w_{\mu }({\bar{F}}_{\mu }^{*}({\textbf{Y}}_{\mu i}))\right) . \end{aligned}$$

Further we define

$$\begin{aligned} A_{3n}^{*}= & {} \sum _{\mu =1}^{m}\frac{2}{n_{\mu }}\sum _{i=1}^{n_{\mu }}\nabla _{\theta }{\mathcal {C}}_{\mu }({\bar{F}}_{\mu }^{*}({\textbf{Y}}_{\mu i})\mid \theta _{0})w_{\mu }({\bar{F}}_{\mu }^{*}({\textbf{Y}}_{\mu i})) \\{} & {} \left( {\hat{H}}_{n\mu }({\textbf{Y}}_{\mu i})-H_{\mu }({\textbf{Y}}_{\mu i})-\sum _{l\in J_{\mu }}{\tilde{\mathcal {C}}}_{\mu l}^{ {{}^\circ }}({\bar{F}}_{\mu }^{*}({\textbf{Y}}_{\mu i})\mid \theta _{0})\right. \\{} & {} \quad \left. \left( {\hat{F}} _{nl}({\bar{\psi }}_{l\mu }({\textbf{Y}}_{\mu i}))-F_{l}({\bar{\psi }}_{l\mu }( {\textbf{Y}}_{\mu i}))\right) \right) , \\ A_{4nk}^{*}= & {} \sum _{\mu =1}^{m}\frac{2}{n_{\mu }}\sum _{i=1}^{n_{\mu }}\left( H_{\mu }({\textbf{Y}}_{\mu i})-{\mathcal {C}}_{\mu }({\bar{F}}_{\mu }^{*}({\textbf{Y}}_{\mu i})\mid \theta _{0})\right) \\{} & {} \left. \sum _{l\in J_{\mu }}\left( {\tilde{\mathcal {C}}}_{\mu lk}^{ {{}^\circ } }({\bar{F}}_{\mu }^{*}({\textbf{Y}}_{\mu i})\mid \theta _{0})w_{\mu }({\bar{F}} _{\mu }^{*}({\textbf{Y}}_{\mu i}))+{\bar{\mathcal {C}}}_{\mu k}^{ {{}^\circ } }({\bar{F}}_{\mu }^{*}({\textbf{Y}}_{\mu i})\mid \theta _{0})w_{\mu l}(\bar{F}_{\mu }^{*}({\textbf{Y}}_{\mu i}))\right) \right. \\{} & {} \left. ({\hat{F}}_{l}({\bar{\psi }}_{l\mu }({\textbf{Y}}_{\mu i}))-F_{l}({\bar{\psi }}_{l\mu }({\textbf{Y}}_{\mu i}))),\right. \end{aligned}$$

and \(A_{4n}^{*}=\left( A_{4nk}^{*}\right) _{k=1,\ldots ,q}\), \(A_{4n}=\left( A_{4nk}\right) _{k=1,\ldots ,q}\). Obviously, \(\nabla _{\theta }{\mathcal {D}}(C,{\mathcal {C}}(\cdot \mid \theta _0))=0\) so that \({\mathbb {E}}A_{1n}=0\). The next step is to show that \(A_{2n}+\left( A_{3n}-A_{3n}^{*}\right) +\left( A_{4n}-A_{4n}^{*}\right) =o_{{\mathbb {P}}}(n^{-1/2})\). Note that copulas are Lipschitz continuous. Since the partial derivatives \({\bar{\mathcal {C}}}_{\mu l}^{\circ }(\cdot \mid \theta _{0})\) are Lipschitz continuous by Assumption \({\mathcal {A}}_{C}\) and the weight functions \(w_{\mu }\) are Lipschitz continuous by Assumption \({\mathcal {A}}_{W}\), we obtain

$$\begin{aligned} \left\| A_{2n}\right\|\le & {} O(1)\max _{\mu =1,\ldots ,m}\sup _{ {\textbf{y}}\in {\mathbb {R}}^{d_{\mu }}}\left( \left| {\hat{H}}_{n\mu }( {\textbf{y}})-H_{\mu }({\textbf{y}})\right| +\left\| {\check{F}}_{n\mu }^{*}({\textbf{y}})-{\bar{F}}_{\mu }^{*}({\textbf{y}})\right\| \right) \nonumber \\{} & {} \max _{\mu =1,\ldots ,m}\sup _{{\textbf{y}}\in {\mathbb {R}}^{d_{\mu }}}\left\| {\check{F}}_{n\mu }^{*}({\textbf{y}})-{\bar{F}}_{\mu }^{*}( {\textbf{y}})\right\| \nonumber \\= & {} \left( O\left( \sqrt{\frac{\ln \ln n}{n}}\right) +\sum _{j=1}^{d}\sup _{z\in {\mathbb {R}}}\left| {\hat{F}}_{jn}(z)-F_{j}(z) \right| \right) \sum _{j=1}^{d}\sup _{z\in {\mathbb {R}}}\left| {\hat{F}} _{jn}(z)-F_{j}(z)\right| \nonumber \\= & {} O\left( \frac{\ln \ln n}{n}\right) \ \ \ a.s. \end{aligned}$$
(12)

by applying Proposition 3.1. Let \(\tau _{n}:=\max _{j=1,\ldots ,d}\sup _{z\in {\mathbb {R}}}\left| {\hat{F}}_{jn}(z)-F_{j}(z)\right| \). Observe that \({\tilde{\mathcal {C}}}_{\mu l}^{ {{}^\circ } }(\cdot \mid \theta )\) is uniformly continuous on \([0,1]^{d_{\mu }}\) for \( \theta \in U(\theta _{0})\) in view of the Heine-Cantor theorem. Further by Lemma 9.3 and the mean value theorem, we obtain

$$\begin{aligned} \left\| A_{3n}-A_{3n}^{*}\right\|\le & {} \sum _{\mu =1}^{m}\frac{2 }{n_{\mu }}\sum _{i=1}^{n_{\mu }}\sum _{l\in J_{\mu }}\left| {\hat{F}}_{nl}( {\bar{\psi }}_{l\mu }({\textbf{Y}}_{\mu i}))-F_{l}({\bar{\psi }}_{l\mu }({\textbf{Y}} _{\mu i}))\right| \nonumber \\{} & {} \sup _{0\le \eta \le 1}\left| {\tilde{\mathcal {C}}}_{\mu l}^{ {{}^\circ } }({\bar{F}}_{\mu }^{*}({\textbf{Y}}_{\mu i})\mid \theta _{0})-\mathcal { {\tilde{C}}}_{\mu l}^{ {{}^\circ } }({\bar{F}}_{\mu }^{*}({\textbf{Y}}_{\mu i})\right. \nonumber \\{} & {} \quad \left. +\eta \left( {\check{F}}_{n\mu }^{*}({\textbf{Y}}_{\mu i})-{\bar{F}}_{\mu }^{*}({\textbf{Y}}_{\mu i})\right) \mid \theta _{0})\right| \nonumber \\{} & {} \sup _{{\textbf{u}}\in [0,1]^{d_{\mu }}}\left( \left\| \nabla _{\theta }{\mathcal {C}}_{\mu }({\textbf{u}}\mid \theta _{0})\right\| \left| w_{\mu }({\textbf{u}})\right| \right) \nonumber \\\le & {} O\left( \tau _{n}\right) \cdot \sum _{\mu =1}^{m}\sum _{l\in J_{\mu }}\sup _{{\textbf{u}}\in [0,1]^{d_{\mu }}}\sup _{{\bar{\mathbf {\eta }}}:\left\| {\bar{\mathbf {\eta }}}\right\| \le d\tau _{n}}\left| {\tilde{\mathcal {C}}}_{\mu l}^{ {{}^\circ } }({\textbf{u}}\mid \theta _{0})-{\tilde{\mathcal {C}}}_{\mu l}^{ {{}^\circ } }({\textbf{u}}+{\bar{\mathbf {\eta }}}\mid \theta _{0})\right| \nonumber \\= & {} o_{{\mathbb {P}}}(n^{-1/2}). \end{aligned}$$
(13)

On the other hand, by Lemma 9.3 and the mean value theorem, we derive (writing \({\check{F}}_{n\mu \eta }^{**}(y):={\bar{F}}_{\mu }^{*}(y)+\eta \left( {\check{F}}_{n\mu }^{*}(y)-{\bar{F}}_{\mu }^{*}(y)\right) \) for \(y\in {\mathbb {R}}^{d_{\mu }}\))

$$\begin{aligned}{} & {} \left| A_{4nk}-A_{4nk}^{*}\right| \nonumber \\{} & {} \quad \le \sum _{\mu =1}^{m}\frac{2}{n_{\mu }}\sum _{i=1}^{n_{\mu }} \nonumber \\{} & {} \qquad \sum _{l\in J_{\mu }}\left( \sup _{\eta :0\le \eta \le 1}\left| {\tilde{\mathcal {C}}}_{\mu lk}^{ {{}^\circ } }({\bar{F}}_{\mu }^{*}({\textbf{Y}}_{\mu i})\mid \theta _{0})w_{\mu }({\bar{F}} _{\mu }^{*}({\textbf{Y}}_{\mu i}))\right. -{\tilde{\mathcal {C}}}_{\mu lk}^{ {{}^\circ } }({\check{F}}_{n\mu \eta }^{**}({\textbf{Y}}_{\mu i})\mid \theta _{0}))w_{\mu }({\check{F}}_{n\mu \eta }^{**}({\textbf{Y}}_{\mu i}))\right| \nonumber \\{} & {} \qquad +\sup _{\eta :0\le \eta \le 1}\left| {\bar{\mathcal {C}}}_{\mu k}^{ {{}^\circ } }({\bar{F}}_{\mu }^{*}({\textbf{Y}}_{\mu i})\mid \theta _{0})w_{\mu l}(\bar{F }_{\mu }^{*}({\textbf{Y}}_{\mu i}))\left. -{\bar{\mathcal {C}}}_{\mu k}^{ {{}^\circ } }({\check{F}}_{n\mu \eta }^{**}({\textbf{Y}}_{\mu i})\mid \theta _{0})w_{\mu l}({\check{F}}_{n\mu \eta }^{**}({\textbf{Y}}_{\mu i}))\right| \right) \nonumber \\{} & {} \qquad \qquad \qquad \left| {\hat{F}}_{l}({\bar{\psi }}_{l\mu }({\textbf{Y}}_{\mu i}))-F_{l}({\bar{\psi }}_{l\mu }({\textbf{Y}}_{\mu i}))\right| \nonumber \\{} & {} \quad \le O\left( \tau _{n}\right) \sum _{\mu =1}^{m}\max _{l\in J_{\mu }} \nonumber \\{} & {} \qquad \left( \sup _{{\textbf{u}}\in [0,1]^{d_{\mu }}}\sup _{{\bar{\mathbf {\eta }} }:\left\| {\bar{\mathbf {\eta }}}\right\| \le d\tau _{n}}\left( \left| {\tilde{\mathcal {C}}}_{\mu lk}^{ {{}^\circ } }({\textbf{u}}\mid \theta _{0})-{\tilde{\mathcal {C}}}_{\mu lk}^{ {{}^\circ } }({\textbf{u}}+{\bar{\mathbf {\eta }}}\mid \theta _{0})\right| \right. +\left| w_{\mu }({\textbf{u}})-w_{\mu }({\textbf{u}}+{\bar{\mathbf {\eta }}})\right| \right) \nonumber \\{} & {} \qquad +\sup _{{\textbf{u}}\in [0,1]^{d_{\mu }}}\sup _{\bar{\varvec{\eta }}:\left\| {\bar{\mathbf {\eta }}}\right\| \le d\tau _{n}}\left( \left| {\bar{\mathcal {C}}}_{\mu k}^{ {{}^\circ } }({\textbf{u}}\mid \theta _{0})-{\bar{\mathcal {C}}}_{\mu k}^{ {{}^\circ } }({\textbf{u}}+{\bar{\mathbf {\eta }}}\mid \theta _{0})\right| \left. +\left| w_{\mu l}({\textbf{u}})-w_{\mu l}({\textbf{u}}+{\bar{\mathbf {\eta }}})\right| \right) \right) \nonumber \\{} & {} \quad =o_{{\mathbb {P}}}(n^{-1/2})\qquad (k=1,\ldots ,q) \end{aligned}$$
(14)

since \(w_{\mu }\), \(w_{\mu l}\), \({\tilde{\mathcal {C}}}_{\mu lk}^{ {{}^\circ } }(\cdot \mid \theta _{0})\) and \({\bar{\mathcal {C}}}_{\mu k}^{ {{}^\circ } }(\cdot \mid \theta _{0})\) are uniformly continuous on \([0,1]^{d_{\mu }}\). The estimates (12)-(14) imply

$$\begin{aligned} A_{2n}+\left( A_{3n}-A_{3n}^{*}\right) +\left( A_{4n}-A_{4n}^{*}\right) =o_{{\mathbb {P}}}(n^{-1/2}). \end{aligned}$$
(15)

In the remaining part of the proof, we show the asymptotic normality of \( \sqrt{n}A_{n}\), where \(A_{n}=A_{1n}+A_{3n}^{*}+A_{4n}^{*}\). We have

$$\begin{aligned} A_{n}= & {} A_{1n} \\{} & {} \left. +\sum _{\mu =1}^{m}\frac{2}{n_{\mu }}\sum _{i=1}^{n_{\mu }}\nabla _{\theta }{\mathcal {C}}_{\mu }({\bar{F}}_{\mu }^{*}({\textbf{Y}}_{\mu i})\mid \theta _{0})w_{\mu }({\bar{F}}_{\mu }^{*}({\textbf{Y}}_{\mu i}))\right. \\{} & {} \left( \frac{1}{{\bar{n}}_{\mu }}\sum _{\nu :1\le \nu \le m,{\textbf{b}} ^{(\nu )}\ge {\textbf{b}}^{(\mu )}}\sum _{j=1}^{n_{\mu }}\left( {\textbf{1}} \left\{ \psi _{\mu \nu }({\textbf{Y}}_{\nu j})\le {\textbf{Y}}_{\mu i}\right\} -H_{\mu }({\textbf{Y}}_{\mu i})\right) \right. \\{} & {} \quad \left. -\sum _{l\in J_{\mu }}{\tilde{\mathcal {C}}}_{\mu l}^{ {{}^\circ }}({\bar{F}}_{\mu }^{*}({\textbf{Y}}_{\mu i})\mid \theta _{0})\right. \\{} & {} \left. \frac{1}{{\tilde{n}}_{l}}\sum _{\nu :1\le \nu \le m,l\in J_{\nu }}\sum _{j=1}^{n_{\nu }}\left( {\textbf{1}}\left\{ {\bar{\psi }}_{l\nu }( {\textbf{Y}}_{\nu j})\le {\bar{\psi }}_{l\mu }({\textbf{Y}}_{\mu i})\right\} -F_{l}({\bar{\psi }}_{l\mu } ({\textbf{Y}}_{\mu i}))\right) \right) \\{} & {} +\sum _{\mu =1}^{m}\frac{2}{n_{\mu }}\sum _{i=1}^{n_{\mu }}\left( H_{\mu }( {\textbf{Y}}_{\mu i})-{\mathcal {C}}_{\mu }({\bar{F}}_{\mu }^{*}({\textbf{Y}} _{\mu i})\mid \theta _{0})\right) \\{} & {} \left. \sum _{l\in J_{\mu }}\left( {\tilde{\mathcal {C}}}_{\mu lk}^{ {{}^\circ } }({\bar{F}}_{\mu }^{*}({\textbf{Y}}_{\mu i})\mid \theta _{0})w_{\mu }({\bar{F}} _{\mu }^{*}({\textbf{Y}}_{\mu i}))+{\bar{\mathcal {C}}}_{\mu k}^{ {{}^\circ } }({\bar{F}}_{\mu }^{*}({\textbf{Y}}_{\mu i})\mid \theta _{0})w_{\mu l}(\bar{F }_{\mu }^{*}({\textbf{Y}}_{\mu i}))\right) _{k=1,\ldots ,q}\right. \\{} & {} \qquad \frac{1}{{\tilde{n}}_{l}}\sum _{\nu :1\le \nu \le m,l\in J_{\nu }}\sum _{j=1}^{n_{\nu }}\left( {\textbf{1}}\left\{ {\bar{\psi }}_{l\nu }( {\textbf{Y}}_{\nu j})\le {\bar{\psi }}_{l\mu }({\textbf{Y}}_{\mu i})\right\} -F_{l}({\bar{\psi }}_{l\mu }({\textbf{Y}}_{\mu i}))\right) \\= & {} 2\sum _{\mu =1}^{m}\sum _{i=1}^{n_{\mu }}\sum _{\nu =1}^{m}\sum _{j=1}^{n_{\nu }} \\{} & {} \left( \frac{1}{n\ n_{\mu }}\Lambda _{\mu }^{(1)}({\textbf{Y}}_{\mu i})+ \frac{1}{n_{\mu }\ {\bar{n}}_{\mu }}\Lambda _{\mu \nu }^{(2)}({\textbf{Y}}_{\mu i},{\textbf{Y}}_{\nu j})+\sum _{l\in J_{\mu }\cap J_{\nu }}\frac{1}{n_{\mu }\ {\tilde{n}}_{l}}\Lambda _{\mu \nu l}^{(3)}({\textbf{Y}}_{\mu i},{\textbf{Y}}_{\nu j})\right) , \end{aligned}$$

where

$$\begin{aligned} \Lambda _{\mu }^{(1)}({\textbf{y}})= & {} \left( H_{\mu }({\textbf{y}})-{\mathcal {C}} _{\mu }({\bar{F}}_{\mu }^{*}({\textbf{y}})\mid \theta _{0})\right) \nabla _{\theta }{\mathcal {C}}_{\mu }({\bar{F}}_{\mu }^{*}({\textbf{y}})\mid \theta _{0})w_{\mu }({\bar{F}}_{\mu }^{*}({\textbf{y}})),\\ \Lambda _{\mu \nu }^{(2)}({\textbf{y}},{\textbf{z}})= & {} \left\{ \begin{array}{ll} \left( {\textbf{1}}\left( \psi _{\mu \nu }({\textbf{z}})\le {\textbf{y}}\right) -H_{\mu }({\textbf{y}})\right) \nabla _{\theta }{\mathcal {C}}_{\mu }({\bar{F}} _{\mu }^{*}({\textbf{y}})\mid \theta _{0})w_{\mu }({\bar{F}}_{\mu }^{*}( {\textbf{y}}))&{}\quad \text {for }\nu :{\textbf{b}}^{(\nu )}\ge {\textbf{b}}^{(\mu )}, \\ 0&{}\quad \text {otherwise,} \end{array} \right. \\ \Lambda _{\mu \nu l}^{(3)}({\textbf{y}},{\textbf{z}})= & {} \left( -\tilde{\mathcal {C}}_{\mu l}^{ {{}^\circ } }({\bar{F}}_{\mu }^{*}({\textbf{y}})\mid \theta _{0})\nabla _{\theta } {\mathcal {C}}_{\mu }({\bar{F}}_{\mu }^{*}({\textbf{y}})\mid \theta _{0})w_{\mu }({\bar{F}}_{\mu }^{*}({\textbf{y}}))\right. \\{} & {} +\left( H_{\mu }({\textbf{y}})-{\mathcal {C}}_{\mu }({\bar{F}}_{\mu }^{*}( {\textbf{y}})\mid \theta _{0})\right) \\{} & {} \left. \left( {\tilde{\mathcal {C}}}_{\mu lk}^{ {{}^\circ } }({\bar{F}}_{\mu }^{*}({\textbf{y}})\mid \theta _{0})w_{\mu }({\bar{F}}_{\mu }^{*}({\textbf{y}}))+{\bar{\mathcal {C}}}_{\mu k}^{ {{}^\circ } }({\bar{F}}_{\mu }^{*}({\textbf{y}})\mid \theta _{0})w_{\mu l}({\bar{F}}_{\mu }^{*}({\textbf{y}}))\right) _{k=1,\ldots ,q}\right) \\{} & {} \quad \left( {\textbf{1}}\left\{ {\bar{\psi }}_{l\nu }({\textbf{z}})\le {\bar{\psi }} _{l\mu }({\textbf{y}})\right\} -F_{l}({\bar{\psi }}_{l\mu }({\textbf{y}}))\right) . \end{aligned}$$

Now we decompose \(\sqrt{n}A_{n}\) and obtain

$$\begin{aligned} \sqrt{n}A_{n}=n^{-3/2}\sum _{\mu =1}^{m}\sum _{i=1}^{n_{\mu }}\sum _{\nu =1}^{m}\sum _{j=1}^{n_{\nu }}\Lambda _{\mu \nu }({\textbf{Y}}_{\mu i},{\textbf{Y}} _{\nu j})+\sqrt{n}{\bar{A}}_{n}, \end{aligned}$$
(16)

where

$$\begin{aligned} \Lambda _{\mu \nu }({\textbf{y}},{\textbf{z}})=\frac{2}{\gamma _{\mu }}\left( \Lambda _{\mu }^{(1)}({\textbf{y}})+\frac{1}{{\bar{\gamma }}_{\mu }}\Lambda _{\mu \nu }^{(2)}({\textbf{y}},{\textbf{z}})+\sum _{l=1}^{d}\frac{b_{l}^{(\mu )}b_{l}^{(\nu )}}{{\tilde{\gamma }}_{l}}\Lambda _{\mu \nu l}^{(3)}({\textbf{y}}, {\textbf{z}})\right) , \end{aligned}$$

and

$$\begin{aligned} {\bar{A}}_{n}= & {} n^{-3/2}\sum _{\mu =1}^{m}\sum _{i=1}^{n_{\mu }}\sum _{\nu =1}^{m}\sum _{j=1}^{n_{\nu }}\left( \left( \frac{n}{n_{\mu }}-\frac{1}{\gamma _{\mu }}\right) \Lambda _{\mu }^{(1)}({\textbf{Y}}_{\mu i})\right. \\{} & {} \left. +\left( \frac{n^{2} }{n_{\mu }\ {\bar{n}}_{\mu }}-\frac{1}{\gamma _{\mu }\ {\bar{\gamma }}_{\mu }} \right) \Lambda _{\mu \nu }^{(2)}({\textbf{Y}}_{\mu i},{\textbf{Y}}_{\nu j})\right. \\{} & {} \left. +\sum _{l\in J_{\mu }\cap J_{\nu }}\left( \frac{n^{2}}{n_{\mu }\ {\tilde{n}}_{l}}-\frac{1}{\gamma _{\mu }\ {\tilde{\gamma }}_{l}}\right) \Lambda _{\mu \nu l}^{(3)}({\textbf{Y}}_{\mu i},{\textbf{Y}}_{\nu j})\right) . \end{aligned}$$

In the following we prove that \({\bar{A}}_{n}=o_{{\mathbb {P}}}(n^{-1/2})\). Note that \({\mathbb {E}}\Lambda _{\mu }^{(1)}({\textbf{Y}}_{\mu i})=0\). We have

$$\begin{aligned} {\mathbb {V}}\left( \frac{1}{n^{1/2}}\sum _{i=1}^{n_{\mu }}\left( \Lambda _{\mu }^{(1)}({\textbf{Y}}_{\mu i})\right) _{k}\right) =\frac{n_{\mu }}{n}{\mathbb {V}} \left( \left( \Lambda _{\mu }^{(1)}({\textbf{Y}}_{\mu i})\right) _{k}\right) =O(1)\quad (k=1,\ldots ,q). \end{aligned}$$

Moreover, in view of Proposition 9.5,

$$\begin{aligned}{} & {} \frac{1}{n^{3/2}}\sum _{i=1}^{n_{\mu }}\sum _{j=1}^{n_{\nu }}\left( \Lambda _{\mu \nu }^{(2)}({\textbf{Y}}_{\mu i},{\textbf{Y}}_{\nu j})\right) _{k}\overset{d}{\longrightarrow }{\mathcal {N}}(0,\Sigma _{\mu \nu k})\text {, and} \\{} & {} \frac{1}{n^{3/2}}\sum _{i=1}^{n_{\mu }}\sum _{j=1}^{n_{\nu }}\left( \Lambda _{\mu \nu l}^{(3)}({\textbf{Y}}_{\mu i},{\textbf{Y}}_{\nu j})\right) _{k}\overset{d}{\longrightarrow }{\mathcal {N}}(0,\Sigma _{\mu \nu kl}) \end{aligned}$$

with certain (finite) covariance matrices \(\Sigma _{\mu \nu k},\Sigma _{\mu \nu kl}\) \((k=1,\ldots ,q,l\in J_{\mu }\cap J_{\nu },\mu ,\nu =1,\ldots ,m)\). Hence by Assumption \({\mathcal {A}}_{n}\),

$$\begin{aligned} {\bar{A}}_{n}=o_{{\mathbb {P}}}(n^{-1/2}), \end{aligned}$$

and by equations (11), (15), (16),

$$\begin{aligned} \sqrt{n}\nabla _{\theta }\widehat{{\mathcal {D}}}_{n}(\theta _{0})=-n^{-3/2}\sum _{\mu =1}^{m}\sum _{i=1}^{n_{\mu }}\sum _{\nu =1}^{m}\sum _{j=1}^{n_{\nu }}\Lambda _{\mu \nu }({\textbf{Y}}_{\mu i},{\textbf{Y}} _{\nu j})+o_{{\mathbb {P}}}(1). \end{aligned}$$
(17)

An application of the central limit theorem in Proposition 9.5 gives the asymptotic normality of \(\sqrt{n}\nabla _{\theta }\widehat{ {\mathcal {D}}}_{n}(\theta _{0})\). To derive a formula for the covariance matrix, we consider

$$\begin{aligned} {\tilde{h}}_{\mu \nu }({\textbf{y}})&:={\mathbb {E}}\Lambda _{\mu \nu }({\textbf{Y}} _{\mu 1},{\textbf{y}})+{\mathbb {E}}\Lambda _{\nu \mu }({\textbf{y}},{\textbf{Y}} _{\mu 1})-{\mathbb {E}}\Lambda _{\mu \nu }({\textbf{Y}}_{\mu 1},{\textbf{Y}}_{\nu 2})-{\mathbb {E}}\Lambda _{\nu \mu }({\textbf{Y}}_{\nu 1},{\textbf{Y}}_{\mu 2}) \\&=\frac{2}{\gamma _{\mu }}\left( \left( \left( H_{\mu }({\textbf{y}})- {\mathcal {C}}_{\mu }({\bar{F}}_{\mu }^{*}({\textbf{y}})\mid \theta _{0})\right) \nabla _{\theta }{\mathcal {C}}_{\mu }({\bar{F}}_{\mu }^{*}( {\textbf{y}})\mid \theta _{0})w_{\mu }({\bar{F}}_{\mu }^{*}({\textbf{y}}))- {\textbf{e}}_{\mu }\right) \right. \\&\quad \left. +\frac{1}{{\bar{\gamma }}_{\mu }}{\textbf{1}}\left( {\textbf{b}}^{(\nu )}\ge {\textbf{b}}^{(\mu )}\right) \right. \\&\quad \left. \int _{{\mathbb {R}}^{d_{\mu }}}\left( {\textbf{1}}\left( \psi _{\mu \nu }({\textbf{y}})\le {\textbf{z}}\right) -H_{\mu }({\textbf{z}})\right) \nabla _{\theta }{\mathcal {C}}_{\mu }({\bar{F}}_{\mu }^{*}({\textbf{z}})\mid \theta _{0})w_{\mu }({\bar{F}}_{\mu }^{*}({\textbf{z}}))~\text {d}H_{\mu }( {\textbf{z}})\right. \\&\quad +\sum _{l=1}^{d}\frac{1}{{\tilde{\gamma }}_{l}}b_{l}^{(\mu )}b_{l}^{(\nu )}\int _{{\mathbb {R}}^{d_{\mu }}}\left( -{\tilde{\mathcal {C}}}_{\mu l}^{ {{}^\circ } }({\bar{F}}_{\mu }^{*}({\textbf{z}})\mid \theta _{0})\nabla _{\theta } {\mathcal {C}}_{\mu }({\bar{F}}_{\mu }^{*}({\textbf{z}})\mid \theta _{0})w_{\mu }({\bar{F}}_{\mu }^{*}({\textbf{z}}))\right. \\&\quad +\left( H_{\mu }({\textbf{z}})-{\mathcal {C}}_{\mu }({\bar{F}}_{\mu }^{*}({\textbf{z}})\mid \theta _{0})\right) \cdot \\&\quad \left. \cdot \left( {\tilde{\mathcal {C}}}_{\mu lk}^{ {{}^\circ } }({\bar{F}}_{\mu }^{*}({\textbf{z}})\mid \theta _{0})w_{\mu }({\bar{F}}_{\mu }^{*}({\textbf{z}}))+{\bar{\mathcal {C}}}_{\mu k}^{ {{}^\circ } }({\bar{F}}_{\mu }^{*}({\textbf{z}})\mid \theta _{0})w_{\mu l}({\bar{F}}_{\mu }^{*}({\textbf{z}}))\right) _{k=1,\ldots ,q}\right) \\&\left. \qquad \left( {\textbf{1}}\left\{ {\bar{\psi }}_{l\nu }({\textbf{y}} )\le {\bar{\psi }}_{l\mu }({\textbf{z}})\right\} -F_{l}({\bar{\psi }}_{l\mu }( {\textbf{z}}))\right) ~\text {d}H_{\mu }({\textbf{z}}) \right) , \end{aligned}$$

where

$$\begin{aligned} {\textbf{e}}_{\mu }=\frac{2}{\gamma _{\mu }}\int _{{\mathbb {R}}^{d_{\mu }}}\left( H_{\mu }({\textbf{y}})-{\mathcal {C}}_{\mu }({\bar{F}}_{\mu }^{*}({\textbf{y}} )\mid \theta _{0})\right) \nabla _{\theta }{\mathcal {C}}_{\mu }({\bar{F}}_{\mu }^{*}({\textbf{y}})\mid \theta _{0})w_{\mu }({\bar{F}}_{\mu }^{*}( {\textbf{y}}))\text {d}H_{\mu }({\textbf{y}}). \end{aligned}$$

By Proposition 9.5 we obtain the formula for the covariance matrix \( \Sigma _{D}\). \(\square \)
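
For orientation, we note the resulting expression; this is a sketch obtained by inserting the functions \({\tilde{h}}_{\mu \nu }\) defined above into the covariance formula of Proposition 9.5, in the same way as for \(\Sigma _{0}\) at the end of the proof of Lemma 9.8 below:

$$\begin{aligned} \Sigma _{D}=\sum _{\nu =1}^{m}\gamma _{\nu }\sum _{\mu =1}^{m}\sum _{{\bar{\mu }}=1}^{m}\gamma _{\mu }\gamma _{{\bar{\mu }}}{\mathbb {E}}{\tilde{h}}_{\mu \nu }({\textbf{Y}}_{\nu 1}){\tilde{h}}_{{\bar{\mu }}\nu }^{T}({\textbf{Y}}_{\nu 1}). \end{aligned}$$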

Next we deal with the convergence of \({\mathcal {H}}_{n}^{*}\) and prove the following lemma.

Lemma 9.7

Suppose that \(t_{nk}^{*}\rightarrow \theta _{0}\) for \( k=1,\ldots ,q\). Then we have

$$\begin{aligned} {\mathcal {H}}_{nkl}(t_{nk}^{*})\longrightarrow {\mathcal {H}}_{kl}\ a.s. \end{aligned}$$

for \(k,l=1,\ldots ,q\).

Proof

Notice that \({\bar{\tau }}_{n}:=\sup _{y\in {\mathbb {R}}^{d_{\mu }}}\left\| {\check{F}}_{n\mu }^{*}(y)-{\bar{F}}_{\mu }^{*}(y)\right\| \rightarrow 0\) a.s. Moreover, \({\mathcal {C}}_{\mu },\mathcal { {\bar{C}}}_{\mu k}\), and \({\bar{\mathcal {C}}}_{\mu kl}\) are uniformly continuous on \([0,1]^{d_{\mu }}\times U(\theta _{0})\) in view of the Heine-Cantor theorem, and we obtain

$$\begin{aligned} \zeta _{nk}&:=\max _{\mu =1,\ldots ,m}\sup _{{\textbf{y}}_{1},{\textbf{y}}_{2}:\left\| {\textbf{y}}_{1}-{\textbf{y}}_{2}\right\| \le {\bar{\tau }}_{n}}\left| {\mathcal {C}}_{\mu }({\textbf{y}}_{1}\mid t_{nk}^{*})-{\mathcal {C}}_{\mu }({\textbf{y}}_{2}\mid \theta _{0})\right| \rightarrow 0\ a.s., \\ {\bar{\zeta }}_{nk}&:=\max _{\mu =1,\ldots ,m}\sup _{{\textbf{y}}_{1},{\textbf{y}}_{2}:\left\| {\textbf{y}}_{1}-{\textbf{y}}_{2}\right\| \le {\bar{\tau }}_{n}}\left| {\bar{\mathcal {C}}}_{\mu k}^{ {{}^\circ } }({\textbf{y}}_{1}\mid t_{nk}^{*})-{\bar{\mathcal {C}}}_{\mu k}^{ {{}^\circ } }({\textbf{y}}_{2}\mid \theta _{0})\right| \rightarrow 0\ a.s., \\ {\bar{\zeta }}_{nkl}&:=\max _{\mu =1,\ldots ,m}\sup _{{\textbf{y}}_{1},{\textbf{y}}_{2}:\left\| {\textbf{y}}_{1}-{\textbf{y}}_{2}\right\| \le {\bar{\tau }}_{n}}\left| {\bar{\mathcal {C}}}_{\mu kl}^{ {{}^\circ } }({\textbf{y}}_{1}\mid t_{nk}^{*})-{\bar{\mathcal {C}}}_{\mu kl}^{ {{}^\circ } }({\textbf{y}}_{2}\mid \theta _{0})\right| \rightarrow 0\ a.s., \\ {\bar{\zeta }}_{n}&:=\max _{\mu =1,\ldots ,m}\sup _{{\textbf{y}}_{1},{\textbf{y}}_{2}:\left\| {\textbf{y}}_{1}-{\textbf{y}}_{2}\right\| \le {\bar{\tau }}_{n}}\left| w_{\mu }({\textbf{y}}_{1})-w_{\mu }({\textbf{y}}_{2})\right| \rightarrow 0\ a.s. \end{aligned}$$

for \(k,l=1,\ldots ,q\). We define

$$\begin{aligned} \breve{\mathcal {H}}_{nkl}= & {} \sum _{\mu =1}^{m}\frac{2}{n_{\mu }} \sum _{i=1}^{n_{\mu }}\left( -\left( H_{\mu }({\textbf{Y}}_{\mu i})-{\mathcal {C}} _{\mu }({\bar{F}}_{\mu }^{*}({\textbf{Y}}_{\mu i})\mid \theta _{0})\right) {\bar{\mathcal {C}}}_{\mu kl}^{ {{}^\circ }}({\bar{F}}_{\mu }^{*}({\textbf{Y}}_{\mu i})\mid \theta _{0})\right. \\{} & {} \left. +{\bar{\mathcal {C}}}_{\mu k}^{ {{}^\circ } }({\bar{F}}_{\mu }^{*}({\textbf{Y}}_{\mu i})\mid \theta _{0}){\bar{\mathcal {C}} }_{\mu l}^{ {{}^\circ } }({\bar{F}}_{\mu }^{*}({\textbf{Y}}_{\mu i})\mid \theta _{0})\right) w_{\mu }({\bar{F}}_{\mu }^{*}({\textbf{Y}}_{\mu i})). \end{aligned}$$

Combining these convergences with Theorem 3.1, we have

$$\begin{aligned}{} & {} \max _{k,l=1,\ldots ,q}\left| {\mathcal {H}}_{nkl}(t_{nk}^{*})- \breve{\mathcal {H}}_{nkl}\right| \\{} & {} \quad \le O(1)\sum _{\mu =1}^{m}\frac{1}{n_{\mu }}\sum _{i=1}^{n_{\mu }}\max _{k,l=1,\ldots ,q}\left( \left| {\hat{H}}_{n\mu }({\textbf{Y}}_{\mu i})-{\mathcal {C}}_{\mu }({\check{F}}_{n\mu }^{*}({\textbf{Y}}_{\mu i})\mid t_{nk}^{*})\right. \right. \\{} & {} \qquad \qquad \left. -H_{\mu }({\textbf{Y}}_{\mu i})+{\mathcal {C}}_{\mu }({\bar{F}}_{\mu }^{*}({\textbf{Y}}_{\mu i})\mid \theta _{0})\right| \\{} & {} \qquad \qquad +\left| {\bar{\mathcal {C}}}_{\mu kl}^{ {{}^\circ } }({\check{F}}_{n\mu }^{*}({\textbf{Y}}_{\mu i})\mid t_{nk}^{*})-\mathcal { {\bar{C}}}_{\mu kl}^{ {{}^\circ } }({\bar{F}}_{\mu }^{*}({\textbf{Y}}_{\mu i})\mid \theta _{0})\right| \\{} & {} \qquad \qquad +\left| w_{\mu }({\check{F}}_{n\mu }^{*}({\textbf{Y}}_{\mu i}))-w_{\mu }( {\bar{F}}_{\mu }^{*}({\textbf{Y}}_{\mu i}))\right| \\{} & {} \qquad \qquad +\left| {\bar{\mathcal {C}}}_{\mu k}^{ {{}^\circ } }({\check{F}}_{n\mu }^{*}({\textbf{Y}}_{\mu i})\mid t_{nk}^{*})-\mathcal { {\bar{C}}}_{\mu k}^{ {{}^\circ } }({\bar{F}}_{\mu }^{*}({\textbf{Y}}_{\mu i})\mid \theta _{0})\right| \\{} & {} \qquad \qquad \left. +\left| {\bar{\mathcal {C}}}_{\mu l}^{ {{}^\circ } }({\check{F}}_{n\mu }^{*}({\textbf{Y}}_{\mu i})\mid t_{nk}^{*})-\mathcal { {\bar{C}}}_{\mu l}^{ {{}^\circ } }({\bar{F}}_{\mu }^{*}({\textbf{Y}}_{\mu i})\mid \theta _{0})\right| \right) \\{} & {} \qquad \le O(1)\left( O\left( \sqrt{\frac{\ln \ln n}{n}}\right) +\max _{k,l=1,\ldots ,q}\left( \zeta _{nk}+{\bar{\zeta }}_{nkl}+{\bar{\zeta }}_{nk}+ {\bar{\zeta }}_{nl}\right) +{\bar{\zeta }}_{n}\right) \\{} & {} \qquad =o(1)\ \ a.s. \end{aligned}$$

In view of the strong law of large numbers, we have

$$\begin{aligned} \breve{\mathcal {H}}_{nkl}&\longrightarrow {\mathcal {H}}_{kl}:=2\sum _{\mu =1}^{m}\int _{{\mathbb {R}}^{d_{\mu }}}\left( -\left( H_{\mu }(\textbf{y})-{\mathcal {C}} _{\mu }({\bar{F}}_{\mu }^{*}(\textbf{y})\mid \theta _{0})\right) {\bar{\mathcal {C}}} _{\mu kl}^{{{}^\circ }}({\bar{F}}_{\mu }^{*}(\textbf{y})\mid \theta _{0})\right. \\&\quad \left. +{\bar{\mathcal {C}}}_{\mu k}^{ {{}^\circ }}({\bar{F}}_{\mu }^{*}(\textbf{y})\mid \theta _{0}){\bar{\mathcal {C}}}_{\mu l}^{ {{}^\circ } }({\bar{F}}_{\mu }^{*}(\textbf{y})\mid \theta _{0})\right) w_{\mu }({\bar{F}}_{\mu }^{*}(\textbf{y}))~\text {d}H_{\mu }(\textbf{y}) \end{aligned}$$

a.s. for \(k,l=1,\ldots ,q\). This completes the proof. \(\square \)
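
A consequence used in the proof of Theorem 5.2 below (assuming, as there, that \(\lambda _{\text {min}}({\mathcal {H}})>0\), so that \({\mathcal {H}}=({\mathcal {H}}_{kl})_{k,l=1,\ldots ,q}\) is nonsingular): if \(t_{nk}^{*}\rightarrow \theta _{0}\) a.s. for \(k=1,\ldots ,q\), then Lemma 9.7 and the continuity of matrix inversion at nonsingular matrices yield

$$\begin{aligned} {\mathcal {H}}_{n}^{*}=({\mathcal {H}}_{nkl}(t_{nk}^{*}))_{k,l=1,\ldots ,q}\longrightarrow {\mathcal {H}}\quad \text {and}\quad ({\mathcal {H}}_{n}^{*})^{-1}\longrightarrow {\mathcal {H}}^{-1}\ \ a.s. \end{aligned}$$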

Now we are in a position to prove Theorem 5.2.

Proof of Theorem 5.2

First we derive the convergence rate of \({\tilde{\theta }}_{n}-{\hat{\theta }}_{n}\). Note that \(\nabla _{\theta } \widehat{{\mathcal {D}}}_{n}({\tilde{\theta }}_{n})=0\), and \(\theta _{0}\) is an interior point of \(\Theta \). Then the Taylor formula leads to

$$\begin{aligned} \widehat{{\mathcal {D}}}_{n}({\tilde{\theta }}_{n}) -\widehat{{\mathcal {D}}}_{n}(\hat{ \theta }_{n})=\left( {\tilde{\theta }}_{n}-{\hat{\theta }}_{n}\right) ^{T}\mathcal {H }_{n}^{**}\left( {\tilde{\theta }}_{n}-{\hat{\theta }}_{n}\right) , \end{aligned}$$

where \(t_{nk}^{**}={\tilde{\theta }}_{n}+\eta _{nk}\left( {\hat{\theta }}_{n}-{\tilde{\theta }}_{n}\right) ,0\le \eta _{nk}\le 1\) and \({\mathcal {H}}_{n}^{**}=({\mathcal {H}}_{nkl}(t_{nk}^{**}))_{k,l=1,\ldots ,q}\). In view of Theorem 5.1, the estimators \({\tilde{\theta }}_{n}\) and \({\hat{\theta }}_{n}\) are strongly consistent, so that \(t_{nk}^{**}\rightarrow \theta _{0}\) a.s. Using Lemma 9.7, we obtain \({\mathcal {H}}_{n}^{**}\rightarrow {\mathcal {H}}\) a.s. Since \({\hat{\theta }}_{n}\) minimizes \(\widehat{{\mathcal {D}}}_{n}(\cdot )\) only up to the optimization error \(\varepsilon _{n}\) (cf. Sect. 5), we hence obtain

$$\begin{aligned} \varepsilon _{n}\ge & {} \left| \left( {\tilde{\theta }}_{n}-{\hat{\theta }} _{n}\right) ^{T}{\mathcal {H}}_{n}^{**}\left( {\tilde{\theta }}_{n}-{\hat{\theta }} _{n}\right) \right| \ge \lambda _{\text {min}}({\mathcal {H}}_{n}^{**})\left\| {\tilde{\theta }}_{n}-{\hat{\theta }}_{n}\right\| ^{2} \\= & {} \left( \lambda _{\text {min}}({\mathcal {H}})+o_{{\mathbb {P}}}(1)\right) \left\| {\tilde{\theta }}_{n}-{\hat{\theta }}_{n}\right\| ^{2}, \end{aligned}$$

where \(\lambda _{\text {min}}(A)\) denotes the smallest eigenvalue of the matrix \(A\) in absolute value. By assumption, \(\lambda _{\text {min}}({\mathcal {H}})\) is positive. Therefore, it follows that

$$\begin{aligned} {\tilde{\theta }}_{n}-{\hat{\theta }}_{n}=o_{{\mathbb {P}}}(n^{-1/2}). \end{aligned}$$
(18)
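
To make the last step explicit, the following minimal rearrangement of the preceding inequality may be helpful; it uses that the optimization error satisfies \(\varepsilon _{n}=o_{{\mathbb {P}}}(n^{-1})\), which is exactly what the conclusion (18) requires (the precise condition on \(\varepsilon _{n}\) is part of the setting of Sect. 5):

$$\begin{aligned} \left\| {\tilde{\theta }}_{n}-{\hat{\theta }}_{n}\right\| ^{2}\le \frac{\varepsilon _{n}}{\lambda _{\text {min}}({\mathcal {H}})+o_{{\mathbb {P}}}(1)}=o_{{\mathbb {P}}}(n^{-1}). \end{aligned}$$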

Using (10) and Lemmas 9.6 and 9.7, an application of Slutsky’s theorem leads to

$$\begin{aligned} \sqrt{n}\left( {\tilde{\theta }}_{n}-\theta _{0}\right) =-\sqrt{n}{\mathcal {H}} _{n}^{*-1}\nabla _{\theta }\widehat{{\mathcal {D}}}_{n}(\theta _{0})\overset{{\mathcal {D}}}{\longrightarrow }{\mathcal {N}}(0,{\mathcal {H}}^{-1}\Sigma _{D} {\mathcal {H}}^{-1}). \end{aligned}$$

In view of (18), \(\sqrt{n}\left( {\hat{\theta }}_{n}-\theta _{0}\right) =\sqrt{n}\left( {\tilde{\theta }}_{n}-\theta _{0}\right) +o_{{\mathbb {P}}}(1)\) has the same limit distribution, and the proof of Theorem 5.2 is complete. \(\square \)

9.4 Proof of Theorem 8.1

First we prove a lemma on asymptotic normality of \(\sqrt{n}\left( \widehat{ {\mathcal {D}}}_{n}(\theta _{0})-{\mathcal {D}}(C,{\mathcal {C}}(\cdot \mid \theta _{0}))\right) \), which is crucial for the proof of asymptotic normality of \( \sqrt{n}\left( \widehat{{\mathcal {D}}}_{n}({\hat{\theta }}_{n})-{\mathcal {D}}(C, {\mathcal {C}}(\cdot \mid \theta _{0}))\right) \) in Theorem 8.1.
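
Schematically, the proof of Theorem 8.1 rests on the decomposition (a sketch; the proof below verifies the corresponding statement with \({\tilde{\theta }}_{n}\))

$$\begin{aligned} \sqrt{n}\left( \widehat{{\mathcal {D}}}_{n}({\tilde{\theta }}_{n})-{\mathcal {D}}(C,{\mathcal {C}}(\cdot \mid \theta _{0}))\right) =\sqrt{n}\left( \widehat{{\mathcal {D}}}_{n}(\theta _{0})-{\mathcal {D}}(C,{\mathcal {C}}(\cdot \mid \theta _{0}))\right) +\sqrt{n}\left( \widehat{{\mathcal {D}}}_{n}({\tilde{\theta }}_{n})-\widehat{{\mathcal {D}}}_{n}(\theta _{0})\right) , \end{aligned}$$

where the first term is treated by Lemma 9.8 and the second term is shown to be \(o_{{\mathbb {P}}}(1)\) at the end of this section.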

Lemma 9.8

Let the assumptions of Theorem 5.2 be satisfied. Then

$$\begin{aligned} \sqrt{n}\left( \widehat{{\mathcal {D}}}_{n}(\theta _{0})-{\mathcal {D}}(C,\mathcal { C}(\cdot \mid \theta _{0}))\right) \overset{{\mathcal {D}}}{\longrightarrow } {\mathcal {N}}(0,\Sigma _{0}). \end{aligned}$$
(19)

Proof

Define

$$\begin{aligned} {\bar{e}}_{\mu }:=\int _{{\mathbb {R}}^{d_{\mu }}}\left( H_{\mu }({\textbf{y}})- {\mathcal {C}}_{\mu }({\bar{F}}_{\mu }^{*}({\textbf{y}})\mid \theta _{0})\right) ^{2}w_{\mu }({\bar{F}}_{\mu }^{*}({\textbf{y}}))~\text {d}H_{\mu }({\textbf{y}}). \end{aligned}$$

We obtain

$$\begin{aligned}{} & {} \widehat{{\mathcal {D}}}_{n}(\theta _{0})-{\mathcal {D}}(C,{\mathcal {C}}(\cdot \mid \theta _{0}))\\{} & {} \quad =\sum _{\mu =1}^{m}\left( \frac{1}{n_{\mu }} \sum _{i=1}^{n_{\mu }}\left( {\hat{H}}_{n\mu }({\textbf{Y}}_{\mu i})-{\mathcal {C}} _{\mu }({\check{F}}_{n\mu }^{*}({\textbf{Y}}_{\mu i})\mid \theta _{0})\right) ^{2}w_{\mu }({\check{F}}_{n\mu }^{*}({\textbf{Y}}_{\mu i}))- {\bar{e}}_{\mu }\right) \\{} & {} \quad =B_{1n}+B_{2n}+B_{3n}+{\bar{B}}_{3n}+B_{4n}, \end{aligned}$$

where

$$\begin{aligned} B_{1n}= & {} \sum _{\mu =1}^{m}\frac{1}{n_{\mu }}\sum _{i=1}^{n_{\mu }}\left( \left( H_{\mu }({\textbf{Y}}_{\mu i})-{\mathcal {C}}_{\mu }({\bar{F}}_{\mu }^{*}({\textbf{Y}}_{\mu i})\mid \theta _{0})\right) ^{2}w_{\mu }({\bar{F}}_{\mu }^{*}({\textbf{Y}}_{\mu i}))-{\bar{e}}_{\mu }\right) , \\ B_{2n}= & {} \sum _{\mu =1}^{m}\frac{1}{n_{\mu }}\sum _{i=1}^{n_{\mu }}\left( \left( {\hat{H}}_{n\mu }({\textbf{Y}}_{\mu i})-{\mathcal {C}}_{\mu }({\check{F}} _{n\mu }^{*}({\textbf{Y}}_{\mu i})\mid \theta _{0})\right) ^{2}\right. \\{} & {} \left. -\left( H_{\mu }({\textbf{Y}}_{\mu i})-{\mathcal {C}}_{\mu }({\bar{F}}_{\mu }^{*}( {\textbf{Y}}_{\mu i})\mid \theta _{0})\right) ^{2}\right) \\{} & {} \left( w_{\mu }({\check{F}}_{n\mu }^{*}({\textbf{Y}}_{\mu i}))-w_{\mu }({\bar{F}}_{\mu }^{*}({\textbf{Y}}_{\mu i}))\right) , \\ B_{3n}= & {} \sum _{\mu =1}^{m}\frac{1}{n_{\mu }}\sum _{i=1}^{n_{\mu }}\left( {\hat{H}}_{n\mu }({\textbf{Y}}_{\mu i})-{\mathcal {C}}_{\mu }({\check{F}}_{n\mu }^{*}({\textbf{Y}}_{\mu i})\mid \theta _{0})-H_{\mu }({\textbf{Y}}_{\mu i})+ {\mathcal {C}}_{\mu }({\bar{F}}_{\mu }^{*}({\textbf{Y}}_{\mu i})\mid \theta _{0})\right) ^{2} \\{} & {} w_{\mu }({\bar{F}}_{\mu }^{*}({\textbf{Y}}_{\mu i})),\\ {\bar{B}}_{3n}= & {} \sum _{\mu =1}^{m}\frac{2}{n_{\mu }}\sum _{i=1}^{n_{\mu }}\left( {\hat{H}}_{n\mu }({\textbf{Y}}_{\mu i})-{\mathcal {C}}_{\mu }({\check{F}} _{n\mu }^{*}({\textbf{Y}}_{\mu i})\mid \theta _{0})-H_{\mu }({\textbf{Y}} _{\mu i})+{\mathcal {C}}_{\mu }({\bar{F}}_{\mu }^{*}({\textbf{Y}}_{\mu i})\mid \theta _{0})\right) \\{} & {} \left( H_{\mu }({\textbf{Y}}_{\mu i})-{\mathcal {C}}_{\mu }({\bar{F}}_{\mu }^{*}({\textbf{Y}}_{\mu i})\mid \theta _{0})\right) w_{\mu }({\bar{F}}_{\mu }^{*}({\textbf{Y}}_{\mu i})), \\ B_{4n}= & {} \sum _{\mu =1}^{m}\frac{1}{n_{\mu }}\sum _{i=1}^{n_{\mu }}\left( H_{\mu }({\textbf{Y}}_{\mu i})-{\mathcal {C}}_{\mu }({\bar{F}}_{\mu }^{*}( {\textbf{Y}}_{\mu i})\mid \theta _{0})\right) ^{2}\left( w_{\mu }({\check{F}} _{n\mu }^{*}({\textbf{Y}}_{\mu i}))-w_{\mu }({\bar{F}}_{\mu }^{*}( {\textbf{Y}}_{\mu i}))\right) . \end{aligned}$$

Analogously to the proof of Lemma 9.6, we can derive

$$\begin{aligned} B_{2n}= & {} o_{{\mathbb {P}}}(n^{-1/2}),\qquad B_{3n}=o_{{\mathbb {P}}}(n^{-1/2}), \\ {\bar{B}}_{3n}= & {} B_{3n}^{*}+o_{{\mathbb {P}}}(n^{-1/2}),\qquad B_{4n}=B_{4n}^{*}+o_{{\mathbb {P}}}(n^{-1/2}), \end{aligned}$$

where

$$\begin{aligned} B_{3n}^{*}= & {} \sum _{\mu =1}^{m}\frac{2}{n_{\mu }}\sum _{i=1}^{n_{\mu }}\left( {\hat{H}}_{n\mu }({\textbf{Y}}_{\mu i})-H_{\mu }({\textbf{Y}}_{\mu i}) \right. \\{} & {} \left. -\sum _{l\in J_{\mu }}{\tilde{\mathcal {C}}}_{\mu l}^{ {{}^\circ } }({\bar{F}}_{\mu }^{*}({\textbf{Y}}_{\mu i})\mid \theta _{0})\left( {\hat{F}} _{nl}({\bar{\psi }}_{l\mu }({\textbf{Y}}_{\mu i}))-F_{l}({\bar{\psi }}_{l\mu }( {\textbf{Y}}_{\mu i}))\right) \right) \\{} & {} \left( H_{\mu }({\textbf{Y}}_{\mu i})-{\mathcal {C}}_{\mu }({\bar{F}}_{\mu }^{*}({\textbf{Y}}_{\mu i})\mid \theta _{0})\right) w_{\mu }({\bar{F}}_{\mu }^{*}({\textbf{Y}}_{\mu i})), \\ B_{4n}^{*}= & {} \sum _{\mu =1}^{m}\frac{1}{n_{\mu }}\sum _{i=1}^{n_{\mu }}\left( H_{\mu }({\textbf{Y}}_{\mu i})-{\mathcal {C}}_{\mu }({\bar{F}}_{\mu }^{*}({\textbf{Y}}_{\mu i})\mid \theta _{0})\right) ^{2} \\{} & {} \sum _{l\in J_{\mu }}w_{\mu l}({\bar{F}}_{\mu }^{*}({\textbf{Y}} _{\mu i}))({\hat{F}}_{l}({\bar{\psi }}_{l\mu }({\textbf{Y}}_{\mu i}))-F_{l}(\bar{ \psi }_{l\mu }({\textbf{Y}}_{\mu i}))). \end{aligned}$$

Further,

$$\begin{aligned} \sqrt{n}\left( \widehat{{\mathcal {D}}}_{n}(\theta _{0})-{\mathcal {D}}(C,\mathcal { C}(\cdot \mid \theta _{0}))\right) =n^{-3/2}\sum _{\mu =1}^{m}\sum _{i=1}^{n_{\mu }}\sum _{\nu =1}^{m}\sum _{j=1}^{n_{\nu }}\Lambda _{\mu \nu }({\textbf{Y}}_{\mu i},{\textbf{Y}}_{\nu j})+o_{{\mathbb {P}}}(1), \end{aligned}$$

where

$$\begin{aligned} \Lambda _{\mu \nu }({\textbf{y}},{\textbf{z}})= & {} \left( \frac{1}{\gamma _{\mu }} \Lambda _{\mu }^{(1)}({\textbf{y}})+\frac{2}{\gamma _{\mu }\ {\bar{\gamma }} _{\mu }}\Lambda _{\mu \nu }^{(2)}({\textbf{y}},{\textbf{z}})+\sum _{l=1}^{d}\frac{ b_{l}^{(\mu )}b_{l}^{(\nu )}}{\gamma _{\mu }\ {\tilde{\gamma }}_{l}}\Lambda _{\mu \nu l}^{(3)}({\textbf{y}},{\textbf{z}})\right) ,\\ \Lambda _{\mu }^{(1)}({\textbf{y}})= & {} \left( H_{\mu }({\textbf{y}})-{\mathcal {C}} _{\mu }({\bar{F}}_{\mu }^{*}({\textbf{y}})\mid \theta _{0})\right) ^{2}w_{\mu }({\bar{F}}_{\mu }^{*}({\textbf{y}}))-{\bar{e}}_{\mu },\\ \Lambda _{\mu \nu }^{(2)}({\textbf{y}},{\textbf{z}})= & {} \left\{ \begin{array}{ll} \left( {\textbf{1}}\left( \psi _{\mu \nu }({\textbf{z}})\le {\textbf{y}}\right) -H_{\mu }({\textbf{y}})\right) \left( H_{\mu }({\textbf{y}})-{\mathcal {C}} _{\mu }({\bar{F}}_{\mu }^{*}({\textbf{y}})\mid \theta _{0})\right) w_{\mu }({\bar{F}}_{\mu }^{*}({\textbf{y}})) &{}\quad \text { for }\nu :{\textbf{b}}^{(\nu )}\ge {\textbf{b}}^{(\mu )}, \\ 0&{}\quad \text {otherwise,} \end{array} \right. \\ \Lambda _{\mu \nu l}^{(3)}({\textbf{y}},{\textbf{z}})= & {} \left( -2{\tilde{\mathcal {C}}}_{\mu l}^{ {{}^\circ } }({\bar{F}}_{\mu }^{*}({\textbf{y}})\mid \theta _{0})\left( H_{\mu }({\textbf{y}})-{\mathcal {C}}_{\mu }({\bar{F}}_{\mu }^{*}({\textbf{y}})\mid \theta _{0})\right) w_{\mu }({\bar{F}}_{\mu }^{*}({\textbf{y}}))\right. \\{} & {} \left. +\left( H_{\mu }({\textbf{y}})-{\mathcal {C}}_{\mu }({\bar{F}} _{\mu }^{*}({\textbf{y}})\mid \theta _{0})\right) ^{2}w_{\mu l}( {\bar{F}}_{\mu }^{*}({\textbf{y}}))\right) \\{} & {} \left( {\textbf{1}}\left\{ {\bar{\psi }}_{l\nu }({\textbf{z}})\le {\bar{\psi }} _{l\mu }({\textbf{y}})\right\} -F_{l}({\bar{\psi }}_{l\mu }({\textbf{y}}))\right) . \end{aligned}$$

The convergence of the "\(o_{{\mathbb {P}}}(1)\)"-term is again proven analogously to the proof of Lemma 9.6. An application of the central limit theorem in Proposition 9.5 gives the asymptotic normality of \(\sqrt{n}\left( \widehat{{\mathcal {D}}}_{n}(\theta _{0})-{\mathcal {D}}(C,{\mathcal {C}}(\cdot \mid \theta _{0}))\right) \). Finally, we derive the formula for \(\Sigma _{0}\) as follows

$$\begin{aligned} {\tilde{h}}_{\mu \nu }({\textbf{y}})&:={\mathbb {E}}\Lambda _{\mu \nu }({\textbf{Y}} _{\mu 1},{\textbf{y}})+{\mathbb {E}}\Lambda _{\nu \mu }({\textbf{y}},{\textbf{Y}} _{\mu 1})-{\mathbb {E}}\Lambda _{\mu \nu }({\textbf{Y}}_{\mu 1},{\textbf{Y}}_{\nu 2})-{\mathbb {E}}\Lambda _{\nu \mu }({\textbf{Y}}_{\nu 1},{\textbf{Y}}_{\mu 2}) \\&=\frac{1}{\gamma _{\mu }}\left( \left( H_{\mu }({\textbf{y}})-{\mathcal {C}} _{\mu }({\bar{F}}_{\mu }^{*}({\textbf{y}})\mid \theta _{0})\right) ^{2}w_{\mu }({\bar{F}}_{\mu }^{*}({\textbf{y}}))-{\bar{e}}_{\mu }\right) \\&\quad \left. +\frac{2}{\gamma _{\mu }{\bar{\gamma }}_{\mu }}{\textbf{1}}\left( {\textbf{b}}^{(\nu )}\ge {\textbf{b}}^{(\mu )}\right) \right. \\&\quad \left. \int _{{\mathbb {R}}^{d_{\mu }}}\left( {\textbf{1}}\left( \psi _{\mu \nu }({\textbf{y}})\le {\textbf{z}}\right) -H_{\mu }({\textbf{z}})\right) \left( H_{\mu }({\textbf{z}})-{\mathcal {C}}_{\mu }({\bar{F}}_{\mu }^{*}( {\textbf{z}})\mid \theta _{0})\right) \right. \\&\qquad w_{\mu }({\bar{F}}_{\mu }^{*}(\textbf{z }))~\text {d}H_{\mu }({\textbf{z}}) \\&\quad +\sum _{l=1}^{d}\frac{1}{\gamma _{\mu }{\tilde{\gamma }}_{l}}b_{l}^{(\mu )}b_{l}^{(\nu )}\int _{{\mathbb {R}}^{d_{\mu }}}\left( -2{\tilde{\mathcal {C}}} _{\mu l}^{ {{}^\circ } }({\bar{F}}_{\mu }^{*}({\textbf{z}})\mid \theta _{0})\left( H_{\mu }(\textbf{ z})-{\mathcal {C}}_{\mu }({\bar{F}}_{\mu }^{*}({\textbf{z}})\mid \theta _{0})\right) \right. \\&\qquad w_{\mu }({\bar{F}}_{\mu }^{*}({\textbf{z}})) \\&\quad \left. +\left( H_{\mu }({\textbf{z}})-{\mathcal {C}}_{\mu }(\bar{F }_{\mu }^{*}({\textbf{z}})\mid \theta _{0})\right) ^{2}w_{\mu l}({\bar{F}} _{\mu }^{*}({\textbf{z}}))\right) \\&\quad \left( {\textbf{1}}\left\{ {\bar{\psi }}_{l\nu }({\textbf{y}})\le {\bar{\psi }}_{l\mu }({\textbf{z}})\right\} -F_{l}({\bar{\psi }}_{l\mu }({\textbf{z}} ))\right) ~\text {d}H_{\mu }({\textbf{z}}). \end{aligned}$$

Moreover, we have

$$\begin{aligned} \Sigma _{0}=\sum _{\nu =1}^{m}\gamma _{\nu }\sum _{\mu =1}^{m}\sum _{{\bar{\mu }} =1}^{m}\gamma _{\mu }\gamma _{{\bar{\mu }}}{\mathbb {E}}{\tilde{h}}_{\mu \nu }( {\textbf{Y}}_{\nu 1}){\tilde{h}}_{{\bar{\mu }}\nu }^{T}({\textbf{Y}}_{\nu 1}). \end{aligned}$$

\(\square \)

Proof of Theorem 8.1

Analogously to the Taylor expansion in the proof of Theorem 5.2, we have

$$\begin{aligned} \widehat{{\mathcal {D}}}_{n}({\tilde{\theta }}_{n})-\widehat{{\mathcal {D}}} _{n}(\theta _{0})=\left( {\tilde{\theta }}_{n}-\theta _{0}\right) ^{T}\mathcal {H }_{n}^{\#}\left( {\tilde{\theta }}_{n}-\theta _{0}\right) , \end{aligned}$$

where \(t_{nk}^{\#}={\tilde{\theta }}_{n}+\eta _{nk}\left( \theta _{0}-\tilde{ \theta }_{n}\right) ,0\le \eta _{nk}\le 1\) and \({\mathcal {H}}_{n}^{\#}=( {\mathcal {H}}_{nkl}(t_{nk}^{\#}))_{k,l=1,\ldots ,q}\). An application of Theorem 5.2 leads to

$$\begin{aligned} \widehat{{\mathcal {D}}}_{n}({\tilde{\theta }}_{n})-\widehat{{\mathcal {D}}} _{n}(\theta _{0})=O_{{\mathbb {P}}}(n^{-1}). \end{aligned}$$

Hence \(\sqrt{n}\left( \widehat{{\mathcal {D}}}_{n}({\tilde{\theta }}_{n})-\widehat{{\mathcal {D}}}_{n}(\theta _{0})\right) =o_{{\mathbb {P}}}(1)\), and an application of Lemma 9.8 completes the proof. \(\square \)