1 Introduction

In practical problems, because of noise and the sampling mechanism, one can often observe only biased measurements of the quantity of interest. This paper considers the following density estimation model. Let \(Y_{1}, Y_{2}, \ldots , Y_{n}\) be identically distributed continuous random variables with the density function

$$\begin{aligned} g(y)=\frac{\omega (y)f(y)}{\mu },\quad y\in [0,1]. \end{aligned}$$
(1)

In this equation, ω is a known biasing function, f denotes the unknown density function of the unobserved random variable X, and \(\mu :=\mathbb{E}[\omega (X)]<\infty \). The aim is to estimate the unknown density function f from the observed negatively associated data \(Y_{1}, Y_{2},\ldots , Y_{n}\).
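For illustration, biased observations from model (1) can be simulated by rejection sampling: draw X from f and accept with probability \(\omega (X)/\sup \omega \). The following is a minimal sketch assuming, purely for illustration, the nonincreasing biasing function \(\omega (y)=2-y\) (so that \(\omega (y)\sim 1\) on [0,1]) and a Beta(2,2) density f; the sample it produces is i.i.d. and hence negatively associated.

```python
import numpy as np

rng = np.random.default_rng(0)

def omega(y):
    # Illustrative biasing function: nonincreasing and bounded between 1 and 2 on [0, 1].
    return 2.0 - y

def sample_biased(n, omega_max=2.0):
    """Draw n observations Y with density g(y) = omega(y) f(y) / mu by rejection:
    draw X ~ f (here Beta(2, 2)) and accept with probability omega(X) / omega_max."""
    sample = []
    while len(sample) < n:
        x = rng.beta(2.0, 2.0, size=n)                 # candidate draws from f
        accept = rng.uniform(size=n) < omega(x) / omega_max
        sample.extend(x[accept])
    return np.asarray(sample[:n])

Y = sample_biased(5000)
# Sanity check: E[1/omega(Y)] should be close to 1/mu with mu = E[omega(X)].
mu = np.mean(omega(rng.beta(2.0, 2.0, size=200_000)))
print(np.mean(1.0 / omega(Y)), 1.0 / mu)
```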

This model has many applications in industry [4] and economics [8]. Since wavelet bases have good localization properties in both the time and frequency domains, the wavelet method has been widely used for density estimation. When the observed data \(Y_{1}, Y_{2},\ldots , Y_{n}\) are independent, Ramírez and Vidakovic [13] constructed a linear wavelet estimator and studied its \(L^{2}\) consistency. Shirazi and Doosti [15] extended their work to the multivariate case. Because the definition of the linear wavelet estimator depends on the smoothness parameter of the density function f, the linear estimator is not adaptive. To overcome this shortcoming, Chesneau [3] proposed a nonlinear wavelet estimator by the hard thresholding method and established an optimal convergence rate under the \(L^{p}\) \((1\leq p<\infty )\) risk. When the independence of the data is relaxed to strong mixing, Kou and Guo [10] studied the \(L^{2}\) risk of linear and nonlinear wavelet estimators in Besov spaces. Note that all those studies focus on the global error; theoretical results on pointwise wavelet estimation for the density model (1) are still lacking.

In this paper, we establish wavelet estimates under the pointwise \(l^{p}\) \((1\leq p<\infty )\) risk for a density function based on a negatively associated sample. Upper bounds for linear and nonlinear wavelet estimators are obtained over the Besov space \(B_{r,q}^{s}( \mathbb{R})\). It turns out that the convergence rate of our estimators coincides with the optimal convergence rate for pointwise estimation [2]. Furthermore, our theorem reduces to the corresponding results of Rebelles [14] when \(\omega (y)\equiv 1\) and the sample is independent.

1.1 Negative association and wavelets

We first introduce the definition of negative association [1].

Definition 1.1

A sequence of random variables \(Y_{1}, Y_{2}, \ldots , Y_{n}\) is said to be negatively associated if for each pair of disjoint nonempty subsets A and B of \(\{1, 2, \ldots , n\}\),

$$ \operatorname{Cov} \bigl(f(Y_{i},i\in A), g(Y_{j},j\in B) \bigr)\leq 0, $$

where f and g are real-valued coordinatewise nondecreasing functions and the corresponding covariances exist.
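As a quick numerical illustration of this definition, the following minimal sketch relies on the standard fact that the components of a multinomial random vector are negatively associated: raising one count forces the others down, so pairwise covariances are negative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Components of a multinomial vector are a classical example of negatively
# associated (but dependent) random variables.  The empirical covariance of
# two counts is negative (close to -20/9 for 20 trials with equal cell
# probabilities 1/3).
counts = rng.multinomial(20, [1.0 / 3, 1.0 / 3, 1.0 / 3], size=200_000)
print(np.cov(counts[:, 0], counts[:, 1])[0, 1])
```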

When the random variables are independent, \(f(Y_{i},i\in A)\) and \(g(Y_{j},j\in B)\) are independent for disjoint A and B, so the covariance in Definition 1.1 vanishes. Hence independent and identically distributed data are negatively associated. Next, we give an important property of negative association, which will be needed in the later discussion.

Lemma 1.1

([9])

Let \(Y_{1}, Y_{2}, \ldots , Y _{n}\) be a sequence of negatively associated random variables, and let \(A_{1}, A_{2}, \ldots , A_{m}\) be pairwise disjoint nonempty subsets of \(\{1, 2, \ldots , n\}\). If \(f_{i}\ (i=1, 2, \ldots , m)\) are m coordinatewise nondecreasing (nonincreasing) functions, then \(f_{1} (Y_{i}, i\in A_{1} ), f_{2} (Y_{i}, i\in A_{2} ), \ldots , f_{m} (Y_{i}, i\in A_{m} )\) are also negatively associated.

To construct our wavelet estimators, we recall some basic facts from wavelet theory.

Throughout this paper, we work with the wavelet basis described as follows. Let \(\{V_{j}, j\in \mathbb{Z}\}\) be a classical orthonormal multiresolution analysis of \(L^{2}(\mathbb{R})\) with a scaling function φ. Then for each \(f\in L^{2}(\mathbb{R})\),

$$\begin{aligned} f=\sum_{k\in \mathbb{Z}}\alpha _{j_{0},k}\varphi _{j_{0},k}+ \sum_{j=j_{0}}^{\infty }\sum _{k\in \mathbb{Z}}\beta _{j,k} \psi _{j,k}, \end{aligned}$$

where \(\alpha _{j_{0},k}=\langle f, \varphi _{j_{0},k} \rangle \), \(\beta _{j,k}=\langle f, \psi _{j,k} \rangle \) and

$$ \varphi _{j_{0},k}(x)=2^{j_{0}/2}\varphi \bigl(2^{j_{0}}x-k \bigr),\qquad \psi _{j,k}(x)=2^{j/2} \psi \bigl(2^{j}x-k \bigr). $$

Let \(P_{j}\) be the orthogonal projection operator from \(L^{2}( \mathbb{R})\) onto the space \(V_{j}\) with orthonormal basis \(\{ \varphi _{j,k}, k\in \mathbb{Z}\}\). If the scaling function φ satisfies Condition θ, that is,

$$ \sum_{k\in \mathbb{Z}} \bigl\vert \varphi (x-k) \bigr\vert \in L^{\infty }( \mathbb{R}), $$

then it can be shown that for each \(f\in L^{p}(\mathbb{R})\) \((1\leq p<\infty )\),

$$\begin{aligned} P_{j}f=\sum_{k\in \mathbb{Z}} \alpha _{j,k}\varphi _{j,k}. \end{aligned}$$
(2)

On the other hand, a scaling function φ is called m-regular if \(\varphi \in C^{m}(\mathbb{R})\) and \(|D^{\delta }\varphi (y)| \leq c_{l}(1+y^{2})^{-l}\) for each \(l\in \mathbb{Z}\) and \(\delta =0, 1, 2, \ldots , m\). In this paper, we choose the Daubechies scaling function \(D_{2N}\) [5], which is m-regular when N is large enough.
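For numerical work, the dilated translates \(\varphi _{j,k}\) and \(\psi _{j,k}\) can be evaluated from a dyadic approximation of the Daubechies scaling and wavelet functions. The following is a minimal sketch using the PyWavelets package; the wavelet name 'db6' and the refinement level are illustrative choices.

```python
import numpy as np
import pywt

# Dyadic approximation of the Daubechies scaling function phi and wavelet psi.
_phi, _psi, _x = pywt.Wavelet('db6').wavefun(level=10)

def phi_jk(j, k, t):
    """phi_{j,k}(t) = 2^{j/2} phi(2^j t - k), via linear interpolation on the dyadic grid."""
    return 2.0 ** (j / 2) * np.interp(2.0 ** j * np.asarray(t) - k, _x, _phi, left=0.0, right=0.0)

def psi_jk(j, k, t):
    """psi_{j,k}(t) = 2^{j/2} psi(2^j t - k)."""
    return 2.0 ** (j / 2) * np.interp(2.0 ** j * np.asarray(t) - k, _x, _psi, left=0.0, right=0.0)

# Sanity check: phi_{j,k} should have (approximately) unit L^2 norm.
t = np.linspace(-10.0, 10.0, 200_001)
print(np.sum(phi_jk(3, 2, t) ** 2) * (t[1] - t[0]))
```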

Note that a wavelet basis can characterize Besov spaces, which contain many well-known function spaces such as the Hölder and \(L^{2}\) Sobolev spaces. The following lemma gives an equivalent characterization of Besov spaces.

Lemma 1.2

([7])

Let \(f\in L^{r}(\mathbb{R})\) \((1\leq r\leq +\infty )\), let the scaling function φ be m-regular, and let \(0< s< m\). Then the following statements are equivalent:

  (i) \(f\in B^{s}_{r,q}(\mathbb{R}), 1\leq q\leq +\infty \);

  (ii) \(\{2^{js}\|P_{j}f-f\|_{r}\}\in l_{q}\);

  (iii) \(\{2^{j(s-\frac{1}{r}+\frac{1}{2})}\|\beta _{j}\|_{r}\}\in l_{q}\).

The Besov norm of f can be defined as

$$ \Vert f \Vert _{B^{s}_{r,q}}:= \bigl\Vert (\alpha _{j_{0}}) \bigr\Vert _{r}+ \bigl\Vert \bigl(2^{j(s- \frac{1}{r}+\frac{1}{2})} \Vert \beta _{j} \Vert _{r}\bigr)_{j\geq j_{0}} \bigr\Vert _{q} $$

with \(\|(\alpha _{j_{0}})\|_{r}^{r}:=\sum_{k\in \mathbb{Z}}| \alpha _{j_{0},k}|^{r}\) and \(\|\beta _{j}\|_{r}^{r}:=\sum_{k\in \mathbb{Z}}|\beta _{j,k}|^{r}\).

In this paper, we assume that the density function f belongs to the Besov ball with radius \(H>0\), that is,

$$ f\in B^{s}_{r,q}(H):=\bigl\{ f\in B^{s}_{r,q}( \mathbb{R}), \Vert f \Vert _{B^{s}_{r,q}} \leq H\bigr\} . $$

1.2 Wavelet estimators and theorem

Define our linear wavelet estimator as follows:

$$\begin{aligned} \widehat{f}_{n}^{\mathrm{lin}}(y):=\sum _{k\in \varLambda } \widehat{\alpha }_{j_{0}, k}\varphi _{j_{0}, k}(y) \end{aligned}$$
(3)

with

$$\begin{aligned} \widehat{\alpha }_{j_{0},k}=\frac{\widehat{\mu }_{n}}{n}\sum _{i=1} ^{n}\frac{\varphi _{j_{0}, k}(Y_{i})}{\omega (Y_{i})} \end{aligned}$$
(4)

and

$$\begin{aligned} \widehat{\mu }_{n}= \Biggl[\frac{1}{n} \sum_{i=1}^{n}\frac{1}{ \omega (Y _{i})} \Biggr]^{-1}. \end{aligned}$$
(5)

Using the hard thresholding method, a nonlinear wavelet estimator is defined by

$$\begin{aligned} \widehat{f}_{n}^{\mathrm{non}}(y):=\sum _{k\in \varLambda } \widehat{\alpha }_{j_{0}, k}\varphi _{j_{0}, k}(y)+\sum_{j=j _{0}}^{j_{1}}\sum _{k\in \varLambda _{j}}\widehat{\beta }_{j, k}I _{ \{|\widehat{\beta }_{j, k}|\geq \kappa t_{n} \}}\psi _{j,k}(y), \end{aligned}$$
(6)

where \(t_{n}=\sqrt{\frac{\ln n}{n}}\) and

$$\begin{aligned} \widehat{\beta }_{j,k}=\frac{\widehat{\mu }_{n}}{n}\sum _{i=1}^{n}\frac{ \psi _{j,k}(Y_{i})}{\omega (Y_{i})}. \end{aligned}$$
(7)

In these definitions, \(\varLambda :=\{k\in \mathbb{Z},\operatorname{supp} f \cap \operatorname{supp} \varphi _{j_{0}, k}\neq \varnothing \}\) and \(\varLambda _{j}:=\{k\in \mathbb{Z},\operatorname{supp} f\cap \operatorname{supp} \psi _{j,k}\neq \varnothing \}\). Note that the cardinality of Λ \((\varLambda _{j})\) satisfies \(|\varLambda |\sim 2^{j_{0}}\) \((|\varLambda _{j}|\sim 2^{j} )\) because f and \(\varphi _{j_{0}, k}\) \((\psi _{j,k})\) are compactly supported. Here and further, \(A\sim B\) stands for both \(A\lesssim B\) and \(B\lesssim A\), where \(A\lesssim B\) denotes \(A\leq cB\) with a positive constant c that is independent of A and B. In addition, the constant κ will be specified later.
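For illustration, the following minimal sketch implements the empirical quantities (3)–(7). It assumes evaluators phi_jk and psi_jk for \(\varphi _{j_{0},k}\) and \(\psi _{j,k}\) (for instance, as in the PyWavelets sketch above), a biased sample Y supported in [0,1], and finite index collections ks and ks_j standing in for Λ and \(\varLambda _{j}\); the default value of kappa is an illustrative choice, not the constant from the theorem.

```python
import numpy as np

def mu_hat(Y, omega):
    # Empirical normalizer (5): hat{mu}_n = [ (1/n) * sum_i 1/omega(Y_i) ]^{-1}.
    return 1.0 / np.mean(1.0 / omega(Y))

def alpha_hat(j0, k, Y, omega, phi_jk):
    # Empirical scaling coefficient (4).
    return mu_hat(Y, omega) * np.mean(phi_jk(j0, k, Y) / omega(Y))

def beta_hat(j, k, Y, omega, psi_jk):
    # Empirical wavelet coefficient (7).
    return mu_hat(Y, omega) * np.mean(psi_jk(j, k, Y) / omega(Y))

def f_lin(y, Y, omega, phi_jk, j0, ks):
    # Linear wavelet estimator (3).
    return sum(alpha_hat(j0, k, Y, omega, phi_jk) * phi_jk(j0, k, y) for k in ks)

def f_non(y, Y, omega, phi_jk, psi_jk, j0, j1, ks, ks_j, kappa=1.5):
    # Hard-thresholded estimator (6): keep only the coefficients with
    # |hat{beta}_{j,k}| >= kappa * t_n, where t_n = sqrt(ln n / n).
    t_n = np.sqrt(np.log(len(Y)) / len(Y))
    est = f_lin(y, Y, omega, phi_jk, j0, ks)
    for j in range(j0, j1 + 1):
        for k in ks_j(j):
            b = beta_hat(j, k, Y, omega, psi_jk)
            if abs(b) >= kappa * t_n:
                est = est + b * psi_jk(j, k, y)
    return est
```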

We are now in a position to state our main theorem.

Theorem 1

Let \(f\in B^{s}_{r,q}(H)\ (r,q\in [1,\infty ), s> \frac{1}{r})\), and let \(\omega (y)\) be a nonincreasing function such that \(\omega (y)\sim 1\). Then for each \(1\leq p<\infty \), the linear wavelet estimator with \(2^{j_{0}}\sim n^{\frac{1}{2(s-1/r)+1}}\) satisfies

$$\begin{aligned} \mathbb{E} \bigl[ \bigl\vert \widehat{f}_{n}^{\mathrm{lin}}(y)-f(y) \bigr\vert ^{p} \bigr]\lesssim n^{-\frac{(s-1/r)p}{2(s-1/r)+1}}, \end{aligned}$$
(8)

and the nonlinear wavelet estimator with \(2^{j_{0}}\sim n^{ \frac{1}{2m+1}}(m>s)\) and \(2^{j_{1}}\sim \frac{n}{\ln n}\) satisfies

$$\begin{aligned} \mathbb{E} \bigl[ \bigl\vert \widehat{f}_{n}^{\mathrm{non}}(y)-f(y) \bigr\vert ^{p} \bigr]\lesssim (\ln n )^{\frac{3p}{2}}n^{- \frac{(s-1/r)p}{2(s-1/r)+1}}. \end{aligned}$$
(9)

Remark 1

Note that \(n^{-\frac{(s-1/r)p}{2(s-1/r)+1}}\) is the optimal convergence rate in the minimax sense for pointwise estimation in a Besov space [2]. Moreover, our theorem reduces to the results of Rebelles [14] when \(\omega (y)\equiv 1\) and the random sample is independent.

Remark 2

In contrast to the linear wavelet estimator, the convergence rate of the nonlinear estimator remains the same as that of the linear one up to a \(\ln n\) factor. However, the nonlinear estimator is adaptive in the sense that neither \(j_{0}\) nor \(j_{1}\) depends on s.

2 Auxiliary lemmas

In this section, we give some auxiliary lemmas that will be needed in the proof of Theorem 1.

Lemma 2.1

For the model defined by (1), we have

$$ \mathbb{E} \biggl[\frac{1}{\omega (Y_{i})} \biggr]=\frac{1}{\mu },\qquad \mathbb{E} \biggl[\frac{\mu \varphi _{j,k}(Y_{i})}{\omega (Y_{i})} \biggr]= \alpha _{j,k}\quad \textit{and}\quad \mathbb{E} \biggl[\frac{\mu \psi _{j,k}(Y_{i})}{\omega (Y_{i})} \biggr]= \beta _{j,k}. $$

Proof

This lemma can be proved by the same arguments as in Kou and Guo [10]. □
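For instance, the first two identities follow directly from (1), writing the expectation with respect to the density g and using that f is a density supported in [0,1] (as is implicit in model (1)):

$$ \mathbb{E} \biggl[\frac{1}{\omega (Y_{i})} \biggr]= \int _{[0,1]}\frac{1}{\omega (y)}\cdot \frac{\omega (y)f(y)}{\mu }\,dy= \frac{1}{\mu } \int _{[0,1]}f(y)\,dy=\frac{1}{\mu },\qquad \mathbb{E} \biggl[\frac{\mu \varphi _{j,k}(Y_{i})}{\omega (Y_{i})} \biggr]= \int _{[0,1]}\varphi _{j,k}(y)f(y)\,dy=\alpha _{j,k}. $$

The third identity is obtained in the same way with \(\psi _{j,k}\) in place of \(\varphi _{j,k}\).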

Lemma 2.2

Let \(f\in B_{r,q}^{s}\ (1\leq r,q<+\infty , s>1/r)\), and let \(\omega (y)\) be a nonincreasing function such that \(\omega (y) \sim 1\). If \(2^{j}\leq n\) and \(1\leq p<+\infty \), then

$$ \mathbb{E} \bigl[ \vert \widehat{\alpha }_{j_{0},k}-\alpha _{j_{0},k} \vert ^{p} \bigr]\lesssim n^{-\frac{p}{2}},\qquad \mathbb{E} \bigl[ \vert \widehat{\beta } _{j,k}-\beta _{j,k} \vert ^{p} \bigr]\lesssim n^{-\frac{p}{2}}. $$

Proof

Because the proofs of both inequalities are similar, we only prove the second one. By the definition of \(\widehat{\beta }_{j,k}\) we have

$$\begin{aligned} \vert \widehat{\beta }_{j,k}-\beta _{j,k} \vert \leq \Biggl\vert \frac{ \widehat{\mu }_{n}}{\mu } \Biggl(\frac{\mu }{n}\sum _{i=1}^{n} \frac{ \psi _{j,k}(Y_{i})}{\omega (Y_{i})}-\beta _{j,k} \Biggr) \Biggr\vert + \biggl\vert \beta _{j,k} \widehat{\mu }_{n} \biggl(\frac{1}{\mu }-\frac{1}{ \widehat{\mu }_{n}} \biggr) \biggr\vert . \end{aligned}$$

Note that the definition of \(\widehat{\mu }_{n}\) and \(\omega (y)\sim 1\) imply \(|\widehat{\mu }_{n}|\lesssim 1\). Moreover, \(B_{r,q}^{s}(\mathbb{R})\subseteq B_{\infty , \infty }^{s-1/r}(\mathbb{R})\) when \(s>\frac{1}{r}\), so \(f\in B_{\infty , \infty }^{s-1/r}(\mathbb{R})\) and \(\|f\|_{\infty }\lesssim 1\). In addition, \(|\beta _{j,k}|=|\langle f, \psi _{j,k}\rangle |\lesssim 1\) by the Cauchy–Schwarz inequality and the orthonormality of the wavelet functions. Hence we obtain

$$\begin{aligned} \mathbb{E} \bigl[ \vert \widehat{\beta }_{j,k}- \beta _{j,k} \vert ^{p} \bigr] \lesssim \mathbb{E} \Biggl[ \Biggl\vert \frac{1}{n}\sum_{i=1}^{n} \frac{ \mu \psi _{j,k}(Y_{i})}{\omega (Y_{i})}-\beta _{j,k} \Biggr\vert ^{p} \Biggr]+ \mathbb{E} \biggl[ \biggl\vert \frac{1}{\mu }- \frac{1}{\widehat{\mu }_{n}} \biggr\vert ^{p} \biggr]. \end{aligned}$$
(10)

Then we need to estimate \(T_{1}:=\mathbb{E} [ \vert \frac{1}{n} \sum_{i=1}^{n} \frac{\mu \psi _{j,k}(Y_{i})}{\omega (Y_{i})}- \beta _{j,k} \vert ^{p} ]\) and \(T_{2}:=\mathbb{E} [ \vert \frac{1}{ \mu }-\frac{1}{\widehat{\mu }_{n}} \vert ^{p} ]\).

• An upper bound for \(T_{1}\). Taking \(\eta _{i}:=\frac{\mu \psi _{j,k}(Y_{i})}{\omega (Y_{i})}-\beta _{j,k}\), we get

$$ T_{1}=\mathbb{E} \Biggl[ \Biggl\vert \frac{1}{n}\sum _{i=1}^{n}\eta _{i} \Biggr\vert ^{p} \Biggr]= \biggl(\frac{1}{n} \biggr)^{p}\mathbb{E} \Biggl[ \Biggl\vert \sum _{i=1}^{n}\eta _{i} \Biggr\vert ^{p} \Biggr]. $$

Note that ψ is a function of bounded variation (see Liu and Xu [12]), so we can write \(\psi =\widetilde{\psi }-\overline{\psi }\), where ψ̃ and ψ̅ are bounded nonnegative nondecreasing functions. Define

$$ \widetilde{\eta }_{i}:=\frac{\mu \widetilde{\psi }_{j,k}(Y_{i})}{ \omega (Y_{i})}-\widetilde{\beta }_{j,k}, \qquad \overline{\eta }_{i}:=\frac{\mu \overline{\psi }_{j,k}(Y_{i})}{\omega (Y_{i})}- \overline{\beta }_{j,k} $$

with \(\widetilde{\beta }_{j,k}:=\langle f, \widetilde{\psi }_{j,k} \rangle \) and \(\overline{\beta }_{j,k}:=\langle f, \overline{\psi } _{j,k}\rangle \). Then \(\eta _{i}=\widetilde{\eta }_{i}-\overline{ \eta }_{i}\), \(\beta _{j,k}=\widetilde{\beta }_{j,k}-\overline{\beta } _{j,k}\), and

$$\begin{aligned} T_{1}= \biggl(\frac{1}{n} \biggr)^{p}\mathbb{E} \Biggl[ \Biggl\vert \sum _{i=1}^{n} (\widetilde{\eta }_{i}- \overline{\eta }_{i} ) \Biggr\vert ^{p} \Biggr]\lesssim \biggl(\frac{1}{n} \biggr)^{p} \Biggl\{ \mathbb{E} \Biggl[ \Biggl\vert \sum_{i=1}^{n}\widetilde{ \eta }_{i} \Biggr\vert ^{p} \Biggr]+ \mathbb{E} \Biggl[ \Biggl\vert \sum_{i=1}^{n}\overline{ \eta }_{i} \Biggr\vert ^{p} \Biggr] \Biggr\} . \end{aligned}$$
(11)

Arguments similar to those of Lemma 2.1 show that \(\mathbb{E}[\widetilde{\eta }_{i}]=0\). The function \(\frac{\widetilde{\psi }_{j,k}(y)}{\omega (y)}\) is nondecreasing because \(\widetilde{\psi }(y)\) is nondecreasing and \(\omega (y)\) is nonincreasing. Hence \(\{\widetilde{\eta }_{i}, i=1, 2, \ldots , n\}\) are negatively associated by Lemma 1.1. On the other hand, it follows from (1) and \(\omega (y)\sim 1\) that

$$\begin{aligned} \mathbb{E} \bigl[ \vert \widetilde{\eta }_{i} \vert ^{p} \bigr]\lesssim \mathbb{E} \biggl[ \biggl\vert \frac{\mu \widetilde{\psi }_{j,k}(Y_{i})}{ \omega (Y_{i})} \biggr\vert ^{p} \biggr]\lesssim \int _{[0,1]} \bigl\vert \widetilde{\psi }_{j,k}(y) \bigr\vert ^{p}f(y)\,dy\lesssim 2^{j(p/2-1)}. \end{aligned}$$
(12)

In particular, \(\mathbb{E} [|\widetilde{\eta }_{i}|^{2} ] \lesssim 1\). Recall Rosenthal’s inequality [12]: if \(Y_{1}, Y_{2}, \ldots , Y_{n}\) are negatively associated random variables such that \(\mathbb{E}[Y_{i}]=0\) and \(\mathbb{E}[|Y_{i}|^{p}]< \infty \), then

$$ \mathbb{E} \Biggl[ \Biggl\vert \sum_{i=1}^{n}Y_{i} \Biggr\vert ^{p} \Biggr] \lesssim \textstyle\begin{cases} \sum_{i=1}^{n}\mathbb{E} [ \vert Y_{i} \vert ^{p} ]+ (\sum_{i=1}^{n}\mathbb{E} [ \vert Y_{i} \vert ^{2} ] )^{{p}/{2}}, & \text{$p>2$;} \\ (\sum_{i=1}^{n}\mathbb{E} [ \vert Y_{i} \vert ^{2} ] ) ^{{p}/{2}}, & \text{$1\leq p\leq 2$.} \end{cases} $$

From this we clearly have

$$ \mathbb{E} \Biggl[ \Biggl\vert \sum_{i=1}^{n} \widetilde{\eta }_{i} \Biggr\vert ^{p} \Biggr]\lesssim \textstyle\begin{cases} n2^{j(p/2-1)}+n^{p/2}, & \text{$p>2$;} \\ n^{p/2}, & \text{$1\leq p\leq 2$.} \end{cases} $$

This, together with \(2^{j}\leq n\) (so that \(n2^{j(p/2-1)}\leq n^{p/2}\) when \(p>2\)), shows that \(\mathbb{E} [ \vert \sum_{i=1}^{n}\widetilde{\eta }_{i} \vert ^{p} ]\lesssim n^{p/2}\). Similarly, \(\mathbb{E} [ \vert \sum_{i=1}^{n}\overline{\eta }_{i} \vert ^{p} ]\lesssim n^{p/2}\). Combining these with (11), we get that

$$\begin{aligned} T_{1}\lesssim \biggl(\frac{1}{n} \biggr)^{p} \Biggl\{ \mathbb{E} \Biggl[ \Biggl\vert \sum _{i=1}^{n}\widetilde{\eta }_{i} \Biggr\vert ^{p} \Biggr]+\mathbb{E} \Biggl[ \Biggl\vert \sum _{i=1}^{n}\overline{\eta }_{i} \Biggr\vert ^{p} \Biggr] \Biggr\} \lesssim n^{-\frac{p}{2}}. \end{aligned}$$
(13)

• An upper bound for \(T_{2}\). It is easy to see from the definition of \(\widehat{\mu }_{n}\) that

$$\begin{aligned} T_{2}=\mathbb{E} \biggl[ \biggl\vert \frac{1}{\mu }-\frac{1}{\widehat{\mu } _{n}} \biggr\vert ^{p} \biggr]= \biggl(\frac{1}{n} \biggr)^{p}\mathbb{E} \Biggl[ \Biggl\vert \sum_{i=1}^{n} \biggl( \frac{1}{\omega (Y_{i})}-\frac{1}{ \mu } \biggr) \Biggr\vert ^{p} \Biggr]. \end{aligned}$$
(14)

Defining \(\xi _{i}:=\frac{1}{\omega (Y_{i})}-\frac{1}{\mu }\), we obtain that \(\mathbb{E}[\xi _{i}]=0\) and \(\mathbb{E}[|\xi _{i}|^{p}]\lesssim 1\) by Lemma 2.1 and \(\omega (y)\sim 1\). In addition, by the monotonicity of \(\omega (y)\) and Lemma 1.1 we know that \(\xi _{1}, \xi _{2}, \ldots , \xi _{n}\) are also negatively associated. Then using Rosenthal’s inequality, we get

$$ \mathbb{E} \Biggl[ \Biggl\vert \sum_{i=1}^{n} \xi _{i} \Biggr\vert ^{p} \Biggr] \lesssim \textstyle\begin{cases} n+n^{p/2}, & \text{$p>2$;} \\ n^{p/2}, & \text{$1\leq p\leq 2$.} \end{cases} $$

Hence

$$\begin{aligned} T_{2}= \biggl(\frac{1}{n} \biggr)^{p}\mathbb{E} \Biggl[ \Biggl\vert \sum _{i=1}^{n}\xi _{i} \Biggr\vert ^{p} \Biggr] \lesssim n^{-\frac{p}{2}}. \end{aligned}$$
(15)

Finally, by (10), (13), and (15) we have

$$ \mathbb{E} \bigl[ \vert \widehat{\beta }_{j,k}-\beta _{j,k} \vert ^{p} \bigr]\lesssim n^{-\frac{p}{2}}. $$

This ends the proof. □

Lemma 2.3

Let \(f\in B_{r,q}^{s}\) \((1\leq r,q<+\infty , s>1/r)\) and \(\widehat{\beta }_{j,k}\) be defined by (7). If \(\omega (y)\) is a nonincreasing function, \(\omega (y)\sim 1\), and \(2^{j}\leq \frac{n}{ \ln n}\), then for each \(\lambda >0\), there exists a constant \(\kappa >1\) such that

$$ \mathbb{P} \bigl\{ \vert \widehat{\beta }_{j,k}-\beta _{j,k} \vert \geq \kappa t_{n} \bigr\} \lesssim 2^{-\lambda j}. $$

Proof

By the same arguments as for (10) we can obtain that

$$\begin{aligned} \mathbb{P} \bigl\{ \vert \widehat{\beta }_{j,k}- \beta _{j,k} \vert \geq \kappa t_{n} \bigr\} \leq{}& \mathbb{P} \Biggl\{ \Biggl\vert \frac{1}{n} \sum _{i=1}^{n} \biggl(\frac{1}{\omega (Y_{i})}- \frac{1}{\mu } \biggr) \Biggr\vert \geq \frac{\kappa t_{n}}{2} \Biggr\} \\ &{}+\mathbb{P} \Biggl\{ \Biggl\vert \frac{1}{n}\sum _{i=1}^{n} \biggl(\frac{ \mu \psi _{j,k}(Y_{i})}{\omega (Y_{i})}-\beta _{j,k} \biggr) \Biggr\vert \geq \frac{\kappa t_{n}}{2} \Biggr\} . \end{aligned}$$
(16)

To estimate \(\mathbb{P} \{ \vert \frac{1}{n}\sum_{i=1} ^{n} (\frac{1}{\omega (Y_{i})}-\frac{1}{\mu } ) \vert \geq \frac{\kappa t_{n}}{2} \}\), we also define \(\xi _{i}:=\frac{1}{ \omega (Y_{i})}-\frac{1}{\mu }\). Then Lemma 2.1 implies that \(\mathbb{E}[\xi _{i}]=0\). Moreover, \(|\xi _{i}|\lesssim 1\) and \(\mathbb{E}[|\xi _{i}|^{2}]\lesssim 1\) thanks to \(\omega (y)\sim 1\). On the other hand, because of the monotonicity of \(\omega (y)\) and Lemma 1.1, \(\xi _{1}, \xi _{2}, \ldots , \xi _{n}\) are also negatively associated.

Recall Bernstein’s inequality [12]: If \(Y_{1}, Y_{2}, \ldots , Y_{n}\) are negatively associated random variables such that \(\mathbb{E}[Y_{i}]=0\), \(|Y_{i}|\leq M<\infty \), and \(\mathbb{E}[|Y _{i}|^{2}]=\sigma ^{2}\), then for each \(\varepsilon >0\),

$$ \mathbb{P} \Biggl\{ \Biggl\vert \frac{1}{n}\sum _{i=1}^{n}Y_{i} \Biggr\vert \geq \varepsilon \Biggr\} \lesssim \exp \biggl(-\frac{n\varepsilon ^{2}}{2(\sigma ^{2}+\varepsilon M/3)} \biggr). $$

Therefore, by the previous arguments for \(\xi _{i}\) and \(t_{n}=\sqrt{\frac{ \ln n}{n}}\), we derive

$$ \mathbb{P} \Biggl\{ \Biggl\vert \frac{1}{n}\sum _{i=1}^{n} \biggl(\frac{1}{ \omega (Y_{i})}- \frac{1}{\mu } \biggr) \Biggr\vert \geq \frac{\kappa t_{n}}{2} \Biggr\} \lesssim \exp \biggl(-\frac{(\ln n) \kappa ^{2}/4}{2(\sigma ^{2}+\kappa /6)} \biggr). $$

Since \(2^{j}\leq \frac{n}{\ln n}\leq n\), we have \(2^{-\lambda j}\geq n^{-\lambda }\). Hence, for fixed \(\lambda >0\), we can choose \(\kappa >1\) so large that \(\frac{\kappa ^{2}/4}{2(\sigma ^{2}+\kappa /6)}\geq \lambda \), and then \(\exp (-\frac{(\ln n)\kappa ^{2}/4}{2(\sigma ^{2}+\kappa /6)} )=n^{-\frac{\kappa ^{2}/4}{2(\sigma ^{2}+\kappa /6)}}\leq n^{-\lambda }\leq 2^{-\lambda j}\). Hence

$$\begin{aligned} \mathbb{P} \Biggl\{ \Biggl\vert \frac{1}{n}\sum _{i=1}^{n} \biggl(\frac{1}{ \omega (Y_{i})}- \frac{1}{\mu } \biggr) \Biggr\vert \geq \frac{\kappa t_{n}}{2} \Biggr\} \lesssim 2^{-\lambda j}. \end{aligned}$$
(17)

Next, we estimate \(\mathbb{P} \{ \vert \frac{1}{n}\sum_{i=1}^{n} (\frac{\mu \psi _{j,k}(Y_{i})}{\omega (Y_{i})}-\beta _{j,k} ) \vert \geq \frac{\kappa t_{n}}{2} \}\). By the same arguments as for (11) we get

$$\begin{aligned} \mathbb{P} \Biggl\{ \Biggl\vert \frac{1}{n}\sum _{i=1}^{n}\eta _{i} \Biggr\vert \geq \frac{\kappa t_{n}}{2} \Biggr\} \leq \mathbb{P} \Biggl\{ \Biggl\vert \frac{1}{n} \sum_{i=1}^{n} \widetilde{\eta }_{i} \Biggr\vert \geq \frac{\kappa t _{n}}{4} \Biggr\} +\mathbb{P} \Biggl\{ \Biggl\vert \frac{1}{n}\sum _{i=1} ^{n}\overline{\eta }_{i} \Biggr\vert \geq \frac{\kappa t_{n}}{4} \Biggr\} . \end{aligned}$$
(18)

It is easy to see from the definition of \(\widetilde{\eta }_{i}\) and Lemma 2.1 that \(\mathbb{E}[\widetilde{\eta }_{i}]=0\). Moreover, \(\mathbb{E}[|\widetilde{\eta }_{i}|^{2}]\lesssim 1\) by (12) with \(p=2\). Using \(\omega (y)\sim 1\), we get \(\vert \frac{\mu \widetilde{\psi }_{j,k}(Y_{i})}{\omega (Y_{i})} \vert \lesssim 2^{j/2}\) and \(|\widetilde{\eta }_{i}|\leq \vert \frac{\mu \widetilde{\psi } _{j,k}(Y_{i})}{\omega (Y_{i})} \vert +\mathbb{E} [ \vert \frac{ \mu \widetilde{\psi }_{j,k}(Y_{i})}{\omega (Y_{i})} \vert ] \lesssim 2^{j/2}\). Then it follows from Bernstein’s inequality, \(2^{j}\leq \frac{n}{\ln n}\), and \(t_{n}=\sqrt{\frac{\ln n}{n}}\) that

$$ \mathbb{P} \Biggl\{ \Biggl\vert \frac{1}{n}\sum _{i=1}^{n} \widetilde{\eta }_{i} \Biggr\vert \geq \frac{\kappa t_{n}}{4} \Biggr\} \lesssim \exp \biggl(- \frac{n(\kappa t_{n}/4)^{2}}{2(\sigma ^{2}+ \kappa t_{n}2^{j/2}/12)} \biggr)\lesssim \exp \biggl(-\frac{(\ln n) \kappa ^{2}/16}{2(\sigma ^{2}+\kappa /12)} \biggr). $$

Clearly, we can take \(\kappa >1\) such that \(\mathbb{P} \{ \vert \frac{1}{n} \sum_{i=1}^{n}\widetilde{\eta }_{i} \vert \geq \frac{\kappa t_{n}}{4} \} \lesssim 2^{-\lambda j}\). Similar arguments show that \(\mathbb{P} \{ \vert \frac{1}{n}\sum_{i=1}^{n}\overline{\eta }_{i} \vert \geq \frac{\kappa t_{n}}{4} \} \lesssim 2^{-\lambda j}\). Combining these with (18), we obtain

$$\begin{aligned} \mathbb{P} \Biggl\{ \Biggl\vert \frac{1}{n}\sum _{i=1}^{n} \biggl(\frac{ \mu \psi _{j,k}(Y_{i})}{\omega (Y_{i})}- \beta _{j,k} \biggr) \Biggr\vert \geq \frac{\kappa t_{n}}{2} \Biggr\} \lesssim 2^{-\lambda j}. \end{aligned}$$
(19)

By (16), (17), and (19) we get

$$ \mathbb{P} \bigl\{ \vert \widehat{\beta }_{j,k}-\beta _{j,k} \vert \geq \kappa t_{n} \bigr\} \lesssim 2^{-\lambda j}. $$

This ends the proof. □

3 Proof of theorem

In this section, we prove Theorem 1.

Proof of (8)

It is easy to see that

$$\begin{aligned} \mathbb{E} \bigl[ \bigl\vert \widehat{f}_{n}^{\mathrm{lin}}(y)-f(y) \bigr\vert ^{p} \bigr] \lesssim \mathbb{E} \bigl[ \bigl\vert \widehat{f}_{n}^{\mathrm{lin}}(y)-P _{j_{0}}f(y) \bigr\vert ^{p} \bigr]+ \bigl\vert P_{j_{0}}f(y)-f(y) \bigr\vert ^{p}. \end{aligned}$$
(20)

Then we need to estimate \(\mathbb{E} [ |\widehat{f}_{n}^{ \mathrm{lin}}(y)-P_{j_{0}}f(y) |^{p} ]\) and \(|P_{j_{0}}f(y)-f(y) |^{p}\).

By (2) and (3) we get that

$$ \mathbb{E} \bigl[ \bigl\vert \widehat{f}_{n}^{\mathrm{lin}}(y)-P_{j_{0}}f(y) \bigr\vert ^{p} \bigr]=\mathbb{E} \biggl[ \biggl\vert \sum _{k\in \varLambda } (\widehat{\alpha }_{j_{0}, k}-\alpha _{j_{0}, k} )\varphi _{j _{0}, k}(y) \biggr\vert ^{p} \biggr]. $$

Using the Hölder inequality \((1/p+1/p'=1)\), we see that

$$ \mathbb{E} \bigl[ \bigl\vert \widehat{f}_{n}^{\mathrm{lin}}(y)-P_{j_{0}}f(y) \bigr\vert ^{p} \bigr]\leq \mathbb{E} \biggl[ \biggl(\sum _{k\in \varLambda } \vert \widehat{\alpha }_{j_{0}, k}-\alpha _{j_{0}, k} \vert ^{p} \bigl\vert \varphi _{j_{0}, k}(y) \bigr\vert \biggr) \biggl(\sum _{k\in \varLambda } \bigl\vert \varphi _{j_{0}, k}(y) \bigr\vert \biggr)^{\frac{p}{p'}} \biggr]. $$

Then it follows from Condition θ and Lemma 2.2 that

$$\begin{aligned} \mathbb{E} \bigl[ \bigl\vert \widehat{f}_{n}^{\mathrm{lin}}(y)-P_{j_{0}}f(y) \bigr\vert ^{p} \bigr] \lesssim \sum_{k\in \varLambda } \mathbb{E} \bigl[ \vert \widehat{\alpha }_{j_{0}, k}-\alpha _{j_{0}, k} \vert ^{p} \bigr] \bigl\vert \varphi _{j_{0}, k}(y) \bigr\vert 2^{\frac{j_{0}p}{2p'}}\lesssim \biggl( \frac{2^{j _{0}}}{n} \biggr)^{\frac{p}{2}}. \end{aligned}$$
(21)

This, together with \(2^{j_{0}}\sim n^{\frac{1}{2(s-1/r)+1}}\), shows that

$$\begin{aligned} \mathbb{E} \bigl[ \bigl\vert \widehat{f}_{n}^{\mathrm{lin}}(y)-P_{j_{0}}f(y) \bigr\vert ^{p} \bigr] \lesssim n^{-\frac{(s-1/r)p}{2(s-1/r)+1}}. \end{aligned}$$
(22)

Note that \(B_{r, q}^{s}(\mathbb{R})\subseteq B_{\infty , \infty }^{s-1/r}( \mathbb{R})\) in the case \(s>1/r\). It should be pointed out that \(B_{\infty , \infty }^{s-1/r}(\mathbb{R})\) is also a Hölder space. Then by Lemma 1.2, \(f\in B_{r, q}^{s}(\mathbb{R})\), and \(2^{j_{0}} \sim n^{\frac{1}{2(s-1/r)+1}}\) we obtain that

$$\begin{aligned} \bigl\vert P_{j_{0}}f(y)-f(y) \bigr\vert ^{p}\lesssim 2^{-j_{0}(s-1/r)p} \lesssim n^{-\frac{(s-1/r)p}{2(s-1/r)+1}}. \end{aligned}$$
(23)

Combining this with (20) and (22), we get

$$ \mathbb{E} \bigl[ \bigl\vert \widehat{f}_{n}^{\mathrm{lin}}(y)-f(y) \bigr\vert ^{p} \bigr] \lesssim n^{-\frac{(s-1/r)p}{2(s-1/r)+1}}. $$

 □

Proof of (9)

Using the definitions of \(\widehat{f} _{n}^{\mathrm{lin}}(y)\) and \(\widehat{f}_{n}^{\mathrm{non}}(y)\), we get that

$$\begin{aligned} \mathbb{E} \bigl[ \bigl\vert \widehat{f}_{n}^{\mathrm{non}}(y)-f(y) \bigr\vert ^{p} \bigr] \lesssim W_{1}+W_{2}+G, \end{aligned}$$
(24)

where \(W_{1}:=\mathbb{E} [ |\widehat{f}_{n}^{\mathrm{lin}}(y)-P _{j_{0}}f(y) |^{p} ]\), \(W_{2}:= |P_{j_{1}+1}f(y)-f(y) |^{p}\), and

$$ G:=\mathbb{E} \Biggl[ \Biggl\vert \sum_{j=j_{0}}^{j_{1}} \sum_{k\in \varLambda _{j}} (\widehat{\beta }_{j,k}I_{\{ \vert \widehat{\beta }_{j,k} \vert \geq \kappa t_{n}\}}- \beta _{j,k} )\psi _{j,k}(y) \Biggr\vert ^{p} \Biggr]. $$

It follows from (21), \(2^{j_{0}}\sim n^{\frac{1}{2m+1}}\) \((m>s)\), and \(s>1/r\) that

$$\begin{aligned} W_{1}\lesssim \biggl(\frac{2^{j_{0}}}{n} \biggr)^{\frac{p}{2}}\sim n ^{-\frac{mp}{2m+1}}< n^{-\frac{(s-1/r)p}{2(s-1/r)+1}}. \end{aligned}$$
(25)

On the other hand, by the same arguments as for (23), we can obtain that \(W_{2}\lesssim 2^{-j_{1}(s-1/r)p}\). This with the choice of \(2^{j_{1}}\sim \frac{n}{\ln n}\) shows

$$\begin{aligned} W_{2}\lesssim 2^{-j_{1}(s-1/r)p}\sim \biggl( \frac{\ln n}{n} \biggr) ^{(s-1/r)p}< \biggl(\frac{\ln n}{n} \biggr)^{ \frac{(s-1/r)p}{2(s-1/r)+1}}. \end{aligned}$$
(26)

Then the remaining task is to estimate G.

Using the classical technique in [6], we get that

$$\begin{aligned} G\lesssim (\ln n)^{p-1}(G_{1}+G_{2}+G_{3}), \end{aligned}$$
(27)

where

$$\begin{aligned} &G_{1}:=\mathbb{E} \Biggl[\sum_{j=j_{0}}^{j_{1}} \biggl(\sum_{k\in \varLambda _{j}} \vert \widehat{\beta }_{j,k}-\beta _{j,k} \vert I _{\{ \vert \widehat{\beta }_{j,k}-\beta _{j,k} \vert \geq \frac{\kappa t_{n}}{2}\}} \bigl\vert \psi _{j,k}(y) \bigr\vert \biggr)^{p} \Biggr], \\ &G_{2}:=\mathbb{E} \Biggl[\sum_{j=j_{0}}^{j_{1}} \biggl(\sum_{k\in \varLambda _{j}} \vert \widehat{\beta }_{j,k}-\beta _{j,k} \vert I _{\{ \vert \beta _{j,k} \vert \geq \frac{\kappa t_{n}}{2}\}} \bigl\vert \psi _{j,k}(y) \bigr\vert \biggr)^{p} \Biggr], \\ &G_{3}:=\sum_{j=j_{0}}^{j_{1}} \biggl(\sum_{k\in \varLambda _{j}} \vert \beta _{j,k} \vert I_{\{ \vert \beta _{j,k} \vert \leq 2\kappa t_{n}\}} \bigl\vert \psi _{j,k}(y) \bigr\vert \biggr)^{p}. \end{aligned}$$

• An upper bound for \(G_{1}\). By the definition of \(\widehat{\beta }_{j,k}\), \(\omega (y)\sim 1\), and Lemma 2.1, \(|\widehat{\beta }_{j,k}|\lesssim 2^{j/2}\) and \(|\widehat{\beta }_{j,k}- \beta _{j,k}|\lesssim 2^{j/2}\). Furthermore, we obtain that

$$ G_{1}\lesssim \mathbb{E} \Biggl[\sum_{j=j_{0}}^{j_{1}} \biggl(\sum_{k\in \varLambda _{j}} 2^{j/2}I_{\{ \vert \widehat{\beta }_{j,k}-\beta _{j,k} \vert \geq \frac{\kappa t_{n}}{2}\}} \bigl\vert \psi _{j,k}(y) \bigr\vert \biggr) ^{p} \Biggr]. $$

On the other hand, it follows from the Hölder inequality and Condition θ that

$$\begin{aligned} &\biggl(\sum_{k\in \varLambda _{j}} I_{\{ \vert \widehat{\beta }_{j,k}- \beta _{j,k} \vert \geq \frac{\kappa t_{n}}{2}\}} \bigl\vert \psi _{j,k}(y) \bigr\vert \biggr) ^{p} \\ &\quad \lesssim \biggl( \sum_{k\in \varLambda _{j}} I_{\{ \vert \widehat{\beta }_{j,k}-\beta _{j,k} \vert \geq \frac{\kappa t_{n}}{2}\}} \bigl\vert \psi _{j,k}(y) \bigr\vert \biggr) \biggl(\sum _{k\in \varLambda _{j}} \bigl\vert \psi _{j,k}(y) \bigr\vert \biggr)^{\frac{p}{p'}} \\ &\quad \lesssim \biggl(\sum_{k\in \varLambda _{j}} I_{\{ \vert \widehat{\beta } _{j,k}-\beta _{j,k} \vert \geq \frac{\kappa t_{n}}{2}\}} \bigl\vert \psi _{j,k}(y) \bigr\vert \biggr)2^{\frac{jp}{2p'}}. \end{aligned}$$

Then using Condition θ and Lemma 2.3, we derive that

$$\begin{aligned} G_{1} &\lesssim \mathbb{E} \Biggl[\sum _{j=j_{0}}^{j_{1}}2^{jp/2} \biggl(\sum _{k\in \varLambda _{j}} I_{\{ \vert \widehat{\beta }_{j,k}- \beta _{j,k} \vert \geq \frac{\kappa t_{n}}{2}\}} \bigl\vert \psi _{j,k}(y) \bigr\vert \biggr)2^{ \frac{jp}{2p'}} \Biggr] \\ &\lesssim \sum_{j=j_{0}}^{j_{1}}2^{\frac{j}{2}(p+\frac{p}{p'})} \sum_{k\in \varLambda _{j}} \bigl\vert \psi _{j,k}(y) \bigr\vert \mathbb{E} [I _{\{ \vert \widehat{\beta }_{j,k}-\beta _{j,k} \vert \geq \frac{\kappa t_{n}}{2}\}} ] \lesssim \sum _{j=j_{0}}^{j_{1}}2^{j(p-\lambda )}. \end{aligned}$$
(28)

Clearly, we can choose \(\lambda >p+mp\) in Lemma 2.3 (with the corresponding constant \(\kappa >1\)). Then \(G_{1}\lesssim \sum_{j=j_{0}}^{j_{1}}2^{j(p-\lambda )}\lesssim \sum_{j=j_{0}}^{j_{1}}2^{-jmp}\). This with the choice of \(2^{j_{0}}\sim n^{\frac{1}{2m+1}}\) \((m>s)\) shows that

$$\begin{aligned} G_{1}\lesssim \sum_{j=j_{0}}^{j_{1}}2^{-jmp} \lesssim 2^{-j_{0}mp} \sim n^{-\frac{mp}{2m+1}}< n^{-\frac{(s-1/r)p}{2(s-1/r)+1}}. \end{aligned}$$
(29)

• An upper bound for \(G_{2}\). Taking \(2^{j_{*}}\sim n^{ \frac{1}{2(s-1/r)+1}}\), we get that \(2^{j_{0}}<2^{j_{*}}<2^{j_{1}}\). It is easy to see that

$$\begin{aligned} G_{21} &:=\mathbb{E} \Biggl[\sum_{j=j_{0}}^{j_{*}} \biggl(\sum_{k\in \varLambda _{j}} \vert \widehat{\beta }_{j,k}-\beta _{j,k} \vert I _{\{ \vert \beta _{j,k} \vert \geq \frac{\kappa t_{n}}{2}\}} \bigl\vert \psi _{j,k}(y) \bigr\vert \biggr)^{p} \Biggr] \\ &\lesssim \sum_{j=j_{0}}^{j_{*}}\mathbb{E} \biggl[ \biggl(\sum_{k\in \varLambda _{j}} \vert \widehat{\beta }_{j,k}-\beta _{j,k} \vert \bigl\vert \psi _{j,k}(y) \bigr\vert \biggr)^{p} \biggr]. \end{aligned}$$

Similarly to the arguments of (21), we get

$$\begin{aligned} G_{21}\lesssim \sum_{j=j_{0}}^{j_{*}} \biggl(\frac{2^{j}}{n} \biggr) ^{\frac{p}{2}} \lesssim \biggl( \frac{2^{j_{*}}}{n} \biggr)^{ \frac{p}{2}}\sim n^{-\frac{(s-1/r)p}{2(s-1/r)+1}} \end{aligned}$$
(30)

by Lemma 2.2 and \(2^{j_{*}}\sim n^{\frac{1}{2(s-1/r)+1}}\).

On the other hand,

$$\begin{aligned} G_{22} &:=\mathbb{E} \Biggl[\sum_{j=j_{*}+1}^{j_{1}} \biggl(\sum_{k\in \varLambda _{j}} \vert \widehat{\beta }_{j,k}-\beta _{j,k} \vert I _{\{ \vert \beta _{j,k} \vert \geq \frac{\kappa t_{n}}{2}\}} \bigl\vert \psi _{j,k}(y) \bigr\vert \biggr)^{p} \Biggr] \\ &\lesssim \mathbb{E} \Biggl[\sum_{j=j_{*}+1}^{j_{1}} \biggl(\sum_{k\in \varLambda _{j}} \vert \widehat{\beta }_{j,k}-\beta _{j,k} \vert \biggl\vert \frac{\beta _{j,k}}{\kappa t_{n}} \biggr\vert \bigl\vert \psi _{j,k}(y) \bigr\vert \biggr)^{p} \Biggr]. \end{aligned}$$

Using the Hölder inequality and Lemma 2.2, we have

$$\begin{aligned} G_{22} &\lesssim \mathbb{E} \Biggl[\sum _{j=j_{*}+1}^{j_{1}} \biggl(\frac{1}{t_{n}} \biggr)^{p} \biggl(\sum_{k\in \varLambda _{j}} \vert \widehat{\beta }_{j,k}-\beta _{j,k} \vert ^{p} \bigl\vert \beta _{j,k}\psi _{j,k}(y) \bigr\vert \biggr) \biggl(\sum_{k\in \varLambda _{j}} \bigl\vert \beta _{j,k}\psi _{j,k}(y) \bigr\vert \biggr) ^{\frac{p}{p'}} \Biggr] \\ &\lesssim \sum_{j=j_{*}+1}^{j_{1}} \biggl( \frac{1}{t_{n}} \biggr) ^{p}n^{-\frac{p}{2}} \biggl(\sum _{k\in \varLambda _{j}} \bigl\vert \beta _{j,k}\psi _{j,k}(y) \bigr\vert \biggr)^{p}. \end{aligned}$$

When \(s>1/r\), \(B_{r,q}^{s}(\mathbb{R})\subseteq B_{\infty , \infty } ^{s-1/r}(\mathbb{R})\). Clearly, \(B_{\infty , \infty }^{s-1/r}( \mathbb{R})\) is a Hölder space. Then we can derive that \(\sum_{k\in \varLambda _{j}} |\beta _{j,k}\psi _{j,k}(y) | \lesssim 2^{-j(s-1/r)}\) as in [11]. Hence it follows from the choice of \(2^{j_{*}}\) that

$$\begin{aligned} G_{22}\lesssim \sum_{j=j_{*}+1}^{j_{1}}2^{-j(s-1/r)p} \lesssim 2^{-j_{*}(s-1/r)p}\sim n^{-\frac{(s-1/r)p}{2(s-1/r)+1}}. \end{aligned}$$
(31)

Therefore we have

$$\begin{aligned} G_{2}=G_{21}+G_{22} \lesssim n^{-\frac{(s-1/r)p}{2(s-1/r)+1}}. \end{aligned}$$
(32)

• An upper bound for \(G_{3}\). Clearly, we can obtain that

$$\begin{aligned} G_{31} &:=\sum_{j=j_{0}}^{j_{*}} \biggl(\sum_{k\in \varLambda _{j}} \vert \beta _{j,k} \vert I_{\{ \vert \beta _{j,k} \vert \leq 2\kappa t_{n}\}} \bigl\vert \psi _{j,k}(y) \bigr\vert \biggr)^{p}\lesssim \sum_{j=j_{0}}^{j_{*}} \biggl(\sum_{k\in \varLambda _{j}} t _{n} \bigl\vert \psi _{j,k}(y) \bigr\vert \biggr)^{p} \\ &\lesssim \sum_{j=j_{0}}^{j_{*}} \biggl( \frac{\ln n}{n} \biggr) ^{\frac{p}{2}}2^{jp/2}\lesssim \biggl( \frac{\ln n}{n} \biggr)^{ \frac{p}{2}}2^{j_{*}p/2}\lesssim (\ln n)^{p/2}n^{- \frac{(s-1/r)p}{2(s-1/r)+1}}. \end{aligned}$$
(33)

In addition, it follows from the Hölder inequality \((1/r+1/r'=1)\), Condition θ, and Lemma 1.2 that

$$\begin{aligned} G_{32} &:=\sum_{j=j_{*}+1}^{j_{1}} \biggl(\sum_{k\in \varLambda _{j}} \vert \beta _{j,k} \vert I_{\{ \vert \beta _{j,k} \vert \leq 2\kappa t_{n}\}} \bigl\vert \psi _{j,k}(y) \bigr\vert \biggr)^{p}\lesssim \sum_{j=j_{*}+1}^{j_{1}} \biggl(\sum_{k\in \varLambda _{j}} \vert \beta _{j,k} \vert \bigl\vert \psi _{j,k}(y) \bigr\vert \biggr)^{p} \\ &\lesssim \sum_{j=j_{*}+1}^{j_{1}} \biggl(\sum _{k\in \varLambda _{j}} \vert \beta _{j,k} \vert ^{r} \biggr)^{ \frac{p}{r}} \biggl(\sum _{k\in \varLambda _{j}} \bigl\vert \psi _{j,k}(y) \bigr\vert ^{r'} \biggr)^{\frac{p}{r'}}\lesssim \sum _{j=j_{*}+1}^{j _{1}}2^{-j(s-1/r)p}. \end{aligned}$$

This with \(2^{j_{*}}\sim n^{\frac{1}{2(s-1/r)+1}}\) shows that

$$\begin{aligned} G_{32}\lesssim \sum_{j=j_{*}+1}^{j_{1}}2^{-j(s-1/r)p} \lesssim 2^{-j_{*}(s-1/r)p}\sim n^{-\frac{(s-1/r)p}{2(s-1/r)+1}}. \end{aligned}$$
(34)

Therefore

$$\begin{aligned} G_{3}=G_{31}+G_{32} \lesssim (\ln n)^{p/2}n^{- \frac{(s-1/r)p}{2(s-1/r)+1}}. \end{aligned}$$
(35)

By (27), (29), (32), and (35) we have \(G\lesssim (\ln n)^{\frac{3p}{2}}n^{-\frac{(s-1/r)p}{2(s-1/r)+1}}\). Then it is easy to see from (24), (25), and (26) that

$$ \mathbb{E} \bigl[ \bigl\vert \widehat{f}_{n}^{\mathrm{non}}(y)-f(y) \bigr\vert ^{p} \bigr] \lesssim (\ln n)^{\frac{3p}{2}}n^{-\frac{(s-1/r)p}{2(s-1/r)+1}}. $$

This ends the proof. □