Abstract
As an effective tool for data analysis, expectile regression is widely used in statistics, econometrics and finance. However, most studies focus on the case where the sample size is not massive and the dimension is low or fixed. This paper studies parameter estimation and inference for large-scale expectile regression when the number of parameters grows to infinity. Specifically, an inverse probability weighted asymmetric least squares estimator based on Poisson subsampling (ALS-P) is proposed. Theoretically, the convergence rate and asymptotic normality of ALS-P are established. Furthermore, the optimal subsampling probabilities based on the L-optimality criterion are derived. Finally, extensive simulations and analyses of two real-world datasets illustrate the effectiveness of the proposed methods.
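As a rough illustration of the idea described above (a minimal sketch, not the authors' implementation), the code below fits an inverse probability weighted asymmetric least squares estimator on a Poisson subsample. The probabilities proportional to \(\Vert \varvec{x}_i\Vert \) are only a placeholder for the L-optimal probabilities derived in the paper, and `fit_als` is a hypothetical helper name.

```python
import numpy as np

rng = np.random.default_rng(0)

def fit_als(X, y, tau, w=None, n_iter=100):
    """Asymmetric least squares (expectile regression) via
    iteratively reweighted least squares; w are optional
    inverse-probability weights."""
    if w is None:
        w = np.ones(len(y))
    beta = np.linalg.lstsq(X, y, rcond=None)[0]     # OLS start
    for _ in range(n_iter):
        r = y - X @ beta
        omega = w * np.where(r <= 0, 1 - tau, tau)  # asymmetric weights
        XtW = X.T * omega
        beta = np.linalg.solve(XtW @ X, XtW @ y)    # weighted LS step
    return beta

# Full data: N observations, p covariates, tau-expectile of interest
N, p, tau = 100_000, 5, 0.8
X = rng.standard_normal((N, p))
beta0 = np.ones(p)
y = X @ beta0 + rng.standard_normal(N)

# Poisson subsampling: independent Bernoulli draws with pi_i
# proportional to ||x_i|| (a placeholder criterion), expected size n
n = 2_000
h = np.linalg.norm(X, axis=1)
pi = np.minimum(n * h / h.sum(), 1.0)
keep = rng.random(N) < pi
beta_P = fit_als(X[keep], y[keep], tau, w=1.0 / pi[keep])
```

Since the covariates here are mean-zero, the subsample estimate `beta_P` should be close to `beta0` up to sampling noise of order \(\sqrt{p_n/n}\).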
Data availability
Data is provided within the manuscript or supplementary information files.
References
Ai, M.Y., Wang, F., Yu, J., Zhang, H.M.: Optimal subsampling for large-scale quantile regression. J. Complex. 62, 101512 (2021). https://doi.org/10.1016/j.jco.2020.101512
Ai, M.Y., Yu, J., Zhang, H.M., Wang, H.Y.: Optimal subsampling algorithms for big data regressions. Stat. Sin. 31(2), 749–772 (2021). https://doi.org/10.5705/ss.202018.0439
Atkinson, A.C., Done, A.N., Tobias, R.D.: Optimum Experimental Designs, with SAS. Oxford University Press, Oxford (2007)
Berger, Y.G., De La Riva Torres, O.: Empirical likelihood confidence intervals for complex sampling designs. J. R. Stat. Soc. Ser. B Stat. Methodol. 78(2), 319–341 (2016). https://doi.org/10.1111/rssb.12115
Bernstein, D.: Matrix Mathematics: Theory, Facts, and Formulas with Application to Linear Systems Theory. Princeton University Press, Princeton (2005)
Chen, S.: Beijing multi-site air-quality data. In: UCI Machine Learning Repository (2019). https://doi.org/10.24432/C5RK5G
Ciuperca, G.: Variable selection in high-dimensional linear model with possibly asymmetric errors. Comput. Stat. Data Anal. 155, 107112 (2021). https://doi.org/10.1016/j.csda.2020.107112
Drineas, P., Magdon-Ismail, M., Mahoney, M.W., Woodruff, D.P.: Faster approximation of matrix coherence and statistical leverage. J. Mach. Learn. Res. 13, 3475–3506 (2012)
Efron, B.: Regression percentiles using asymmetric squared error loss. Stat. Sin. 1(1), 93–125 (1991)
Eilers, P.H., Boelens, H.F.: Baseline correction with asymmetric least squares smoothing. Leiden Univ. Med. Centre Rep. 1(1), 5 (2005)
Fan, J.Q., Li, R.Z.: Variable selection via nonconcave penalized likelihood and its oracle properties. J. Am. Stat. Assoc. 96(456), 1348–1360 (2001). https://doi.org/10.1198/016214501753382273
Fan, J.Q., Peng, H.: Nonconcave penalized likelihood with a diverging number of parameters. Ann. Stat. 32(3), 928–961 (2004). https://doi.org/10.1214/009053604000000256
Gao, S.H., Yu, Z.: Parametric expectile regression and its application for premium calculation. Insurance Math. Econ. 111, 242–256 (2023)
Gao, J.Z., Wang, L., Lian, H.: Optimal decorrelated score subsampling for generalized linear models with massive data. Sci. China Math. 67, 405–430 (2024). https://doi.org/10.1007/s11425-022-2057-8
Gu, Y.W., Zou, H.: High-dimensional generalizations of asymmetric least squares regression and their applications. Ann. Stat. 44, 2661–2694 (2016). https://doi.org/10.1214/15-AOS1431
Hamidieh, K.: Superconductivty data. In: UCI Machine Learning Repository (2018). https://doi.org/10.24432/C53P47
Hamidieh, K.: A data-driven statistical model for predicting the critical temperature of a superconductor. Comput. Mater. Sci. 154, 346–354 (2018). https://doi.org/10.1016/j.commatsci.2018.07.052
Kuan, C.M., Yeh, J.H., Hsu, Y.C.: Assessing value at risk with care, the conditional autoregressive expectile models. J. Econom. 150(2), 261–270 (2009). https://doi.org/10.1016/j.jeconom.2008.12.002
Li, X.X., Li, R.Z., Xia, Z.M., Xu, C.: Distributed feature screening via componentwise debiasing. J. Mach. Learn. Res. 21(24), 1–32 (2020)
Lu, X., Su, L.J.: Jackknife model averaging for quantile regressions. J. Econom. 188(1), 40–58 (2015). https://doi.org/10.1016/j.jeconom.2014.11.005
Ma, P., Mahoney, M., Yu, B.: A statistical perspective on algorithmic leveraging. Int. Conf. Mach. Learn. PMLR 32(1), 91–99 (2014)
Man, R., Tan, K.M., Wang, Z., Zhou, W.X.: Retire: robust expectile regression in high dimensions. J. Econom. 239(2), 105459 (2024). https://doi.org/10.1016/j.jeconom.2023.04.004
Newey, W.K., Powell, J.L.: Asymmetric least squares estimation and testing. Econometrica 55(4), 819–847 (1987)
Ren, M., Zhao, S.L., Wang, M.Q., Zhu, X.B.: Robust optimal subsampling based on weighted asymmetric least squares. Stat. Pap. (2023). https://doi.org/10.1007/s00362-023-01480-7
Robins, J.M., Rotnitzky, A.: Recovery of information and adjustment for dependent censoring using surrogate markers. In: Jewell, N.P., Dietz, K., Farewell, V.T. (eds.) AIDS Epidemiology: Methodological Issues, pp. 297–331. Birkhäuser, Boston (1992)
Serfling, R.J.: Approximation Theorems of Mathematical Statistics. Wiley, New York (1980)
Shan, J.H., Wang, L.: Optimal Poisson subsampling decorrelated score for high-dimensional generalized linear models. J. Appl. Stat. (2024). https://doi.org/10.1080/02664763.2024.2315467
Taylor, J.W.: Estimating value at risk and expected shortfall using expectiles. J. Financ. Econom. 6(2), 231–252 (2008). https://doi.org/10.1093/jjfinec/nbn001
Tu, Y.D., Wang, S.W.: Jackknife model averaging for expectile regressions in increasing dimension. Econ. Lett. 197, 109607 (2020). https://doi.org/10.1016/j.econlet.2020.109607
Tu, Y.D., Wang, S.W.: Variable screening and model averaging for expectile regressions. Oxf. Bull. Econ. Stat. 85(3), 574–598 (2023). https://doi.org/10.1111/obes.12538
van de Geer, S., Bühlmann, P., Ritov, Y., Dezeure, R.: On asymptotically optimal confidence regions and tests for high-dimensional models. Ann. Stat. 42(3), 1166–1202 (2014). https://doi.org/10.1214/14-AOS1221
Van der Vaart, A.W.: Asymptotic Statistics. Cambridge University Press, Cambridge (1998)
Wang, L.: GEE analysis of clustered binary data with diverging number of covariates. Ann. Stat. 39(1), 389–417 (2011). https://doi.org/10.1214/10-AOS846
Wang, H.Y.: More efficient estimation for logistic regression with optimal subsamples. J. Mach. Learn. Res. 20(132), 1–59 (2019)
Wang, H.Y., Ma, Y.Y.: Optimal subsampling for quantile regression in big data. Biometrika 108(1), 99–112 (2021). https://doi.org/10.1093/biomet/asaa043
Wang, H.Y., Zhu, R., Ma, P.: Optimal subsampling for large sample logistic regression. J. Am. Stat. Assoc. 113(522), 829–844 (2018). https://doi.org/10.1080/01621459.2017.1292914
Wang, H.Y., Yang, M., Stufken, J.: Information-based optimal subdata selection for big data linear regression. J. Am. Stat. Assoc. 114(525), 393–405 (2019). https://doi.org/10.1080/01621459.2017.1408468
Wang, L., Elmstedt, J., Wong, W.K., Xu, H.: Orthogonal subsampling for big data linear regression. Ann. Appl. Stat. 15(3), 1273–1290 (2021). https://doi.org/10.1214/21-AOAS1462
Xiao, J.X., Yu, P., Song, X.Y., Zhang, Z.Z.: Statistical inference in the partial functional linear expectile regression model. Sci. China Math. 65(12), 2601–2630 (2022). https://doi.org/10.1007/s11425-020-1848-8
Xie, S.Y., Zhou, Y., Wan, A.T.K.: A varying-coefficient expectile model for estimating value at risk. J. Bus. Econ. Stat. 32(4), 576–592 (2014). https://doi.org/10.1080/07350015.2014.917979
Yang, Z.H., Wang, H.Y., Yan, J.: Subsampling approach for least squares fitting of semi-parametric accelerated failure time models to massive survival data. Stat. Comput. 34, 77 (2024). https://doi.org/10.1007/s11222-024-10391-y
Yao, Y.Q., Wang, H.Y.: A review on optimal subsampling methods for massive datasets. J. Data Sci. 19(1), 151–172 (2021). https://doi.org/10.6339/21-JDS999
Yu, J., Wang, H.Y., Ai, M.Y., Zhang, H.M.: Optimal distributed subsampling for maximum quasi-likelihood estimators with massive data. J. Am. Stat. Assoc. 117(537), 265–276 (2022). https://doi.org/10.1080/01621459.2020.1773832
Yu, J., Ai, M.Y., Ye, Z.Q.: A review on design inspired subsampling for big data. Stat. Pap. (2023). https://doi.org/10.1007/s00362-022-01386-w
Yu, J., Liu, J.Q., Wang, H.Y.: Information-based optimal subdata selection for non-linear models. Stat. Pap. 64, 1069–1093 (2023). https://doi.org/10.1007/s00362-023-01430-3
Zhang, C.H., Zhang, S.S.: Confidence intervals for low dimensional parameters in high dimensional linear models. J. R. Stat. Soc. Ser. B Stat. Methodol. 76(1), 217–242 (2014). https://doi.org/10.1111/rssb.12026
Zhou, P., Yu, Z., Ma, J.Y., Tian, M.Z., Fan, Y.: Communication-efficient distributed estimator for generalized linear models with a diverging number of covariates. Comput. Stat. Data Anal. 157, 107154 (2021). https://doi.org/10.1016/j.csda.2020.107154
Acknowledgements
The authors thank the Editor and anonymous reviewers for their constructive comments and suggestions, which greatly improved the quality of this work. The research of Xiaochao Xia was supported by the National Natural Science Foundation of China (Grant Number 11801202) and the Fundamental Research Funds for the Central Universities (Grant Number 2021CDJQY-047). The research of Zhimin Zhang was supported by the National Natural Science Foundation of China (Grant Numbers 12271066, 12171405, 11871121).
Author information
Authors and Affiliations
Contributions
Xiaoyan Li: Conceptualization, Methodology, Software, Validation, Writing - original draft. Xiaochao Xia: Supervision, Writing - review & editing. Zhimin Zhang: Supervision, Writing - review & editing.
Corresponding author
Ethics declarations
Conflict of interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendix A: Proofs
We provide the proofs of Theorems 1, 2 and 5 for diverging dimension below. The fixed-dimension case follows as the special case \(p_n=p\). In the proofs, we use C to denote a generic positive constant independent of \((n,N,p_n)\), whose magnitude may change from line to line. Define \( \mathcal {L}(\varvec{\beta })=\frac{1}{N}\sum _{i \in \mathcal {I}_{\mathcal {S}}}\frac{1}{\pi _i}\ell _{\tau }(y_i-\varvec{x}_i^{T}\varvec{\beta })=\frac{1}{N}\sum _{i=1}^{N} \frac{\eta _i}{\pi _i}\ell _{\tau }(y_i-\varvec{x}_i^{T}\varvec{\beta }), {Q}(\varvec{\beta }) =\frac{1}{N} \sum _{i=1}^{N} \frac{\eta _i}{\pi _i}\phi _{\tau }(y_i-\varvec{x}_i^{T}\varvec{\beta })\varvec{x}_i\).
Lemma A.1
(Tu and Wang 2020, Lemma A.1) For \(\ell _{\tau }(s)=s^2|\tau -I(s\le 0)|, \phi _{\tau }(s)=s|\tau -I(s \le 0)|, \psi _{\tau }(s)=|\tau -I(s \le 0)|, \Gamma (s,t)= I(s<0)-I(s+t<0)\), we have
-
(i)
\(\ell _{\tau }(s+t)-\ell _{\tau }(s)=2\phi _{\tau }(s)t+\psi _{\tau }(s)t^2 + (2\tau -1)(s+t)^2\Gamma (s,t)\),
-
(ii)
\(\phi _{\tau }(s+t)-\phi _{\tau }(s)=t\psi _{\tau }(s)+(2\tau -1)(s+t)\Gamma (s,t)\).
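The two identities of Lemma A.1 are purely algebraic and can be checked numerically; the sketch below (our own check, not part of the original proof) verifies them on random draws, which avoid the measure-zero boundary cases \(s=0\) and \(s+t=0\) where the strict and non-strict indicators differ.

```python
import numpy as np

rng = np.random.default_rng(0)
tau = 0.7

def psi(s):                      # psi_tau(s) = |tau - I(s <= 0)|
    return np.where(s <= 0, 1 - tau, tau)

def phi(s):                      # phi_tau(s) = s |tau - I(s <= 0)|
    return psi(s) * s

def ell(s):                      # ell_tau(s) = s^2 |tau - I(s <= 0)|
    return psi(s) * s**2

def gamma(s, t):                 # Gamma(s, t) = I(s < 0) - I(s + t < 0)
    return (s < 0).astype(float) - (s + t < 0).astype(float)

s = rng.standard_normal(10_000)
t = rng.standard_normal(10_000)

# identity (i)
lhs_i = ell(s + t) - ell(s)
rhs_i = 2*phi(s)*t + psi(s)*t**2 + (2*tau - 1)*(s + t)**2*gamma(s, t)
assert np.allclose(lhs_i, rhs_i)

# identity (ii)
lhs_ii = phi(s + t) - phi(s)
rhs_ii = t*psi(s) + (2*tau - 1)*(s + t)*gamma(s, t)
assert np.allclose(lhs_ii, rhs_ii)
```

Both identities reduce to \(\psi _{\tau }(s+t)-\psi _{\tau }(s)=(2\tau -1)\Gamma (s,t)\), which holds everywhere except on the boundary sets above.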
Lemma A.2
Let \(a_n=\sqrt{p_n/n}, \Gamma (s,t)= I(s<0)-I(s+t<0),\varvec{u} \in \mathbb {R}^{p_n} \) such that \(\Vert \varvec{u}\Vert \le c\), where c is a constant. Under the conditions (C2)-(C4), for \(k=2,8\), we have
Proof
It follows that
where the first inequality invokes condition (C2), the second inequality applies the Cauchy–Schwarz inequality, the last inequality uses the Schwarz and Loève \(c_r\) inequalities and the fact \(\varvec{u}^TE(\varvec{x}_i\varvec{x}_i^T)\varvec{u}\le \Vert \varvec{u}\Vert ^2 \lambda _{max}\left[ E\left( \varvec{x}_i\varvec{x}_i^T \right) \right] \), and the last line holds by conditions (C3)(ii) and (C4)(i). \(\square \)
Lemma A.3
Under the conditions (C1), (C3)-(C5), if \(p_n^3/n \rightarrow 0\), we have
Proof
Let \(\sqrt{n}A_nV^{-1/2}Q(\varvec{\beta }_0)=\sum _{i=1}^{N}\frac{\sqrt{n}\eta _i}{N\pi _i}A_n \times V^{-1/2}\phi _{\tau }(\varepsilon _i)\varvec{x}_i=: \sum _{i=1}^{N}\varvec{\xi }_{i}\). Now, we check the condition of the Lindeberg–Feller central limit theorem (Proposition 2.27 in Van der Vaart (1998)). For any \(\epsilon >0\),
where the fourth line applies the fact \(|\phi _{\tau }(\varepsilon _i)|\le |\varepsilon _i|\) and the conclusion \(|||A_n V^{-1/2}|||=O(1)\), the sixth line is due to condition (C3)(iii), the seventh line holds by the Cauchy–Schwarz inequality, the eighth line uses Loève's \(c_r\) inequality, the last inequality invokes the conditions (C5) and (C3)(ii), and the last equation uses the condition \(p_n^3/n \rightarrow 0\). Next, we show \( |||A_nV^{-1/2} |||=O(1)\). For any \(A_n\), by condition (C4)(ii), we have
where the second inequality applies the conclusion \(tr(UW)\le \lambda _{\max }(U)tr(W)\) for any symmetric matrix U and positive semidefinite matrix W (Bernstein 2005). Thus the condition of Lindeberg-Feller central limit theorem is satisfied. Note that
where the last line invokes condition (C4)(ii). Then, the desired result holds by Lindeberg–Feller central limit theorem. \(\square \)
Lemma A.4
Let \(Q_1(\Delta )=\sum _{i=1}^{N}\frac{\sqrt{n}\eta _i}{N\pi _i}\psi _{\tau }(\varepsilon _i)A_n \times V^{-1/2}\varvec{x}_i\varvec{x}_i^T\Delta \), where \(\Delta =\varvec{\beta }-\varvec{\beta }_0\). Under the conditions (C3) and (C5), if \(p_n^5/n \rightarrow 0\) we have
Proof
Note that \(Q_1(\Delta )=\sqrt{n}A_nV^{-1/2}\mathcal {H}({\varvec{\beta }} _0)\Delta \), where \(\mathcal {H}({\varvec{\beta }} _0)=\frac{1}{N}\sum _{i=1}^{N}\frac{\eta _i}{\pi _i}\psi _{\tau }(\varepsilon _i)\varvec{x}_i\varvec{x}_i^T\), then
Denote \({\mathcal {D}^*} = \mathcal {H}({\varvec{\beta }} _0)-D=(\mathcal {D}_{lm}^*), 1\le l,m \le p_n\). Using the fact that \(|||U |||\le q {\mathop {\max }\nolimits _{1 \le i,j \le q} |U_{ij}|}\) for a \(q \times q\) matrix U, we have, \(\forall \epsilon >0\),
where the second inequality applies Boole's inequality, the fourth inequality uses Markov's inequality, the first equation is due to the fact \(E(\mathcal {D}_{lm}^{*})=0\), the sixth inequality holds by the fact \(\psi _{\tau }(\varepsilon _i) \le 1\), the seventh inequality uses Hölder's inequality, and the last line invokes conditions (C3)(ii) and (C5). (A.4) implies
By (A.2), (A.3), (A.5) and the condition \(p_n^5/n \rightarrow 0\), we have
\(\square \)
Lemma A.5
Let \(g_i(s)=(\varepsilon _i-s)[I(\varepsilon _i<0) -I(\varepsilon _i <s)]\), \(Q_2(\Delta ) =\sum _{i=1}^{N}\frac{\sqrt{n}\eta _i}{N\pi _i}g_i\left( \varvec{x}_i^T\Delta \right) A_n V^{-1/2}\varvec{x}_i\), where \(\Delta =\varvec{\beta }-\varvec{\beta }_0\). Under the conditions (C2)-(C6), if \(p_n^4/n \rightarrow 0\), we have
Proof
Let \(\varvec{b}_i=A_n V^{-1/2}\varvec{x}_i=\varvec{b}_i^{+}-\varvec{b}_i^{-}\), where \({b}_{ij}^{+}=\max \{ {b}_{ij},0\}\), \({b}_{ij}^{-}=\max \{ -{b}_{ij},0\}\), and \({b}_{ij}^{+}, {b}_{ij}^{-}, {b}_{ij}\) denote the jth components of \(\varvec{b}_{i}^{+}, \varvec{b}_{i}^{-}, \varvec{b}_{i}\), respectively (\(j=1,\ldots ,q\)). Note that,
Firstly, we show \(J_1=o_{P}(1)\). By the triangle inequality,
where \(Q_2^{+}(\Delta )=\sum _{i=1}^{N}\frac{\sqrt{n}\eta _i}{N\pi _i}g_i\left( \varvec{x}_i^T\Delta \right) \varvec{b}_i^{+}\) and \(Q_2^{-}(\Delta )=\sum _{i=1}^{N}\frac{\sqrt{n}\eta _i}{N\pi _i}g_i\left( \varvec{x}_i^T\Delta \right) \varvec{b}_i^{-}\). It suffices to prove that \(J_{1l}=o_{P}(1)\) for \( l=1,2\). We only show \(J_{11}=o_{P}(1)\); the second term can be proved in the same way. Let \(\mathbb {D} = \left\{ \Delta \in \mathbb {R}^{p_n} \Big | \Vert \Delta \Vert \le C\sqrt{p_n/n} \right\} \), then by selecting \(N_0=n^{2p_n}\) grid points \(\{ \Delta _t \}_{t=1}^{N_0}\), \(\mathbb {D}\) can be covered by \(\bigcup _{t=1}^{N_0}\mathbb {D}_t\), where \(\mathbb {D}_t = \left\{ \Delta \in \mathbb {R}^{p_n} \Big | \Vert \Delta - \Delta _t\Vert _{\infty } \le \delta _n \right\} \) with \(\delta _n= Cp_n^{1/2}n^{-5/2}\). Define \(w_{it}(s)=g_i(\varvec{x}_i^T \Delta _t - s\Vert \varvec{x}_i\Vert )\). Since \(g_i(s)\) is monotone, by the triangle inequality, we have
Next, we consider \(J_{111}\). Let \(d_i =\frac{\sqrt{n}\eta _i}{\pi _i} \varvec{x}_i^T\Delta _t \varvec{b}_i^{+}\), \(\zeta _{it}=\frac{\sqrt{n}\eta _i}{\pi _i}g_i\left( \varvec{x}_i^T\Delta _t \right) \varvec{b}_i^{+}\), \(\zeta _{it}^*=\zeta _{it} - E(\zeta _{it})\) and \(e_n= Nn^{-1/2}p_n^{3/2}\). Then
It is sufficient to show \(J_{111}=o_{P}(1)\) by demonstrating that \(\max _{1\le t \le N_0}\Vert J_{lt}^{*}\Vert =o_{P}(1)\) for \(l=1,2,3\). First, consider \(J_{1t}^{*}\). Note that \( E\Bigg \{\zeta _{it}^* I(\Vert d_i \Vert \le e_{n})-E\left[ \zeta _{it}^* I(\Vert d_i \Vert \le e_{n})\right] \Bigg \}=\varvec{0} \) and
where
where the second inequality applies Jensen’s inequality, and the last line holds by condition (C4)(i). Similarly, we have \(E(\Vert \zeta _{it}\Vert )=O(p_n^{3/2}/\sqrt{n})\) and \(E\Vert \zeta _{it}I(\Vert d_i \Vert \le e_{n})\Vert =O(e_n\sqrt{p_n/n})\). Thus
By the fact \(|g_i(s)|\le |s|\),
Then
Thus, applying Boole’s and Bernstein’s inequalities (Serfling 1980), \(\forall \varepsilon >0\), we have
where the last inequality is due to \(n^{1/2}p_n^{-3/2} \gg n^{3/2}N^{-1}p_n^{-7/2}\) and \(n^{3/2}N^{-1}p_n^{-7/2} \gg p_n\log n \) by condition (C6). This implies
Note that
where the first inequality invokes condition (C2), the third inequality is due to (A.2) and the Cauchy–Schwarz inequality, the last inequality holds by Loève's \(c_r\) inequality, and the last line invokes conditions (C5) and (C3)(ii). Similarly, we have
where the second line applies the Cauchy–Schwarz inequality, and the last line holds by the condition \(p_n^4/n \rightarrow 0\). For \(J_{3t}^{*}\), let \(d_i^{*}=n^{1/2}N^{-1/2}p_n^{-3/2}d_i\), then \(E\Vert d_i^{*}\Vert ^2=O(1)\) by (A.8). By the monotonicity of probability, Boole's inequality and the Lebesgue dominated convergence theorem, we can derive that
which results in
(A.6), (A.9) and (A.10) imply \(J_{111}=o_{P}(1)\).
For \(J_{112}\), we have
where the first equation holds by (A.2), condition (C3)(ii) and Loève’s \(c_r\) inequality.
By similar arguments, we can obtain \(J_{113}=o_{P}(1)\). Thus \(J_{1}=o_{P}(1)\).
Finally, we show \(J_{2}=o_{P}(1)\). Note that
where the last line invokes condition \(p_n^4/n \rightarrow 0\).
The proof of Lemma A.5 is complete. \(\square \)
Proof of Theorem 1
Let \(\mathscr {B}=\{\varvec{\beta }: \varvec{\beta }=\varvec{\beta }_0+\varvec{u} a_n, \varvec{u} \in \mathbb {R}^{p_n}, \Vert \varvec{u}\Vert \le c\}\) for \(a_n=\sqrt{{p_n/}{n}}\) and some constant c. By Fan and Li (2001), it is sufficient to demonstrate that for any \(\epsilon >0\), there exists a sufficiently large constant c such that
for large enough n. This implies that there is a local minimizer \(\widetilde{\varvec{\beta }}_{\mathcal {S}}\) of \({\mathcal {L}}(\varvec{\beta })\) in \(\mathscr {B}\) that satisfies \(\Vert \widetilde{\varvec{\beta }}_{\mathcal {S}}- {\varvec{\beta }_0}\Vert =O_{p}(a_n)\). The local minimizer is the global minimizer because of the convexity of \({\mathcal {L}}(\varvec{\beta })\). By Lemma A.1 (i), we have
Firstly, we consider the term \(I_1\). Note that
where the second equation is due to the fact \(E\left[ \frac{\eta _i}{\pi _i}\phi _{\tau }(\varepsilon _i)\varvec{x_i}^T\varvec{u}\right] =0\), the first inequality uses the fact \(|\phi _{\tau }(\varepsilon _i)|\le |\varepsilon _i|\), the fourth line applies condition (C3)(iii) and the Cauchy–Schwarz inequality, the last inequality holds by Loève's \(c_r\) inequality, and the last line invokes conditions (C5) and (C3)(ii). Then, by Markov's inequality and (A.13), we have
The second term of (A.12), \(I_2\), can be decomposed as follows
where \(\mathcal {H}({\varvec{\beta }} _0)=\frac{1}{N}\sum _{i=1}^{N}\frac{\eta _i}{\pi _i}\psi _{\tau }(\varepsilon _i)\varvec{x}_i\varvec{x}_i^T\). Combining (A.5), (A.15), conditions (C4)(i) and \(p_n^4/n \rightarrow 0\), we have
For \(I_3\), by Lemma A.2,
and
where \(C=(2\tau -1)^2\). Thus
by Chebyshev's inequality and the condition \(p_n^4/n\rightarrow 0\). Combining (A.14), (A.16) and (A.19), the second term \(I_2\), which is positive, dominates the other two terms for a sufficiently large c with probability approaching one. Thus, (A.11) holds. \(\square \)
Proof of Theorem 2
Let \(\Delta =\varvec{\beta }- \varvec{\beta }_0 \) and \(\widetilde{\Delta }=\widetilde{\varvec{\beta }}_{\mathcal {S}}- \varvec{\beta }_0 \). By Lemma A.1 (ii),
Then, we have \(\sqrt{n}A_n V^{-1/2} \left[ Q(\widetilde{\varvec{\beta }}_{\mathcal {S}})-Q(\varvec{\beta }_0) \right] =-Q_1(\widetilde{\Delta }) + (2\tau - 1) Q_2(\widetilde{\Delta })\). By Lemma A.4–A.5 and the fact \(Q(\widetilde{\varvec{\beta }}_{\mathcal {S}})=0\),
Then the desired result holds by Lemma A.3 and Slutsky's theorem. \(\square \)
Proof of Theorem 5
Without loss of generality, we assume \(h_i>0, i=1,\ldots ,N, h_{(N+1)}=+\infty \) and \(h_1 \le h_2 \le \dots \le h_N\). According to the L-optimal criterion, minimizing the empirical AMSE of \(\Lambda _n\widetilde{\varvec{\beta }}\) is equivalent to minimizing \(tr(\Lambda _nD_N^{-1}V_ND_N^{-1}\Lambda _n)\). Thus the optimization problem can be described as follows:
By Cauchy–Schwarz inequality,
where the equation in the last line holds if and only if \(\pi _i\propto h_i\), that is to say, when \(\pi _i\propto h_i\), \(tr(\Lambda _nD_N^{-1}V_ND_N^{-1}\Lambda _n)\) attains the minimum. Note that all \(\pi _i\)s need to satisfy \(0\le \pi _i \le 1\). We consider two scenarios:
(1) If \(nh_i/(\sum _{j=1}^{N}h_j)\le 1\) for \(i=1,\ldots ,N\), then \(\pi _i^{opt}= \frac{nh_i }{\sum _{j=1}^{N}h_j}\).
(2) If there exist indices i with \(nh_i/(\sum _{j=1}^{N}h_j)> 1\), suppose there are k such indices. Specifically, for any such i, \(nh_i>\sum _{j=1}^{N}h_j=\sum _{j=1}^{N-k}h_j+ \sum _{j=N-k+1}^{N}h_j > (n-k)h_{N-k}+kh_{N-k} = nh_{N-k}\), then \(h_i>h_{N-k}\), which yields \(i>N-k\). In this case, the original optimization problem (A.20) is equivalent to
Similarly, applying Cauchy-Schwarz inequality,
where the equation in the last line holds if and only if \(\pi _i\propto h_i\), namely, when
\(tr(\Lambda _nD_N^{-1}V_ND_N^{-1}\Lambda _n)\) attains the minimum. Next, we unify the two expressions for \(\pi _i\). Suppose there exists an H such that
and \(h_{N-k}<H<h_{N-k+1}\), then it follows that
By (A.21) and (A.24), it follows that
Let \(\pi _i^{opt}=n \frac{h_i\wedge H}{\sum _{j=1}^{N}(h_j\wedge H)}\) and \(\varvec{\pi }^{opt}=(\pi _1^{opt},\ldots ,\pi _N^{opt})\). Substituting \(\varvec{\pi }^{opt}\) into (A.22), we obtain
which implies \(\pi _i^{opt}\) is the optimal solution of (A.22).
Finally, we verify that such an H exists and satisfies \(h_{N-k}<H<h_{N-k+1}\). The definition of k implies that
Let \(H_1=h_{N-k+1}, H_2=h_{N-k}\), then
and
As a result,
Note that \(\max \limits _{i=1,\ldots ,N}n \frac{h_i\wedge H}{\sum _{j=1}^{N}(h_j\wedge H)}\) is continuous with respect to H, then there must exist \(h_{N-k}<H<h_{N-k+1}\) such that \(\max \limits _{i=1,\ldots ,N}n \frac{h_i\wedge H}{\sum _{j=1}^{N}(h_j\wedge H)}=1\).
Scenario (1) is a special case of scenario (2), corresponding to \(k=0\). The proof of Theorem 5 is complete. \(\square \)
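The construction in this proof can be computed directly: assign \(\pi _i=1\) to the largest units until the remaining ones fit proportionally, which corresponds to the cap \(H=\sum _{j\le N-k}h_{j}/(n-k)\) in \(\pi _i^{opt}=n(h_i\wedge H)/\sum _{j}(h_j\wedge H)\). A minimal sketch (the function name is ours, not from the paper):

```python
import numpy as np

def optimal_poisson_probs(h, n):
    """pi_i = n (h_i ∧ H) / sum_j (h_j ∧ H), with the cap H chosen
    so that max_i pi_i <= 1; preserves sum(pi) = n."""
    h = np.asarray(h, dtype=float)
    order = np.argsort(h)[::-1]       # indices by decreasing h
    hs = h[order]
    tail = hs.sum()                   # sum over units not yet capped
    k = 0                             # number of units with pi = 1
    # cap the largest units one at a time until the proportional rule
    # on the remaining units stays within [0, 1]
    while k < len(hs) and (n - k) * hs[k] > tail:
        tail -= hs[k]
        k += 1
    pi_sorted = np.empty_like(hs)
    pi_sorted[:k] = 1.0
    pi_sorted[k:] = (n - k) * hs[k:] / tail
    pi = np.empty_like(pi_sorted)
    pi[order] = pi_sorted             # undo the sort
    return pi

# one dominant unit gets capped at 1; the others share the rest
pi = optimal_poisson_probs([1.0, 1.0, 1.0, 10.0], n=2)
```

With the loop exit condition, \((n-k)h_{(k+1)}\le \) tail, so every uncapped probability is at most 1, and the total \(k+(n-k)=n\) is preserved, matching the constraint of the optimization problem.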
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Li, X., Xia, X. & Zhang, Z. Poisson subsampling-based estimation for growing-dimensional expectile regression in massive data. Stat Comput 34, 133 (2024). https://doi.org/10.1007/s11222-024-10449-x
Received:
Accepted:
Published:
DOI: https://doi.org/10.1007/s11222-024-10449-x