Abstract
As an effective tool for data analysis, expectile regression is widely used in statistics, econometrics and finance. However, most studies focus on the case where the sample size is not massive and the dimension is low or fixed. This paper studies parameter estimation and inference for large-scale expectile regression when the number of parameters grows to infinity. Specifically, an inverse probability weighted asymmetric least squares estimator based on Poisson subsampling (ALS-P) is proposed. Theoretically, the convergence rate and asymptotic normality of ALS-P are established. Furthermore, the optimal subsampling probabilities based on the L-optimality criterion are derived. Finally, extensive simulations and analyses of two real-world datasets illustrate the effectiveness of the proposed methods.
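As a rough illustration of the idea described above (a minimal sketch, not the authors' implementation), the code below fits an inverse probability weighted asymmetric least squares estimator on a Poisson subsample. The probabilities proportional to \(\Vert \varvec{x}_i\Vert \) are only a placeholder for the L-optimal probabilities derived in the paper, and `fit_als` is a hypothetical helper name.

```python
import numpy as np

rng = np.random.default_rng(0)

def fit_als(X, y, tau, w=None, n_iter=100):
    """Asymmetric least squares (expectile regression) via
    iteratively reweighted least squares; w are optional
    inverse-probability weights."""
    if w is None:
        w = np.ones(len(y))
    beta = np.linalg.lstsq(X, y, rcond=None)[0]     # OLS start
    for _ in range(n_iter):
        r = y - X @ beta
        omega = w * np.where(r <= 0, 1 - tau, tau)  # asymmetric weights
        XtW = X.T * omega
        beta = np.linalg.solve(XtW @ X, XtW @ y)    # weighted LS step
    return beta

# Full data: N observations, p covariates, tau-expectile of interest
N, p, tau = 100_000, 5, 0.8
X = rng.standard_normal((N, p))
beta0 = np.ones(p)
y = X @ beta0 + rng.standard_normal(N)

# Poisson subsampling: independent Bernoulli draws with pi_i
# proportional to ||x_i|| (a placeholder criterion), expected size n
n = 2_000
h = np.linalg.norm(X, axis=1)
pi = np.minimum(n * h / h.sum(), 1.0)
keep = rng.random(N) < pi
beta_P = fit_als(X[keep], y[keep], tau, w=1.0 / pi[keep])
```

Since the covariates here are mean-zero, the subsample estimate `beta_P` should be close to `beta0` up to sampling noise of order \(\sqrt{p_n/n}\).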
Data availability
Data is provided within the manuscript or supplementary information files.
References
Ai, M.Y., Wang, F., Yu, J., Zhang, H.M.: Optimal subsampling for large-scale quantile regression. J. Complex. 62, 101512 (2021). https://doi.org/10.1016/j.jco.2020.101512
Ai, M.Y., Yu, J., Zhang, H.M., Wang, H.Y.: Optimal subsampling algorithms for big data regressions. Stat. Sin. 31(2), 749–772 (2021). https://doi.org/10.5705/ss.202018.0439
Atkinson, A.C., Done, A.N., Tobias, R.D.: Optimum Experimental Designs, with SAS. Oxford University Press, Oxford (2007)
Berger, Y.G., De La Riva Torres, O.: Empirical likelihood confidence intervals for complex sampling designs. J. R. Stat. Soc. Ser. B Stat. Methodol. 78(2), 319–341 (2016). https://doi.org/10.1111/rssb.12115
Bernstein, D.: Matrix Mathematics: Theory, Facts, and Formulas with Application to Linear Systems Theory. Princeton University Press, Princeton (2005)
Chen, S.: Beijing multi-site air-quality data. In: UCI Machine Learning Repository (2019). https://doi.org/10.24432/C5RK5G
Ciuperca, G.: Variable selection in high-dimensional linear model with possibly asymmetric errors. Comput. Stat. Data Anal. 155, 107112 (2021). https://doi.org/10.1016/j.csda.2020.107112
Drineas, P., Magdon-Ismail, M., Mahoney, M.W., Woodruff, D.P.: Faster approximation of matrix coherence and statistical leverage. J. Mach. Learn. Res. 13, 3475–3506 (2012)
Efron, B.: Regression percentiles using asymmetric squared error loss. Stat. Sin. 1(1), 93–125 (1991)
Eilers, P.H., Boelens, H.F.: Baseline correction with asymmetric least squares smoothing. Leiden Univ. Med. Centre Rep. 1(1), 5 (2005)
Fan, J.Q., Li, R.Z.: Variable selection via nonconcave penalized likelihood and its oracle properties. J. Am. Stat. Assoc. 96(456), 1348–1360 (2001). https://doi.org/10.1198/016214501753382273
Fan, J.Q., Peng, H.: Nonconcave penalized likelihood with a diverging number of parameters. Ann. Stat. 32(3), 928–961 (2004). https://doi.org/10.1214/009053604000000256
Gao, S.H., Yu, Z.: Parametric expectile regression and its application for premium calculation. Insurance Math. Econ. 111, 242–256 (2023)
Gao, J.Z., Wang, L., Lian, H.: Optimal decorrelated score subsampling for generalized linear models with massive data. Sci. China Math. 67, 405–430 (2024). https://doi.org/10.1007/s11425-022-2057-8
Gu, Y.W., Zou, H.: High-dimensional generalizations of asymmetric least squares regression and their applications. Ann. Stat. 44, 2661–2694 (2016). https://doi.org/10.1214/15-AOS1431
Hamidieh, K.: Superconductivty data. In: UCI Machine Learning Repository (2018). https://doi.org/10.24432/C53P47
Hamidieh, K.: A data-driven statistical model for predicting the critical temperature of a superconductor. Comput. Mater. Sci. 154, 346–354 (2018). https://doi.org/10.1016/j.commatsci.2018.07.052
Kuan, C.M., Yeh, J.H., Hsu, Y.C.: Assessing value at risk with care, the conditional autoregressive expectile models. J. Econom. 150(2), 261–270 (2009). https://doi.org/10.1016/j.jeconom.2008.12.002
Li, X.X., Li, R.Z., Xia, Z.M., Xu, C.: Distributed feature screening via componentwise debiasing. J. Mach. Learn. Res. 21(24), 1–32 (2020)
Lu, X., Su, L.J.: Jackknife model averaging for quantile regressions. J. Econom. 188(1), 40–58 (2015). https://doi.org/10.1016/j.jeconom.2014.11.005
Ma, P., Mahoney, M., Yu, B.: A statistical perspective on algorithmic leveraging. Int. Conf. Mach. Learn. PMLR 32(1), 91–99 (2014)
Man, R., Tan, K.M., Wang, Z., Zhou, W.X.: Retire: robust expectile regression in high dimensions. J. Econom. 239(2), 105459 (2024). https://doi.org/10.1016/j.jeconom.2023.04.004
Newey, W.K., Powell, J.L.: Asymmetric least squares estimation and testing. Econometrica 55(4), 819–847 (1987)
Ren, M., Zhao, S.L., Wang, M.Q., Zhu, X.B.: Robust optimal subsampling based on weighted asymmetric least squares. Stat. Pap. (2023). https://doi.org/10.1007/s00362-023-01480-7
Robins, J.M., Rotnitzky, A.: Recovery of information and adjustment for dependent censoring using surrogate markers. In: Jewell, N.P., Dietz, K., Farewell, V.T. (eds.) AIDS Epidemiology: Methodological Issues, pp. 297–331. Birkhäuser, Boston (1992)
Serfling, R.J.: Approximation Theorems of Mathematical Statistics. Wiley, New York (1980)
Shan, J.H., Wang, L.: Optimal Poisson subsampling decorrelated score for high-dimensional generalized linear models. J. Appl. Stat. (2024). https://doi.org/10.1080/02664763.2024.2315467
Taylor, J.W.: Estimating value at risk and expected shortfall using expectiles. J. Financ. Econom. 6(2), 231–252 (2008). https://doi.org/10.1093/jjfinec/nbn001
Tu, Y.D., Wang, S.W.: Jackknife model averaging for expectile regressions in increasing dimension. Econ. Lett. 197, 109607 (2020). https://doi.org/10.1016/j.econlet.2020.109607
Tu, Y.D., Wang, S.W.: Variable screening and model averaging for expectile regressions. Oxf. Bull. Econ. Stat. 85(3), 574–598 (2023). https://doi.org/10.1111/obes.12538
van de Geer, S., Bühlmann, P., Ritov, Y., Dezeure, R.: On asymptotically optimal confidence regions and tests for high-dimensional models. Ann. Stat. 42(3), 1166–1202 (2014). https://doi.org/10.1214/14-AOS1221
Van der Vaart, A.W.: Asymptotic Statistics. Cambridge University Press, Cambridge (1998)
Wang, L.: GEE analysis of clustered binary data with diverging number of covariates. Ann. Stat. 39(1), 389–417 (2011). https://doi.org/10.1214/10-AOS846
Wang, H.Y.: More efficient estimation for logistic regression with optimal subsamples. J. Mach. Learn. Res. 20(132), 1–59 (2019)
Wang, H.Y., Ma, Y.Y.: Optimal subsampling for quantile regression in big data. Biometrika 108(1), 99–112 (2021). https://doi.org/10.1093/biomet/asaa043
Wang, H.Y., Zhu, R., Ma, P.: Optimal subsampling for large sample logistic regression. J. Am. Stat. Assoc. 113(522), 829–844 (2018). https://doi.org/10.1080/01621459.2017.1292914
Wang, H.Y., Yang, M., Stufken, J.: Information-based optimal subdata selection for big data linear regression. J. Am. Stat. Assoc. 114(525), 393–405 (2019). https://doi.org/10.1080/01621459.2017.1408468
Wang, L., Elmstedt, J., Wong, W.K., Xu, H.: Orthogonal subsampling for big data linear regression. Ann. Appl. Stat. 15(3), 1273–1290 (2021). https://doi.org/10.1214/21-AOAS1462
Xiao, J.X., Yu, P., Song, X.Y., Zhang, Z.Z.: Statistical inference in the partial functional linear expectile regression model. Sci. China Math. 65(12), 2601–2630 (2022). https://doi.org/10.1007/s11425-020-1848-8
Xie, S.Y., Zhou, Y., Wan, A.T.K.: A varying-coefficient expectile model for estimating value at risk. J. Bus. Econ. Stat. 32(4), 576–592 (2014). https://doi.org/10.1080/07350015.2014.917979
Yang, Z.H., Wang, H.Y., Yan, J.: Subsampling approach for least squares fitting of semi-parametric accelerated failure time models to massive survival data. Stat. Comput. 34, 77 (2024). https://doi.org/10.1007/s11222-024-10391-y
Yao, Y.Q., Wang, H.Y.: A review on optimal subsampling methods for massive datasets. J. Data Sci. 19(1), 151–172 (2021). https://doi.org/10.6339/21-JDS999
Yu, J., Wang, H.Y., Ai, M.Y., Zhang, H.M.: Optimal distributed subsampling for maximum quasi-likelihood estimators with massive data. J. Am. Stat. Assoc. 117(537), 265–276 (2022). https://doi.org/10.1080/01621459.2020.1773832
Yu, J., Ai, M.Y., Ye, Z.Q.: A review on design inspired subsampling for big data. Stat. Pap. (2023). https://doi.org/10.1007/s00362-022-01386-w
Yu, J., Liu, J.Q., Wang, H.Y.: Information-based optimal subdata selection for non-linear models. Stat. Pap. 64, 1069–1093 (2023). https://doi.org/10.1007/s00362-023-01430-3
Zhang, C.H., Zhang, S.S.: Confidence intervals for low dimensional parameters in high dimensional linear models. J. R. Stat. Soc. Ser. B Stat. Methodol. 76(1), 217–242 (2014). https://doi.org/10.1111/rssb.12026
Zhou, P., Yu, Z., Ma, J.Y., Tian, M.Z., Fan, Y.: Communication-efficient distributed estimator for generalized linear models with a diverging number of covariates. Comput. Stat. Data Anal. 157, 107154 (2021). https://doi.org/10.1016/j.csda.2020.107154
Acknowledgements
The authors thank the Editor and anonymous reviewers for their constructive comments and suggestions, which greatly improved the quality of this work. The research of Xiaochao Xia was supported by the National Natural Science Foundation of China (Grant Number 11801202) and the Fundamental Research Funds for the Central Universities (Grant Number 2021CDJQY-047). The research of Zhimin Zhang was supported by the National Natural Science Foundation of China (Grant Numbers 12271066, 12171405, 11871121).
Author information
Authors and Affiliations
Contributions
Xiaoyan Li: Conceptualization, Methodology, Software, Validation, Writing - original draft. Xiaochao Xia: Supervision, Writing - review & editing. Zhimin Zhang: Supervision, Writing - review & editing.
Corresponding author
Ethics declarations
Conflict of interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendix A: Proofs
We provide the proofs of Theorems 1, 2 and 5 for diverging dimension below. The fixed-dimension case follows as the special case \(p_n=p\). In the proofs, we use C to denote a generic positive constant independent of \((n,N,p_n)\), whose magnitude may change from line to line. Define \( \mathcal {L}(\varvec{\beta })=\frac{1}{N}\sum _{i \in \mathcal {I}_{\mathcal {S}}}\frac{1}{\pi _i}\ell _{\tau }(y_i-\varvec{x}_i^{T}\varvec{\beta })=\frac{1}{N}\sum _{i=1}^{N} \frac{\eta _i}{\pi _i}\ell _{\tau }(y_i-\varvec{x}_i^{T}\varvec{\beta }), {Q}(\varvec{\beta }) =\frac{1}{N} \sum _{i=1}^{N} \frac{\eta _i}{\pi _i}\phi _{\tau }(y_i-\varvec{x}_i^{T}\varvec{\beta })\varvec{x}_i\).
Lemma A.1
(Tu and Wang 2020, Lemma A.1) For \(\ell _{\tau }(s)=s^2|\tau -I(s\le 0)|, \phi _{\tau }(s)=s|\tau -I(s \le 0)|, \psi _{\tau }(s)=|\tau -I(s \le 0)|, \Gamma (s,t)= I(s<0)-I(s+t<0)\), we have
-
(i)
\(\ell _{\tau }(s+t)-\ell _{\tau }(s)=2\phi _{\tau }(s)t+\psi _{\tau }(s)t^2 + (2\tau -1)(s+t)^2\Gamma (s,t)\),
-
(ii)
\(\phi _{\tau }(s+t)-\phi _{\tau }(s)=t\psi _{\tau }(s)+(2\tau -1)(s+t)\Gamma (s,t)\).
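The two identities of Lemma A.1 are purely algebraic and can be checked numerically; the sketch below (our own check, not part of the original proof) verifies them on random draws, which avoid the measure-zero boundary cases \(s=0\) and \(s+t=0\) where the strict and non-strict indicators differ.

```python
import numpy as np

rng = np.random.default_rng(0)
tau = 0.7

def psi(s):                      # psi_tau(s) = |tau - I(s <= 0)|
    return np.where(s <= 0, 1 - tau, tau)

def phi(s):                      # phi_tau(s) = s |tau - I(s <= 0)|
    return psi(s) * s

def ell(s):                      # ell_tau(s) = s^2 |tau - I(s <= 0)|
    return psi(s) * s**2

def gamma(s, t):                 # Gamma(s, t) = I(s < 0) - I(s + t < 0)
    return (s < 0).astype(float) - (s + t < 0).astype(float)

s = rng.standard_normal(10_000)
t = rng.standard_normal(10_000)

# identity (i)
lhs_i = ell(s + t) - ell(s)
rhs_i = 2*phi(s)*t + psi(s)*t**2 + (2*tau - 1)*(s + t)**2*gamma(s, t)
assert np.allclose(lhs_i, rhs_i)

# identity (ii)
lhs_ii = phi(s + t) - phi(s)
rhs_ii = t*psi(s) + (2*tau - 1)*(s + t)*gamma(s, t)
assert np.allclose(lhs_ii, rhs_ii)
```

Both identities reduce to \(\psi _{\tau }(s+t)-\psi _{\tau }(s)=(2\tau -1)\Gamma (s,t)\), which holds everywhere except on the boundary sets above.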
Lemma A.2
Let \(a_n=\sqrt{p_n/n}, \Gamma (s,t)= I(s<0)-I(s+t<0),\varvec{u} \in \mathbb {R}^{p_n} \) such that \(\Vert \varvec{u}\Vert \le c\), where c is a constant. Under the conditions (C2)-(C4), for \(k=2,8\), we have
Proof
It follows that
where the first inequality invokes condition (C2), the second inequality applies the Cauchy–Schwarz inequality, the last inequality uses the Schwarz and Loève \(c_r\) inequalities and the fact \(\varvec{u}^TE(\varvec{x}_i\varvec{x}_i^T)\varvec{u}\le \Vert \varvec{u}\Vert ^2 \lambda _{max}\left[ E\left( \varvec{x}_i\varvec{x}_i^T \right) \right] \), and the last line holds by conditions (C3)(ii) and (C4)(i). \(\square \)
Lemma A.3
Under the conditions (C1), (C3)-(C5), if \(p_n^3/n \rightarrow 0\), we have
Proof
Let \(\sqrt{n}A_nV^{-1/2}Q(\varvec{\beta }_0)=\sum _{i=1}^{N}\frac{\sqrt{n}\eta _i}{N\pi _i}A_n \times V^{-1/2}\phi _{\tau }(\varepsilon _i)\varvec{x}_i=: \sum _{i=1}^{N}\varvec{\xi }_{i}\). Now, we check the condition of the Lindeberg–Feller central limit theorem (Proposition 2.27 in Van der Vaart (1998)). For any \(\epsilon >0\),
where the fourth line applies the fact \(|\phi _{\tau }(\varepsilon _i)|\le |\varepsilon _i|\) and the conclusion \(|||A_n V^{-1/2}|||=O(1)\), the sixth line is due to condition (C3)(iii), the seventh line holds by the Cauchy–Schwarz inequality, the eighth line uses Loève's \(c_r\) inequality, the last inequality invokes the conditions (C5) and (C3)(ii), and the last equation uses the condition \(p_n^3/n \rightarrow 0\). Next, we show \( |||A_nV^{-1/2} |||=O(1)\). For any \(A_n\), by condition (C4)(ii), we have
where the second inequality applies the conclusion \(tr(UW)\le \lambda _{\max }(U)tr(W)\) for any symmetric matrix U and positive semidefinite matrix W (Bernstein 2005). Thus the condition of Lindeberg-Feller central limit theorem is satisfied. Note that
where the last line invokes condition (C4)(ii). Then, the desired result holds by Lindeberg–Feller central limit theorem. \(\square \)
Lemma A.4
Let \(Q_1(\Delta )=\sum _{i=1}^{N}\frac{\sqrt{n}\eta _i}{N\pi _i}\psi _{\tau }(\varepsilon _i)A_n \times V^{-1/2}\varvec{x}_i\varvec{x}_i^T\Delta \), where \(\Delta =\varvec{\beta }-\varvec{\beta }_0\). Under the conditions (C3) and (C5), if \(p_n^5/n \rightarrow 0\) we have
Proof
Note that \(Q_1(\Delta )=\sqrt{n}A_nV^{-1/2}\mathcal {H}({\varvec{\beta }} _0)\Delta \), where \(\mathcal {H}({\varvec{\beta }} _0)=\frac{1}{N}\sum _{i=1}^{N}\frac{\eta _i}{\pi _i}\psi _{\tau }(\varepsilon _i)\varvec{x}_i\varvec{x}_i^T\), then
Denote \({\mathcal {D}^*} = \mathcal {H}({\varvec{\beta }} _0)-D=(\mathcal {D}_{lm}^*), 1\le l,m \le p_n\). Using the fact that \(|||U |||\le q {\mathop {\max }\nolimits _{1 \le i,j \le q} |U_{ij}|}\) for a \(q \times q\) matrix U, we have, \(\forall \epsilon >0\),
where the second inequality applies Boole's inequality, the fourth inequality uses Markov's inequality, the first equation is due to the fact \(E(\mathcal {D}_{lm}^{*})=0\), the sixth inequality holds by the fact \(\psi _{\tau }(\varepsilon _i) \le 1\), the seventh inequality uses Hölder's inequality, and the last line invokes conditions (C3)(ii) and (C5). (A.4) implies
By (A.2), (A.3), (A.5) and the condition \(p_n^5/n \rightarrow 0\), we have
\(\square \)
Lemma A.5
Let \(g_i(s)=(\varepsilon _i-s)[I(\varepsilon _i<0) -I(\varepsilon _i <s)]\), \(Q_2(\Delta ) =\sum _{i=1}^{N}\frac{\sqrt{n}\eta _i}{N\pi _i}g_i\left( \varvec{x}_i^T\Delta \right) A_n V^{-1/2}\varvec{x}_i\), where \(\Delta =\varvec{\beta }-\varvec{\beta }_0\). Under the conditions (C2)-(C6), if \(p_n^4/n \rightarrow 0\), we have
Proof
Let \(\varvec{b}_i=A_n V^{-1/2}\varvec{x}_i=\varvec{b}_i^{+}-\varvec{b}_i^{-}\), where \({b}_{ij}^{+}=\max \{ {b}_{ij},0\}\), \({b}_{ij}^{-}=\max \{ -{b}_{ij},0\}\), and \({b}_{ij}^{+}, {b}_{ij}^{-}, {b}_{ij}\) denote the jth components of \(\varvec{b}_{i}^{+}, \varvec{b}_{i}^{-}, \varvec{b}_{i}\), respectively (\(j=1,\ldots ,q\)). Note that,
Firstly, we show \(J_1=o_{P}(1)\). By the triangle inequality,
where \(Q_2^{+}(\Delta )=\sum _{i=1}^{N}\frac{\sqrt{n}\eta _i}{N\pi _i}g_i\left( \varvec{x}_i^T\Delta \right) \varvec{b}_i^{+}\) and \(Q_2^{-}(\Delta )=\sum _{i=1}^{N}\frac{\sqrt{n}\eta _i}{N\pi _i}g_i\left( \varvec{x}_i^T\Delta \right) \varvec{b}_i^{-}\). It suffices to prove that \(J_{1l}=o_{P}(1)\) for \( l=1,2\). We only show \(J_{11}=o_{P}(1)\); the second term can be proved in the same way. Let \(\mathbb {D} = \left\{ \Delta \in \mathbb {R}^{p_n} \Big | \Vert \Delta \Vert \le C\sqrt{p_n/n} \right\} \), then by selecting \(N_0=n^{2p_n}\) grid points \(\{ \Delta _t \}_{t=1}^{N_0}\), \(\mathbb {D}\) can be covered by \(\bigcup _{t=1}^{N_0}\mathbb {D}_t\), where \(\mathbb {D}_t = \left\{ \Delta \in \mathbb {R}^{p_n} \Big | \Vert \Delta - \Delta _t\Vert _{\infty } \le \delta _n \right\} \) with \(\delta _n= Cp_n^{1/2}n^{-5/2}\). Define \(w_{it}(s)=g_i(\varvec{x}_i^T \Delta _t - s\Vert \varvec{x}_i\Vert )\). Since \(g_i(s)\) is monotone, by the triangle inequality, we have
Next, we consider \(J_{111}\). Let \(d_i =\frac{\sqrt{n}\eta _i}{\pi _i} \varvec{x}_i^T\Delta _t \varvec{b}_i^{+}\), \(\zeta _{it}=\frac{\sqrt{n}\eta _i}{\pi _i}g_i\left( \varvec{x}_i^T\Delta _t \right) \varvec{b}_i^{+}\), \(\zeta _{it}^*=\zeta _{it} - E(\zeta _{it})\) and \(e_n= Nn^{-1/2}p_n^{3/2}\). Then
It is sufficient to show \(J_{111}=o_{P}(1)\) by demonstrating that \(\max _{1\le t \le N_0}\Vert J_{lt}^{*}\Vert =o_{P}(1)\) for \(l=1,2,3\). First, consider \(J_{1t}^{*}\). Note that \( E\Bigg \{\zeta _{it}^* I(\Vert d_i \Vert \le e_{n})-E\left[ \zeta _{it}^* I(\Vert d_i \Vert \le e_{n})\right] \Bigg \}=\varvec{0} \) and
where
where the second inequality applies Jensen’s inequality, and the last line holds by condition (C4)(i). Similarly, we have \(E(\Vert \zeta _{it}\Vert )=O(p_n^{3/2}/\sqrt{n})\) and \(E\Vert \zeta _{it}I(\Vert d_i \Vert \le e_{n})\Vert =O(e_n\sqrt{p_n/n})\). Thus
By the fact \(|g_i(s)|\le |s|\),
Then
Thus, applying Boole’s and Bernstein’s inequalities (Serfling 1980), \(\forall \varepsilon >0\), we have
where the last inequality is due to \(n^{1/2}p_n^{-3/2} \gg n^{3/2}N^{-1}p_n^{-7/2}\) and \(n^{3/2}N^{-1}p_n^{-7/2} \gg p_n\log n \) by condition (C6). This implies
Note that
where the first inequality invokes condition (C2), the third inequality is due to (A.2) and the Cauchy–Schwarz inequality, the last inequality holds by Loève's \(c_r\) inequality, and the last line invokes conditions (C5) and (C3)(ii). Similarly, we have
where the second line applies the Cauchy–Schwarz inequality, and the last line holds by the condition \(p_n^4/n \rightarrow 0\). For \(J_{3t}^{*}\), let \(d_i^{*}=n^{1/2}N^{-1/2}p_n^{-3/2}d_i\), then \(E\Vert d_i^{*}\Vert ^2=O(1)\) by (A.8). By the monotonicity of probability, Boole's inequality and the Lebesgue dominated convergence theorem, we can derive that
which results in
(A.6), (A.9) and (A.10) imply \(J_{111}=o_{P}(1)\).
For \(J_{112}\), we have
where the first equation holds by (A.2), condition (C3)(ii) and Loève’s \(c_r\) inequality.
By similar arguments, we can obtain \(J_{113}=o_{P}(1)\). Thus \(J_{1}=o_{P}(1)\).
Finally, we show \(J_{2}=o_{P}(1)\). Note that
where the last line invokes condition \(p_n^4/n \rightarrow 0\).
The proof of Lemma A.5 is complete. \(\square \)
Proof of Theorem 1
Let \(\mathscr {B}=\{\varvec{\beta }: \varvec{\beta }=\varvec{\beta }_0+\varvec{u} a_n, \varvec{u} \in \mathbb {R}^{p_n}, \Vert \varvec{u}\Vert \le c\}\) for \(a_n=\sqrt{{p_n/}{n}}\) and some constant c. By Fan and Li (2001), it is sufficient to demonstrate that for any \(\epsilon >0\), there exists a sufficiently large constant c such that
for large enough n. This implies that there is a local minimizer \(\widetilde{\varvec{\beta }}_{\mathcal {S}}\) of \({\mathcal {L}}(\varvec{\beta })\) in \(\mathscr {B}\) that satisfies \(\Vert \widetilde{\varvec{\beta }}_{\mathcal {S}}- {\varvec{\beta }_0}\Vert =O_{p}(a_n)\). The local minimizer is the global minimizer because of the convexity of \({\mathcal {L}}(\varvec{\beta })\). By Lemma A.1 (i), we have
Firstly, we consider the term \(I_1\). Note that
where the second equation is due to the fact \(E\left[ \frac{\eta _i}{\pi _i}\phi _{\tau }(\varepsilon _i)\varvec{x_i}^T\varvec{u}\right] =0\), the first inequality uses the fact \(|\phi _{\tau }(\varepsilon _i)|\le |\varepsilon _i|\), the fourth line applies condition (C3)(iii) and the Cauchy–Schwarz inequality, the last inequality holds by Loève's \(c_r\) inequality, and the last line invokes conditions (C5) and (C3)(ii). Then, by Markov's inequality and (A.13), we have
The second term of (A.12), \(I_2\), can be decomposed as follows
where \(\mathcal {H}({\varvec{\beta }} _0)=\frac{1}{N}\sum _{i=1}^{N}\frac{\eta _i}{\pi _i}\psi _{\tau }(\varepsilon _i)\varvec{x}_i\varvec{x}_i^T\). Combining (A.5), (A.15), conditions (C4)(i) and \(p_n^4/n \rightarrow 0\), we have
For \(I_3\), by Lemma A.2,
and
where \(C=(2\tau -1)^2\). Thus
by Chebyshev's inequality and the condition \(p_n^4/n\rightarrow 0\). Combining (A.14), (A.16) and (A.19), the second term \(I_2\), which is positive, dominates the other two terms for a sufficiently large c with probability approaching one. Thus, (A.11) holds. \(\square \)
Proof of Theorem 2
Let \(\Delta =\varvec{\beta }- \varvec{\beta }_0 \) and \(\widetilde{\Delta }=\widetilde{\varvec{\beta }}_{\mathcal {S}}- \varvec{\beta }_0 \). By Lemma A.1 (ii),
Then, we have \(\sqrt{n}A_n V^{-1/2} \left[ Q(\widetilde{\varvec{\beta }}_{\mathcal {S}})-Q(\varvec{\beta }_0) \right] =-Q_1(\widetilde{\Delta }) + (2\tau - 1) Q_2(\widetilde{\Delta })\). By Lemma A.4–A.5 and the fact \(Q(\widetilde{\varvec{\beta }}_{\mathcal {S}})=0\),
Then the desired result holds by Lemma A.3 and Slutsky's theorem. \(\square \)
Proof of Theorem 5
Without loss of generality, we assume \(h_i>0, i=1,\ldots ,N, h_{(N+1)}=+\infty \) and \(h_1 \le h_2 \le \dots \le h_N\). According to the L-optimal criterion, minimizing the empirical AMSE of \(\Lambda _n\widetilde{\varvec{\beta }}\) is equivalent to minimizing \(tr(\Lambda _nD_N^{-1}V_ND_N^{-1}\Lambda _n)\). Thus the optimization problem can be described as follows:
By Cauchy–Schwarz inequality,
where the equation in the last line holds if and only if \(\pi _i\propto h_i\), that is to say, when \(\pi _i\propto h_i\), \(tr(\Lambda _nD_N^{-1}V_ND_N^{-1}\Lambda _n)\) attains the minimum. Note that all \(\pi _i\)s need to satisfy \(0\le \pi _i \le 1\). We consider two scenarios:
(1) If \(nh_i/(\sum _{j=1}^{N}h_j)\le 1\) for \(i=1,\ldots ,N\), then \(\pi _i^{opt}= \frac{nh_i }{\sum _{j=1}^{N}h_j}\).
(2) If there exist indices i with \(nh_i/(\sum _{j=1}^{N}h_j)> 1\), suppose there are k such indices. Specifically, for any such i, \(nh_i>\sum _{j=1}^{N}h_j=\sum _{j=1}^{N-k}h_j+ \sum _{j=N-k+1}^{N}h_j > (n-k)h_{N-k}+kh_{N-k} = nh_{N-k}\), then \(h_i>h_{N-k}\), which yields \(i>N-k\). In this case, the original optimization problem (A.20) is equivalent to
Similarly, applying Cauchy-Schwarz inequality,
where the equation in the last line holds if and only if \(\pi _i\propto h_i\), namely, when
\(tr(\Lambda _nD_N^{-1}V_ND_N^{-1}\Lambda _n)\) attains the minimum. Next, we unify the two expressions for \(\pi _i\). Suppose there exists an H such that
and \(h_{N-k}<H<h_{N-k+1}\), then it follows that
By (A.21) and (A.24), it follows that
Let \(\pi _i^{opt}=n \frac{h_i\wedge H}{\sum _{j=1}^{N}(h_j\wedge H)}\) and \(\varvec{\pi }^{opt}=(\pi _1^{opt},\ldots ,\pi _N^{opt})\). Substituting \(\varvec{\pi }^{opt}\) into (A.22), we obtain
which implies \(\pi _i^{opt}\) is the optimal solution of (A.22).
Finally, we verify that such an H exists and satisfies \(h_{N-k}<H<h_{N-k+1}\). The definition of k implies that
Let \(H_1=h_{N-k+1}, H_2=h_{N-k}\), then
and
As a result,
Note that \(\max \limits _{i=1,\ldots ,N}n \frac{h_i\wedge H}{\sum _{j=1}^{N}(h_j\wedge H)}\) is continuous with respect to H, then there must exist \(h_{N-k}<H<h_{N-k+1}\) such that \(\max \limits _{i=1,\ldots ,N}n \frac{h_i\wedge H}{\sum _{j=1}^{N}(h_j\wedge H)}=1\).
Scenario (1) is a special case of scenario (2), corresponding to \(k=0\). The proof of Theorem 5 is complete. \(\square \)
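The construction in this proof can be computed directly: assign \(\pi _i=1\) to the largest units until the remaining ones fit proportionally, which corresponds to the cap \(H=\sum _{j\le N-k}h_{j}/(n-k)\) in \(\pi _i^{opt}=n(h_i\wedge H)/\sum _{j}(h_j\wedge H)\). A minimal sketch (the function name is ours, not from the paper):

```python
import numpy as np

def optimal_poisson_probs(h, n):
    """pi_i = n (h_i ∧ H) / sum_j (h_j ∧ H), with the cap H chosen
    so that max_i pi_i <= 1; preserves sum(pi) = n."""
    h = np.asarray(h, dtype=float)
    order = np.argsort(h)[::-1]       # indices by decreasing h
    hs = h[order]
    tail = hs.sum()                   # sum over units not yet capped
    k = 0                             # number of units with pi = 1
    # cap the largest units one at a time until the proportional rule
    # on the remaining units stays within [0, 1]
    while k < len(hs) and (n - k) * hs[k] > tail:
        tail -= hs[k]
        k += 1
    pi_sorted = np.empty_like(hs)
    pi_sorted[:k] = 1.0
    pi_sorted[k:] = (n - k) * hs[k:] / tail
    pi = np.empty_like(pi_sorted)
    pi[order] = pi_sorted             # undo the sort
    return pi

# one dominant unit gets capped at 1; the others share the rest
pi = optimal_poisson_probs([1.0, 1.0, 1.0, 10.0], n=2)
```

With the loop exit condition, \((n-k)h_{(k+1)}\le \) tail, so every uncapped probability is at most 1, and the total \(k+(n-k)=n\) is preserved, matching the constraint of the optimization problem.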
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Li, X., Xia, X. & Zhang, Z. Poisson subsampling-based estimation for growing-dimensional expectile regression in massive data. Stat Comput 34, 133 (2024). https://doi.org/10.1007/s11222-024-10449-x
Received:
Accepted:
Published:
DOI: https://doi.org/10.1007/s11222-024-10449-x