Quantile forward regression for high-dimensional survival data

Lee, Eun Ryung; Park, Seyoung; Lee, Sang Kyu; Hong, Hyokyoung G.

doi:10.1007/s10985-023-09603-w

Quantile forward regression for high-dimensional survival data

Published: 02 July 2023

Volume 29, pages 769–806, (2023)
Cite this article

Lifetime Data Analysis Aims and scope Submit manuscript

Eun Ryung Lee¹^na1,
Seyoung Park¹^na1,
Sang Kyu Lee^2,3 &
…
Hyokyoung G. Hong ORCID: orcid.org/0000-0002-4280-6243³

420 Accesses
Explore all metrics

Abstract

Despite the urgent need for an effective prediction model tailored to individual interests, existing models have mainly been developed for the mean outcome, targeting average people. Additionally, the direction and magnitude of covariates’ effects on the mean outcome may not hold across different quantiles of the outcome distribution. To accommodate the heterogeneous characteristics of covariates and provide a flexible risk model, we propose a quantile forward regression model for high-dimensional survival data. Our method selects variables by maximizing the likelihood of the asymmetric Laplace distribution (ALD) and derives the final model based on the extended Bayesian Information Criterion (EBIC). We demonstrate that the proposed method enjoys a sure screening property and selection consistency. We apply it to the national health survey dataset to show the advantages of a quantile-specific prediction model. Finally, we discuss potential extensions of our approach, including the nonlinear model and the globally concerned quantile regression coefficients model.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Log in via an institution

Subscribe and save

Springer+ Basic

EUR 32.99 /Month

Get 10 units per month
Download Article/Chapter or Ebook
1 Unit = 1 Article or 1 Chapter
Cancel anytime

Subscribe now

Buy Now

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

A Bayesian variable selection approach to longitudinal quantile regression

Article 27 May 2022

Feature Screening for High-Dimensional Survival Data via Censored Quantile Correlation

Article 07 November 2020

Conditional-quantile screening for ultrahigh-dimensional survival data via martingale difference correlation

Article 08 August 2018

References

Bang H, Tsiatis AA (2002) Median regression with censored cost data. Biometrics 58(3):643–649
Article MathSciNet MATH Google Scholar
Barut E, Fan J, Verhasselt A (2016) Conditional sure independence screening. J Am Stat Assoc 111(515):1266–1277
Article MathSciNet Google Scholar
Belloni A, Chernozhukov V (2011) $\ell _1$-penalized quantile regression in high-dimensional sparse models. Ann Stat 39(1):82–130
Article MATH Google Scholar
Chen J, Chen Z (2008) Extended Bayesian information criteria for model selection with large model spaces. Biometrika 95(3):759–771
Article MathSciNet MATH Google Scholar
Cheng MY, Honda T, Zhang JT (2016) Forward variable selection for sparse ultra-high dimensional varying coefficient models. J Am Stat Assoc 111(515):1209–1221
Article MathSciNet Google Scholar
Eli S, Tangvik RJ, Nymo LS, Harthug S, Lassen K, Viste A (2020) Weight loss and bmi criteria in GLIM’s definition of malnutrition is associated with postoperative complications following abdominal resections - results from a national quality registry. Clin Nutrit 39(5):1593–1599
Article Google Scholar
Fan J, Lv J (2008) Sure independence screening for ultrahigh dimensional feature space (with discussion). J Royal Stat Soc: Series B (Stat Methodol) 70(5):849–911
Article MathSciNet MATH Google Scholar
Fan J, Lv J (2010) A selective overview of variable selection in high dimensional feature space. Stat Sin 20(1):101–148
MathSciNet MATH Google Scholar
Fan J, Song R (2010) Sure independence screening in generalized linear models with NP-dimensionality. Ann Stat 38:3567–3604
Article MathSciNet MATH Google Scholar
Fard NA, Morales GDF, Mejova Y, Schifanella R (2021) On the interplay between educational attainment and nutrition: a spatially-aware perspective. EPJ Data Sci 10(1):18
Article Google Scholar
Flegal KM, Kit BK, Orpana H, Graubard BI (2013) Association of all-cause mortality with overweight and obesity using standard body mass index categories: a systematic review and meta-analysis. JAMA 309(1):71–82
Article Google Scholar
Gearhardt AN, Corbin WR (2009) Body mass index and alcohol consumption: family history of alcoholism as a moderator. Psychol Addict Behav 23(2):216–225
Article Google Scholar
He X, Wang L, Hong HG (2013) Quantile-adaptive model-free variable screening for high-dimensional heterogeneous data. Ann Stat 41:342–369
MathSciNet MATH Google Scholar
Honda T, Lin C (2022) Forward variable selection for ultra-high dimensional quantile regression models. Ann Instit Stat Math 1–32
Hong HG, Kang J, Li Y (2018) Conditional screening for ultra-high dimensional covariates with survival outcomes. Lifetime Data Anal 24(1):45–71
Article MathSciNet MATH Google Scholar
Hong HG, Christiani DC, Li Y (2019) Quantile regression for survival data in modern cancer research: expanding statistical tools for precision medicine. Precis Clin Med 2(2):90–99
Article Google Scholar
Hong HG, Zheng Q, Li Y (2019) Forward regression for Cox models with high-dimensional covariates. J Multivar Anal 173:268–290
Article MathSciNet MATH Google Scholar
Hwang WY, Zhang HH, Ghosal S (2009) First: combining forward iterative selection and shrinkage in high dimensional sparse linear regression. Stat Interface 2:341–348
Article MathSciNet MATH Google Scholar
Karavasiloglou N, Pestoni G, Wanner M, Faeh D, Rohrmann S (2019) Healthy lifestyle is inversely associated with mortality in cancer survivors: results from the third national health and nutrition examination survey (NHANES III). PLOS ONE 14(6):1–11
Article Google Scholar
Kleiner KD, Gold MS, Frostpineda K, Lenzbrunsman B, Perri MG, Jacobs WS (2004) Body mass index and alcohol use. J Addict Dis 23(3):105–118
Article Google Scholar
Knight K (1998) Limiting distributions for $l_1$ regression estimators under general conditions. Ann Stat 26(2):755–770
Article MATH Google Scholar
Koenker R, Machado JAF (1999) Goodness of fit and related inference processes for quantile regression. J Am Stat Assoc 94:1296–1310
Article MathSciNet MATH Google Scholar
Kong Y, Li Y, Zerom D (2019) Screening and selection for quantile regression using an alternative measure of variable importance. J Multiv Anal 173:435–455
Article MathSciNet MATH Google Scholar
Ledoux M, Talagrand M (1991) Probability in Banach Spaces: Isoperimetry and Processes. Springer, New York
Book MATH Google Scholar
Lee ER, Noh H, Park BU (2014) Model selection via Bayesian information criterion for quantile regression models. J Am Stat Assoc 109:216–229
Article MathSciNet MATH Google Scholar
Leng C, Tong X (2013) A quantile regression estimator for censored data. Bernoulli 19(1):344–361, http://www.jstor.org/stable/23525643
Liu J, Zhong W, Li R (2015) A selective overview of feature screening for ultrahigh-dimensional data. Sci China Math 58:20–33
Article MathSciNet MATH Google Scholar
Luo S, Chen Z (2014) Sequential lasso cum EBIC for feature selection with ultra-high dimensional feature space. J Am Stat Assoc 109:1229–1240
Article MathSciNet MATH Google Scholar
Ma S, Li R, Tsai CL (2017) Variable screening via quantile partial correlation. J Am Stat Assoc 112:650–663
Article MathSciNet Google Scholar
Must A, Spadano J, Coakley EH, Field AE, Colditz G, Dietz WH (1999) The disease burden associated with overweight and obesity. JAMA 282(16):1523–1529
Article Google Scholar
Park S, He X (2017) Hypothesis testing for regional quantiles. J Stat Plan Inference 191:13–24
Article MathSciNet MATH Google Scholar
Park S, Lee ER (2021) Hypothesis testing of varying coefficients for regional quantiles. Comput Stat Data Anal 159:107204
Article MathSciNet MATH Google Scholar
Park S, Lee ER, Zhao H (2022) Low-rank regression models for multiple binary responses and their applications to cancer cell-line encyclopedia data. J Am Stat Assoc. https://doi.org/10.1080/01621459.2022.2105704
Article Google Scholar
Peng L (2021) Quantile regression for survival data. Annu Rev Stat Appl 8(1):413–437
Article MathSciNet Google Scholar
Pijyan A, Zheng Q, Hong HG, Li Y (2020) Consistent estimation of generalized linear models with high dimensional predictors via stepwise regression. Entropy 22(9):965
Article MathSciNet Google Scholar
Radchenko P, James GM (2011) Improved variable selection with forward-lasso adaptive shrinkage. Ann Appl Stat 5:427–448
Article MathSciNet MATH Google Scholar
Sherwood B, Wang L (2016) Partially linear additive quantile regression in ultra-high dimension. Ann Stat 44:288–317
Article MathSciNet MATH Google Scholar
Sluik D, Brouwer-Brolsma EM, Berendsen AAM, Mikkilä V, Poppitt SD, Silvestre MP, Tremblay A, Pérusse L, Bouchard C, Raben A, Feskens EJM (2019) Protein intake and the incidence of pre-diabetes and diabetes in 4 population-based studies: the preview project. Am J Clin Nutrit 109(5):1310–1318
Article Google Scholar
Tibshirani R (1997) The lasso method for variable selection in the cox model. Stat Medi 28:385–395
Article Google Scholar
van der Vaart Wellner JA (1996) Weak Convergence and Empirical Processes: With Applications to Statistics. Springer Series in Statistics, Springer, New York
Book MATH Google Scholar
Wang H (2009) Forward regression for ultra-high dimensional variable screening. J Am Stat Assoc 104(488):1512–1524
Article MathSciNet MATH Google Scholar
Wang L, Wu Y, Li R (2012) Quantile regression for analyzing heterogeneity in ultra-high dimension. J Am Stat Assoc 107:214–222
Article MathSciNet MATH Google Scholar
Yu K, Moyeed RA (2001) Bayesian quantile regression. Stat Prob Lett 54:437–447
Article MathSciNet MATH Google Scholar
Zhang CH, Huang J (2008) The sparsity and bias of the lasso selection in high-dimensional linear regression. Ann Stat 36:1567–1594
Article MathSciNet MATH Google Scholar
Zheng Q, Peng L, He X (2015) Globally adaptive quantile regression with ultra-high dimensional data. Ann Stat 43:2225–2258
Article MathSciNet MATH Google Scholar
Zheng Q, Hong HG, Li Y (2020) Building generalized linear models with ultrahigh dimensional features: a sequentially conditional approach. Biometrics 76(1):47–60
Article MathSciNet MATH Google Scholar

Download references

Author information

Eun Ryung Lee and Seyoung Park have contributed equally as first-authors.

Authors and Affiliations

Department of Statistics, Sungkyunkwan University, Seoul, 03063, Korea
Eun Ryung Lee & Seyoung Park
Department of Statistics and Probability, Michigan State University, East Lansing, MI, 48823, USA
Sang Kyu Lee
Biostatistics Branch, National Cancer Institute, Bethesda, MD, 20892, USA
Sang Kyu Lee & Hyokyoung G. Hong

Authors

Eun Ryung Lee
View author publications
You can also search for this author in PubMed Google Scholar
Seyoung Park
View author publications
You can also search for this author in PubMed Google Scholar
Sang Kyu Lee
View author publications
You can also search for this author in PubMed Google Scholar
Hyokyoung G. Hong
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Hyokyoung G. Hong.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Below is the link to the electronic supplementary material.

Supplementary file 1 (pdf 146 KB)

Appendix 1 (Theoretical Details)

Throughout the proof, for any set $S \subseteq \{1,\ldots ,p\}$, let

$$\begin{aligned} \ell _S(\beta _S) := \frac{1}{n} \sum _{i=1}^n \frac{\varDelta _i}{\hat{G}_i} \rho _\tau (Y_i^C-{{\varvec{X}}}_{iS}^\top \beta _S), \ell ^o_S(\beta _S) := \frac{1}{n} \sum _{i=1}^n \frac{\varDelta _i}{G^o_i} \rho _\tau (Y_i^C-{{\varvec{X}}}_{iS}^\top \beta _S), \end{aligned}$$

where $G^o_i := G(Y_i^C)$ is the censoring probability, and $\beta ^*_S := \mathop {\mathrm {arg\,min}}\limits _{\beta _S} E[\ell _S(\beta _S)]$. Given a random sample $Z_1,\ldots , Z_n$ and a function $f(\cdot )$, let $G_n(f) := n^{-1/2} \sum _{i=1}^n (f(Z_i)-E[f(Z_i)])$.

We first introduce useful lemmas and prove the main theorems.

1.1 Lemmas and proofs

The following Lemma 1 is given in Lemma 8.4 of He et al. (2013).

Lemma 1

Suppose Assumptions 5-6. Then, the Kaplan-Meier estimator ${\hat{G}}(t)$ satisfies

$$\begin{aligned} \sup _{0 \le t \le T} |{\hat{G}}(t)- G(t)|= & {} O\left( n^{-1/2} (\log n)^{1/2}\right) \quad \text{ almost } \text{ surely }.\\ \sup _{0 \le t \le T} |{\hat{G}}(t)^{-1}- G(t)^{-1}|= & {} O\left( n^{-1/2} (\log n)^{1/2}\right) \quad \text{ almost } \text{ surely }. \end{aligned}$$

The following lemmas will be used to prove the main theorems.

Lemma 2

Suppose Assumptions 3,5, and 6. Given an index set S, let $B_S^o(d_1) := \{\beta _S:\ \Vert \beta _S-\tilde{\beta }_S\Vert \le d_1/\sqrt{s}\}$. Then, we have

$$\begin{aligned} \sup _{\beta _S \in B^o_S(d_1),\ |S| \le \rho } \left| \ell ^o_S(\beta _S) - \ell ^o_S(\tilde{\beta }_S) - \left( \ell _S(\beta _S) - \ell _S(\tilde{\beta }_S)\right) \right| = O\left( d_1 s^{-1/2} n^{-1/2} (\log n)^{1/2}\right) . \end{aligned}$$

Proof

We have

$$\begin{aligned}&\sup _{\beta _S \in B^o_S(d_1),\ |S| \le \rho } \left| \ell ^o_S(\beta _S) - \ell ^o_S(\tilde{\beta }_S) - \left( \ell _S(\beta _S) - \ell _S(\tilde{\beta }_S)\right) \right| \\&=\sup _{\beta _S \in B^o_S(d_1),\ |S| \le \rho } \left| \frac{1}{n} \sum _{i=1}^n \left( \frac{\varDelta _i}{G^o_i} - \frac{\varDelta _i}{{\hat{G}}_i} \right) \left\{ \rho _\tau (Y_i^C-{{\varvec{X}}}_{iS}^\top \beta _S) - \rho _\tau (Y_i^C-{{\varvec{X}}}_{iS}^\top \tilde{\beta }_S)\right\} \right| \\&\le O\left( n^{-1/2} (\log n)^{1/2}\right) \sup _{\beta _S \in B^o_S(d_1),\ |S| \le \rho } \frac{1}{n} \sum _{i=1}^n \left| {{\varvec{X}}}_{iS}^\top (\beta _S - \tilde{\beta }_S)\right| \\&\le O\left( n^{-1/2} (\log n)^{1/2}\right) \sup _{\beta _S \in B^o_S(d_1),\ |S| \le \rho } \sqrt{ \sum _{i=1}^n \left| {{\varvec{X}}}_{iS}^\top (\beta _S - \tilde{\beta }_S)\right| ^2/n } \\&\le O\left( n^{-1/2} (\log n)^{1/2}\right) \lambda _{M}^{1/2} \sup _{\beta _S \in B^o_S(d_1),\ |S| \le \rho } \Vert \beta _S - \tilde{\beta }_S\Vert \\&= O\left( d_1 s^{-1/2} n^{-1/2} (\log n)^{1/2}\right) , \end{aligned}$$

where the first inequality follows from Lemma 1 and the fact that $|\rho _\tau (a)-\rho _\tau (b)| \le |a-b|$ for numbers a and b, the second inequality follows from the Cauchy-Schwarz inequality, the third inequality follows from Assumption 3. This completes the proof. $\square$

Lemma 3

Suppose conditions of Theorem 1. Given an index set S, let $B_S^o(d_1) := \{\beta _S:\ \Vert \beta _S-\tilde{\beta }_S\Vert \le d_1/\sqrt{s}\}$. Then, with probability at least $1-6\exp (-6\rho \log p)$,

$$\begin{aligned} \sup _{\beta _S \in B^o_S(d_1),\ |S| \le \rho } |\ell _S(\beta _S) - \ell _S(\tilde{\beta }_S) - E [\ell _S(\beta _S)] + E[\ell _S(\tilde{\beta }_S)]| \le 2A_3 d_1 \sqrt{\rho \log p/n}, \end{aligned}$$

where $A_3= 4\sqrt{\lambda _{M}} \tau _0^{-1}$.

Proof

Note that

$$\begin{aligned} \ell ^o_S(\beta _S) - \ell ^o_S(\tilde{\beta }_S) - E [\ell ^o_S(\beta _S)] + E[\ell ^o_S(\tilde{\beta }_S)] = \frac{1}{\sqrt{n}} G_n\left( \frac{\varDelta _i}{G^o_i}\rho _\tau (Y_i-{{\varvec{X}}}_{iS}^\top \beta _S) -\frac{\varDelta _i}{G^o_i} \rho _\tau (Y_i-{{\varvec{X}}}_{iS}^\top \tilde{\beta }_S)\right) . \end{aligned}$$

Fix S with $s=|S| \le \rho$. Let $t:= d_1/\sqrt{s}$. Let

$$\begin{aligned} A(t) := \sup _{\beta _S \in B^o_S(d_1)} \left| G_n\left( \frac{\varDelta _i}{G^o_i}\rho _\tau (Y_i-{{\varvec{X}}}_{iS}^\top \beta _S) -\frac{\varDelta _i}{G^o_i} \rho _\tau (Y_i-{{\varvec{X}}}_{iS}^\top \tilde{\beta }_S)\right) \right| . \end{aligned}$$

For any $\beta _S \in B^o_S(d_1)$, we have by Assumption 5,

$$\begin{aligned}&\text{ var }\left( G_n\left( \frac{\varDelta _i}{G^o_i}\rho _\tau (Y_i-{{\varvec{X}}}_{iS}^\top \beta _S) -\frac{\varDelta _i}{G^o_i} \rho _\tau (Y_i-{{\varvec{X}}}_{iS}^\top \tilde{\beta }_S)\right) \right) \\&\le \tau _0^{-2} E[|{{\varvec{X}}}_{iS}^\top (\beta _S-\tilde{\beta }_S)|^2] \le \lambda _{M} \tau _0^{-2} t^2. \end{aligned}$$

By symmetrization lemma for probabilities, we obtain that for $M > 2\sqrt{\lambda _{M}} \tau _0^{-1} t$,

$$\begin{aligned} P(A(t) \ge M) \le {2P(A^o(t) \ge M/4)}/ \left\{ 1- 4\lambda _{M} \tau _0^{-2} t^2 / M^2\right\} , \end{aligned}$$

where $A^o(t)$ is the symmetrized version of A(t), i.e.,

$$\begin{aligned} A^o(t) := \sup _{\beta _S \in B^0_S(t)}\left| G_n\left( \frac{V_i\varDelta _i}{G^o_i}\rho _\tau (Y_i-{{\varvec{X}}}_{iS}^\top \beta _S) -\frac{V_i\varDelta _i}{G^o_i} \rho _\tau (Y_i-{{\varvec{X}}}_{iS}^\top \tilde{\beta }_S)\right) \right| , \end{aligned}$$

where $V_1,\ldots , V_n$ are i.i.d. Rademacher random variables. We can write

$$\begin{aligned} \frac{\varDelta _i}{G^o_i}\rho _\tau (Y_i-{{\varvec{X}}}_{iS}^\top \beta _S) -\frac{\varDelta _i}{G^o_i} \rho _\tau (Y_i-{{\varvec{X}}}_{iS}^\top \tilde{\beta }_S) = \frac{\varDelta _i}{G^o_i}\tau {{\varvec{X}}}_{iS}^\top (\beta _S-\tilde{\beta }_S) + \frac{\varDelta _i}{G^o_i} W_i(\tau , \delta ), \end{aligned}$$

where $W_i(\tau , \delta _S) = [Y_i - {{\varvec{X}}}_{iS}^\top (\tilde{\beta }_S + \delta _S)]_- - [Y_i-{{\varvec{X}}}_{iS}^\top \tilde{\beta }_S]_-$ and $\delta _S := \beta _S- \tilde{\beta }_S$, where $u_- = 1\{u < 0\} |u|$. Then, we have $A^o(t) \le B^o(t) + C^o(t)$, where

$$\begin{aligned} B^o(t) = \sup _{\Vert \delta _S\Vert \le t} \,\left| G_n \left[ \frac{V_i\varDelta _i}{G^o_i} {{\varvec{X}}}_{iS}^\top \delta _S\right] \right| ,\quad C^o(t) = \sup _{\Vert \delta _S\Vert \le t}\, \left| G_n \left[ \frac{V_i\varDelta _i}{G^o_i} W_i(\tau , \delta _S)\right] \right| . \end{aligned}$$

We have for $\lambda >0$,

$$\begin{aligned} E[\exp (\lambda B^o(t))]= & {} E\left[ E\left[ \exp \left( \lambda \sup _{\Vert \delta _S\Vert \le t} \, \left| G_n \left[ \frac{V_i \varDelta _i}{G^o_i} {{\varvec{X}}}_{iS}^\top \delta _S\right] \right| \right) \mid {{\varvec{X}}}, \{\varDelta _i\} \right] \right] \nonumber \\\le & {} E\left[ E\left[ \exp \left( \lambda \sqrt{s} t \max _{j \in S} \left| G_n \left[ \frac{V_i \varDelta _i}{G^o_i} x_{ij}\right] \right| \right) \mid {{\varvec{X}}}, \{\varDelta _i\} \right] \right] \nonumber \\\le & {} 2 E\left[ E\left[ \exp \left( \lambda \sqrt{s} t \max _{j \in S} G_n \left[ \frac{V_i \varDelta _i}{G^o_i} x_{ij}\right] \right) \mid {{\varvec{X}}}, \{\varDelta _i\} \right] \right] \nonumber \\\le & {} 2s E\left[ \exp \left( \lambda ^2 s t^2 \frac{\sum _i x_{ij}^2}{n} \frac{ \varDelta _i^2}{(G^o_i)^2}\right) \mid X, \{\varDelta _i\}\right] \nonumber \\\le & {} 2s \exp \left( \lambda ^2 s \tau _0^{-2} t^2\right) , \end{aligned}$$

(7)

where the first inequality is elementary, the second inequality follows from the fact that $E[\exp (|Z|)] \le 2E[\exp (Z)]$ for any symmetric random variable Z, the third inequality follows by the intermediate step in the proof of the Hoeffding’s inequality (page 100 in van der Vaart (1996)), and the last inequality follows from Assumption 5 and $\varDelta _i^2 \le 1$. Therefore, we have

$$\begin{aligned} P(B^o(t)> K_1) \le \min _{\lambda>0} \exp (-\lambda K_1) E[\exp (\lambda B^o(t))] \le \min _{\lambda >0}2s \exp (-\lambda K_1+\lambda ^2 s \tau _0^{-2} t^2). \end{aligned}$$

The minimum is achieved at $\lambda = K_1 \tau _0^{2} / (2st^2)$. Thus, we have

$$\begin{aligned} P(B^o(t) > K_1) \le 2s\exp (-K_1^2 \tau _0^{2} / (4st^2)). \end{aligned}$$

Next, for $\lambda >0$, we have

$$\begin{aligned} E[\exp (\lambda C^o(t))]= & {} E\left[ E\left[ \exp \left( \lambda \sup _{\Vert \delta _S\Vert \le t} \, \left| G_n \left[ \frac{V_i \varDelta _i}{G^o_i} W_i(\tau , \delta _S)\right] \right| \right) \mid {{\varvec{X}}}, \{\varDelta _i\} \right] \right] \\\le & {} E\left[ E\left[ \exp \left( 2\lambda \sup _{\Vert \delta _S\Vert \le t} \, \left| G_n \left[ \frac{V_i \varDelta _i}{G^o_i} {{\varvec{X}}}_{is}^\top \delta _S\right] \right| \right) \mid {{\varvec{X}}}, \{\varDelta _i\} \right] \right] \\\le & {} 2s \exp \left( 4\lambda ^2 s \tau _0^{-2} t^2\right) , \end{aligned}$$

where the first inequality follows from the contractive property $|W_i(\tau , u)-W_i(\tau ,v)| < |{{\varvec{X}}}_{iS}^\top (u-v)|$, and from Theorem 4.12 in Ledoux and Talagrand (1991) with the convex function $\exp (\cdot )$, and the second inequality follows from the same arguments used in (7).

Therefore, we have

$$\begin{aligned} P(C^o(t)> K_2) \le \min _{\lambda>0} \exp (-\lambda K_2) E[\exp (\lambda C^o(t))] \le \min _{\lambda >0}2s \exp (-\lambda K_2+4\lambda ^2 s \tau _0^{-2} t^2). \end{aligned}$$

The minimum is achieved at $\lambda = K_2 \tau _0^{2} / (8st^2)$. Thus, we have

$$\begin{aligned} P(C^o(t) > K_2) \le 2s\exp (-K_2^2 \tau _0^{2} / (16st^2)). \end{aligned}$$

Thus, we have

$$\begin{aligned} P(A^o(t) \ge M/4)\le & {} P(B^o(t) \ge M/8) + P(C^o(t) \ge M/8) \\\le & {} 4s\exp (-M^2 \tau _0^2 / (1024st^2)) \end{aligned}$$

Hence, $P(A(t) \ge M) \le Cs\exp (-M^2 \tau _0^2 / (1024st^2)) = Cs\exp (-M^2 K^2 \tau _0^2/ (1024d_1^2))$ for some $C>0$, provided that $M > 2\sqrt{\lambda _{M}} \tau _0^{-1} t = 2\sqrt{\lambda _{M}} \tau _0^{-1} d_1/\sqrt{s}$. Setting $M=A_3 d_1 \sqrt{\rho \log p}$, where $A_3:= 4\sqrt{\lambda _{M}} \tau _0^{-1}$, we have $P(A(t) \ge A_3 d_1 \sqrt{\rho \log p}) \lesssim s \exp (-C \rho \log p)$. Thus, we obtain

$$\begin{aligned} P(\max _{S: |S| \le \rho } A(t) \ge A_3 d_1 \sqrt{\rho \log p})\le & {} \sum _{s=1}^{\rho } {p \atopwithdelims ()s} P(A(t) \ge A_3 d_1 \sqrt{\rho \log p}) \\\lesssim & {} \sum _{s=1}^{\rho } (ep/s)^s s \exp (-C \rho \log p) \\\lesssim & {} \exp (-C \rho \log p). \end{aligned}$$

Combining with Lemma 2 and $\log n = o(s\rho \log p)$, which holds due to Assumption 2, we obtain

$$\begin{aligned} \sup _{\beta _S \in B^o_S(d_1),\ |S| \le \rho } |\ell _S(\beta _S) - \ell _S(\tilde{\beta }_S) - E [\ell _S(\beta _S)] + E[\ell _S(\tilde{\beta }_S)]| \le 2A_3 d_1 \sqrt{\rho \log p/n}. \end{aligned}$$

This completes the proof. $\square$

Lemma 4

Suppose conditions of Theorem 1. Then, given a model S, for any $\beta _S$ satisfying $\Vert \beta _S - \beta ^*_S\Vert \le 4q \lambda _{M}^{-1/2}$,

$$\begin{aligned}&\frac{1}{8}\lambda_{m}\tau_0\underline{f} \Vert \beta _S - \beta ^*_S\Vert ^2 - C_3 n^{-1} \log n \le E[\ell _S(\beta _S)] - E[\ell _S(\beta ^*_S)]\\&\quad \le \frac{3\lambda _{M} \bar{f}}{2\tau _0} \Vert \beta _S - \beta ^*_S\Vert ^2 + C_3 n^{-1} \log n. \end{aligned}$$

for some positive constant $C_3$. For any $\beta _S$ satisfying $\Vert \beta _S - \beta ^*_S\Vert > 4q \lambda _{M}^{-1/2}$,

$$\begin{aligned} \frac{1}{4} q \lambda _{M}^{-1/2} \lambda _{m}\tau_0 \underline{f} \Vert \beta _S - \beta ^*_S\Vert \le E[\ell _S(\beta _S)] - E[\ell _S(\beta ^*_S)] \le 2\tau _0^{-1} \lambda _{M}^{1/2}\Vert \beta _S - \beta ^*_S\Vert . \end{aligned}$$

Proof

We first consider the following Case 1.

Case 1: When $\Vert \beta _S - \beta ^*_S\Vert \le 4q \lambda _{M}^{-1/2}$

Let $\delta _S := \beta _S - \beta ^*_S$. By the definition of $\beta ^*_S$, we have

$$\begin{aligned} E\left[ \frac{1}{n}\sum _{i=1}^n \frac{\varDelta _i}{G^o_i} {{\varvec{X}}}_{iS}\left( \tau - 1\{Y_i-{{\varvec{X}}}_{iS}^\top \beta ^*_S < 0\}\right) \right] =0. \end{aligned}$$

Hence, we have

$$\begin{aligned}&\left| E\left[ \frac{1}{n}\sum _{i=1}^n \frac{\varDelta _i}{{\hat{G}}_i} {{\varvec{X}}}_{iS}^\top \delta _S \left( \tau - 1\{Y_i-{{\varvec{X}}}_{iS}^\top \beta ^*_S< 0\}\right) \right] \right| \nonumber \\&= \left| E\left[ \frac{1}{n}\sum _{i=1}^n \varDelta _i \left( \frac{1}{{\hat{G}}_i}- \frac{1}{G^o_i} \right) {{\varvec{X}}}_{iS}^\top \delta _S \left( \tau - 1\{Y_i-{{\varvec{X}}}_{iS}^\top \beta ^*_S < 0\}\right) \right] \right| \nonumber \\&\le \frac{1}{n} \sum _{i=1}^n E\left[ |{{\varvec{X}}}_{iS}^\top \delta _S|\right] O\left( n^{-1/2} (\log n)^{1/2}\right) \nonumber \\&\le E\left[ \left( \frac{1}{n}\sum _{i=1}^n |{{\varvec{X}}}_{iS}^\top \delta _S|^2\right) ^{1/2} \right] O\left( n^{-1/2} (\log n)^{1/2}\right) \nonumber \\&\le E\left[ \frac{1}{n}\sum _{i=1}^n |{{\varvec{X}}}_{iS}^\top \delta _S|^2 \right] ^{1/2} O\left( n^{-1/2} (\log n)^{1/2}\right) \\&\le \lambda _{M}^{1/2} \Vert \delta _S\Vert O\left( n^{-1/2} (\log n)^{1/2}\right) , \nonumber \end{aligned}$$

(8)

where the first inequality follows from Lemma 1, the second inequality follows from the Cauchy-Schwarz inequality, the third inequality follows from the fact that $EX^2 \ge [E|X|]^2$ for a random variable X, and the last inequality follows from Assumption 3.

Hence, by the identity by Knight (1998), we have for $\tilde{z} \in [0,z]$,

$$\begin{aligned}&E\left[ \frac{1}{n}\sum _{i=1}^n \frac{\varDelta _i}{{\hat{G}}_i} \rho _{\tau }(Y_i-{{\varvec{X}}}_{iS}^\top \beta _S) - \frac{\varDelta _i}{{\hat{G}}_i} \rho _{\tau }(Y_i-{{\varvec{X}}}_{iS}^\top \beta ^*_S) \right] \\= & {} E\left[ \frac{1}{n} \sum _{i=1}^n - \frac{\varDelta _i}{{\hat{G}}_i}{{\varvec{X}}}_{iS}^\top \delta _S \left( \tau - 1\{Y_i-{{\varvec{X}}}_{iS}^\top \beta ^*_S < 0\}\right) + \frac{\varDelta _i}{{\hat{G}}_i} \int _0^{{{\varvec{X}}}_{iS}^\top \delta _S} F({{\varvec{X}}}_{iS}^\top \beta ^*_S.\right. \nonumber \\&\left. + z \mid {{\varvec{X}}}_{i}, \varDelta _i) - F({{\varvec{X}}}_{iS}^\top \beta ^*_S \mid {{\varvec{X}}}_{i}, \varDelta _i) dz \right] \nonumber \\= & {} E\left[ \frac{1}{n}\sum _{i=1}^n \frac{\varDelta _i}{{\hat{G}}_i} \int _0^{{{\varvec{X}}}_{iS}^\top \delta _S} F({{\varvec{X}}}_{iS}^\top \beta ^*_S + z \mid {{\varvec{X}}}_{i}, \varDelta _i) - F({{\varvec{X}}}_{iS}^\top \beta ^*_S \mid {{\varvec{X}}}_{i}, \varDelta _i) dz \right] +O\left( n^{-1/2} (\log n)^{1/2}\right) \Vert \delta _S\Vert \nonumber \\= & {} E\left[ \frac{1}{n}\sum _{i=1}^n \frac{\varDelta _i}{{\hat{G}}_i} \int _0^{{{\varvec{X}}}_{iS}^\top \delta _S} z f({{\varvec{X}}}_{iS}^\top \beta ^*_S \mid {{\varvec{X}}}_{i}, \varDelta _i) + \frac{z^2}{2} f'({{\varvec{X}}}_{iS}^\top \beta ^*_S + \tilde{z} \mid {{\varvec{X}}}_{i}, \varDelta _i) dz\right] +O\left( n^{-1/2} (\log n)^{1/2}\right) \Vert \delta _S\Vert \nonumber \\\ge & {} E\left[ \frac{\underline{f}}{2n}\sum _{i=1}^n \frac{\varDelta _i}{{\hat{G}}_i} |{{\varvec{X}}}_{iS}^\top \delta _S|^2 - \frac{\bar{f}'}{6n} \sum _{i=1}^n \frac{\varDelta _i}{{\hat{G}}_i} |{{\varvec{X}}}_{iS}^\top \delta _S|^3\right] +O\left( n^{-1/2} (\log n)^{1/2}\right) \Vert \delta _S\Vert \nonumber \\\ge & {} \underbrace{E\left[ \frac{\underline{f}}{2n}\sum _{i=1}^n \varDelta _i |{{\varvec{X}}}_{iS}^\top \delta _S|^2 - \frac{\bar{f}'}{6n} \sum _{i=1}^n \frac{\varDelta _i}{\tau _0} |{{\varvec{X}}}_{iS}^\top \delta _S|^3\right] +O\left( n^{-1/2} (\log n)^{1/2}\right) \Vert \delta _S\Vert }_{I}, \nonumber \end{aligned}$$

(9)

where the first inequality follows from Assumption 1 and the second inequality follows from Assumption 5. By Assumption 3, $\Vert \delta _S\Vert \le 4q \lambda _{M}^{-1/2}$, and $\varDelta _i \in \{0,1\}$, we have

$$\begin{aligned} E[\varDelta _i |{{\varvec{X}}}_{iS}^\top \delta _S|^2]^{1/2} \le \lambda _{M}^{1/2} \Vert \delta _S\Vert \le 4q \le \frac{3\tau_0\underline{f} }{2\bar{f}'} \frac{E[\varDelta _i ({{\varvec{X}}}_{iS}^\top \delta _S)^2]^{3/2}}{E[\varDelta _i |{{\varvec{X}}}_{iS}^\top \delta _S|^3]}, \end{aligned}$$

which gives

$$\begin{aligned} E\left[ \frac{\bar{f}'}{6n} \sum _{i=1}^n \frac{\varDelta _i}{\tau _0} |{{\varvec{X}}}_{iS}^\top \delta _S|^3\right] \le E\left[ \frac{\underline{f}}{4n}\sum _{i=1}^n \varDelta _i |{{\varvec{X}}}_{iS}^\top \delta _S|^2\right] . \end{aligned}$$

(10)

Hence, we have

$$\begin{aligned} I\ge & {} E\left[ \frac{\underline{f}}{4n}\sum _{i=1}^n \varDelta _i |{{\varvec{X}}}_{iS}^\top \delta _S|^2\right] + O\left( n^{-1/2} (\log n)^{1/2}\right) \Vert \delta _S\Vert \\\ge & {} E\left[ \frac{\tau_0\underline{f}}{4n}\sum _{i=1}^n |{{\varvec{X}}}_{iS}^\top \delta _S|^2\right] + O\left( n^{-1/2} (\log n)^{1/2}\right) \Vert \delta _S\Vert \\\ge & {} \frac{\lambda _{m} \tau_0\underline{f}}{4} \Vert \delta _S\Vert ^2 +O\left( n^{-1/2} (\log n)^{1/2}\right) \Vert \delta _S\Vert \\\ge & {} \frac{\lambda _{m} \tau_0\underline{f}}{8} \Vert \delta _S\Vert ^2 +O\left( n^{-1} \log n\right) \\\ge & {} \frac{\lambda _{m} \tau_0\underline{f}}{8} \Vert \delta _S\Vert ^2 -C_3 n^{-1} \log n \end{aligned}$$

for some $C_3>0$, where the second inequality follows from Assumption 5 and the third inequality follows from Assumption 3.

Furthermore, we have

$$\begin{aligned}&E\left[ \frac{1}{n} \frac{\varDelta _i}{{\hat{G}}_i} \sum _{i=1}^n \rho _{\tau }(Y_i-{{\varvec{X}}}_{iS}^\top \beta _S) - \frac{\varDelta _i}{{\hat{G}}_i} \rho _{\tau }(Y_i-{{\varvec{X}}}_{iS}^\top \beta ^*_S) \right] \\\le & {} E\left[ \frac{\bar{f}}{2n}\sum _{i=1}^n \frac{\varDelta _i}{{\hat{G}}_i} |{{\varvec{X}}}_{iS}^\top \delta _S|^2 \right] + E\left[ \frac{\bar{f}'}{6n} \sum _{i=1}^n \frac{\varDelta _i}{{\hat{G}}_i} |{{\varvec{X}}}_{iS}^\top \delta _S|^3 \right] +O\left( n^{-1/2} (\log n)^{1/2}\right) \Vert \delta _S\Vert \\\le & {} E\left[ \frac{\bar{f}}{2n}\sum _{i=1}^n \frac{\varDelta _i}{\tau _0} |{{\varvec{X}}}_{iS}^\top \delta _S|^2 \right] + E\left[ \frac{\bar{f}'}{6n} \sum _{i=1}^n \frac{\varDelta _i}{\tau _0} |{{\varvec{X}}}_{iS}^\top \delta _S|^3 \right] +O\left( n^{-1/2} (\log n)^{1/2}\right) \Vert \delta _S\Vert \\\le & {} E\left[ \frac{3\bar{f}}{4n}\sum _{i=1}^n \frac{\varDelta _i}{\tau _0} |{{\varvec{X}}}_{iS}^\top \delta _S|^2 \right] +O\left( n^{-1/2} (\log n)^{1/2}\right) \Vert \delta _S\Vert \\\le & {} \frac{3\lambda _{M} \bar{f}}{4\tau _0} \Vert \delta _S\Vert ^2 +O\left( n^{-1/2} (\log n)^{1/2}\right) \Vert \delta _S\Vert \\\le & {} \frac{3\lambda _{M} \bar{f}}{2\tau _0} \Vert \delta _S\Vert ^2 + C_3 n^{-1} \log n, \end{aligned}$$

for some $C_3>0$, where the third inequality follows from (10). This completes the proof of the first part of the lemma. Next, we consider the following Case 2.

Case 2: When $\Vert \beta _S - \beta ^*_S\Vert > 4q \lambda _{M}^{-1/2}$

Let $\delta _S = \beta _S - \beta ^*_S$. Then, for $a := \Vert \delta _S\Vert / (4q \lambda _{M}^{-1/2}) > 1$, it holds that $\delta _S = a \tilde{\delta }_S$ for some $\Vert \tilde{\delta }_S\Vert = 4q \lambda _{M}^{-1/2}$. Define

$$\begin{aligned} Q(\delta _S) := E[\ell _S(\beta ^*_S+\delta _S)] - E[\ell _S(\beta ^*_S)] = E[\ell _S(\beta _S)] - E[\ell _S(\beta ^*_S)]. \end{aligned}$$

Because $Q(\cdot )$ is a convex function, we have

$$\begin{aligned} \frac{1}{a} Q(a \tilde{\delta }_S) + \left( 1-\frac{1}{a}\right) Q(0) \ge Q(\tilde{\delta }_S). \end{aligned}$$

Because $Q(0)=0$ and $\Vert \tilde{\delta }_S\Vert = 4q \lambda _{M}^{-1/2}$, we have

$$\begin{aligned} Q(\delta _S)= & {} Q(a \tilde{\delta }_S) \ge a Q(\tilde{\delta }_S) \ge \frac{a\lambda _{m} \tau_0\underline{f}}{8} \Vert \tilde{\delta }_S\Vert ^2 - a C_3 n^{-1} \log n \\= & {} \Vert \delta _S\Vert \left\{ 0.5 q \lambda _{M}^{-1/2} \lambda _{m} \tau_0\underline{f} + O\left( n^{-1}\log n \right) \right\} \\\ge & {} 0.25 q \lambda _{M}^{-1/2} \lambda _{m} \tau_0\underline{f} \Vert \delta _S\Vert , \end{aligned}$$

where the second inequality follows from Case 1.

Note that we have

$$\begin{aligned} E\left[ \rho _{\tau }(Y_i-{{\varvec{X}}}_{iS}^\top \beta _S) - \rho _{\tau }(Y_i-{{\varvec{X}}}_{iS}^\top \beta ^*_S) \mid X_i, \varDelta _i\right]\le & {} E[|{{\varvec{X}}}_{iS}^\top (\beta _S -\beta ^*_S)|\mid X_i, \varDelta _i] \\\le & {} E[|{{\varvec{X}}}_{iS}^\top (\beta _S -\beta ^*_S)|^2 \mid X_i, \varDelta _i]^{1/2} \\\le & {} \lambda _{M}^{1/2} \Vert \beta _S -\beta ^*_S\Vert , \end{aligned}$$

where the last inequality follows from Assumption 3. By Lemma 1 and Assumption 5, we also have ${\hat{G}}_i \gtrsim G^o_i -n^{-1/2} (\log n)^{1/2} \ge \tau _0 -n^{-1/2} (\log n)^{1/2} \gtrsim 0.5 \tau _0$. Hence,

$$\begin{aligned} E\left[ \frac{1}{n} \sum _{i=1}^n \frac{\varDelta _i}{{\hat{G}}_i}\rho _{\tau }(Y_i-{{\varvec{X}}}_{iS}^\top \beta _S) - \frac{\varDelta _i}{{\hat{G}}_i} \rho _{\tau }(Y_i-{{\varvec{X}}}_{iS}^\top \beta ^*_S) \right] \le 2\tau _0^{-1} \lambda _{M}^{1/2} \Vert \delta _S\Vert . \end{aligned}$$

This completes the proof of the second part of the lemma. $\square$

Lemma 5

Under conditions of Theorem 1, there exist some universal constants $A_4, A_5 >0$ such that

$$\begin{aligned} \Vert \hat{\beta }_S - \beta ^*_S\Vert \le A_4 \sqrt{\rho ^2 \log p / n},\quad |\ell _S(\hat{\beta }_S) - \ell _S(\beta ^*_S)| \le A_5 \rho ^2 \log p/n \end{aligned}$$

hold uniformly over $S: |S| \le \rho$, with probability at least $1-6\exp (-6\rho \log p)$.

Proof

We first define an event

$$\begin{aligned} E = \left\{ \sup _{\beta _S \in B^o_S(d_1),\ |S| \le \rho } |\ell _S(\beta _S) - \ell _S(\beta ^*_S) - E [\ell _S(\beta _S)] + E[\ell _S(\beta ^*_S)]| \le 2A_3 d_1 \sqrt{\rho \log p/n} \right\} . \end{aligned}$$

By Lemma 3, the event E holds with probability at least $1-6\exp (-6\rho \log p)$.

If $\Vert \beta _S-\beta ^*_S\Vert =A_4 \sqrt{\rho ^2 \log p / n}$, then $\Vert \beta _S-\beta ^*_S\Vert \le A_4 \sqrt{\rho ^3 \log p /n} / \sqrt{s}$ and $\Vert \beta _S - \beta ^*_S\Vert \le 4q \lambda _{M}^{-1/2}$ because $s \le \rho$ and $\rho ^2 \log p = o(n)$. Thus, we have under the event E,

$$\begin{aligned} \ell _S(\beta ^*_S) - \ell _S(\beta _S)= & {} \ell _S(\beta ^*_S) - E[\ell _S(\beta ^*_S)] - \ell _S(\beta _S) + E[\ell _S(\beta _S)] + E[\ell _S(\beta ^*_S)] - E[\ell _S(\beta _S)] \\\le & {} 2A_3 A_4 \sqrt{\rho ^3 \log p /n} \,\sqrt{\rho \log p/n} - \frac{1}{8}\lambda _{m}\tau_0\underline{f} \Vert \beta _S - \beta ^*_S\Vert ^2 +O\left( n^{-1} \log n\right) \\\le & {} 2A_3 A_4 \rho ^2 \log p /n - \frac{1}{8} \lambda _{m} \tau_0\underline{f} A_4^2 {\rho ^2 \log p / n} + O\left( n^{-1} \log n\right) \\< & {} 0, \end{aligned}$$

where the first inequality follows from Lemma 4 and the last inequality holds for sufficiently large constant $A_4>0$ due to the growth rate condition of $\rho$ in Assumption 1. Thus

$$\begin{aligned} \min _{|S| \le \rho , \Vert \beta _S-\beta ^*_S\Vert =A_4 \sqrt{\rho ^2 \log p / n}} \ell _S(\beta _S) -\ell _S(\beta ^*_S) > 0. \end{aligned}$$

By the convexity of $\ell _S(\cdot )$, $\Vert \hat{\beta }_S-\beta ^*_S\Vert \le A_4 \sqrt{\rho ^2 \log p / n}$. This proves the first part.

For any $\beta _S$ satisfying $\Vert \beta _S - \beta ^*_S\Vert \le A_4 \sqrt{\rho ^2 \log p / n}$, we have under the event E,

$$\begin{aligned} |\ell _S(\beta ^*_S) - \ell _S(\beta _S)|\le & {} |\ell _S(\beta ^*_S) - E[\ell _S(\beta ^*_S)] - \ell _S(\beta _S) + E[\ell _S(\beta _S)]| + |E[\ell _S(\beta ^*_S)] - E[\ell _S(\beta _S)]| \\\le & {} 2A_3 A_4 \rho ^2 \log p /n + 1.5 \lambda _{M} \bar{f} \tau _0^{-1} \Vert \beta _S - \beta ^*_S\Vert ^2 + O\left( n^{-1} \log n\right) \\\le & {} 2A_3 A_4 \rho ^2 \log p /n + 1.5 \lambda _{M} \bar{f} \tau _0^{-1} A_4^2 {\rho ^2 \log p / n} + O\left( n^{-1} \log n\right) \\\le & {} A_5 \rho ^2 \log p/n \end{aligned}$$

for some constant $A_5>0$, where the second inequality follows from Lemma 4, and the last inequality follows from the growth rate condition of $\rho$ in Assumption 1. This completes the proof. $\square$

Lemma 6

Given a model S such that $|S| < \rho$, $M_\tau \not \subseteq S$, under Assumption 4, there exists $j \in M_\tau \setminus S$ such that $[\beta ^*_{S \bigcup \{j\}}]_j \ne 0$. Further, if Assumption 1 holds, then there exists $j \in M_\tau \setminus S$ such that $|[\beta ^*_{S \bigcup \{j\}}]_j| \ge C \bar{f}^{-1} \tau _0 n^{-\alpha }$, where $C>0$ is defined in Assumption 4.

Proof

Suppose that $[\beta ^*_{S \bigcup \{j\}}]_j = 0$ for all $j \in M_\tau \setminus S$. Fix $j \in M_\tau \setminus S$. As $\beta ^*_{S \bigcup \{j\}}$ is the minimizer of $E[\ell ^o_{S \bigcup \{j\}}(\beta )]$, by the convexity of $E[\ell ^o_{S \bigcup \{j\}}(\beta )]$, it holds that

$$\begin{aligned} 0= & {} E\left[ \frac{1}{n} \sum _{i=1}^n \frac{\varDelta _i}{G^o_i} x_{ij} \psi _\tau \left( Y_i-{{\varvec{X}}}_{i,S \bigcup \{j\}}^\top \beta ^*_{S \bigcup \{j\}}\right) \right] \\= & {} E\left[ \frac{1}{n} \sum _{i=1}^n \frac{\varDelta _i}{G^o_i} x_{ij} \psi _\tau \left( Y_i-{{\varvec{X}}}_{iS}^\top [\beta ^*_{S \bigcup \{j\}}]_S\right) \right] , \end{aligned}$$

where the second equality follows from the fact that $[\beta ^*_{S \bigcup \{j\}}]_j = 0$.

which contradicts to Assumption 4. Thus, there exists $j \in M_\tau \setminus S$ such that $[\beta ^*_{S \bigcup \{j\}}]_j \ne 0$.

Further, by Assumption 4, there exists $j \in M_\tau \setminus S$ such that

$$\begin{aligned} Cn^{-\alpha }\le & {} \left| E\left[ \frac{1}{n} \sum _{i=1}^n \frac{\varDelta _i}{G^o_i} x_{ij} \psi _\tau \left( Y_i-{{\varvec{X}}}_{iS}^\top \left[ \beta ^*_{S \bigcup \{j\}}\right] _S\right) \right] \right| \\= & {} \left| E\left[ \frac{1}{n} \sum _{i=1}^n \frac{\varDelta _i}{G^o_i} x_{ij} \psi _\tau \left( Y_i-{{\varvec{X}}}_{iS}^\top [\beta ^*_{S \bigcup \{j\}}]_S\right) \right] \right. \\&\left. - E\left[ \frac{1}{n} \sum _{i=1}^n \frac{\varDelta _i}{G^o_i} x_{ij} \psi _\tau (Y_i- {{\varvec{X}}}_{i,S \bigcup \{j\}}^\top \beta ^*_{S \bigcup \{j\}})\right] \right| \\\le & {} \left| E\left[ \frac{1}{n} \sum _{i=1}^n \frac{|x_{ij}|}{G^o_i} P\left\{ Y_i \in \left( {{\varvec{X}}}_{iS}^\top [\beta ^*_{S \bigcup \{j\}}]_S, {{\varvec{X}}}_{i,S \bigcup \{j\}}^\top \beta ^*_{S \bigcup \{j\}}\right) \mid {{\varvec{X}}}_i \right\} \right] \right| \\\le & {} \left| E\left[ \frac{1}{n} \sum _{i=1}^n {\bar{f}} \frac{x_{ij}^2}{G^o_i} |[\beta ^*_{S \bigcup \{j\}}]_j| \right] \right| \\= & {} \frac{{\bar{f}}}{\tau _0} \left| [\beta ^*_{S \bigcup \{j\}}]_j\right| , \end{aligned}$$

where the first equation holds due to the definition of $\beta ^*_{S \bigcup \{j\}}$, the third inequality holds due to Assumption 1 and the fact that $\left| {{\varvec{X}}}_{iS}^\top [\beta ^*_{S \bigcup \{j\}}]_S-{{\varvec{X}}}_{i,S \bigcup \{j\}}^\top \beta ^*_{S \bigcup \{j\}}\right| = \left| x_{ij} [\beta ^*_{S \bigcup \{j\}}]_j\right|$, and the last equality follows from the fact that $E[x_{ij}^2]=1$ and Assumption 6. This completes the second part of Lemma 6. $\square$

1.2 Proof of Theorem 1

Proof

We first define an event

$$\begin{aligned} E= & {} \left\{ \sup _{\beta _S \in B^o_S(d_1),\ |S| \le \rho } |\ell _S(\beta _S) - \ell _S(\beta ^*_S) - E [\ell _S(\beta _S)] + E[\ell _S(\beta ^*_S)]| \le 2A_3 d_1 \sqrt{\rho \log p/n} \right\} \\&\bigcup \left\{ \sup _{|S| \le \rho } \Vert \hat{\beta }_S - \beta ^*_S\Vert \le A_4 \sqrt{\rho ^2 \log p / n},\quad \sup _{|S| \le \rho } |\ell _S(\hat{\beta }_S) - \ell _S(\beta ^*_S)| \le A_5 \rho ^2 \log p/n \right\} \end{aligned}$$

By Lemmas 3 and 5, the event E holds with probability at least $1-6\exp (-6\rho \log p)$. Under the event E, we have $\hat{\beta }_S \in B_S^o(d_1) = \{\beta _S:\ \Vert \beta _S-\beta ^*_S\Vert \le d_1/\sqrt{s}\}$, where $d_1 := A_4 \sqrt{s\rho ^2 \log p / n}$. Note that

$$\begin{aligned}&\ell _S(\hat{\beta }_S)-\ell _{S \bigcup \{j\}}(\beta ^*_{S \bigcup \{j\}}) \\&= \ell _S(\hat{\beta }_S) - \ell _S(\beta ^*_S) + \ell _S(\beta ^*_S) - \ell _{S \bigcup \{j\}}(\beta ^*_{S\bigcup \{j\}}) \\&= \underbrace{\ell _S(\hat{\beta }_S) - \ell _S(\beta ^*_S) - E \left[ \ell _S(\hat{\beta }_S)\right] + E[\ell _S(\beta ^*_S)]}_{I_1} + \underbrace{E[\ell _S(\hat{\beta }_S)] - E[\ell _S(\beta ^*_S)]}_{I_2} + \underbrace{\ell _S(\beta ^*_S) - \ell _{S \bigcup \{j\}}(\beta ^*_{S\bigcup \{j\}})}_{I_3}. \end{aligned}$$

Under the event E, $I_1 \ge -2A_3 d_1 \sqrt{\rho \log p/n} \ge -2A_3 A_4 \rho ^2 \log p /n$. Also by Lemma 4, we obtain $I_2 \ge \frac{1}{8}\lambda _{m}\underline{f} \tau _0 \Vert \hat{\beta }_S - \beta ^*_S\Vert ^2 - C_3 n^{-1} \log n \ge - C_3 n^{-1} \log n$. Next, we have

$$\begin{aligned} I_3= & {} \underbrace{\ell _S(\beta ^*_S) -E \ell _S(\beta ^*_S) + E \ell _{S \bigcup \{j\}}(\beta ^*_{S\bigcup \{j\}}) - \ell _{S \bigcup \{j\}}(\beta ^*_{S\bigcup \{j\}})}_{I_{31}} \\&+ \underbrace{E\ell _S(\beta ^*_S) - E \ell _{S \bigcup \{j\}}(\beta ^*_{S\bigcup \{j\}})}_{I_{32}}. \end{aligned}$$

Using the similar arguments in the proof of Lemma 3, with probability at least $1-6\exp (-6\rho \log p)$, we have $I_{31} \ge -2 \Vert \beta ^*_S - \beta ^*_{S\bigcup \{j\}}\Vert A_3 \rho \sqrt{\log p/n}$.

Next, we consider $I_{32}$. Note that by Lemma 6, we can consider $j \in S^c$ such that $|[\beta ^*_{S \bigcup \{j\}}]_j| \ge C \bar{f}^{-1} \tau _0 n^{-\alpha }$, i.e., $\Vert \beta ^*_S - \beta ^*_{S\bigcup \{j\}}\Vert \ge C \bar{f}^{-1}\tau _0 n^{-\alpha }$. By Lemma 4, we have

$$\begin{aligned} I_{32} \gtrsim \Vert \beta ^*_S - \beta ^*_{S\bigcup \{j\}}\Vert ^2 \wedge \Vert \beta ^*_S - \beta ^*_{S\bigcup \{j\}}\Vert - C_3 n^{-1} \log n. \end{aligned}$$

Hence, we have with probability at least $1-6\exp (-6\rho \log p)$,

$$\begin{aligned} I_3 \gtrsim \Vert \beta ^*_S - \beta ^*_{S\bigcup \{j\}}\Vert ^2 \wedge \Vert \beta ^*_S - \beta ^*_{S\bigcup \{j\}}\Vert - n^{-1} \log n - \Vert \beta ^*_S - \beta ^*_{S\bigcup \{j\}}\Vert \rho \sqrt{\log p/n}. \end{aligned}$$

Hence, if $\Vert \beta ^*_S - \beta ^*_{S\bigcup \{j\}}\Vert \gtrsim 1$, then

$$\begin{aligned} I_3 \gtrsim \Vert \beta ^*_S - \beta ^*_{S\bigcup \{j\}}\Vert - n^{-1} \log n \gtrsim C \bar{f}^{-1} \tau _0 n^{-\alpha } - n^{-1} \log n, \end{aligned}$$

where the inequality follows from $\Vert \beta ^*_S - \beta ^*_{S\bigcup \{j\}}\Vert \ge |[\beta ^*_{S \bigcup \{j\}}]_j| \ge C \bar{f}^{-1} \tau _0 n^{-\alpha }$. Else if $\Vert \beta ^*_S - \beta ^*_{S\bigcup \{j\}}\Vert \lesssim 1$, using the similar argument,

$$\begin{aligned} I_3\gtrsim & {} \Vert \beta ^*_S - \beta ^*_{S\bigcup \{j\}}\Vert ^2 - n^{-1} \log n - \Vert \beta ^*_S - \beta ^*_{S\bigcup \{j\}}\Vert \rho \sqrt{\log p/n} \\\gtrsim & {} \Vert \beta ^*_S - \beta ^*_{S\bigcup \{j\}}\Vert ^2 - n^{-1} \log n - \rho \sqrt{\log p/n} \\\gtrsim & {} C^2 \bar{f}^{-2} \tau _0^2 n^{-2\alpha } - n^{-1} \log n - \rho \sqrt{\log p/n}. \end{aligned}$$

Hence, we have with probability at least $1-6\exp (-6\rho \log p)$,

$$\begin{aligned} I_3 \gtrsim C^2 \bar{f}^{-2} \tau _0^2 n^{-2\alpha } - n^{-1} \log n - \rho \sqrt{\log p/n}. \end{aligned}$$

Combining these inequalities, we have

$$\begin{aligned} \max _{j \in S^c} \ell _S(\hat{\beta }_S)-\ell _{S \bigcup \{j\}} (\hat{\beta }_{S \bigcup \{j\}})\ge & {} C^2 \bar{f}^{-2} \tau _0^2 n^{-2\alpha } -\rho ^2 \log p /n - \rho \sqrt{\log p/n}- n^{-1} \log n\\\gtrsim & {} 0.5 C^2 \bar{f}^{-2} \tau _0^2 n^{-2\alpha } \end{aligned}$$

uniformly in S satisfying $|S|<\rho ,\ M_\tau \nsubseteq S$, where the last inequality follows from the growth rate condition of $\rho$ in Assumption 4. By letting $C_1 := 0.5 C^2 \bar{f}^{-2} \tau _0^2$, this completes the proof. $\square$

1.3 Proof of Theorem 2

Proof

The proof consists of the three steps.

Claim 1: It holds that $|[\beta ^*_{S \bigcup \{j\}}]_j| =o(n^{-\alpha })$ for any $j \in (M_\tau \bigcup S)^c$.

Proof of Claim 1: We have by Assumption 1, if ${{\varvec{X}}}_{i,S \bigcup \{j\}}^\top \beta ^*_{S \bigcup \{j\}} -{{\varvec{X}}}_{iS}^\top [\beta ^*_{S \bigcup \{j\}}]_S \ge 0$, i.e., $x_{ij} [\beta ^*_{S \bigcup \{j\}}]_j \ge 0$, then

$$\begin{aligned}&E\left[ \psi _\tau (Y_i-{{\varvec{X}}}_{iS}^\top [\beta ^*_{S \bigcup \{j\}}]_S) - \psi _\tau (Y_i- {{\varvec{X}}}_{i,S \bigcup \{j\}}^\top \beta ^*_{S \bigcup \{j\}}) \mid \varDelta _i, {{\varvec{X}}}_i\right] \\&= P\left( Y_i \le {{\varvec{X}}}_{i,S \bigcup \{j\}}^\top \beta ^*_{S \bigcup \{j\}} \mid \varDelta _i, {{\varvec{X}}}_i\right) - P\left( Y_i \le {{\varvec{X}}}_{iS}^\top [\beta ^*_{S \bigcup \{j\}}]_S \mid \varDelta _i, {{\varvec{X}}}_i\right) \\&\ge \underline{f} \left( {{\varvec{X}}}_{i,S \bigcup \{j\}}^\top \beta ^*_{S \bigcup \{j\}} -{{\varvec{X}}}_{iS}^\top [\beta ^*_{S \bigcup \{j\}}]_S \right) \\&= \underline{f} x_{ij} [\beta ^*_{S \bigcup \{j\}}]_j. \end{aligned}$$

Else, if ${{\varvec{X}}}_{i,S \bigcup \{j\}}^\top \beta ^*_{S \bigcup \{j\}} -{{\varvec{X}}}_{iS}^\top [\beta ^*_{S \bigcup \{j\}}]_S < 0$, i.e., $x_{ij} [\beta ^*_{S \bigcup \{j\}}]_j < 0$, then

$$\begin{aligned} E\left[ \psi _\tau (Y_i-{{\varvec{X}}}_{iS}^\top [\beta ^*_{S \bigcup \{j\}}]_S) - \psi _\tau (Y_i- {{\varvec{X}}}_{i,S \bigcup \{j\}}^\top \beta ^*_{S \bigcup \{j\}}) \mid \varDelta _i, {{\varvec{X}}}_i\right] \le \underline{f} x_{ij} [\beta ^*_{S \bigcup \{j\}}]_j. \end{aligned}$$

Thus, we have

$$\begin{aligned}&E\left[ x_{ij} [\beta ^*_{S \bigcup \{j\}}]_j\psi _\tau (Y_i-{{\varvec{X}}}_{iS}^\top [\beta ^*_{S \bigcup \{j\}}]_S) -x_{ij} [\beta ^*_{S \bigcup \{j\}}]_j \psi _\tau (Y_i- {{\varvec{X}}}_{i,S \bigcup \{j\}}^\top \beta ^*_{S \bigcup \{j\}}) \mid \varDelta _i, {{\varvec{X}}}_i\right] \nonumber \\&\ge \underline{f} x^2_{ij} [\beta ^*_{S \bigcup \{j\}}]^2_j. \end{aligned}$$

(11)

By the conditions of Theorem 2 and the fact that $E[\varDelta _i \mid {{\varvec{X}}}_i] = G^o_i$, we have for any $j \in (M_\tau \bigcup S)^c$,

$$\begin{aligned}&o(n^{-\alpha }|[\beta ^*_{S \bigcup \{j\}}]_j|) \\= & {} \left| E\left[ \frac{\varDelta _i}{G^o_i} x_{ij} [\beta ^*_{S \bigcup \{j\}}]_j \psi _\tau (Y_i-{{\varvec{X}}}_{iS}^\top [\beta ^*_{S \bigcup \{j\}}]_S)\right] \right| \\= & {} \left| E\left[ \frac{\varDelta _i}{G^o_i} x_{ij} [\beta ^*_{S \bigcup \{j\}}]_j \psi _\tau (Y_i-{{\varvec{X}}}_{iS}^\top [\beta ^*_{S \bigcup \{j\}}]_S)\right] - E\left[ \frac{\varDelta _i}{G^o_i} x_{ij} [\beta ^*_{S \bigcup \{j\}}]_j \psi _\tau (Y_i- {{\varvec{X}}}_{i,S \bigcup \{j\}}^\top \beta ^*_{S \bigcup \{j\}})\right] \right| \\= & {} \left| E\left[ \frac{\varDelta _i}{G^o_i} E\left[ x_{ij} [\beta ^*_{S \bigcup \{j\}}]_j \psi _\tau (Y_i-{{\varvec{X}}}_{iS}^\top [\beta ^*_{S \bigcup \{j\}}]_S) - x_{ij} [\beta ^*_{S \bigcup \{j\}}]_j \psi _\tau (Y_i- {{\varvec{X}}}_{i,S \bigcup \{j\}}^\top \beta ^*_{S \bigcup \{j\}}) \mid \varDelta _i, {{\varvec{X}}}_i \right] \right] \right| \\\ge & {} E\left[ \underline{f} x^2_{ij} [\beta ^*_{S \bigcup \{j\}}]^2_j \frac{\varDelta _i}{G^o_i}\right] \\= & {} \underline{f}[\beta ^*_{S \bigcup \{j\}}]^2_j, \end{aligned}$$

where the second equation holds due to the definition of $\beta ^*_{S \bigcup \{j\}}$, the first inequality follows from (11), and the last equality follows from the fact that $E[\varDelta _i] = G^o_i$ and $E[x_{ij}^2]=1$. Thus, $|[\beta ^*_{S \bigcup \{j\}}]_j| =o(n^{-\alpha })$ for any $j \in (M_\tau \bigcup S)^c$.

Claim 2: It holds that $\max _{|S| < \rho , j \in (M_\tau \bigcup S)^c} \ -\ell _{S \bigcup \{j\}}(\hat{\beta }_{S \bigcup \{j\}}) + \ell _S(\hat{\beta }_S) \le 0.5 C_1 n^{-2\alpha }$ with probability at least $1-36 \exp (-6\rho \log p)$.

Proof of Claim 2: We can write

$$\begin{aligned}&\max _{|S|< \rho , j \in (M_\tau \bigcup S)^c} \ -\ell _{S \bigcup \{j\}}(\hat{\beta }_{S \bigcup \{j\}}) + \ell _S(\hat{\beta }_S) \\\le & {} \underbrace{\max _{|S|< \rho , j \in (M_\tau \bigcup S)^c} -\ell _{S \bigcup \{j\}}(\hat{\beta }_{S \bigcup \{j\}}) + \ell _{S \bigcup \{j\}}(\beta ^*_{S \bigcup \{j\}})}_{I_1} +\underbrace{\max _{|S| < \rho , j \in (M_\tau \bigcup S)^c} -\ell _{S \bigcup \{j\}}(\beta ^*_{S \bigcup \{j\}}) + \ell _S(\hat{\beta }_S)}_{I_2}. \end{aligned}$$

Using the similar arguments in Lemma 5, we have with high probability at least $1-6\exp (-6\rho \log p)$,

$$\begin{aligned} |I_1| \le A_5 \rho ^2 \log p/n. \end{aligned}$$

(12)

Next, we can decompose $I_2$ as

$$\begin{aligned} I_2 = \underbrace{\ell _S(\hat{\beta }_S) - \ell _S(\beta ^*_S) - E \left[ \ell _S(\hat{\beta }_S)\right] + E[\ell _S(\beta ^*_S)]}_{I_{21}} + \underbrace{E[\ell _S(\hat{\beta }_S)] - E[\ell _S(\beta ^*_S)]}_{I_{22}} +\underbrace{\ell _S(\beta ^*_S)-\ell _{S \bigcup \{j\}}(\beta ^*_{S \bigcup \{j\}})}_{I_{23}}. \end{aligned}$$

Because $\Vert \hat{\beta }_S - \beta ^*_S\Vert \le A_4 \sqrt{\rho ^2 \log p / n}$ by Lemma 5, we have by Lemma 3,

$$\begin{aligned} I_{21} \le 2A_3 A_4 \rho ^2 \log p/n. \end{aligned}$$

(13)

By Lemma 4, we have

$$\begin{aligned} I_{22} \le 1.5 \lambda _{M} \tau _0^{-1} \bar{f} \Vert \hat{\beta }_S - \beta ^*_S\Vert ^2 + C_3 n^{-1} \log n \le 1.5 \lambda _{M} \tau _0^{-1} \bar{f} A_4^2 \rho ^2 \log p / n + C_3 \log n/n. \end{aligned}$$

(14)

Next, we consider $I_{23}$. We obtain

$$\begin{aligned} I_{23}= & {} \underbrace{\ell _S(\beta ^*_S) - \ell _{S \bigcup \{j\}}(\beta ^*_{S\bigcup \{j\}}) -E \ell _S(\beta ^*_S) + E \ell _{S \bigcup \{j\}}(\beta ^*_{S\bigcup \{j\}})}_{I_{231}} \\&+ \underbrace{E\ell _S(\beta ^*_S) - E \ell _{S \bigcup \{j\}}(\beta ^*_{S\bigcup \{j\}})}_{I_{232}}. \end{aligned}$$

By Lemma 2 and Assumption 3, we have with probability at least $1-6\exp (-6\rho \log p)$,

$$\begin{aligned} I_{231}\le & {} 2A_3 \rho \sqrt{\frac{\log p}{n}} \Vert \beta ^*_S - \beta ^*_{S\bigcup \{j\}}\Vert \\\le & {} 2A_3 \rho \sqrt{\frac{\log p}{n}} \lambda _{m}^{-1/2} {\bar{f}}^{-1/2} E\left[ \frac{{\bar{f}}}{n} \sum _{i=1}^n \left| {{\varvec{X}}}_i^\top \left( \beta ^*_S - \beta ^*_{S\bigcup \{j\}}\right) \right| ^2 \right] ^{1/2} \\\le & {} 4A_3^2 \rho ^2 \lambda _{m}^{-1} {\bar{f}}^{-1} \frac{\log p}{n} + E\left[ \frac{{\bar{f}}}{n} \sum _{i=1}^n \left| {{\varvec{X}}}_i^\top \left( \beta ^*_S - \beta ^*_{S\bigcup \{j\}}\right) \right| ^2 \right] , \end{aligned}$$

where the last inequality follows from $xy \le x^2 + y^2$ for any non-negative numbers x and y.

Furthermore, by (8) and (9) in Lemma 4,

$$\begin{aligned} I_{232}\le & {} E\left[ \frac{1}{n} \sum _{i=1}^n \frac{\varDelta _i}{{\hat{G}}_i}\int _0^{{{\varvec{X}}}_i^\top ( \beta ^*_{S \bigcup \{j\}} - \beta ^*_S)} F\left( {{\varvec{X}}}_{i, S \bigcup \{j\}} \beta ^*_{S \bigcup \{j\}} +z\right) - F\left( {{\varvec{X}}}_{i, S \bigcup \{j\}} \beta ^*_{S \bigcup \{j\}} \right) dz\right] \\&+ O\left( n^{-1/2} (\log n)^{1/2}\right) E\left[ \frac{\bar{f}}{2n}\sum _{i=1}^n \frac{\varDelta _i}{{\hat{G}}_i} \left| {{\varvec{X}}}_{i}^\top \left( \beta ^*_{S \bigcup \{j\}} - \beta ^*_S\right) \right| ^2 \right] ^{1/2} \\\le & {} E\left[ \frac{\bar{f}}{2n}\sum _{i=1}^n \frac{\varDelta _i}{{\hat{G}}_i} |{{\varvec{X}}}_{i}^\top (\beta ^*_{S \bigcup \{j\}} - \beta ^*_S)|^2 \right] + O\left( n^{-1/2} (\log n)^{1/2}\right) E\left[ \frac{\bar{f}}{2n}\sum _{i=1}^n \frac{\varDelta _i}{{\hat{G}}_i} \left| {{\varvec{X}}}_{i}^\top \left( \beta ^*_{S \bigcup \{j\}} - \beta ^*_S\right) \right| ^2 \right] ^{1/2} \\\le & {} E\left[ \frac{\bar{f}}{n}\sum _{i=1}^n \frac{\varDelta _i}{{\hat{G}}_i} |{{\varvec{X}}}_{i}^\top (\beta ^*_{S \bigcup \{j\}} - \beta ^*_S)|^2 \right] + O\left( n^{-1} \log n \right) \\\le & {} E\left[ \frac{\bar{f}}{n}\sum _{i=1}^n \frac{2\varDelta _i}{G^o_i} |{{\varvec{X}}}_{i}^\top (\beta ^*_{S \bigcup \{j\}} - \beta ^*_S)|^2 \right] + O\left( n^{-1} \log n \right) \\= & {} E\left[ \frac{2\bar{f}}{n}\sum _{i=1}^n |{{\varvec{X}}}_{i}^\top (\beta ^*_{S \bigcup \{j\}} - \beta ^*_S)|^2 \right] + O\left( n^{-1} \log n \right) \\\le & {} E\left[ \frac{2\bar{f}}{n}\sum _{i=1}^n |{{\varvec{X}}}_{i}^\top \delta _S|^2 \right] + E\left[ \frac{2\bar{f}}{n}\sum _{i=1}^n x_{ij}^2 [\beta ^*_{S \bigcup \{j\}}]^2_j \right] + O\left( n^{-1} \log n \right) , \end{aligned}$$

where the second inequality follows from Assumption 1, the third inequality follows from $xy \le x^2 + y^2$ for any non-negative numbers x and y, the fourth inequality follows from Assumption 5 and Lemma 1, and the last equality follows from the fact that $E\left[ \varDelta _i \mid {{\varvec{X}}}_i \right] = G^o_i$.

Hence, we have

$$\begin{aligned} I_{23}\le & {} I_{231} + I_{232} \nonumber \\\le & {} 4A_3^2 \rho ^2 \lambda _{m}^{-1} {\bar{f}}^{-1} \frac{\log p}{n}+ E\left[ \frac{3{\bar{f}}}{n} \sum _{i=1}^n \left| {{\varvec{X}}}_i^\top \left( \beta ^*_S - \beta ^*_{S\bigcup \{j\}}\right) \right| ^2 \right] + O\left( \frac{\log n}{n} \right) . \end{aligned}$$

(15)

Let $\delta _S:= [\beta ^*_{S \bigcup \{j\}}]_S - \beta ^*_S$. By the definition of $\beta ^*_S$ and $\beta ^*_{S \bigcup \{j\}}$, we have

$$\begin{aligned} 0\ge & {} E[\ell ^o_{S \bigcup \{j\}}(\beta ^*_{S \bigcup \{j\}})] - E[\ell ^o_S(\beta ^*_S)] \nonumber \\= & {} E\left[ \frac{1}{n} \sum _{i=1}^n -\frac{\varDelta _i}{G^o_i} ({{\varvec{X}}}_{iS}^\top \delta _S + x_{ij} [\beta ^*_{S \bigcup \{j\}}]_j) \left( \tau - 1\{Y_i-{{\varvec{X}}}_{iS}^\top \beta ^*_S< 0\}\right) \right. \nonumber \\&\quad \quad + \left. \frac{\varDelta _i}{G^o_i}\int _0^{{{\varvec{X}}}_{iS}^\top \delta _S + x_{ij} [\beta ^*_{S \bigcup \{j\}}]_j} F({{\varvec{X}}}_{iS}^\top \beta ^*_S + z) - F({{\varvec{X}}}_{iS}^\top \beta ^*_S) dz \right] \nonumber \\= & {} E\left[ \frac{1}{n} \sum _{i=1}^n -\frac{\varDelta _i}{G^o_i} (x_{ij} [\beta ^*_{S \bigcup \{j\}}]_j) \left( \tau - 1\{Y_i-{{\varvec{X}}}_{iS}^\top \beta ^*_S < 0\}\right) \right. \nonumber \\&\quad \quad \left. + \frac{\varDelta _i}{G^o_i} \int _0^{{{\varvec{X}}}_{iS}^\top \delta _S + x_{ij} [\beta ^*_{S \bigcup \{j\}}]_j} F({{\varvec{X}}}_{iS}^\top \beta ^*_S + z) - F({{\varvec{X}}}_{iS}^\top \beta ^*_S) dz \right] \nonumber \\\ge & {} -o(n^{-\alpha })|[\beta ^*_{S \bigcup \{j\}}]_j| + E\left[ \frac{\underline{f}}{2n}\sum _{i=1}^n \left| {{\varvec{X}}}_{i}^\top \delta _S + x_{ij} [\beta ^*_{S \bigcup \{j\}}]_j \right| ^2 \right] \nonumber \\= & {} -o(n^{-\alpha }) |[\beta ^*_{S \bigcup \{j\}}]_j| + E\left[ \frac{\underline{f}}{2n}\sum _{i=1}^n \left| {{\varvec{X}}}_{i}^\top \left( \beta ^*_S - \beta ^*_{S\bigcup \{j\}}\right) \right| ^2 \right] , \end{aligned}$$

(16)

where the first equality holds due to the identity of Knight (1998), and the second inequality follows from the condition of Theorem 2, Assumption 1, and the fact that $E[\varDelta _i \mid {{\varvec{X}}}_i]=G^o_i$. Combining (15) and (16), we obtain

$$\begin{aligned} I_{23}\le & {} 4A_3^2 \rho ^2 \lambda _{m}^{-1} {\bar{f}}^{-1} \frac{\log p}{n}+ o(n^{-\alpha }) |[\beta ^*_{S \bigcup \{j\}}]_j| + O\left( \frac{\log n}{n} \right) \nonumber \\\le & {} 4A_3^2 \rho ^2 \lambda _{m}^{-1} {\bar{f}}^{-1} \frac{\log p}{n}+ o(n^{-2\alpha }) + O\left( \frac{\log n}{n} \right) . \end{aligned}$$

(17)

Combining (13), (14), and (17), we obtain

$$\begin{aligned} I_2 = I_{21}+I_{22}+I_{23} \le C_4 \rho ^2 \log p/n + C_3 \log n/n+ o(n^{-2\alpha }) + O\left( \frac{\log n}{n} \right) \end{aligned}$$

(18)

for some constant $C_4>0$. Hence, we have by (12) and (18), with probability at least $1-36 \exp (-6\rho \log p)$,

$$\begin{aligned}&\max _{|S| < \rho , j \in (M_\tau \bigcup S)^c} \ -\ell _{S \bigcup \{j\}}(\hat{\beta }_{S \bigcup \{j\}}) + \ell _S(\hat{\beta }_S) \\\le & {} A_5 \rho ^2 \log p/n +C_4 \rho ^2 \log p/n + C_3 \log n/n+ o(n^{-2\alpha }) + O\left( \frac{\log n}{n} \right) \\\le & {} 0.5 C_1 n^{-2\alpha }, \end{aligned}$$

where the last inequality follows from the growth rate condition of $\rho$ in Assumption 4.

Claim 3: It holds that $M_\tau \subset F_{C_2 |M_\tau |}$ with probability at least $1-36 \exp (-3\rho \log p)$.

Proof of Claim 3: By Claim 2 and the fact that $\log n \log p / n = o(n^{-2\alpha })$, if $M_\tau \not \subseteq S$, CQ-FR would select a noise variable with probability less than $36 \exp (-6\rho \log p)$. For $k > |M_\tau |$, $M_\tau \not \subset F_k$ implies that at least $k-|M_\tau |$ noise variables are selected within the k steps. Then for $k=C_2|M_\tau |$ with $C_2 > 1$,

$$\begin{aligned} P(M_\tau \not \subset F_k)\le & {} \sum _{j=k-|M_\tau |}^k {k \atopwithdelims ()j} \{36 \exp (-6\rho \log p)\}^j \\\le & {} |M_\tau | k^{|M_\tau |} \{36 \exp (-6\rho \log p)\}^{k-|M_\tau |} \\\le & {} 36 \exp (-3\rho \log p). \end{aligned}$$

Therefore, $M_\tau \subset F_{C_2 |M_\tau |}$ with probability at least $1-36 \exp (-3\rho \log p)$. This completes the proof. $\square$

1.4 Proof of Theorem 3

Proof

By Theorem 2, CQ-FR does not stop at the kth step when $M_\tau \not \subseteq F_k$, and CQ-FR satisfies $M_\tau \subseteq F_k$ for some $k< C_2 |M_{\tau }|$ with probability going to one. Hence, with probability going to one, CQ-FR stops at the kth step if $\text{ EBIC}^c(F_{k+1}) > \text{ EBIC}^c(F_k)$.

Suppose that $M_\tau \nsubseteq F_{k-1}$ and $M_\tau \subseteq F_k$. We can easily see that $\text{ EBIC}^c(F_{k+1}) > \text{ EBIC}^c(F_k)$ if and only if $\log \ell _{F_{k}}(\hat{\beta }_{F_{k}}) - \log \ell _{F_{k+1}}(\hat{\beta }_{F_{k+1}}) \le \frac{\log n}{2n} \log p$.

Recall that $\beta (\tau )$ is the underlying quantile coefficient vector. Let

$$\begin{aligned} M_n(\tau ) := n^{-1/2} \sum _{i=1}^n \frac{\varDelta _i}{{\hat{G}}_i} {{\varvec{X}}}_i \psi _{\tau }(Y_i-{{\varvec{X}}}_i^\top \beta (\tau )) \end{aligned}$$

and

$$\begin{aligned} H_{S,\tau } := \begin{bmatrix} E\left[ \frac{\varDelta _i}{{\hat{G}}_i}{{\varvec{X}}}_{iS} {{\varvec{X}}}^\top _{iS} f\left( {{\varvec{X}}}_{iS}^\top \beta (\tau ) \mid {{\varvec{X}}}_{iS}\right) \right] , &{} 0\\ 0, &{} 0 \end{bmatrix}, \quad H_{S,\tau }^{-1} := \begin{bmatrix} E\left[ \frac{\varDelta _i}{{\hat{G}}_i}{{\varvec{X}}}_{iS} {{\varvec{X}}}^\top _{iS} f\left( {{\varvec{X}}}_{iS}^\top \beta (\tau ) \mid {{\varvec{X}}}_{iS}\right) \right] ^{-1}, &{} 0\\ 0, &{} 0 \end{bmatrix}. \end{aligned}$$

Using the similar argument in the proof of Lemma 7.10 of Zheng et al. (2015) with the fact that $M_\tau \subseteq F_k \subset F_{k+1}$ we can show that with probability at least $1-16\exp (-\log p)-12/p$,

$$\begin{aligned}&\ell _{F_{k+1}}(\hat{\beta }_{F_{k+1}}) - \ell _{F_{k}}(\hat{\beta }_{F_{k}}) \\= & {} \ell _{F_{k+1}}(\hat{\beta }_{F_{k+1}}) - \ell _{F_{k+1}}(\beta (\tau )) - \left[ \ell _{F_{k}}(\hat{\beta }_{F_{k}}) - \ell _{F_{k+1}}(\beta (\tau )) \right] \\= & {} -\frac{M_n^\top (\tau ) [H_{F_{k+1},\tau }^{-1} - H_{F_k,\tau }^{-1}] M_n(\tau )}{4n} + o_p(\sqrt{k \log p/n^3}) \\\ge & {} -600 ( {f} \lambda _{m})^{-1} \log p/n + o_p(\sqrt{k \log p/n^3}), \end{aligned}$$

which yields with probability tending to 1,

$$\begin{aligned} \ell _{F_{k}}(\hat{\beta }_{F_{k}}) - \ell _{F_{k+1}}(\hat{\beta }_{F_{k+1}}) = o(\log p \log n/n). \end{aligned}$$

Using the similar argument in Lee et al. (2014), we have $\ell _{F_{k}}(\hat{\beta }_{F_{k}}), \ell _{F_{k+1}}(\hat{\beta }_{F_{k+1}}) \ge c_5$ for some constant $c_5>0$ with probability tending to one. Hence,

$$\begin{aligned} \log \ell _{F_{k}}(\hat{\beta }_{F_{k}}) - \log \ell _{F_{k+1}}(\hat{\beta }_{F_{k+1}}) \le \frac{\log n}{2n} \log p. \end{aligned}$$

This completes the proof. $\square$

1.5 Proofs of Theorems 4 and 5

Proof

Suppose that $\ell ^* \in M_\tau ^c$. Let $k:= \max _{j \in M_\tau \setminus F} |(\beta ^*_{F \bigcup \{j\}})_j|$. By Lemma 6, $|[\beta ^*_{F \bigcup \{k\}}]_k| \ge C \bar{f}^{-1} n^{-\alpha }$. By the definition of $\ell ^*$, we also have

$$\begin{aligned} \ell _{F \bigcup \{\ell ^*\}}(\hat{\beta }_{F \bigcup \{\ell ^*\}}) \le \ell _{F \bigcup \{k\}}(\hat{\beta }_{F \bigcup \{k\}}), \end{aligned}$$

which implies

$$\begin{aligned} \ell _{F \bigcup \{\ell ^*\}}(\beta ^*_{F \bigcup \{\ell ^*\}})\le & {} \ell _{F \bigcup \{k\}}(\beta ^*_{F \bigcup \{k\}}) \\&+ \lambda _{M}^{1/2} \Vert \hat{\beta }_{F \bigcup \{\ell ^*\}} - \beta ^*_{F \bigcup \{\ell ^*\}}\Vert + \lambda _{M}^{1/2} \Vert \hat{\beta }_{F \bigcup \{k\}} - \beta ^*_{F \bigcup \{k\}}\Vert \\\le & {} \ell _{F \bigcup \{k\}}(\beta ^*_{F \bigcup \{k\}}) + 2A_4\lambda _{M}^{1/2} \sqrt{\rho ^2 \log p / n}, \end{aligned}$$

where the first inequality follows from the Cauchy-Schwarz inequality, Assumption 3, and the fact that $\rho _{\tau }(\cdot )$ satisfies $|\rho _{\tau }(x) -\rho _{\tau }(y)| \le |x-y|$, and the last inequality follows from Lemma 5.

By Lemmas 2 and 3, this also implies

$$\begin{aligned}&E\left[ \ell ^o_{F \bigcup \{\ell ^*\}}(\beta ^*_{F \bigcup \{\ell ^*\}})\right] \lesssim E\left[ \ell ^o_{F \bigcup \{k\}}(\beta ^*_{F \bigcup \{k\}})\right] \nonumber \\&\quad + \sqrt{\rho ^2 \log p / n} + \sqrt{\frac{\log n}{n}} \Vert \beta ^*_{F \bigcup \{\ell ^*\}} - \beta ^*_{F \bigcup \{k\}}\Vert . \end{aligned}$$

(19)

Using the similar arguments in the proof of Lemma 4, we obtain

$$\begin{aligned}&E\left[ \ell ^o_{F \bigcup \{\ell ^*\}}(\beta ^*_{F \bigcup \{\ell ^*\}})\right] - E\left[ \ell ^o_{F \bigcup \{k\}}(\beta ^*_{F \bigcup \{k\}})\right] \nonumber \\= & {} E\left[ \rho _{\tau }(Y_i-{{\varvec{X}}}_{i}^\top \beta ^*_{F \bigcup \{\ell ^*\}}) - \rho _{\tau }(Y_i-{{\varvec{X}}}_{i}^\top \beta ^*_{F \bigcup \{k\}}) \right] \nonumber \\= & {} E\left[ - {{\varvec{X}}}_{i}^\top \left( \beta ^*_{F \bigcup \{\ell ^*\}}-\beta ^*_{F \bigcup \{k\}}\right) \left( \tau - 1\{Y_i-{{\varvec{X}}}_{i}^\top \beta ^*_{F \bigcup \{k\}}< 0\}\right) \right. \nonumber \\&+ \left. \int _0^{{{\varvec{X}}}_{i}^\top \left( \beta ^*_{F \bigcup \{\ell ^*\}}-\beta ^*_{F \bigcup \{k\}}\right) } F({{\varvec{X}}}_{i}^\top \beta ^*_{F \bigcup \{k\}} + z \mid {{\varvec{X}}}_{i}, \varDelta _i) - F({{\varvec{X}}}_{i}^\top \beta ^*_{F \bigcup \{k\}} \mid {{\varvec{X}}}_{i}, \varDelta _i) dz \right] \nonumber \\= & {} E\left[ - X_{i,\ell ^*} [\beta ^*_{F \bigcup \{\ell ^*\}}]_{\ell ^*} \left( \tau - 1\{Y_i-{{\varvec{X}}}_{i}^\top \beta ^*_{F \bigcup \{k\}} < 0\}\right) \right. \nonumber \\&+ \left. \int _0^{{{\varvec{X}}}_{i}^\top \left( \beta ^*_{F \bigcup \{\ell ^*\}}-\beta ^*_{F \bigcup \{k\}}\right) } F({{\varvec{X}}}_{i}^\top \beta ^*_{F \bigcup \{k\}} + z \mid {{\varvec{X}}}_{i}, \varDelta _i) - F({{\varvec{X}}}_{i}^\top \beta ^*_{F \bigcup \{k\}} \mid {{\varvec{X}}}_{i}, \varDelta _i) dz \right] \nonumber \\\ge & {} - o(n^{-\alpha }) [\beta ^*_{F \bigcup \{\ell ^*\}}]_{\ell ^*} +\frac{\underline{f}}{4} E\left[ \left\{ {{\varvec{X}}}_{i}^\top \left( \beta ^*_{F \bigcup \{\ell ^*\}}-\beta ^*_{F \bigcup \{k\}}\right) \right\} ^2\right] , \end{aligned}$$

(20)

where the third equality follows from the definition of $\beta ^*_{F \bigcup \{k\}}$, and the inequality follows from the same arguments in the proof of Lemma 4 using the condition in Theorem 4. Let $\delta _{F \bigcup \{\ell ^*\}} := \beta ^*_{F \bigcup \{\ell ^*\}}-[\beta ^*_{F \bigcup \{k\}}]_{F \bigcup \{\ell ^*\}}$. Then, have

$$\begin{aligned}&E\left[ \left\{ {{\varvec{X}}}_{i}^\top \left( \beta ^*_{F \bigcup \{\ell ^*\}}-\beta ^*_{F \bigcup \{k\}}\right) \right\} ^2\right] \nonumber \\&= E\left[ \left\{ {{\varvec{X}}}_{i, F \bigcup \{\ell ^*\}}^\top \delta _{F \bigcup \{\ell ^*\}} - X_{i,k} [\beta ^*_{F \bigcup \{k\}}]_k \right\} ^2\right] \nonumber \\&= E\left[ \left\{ {{\varvec{X}}}_{i, F \bigcup \{\ell ^*\}}^\top \delta _{F \bigcup \{\ell ^*\}} \right\} ^2\right] + [\beta ^*_{F \bigcup \{k\}}]_k^2 - 2 E\left[ {{\varvec{X}}}_{i, F \bigcup \{\ell ^*\}}^\top \delta _{F \bigcup \{\ell ^*\}} X_{i,k} [\beta ^*_{F \bigcup \{k\}}]_k \right] \nonumber \\&\ge E\left[ \left\{ {{\varvec{X}}}_{i, F \bigcup \{\ell ^*\}}^\top \delta _{F \bigcup \{\ell ^*\}} \right\} ^2\right] + [\beta ^*_{F \bigcup \{k\}}]_k^2\nonumber \\&- 2 \sqrt{E\left[ \left\{ {{\varvec{X}}}_{i, F \bigcup \{\ell ^*\}}^\top \delta _{F \bigcup \{\ell ^*\}}\right\} ^2 X^2_{i,k}\right] [\beta ^*_{F \bigcup \{k\}}]_k^2 }\nonumber \\&\ge E\left[ \left\{ {{\varvec{X}}}_{i, F \bigcup \{\ell ^*\}}^\top \delta _{F \bigcup \{\ell ^*\}} \right\} ^2\right] + [\beta ^*_{F \bigcup \{k\}}]_k^2\nonumber \\&- 2 \sqrt{ E\left[ \left\{ {{\varvec{X}}}_{i, F \bigcup \{\ell ^*\}}^\top \delta _{F \bigcup \{\ell ^*\}}\right\} ^2\right] (1-\epsilon ) [\beta ^*_{F \bigcup \{k\}}]_k^2 } \nonumber \\&\ge \epsilon [\beta ^*_{F \bigcup \{k\}}]_k^2 \end{aligned}$$

(21)

where the first inequality follows from $E[X]^2 \le E[X^2]$ for any random variable X, the second inequality follows from Assumption 8 and the fact that $x^2 + y^2 \ge 2xy$ for any numbers x, y. Combining (19)-(20) and Assumption 8, we have

$$\begin{aligned} \sqrt{\rho ^2 \log p / n} + \sqrt{\frac{\log n}{n}}\ge & {} - o(n^{-\alpha }) [\beta ^*_{F \bigcup \{\ell ^*\}}]_{\ell ^*} + \epsilon [\beta ^*_{F \bigcup \{k\}}]_k^2 \\\gtrsim & {} - o(n^{-2\alpha }) - [\beta ^*_{F \bigcup \{\ell ^*\}}]_{\ell ^*}^2 + \epsilon [\beta ^*_{F \bigcup \{k\}}]_k^2. \end{aligned}$$

Because $[\beta ^*_{F \bigcup \{k\}}]_k^2 \gtrsim n^{-2\alpha } \gtrsim n^{-2\alpha } + \sqrt{\rho ^2 \log p / n} + \sqrt{\frac{\log n}{n}}$ due to the rate condition of $\alpha$ in Assumption 4, this implies

$$\begin{aligned} |[\beta ^*_{F \bigcup \{\ell ^*\}}]_{\ell ^*}| \ge \frac{1}{C_B} |[\beta ^*_{F \bigcup \{k\}}]_k|. \end{aligned}$$

Hence, there exists some constant $C_L>\frac{1}{C_B}$ satisfying

$$\begin{aligned} \frac{|[\beta ^*_{F \bigcup \{\ell ^*\}}]_{\ell ^*}|}{\max _{k \in M_\tau \setminus F} |[\beta ^*_{F \bigcup \{k\}}]_k|} > C_L \end{aligned}$$

uniformly in $F \not \subseteq M_\tau$. Thus, we have $\ell^{\ast} \in M_\tau \setminus F$ for any $F \not \subseteq M_\tau$. This completes the proof. $\square$

Rights and permissions

Reprints and permissions

About this article

Cite this article

Lee, E.R., Park, S., Lee, S.K. et al. Quantile forward regression for high-dimensional survival data. Lifetime Data Anal 29, 769–806 (2023). https://doi.org/10.1007/s10985-023-09603-w

Download citation

Received: 10 October 2022
Accepted: 17 May 2023
Published: 02 July 2023
Issue Date: October 2023
DOI: https://doi.org/10.1007/s10985-023-09603-w

Keywords

Access this article

Log in via an institution

Subscribe and save

Springer+ Basic

EUR 32.99 /Month

Get 10 units per month
Download Article/Chapter or Ebook
1 Unit = 1 Article or 1 Chapter
Cancel anytime

Subscribe now

Buy Now

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Quantile forward regression for high-dimensional survival data

Abstract

Access this article

Subscribe and save

Buy Now

Similar content being viewed by others

A Bayesian variable selection approach to longitudinal quantile regression

Feature Screening for High-Dimensional Survival Data via Censored Quantile Correlation

Conditional-quantile screening for ultrahigh-dimensional survival data via martingale difference correlation

References

Author information

Authors and Affiliations

Corresponding author

Additional information

Publisher's Note

Supplementary Information

Supplementary file 1 (pdf 146 KB)

Appendix 1 (Theoretical Details)

Appendix 1 (Theoretical Details)

1.1 Lemmas and proofs

Lemma 1

Lemma 2

Proof

Lemma 3

Proof

Lemma 4

Proof

Lemma 5

Proof

Lemma 6

Proof

1.2 Proof of Theorem 1

Proof

1.3 Proof of Theorem 2

Proof

1.4 Proof of Theorem 3

Proof

1.5 Proofs of Theorems 4 and 5

Proof

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Subscribe and save

Buy Now

Search

Navigation