Appendix 1 (Theoretical Details)
Throughout the proof, for any set \(S \subseteq \{1,\ldots ,p\}\), let
$$\begin{aligned} \ell _S(\beta _S) := \frac{1}{n} \sum _{i=1}^n \frac{\varDelta _i}{\hat{G}_i} \rho _\tau (Y_i^C-{{\varvec{X}}}_{iS}^\top \beta _S), \ell ^o_S(\beta _S) := \frac{1}{n} \sum _{i=1}^n \frac{\varDelta _i}{G^o_i} \rho _\tau (Y_i^C-{{\varvec{X}}}_{iS}^\top \beta _S), \end{aligned}$$
where \(G^o_i := G(Y_i^C)\) is the censoring probability, and \(\beta ^*_S := \mathop {\mathrm {arg\,min}}\limits _{\beta _S} E[\ell _S(\beta _S)]\). Given a random sample \(Z_1,\ldots , Z_n\) and a function \(f(\cdot )\), let \(G_n(f) := n^{-1/2} \sum _{i=1}^n (f(Z_i)-E[f(Z_i)])\).
We first introduce useful lemmas and prove the main theorems.
1.1 Lemmas and proofs
The following Lemma 1 is given in Lemma 8.4 of He et al. (2013).
Lemma 1
Suppose Assumptions 5-6. Then, the Kaplan-Meier estimator \({\hat{G}}(t)\) satisfies
$$\begin{aligned} \sup _{0 \le t \le T} |{\hat{G}}(t)- G(t)|= & {} O\left( n^{-1/2} (\log n)^{1/2}\right) \quad \text{ almost } \text{ surely }.\\ \sup _{0 \le t \le T} |{\hat{G}}(t)^{-1}- G(t)^{-1}|= & {} O\left( n^{-1/2} (\log n)^{1/2}\right) \quad \text{ almost } \text{ surely }. \end{aligned}$$
The following lemmas will be used to prove the main theorems.
Lemma 2
Suppose Assumptions 3,5, and 6. Given an index set S, let \(B_S^o(d_1) := \{\beta _S:\ \Vert \beta _S-\tilde{\beta }_S\Vert \le d_1/\sqrt{s}\}\). Then, we have
$$\begin{aligned} \sup _{\beta _S \in B^o_S(d_1),\ |S| \le \rho } \left| \ell ^o_S(\beta _S) - \ell ^o_S(\tilde{\beta }_S) - \left( \ell _S(\beta _S) - \ell _S(\tilde{\beta }_S)\right) \right| = O\left( d_1 s^{-1/2} n^{-1/2} (\log n)^{1/2}\right) . \end{aligned}$$
Proof
We have
$$\begin{aligned}&\sup _{\beta _S \in B^o_S(d_1),\ |S| \le \rho } \left| \ell ^o_S(\beta _S) - \ell ^o_S(\tilde{\beta }_S) - \left( \ell _S(\beta _S) - \ell _S(\tilde{\beta }_S)\right) \right| \\&=\sup _{\beta _S \in B^o_S(d_1),\ |S| \le \rho } \left| \frac{1}{n} \sum _{i=1}^n \left( \frac{\varDelta _i}{G^o_i} - \frac{\varDelta _i}{{\hat{G}}_i} \right) \left\{ \rho _\tau (Y_i^C-{{\varvec{X}}}_{iS}^\top \beta _S) - \rho _\tau (Y_i^C-{{\varvec{X}}}_{iS}^\top \tilde{\beta }_S)\right\} \right| \\&\le O\left( n^{-1/2} (\log n)^{1/2}\right) \sup _{\beta _S \in B^o_S(d_1),\ |S| \le \rho } \frac{1}{n} \sum _{i=1}^n \left| {{\varvec{X}}}_{iS}^\top (\beta _S - \tilde{\beta }_S)\right| \\&\le O\left( n^{-1/2} (\log n)^{1/2}\right) \sup _{\beta _S \in B^o_S(d_1),\ |S| \le \rho } \sqrt{ \sum _{i=1}^n \left| {{\varvec{X}}}_{iS}^\top (\beta _S - \tilde{\beta }_S)\right| ^2/n } \\&\le O\left( n^{-1/2} (\log n)^{1/2}\right) \lambda _{M}^{1/2} \sup _{\beta _S \in B^o_S(d_1),\ |S| \le \rho } \Vert \beta _S - \tilde{\beta }_S\Vert \\&= O\left( d_1 s^{-1/2} n^{-1/2} (\log n)^{1/2}\right) , \end{aligned}$$
where the first inequality follows from Lemma 1 and the fact that \(|\rho _\tau (a)-\rho _\tau (b)| \le |a-b|\) for numbers a and b, the second inequality follows from the Cauchy-Schwarz inequality, the third inequality follows from Assumption 3. This completes the proof. \(\square\)
Lemma 3
Suppose conditions of Theorem 1. Given an index set S, let \(B_S^o(d_1) := \{\beta _S:\ \Vert \beta _S-\tilde{\beta }_S\Vert \le d_1/\sqrt{s}\}\). Then, with probability at least \(1-6\exp (-6\rho \log p)\),
$$\begin{aligned} \sup _{\beta _S \in B^o_S(d_1),\ |S| \le \rho } |\ell _S(\beta _S) - \ell _S(\tilde{\beta }_S) - E [\ell _S(\beta _S)] + E[\ell _S(\tilde{\beta }_S)]| \le 2A_3 d_1 \sqrt{\rho \log p/n}, \end{aligned}$$
where \(A_3= 4\sqrt{\lambda _{M}} \tau _0^{-1}\).
Proof
Note that
$$\begin{aligned} \ell ^o_S(\beta _S) - \ell ^o_S(\tilde{\beta }_S) - E [\ell ^o_S(\beta _S)] + E[\ell ^o_S(\tilde{\beta }_S)] = \frac{1}{\sqrt{n}} G_n\left( \frac{\varDelta _i}{G^o_i}\rho _\tau (Y_i-{{\varvec{X}}}_{iS}^\top \beta _S) -\frac{\varDelta _i}{G^o_i} \rho _\tau (Y_i-{{\varvec{X}}}_{iS}^\top \tilde{\beta }_S)\right) . \end{aligned}$$
Fix S with \(s=|S| \le \rho\). Let \(t:= d_1/\sqrt{s}\). Let
$$\begin{aligned} A(t) := \sup _{\beta _S \in B^o_S(d_1)} \left| G_n\left( \frac{\varDelta _i}{G^o_i}\rho _\tau (Y_i-{{\varvec{X}}}_{iS}^\top \beta _S) -\frac{\varDelta _i}{G^o_i} \rho _\tau (Y_i-{{\varvec{X}}}_{iS}^\top \tilde{\beta }_S)\right) \right| . \end{aligned}$$
For any \(\beta _S \in B^o_S(d_1)\), we have by Assumption 5,
$$\begin{aligned}&\text{ var }\left( G_n\left( \frac{\varDelta _i}{G^o_i}\rho _\tau (Y_i-{{\varvec{X}}}_{iS}^\top \beta _S) -\frac{\varDelta _i}{G^o_i} \rho _\tau (Y_i-{{\varvec{X}}}_{iS}^\top \tilde{\beta }_S)\right) \right) \\&\le \tau _0^{-2} E[|{{\varvec{X}}}_{iS}^\top (\beta _S-\tilde{\beta }_S)|^2] \le \lambda _{M} \tau _0^{-2} t^2. \end{aligned}$$
By symmetrization lemma for probabilities, we obtain that for \(M > 2\sqrt{\lambda _{M}} \tau _0^{-1} t\),
$$\begin{aligned} P(A(t) \ge M) \le {2P(A^o(t) \ge M/4)}/ \left\{ 1- 4\lambda _{M} \tau _0^{-2} t^2 / M^2\right\} , \end{aligned}$$
where \(A^o(t)\) is the symmetrized version of A(t), i.e.,
$$\begin{aligned} A^o(t) := \sup _{\beta _S \in B^0_S(t)}\left| G_n\left( \frac{V_i\varDelta _i}{G^o_i}\rho _\tau (Y_i-{{\varvec{X}}}_{iS}^\top \beta _S) -\frac{V_i\varDelta _i}{G^o_i} \rho _\tau (Y_i-{{\varvec{X}}}_{iS}^\top \tilde{\beta }_S)\right) \right| , \end{aligned}$$
where \(V_1,\ldots , V_n\) are i.i.d. Rademacher random variables. We can write
$$\begin{aligned} \frac{\varDelta _i}{G^o_i}\rho _\tau (Y_i-{{\varvec{X}}}_{iS}^\top \beta _S) -\frac{\varDelta _i}{G^o_i} \rho _\tau (Y_i-{{\varvec{X}}}_{iS}^\top \tilde{\beta }_S) = \frac{\varDelta _i}{G^o_i}\tau {{\varvec{X}}}_{iS}^\top (\beta _S-\tilde{\beta }_S) + \frac{\varDelta _i}{G^o_i} W_i(\tau , \delta ), \end{aligned}$$
where \(W_i(\tau , \delta _S) = [Y_i - {{\varvec{X}}}_{iS}^\top (\tilde{\beta }_S + \delta _S)]_- - [Y_i-{{\varvec{X}}}_{iS}^\top \tilde{\beta }_S]_-\) and \(\delta _S := \beta _S- \tilde{\beta }_S\), where \(u_- = 1\{u < 0\} |u|\). Then, we have \(A^o(t) \le B^o(t) + C^o(t)\), where
$$\begin{aligned} B^o(t) = \sup _{\Vert \delta _S\Vert \le t} \,\left| G_n \left[ \frac{V_i\varDelta _i}{G^o_i} {{\varvec{X}}}_{iS}^\top \delta _S\right] \right| ,\quad C^o(t) = \sup _{\Vert \delta _S\Vert \le t}\, \left| G_n \left[ \frac{V_i\varDelta _i}{G^o_i} W_i(\tau , \delta _S)\right] \right| . \end{aligned}$$
We have for \(\lambda >0\),
$$\begin{aligned} E[\exp (\lambda B^o(t))]= & {} E\left[ E\left[ \exp \left( \lambda \sup _{\Vert \delta _S\Vert \le t} \, \left| G_n \left[ \frac{V_i \varDelta _i}{G^o_i} {{\varvec{X}}}_{iS}^\top \delta _S\right] \right| \right) \mid {{\varvec{X}}}, \{\varDelta _i\} \right] \right] \nonumber \\\le & {} E\left[ E\left[ \exp \left( \lambda \sqrt{s} t \max _{j \in S} \left| G_n \left[ \frac{V_i \varDelta _i}{G^o_i} x_{ij}\right] \right| \right) \mid {{\varvec{X}}}, \{\varDelta _i\} \right] \right] \nonumber \\\le & {} 2 E\left[ E\left[ \exp \left( \lambda \sqrt{s} t \max _{j \in S} G_n \left[ \frac{V_i \varDelta _i}{G^o_i} x_{ij}\right] \right) \mid {{\varvec{X}}}, \{\varDelta _i\} \right] \right] \nonumber \\\le & {} 2s E\left[ \exp \left( \lambda ^2 s t^2 \frac{\sum _i x_{ij}^2}{n} \frac{ \varDelta _i^2}{(G^o_i)^2}\right) \mid X, \{\varDelta _i\}\right] \nonumber \\\le & {} 2s \exp \left( \lambda ^2 s \tau _0^{-2} t^2\right) , \end{aligned}$$
(7)
where the first inequality is elementary, the second inequality follows from the fact that \(E[\exp (|Z|)] \le 2E[\exp (Z)]\) for any symmetric random variable Z, the third inequality follows by the intermediate step in the proof of the Hoeffding’s inequality (page 100 in van der Vaart (1996)), and the last inequality follows from Assumption 5 and \(\varDelta _i^2 \le 1\). Therefore, we have
$$\begin{aligned} P(B^o(t)> K_1) \le \min _{\lambda>0} \exp (-\lambda K_1) E[\exp (\lambda B^o(t))] \le \min _{\lambda >0}2s \exp (-\lambda K_1+\lambda ^2 s \tau _0^{-2} t^2). \end{aligned}$$
The minimum is achieved at \(\lambda = K_1 \tau _0^{2} / (2st^2)\). Thus, we have
$$\begin{aligned} P(B^o(t) > K_1) \le 2s\exp (-K_1^2 \tau _0^{2} / (4st^2)). \end{aligned}$$
Next, for \(\lambda >0\), we have
$$\begin{aligned} E[\exp (\lambda C^o(t))]= & {} E\left[ E\left[ \exp \left( \lambda \sup _{\Vert \delta _S\Vert \le t} \, \left| G_n \left[ \frac{V_i \varDelta _i}{G^o_i} W_i(\tau , \delta _S)\right] \right| \right) \mid {{\varvec{X}}}, \{\varDelta _i\} \right] \right] \\\le & {} E\left[ E\left[ \exp \left( 2\lambda \sup _{\Vert \delta _S\Vert \le t} \, \left| G_n \left[ \frac{V_i \varDelta _i}{G^o_i} {{\varvec{X}}}_{is}^\top \delta _S\right] \right| \right) \mid {{\varvec{X}}}, \{\varDelta _i\} \right] \right] \\\le & {} 2s \exp \left( 4\lambda ^2 s \tau _0^{-2} t^2\right) , \end{aligned}$$
where the first inequality follows from the contractive property \(|W_i(\tau , u)-W_i(\tau ,v)| < |{{\varvec{X}}}_{iS}^\top (u-v)|\), and from Theorem 4.12 in Ledoux and Talagrand (1991) with the convex function \(\exp (\cdot )\), and the second inequality follows from the same arguments used in (7).
Therefore, we have
$$\begin{aligned} P(C^o(t)> K_2) \le \min _{\lambda>0} \exp (-\lambda K_2) E[\exp (\lambda C^o(t))] \le \min _{\lambda >0}2s \exp (-\lambda K_2+4\lambda ^2 s \tau _0^{-2} t^2). \end{aligned}$$
The minimum is achieved at \(\lambda = K_2 \tau _0^{2} / (8st^2)\). Thus, we have
$$\begin{aligned} P(C^o(t) > K_2) \le 2s\exp (-K_2^2 \tau _0^{2} / (16st^2)). \end{aligned}$$
Thus, we have
$$\begin{aligned} P(A^o(t) \ge M/4)\le & {} P(B^o(t) \ge M/8) + P(C^o(t) \ge M/8) \\\le & {} 4s\exp (-M^2 \tau _0^2 / (1024st^2)) \end{aligned}$$
Hence, \(P(A(t) \ge M) \le Cs\exp (-M^2 \tau _0^2 / (1024st^2)) = Cs\exp (-M^2 K^2 \tau _0^2/ (1024d_1^2))\) for some \(C>0\), provided that \(M > 2\sqrt{\lambda _{M}} \tau _0^{-1} t = 2\sqrt{\lambda _{M}} \tau _0^{-1} d_1/\sqrt{s}\). Setting \(M=A_3 d_1 \sqrt{\rho \log p}\), where \(A_3:= 4\sqrt{\lambda _{M}} \tau _0^{-1}\), we have \(P(A(t) \ge A_3 d_1 \sqrt{\rho \log p}) \lesssim s \exp (-C \rho \log p)\). Thus, we obtain
$$\begin{aligned} P(\max _{S: |S| \le \rho } A(t) \ge A_3 d_1 \sqrt{\rho \log p})\le & {} \sum _{s=1}^{\rho } {p \atopwithdelims ()s} P(A(t) \ge A_3 d_1 \sqrt{\rho \log p}) \\\lesssim & {} \sum _{s=1}^{\rho } (ep/s)^s s \exp (-C \rho \log p) \\\lesssim & {} \exp (-C \rho \log p). \end{aligned}$$
Combining with Lemma 2 and \(\log n = o(s\rho \log p)\), which holds due to Assumption 2, we obtain
$$\begin{aligned} \sup _{\beta _S \in B^o_S(d_1),\ |S| \le \rho } |\ell _S(\beta _S) - \ell _S(\tilde{\beta }_S) - E [\ell _S(\beta _S)] + E[\ell _S(\tilde{\beta }_S)]| \le 2A_3 d_1 \sqrt{\rho \log p/n}. \end{aligned}$$
This completes the proof. \(\square\)
Lemma 4
Suppose conditions of Theorem 1. Then, given a model S, for any \(\beta _S\) satisfying \(\Vert \beta _S - \beta ^*_S\Vert \le 4q \lambda _{M}^{-1/2}\),
$$\begin{aligned}&\frac{1}{8}\lambda_{m}\tau_0\underline{f} \Vert \beta _S - \beta ^*_S\Vert ^2 - C_3 n^{-1} \log n \le E[\ell _S(\beta _S)] - E[\ell _S(\beta ^*_S)]\\&\quad \le \frac{3\lambda _{M} \bar{f}}{2\tau _0} \Vert \beta _S - \beta ^*_S\Vert ^2 + C_3 n^{-1} \log n. \end{aligned}$$
for some positive constant \(C_3\). For any \(\beta _S\) satisfying \(\Vert \beta _S - \beta ^*_S\Vert > 4q \lambda _{M}^{-1/2}\),
$$\begin{aligned} \frac{1}{4} q \lambda _{M}^{-1/2} \lambda _{m}\tau_0 \underline{f} \Vert \beta _S - \beta ^*_S\Vert \le E[\ell _S(\beta _S)] - E[\ell _S(\beta ^*_S)] \le 2\tau _0^{-1} \lambda _{M}^{1/2}\Vert \beta _S - \beta ^*_S\Vert . \end{aligned}$$
Proof
We first consider the following Case 1.
Case 1: When \(\Vert \beta _S - \beta ^*_S\Vert \le 4q \lambda _{M}^{-1/2}\)
Let \(\delta _S := \beta _S - \beta ^*_S\). By the definition of \(\beta ^*_S\), we have
$$\begin{aligned} E\left[ \frac{1}{n}\sum _{i=1}^n \frac{\varDelta _i}{G^o_i} {{\varvec{X}}}_{iS}\left( \tau - 1\{Y_i-{{\varvec{X}}}_{iS}^\top \beta ^*_S < 0\}\right) \right] =0. \end{aligned}$$
Hence, we have
$$\begin{aligned}&\left| E\left[ \frac{1}{n}\sum _{i=1}^n \frac{\varDelta _i}{{\hat{G}}_i} {{\varvec{X}}}_{iS}^\top \delta _S \left( \tau - 1\{Y_i-{{\varvec{X}}}_{iS}^\top \beta ^*_S< 0\}\right) \right] \right| \nonumber \\&= \left| E\left[ \frac{1}{n}\sum _{i=1}^n \varDelta _i \left( \frac{1}{{\hat{G}}_i}- \frac{1}{G^o_i} \right) {{\varvec{X}}}_{iS}^\top \delta _S \left( \tau - 1\{Y_i-{{\varvec{X}}}_{iS}^\top \beta ^*_S < 0\}\right) \right] \right| \nonumber \\&\le \frac{1}{n} \sum _{i=1}^n E\left[ |{{\varvec{X}}}_{iS}^\top \delta _S|\right] O\left( n^{-1/2} (\log n)^{1/2}\right) \nonumber \\&\le E\left[ \left( \frac{1}{n}\sum _{i=1}^n |{{\varvec{X}}}_{iS}^\top \delta _S|^2\right) ^{1/2} \right] O\left( n^{-1/2} (\log n)^{1/2}\right) \nonumber \\&\le E\left[ \frac{1}{n}\sum _{i=1}^n |{{\varvec{X}}}_{iS}^\top \delta _S|^2 \right] ^{1/2} O\left( n^{-1/2} (\log n)^{1/2}\right) \\&\le \lambda _{M}^{1/2} \Vert \delta _S\Vert O\left( n^{-1/2} (\log n)^{1/2}\right) , \nonumber \end{aligned}$$
(8)
where the first inequality follows from Lemma 1, the second inequality follows from the Cauchy-Schwarz inequality, the third inequality follows from the fact that \(EX^2 \ge [E|X|]^2\) for a random variable X, and the last inequality follows from Assumption 3.
Hence, by the identity by Knight (1998), we have for \(\tilde{z} \in [0,z]\),
$$\begin{aligned}&E\left[ \frac{1}{n}\sum _{i=1}^n \frac{\varDelta _i}{{\hat{G}}_i} \rho _{\tau }(Y_i-{{\varvec{X}}}_{iS}^\top \beta _S) - \frac{\varDelta _i}{{\hat{G}}_i} \rho _{\tau }(Y_i-{{\varvec{X}}}_{iS}^\top \beta ^*_S) \right] \\= & {} E\left[ \frac{1}{n} \sum _{i=1}^n - \frac{\varDelta _i}{{\hat{G}}_i}{{\varvec{X}}}_{iS}^\top \delta _S \left( \tau - 1\{Y_i-{{\varvec{X}}}_{iS}^\top \beta ^*_S < 0\}\right) + \frac{\varDelta _i}{{\hat{G}}_i} \int _0^{{{\varvec{X}}}_{iS}^\top \delta _S} F({{\varvec{X}}}_{iS}^\top \beta ^*_S.\right. \nonumber \\&\left. + z \mid {{\varvec{X}}}_{i}, \varDelta _i) - F({{\varvec{X}}}_{iS}^\top \beta ^*_S \mid {{\varvec{X}}}_{i}, \varDelta _i) dz \right] \nonumber \\= & {} E\left[ \frac{1}{n}\sum _{i=1}^n \frac{\varDelta _i}{{\hat{G}}_i} \int _0^{{{\varvec{X}}}_{iS}^\top \delta _S} F({{\varvec{X}}}_{iS}^\top \beta ^*_S + z \mid {{\varvec{X}}}_{i}, \varDelta _i) - F({{\varvec{X}}}_{iS}^\top \beta ^*_S \mid {{\varvec{X}}}_{i}, \varDelta _i) dz \right] +O\left( n^{-1/2} (\log n)^{1/2}\right) \Vert \delta _S\Vert \nonumber \\= & {} E\left[ \frac{1}{n}\sum _{i=1}^n \frac{\varDelta _i}{{\hat{G}}_i} \int _0^{{{\varvec{X}}}_{iS}^\top \delta _S} z f({{\varvec{X}}}_{iS}^\top \beta ^*_S \mid {{\varvec{X}}}_{i}, \varDelta _i) + \frac{z^2}{2} f'({{\varvec{X}}}_{iS}^\top \beta ^*_S + \tilde{z} \mid {{\varvec{X}}}_{i}, \varDelta _i) dz\right] +O\left( n^{-1/2} (\log n)^{1/2}\right) \Vert \delta _S\Vert \nonumber \\\ge & {} E\left[ \frac{\underline{f}}{2n}\sum _{i=1}^n \frac{\varDelta _i}{{\hat{G}}_i} |{{\varvec{X}}}_{iS}^\top \delta _S|^2 - \frac{\bar{f}'}{6n} \sum _{i=1}^n \frac{\varDelta _i}{{\hat{G}}_i} |{{\varvec{X}}}_{iS}^\top \delta _S|^3\right] +O\left( n^{-1/2} (\log n)^{1/2}\right) \Vert \delta _S\Vert \nonumber \\\ge & {} \underbrace{E\left[ \frac{\underline{f}}{2n}\sum _{i=1}^n \varDelta _i |{{\varvec{X}}}_{iS}^\top \delta _S|^2 - \frac{\bar{f}'}{6n} \sum _{i=1}^n \frac{\varDelta _i}{\tau _0} |{{\varvec{X}}}_{iS}^\top \delta _S|^3\right] +O\left( n^{-1/2} (\log n)^{1/2}\right) \Vert \delta _S\Vert }_{I}, \nonumber \end{aligned}$$
(9)
where the first inequality follows from Assumption 1 and the second inequality follows from Assumption 5. By Assumption 3, \(\Vert \delta _S\Vert \le 4q \lambda _{M}^{-1/2}\), and \(\varDelta _i \in \{0,1\}\), we have
$$\begin{aligned} E[\varDelta _i |{{\varvec{X}}}_{iS}^\top \delta _S|^2]^{1/2} \le \lambda _{M}^{1/2} \Vert \delta _S\Vert \le 4q \le \frac{3\tau_0\underline{f} }{2\bar{f}'} \frac{E[\varDelta _i ({{\varvec{X}}}_{iS}^\top \delta _S)^2]^{3/2}}{E[\varDelta _i |{{\varvec{X}}}_{iS}^\top \delta _S|^3]}, \end{aligned}$$
which gives
$$\begin{aligned} E\left[ \frac{\bar{f}'}{6n} \sum _{i=1}^n \frac{\varDelta _i}{\tau _0} |{{\varvec{X}}}_{iS}^\top \delta _S|^3\right] \le E\left[ \frac{\underline{f}}{4n}\sum _{i=1}^n \varDelta _i |{{\varvec{X}}}_{iS}^\top \delta _S|^2\right] . \end{aligned}$$
(10)
Hence, we have
$$\begin{aligned} I\ge & {} E\left[ \frac{\underline{f}}{4n}\sum _{i=1}^n \varDelta _i |{{\varvec{X}}}_{iS}^\top \delta _S|^2\right] + O\left( n^{-1/2} (\log n)^{1/2}\right) \Vert \delta _S\Vert \\\ge & {} E\left[ \frac{\tau_0\underline{f}}{4n}\sum _{i=1}^n |{{\varvec{X}}}_{iS}^\top \delta _S|^2\right] + O\left( n^{-1/2} (\log n)^{1/2}\right) \Vert \delta _S\Vert \\\ge & {} \frac{\lambda _{m} \tau_0\underline{f}}{4} \Vert \delta _S\Vert ^2 +O\left( n^{-1/2} (\log n)^{1/2}\right) \Vert \delta _S\Vert \\\ge & {} \frac{\lambda _{m} \tau_0\underline{f}}{8} \Vert \delta _S\Vert ^2 +O\left( n^{-1} \log n\right) \\\ge & {} \frac{\lambda _{m} \tau_0\underline{f}}{8} \Vert \delta _S\Vert ^2 -C_3 n^{-1} \log n \end{aligned}$$
for some \(C_3>0\), where the second inequality follows from Assumption 5 and the third inequality follows from Assumption 3.
Furthermore, we have
$$\begin{aligned}&E\left[ \frac{1}{n} \frac{\varDelta _i}{{\hat{G}}_i} \sum _{i=1}^n \rho _{\tau }(Y_i-{{\varvec{X}}}_{iS}^\top \beta _S) - \frac{\varDelta _i}{{\hat{G}}_i} \rho _{\tau }(Y_i-{{\varvec{X}}}_{iS}^\top \beta ^*_S) \right] \\\le & {} E\left[ \frac{\bar{f}}{2n}\sum _{i=1}^n \frac{\varDelta _i}{{\hat{G}}_i} |{{\varvec{X}}}_{iS}^\top \delta _S|^2 \right] + E\left[ \frac{\bar{f}'}{6n} \sum _{i=1}^n \frac{\varDelta _i}{{\hat{G}}_i} |{{\varvec{X}}}_{iS}^\top \delta _S|^3 \right] +O\left( n^{-1/2} (\log n)^{1/2}\right) \Vert \delta _S\Vert \\\le & {} E\left[ \frac{\bar{f}}{2n}\sum _{i=1}^n \frac{\varDelta _i}{\tau _0} |{{\varvec{X}}}_{iS}^\top \delta _S|^2 \right] + E\left[ \frac{\bar{f}'}{6n} \sum _{i=1}^n \frac{\varDelta _i}{\tau _0} |{{\varvec{X}}}_{iS}^\top \delta _S|^3 \right] +O\left( n^{-1/2} (\log n)^{1/2}\right) \Vert \delta _S\Vert \\\le & {} E\left[ \frac{3\bar{f}}{4n}\sum _{i=1}^n \frac{\varDelta _i}{\tau _0} |{{\varvec{X}}}_{iS}^\top \delta _S|^2 \right] +O\left( n^{-1/2} (\log n)^{1/2}\right) \Vert \delta _S\Vert \\\le & {} \frac{3\lambda _{M} \bar{f}}{4\tau _0} \Vert \delta _S\Vert ^2 +O\left( n^{-1/2} (\log n)^{1/2}\right) \Vert \delta _S\Vert \\\le & {} \frac{3\lambda _{M} \bar{f}}{2\tau _0} \Vert \delta _S\Vert ^2 + C_3 n^{-1} \log n, \end{aligned}$$
for some \(C_3>0\), where the third inequality follows from (10). This completes the proof of the first part of the lemma. Next, we consider the following Case 2.
Case 2: When \(\Vert \beta _S - \beta ^*_S\Vert > 4q \lambda _{M}^{-1/2}\)
Let \(\delta _S = \beta _S - \beta ^*_S\). Then, for \(a := \Vert \delta _S\Vert / (4q \lambda _{M}^{-1/2}) > 1\), it holds that \(\delta _S = a \tilde{\delta }_S\) for some \(\Vert \tilde{\delta }_S\Vert = 4q \lambda _{M}^{-1/2}\). Define
$$\begin{aligned} Q(\delta _S) := E[\ell _S(\beta ^*_S+\delta _S)] - E[\ell _S(\beta ^*_S)] = E[\ell _S(\beta _S)] - E[\ell _S(\beta ^*_S)]. \end{aligned}$$
Because \(Q(\cdot )\) is a convex function, we have
$$\begin{aligned} \frac{1}{a} Q(a \tilde{\delta }_S) + \left( 1-\frac{1}{a}\right) Q(0) \ge Q(\tilde{\delta }_S). \end{aligned}$$
Because \(Q(0)=0\) and \(\Vert \tilde{\delta }_S\Vert = 4q \lambda _{M}^{-1/2}\), we have
$$\begin{aligned} Q(\delta _S)= & {} Q(a \tilde{\delta }_S) \ge a Q(\tilde{\delta }_S) \ge \frac{a\lambda _{m} \tau_0\underline{f}}{8} \Vert \tilde{\delta }_S\Vert ^2 - a C_3 n^{-1} \log n \\= & {} \Vert \delta _S\Vert \left\{ 0.5 q \lambda _{M}^{-1/2} \lambda _{m} \tau_0\underline{f} + O\left( n^{-1}\log n \right) \right\} \\\ge & {} 0.25 q \lambda _{M}^{-1/2} \lambda _{m} \tau_0\underline{f} \Vert \delta _S\Vert , \end{aligned}$$
where the second inequality follows from Case 1.
Note that we have
$$\begin{aligned} E\left[ \rho _{\tau }(Y_i-{{\varvec{X}}}_{iS}^\top \beta _S) - \rho _{\tau }(Y_i-{{\varvec{X}}}_{iS}^\top \beta ^*_S) \mid X_i, \varDelta _i\right]\le & {} E[|{{\varvec{X}}}_{iS}^\top (\beta _S -\beta ^*_S)|\mid X_i, \varDelta _i] \\\le & {} E[|{{\varvec{X}}}_{iS}^\top (\beta _S -\beta ^*_S)|^2 \mid X_i, \varDelta _i]^{1/2} \\\le & {} \lambda _{M}^{1/2} \Vert \beta _S -\beta ^*_S\Vert , \end{aligned}$$
where the last inequality follows from Assumption 3. By Lemma 1 and Assumption 5, we also have \({\hat{G}}_i \gtrsim G^o_i -n^{-1/2} (\log n)^{1/2} \ge \tau _0 -n^{-1/2} (\log n)^{1/2} \gtrsim 0.5 \tau _0\). Hence,
$$\begin{aligned} E\left[ \frac{1}{n} \sum _{i=1}^n \frac{\varDelta _i}{{\hat{G}}_i}\rho _{\tau }(Y_i-{{\varvec{X}}}_{iS}^\top \beta _S) - \frac{\varDelta _i}{{\hat{G}}_i} \rho _{\tau }(Y_i-{{\varvec{X}}}_{iS}^\top \beta ^*_S) \right] \le 2\tau _0^{-1} \lambda _{M}^{1/2} \Vert \delta _S\Vert . \end{aligned}$$
This completes the proof of the second part of the lemma. \(\square\)
Lemma 5
Under conditions of Theorem 1, there exist some universal constants \(A_4, A_5 >0\) such that
$$\begin{aligned} \Vert \hat{\beta }_S - \beta ^*_S\Vert \le A_4 \sqrt{\rho ^2 \log p / n},\quad |\ell _S(\hat{\beta }_S) - \ell _S(\beta ^*_S)| \le A_5 \rho ^2 \log p/n \end{aligned}$$
hold uniformly over \(S: |S| \le \rho\), with probability at least \(1-6\exp (-6\rho \log p)\).
Proof
We first define an event
$$\begin{aligned} E = \left\{ \sup _{\beta _S \in B^o_S(d_1),\ |S| \le \rho } |\ell _S(\beta _S) - \ell _S(\beta ^*_S) - E [\ell _S(\beta _S)] + E[\ell _S(\beta ^*_S)]| \le 2A_3 d_1 \sqrt{\rho \log p/n} \right\} . \end{aligned}$$
By Lemma 3, the event E holds with probability at least \(1-6\exp (-6\rho \log p)\).
If \(\Vert \beta _S-\beta ^*_S\Vert =A_4 \sqrt{\rho ^2 \log p / n}\), then \(\Vert \beta _S-\beta ^*_S\Vert \le A_4 \sqrt{\rho ^3 \log p /n} / \sqrt{s}\) and \(\Vert \beta _S - \beta ^*_S\Vert \le 4q \lambda _{M}^{-1/2}\) because \(s \le \rho\) and \(\rho ^2 \log p = o(n)\). Thus, we have under the event E,
$$\begin{aligned} \ell _S(\beta ^*_S) - \ell _S(\beta _S)= & {} \ell _S(\beta ^*_S) - E[\ell _S(\beta ^*_S)] - \ell _S(\beta _S) + E[\ell _S(\beta _S)] + E[\ell _S(\beta ^*_S)] - E[\ell _S(\beta _S)] \\\le & {} 2A_3 A_4 \sqrt{\rho ^3 \log p /n} \,\sqrt{\rho \log p/n} - \frac{1}{8}\lambda _{m}\tau_0\underline{f} \Vert \beta _S - \beta ^*_S\Vert ^2 +O\left( n^{-1} \log n\right) \\\le & {} 2A_3 A_4 \rho ^2 \log p /n - \frac{1}{8} \lambda _{m} \tau_0\underline{f} A_4^2 {\rho ^2 \log p / n} + O\left( n^{-1} \log n\right) \\< & {} 0, \end{aligned}$$
where the first inequality follows from Lemma 4 and the last inequality holds for sufficiently large constant \(A_4>0\) due to the growth rate condition of \(\rho\) in Assumption 1. Thus
$$\begin{aligned} \min _{|S| \le \rho , \Vert \beta _S-\beta ^*_S\Vert =A_4 \sqrt{\rho ^2 \log p / n}} \ell _S(\beta _S) -\ell _S(\beta ^*_S) > 0. \end{aligned}$$
By the convexity of \(\ell _S(\cdot )\), \(\Vert \hat{\beta }_S-\beta ^*_S\Vert \le A_4 \sqrt{\rho ^2 \log p / n}\). This proves the first part.
For any \(\beta _S\) satisfying \(\Vert \beta _S - \beta ^*_S\Vert \le A_4 \sqrt{\rho ^2 \log p / n}\), we have under the event E,
$$\begin{aligned} |\ell _S(\beta ^*_S) - \ell _S(\beta _S)|\le & {} |\ell _S(\beta ^*_S) - E[\ell _S(\beta ^*_S)] - \ell _S(\beta _S) + E[\ell _S(\beta _S)]| + |E[\ell _S(\beta ^*_S)] - E[\ell _S(\beta _S)]| \\\le & {} 2A_3 A_4 \rho ^2 \log p /n + 1.5 \lambda _{M} \bar{f} \tau _0^{-1} \Vert \beta _S - \beta ^*_S\Vert ^2 + O\left( n^{-1} \log n\right) \\\le & {} 2A_3 A_4 \rho ^2 \log p /n + 1.5 \lambda _{M} \bar{f} \tau _0^{-1} A_4^2 {\rho ^2 \log p / n} + O\left( n^{-1} \log n\right) \\\le & {} A_5 \rho ^2 \log p/n \end{aligned}$$
for some constant \(A_5>0\), where the second inequality follows from Lemma 4, and the last inequality follows from the growth rate condition of \(\rho\) in Assumption 1. This completes the proof. \(\square\)
Lemma 6
Given a model S such that \(|S| < \rho\), \(M_\tau \not \subseteq S\), under Assumption 4, there exists \(j \in M_\tau \setminus S\) such that \([\beta ^*_{S \bigcup \{j\}}]_j \ne 0\). Further, if Assumption 1 holds, then there exists \(j \in M_\tau \setminus S\) such that \(|[\beta ^*_{S \bigcup \{j\}}]_j| \ge C \bar{f}^{-1} \tau _0 n^{-\alpha }\), where \(C>0\) is defined in Assumption 4.
Proof
Suppose that \([\beta ^*_{S \bigcup \{j\}}]_j = 0\) for all \(j \in M_\tau \setminus S\). Fix \(j \in M_\tau \setminus S\). As \(\beta ^*_{S \bigcup \{j\}}\) is the minimizer of \(E[\ell ^o_{S \bigcup \{j\}}(\beta )]\), by the convexity of \(E[\ell ^o_{S \bigcup \{j\}}(\beta )]\), it holds that
$$\begin{aligned} 0= & {} E\left[ \frac{1}{n} \sum _{i=1}^n \frac{\varDelta _i}{G^o_i} x_{ij} \psi _\tau \left( Y_i-{{\varvec{X}}}_{i,S \bigcup \{j\}}^\top \beta ^*_{S \bigcup \{j\}}\right) \right] \\= & {} E\left[ \frac{1}{n} \sum _{i=1}^n \frac{\varDelta _i}{G^o_i} x_{ij} \psi _\tau \left( Y_i-{{\varvec{X}}}_{iS}^\top [\beta ^*_{S \bigcup \{j\}}]_S\right) \right] , \end{aligned}$$
where the second equality follows from the fact that \([\beta ^*_{S \bigcup \{j\}}]_j = 0\).
which contradicts to Assumption 4. Thus, there exists \(j \in M_\tau \setminus S\) such that \([\beta ^*_{S \bigcup \{j\}}]_j \ne 0\).
Further, by Assumption 4, there exists \(j \in M_\tau \setminus S\) such that
$$\begin{aligned} Cn^{-\alpha }\le & {} \left| E\left[ \frac{1}{n} \sum _{i=1}^n \frac{\varDelta _i}{G^o_i} x_{ij} \psi _\tau \left( Y_i-{{\varvec{X}}}_{iS}^\top \left[ \beta ^*_{S \bigcup \{j\}}\right] _S\right) \right] \right| \\= & {} \left| E\left[ \frac{1}{n} \sum _{i=1}^n \frac{\varDelta _i}{G^o_i} x_{ij} \psi _\tau \left( Y_i-{{\varvec{X}}}_{iS}^\top [\beta ^*_{S \bigcup \{j\}}]_S\right) \right] \right. \\&\left. - E\left[ \frac{1}{n} \sum _{i=1}^n \frac{\varDelta _i}{G^o_i} x_{ij} \psi _\tau (Y_i- {{\varvec{X}}}_{i,S \bigcup \{j\}}^\top \beta ^*_{S \bigcup \{j\}})\right] \right| \\\le & {} \left| E\left[ \frac{1}{n} \sum _{i=1}^n \frac{|x_{ij}|}{G^o_i} P\left\{ Y_i \in \left( {{\varvec{X}}}_{iS}^\top [\beta ^*_{S \bigcup \{j\}}]_S, {{\varvec{X}}}_{i,S \bigcup \{j\}}^\top \beta ^*_{S \bigcup \{j\}}\right) \mid {{\varvec{X}}}_i \right\} \right] \right| \\\le & {} \left| E\left[ \frac{1}{n} \sum _{i=1}^n {\bar{f}} \frac{x_{ij}^2}{G^o_i} |[\beta ^*_{S \bigcup \{j\}}]_j| \right] \right| \\= & {} \frac{{\bar{f}}}{\tau _0} \left| [\beta ^*_{S \bigcup \{j\}}]_j\right| , \end{aligned}$$
where the first equation holds due to the definition of \(\beta ^*_{S \bigcup \{j\}}\), the third inequality holds due to Assumption 1 and the fact that \(\left| {{\varvec{X}}}_{iS}^\top [\beta ^*_{S \bigcup \{j\}}]_S-{{\varvec{X}}}_{i,S \bigcup \{j\}}^\top \beta ^*_{S \bigcup \{j\}}\right| = \left| x_{ij} [\beta ^*_{S \bigcup \{j\}}]_j\right|\), and the last equality follows from the fact that \(E[x_{ij}^2]=1\) and Assumption 6. This completes the second part of Lemma 6. \(\square\)
1.2 Proof of Theorem 1
Proof
We first define an event
$$\begin{aligned} E= & {} \left\{ \sup _{\beta _S \in B^o_S(d_1),\ |S| \le \rho } |\ell _S(\beta _S) - \ell _S(\beta ^*_S) - E [\ell _S(\beta _S)] + E[\ell _S(\beta ^*_S)]| \le 2A_3 d_1 \sqrt{\rho \log p/n} \right\} \\&\bigcup \left\{ \sup _{|S| \le \rho } \Vert \hat{\beta }_S - \beta ^*_S\Vert \le A_4 \sqrt{\rho ^2 \log p / n},\quad \sup _{|S| \le \rho } |\ell _S(\hat{\beta }_S) - \ell _S(\beta ^*_S)| \le A_5 \rho ^2 \log p/n \right\} \end{aligned}$$
By Lemmas 3 and 5, the event E holds with probability at least \(1-6\exp (-6\rho \log p)\). Under the event E, we have \(\hat{\beta }_S \in B_S^o(d_1) = \{\beta _S:\ \Vert \beta _S-\beta ^*_S\Vert \le d_1/\sqrt{s}\}\), where \(d_1 := A_4 \sqrt{s\rho ^2 \log p / n}\). Note that
$$\begin{aligned}&\ell _S(\hat{\beta }_S)-\ell _{S \bigcup \{j\}}(\beta ^*_{S \bigcup \{j\}}) \\&= \ell _S(\hat{\beta }_S) - \ell _S(\beta ^*_S) + \ell _S(\beta ^*_S) - \ell _{S \bigcup \{j\}}(\beta ^*_{S\bigcup \{j\}}) \\&= \underbrace{\ell _S(\hat{\beta }_S) - \ell _S(\beta ^*_S) - E \left[ \ell _S(\hat{\beta }_S)\right] + E[\ell _S(\beta ^*_S)]}_{I_1} + \underbrace{E[\ell _S(\hat{\beta }_S)] - E[\ell _S(\beta ^*_S)]}_{I_2} + \underbrace{\ell _S(\beta ^*_S) - \ell _{S \bigcup \{j\}}(\beta ^*_{S\bigcup \{j\}})}_{I_3}. \end{aligned}$$
Under the event E, \(I_1 \ge -2A_3 d_1 \sqrt{\rho \log p/n} \ge -2A_3 A_4 \rho ^2 \log p /n\). Also by Lemma 4, we obtain \(I_2 \ge \frac{1}{8}\lambda _{m}\underline{f} \tau _0 \Vert \hat{\beta }_S - \beta ^*_S\Vert ^2 - C_3 n^{-1} \log n \ge - C_3 n^{-1} \log n\). Next, we have
$$\begin{aligned} I_3= & {} \underbrace{\ell _S(\beta ^*_S) -E \ell _S(\beta ^*_S) + E \ell _{S \bigcup \{j\}}(\beta ^*_{S\bigcup \{j\}}) - \ell _{S \bigcup \{j\}}(\beta ^*_{S\bigcup \{j\}})}_{I_{31}} \\&+ \underbrace{E\ell _S(\beta ^*_S) - E \ell _{S \bigcup \{j\}}(\beta ^*_{S\bigcup \{j\}})}_{I_{32}}. \end{aligned}$$
Using the similar arguments in the proof of Lemma 3, with probability at least \(1-6\exp (-6\rho \log p)\), we have \(I_{31} \ge -2 \Vert \beta ^*_S - \beta ^*_{S\bigcup \{j\}}\Vert A_3 \rho \sqrt{\log p/n}\).
Next, we consider \(I_{32}\). Note that by Lemma 6, we can consider \(j \in S^c\) such that \(|[\beta ^*_{S \bigcup \{j\}}]_j| \ge C \bar{f}^{-1} \tau _0 n^{-\alpha }\), i.e., \(\Vert \beta ^*_S - \beta ^*_{S\bigcup \{j\}}\Vert \ge C \bar{f}^{-1}\tau _0 n^{-\alpha }\). By Lemma 4, we have
$$\begin{aligned} I_{32} \gtrsim \Vert \beta ^*_S - \beta ^*_{S\bigcup \{j\}}\Vert ^2 \wedge \Vert \beta ^*_S - \beta ^*_{S\bigcup \{j\}}\Vert - C_3 n^{-1} \log n. \end{aligned}$$
Hence, we have with probability at least \(1-6\exp (-6\rho \log p)\),
$$\begin{aligned} I_3 \gtrsim \Vert \beta ^*_S - \beta ^*_{S\bigcup \{j\}}\Vert ^2 \wedge \Vert \beta ^*_S - \beta ^*_{S\bigcup \{j\}}\Vert - n^{-1} \log n - \Vert \beta ^*_S - \beta ^*_{S\bigcup \{j\}}\Vert \rho \sqrt{\log p/n}. \end{aligned}$$
Hence, if \(\Vert \beta ^*_S - \beta ^*_{S\bigcup \{j\}}\Vert \gtrsim 1\), then
$$\begin{aligned} I_3 \gtrsim \Vert \beta ^*_S - \beta ^*_{S\bigcup \{j\}}\Vert - n^{-1} \log n \gtrsim C \bar{f}^{-1} \tau _0 n^{-\alpha } - n^{-1} \log n, \end{aligned}$$
where the inequality follows from \(\Vert \beta ^*_S - \beta ^*_{S\bigcup \{j\}}\Vert \ge |[\beta ^*_{S \bigcup \{j\}}]_j| \ge C \bar{f}^{-1} \tau _0 n^{-\alpha }\). Else if \(\Vert \beta ^*_S - \beta ^*_{S\bigcup \{j\}}\Vert \lesssim 1\), using the similar argument,
$$\begin{aligned} I_3\gtrsim & {} \Vert \beta ^*_S - \beta ^*_{S\bigcup \{j\}}\Vert ^2 - n^{-1} \log n - \Vert \beta ^*_S - \beta ^*_{S\bigcup \{j\}}\Vert \rho \sqrt{\log p/n} \\\gtrsim & {} \Vert \beta ^*_S - \beta ^*_{S\bigcup \{j\}}\Vert ^2 - n^{-1} \log n - \rho \sqrt{\log p/n} \\\gtrsim & {} C^2 \bar{f}^{-2} \tau _0^2 n^{-2\alpha } - n^{-1} \log n - \rho \sqrt{\log p/n}. \end{aligned}$$
Hence, we have with probability at least \(1-6\exp (-6\rho \log p)\),
$$\begin{aligned} I_3 \gtrsim C^2 \bar{f}^{-2} \tau _0^2 n^{-2\alpha } - n^{-1} \log n - \rho \sqrt{\log p/n}. \end{aligned}$$
Combining these inequalities, we have
$$\begin{aligned} \max _{j \in S^c} \ell _S(\hat{\beta }_S)-\ell _{S \bigcup \{j\}} (\hat{\beta }_{S \bigcup \{j\}})\ge & {} C^2 \bar{f}^{-2} \tau _0^2 n^{-2\alpha } -\rho ^2 \log p /n - \rho \sqrt{\log p/n}- n^{-1} \log n\\\gtrsim & {} 0.5 C^2 \bar{f}^{-2} \tau _0^2 n^{-2\alpha } \end{aligned}$$
uniformly in S satisfying \(|S|<\rho ,\ M_\tau \nsubseteq S\), where the last inequality follows from the growth rate condition of \(\rho\) in Assumption 4. By letting \(C_1 := 0.5 C^2 \bar{f}^{-2} \tau _0^2\), this completes the proof. \(\square\)
1.3 Proof of Theorem 2
Proof
The proof consists of the three steps.
Claim 1: It holds that \(|[\beta ^*_{S \bigcup \{j\}}]_j| =o(n^{-\alpha })\) for any \(j \in (M_\tau \bigcup S)^c\).
Proof of Claim 1: We have by Assumption 1, if \({{\varvec{X}}}_{i,S \bigcup \{j\}}^\top \beta ^*_{S \bigcup \{j\}} -{{\varvec{X}}}_{iS}^\top [\beta ^*_{S \bigcup \{j\}}]_S \ge 0\), i.e., \(x_{ij} [\beta ^*_{S \bigcup \{j\}}]_j \ge 0\), then
$$\begin{aligned}&E\left[ \psi _\tau (Y_i-{{\varvec{X}}}_{iS}^\top [\beta ^*_{S \bigcup \{j\}}]_S) - \psi _\tau (Y_i- {{\varvec{X}}}_{i,S \bigcup \{j\}}^\top \beta ^*_{S \bigcup \{j\}}) \mid \varDelta _i, {{\varvec{X}}}_i\right] \\&= P\left( Y_i \le {{\varvec{X}}}_{i,S \bigcup \{j\}}^\top \beta ^*_{S \bigcup \{j\}} \mid \varDelta _i, {{\varvec{X}}}_i\right) - P\left( Y_i \le {{\varvec{X}}}_{iS}^\top [\beta ^*_{S \bigcup \{j\}}]_S \mid \varDelta _i, {{\varvec{X}}}_i\right) \\&\ge \underline{f} \left( {{\varvec{X}}}_{i,S \bigcup \{j\}}^\top \beta ^*_{S \bigcup \{j\}} -{{\varvec{X}}}_{iS}^\top [\beta ^*_{S \bigcup \{j\}}]_S \right) \\&= \underline{f} x_{ij} [\beta ^*_{S \bigcup \{j\}}]_j. \end{aligned}$$
Else, if \({{\varvec{X}}}_{i,S \bigcup \{j\}}^\top \beta ^*_{S \bigcup \{j\}} -{{\varvec{X}}}_{iS}^\top [\beta ^*_{S \bigcup \{j\}}]_S < 0\), i.e., \(x_{ij} [\beta ^*_{S \bigcup \{j\}}]_j < 0\), then
$$\begin{aligned} E\left[ \psi _\tau (Y_i-{{\varvec{X}}}_{iS}^\top [\beta ^*_{S \bigcup \{j\}}]_S) - \psi _\tau (Y_i- {{\varvec{X}}}_{i,S \bigcup \{j\}}^\top \beta ^*_{S \bigcup \{j\}}) \mid \varDelta _i, {{\varvec{X}}}_i\right] \le \underline{f} x_{ij} [\beta ^*_{S \bigcup \{j\}}]_j. \end{aligned}$$
Thus, we have
$$\begin{aligned}&E\left[ x_{ij} [\beta ^*_{S \bigcup \{j\}}]_j\psi _\tau (Y_i-{{\varvec{X}}}_{iS}^\top [\beta ^*_{S \bigcup \{j\}}]_S) -x_{ij} [\beta ^*_{S \bigcup \{j\}}]_j \psi _\tau (Y_i- {{\varvec{X}}}_{i,S \bigcup \{j\}}^\top \beta ^*_{S \bigcup \{j\}}) \mid \varDelta _i, {{\varvec{X}}}_i\right] \nonumber \\&\ge \underline{f} x^2_{ij} [\beta ^*_{S \bigcup \{j\}}]^2_j. \end{aligned}$$
(11)
By the conditions of Theorem 2 and the fact that \(E[\varDelta _i \mid {{\varvec{X}}}_i] = G^o_i\), we have for any \(j \in (M_\tau \bigcup S)^c\),
$$\begin{aligned}&o(n^{-\alpha }|[\beta ^*_{S \bigcup \{j\}}]_j|) \\= & {} \left| E\left[ \frac{\varDelta _i}{G^o_i} x_{ij} [\beta ^*_{S \bigcup \{j\}}]_j \psi _\tau (Y_i-{{\varvec{X}}}_{iS}^\top [\beta ^*_{S \bigcup \{j\}}]_S)\right] \right| \\= & {} \left| E\left[ \frac{\varDelta _i}{G^o_i} x_{ij} [\beta ^*_{S \bigcup \{j\}}]_j \psi _\tau (Y_i-{{\varvec{X}}}_{iS}^\top [\beta ^*_{S \bigcup \{j\}}]_S)\right] - E\left[ \frac{\varDelta _i}{G^o_i} x_{ij} [\beta ^*_{S \bigcup \{j\}}]_j \psi _\tau (Y_i- {{\varvec{X}}}_{i,S \bigcup \{j\}}^\top \beta ^*_{S \bigcup \{j\}})\right] \right| \\= & {} \left| E\left[ \frac{\varDelta _i}{G^o_i} E\left[ x_{ij} [\beta ^*_{S \bigcup \{j\}}]_j \psi _\tau (Y_i-{{\varvec{X}}}_{iS}^\top [\beta ^*_{S \bigcup \{j\}}]_S) - x_{ij} [\beta ^*_{S \bigcup \{j\}}]_j \psi _\tau (Y_i- {{\varvec{X}}}_{i,S \bigcup \{j\}}^\top \beta ^*_{S \bigcup \{j\}}) \mid \varDelta _i, {{\varvec{X}}}_i \right] \right] \right| \\\ge & {} E\left[ \underline{f} x^2_{ij} [\beta ^*_{S \bigcup \{j\}}]^2_j \frac{\varDelta _i}{G^o_i}\right] \\= & {} \underline{f}[\beta ^*_{S \bigcup \{j\}}]^2_j, \end{aligned}$$
where the second equation holds due to the definition of \(\beta ^*_{S \bigcup \{j\}}\), the first inequality follows from (11), and the last equality follows from the fact that \(E[\varDelta _i] = G^o_i\) and \(E[x_{ij}^2]=1\). Thus, \(|[\beta ^*_{S \bigcup \{j\}}]_j| =o(n^{-\alpha })\) for any \(j \in (M_\tau \bigcup S)^c\).
Claim 2: It holds that \(\max _{|S| < \rho , j \in (M_\tau \bigcup S)^c} \ -\ell _{S \bigcup \{j\}}(\hat{\beta }_{S \bigcup \{j\}}) + \ell _S(\hat{\beta }_S) \le 0.5 C_1 n^{-2\alpha }\) with probability at least \(1-36 \exp (-6\rho \log p)\).
Proof of Claim 2: We can write
$$\begin{aligned}&\max _{|S|< \rho , j \in (M_\tau \bigcup S)^c} \ -\ell _{S \bigcup \{j\}}(\hat{\beta }_{S \bigcup \{j\}}) + \ell _S(\hat{\beta }_S) \\\le & {} \underbrace{\max _{|S|< \rho , j \in (M_\tau \bigcup S)^c} -\ell _{S \bigcup \{j\}}(\hat{\beta }_{S \bigcup \{j\}}) + \ell _{S \bigcup \{j\}}(\beta ^*_{S \bigcup \{j\}})}_{I_1} +\underbrace{\max _{|S| < \rho , j \in (M_\tau \bigcup S)^c} -\ell _{S \bigcup \{j\}}(\beta ^*_{S \bigcup \{j\}}) + \ell _S(\hat{\beta }_S)}_{I_2}. \end{aligned}$$
Using the similar arguments in Lemma 5, we have with high probability at least \(1-6\exp (-6\rho \log p)\),
$$\begin{aligned} |I_1| \le A_5 \rho ^2 \log p/n. \end{aligned}$$
(12)
Next, we can decompose \(I_2\) as
$$\begin{aligned} I_2 = \underbrace{\ell _S(\hat{\beta }_S) - \ell _S(\beta ^*_S) - E \left[ \ell _S(\hat{\beta }_S)\right] + E[\ell _S(\beta ^*_S)]}_{I_{21}} + \underbrace{E[\ell _S(\hat{\beta }_S)] - E[\ell _S(\beta ^*_S)]}_{I_{22}} +\underbrace{\ell _S(\beta ^*_S)-\ell _{S \bigcup \{j\}}(\beta ^*_{S \bigcup \{j\}})}_{I_{23}}. \end{aligned}$$
Because \(\Vert \hat{\beta }_S - \beta ^*_S\Vert \le A_4 \sqrt{\rho ^2 \log p / n}\) by Lemma 5, we have by Lemma 3,
$$\begin{aligned} I_{21} \le 2A_3 A_4 \rho ^2 \log p/n. \end{aligned}$$
(13)
By Lemma 4, we have
$$\begin{aligned} I_{22} \le 1.5 \lambda _{M} \tau _0^{-1} \bar{f} \Vert \hat{\beta }_S - \beta ^*_S\Vert ^2 + C_3 n^{-1} \log n \le 1.5 \lambda _{M} \tau _0^{-1} \bar{f} A_4^2 \rho ^2 \log p / n + C_3 \log n/n. \end{aligned}$$
(14)
Next, we consider \(I_{23}\). We obtain
$$\begin{aligned} I_{23}= & {} \underbrace{\ell _S(\beta ^*_S) - \ell _{S \bigcup \{j\}}(\beta ^*_{S\bigcup \{j\}}) -E \ell _S(\beta ^*_S) + E \ell _{S \bigcup \{j\}}(\beta ^*_{S\bigcup \{j\}})}_{I_{231}} \\&+ \underbrace{E\ell _S(\beta ^*_S) - E \ell _{S \bigcup \{j\}}(\beta ^*_{S\bigcup \{j\}})}_{I_{232}}. \end{aligned}$$
By Lemma 2 and Assumption 3, we have with probability at least \(1-6\exp (-6\rho \log p)\),
$$\begin{aligned} I_{231}\le & {} 2A_3 \rho \sqrt{\frac{\log p}{n}} \Vert \beta ^*_S - \beta ^*_{S\bigcup \{j\}}\Vert \\\le & {} 2A_3 \rho \sqrt{\frac{\log p}{n}} \lambda _{m}^{-1/2} {\bar{f}}^{-1/2} E\left[ \frac{{\bar{f}}}{n} \sum _{i=1}^n \left| {{\varvec{X}}}_i^\top \left( \beta ^*_S - \beta ^*_{S\bigcup \{j\}}\right) \right| ^2 \right] ^{1/2} \\\le & {} 4A_3^2 \rho ^2 \lambda _{m}^{-1} {\bar{f}}^{-1} \frac{\log p}{n} + E\left[ \frac{{\bar{f}}}{n} \sum _{i=1}^n \left| {{\varvec{X}}}_i^\top \left( \beta ^*_S - \beta ^*_{S\bigcup \{j\}}\right) \right| ^2 \right] , \end{aligned}$$
where the last inequality follows from \(xy \le x^2 + y^2\) for any non-negative numbers x and y.
Furthermore, by (8) and (9) in Lemma 4,
$$\begin{aligned} I_{232}\le & {} E\left[ \frac{1}{n} \sum _{i=1}^n \frac{\varDelta _i}{{\hat{G}}_i}\int _0^{{{\varvec{X}}}_i^\top ( \beta ^*_{S \bigcup \{j\}} - \beta ^*_S)} F\left( {{\varvec{X}}}_{i, S \bigcup \{j\}} \beta ^*_{S \bigcup \{j\}} +z\right) - F\left( {{\varvec{X}}}_{i, S \bigcup \{j\}} \beta ^*_{S \bigcup \{j\}} \right) dz\right] \\&+ O\left( n^{-1/2} (\log n)^{1/2}\right) E\left[ \frac{\bar{f}}{2n}\sum _{i=1}^n \frac{\varDelta _i}{{\hat{G}}_i} \left| {{\varvec{X}}}_{i}^\top \left( \beta ^*_{S \bigcup \{j\}} - \beta ^*_S\right) \right| ^2 \right] ^{1/2} \\\le & {} E\left[ \frac{\bar{f}}{2n}\sum _{i=1}^n \frac{\varDelta _i}{{\hat{G}}_i} |{{\varvec{X}}}_{i}^\top (\beta ^*_{S \bigcup \{j\}} - \beta ^*_S)|^2 \right] + O\left( n^{-1/2} (\log n)^{1/2}\right) E\left[ \frac{\bar{f}}{2n}\sum _{i=1}^n \frac{\varDelta _i}{{\hat{G}}_i} \left| {{\varvec{X}}}_{i}^\top \left( \beta ^*_{S \bigcup \{j\}} - \beta ^*_S\right) \right| ^2 \right] ^{1/2} \\\le & {} E\left[ \frac{\bar{f}}{n}\sum _{i=1}^n \frac{\varDelta _i}{{\hat{G}}_i} |{{\varvec{X}}}_{i}^\top (\beta ^*_{S \bigcup \{j\}} - \beta ^*_S)|^2 \right] + O\left( n^{-1} \log n \right) \\\le & {} E\left[ \frac{\bar{f}}{n}\sum _{i=1}^n \frac{2\varDelta _i}{G^o_i} |{{\varvec{X}}}_{i}^\top (\beta ^*_{S \bigcup \{j\}} - \beta ^*_S)|^2 \right] + O\left( n^{-1} \log n \right) \\= & {} E\left[ \frac{2\bar{f}}{n}\sum _{i=1}^n |{{\varvec{X}}}_{i}^\top (\beta ^*_{S \bigcup \{j\}} - \beta ^*_S)|^2 \right] + O\left( n^{-1} \log n \right) \\\le & {} E\left[ \frac{2\bar{f}}{n}\sum _{i=1}^n |{{\varvec{X}}}_{i}^\top \delta _S|^2 \right] + E\left[ \frac{2\bar{f}}{n}\sum _{i=1}^n x_{ij}^2 [\beta ^*_{S \bigcup \{j\}}]^2_j \right] + O\left( n^{-1} \log n \right) , \end{aligned}$$
where the second inequality follows from Assumption 1, the third inequality follows from \(xy \le x^2 + y^2\) for any non-negative numbers x and y, the fourth inequality follows from Assumption 5 and Lemma 1, and the last equality follows from the fact that \(E\left[ \varDelta _i \mid {{\varvec{X}}}_i \right] = G^o_i\).
Hence, we have
$$\begin{aligned} I_{23}\le & {} I_{231} + I_{232} \nonumber \\\le & {} 4A_3^2 \rho ^2 \lambda _{m}^{-1} {\bar{f}}^{-1} \frac{\log p}{n}+ E\left[ \frac{3{\bar{f}}}{n} \sum _{i=1}^n \left| {{\varvec{X}}}_i^\top \left( \beta ^*_S - \beta ^*_{S\bigcup \{j\}}\right) \right| ^2 \right] + O\left( \frac{\log n}{n} \right) . \end{aligned}$$
(15)
Let \(\delta _S:= [\beta ^*_{S \bigcup \{j\}}]_S - \beta ^*_S\). By the definition of \(\beta ^*_S\) and \(\beta ^*_{S \bigcup \{j\}}\), we have
$$\begin{aligned} 0\ge & {} E[\ell ^o_{S \bigcup \{j\}}(\beta ^*_{S \bigcup \{j\}})] - E[\ell ^o_S(\beta ^*_S)] \nonumber \\= & {} E\left[ \frac{1}{n} \sum _{i=1}^n -\frac{\varDelta _i}{G^o_i} ({{\varvec{X}}}_{iS}^\top \delta _S + x_{ij} [\beta ^*_{S \bigcup \{j\}}]_j) \left( \tau - 1\{Y_i-{{\varvec{X}}}_{iS}^\top \beta ^*_S< 0\}\right) \right. \nonumber \\&\quad \quad + \left. \frac{\varDelta _i}{G^o_i}\int _0^{{{\varvec{X}}}_{iS}^\top \delta _S + x_{ij} [\beta ^*_{S \bigcup \{j\}}]_j} F({{\varvec{X}}}_{iS}^\top \beta ^*_S + z) - F({{\varvec{X}}}_{iS}^\top \beta ^*_S) dz \right] \nonumber \\= & {} E\left[ \frac{1}{n} \sum _{i=1}^n -\frac{\varDelta _i}{G^o_i} (x_{ij} [\beta ^*_{S \bigcup \{j\}}]_j) \left( \tau - 1\{Y_i-{{\varvec{X}}}_{iS}^\top \beta ^*_S < 0\}\right) \right. \nonumber \\&\quad \quad \left. + \frac{\varDelta _i}{G^o_i} \int _0^{{{\varvec{X}}}_{iS}^\top \delta _S + x_{ij} [\beta ^*_{S \bigcup \{j\}}]_j} F({{\varvec{X}}}_{iS}^\top \beta ^*_S + z) - F({{\varvec{X}}}_{iS}^\top \beta ^*_S) dz \right] \nonumber \\\ge & {} -o(n^{-\alpha })|[\beta ^*_{S \bigcup \{j\}}]_j| + E\left[ \frac{\underline{f}}{2n}\sum _{i=1}^n \left| {{\varvec{X}}}_{i}^\top \delta _S + x_{ij} [\beta ^*_{S \bigcup \{j\}}]_j \right| ^2 \right] \nonumber \\= & {} -o(n^{-\alpha }) |[\beta ^*_{S \bigcup \{j\}}]_j| + E\left[ \frac{\underline{f}}{2n}\sum _{i=1}^n \left| {{\varvec{X}}}_{i}^\top \left( \beta ^*_S - \beta ^*_{S\bigcup \{j\}}\right) \right| ^2 \right] , \end{aligned}$$
(16)
where the first equality holds due to the identity of Knight (1998), and the second inequality follows from the condition of Theorem 2, Assumption 1, and the fact that \(E[\varDelta _i \mid {{\varvec{X}}}_i]=G^o_i\). Combining (15) and (16), we obtain
$$\begin{aligned} I_{23}\le & {} 4A_3^2 \rho ^2 \lambda _{m}^{-1} {\bar{f}}^{-1} \frac{\log p}{n}+ o(n^{-\alpha }) |[\beta ^*_{S \bigcup \{j\}}]_j| + O\left( \frac{\log n}{n} \right) \nonumber \\\le & {} 4A_3^2 \rho ^2 \lambda _{m}^{-1} {\bar{f}}^{-1} \frac{\log p}{n}+ o(n^{-2\alpha }) + O\left( \frac{\log n}{n} \right) . \end{aligned}$$
(17)
Combining (13), (14), and (17), we obtain
$$\begin{aligned} I_2 = I_{21}+I_{22}+I_{23} \le C_4 \rho ^2 \log p/n + C_3 \log n/n+ o(n^{-2\alpha }) + O\left( \frac{\log n}{n} \right) \end{aligned}$$
(18)
for some constant \(C_4>0\). Hence, we have by (12) and (18), with probability at least \(1-36 \exp (-6\rho \log p)\),
$$\begin{aligned}&\max _{|S| < \rho , j \in (M_\tau \bigcup S)^c} \ -\ell _{S \bigcup \{j\}}(\hat{\beta }_{S \bigcup \{j\}}) + \ell _S(\hat{\beta }_S) \\\le & {} A_5 \rho ^2 \log p/n +C_4 \rho ^2 \log p/n + C_3 \log n/n+ o(n^{-2\alpha }) + O\left( \frac{\log n}{n} \right) \\\le & {} 0.5 C_1 n^{-2\alpha }, \end{aligned}$$
where the last inequality follows from the growth rate condition of \(\rho\) in Assumption 4.
Claim 3: It holds that \(M_\tau \subset F_{C_2 |M_\tau |}\) with probability at least \(1-36 \exp (-3\rho \log p)\).
Proof of Claim 3: By Claim 2 and the fact that \(\log n \log p / n = o(n^{-2\alpha })\), if \(M_\tau \not \subseteq S\), CQ-FR would select a noise variable with probability less than \(36 \exp (-6\rho \log p)\). For \(k > |M_\tau |\), \(M_\tau \not \subset F_k\) implies that at least \(k-|M_\tau |\) noise variables are selected within the k steps. Then for \(k=C_2|M_\tau |\) with \(C_2 > 1\),
$$\begin{aligned} P(M_\tau \not \subset F_k)\le & {} \sum _{j=k-|M_\tau |}^k {k \atopwithdelims ()j} \{36 \exp (-6\rho \log p)\}^j \\\le & {} |M_\tau | k^{|M_\tau |} \{36 \exp (-6\rho \log p)\}^{k-|M_\tau |} \\\le & {} 36 \exp (-3\rho \log p). \end{aligned}$$
Therefore, \(M_\tau \subset F_{C_2 |M_\tau |}\) with probability at least \(1-36 \exp (-3\rho \log p)\). This completes the proof. \(\square\)
1.4 Proof of Theorem 3
Proof
By Theorem 2, CQ-FR does not stop at the kth step when \(M_\tau \not \subseteq F_k\), and CQ-FR satisfies \(M_\tau \subseteq F_k\) for some \(k< C_2 |M_{\tau }|\) with probability going to one. Hence, with probability going to one, CQ-FR stops at the kth step if \(\text{ EBIC}^c(F_{k+1}) > \text{ EBIC}^c(F_k)\).
Suppose that \(M_\tau \nsubseteq F_{k-1}\) and \(M_\tau \subseteq F_k\). We can easily see that \(\text{ EBIC}^c(F_{k+1}) > \text{ EBIC}^c(F_k)\) if and only if \(\log \ell _{F_{k}}(\hat{\beta }_{F_{k}}) - \log \ell _{F_{k+1}}(\hat{\beta }_{F_{k+1}}) \le \frac{\log n}{2n} \log p\).
Recall that \(\beta (\tau )\) is the underlying quantile coefficient vector. Let
$$\begin{aligned} M_n(\tau ) := n^{-1/2} \sum _{i=1}^n \frac{\varDelta _i}{{\hat{G}}_i} {{\varvec{X}}}_i \psi _{\tau }(Y_i-{{\varvec{X}}}_i^\top \beta (\tau )) \end{aligned}$$
and
$$\begin{aligned} H_{S,\tau } := \begin{bmatrix} E\left[ \frac{\varDelta _i}{{\hat{G}}_i}{{\varvec{X}}}_{iS} {{\varvec{X}}}^\top _{iS} f\left( {{\varvec{X}}}_{iS}^\top \beta (\tau ) \mid {{\varvec{X}}}_{iS}\right) \right] , &{} 0\\ 0, &{} 0 \end{bmatrix}, \quad H_{S,\tau }^{-1} := \begin{bmatrix} E\left[ \frac{\varDelta _i}{{\hat{G}}_i}{{\varvec{X}}}_{iS} {{\varvec{X}}}^\top _{iS} f\left( {{\varvec{X}}}_{iS}^\top \beta (\tau ) \mid {{\varvec{X}}}_{iS}\right) \right] ^{-1}, &{} 0\\ 0, &{} 0 \end{bmatrix}. \end{aligned}$$
Using the similar argument in the proof of Lemma 7.10 of Zheng et al. (2015) with the fact that \(M_\tau \subseteq F_k \subset F_{k+1}\) we can show that with probability at least \(1-16\exp (-\log p)-12/p\),
$$\begin{aligned}&\ell _{F_{k+1}}(\hat{\beta }_{F_{k+1}}) - \ell _{F_{k}}(\hat{\beta }_{F_{k}}) \\= & {} \ell _{F_{k+1}}(\hat{\beta }_{F_{k+1}}) - \ell _{F_{k+1}}(\beta (\tau )) - \left[ \ell _{F_{k}}(\hat{\beta }_{F_{k}}) - \ell _{F_{k+1}}(\beta (\tau )) \right] \\= & {} -\frac{M_n^\top (\tau ) [H_{F_{k+1},\tau }^{-1} - H_{F_k,\tau }^{-1}] M_n(\tau )}{4n} + o_p(\sqrt{k \log p/n^3}) \\\ge & {} -600 ( {f} \lambda _{m})^{-1} \log p/n + o_p(\sqrt{k \log p/n^3}), \end{aligned}$$
which yields with probability tending to 1,
$$\begin{aligned} \ell _{F_{k}}(\hat{\beta }_{F_{k}}) - \ell _{F_{k+1}}(\hat{\beta }_{F_{k+1}}) = o(\log p \log n/n). \end{aligned}$$
Using the similar argument in Lee et al. (2014), we have \(\ell _{F_{k}}(\hat{\beta }_{F_{k}}), \ell _{F_{k+1}}(\hat{\beta }_{F_{k+1}}) \ge c_5\) for some constant \(c_5>0\) with probability tending to one. Hence,
$$\begin{aligned} \log \ell _{F_{k}}(\hat{\beta }_{F_{k}}) - \log \ell _{F_{k+1}}(\hat{\beta }_{F_{k+1}}) \le \frac{\log n}{2n} \log p. \end{aligned}$$
This completes the proof. \(\square\)
1.5 Proofs of Theorems 4 and 5
Proof
Suppose that \(\ell ^* \in M_\tau ^c\). Let \(k:= \max _{j \in M_\tau \setminus F} |(\beta ^*_{F \bigcup \{j\}})_j|\). By Lemma 6, \(|[\beta ^*_{F \bigcup \{k\}}]_k| \ge C \bar{f}^{-1} n^{-\alpha }\). By the definition of \(\ell ^*\), we also have
$$\begin{aligned} \ell _{F \bigcup \{\ell ^*\}}(\hat{\beta }_{F \bigcup \{\ell ^*\}}) \le \ell _{F \bigcup \{k\}}(\hat{\beta }_{F \bigcup \{k\}}), \end{aligned}$$
which implies
$$\begin{aligned} \ell _{F \bigcup \{\ell ^*\}}(\beta ^*_{F \bigcup \{\ell ^*\}})\le & {} \ell _{F \bigcup \{k\}}(\beta ^*_{F \bigcup \{k\}}) \\&+ \lambda _{M}^{1/2} \Vert \hat{\beta }_{F \bigcup \{\ell ^*\}} - \beta ^*_{F \bigcup \{\ell ^*\}}\Vert + \lambda _{M}^{1/2} \Vert \hat{\beta }_{F \bigcup \{k\}} - \beta ^*_{F \bigcup \{k\}}\Vert \\\le & {} \ell _{F \bigcup \{k\}}(\beta ^*_{F \bigcup \{k\}}) + 2A_4\lambda _{M}^{1/2} \sqrt{\rho ^2 \log p / n}, \end{aligned}$$
where the first inequality follows from the Cauchy-Schwarz inequality, Assumption 3, and the fact that \(\rho _{\tau }(\cdot )\) satisfies \(|\rho _{\tau }(x) -\rho _{\tau }(y)| \le |x-y|\), and the last inequality follows from Lemma 5.
By Lemmas 2 and 3, this also implies
$$\begin{aligned}&E\left[ \ell ^o_{F \bigcup \{\ell ^*\}}(\beta ^*_{F \bigcup \{\ell ^*\}})\right] \lesssim E\left[ \ell ^o_{F \bigcup \{k\}}(\beta ^*_{F \bigcup \{k\}})\right] \nonumber \\&\quad + \sqrt{\rho ^2 \log p / n} + \sqrt{\frac{\log n}{n}} \Vert \beta ^*_{F \bigcup \{\ell ^*\}} - \beta ^*_{F \bigcup \{k\}}\Vert . \end{aligned}$$
(19)
Using the similar arguments in the proof of Lemma 4, we obtain
$$\begin{aligned}&E\left[ \ell ^o_{F \bigcup \{\ell ^*\}}(\beta ^*_{F \bigcup \{\ell ^*\}})\right] - E\left[ \ell ^o_{F \bigcup \{k\}}(\beta ^*_{F \bigcup \{k\}})\right] \nonumber \\= & {} E\left[ \rho _{\tau }(Y_i-{{\varvec{X}}}_{i}^\top \beta ^*_{F \bigcup \{\ell ^*\}}) - \rho _{\tau }(Y_i-{{\varvec{X}}}_{i}^\top \beta ^*_{F \bigcup \{k\}}) \right] \nonumber \\= & {} E\left[ - {{\varvec{X}}}_{i}^\top \left( \beta ^*_{F \bigcup \{\ell ^*\}}-\beta ^*_{F \bigcup \{k\}}\right) \left( \tau - 1\{Y_i-{{\varvec{X}}}_{i}^\top \beta ^*_{F \bigcup \{k\}}< 0\}\right) \right. \nonumber \\&+ \left. \int _0^{{{\varvec{X}}}_{i}^\top \left( \beta ^*_{F \bigcup \{\ell ^*\}}-\beta ^*_{F \bigcup \{k\}}\right) } F({{\varvec{X}}}_{i}^\top \beta ^*_{F \bigcup \{k\}} + z \mid {{\varvec{X}}}_{i}, \varDelta _i) - F({{\varvec{X}}}_{i}^\top \beta ^*_{F \bigcup \{k\}} \mid {{\varvec{X}}}_{i}, \varDelta _i) dz \right] \nonumber \\= & {} E\left[ - X_{i,\ell ^*} [\beta ^*_{F \bigcup \{\ell ^*\}}]_{\ell ^*} \left( \tau - 1\{Y_i-{{\varvec{X}}}_{i}^\top \beta ^*_{F \bigcup \{k\}} < 0\}\right) \right. \nonumber \\&+ \left. \int _0^{{{\varvec{X}}}_{i}^\top \left( \beta ^*_{F \bigcup \{\ell ^*\}}-\beta ^*_{F \bigcup \{k\}}\right) } F({{\varvec{X}}}_{i}^\top \beta ^*_{F \bigcup \{k\}} + z \mid {{\varvec{X}}}_{i}, \varDelta _i) - F({{\varvec{X}}}_{i}^\top \beta ^*_{F \bigcup \{k\}} \mid {{\varvec{X}}}_{i}, \varDelta _i) dz \right] \nonumber \\\ge & {} - o(n^{-\alpha }) [\beta ^*_{F \bigcup \{\ell ^*\}}]_{\ell ^*} +\frac{\underline{f}}{4} E\left[ \left\{ {{\varvec{X}}}_{i}^\top \left( \beta ^*_{F \bigcup \{\ell ^*\}}-\beta ^*_{F \bigcup \{k\}}\right) \right\} ^2\right] , \end{aligned}$$
(20)
where the third equality follows from the definition of \(\beta ^*_{F \bigcup \{k\}}\), and the inequality follows from the same arguments in the proof of Lemma 4 using the condition in Theorem 4. Let \(\delta _{F \bigcup \{\ell ^*\}} := \beta ^*_{F \bigcup \{\ell ^*\}}-[\beta ^*_{F \bigcup \{k\}}]_{F \bigcup \{\ell ^*\}}\). Then, have
$$\begin{aligned}&E\left[ \left\{ {{\varvec{X}}}_{i}^\top \left( \beta ^*_{F \bigcup \{\ell ^*\}}-\beta ^*_{F \bigcup \{k\}}\right) \right\} ^2\right] \nonumber \\&= E\left[ \left\{ {{\varvec{X}}}_{i, F \bigcup \{\ell ^*\}}^\top \delta _{F \bigcup \{\ell ^*\}} - X_{i,k} [\beta ^*_{F \bigcup \{k\}}]_k \right\} ^2\right] \nonumber \\&= E\left[ \left\{ {{\varvec{X}}}_{i, F \bigcup \{\ell ^*\}}^\top \delta _{F \bigcup \{\ell ^*\}} \right\} ^2\right] + [\beta ^*_{F \bigcup \{k\}}]_k^2 - 2 E\left[ {{\varvec{X}}}_{i, F \bigcup \{\ell ^*\}}^\top \delta _{F \bigcup \{\ell ^*\}} X_{i,k} [\beta ^*_{F \bigcup \{k\}}]_k \right] \nonumber \\&\ge E\left[ \left\{ {{\varvec{X}}}_{i, F \bigcup \{\ell ^*\}}^\top \delta _{F \bigcup \{\ell ^*\}} \right\} ^2\right] + [\beta ^*_{F \bigcup \{k\}}]_k^2\nonumber \\&- 2 \sqrt{E\left[ \left\{ {{\varvec{X}}}_{i, F \bigcup \{\ell ^*\}}^\top \delta _{F \bigcup \{\ell ^*\}}\right\} ^2 X^2_{i,k}\right] [\beta ^*_{F \bigcup \{k\}}]_k^2 }\nonumber \\&\ge E\left[ \left\{ {{\varvec{X}}}_{i, F \bigcup \{\ell ^*\}}^\top \delta _{F \bigcup \{\ell ^*\}} \right\} ^2\right] + [\beta ^*_{F \bigcup \{k\}}]_k^2\nonumber \\&- 2 \sqrt{ E\left[ \left\{ {{\varvec{X}}}_{i, F \bigcup \{\ell ^*\}}^\top \delta _{F \bigcup \{\ell ^*\}}\right\} ^2\right] (1-\epsilon ) [\beta ^*_{F \bigcup \{k\}}]_k^2 } \nonumber \\&\ge \epsilon [\beta ^*_{F \bigcup \{k\}}]_k^2 \end{aligned}$$
(21)
where the first inequality follows from \(E[X]^2 \le E[X^2]\) for any random variable X, the second inequality follows from Assumption 8 and the fact that \(x^2 + y^2 \ge 2xy\) for any numbers x, y. Combining (19)-(20) and Assumption 8, we have
$$\begin{aligned} \sqrt{\rho ^2 \log p / n} + \sqrt{\frac{\log n}{n}}\ge & {} - o(n^{-\alpha }) [\beta ^*_{F \bigcup \{\ell ^*\}}]_{\ell ^*} + \epsilon [\beta ^*_{F \bigcup \{k\}}]_k^2 \\\gtrsim & {} - o(n^{-2\alpha }) - [\beta ^*_{F \bigcup \{\ell ^*\}}]_{\ell ^*}^2 + \epsilon [\beta ^*_{F \bigcup \{k\}}]_k^2. \end{aligned}$$
Because \([\beta ^*_{F \bigcup \{k\}}]_k^2 \gtrsim n^{-2\alpha } \gtrsim n^{-2\alpha } + \sqrt{\rho ^2 \log p / n} + \sqrt{\frac{\log n}{n}}\) due to the rate condition of \(\alpha\) in Assumption 4, this implies
$$\begin{aligned} |[\beta ^*_{F \bigcup \{\ell ^*\}}]_{\ell ^*}| \ge \frac{1}{C_B} |[\beta ^*_{F \bigcup \{k\}}]_k|. \end{aligned}$$
Hence, there exists some constant \(C_L>\frac{1}{C_B}\) satisfying
$$\begin{aligned} \frac{|[\beta ^*_{F \bigcup \{\ell ^*\}}]_{\ell ^*}|}{\max _{k \in M_\tau \setminus F} |[\beta ^*_{F \bigcup \{k\}}]_k|} > C_L \end{aligned}$$
uniformly in \(F \not \subseteq M_\tau\). Thus, we have \(\ell^{\ast} \in M_\tau \setminus F\) for any \(F \not \subseteq M_\tau\). This completes the proof. \(\square\)