Abstract
Penalized regression is an attractive framework for variable selection problems. Often, variables possess a grouping structure, and the relevant selection problem is that of selecting groups, not individual variables. The group lasso has been proposed as a way of extending the ideas of the lasso to the problem of group selection. Nonconvex penalties such as SCAD and MCP have been proposed and shown to have several advantages over the lasso; these penalties may also be extended to the group selection problem, giving rise to group SCAD and group MCP methods. Here, we describe algorithms for fitting these models stably and efficiently. In addition, we present simulation results and real data examples comparing and contrasting the statistical properties of these methods.
References
Bakin, S.: Adaptive regression and model selection in data mining problems. Ph.D. thesis, Australian National University (1999)
Bertsekas, D.: Nonlinear Programming, 2nd edn. Athena Scientific, Nashua (1999)
Breheny, P., Huang, J.: Coordinate descent algorithms for nonconvex penalized regression, with applications to biological feature selection. Ann. Appl. Stat. 5, 232–253 (2011)
Chiang, A., Beck, J., Yen, H., Tayeh, M., Scheetz, T., Swiderski, R., Nishimura, D., Braun, T., Kim, K., Huang, J., et al.: Homozygosity mapping with SNP arrays identifies TRIM32, an E3 ubiquitin ligase, as a Bardet–Biedl syndrome gene (BBS11). Proc. Natl. Acad. Sci. 103, 6287–6292 (2006)
Donoho, D., Johnstone, J.: Ideal spatial adaptation by wavelet shrinkage. Biometrika 81, 425–455 (1994)
Fan, J., Li, R.: Variable selection via nonconcave penalized likelihood and its oracle properties. J. Am. Stat. Assoc. 96, 1348–1360 (2001)
Foygel, R., Drton, M.: Exact block-wise optimization in group lasso and sparse group lasso for linear regression (2010). arXiv preprint arXiv:1010.3320
Friedman, J., Hastie, T., Höfling, H., Tibshirani, R.: Pathwise coordinate optimization. Ann. Appl. Stat. 1, 302–332 (2007)
Friedman, J., Hastie, T., Tibshirani, R.: A note on the group lasso and a sparse group lasso (2010a). arXiv preprint arXiv:1001.0736
Friedman, J.H., Hastie, T., Tibshirani, R.: Regularization paths for generalized linear models via coordinate descent. J. Stat. Softw. 33, 1–22 (2010b)
Fu, W.: Penalized regressions: the bridge versus the lasso. J. Comput. Graph. Stat. 7, 397–416 (1998)
Gao, H., Bruce, A.: Waveshrink with firm shrinkage. Stat. Sin. 7, 855–874 (1997)
Huang, J., Breheny, P., Ma, S.: A selective review of group selection in high-dimensional models. Stat. Sci. 27, 481–499 (2012)
Hunter, D., Lange, K.: A tutorial on MM algorithms. Am. Stat. 58, 30–38 (2004)
Krishnapuram, B., Carin, L., Figueiredo, M., Hartemink, A.: Sparse multinomial logistic regression: fast algorithms and generalization bounds. IEEE Trans. Pattern Anal. Mach. Intell. 27, 957–968 (2005)
Lange, K., Hunter, D., Yang, I.: Optimization transfer using surrogate objective functions. J. Comput. Graph. Stat. 9, 1–20 (2000)
Meier, L., van de Geer, S., Bühlmann, P.: The group lasso for logistic regression. J. R. Stat. Soc. B 70, 53–71 (2008)
Puig, A., Wiesel, A., Fleury, G., Hero, A.: Multidimensional shrinkage-thresholding operator and group lasso penalties. IEEE Signal Process. Lett. 18, 363–366 (2011)
Ravikumar, P., Lafferty, J., Liu, H., Wasserman, L.: Sparse additive models. J. R. Stat. Soc. B 71, 1009–1030 (2009)
Scheetz, T., Kim, K., Swiderski, R., Philp, A., Braun, T., Knudtson, K., Dorrance, A., DiBona, G., Huang, J., Casavant, T., et al.: Regulation of gene expression in the mammalian eye and its relevance to eye disease. Proc. Natl. Acad. Sci. 103, 14429–14434 (2006)
She, Y.: An iterative algorithm for fitting nonconvex penalized generalized linear models with grouped predictors. Comput. Stat. Data Anal. 56, 2976–2990 (2012)
Simon, N., Tibshirani, R.: Standardization and the group lasso penalty. Stat. Sin. 22, 983–1001 (2011)
Tibshirani, R.: Regression shrinkage and selection via the lasso. J. R. Stat. Soc. B 58, 267–288 (1996)
Tseng, P.: Convergence of a block coordinate descent method for nondifferentiable minimization. J. Optim. Theory Appl. 109, 475–494 (2001)
Wang, L., Chen, G., Li, H.: Group SCAD regression analysis for microarray time course gene expression data. Bioinformatics 23, 1486–1494 (2007)
Wang, L., Li, H., Huang, J.Z.: Variable selection in nonparametric varying-coefficient models for analysis of repeated measurements. J. Am. Stat. Assoc. 103, 1556–1569 (2008)
Wu, T., Lange, K.: Coordinate descent algorithms for lasso penalized regression. Ann. Appl. Stat. 2, 224–244 (2008)
Yuan, M., Lin, Y.: Model selection and estimation in regression with grouped variables. J. R. Stat. Soc. B 68, 49–67 (2006)
Zhang, C.: Nearly unbiased variable selection under minimax concave penalty. Ann. Stat. 38, 894–942 (2010)
Zou, H., Li, R.: One-step sparse estimates in nonconcave penalized likelihood models. Ann. Stat. 36, 1509–1533 (2008)
Acknowledgements
Jian Huang’s work is supported in part by NIH Grants R01CA120988, R01CA142774 and NSF Grants DMS-0805670 and DMS-1208225. The authors would like to thank Rob Mullins for the genetic association data analyzed in Sect. 5.2, as well as the associate editor and two anonymous reviewers, who provided many helpful remarks that led to considerable refinement of this article.
Appendix
Before proving Proposition 1, we establish the groupwise convexity of all the objective functions under consideration. Note that although the group SCAD and group MCP objective functions contain nonconvex components and are not necessarily convex overall, they remain convex with respect to the variables in any single group.
Lemma 1
The objective function \(Q(\boldsymbol{\beta}_j)\) for regularized linear regression is strictly convex with respect to \(\boldsymbol{\beta}_j\) for the group lasso, for group SCAD with \(\gamma > 2\), and for group MCP with \(\gamma > 1\).
Proof
Although \(Q(\boldsymbol{\beta}_j)\) is not differentiable, it is directionally twice differentiable everywhere. Let \(\nabla_{\mathbf{d}}^{2} Q(\boldsymbol{\beta}_{j})\) denote the second derivative of \(Q(\boldsymbol{\beta}_j)\) in the direction \(\mathbf{d}\). Then the strict convexity of \(Q(\boldsymbol{\beta}_j)\) follows if \(\nabla_{\mathbf{d}}^{2} Q(\boldsymbol{\beta}_{j}) > 0\) at all \(\boldsymbol{\beta}_j\) and for all directions \(\mathbf{d}\). Let \(\xi_*\) denote the infimum over \(\boldsymbol{\beta}_j\) and \(\mathbf{d}\) of \(\nabla_{\mathbf{d}}^{2} Q(\boldsymbol{\beta}_{j})\). Then, after some algebra (using the fact that the groups have been orthonormalized, so that the least squares component contributes unit curvature), we obtain
\[
\xi_* = \begin{cases} 1 & \text{group lasso},\\ 1 - \dfrac{1}{\gamma - 1} & \text{group SCAD},\\ 1 - \dfrac{1}{\gamma} & \text{group MCP}. \end{cases}
\]
These quantities are positive under the conditions specified in the lemma. □
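As a quick numerical sanity check of Lemma 1 (not part of the original proof), the sketch below evaluates the directional second difference of the groupwise objective \(Q(\boldsymbol{\beta}_j)\) for group SCAD and group MCP, assuming orthonormalized groups so that the least squares term contributes unit curvature. The radial direction attains the smallest curvature of the penalty, and the worst-case values found match \(1 - 1/(\gamma-1)\) and \(1 - 1/\gamma\):

```python
import numpy as np

rng = np.random.default_rng(0)
lam, gamma = 1.0, 3.0

def mcp(t):
    # MCP penalty as a function of the group norm t >= 0
    return lam * t - t**2 / (2 * gamma) if t <= gamma * lam else gamma * lam**2 / 2

def scad(t):
    # SCAD penalty as a function of the group norm t >= 0
    if t <= lam:
        return lam * t
    if t <= gamma * lam:
        return (2 * gamma * lam * t - t**2 - lam**2) / (2 * (gamma - 1))
    return lam**2 * (gamma + 1) / 2

def curvature(pen, beta, d, h=1e-4):
    # Second difference of Q(b) = 0.5||z - b||^2 + pen(||b||) along direction d;
    # the quadratic part contributes curvature exactly 1 regardless of z
    Q = lambda b: 0.5 * np.sum(b**2) + pen(np.linalg.norm(b))
    return (Q(beta + h * d) - 2 * Q(beta) + Q(beta - h * d)) / h**2

# Probe the nonconvex region lam < ||beta|| < gamma*lam along the radial
# direction, which attains the smallest eigenvalue of the penalty Hessian
curv_mcp, curv_scad = np.inf, np.inf
for _ in range(200):
    b = rng.normal(size=4)
    b *= rng.uniform(1.2 * lam, 0.9 * gamma * lam) / np.linalg.norm(b)
    d = b / np.linalg.norm(b)
    curv_mcp = min(curv_mcp, curvature(mcp, b, d))
    curv_scad = min(curv_scad, curvature(scad, b, d))

print(curv_mcp)   # ≈ 1 - 1/gamma = 2/3, positive since gamma > 1
print(curv_scad)  # ≈ 1 - 1/(gamma-1) = 1/2, positive since gamma > 2
```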
We now proceed to the proof of Proposition 1.
Proof of Proposition 1
The descent property is a direct consequence of the fact that each updating step consists of minimizing \(Q(\boldsymbol{\beta})\) with respect to \(\boldsymbol{\beta}_j\). Lemma 1, along with the fact that the least squares loss function is continuously differentiable and coercive, provides sufficient conditions to apply Theorem 4.1 of Tseng (2001), thereby establishing that every limit point of \(\{\boldsymbol{\beta}^{(m)}\}\) is a stationary point of \(Q(\boldsymbol{\beta})\).
We further note that \(\{\boldsymbol{\beta}^{(m)}\}\) is guaranteed to converge to a unique limit point. Suppose that the sequence possessed two limit points, \(\boldsymbol{\beta}'\) and \(\boldsymbol{\beta}''\), such that for at least one group, \(\boldsymbol{\beta}'_{j} \neq \boldsymbol{\beta}''_{j}\). For the transition \(\boldsymbol{\beta}' \to \boldsymbol{\beta}''\) to occur, the algorithm must pass through the point \((\boldsymbol{\beta}''_{j}, \boldsymbol{\beta}'_{-j})\). However, by Lemma 1, \(\boldsymbol{\beta}'_{j}\) is the unique value minimizing \(Q(\boldsymbol{\beta}_{j}|\boldsymbol{\beta}_{-j})\). Thus, \(\boldsymbol{\beta}' \to \boldsymbol{\beta}''\) is not allowed by the group descent algorithm and \(\{\boldsymbol{\beta}^{(m)}\}\) possesses a single limit point. □
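For concreteness, the group descent iteration whose convergence Proposition 1 establishes can be sketched in a few lines of Python (an illustrative sketch, not the authors' implementation). With orthonormalized groups, each groupwise minimizer for the group MCP penalty is the multivariate firm-thresholding operator in closed form; the toy data at the end are purely for demonstration:

```python
import numpy as np

def group_firm(z, lam, gamma):
    # Multivariate firm thresholding: exact groupwise minimizer of
    # 0.5||z - b||^2 + MCP_{lam,gamma}(||b||), well defined for gamma > 1
    nz = np.linalg.norm(z)
    if nz <= lam:
        return np.zeros_like(z)
    if nz >= gamma * lam:
        return z.copy()
    return (gamma / (gamma - 1)) * (1 - lam / nz) * z

def group_mcp_descent(X, y, groups, lam, gamma=3.0, n_iter=200):
    # Group descent for linear regression with the group MCP penalty;
    # assumes each group is orthonormalized so that X_j'X_j / n = I
    n, p = X.shape
    beta = np.zeros(p)
    r = y.astype(float).copy()                    # residual y - X @ beta
    for _ in range(n_iter):
        for j in np.unique(groups):
            idx = groups == j
            z = X[:, idx].T @ r / n + beta[idx]   # unpenalized groupwise solution
            new = group_firm(z, lam, gamma)
            r -= X[:, idx] @ (new - beta[idx])    # keep the residual current
            beta[idx] = new
    return beta

# Toy example: three groups of three predictors, only the first group active
rng = np.random.default_rng(1)
n, groups = 100, np.repeat([0, 1, 2], 3)
X = np.hstack([np.linalg.qr(rng.normal(size=(n, 3)))[0] * np.sqrt(n)
               for _ in range(3)])                # orthonormalized groups
beta_true = np.r_[1.0, -1.0, 1.0, np.zeros(6)]
y = X @ beta_true + 0.5 * rng.normal(size=n)
beta_hat = group_mcp_descent(X, y, groups, lam=0.2)
```

With the signal and noise level above, the active group is estimated essentially without shrinkage (its group norm exceeds \(\gamma\lambda\)), while both null groups are thresholded exactly to zero.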
For Proposition 2, involving logistic regression, we proceed similarly, letting \(R(\boldsymbol{\beta}|\tilde{\boldsymbol {\beta}})\) denote the majorizing approximation to Q(β) at \(\tilde{\boldsymbol {\beta}}\).
Lemma 2
The majorizing approximation \(R(\boldsymbol{\beta}_{j}|\tilde{\boldsymbol{\beta}})\) for regularized logistic regression is strictly convex with respect to \(\boldsymbol{\beta}_j\) at all \(\tilde{\boldsymbol{\beta}}\) for the group lasso, for group SCAD with \(\gamma > 5\), and for group MCP with \(\gamma > 4\).
Proof
Proceeding as in the previous lemma, and letting \(\xi_*\) denote the infimum over \(\tilde{\boldsymbol{\beta}}\), \(\boldsymbol{\beta}_j\), and \(\mathbf{d}\) of \(\nabla_{\mathbf{d}}^{2} R(\boldsymbol{\beta}_{j}|\tilde{\boldsymbol{\beta}})\), we obtain (the quadratic majorizer contributes curvature \(v = 1/4\) in place of the unit curvature of the least squares loss)
\[
\xi_* = \begin{cases} \dfrac{1}{4} & \text{group lasso},\\ \dfrac{1}{4} - \dfrac{1}{\gamma - 1} & \text{group SCAD},\\ \dfrac{1}{4} - \dfrac{1}{\gamma} & \text{group MCP}. \end{cases}
\]
These quantities are positive under the conditions specified in the lemma. □
Proof of Proposition 2
The proposition makes two claims: descent with every iteration and convergence to a stationary point. To establish descent for logistic regression, we note that because \(L\) is twice differentiable, for any point \(\boldsymbol{\eta}\) there exists a vector \(\boldsymbol{\eta}^{**}\) on the line segment joining \(\boldsymbol{\eta}\) and \(\boldsymbol{\eta}^{*}\) such that
\[
L(\boldsymbol{\eta}) = L(\boldsymbol{\eta}^{*}) + \nabla L(\boldsymbol{\eta}^{*})^{T}(\boldsymbol{\eta} - \boldsymbol{\eta}^{*}) + \tfrac{1}{2}(\boldsymbol{\eta} - \boldsymbol{\eta}^{*})^{T} \nabla^{2} L(\boldsymbol{\eta}^{**})(\boldsymbol{\eta} - \boldsymbol{\eta}^{*}) \le L(\boldsymbol{\eta}^{*}) + \nabla L(\boldsymbol{\eta}^{*})^{T}(\boldsymbol{\eta} - \boldsymbol{\eta}^{*}) + \tfrac{v}{2}\|\boldsymbol{\eta} - \boldsymbol{\eta}^{*}\|^{2},
\]
where the inequality follows from the fact that \(v\mathbf{I} - \nabla^{2} L(\boldsymbol{\eta}^{**})\) is a positive semidefinite matrix. Descent now follows from the descent property of MM algorithms (Lange et al. 2000) coupled with the fact that each updating step consists of minimizing \(R(\boldsymbol{\beta}_{j}|\tilde{\boldsymbol{\beta}})\).
To establish convergence to a stationary point, we note that if no elements of \(\boldsymbol{\beta}\) tend to \(\pm\infty\), then the descent property of the algorithm ensures that the sequence \(\{\boldsymbol{\beta}^{(m)}\}\) stays within a compact set and therefore possesses a limit point \(\tilde{\boldsymbol{\beta}}\). Then, as in the proof of Proposition 1, Lemma 2 allows us to apply the results of Tseng (2001) and conclude that \(\tilde{\boldsymbol{\beta}}\) must be a stationary point of \(R(\boldsymbol{\beta}|\tilde{\boldsymbol{\beta}})\). Furthermore, because \(R(\boldsymbol{\beta}|\tilde{\boldsymbol{\beta}})\) is tangent to \(Q(\boldsymbol{\beta})\) at \(\tilde{\boldsymbol{\beta}}\), \(\tilde{\boldsymbol{\beta}}\) must also be a stationary point of \(Q(\boldsymbol{\beta})\). □
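The MM iteration analyzed in this proof can likewise be sketched (an illustrative sketch, not the authors' implementation): the logistic loss is majorized with curvature \(v = 1/4\), and each group is updated by the corresponding groupwise MCP minimizer, which is well defined whenever \(\gamma v > 1\), i.e. \(\gamma > 4\), in line with Lemma 2. The toy data at the end are purely for demonstration:

```python
import numpy as np

def group_firm_mm(z, lam, gamma, v):
    # Groupwise minimizer of (v/2)||z - b||^2 + MCP_{lam,gamma}(||b||),
    # well defined when gamma * v > 1 (gamma > 4 for v = 1/4)
    nz = np.linalg.norm(z)
    if v * nz <= lam:
        return np.zeros_like(z)
    if nz >= gamma * lam:
        return z.copy()
    return (gamma * (v * nz - lam) / (gamma * v - 1)) * z / nz

def group_mcp_logistic(X, y, groups, lam, gamma=5.0, n_iter=300):
    # MM group descent for logistic regression with group MCP; the logistic
    # weights satisfy p(1-p) <= 1/4, so v = 1/4 gives a valid majorizer.
    # Assumes orthonormalized groups (X_j'X_j / n = I).
    n, p = X.shape
    v = 0.25
    beta = np.zeros(p)
    for _ in range(n_iter):
        for j in np.unique(groups):
            prob = 1 / (1 + np.exp(-(X @ beta)))   # re-majorize at current beta
            idx = groups == j
            z = beta[idx] + X[:, idx].T @ (y - prob) / (v * n)
            beta[idx] = group_firm_mm(z, lam, gamma, v)
    return beta

# Toy example: three groups of three predictors, only the first group active
rng = np.random.default_rng(2)
n, groups = 1000, np.repeat([0, 1, 2], 3)
X = np.hstack([np.linalg.qr(rng.normal(size=(n, 3)))[0] * np.sqrt(n)
               for _ in range(3)])                 # orthonormalized groups
beta_true = np.r_[1.0, 1.0, -1.0, np.zeros(6)]
y = rng.binomial(1, 1 / (1 + np.exp(-(X @ beta_true)))).astype(float)
beta_hat = group_mcp_logistic(X, y, groups, lam=0.25)
```

Note that the majorizer is refreshed at the current \(\boldsymbol{\beta}\) before every group update, matching the tangency condition used at the end of the proof.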
Cite this article
Breheny, P., Huang, J. Group descent algorithms for nonconvex penalized linear and logistic regression models with grouped predictors. Stat Comput 25, 173–187 (2015). https://doi.org/10.1007/s11222-013-9424-2