Abstract
Penalized regression is an attractive framework for variable selection problems. Often, variables possess a grouping structure, and the relevant selection problem is that of selecting groups, not individual variables. The group lasso has been proposed as a way of extending the ideas of the lasso to the problem of group selection. Nonconvex penalties such as SCAD and MCP have been proposed and shown to have several advantages over the lasso; these penalties may also be extended to the group selection problem, giving rise to group SCAD and group MCP methods. Here, we describe algorithms for fitting these models stably and efficiently. In addition, we present simulation results and real data examples comparing and contrasting the statistical properties of these methods.
References
Bakin, S.: Adaptive regression and model selection in data mining problems. Ph.D. thesis, Australian National University (1999)
Bertsekas, D.: Nonlinear Programming, 2nd edn. Athena Scientific, Nashua (1999)
Breheny, P., Huang, J.: Coordinate descent algorithms for nonconvex penalized regression, with applications to biological feature selection. Ann. Appl. Stat. 5, 232–253 (2011)
Chiang, A., Beck, J., Yen, H., Tayeh, M., Scheetz, T., Swiderski, R., Nishimura, D., Braun, T., Kim, K., Huang, J., et al.: Homozygosity mapping with SNP arrays identifies TRIM32, an E3 ubiquitin ligase, as a Bardet–Biedl syndrome gene (BBS11). Proc. Natl. Acad. Sci. 103, 6287–6292 (2006)
Donoho, D., Johnstone, J.: Ideal spatial adaptation by wavelet shrinkage. Biometrika 81, 425–455 (1994)
Fan, J., Li, R.: Variable selection via nonconcave penalized likelihood and its oracle properties. J. Am. Stat. Assoc. 96, 1348–1360 (2001)
Foygel, R., Drton, M.: Exact block-wise optimization in group lasso and sparse group lasso for linear regression (2010). arXiv preprint arXiv:1010.3320
Friedman, J., Hastie, T., Höfling, H., Tibshirani, R.: Pathwise coordinate optimization. Ann. Appl. Stat. 1, 302–332 (2007)
Friedman, J., Hastie, T., Tibshirani, R.: A note on the group lasso and a sparse group lasso (2010a). arXiv preprint arXiv:1001.0736
Friedman, J.H., Hastie, T., Tibshirani, R.: Regularization paths for generalized linear models via coordinate descent. J. Stat. Softw. 33, 1–22 (2010b)
Fu, W.: Penalized regressions: the bridge versus the lasso. J. Comput. Graph. Stat. 7, 397–416 (1998)
Gao, H., Bruce, A.: Waveshrink with firm shrinkage. Stat. Sin. 7, 855–874 (1997)
Huang, J., Breheny, P., Ma, S.: A selective review of group selection in high-dimensional models. Stat. Sci. 27, 481–499 (2012)
Hunter, D., Lange, K.: A tutorial on MM algorithms. Am. Stat. 58, 30–38 (2004)
Krishnapuram, B., Carin, L., Figueiredo, M., Hartemink, A.: Sparse multinomial logistic regression: fast algorithms and generalization bounds. IEEE Trans. Pattern Anal. Mach. Intell. 27, 957–968 (2005)
Lange, K., Hunter, D., Yang, I.: Optimization transfer using surrogate objective functions. J. Comput. Graph. Stat. 9, 1–20 (2000)
Meier, L., van de Geer, S., Bühlmann, P.: The group lasso for logistic regression. J. R. Stat. Soc. B 70, 53–71 (2008)
Puig, A., Wiesel, A., Fleury, G., Hero, A.: Multidimensional shrinkage-thresholding operator and group lasso penalties. IEEE Signal Process. Lett. 18, 363–366 (2011)
Ravikumar, P., Lafferty, J., Liu, H., Wasserman, L.: Sparse additive models. J. R. Stat. Soc. B 71, 1009–1030 (2009)
Scheetz, T., Kim, K., Swiderski, R., Philp, A., Braun, T., Knudtson, K., Dorrance, A., DiBona, G., Huang, J., Casavant, T., et al.: Regulation of gene expression in the mammalian eye and its relevance to eye disease. Proc. Natl. Acad. Sci. 103, 14429–14434 (2006)
She, Y.: An iterative algorithm for fitting nonconvex penalized generalized linear models with grouped predictors. Comput. Stat. Data Anal. 56, 2976–2990 (2012)
Simon, N., Tibshirani, R.: Standardization and the group lasso penalty. Stat. Sin. 22, 983–1001 (2011)
Tibshirani, R.: Regression shrinkage and selection via the lasso. J. R. Stat. Soc. B 58, 267–288 (1996)
Tseng, P.: Convergence of a block coordinate descent method for nondifferentiable minimization. J. Optim. Theory Appl. 109, 475–494 (2001)
Wang, L., Chen, G., Li, H.: Group SCAD regression analysis for microarray time course gene expression data. Bioinformatics 23, 1486–1494 (2007)
Wang, L., Li, H., Huang, J.Z.: Variable selection in nonparametric varying-coefficient models for analysis of repeated measurements. J. Am. Stat. Assoc. 103, 1556–1569 (2008)
Wu, T., Lange, K.: Coordinate descent algorithms for lasso penalized regression. Ann. Appl. Stat. 2, 224–244 (2008)
Yuan, M., Lin, Y.: Model selection and estimation in regression with grouped variables. J. R. Stat. Soc. B 68, 49–67 (2006)
Zhang, C.: Nearly unbiased variable selection under minimax concave penalty. Ann. Stat. 38, 894–942 (2010)
Zou, H., Li, R.: One-step sparse estimates in nonconcave penalized likelihood models. Ann. Stat. 36, 1509–1533 (2008)
Acknowledgements
Jian Huang’s work is supported in part by NIH Grants R01CA120988, R01CA142774 and NSF Grants DMS-0805670 and DMS-1208225. The authors would like to thank Rob Mullins for the genetic association data analyzed in Sect. 5.2, as well as the associate editor and two anonymous reviewers, who provided many helpful remarks that led to considerable refinement of this article.
Appendix
Before proving Proposition 1, we establish the groupwise convexity of all the objective functions under consideration. Note that although the group SCAD and group MCP objective functions contain nonconvex components and are not necessarily convex overall, they remain convex with respect to the variables in any single group.
Lemma 1
The objective function \(Q(\boldsymbol{\beta}_j)\) for regularized linear regression is strictly convex with respect to \(\boldsymbol{\beta}_j\) for the group lasso, for group SCAD with \(\gamma > 2\), and for group MCP with \(\gamma > 1\).
Proof
Although \(Q(\boldsymbol{\beta}_j)\) is not differentiable, it is directionally twice differentiable everywhere. Let \(\nabla_{\mathbf{d}}^{2} Q(\boldsymbol{\beta}_{j})\) denote the second derivative of \(Q(\boldsymbol{\beta}_j)\) in the direction \(\mathbf{d}\). Then the strict convexity of \(Q(\boldsymbol{\beta}_j)\) follows if \(\nabla_{\mathbf{d}}^{2} Q(\boldsymbol{\beta}_{j}) > 0\) at all \(\boldsymbol{\beta}_j\) and for all directions \(\mathbf{d}\). Let \(\xi_*\) denote the infimum over \(\boldsymbol{\beta}_j\) and \(\mathbf{d}\) of \(\nabla_{\mathbf{d}}^{2} Q(\boldsymbol{\beta}_{j})\). Then, after some algebra (using the fact that the groups have been orthonormalized, so that the least squares component contributes unit curvature), we obtain
\[
\xi_* = \begin{cases} 1 & \text{group lasso},\\ 1 - \dfrac{1}{\gamma - 1} & \text{group SCAD},\\ 1 - \dfrac{1}{\gamma} & \text{group MCP}. \end{cases}
\]
These quantities are positive under the conditions specified in the lemma. □
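As a quick numerical sanity check of Lemma 1 (not part of the original proof), the sketch below evaluates the directional second difference of the groupwise objective \(Q(\boldsymbol{\beta}_j)\) for group SCAD and group MCP, assuming orthonormalized groups so that the least squares term contributes unit curvature. The radial direction attains the smallest curvature of the penalty, and the worst-case values found match \(1 - 1/(\gamma-1)\) and \(1 - 1/\gamma\):

```python
import numpy as np

rng = np.random.default_rng(0)
lam, gamma = 1.0, 3.0

def mcp(t):
    # MCP penalty as a function of the group norm t >= 0
    return lam * t - t**2 / (2 * gamma) if t <= gamma * lam else gamma * lam**2 / 2

def scad(t):
    # SCAD penalty as a function of the group norm t >= 0
    if t <= lam:
        return lam * t
    if t <= gamma * lam:
        return (2 * gamma * lam * t - t**2 - lam**2) / (2 * (gamma - 1))
    return lam**2 * (gamma + 1) / 2

def curvature(pen, beta, d, h=1e-4):
    # Second difference of Q(b) = 0.5||z - b||^2 + pen(||b||) along direction d;
    # the quadratic part contributes curvature exactly 1 regardless of z
    Q = lambda b: 0.5 * np.sum(b**2) + pen(np.linalg.norm(b))
    return (Q(beta + h * d) - 2 * Q(beta) + Q(beta - h * d)) / h**2

# Probe the nonconvex region lam < ||beta|| < gamma*lam along the radial
# direction, which attains the smallest eigenvalue of the penalty Hessian
curv_mcp, curv_scad = np.inf, np.inf
for _ in range(200):
    b = rng.normal(size=4)
    b *= rng.uniform(1.2 * lam, 0.9 * gamma * lam) / np.linalg.norm(b)
    d = b / np.linalg.norm(b)
    curv_mcp = min(curv_mcp, curvature(mcp, b, d))
    curv_scad = min(curv_scad, curvature(scad, b, d))

print(curv_mcp)   # ≈ 1 - 1/gamma = 2/3, positive since gamma > 1
print(curv_scad)  # ≈ 1 - 1/(gamma-1) = 1/2, positive since gamma > 2
```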
We now proceed to the proof of Proposition 1.
Proof of Proposition 1
The descent property is a direct consequence of the fact that each updating step consists of minimizing \(Q(\boldsymbol{\beta})\) with respect to \(\boldsymbol{\beta}_j\). Lemma 1, along with the fact that the least squares loss function is continuously differentiable and coercive, provides sufficient conditions to apply Theorem 4.1 of Tseng (2001), thereby establishing that every limit point of \(\{\boldsymbol{\beta}^{(m)}\}\) is a stationary point of \(Q(\boldsymbol{\beta})\).
We further note that \(\{\boldsymbol{\beta}^{(m)}\}\) is guaranteed to converge to a unique limit point. Suppose that the sequence possessed two limit points, \(\boldsymbol{\beta}'\) and \(\boldsymbol{\beta}''\), such that for at least one group, \(\boldsymbol{\beta}'_{j} \neq \boldsymbol{\beta}''_{j}\). For the transition \(\boldsymbol{\beta}' \to \boldsymbol{\beta}''\) to occur, the algorithm must pass through the point \((\boldsymbol{\beta}''_{j}, \boldsymbol{\beta}'_{-j})\). However, by Lemma 1, \(\boldsymbol{\beta}'_{j}\) is the unique value minimizing \(Q(\boldsymbol{\beta}_{j}|\boldsymbol{\beta}_{-j})\). Thus, \(\boldsymbol{\beta}' \to \boldsymbol{\beta}''\) is not allowed by the group descent algorithm and \(\{\boldsymbol{\beta}^{(m)}\}\) possesses a single limit point. □
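For concreteness, the group descent iteration whose convergence Proposition 1 establishes can be sketched in a few lines of Python (an illustrative sketch, not the authors' implementation). With orthonormalized groups, each groupwise minimizer for the group MCP penalty is the multivariate firm-thresholding operator in closed form; the toy data at the end are purely for demonstration:

```python
import numpy as np

def group_firm(z, lam, gamma):
    # Multivariate firm thresholding: exact groupwise minimizer of
    # 0.5||z - b||^2 + MCP_{lam,gamma}(||b||), well defined for gamma > 1
    nz = np.linalg.norm(z)
    if nz <= lam:
        return np.zeros_like(z)
    if nz >= gamma * lam:
        return z.copy()
    return (gamma / (gamma - 1)) * (1 - lam / nz) * z

def group_mcp_descent(X, y, groups, lam, gamma=3.0, n_iter=200):
    # Group descent for linear regression with the group MCP penalty;
    # assumes each group is orthonormalized so that X_j'X_j / n = I
    n, p = X.shape
    beta = np.zeros(p)
    r = y.astype(float).copy()                    # residual y - X @ beta
    for _ in range(n_iter):
        for j in np.unique(groups):
            idx = groups == j
            z = X[:, idx].T @ r / n + beta[idx]   # unpenalized groupwise solution
            new = group_firm(z, lam, gamma)
            r -= X[:, idx] @ (new - beta[idx])    # keep the residual current
            beta[idx] = new
    return beta

# Toy example: three groups of three predictors, only the first group active
rng = np.random.default_rng(1)
n, groups = 100, np.repeat([0, 1, 2], 3)
X = np.hstack([np.linalg.qr(rng.normal(size=(n, 3)))[0] * np.sqrt(n)
               for _ in range(3)])                # orthonormalized groups
beta_true = np.r_[1.0, -1.0, 1.0, np.zeros(6)]
y = X @ beta_true + 0.5 * rng.normal(size=n)
beta_hat = group_mcp_descent(X, y, groups, lam=0.2)
```

With the signal and noise level above, the active group is estimated essentially without shrinkage (its group norm exceeds \(\gamma\lambda\)), while both null groups are thresholded exactly to zero.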
For Proposition 2, involving logistic regression, we proceed similarly, letting \(R(\boldsymbol{\beta}|\tilde{\boldsymbol {\beta}})\) denote the majorizing approximation to Q(β) at \(\tilde{\boldsymbol {\beta}}\).
Lemma 2
The majorizing approximation \(R(\boldsymbol{\beta}_{j}|\tilde{\boldsymbol{\beta}})\) for regularized logistic regression is strictly convex with respect to \(\boldsymbol{\beta}_j\) at all \(\tilde{\boldsymbol{\beta}}\) for the group lasso, for group SCAD with \(\gamma > 5\), and for group MCP with \(\gamma > 4\).
Proof
Proceeding as in the previous lemma, and letting \(\xi_*\) denote the infimum over \(\tilde{\boldsymbol{\beta}}\), \(\boldsymbol{\beta}_j\), and \(\mathbf{d}\) of \(\nabla_{\mathbf{d}}^{2} R(\boldsymbol{\beta}_{j}|\tilde{\boldsymbol{\beta}})\), we obtain (the quadratic majorizer contributes curvature \(v = 1/4\) in place of the unit curvature of the least squares loss)
\[
\xi_* = \begin{cases} \dfrac{1}{4} & \text{group lasso},\\ \dfrac{1}{4} - \dfrac{1}{\gamma - 1} & \text{group SCAD},\\ \dfrac{1}{4} - \dfrac{1}{\gamma} & \text{group MCP}. \end{cases}
\]
These quantities are positive under the conditions specified in the lemma. □
Proof of Proposition 2
The proposition makes two claims: descent with every iteration and convergence to a stationary point. To establish descent for logistic regression, we note that because \(L\) is twice differentiable, for any point \(\boldsymbol{\eta}\) there exists a vector \(\boldsymbol{\eta}^{**}\) on the line segment joining \(\boldsymbol{\eta}\) and \(\boldsymbol{\eta}^{*}\) such that
\[
L(\boldsymbol{\eta}) = L(\boldsymbol{\eta}^{*}) + \nabla L(\boldsymbol{\eta}^{*})^{T}(\boldsymbol{\eta} - \boldsymbol{\eta}^{*}) + \tfrac{1}{2}(\boldsymbol{\eta} - \boldsymbol{\eta}^{*})^{T} \nabla^{2} L(\boldsymbol{\eta}^{**})(\boldsymbol{\eta} - \boldsymbol{\eta}^{*}) \le L(\boldsymbol{\eta}^{*}) + \nabla L(\boldsymbol{\eta}^{*})^{T}(\boldsymbol{\eta} - \boldsymbol{\eta}^{*}) + \tfrac{v}{2}\|\boldsymbol{\eta} - \boldsymbol{\eta}^{*}\|^{2},
\]
where the inequality follows from the fact that \(v\mathbf{I} - \nabla^{2} L(\boldsymbol{\eta}^{**})\) is a positive semidefinite matrix. Descent now follows from the descent property of MM algorithms (Lange et al. 2000) coupled with the fact that each updating step consists of minimizing \(R(\boldsymbol{\beta}_{j}|\tilde{\boldsymbol{\beta}})\).
To establish convergence to a stationary point, we note that if no elements of \(\boldsymbol{\beta}\) tend to \(\pm\infty\), then the descent property of the algorithm ensures that the sequence \(\{\boldsymbol{\beta}^{(m)}\}\) stays within a compact set and therefore possesses a limit point \(\tilde{\boldsymbol{\beta}}\). Then, as in the proof of Proposition 1, Lemma 2 allows us to apply the results of Tseng (2001) and conclude that \(\tilde{\boldsymbol{\beta}}\) must be a stationary point of \(R(\boldsymbol{\beta}|\tilde{\boldsymbol{\beta}})\). Furthermore, because \(R(\boldsymbol{\beta}|\tilde{\boldsymbol{\beta}})\) is tangent to \(Q(\boldsymbol{\beta})\) at \(\tilde{\boldsymbol{\beta}}\), \(\tilde{\boldsymbol{\beta}}\) must also be a stationary point of \(Q(\boldsymbol{\beta})\). □
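The MM iteration analyzed in this proof can likewise be sketched (an illustrative sketch, not the authors' implementation): the logistic loss is majorized with curvature \(v = 1/4\), and each group is updated by the corresponding groupwise MCP minimizer, which is well defined whenever \(\gamma v > 1\), i.e. \(\gamma > 4\), in line with Lemma 2. The toy data at the end are purely for demonstration:

```python
import numpy as np

def group_firm_mm(z, lam, gamma, v):
    # Groupwise minimizer of (v/2)||z - b||^2 + MCP_{lam,gamma}(||b||),
    # well defined when gamma * v > 1 (gamma > 4 for v = 1/4)
    nz = np.linalg.norm(z)
    if v * nz <= lam:
        return np.zeros_like(z)
    if nz >= gamma * lam:
        return z.copy()
    return (gamma * (v * nz - lam) / (gamma * v - 1)) * z / nz

def group_mcp_logistic(X, y, groups, lam, gamma=5.0, n_iter=300):
    # MM group descent for logistic regression with group MCP; the logistic
    # weights satisfy p(1-p) <= 1/4, so v = 1/4 gives a valid majorizer.
    # Assumes orthonormalized groups (X_j'X_j / n = I).
    n, p = X.shape
    v = 0.25
    beta = np.zeros(p)
    for _ in range(n_iter):
        for j in np.unique(groups):
            prob = 1 / (1 + np.exp(-(X @ beta)))   # re-majorize at current beta
            idx = groups == j
            z = beta[idx] + X[:, idx].T @ (y - prob) / (v * n)
            beta[idx] = group_firm_mm(z, lam, gamma, v)
    return beta

# Toy example: three groups of three predictors, only the first group active
rng = np.random.default_rng(2)
n, groups = 1000, np.repeat([0, 1, 2], 3)
X = np.hstack([np.linalg.qr(rng.normal(size=(n, 3)))[0] * np.sqrt(n)
               for _ in range(3)])                 # orthonormalized groups
beta_true = np.r_[1.0, 1.0, -1.0, np.zeros(6)]
y = rng.binomial(1, 1 / (1 + np.exp(-(X @ beta_true)))).astype(float)
beta_hat = group_mcp_logistic(X, y, groups, lam=0.25)
```

Note that the majorizer is refreshed at the current \(\boldsymbol{\beta}\) before every group update, matching the tangency condition used at the end of the proof.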
Cite this article
Breheny, P., Huang, J. Group descent algorithms for nonconvex penalized linear and logistic regression models with grouped predictors. Stat Comput 25, 173–187 (2015). https://doi.org/10.1007/s11222-013-9424-2