Abstract
We propose a calibrated ConCave-Convex Procedure (CCCP) for variable selection in high-dimensional functional linear models. For the Smoothly Clipped Absolute Deviation (SCAD) penalty in linear models, the calibrated CCCP approach is known to produce a consistent solution path with probability converging to one. We incorporate the SCAD penalty into function-on-scalar regression models and, using a basis expansion, formulate the resulting problem as a group-penalized estimation problem. We then apply the calibrated CCCP method to solve this nonconvex group-penalized problem. Tuning parameters are selected with the Extended Bayesian Information Criterion (EBIC) to ensure consistency in high-dimensional settings. In simulation studies, we compare the proposed method with two existing convex-penalized estimators in terms of variable selection consistency and prediction accuracy. Lastly, we apply the method to a yeast gene expression dataset to obtain sparse estimates of the time-varying effects of transcription factors on the regulation of cell cycle genes.
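As a rough illustration of the ingredients named above, and not the authors' implementation, the sketch below evaluates the SCAD penalty of Fan and Li (2001) and the gradient of its concave part, which is the term a concave-convex procedure linearizes at the current iterate. It relies on the standard decomposition SCAD(t) = λt + J(t) for t ≥ 0 with J concave. The function names, the value a = 3.7 (the default suggested by Fan and Li), and the layout of the coefficient matrix `B` (one row of basis coefficients per predictor) are assumptions made for this example.

```python
import numpy as np


def scad_penalty(t, lam, a=3.7):
    """SCAD penalty evaluated at |t| (a > 2, lam > 0)."""
    t = np.abs(np.asarray(t, dtype=float))
    linear = lam * t                                    # region |t| <= lam
    quad = (2 * a * lam * t - t**2 - lam**2) / (2 * (a - 1))  # lam < |t| <= a*lam
    const = lam**2 * (a + 1) / 2                        # region |t| > a*lam
    return np.where(t <= lam, linear, np.where(t <= a * lam, quad, const))


def concave_part_deriv(t, lam, a=3.7):
    """Derivative of J(t) = SCAD(t) - lam * t for t >= 0."""
    t = np.abs(np.asarray(t, dtype=float))
    mid = (lam - t) / (a - 1)
    return np.where(t <= lam, 0.0, np.where(t <= a * lam, mid, -lam))


def concave_part_group_grad(B, lam, a=3.7):
    """Gradient of sum_j J(||b_j||) with respect to B.

    B is a (p, K) array; row j holds the K basis coefficients of
    predictor j (a hypothetical layout chosen for this sketch).
    A CCCP step replaces the concave part by its linearization with
    this gradient, leaving a convex group-lasso-type subproblem.
    """
    B = np.asarray(B, dtype=float)
    norms = np.linalg.norm(B, axis=1, keepdims=True)
    safe = np.where(norms > 0, norms, 1.0)  # avoid 0/0; J'(0) = 0 anyway
    return concave_part_deriv(norms, lam, a) * B / safe
```

Rows whose norm is at most λ receive a zero gradient, so the subproblem penalizes them like a group lasso, while rows with norm above aλ receive a gradient of length λ that offsets the group-lasso shrinkage on large coefficient groups.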
Data availability
The yeast cell cycle gene expression dataset is available in the spls package in R.
Acknowledgements
This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (NRF-2019R1A2C1005979, NRF-2022R1A4A1033384).
About this article
Cite this article
Lee, Y.J., Jeon, Y. Sparse functional linear models via calibrated concave-convex procedure. J. Korean Stat. Soc. 53, 189–207 (2024). https://doi.org/10.1007/s42952-023-00242-3
DOI: https://doi.org/10.1007/s42952-023-00242-3