Sparse functional linear models via calibrated concave-convex procedure

  • Research Article
  • Journal of the Korean Statistical Society

Abstract

In this paper, we propose a calibrated ConCave-Convex Procedure (CCCP) for variable selection in high-dimensional functional linear models. In linear models, the calibrated CCCP approach for the Smoothly Clipped Absolute Deviation (SCAD) penalty is known to produce a consistent solution path with probability converging to one. We incorporate the SCAD penalty into function-on-scalar regression models and, using a basis expansion approach, formulate them as a group-penalized estimation problem. We then apply the calibrated CCCP method to solve the resulting nonconvex group-penalized problem. For tuning, we use the Extended Bayesian Information Criterion (EBIC) to ensure consistency in high-dimensional settings. In simulation studies, we compare the proposed method with two existing convex-penalized estimators in terms of variable selection consistency and prediction accuracy. Lastly, we apply the method to a gene expression dataset to obtain sparse estimates of the time-varying effects of transcription factors on the regulation of yeast cell cycle genes.
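
The SCAD penalty and the convex-plus-concave split that a CCCP scheme relies on can be sketched as follows. This is a minimal illustration and not the authors' implementation: the function names are hypothetical, the shape parameter a = 3.7 is the conventional default from Fan and Li (2001) and is assumed here, and the full article's exact group formulation is not visible in this preview.

```python
import numpy as np


def scad_penalty(t, lam, a=3.7):
    """SCAD penalty of Fan and Li (2001), evaluated at |t|.

    a = 3.7 is an assumed default, not a value taken from the article.
    """
    t = np.abs(np.asarray(t, dtype=float))
    linear = lam * t                                          # |t| <= lam
    quad = (2 * a * lam * t - t**2 - lam**2) / (2 * (a - 1))  # lam < |t| <= a*lam
    const = lam**2 * (a + 1) / 2                              # |t| > a*lam
    return np.where(t <= lam, linear, np.where(t <= a * lam, quad, const))


def concave_part(t, lam, a=3.7):
    """J(t) = SCAD(t) - lam*|t|: the concave term of the split
    SCAD(t) = lam*|t| (convex) + J(t) (concave)."""
    return scad_penalty(t, lam, a) - lam * np.abs(np.asarray(t, dtype=float))


def concave_part_grad(norm, lam, a=3.7):
    """Derivative of J at a nonnegative value, e.g. the Euclidean norm
    of one group of basis coefficients."""
    norm = np.asarray(norm, dtype=float)
    middle = (lam - norm) / (a - 1)
    return np.where(norm <= lam, 0.0, np.where(norm <= a * lam, middle, -lam))
```

In a CCCP iteration, the concave term is replaced by its tangent at the current estimate, so each step reduces to a convex problem with a group-lasso-type penalty; `concave_part_grad` supplies the slope of that tangent for each group.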

Data availability

The yeast cell cycle gene expression dataset is available in the spls package in R.

Acknowledgements

This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (NRF-2019R1A2C1005979, NRF-2022R1A4A1033384).

Author information

Corresponding author

Correspondence to Yongho Jeon.

About this article

Cite this article

Lee, Y.J., Jeon, Y. Sparse functional linear models via calibrated concave-convex procedure. J. Korean Stat. Soc. 53, 189–207 (2024). https://doi.org/10.1007/s42952-023-00242-3

  • DOI: https://doi.org/10.1007/s42952-023-00242-3
