Abstract
In high-dimensional data modeling, variable selection plays a crucial role in improving predictive accuracy and enhancing model interpretability through sparse representation. Unfortunately, many variable selection methods suffer from insufficient model sparsity, high computational overhead, or difficulty in handling large-scale data. Recently, axis-aligned random projection techniques have been applied to these problems by selecting variables, but they have seen limited use on complex data within the regression framework. In this study, we propose a novel method, sparse partial least squares via axis-aligned random projection, designed for the analysis of high-dimensional data. First, axis-aligned random projection is used to obtain a sparse loading vector, which greatly reduces computational complexity. Partial least squares regression is then carried out in the subspace of the top-ranked significant variables, and the submatrices are updated iteratively until an optimal sparse partial least squares model is reached. Comparisons with several state-of-the-art high-dimensional regression methods show that the proposed method achieves superior predictive performance. To illustrate its effectiveness, we apply the method to four cases: one simulated dataset and three real-world datasets. In all four cases, the proposed method identifies the important variables.
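The two-stage idea sketched in the abstract, screening variables with axis-aligned random projections and then running partial least squares on the top-ranked subset, can be illustrated with a minimal NumPy sketch. This is an illustrative toy under assumed details, not the authors' algorithm: the scoring rule (absolute covariance with the response within each random coordinate subset), the vote-aggregation step, and all function names are assumptions made for demonstration.

```python
import numpy as np

def axis_aligned_rp_select(X, y, n_proj=500, k=5, top=10, seed=0):
    """Vote-based screening with axis-aligned random projections.

    Each projection keeps a random subset of k coordinates; within the
    subset, the variable with the largest absolute covariance with y
    receives one vote.  Returns the indices of the `top` most-voted
    variables.  (Illustrative scoring rule, not the paper's exact one.)
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    score = np.abs((X - X.mean(axis=0)).T @ (y - y.mean())) / n
    votes = np.zeros(p)
    for _ in range(n_proj):
        subset = rng.choice(p, size=k, replace=False)
        votes[subset[np.argmax(score[subset])]] += 1
    return np.argsort(votes)[::-1][:top]

def pls1_fit(X, y, n_comp=2):
    """Minimal single-response PLS (NIPALS); returns regression
    coefficients for the centered predictors and response."""
    Xc, yc = X - X.mean(axis=0), y - y.mean()
    W, P, q = [], [], []
    for _ in range(n_comp):
        w = Xc.T @ yc
        w /= np.linalg.norm(w)
        t = Xc @ w
        tt = t @ t
        p_load = Xc.T @ t / tt
        q_load = (yc @ t) / tt
        Xc = Xc - np.outer(t, p_load)   # deflate predictors
        yc = yc - q_load * t            # deflate response
        W.append(w); P.append(p_load); q.append(q_load)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    return W @ np.linalg.solve(P.T @ W, q)

# Toy data: 3 informative variables out of 30.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 30))
y = X[:, :3] @ np.array([3.0, 3.0, 3.0]) + rng.normal(scale=0.5, size=200)
sel = axis_aligned_rp_select(X, y)       # screened variable indices
coef = pls1_fit(X[:, sel], y, n_comp=3)  # PLS on the screened subspace
```

Because the informative variables dominate the covariance score whenever they appear in a random coordinate subset, their vote counts separate clearly from the noise variables, which is the intuition behind screening with axis-aligned projections before fitting the regression.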
Acknowledgements
This work is financially supported by the National Natural Science Foundation of China (Grant Nos. 11801105, 1236010386, 71901074), the Guangxi Science and Technology Project (Grant No. Guike AD19245106), the Guangdong Basic and Applied Basic Research Foundation (Grant No. 2022A1515110315), and the Fundamental Research Grant Scheme of Malaysia (Grant No. FRGS/1/2021/STG06/SYUC/03/1).
Author information
Contributions
Youwu Lin: Conceptualization, methodology. Xin Zeng: Software, writing—original draft. Pei Wang: Supervision, writing—reviewing and editing. Shuai Huang: Investigation, writing—original draft. Kok Lay Teo: Supervision, writing—reviewing and editing.
Ethics declarations
Conflict of interest
The authors declare that there is no conflict of interest regarding the publication of this paper.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary Information
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Lin, Y., Zeng, X., Wang, P. et al. Variable selection using axis-aligned random projections for partial least-squares regression. Stat Comput 34, 105 (2024). https://doi.org/10.1007/s11222-024-10417-5