Large-Scale Data Challenges: Instability in Statistical Learning

Chen, Bo-Yu; Zhang, Hao

doi:10.1007/978-981-97-0827-7_17

Part of the book series: Communications in Computer and Information Science (CCIS, volume 2015)

Included in the following conference series:

International Conference on Applied Intelligence

Abstract

Numerous methods have been developed to approximate both the kernel matrix and its inverse. We investigate one influential approximation that has recently gained popularity. Our results indicate, however, that it fails to address the ill-conditioning of the kernel matrix, which can lead to large biases and highly unstable predictions.
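
To make the ill-conditioning mentioned in the abstract concrete, the following minimal sketch computes the condition number of a Gaussian kernel matrix on an evenly spaced grid as the number of points grows. It is an illustration only and not the approximation studied in the paper: the kernel choice, the `lengthscale` value, the grid on [0, 1], and the helper name `gaussian_kernel_matrix` are all assumptions made for this example.

```python
import numpy as np


def gaussian_kernel_matrix(x, lengthscale):
    """Gaussian (squared-exponential) kernel matrix for 1-D inputs x.

    Hypothetical helper for illustration; not taken from the paper.
    """
    diff = x[:, None] - x[None, :]
    return np.exp(-0.5 * (diff / lengthscale) ** 2)


lengthscale = 0.2  # assumed value, chosen only for illustration
cond_numbers = {}
for n in (10, 50):
    x = np.linspace(0.0, 1.0, n)      # n evenly spaced points on [0, 1]
    K = gaussian_kernel_matrix(x, lengthscale)
    cond_numbers[n] = np.linalg.cond(K)  # 2-norm condition number via SVD
    print(f"n = {n:3d}  condition number ~ {cond_numbers[n]:.2e}")
```

As the points become denser, neighbouring rows of the matrix become nearly identical, so the smallest eigenvalues approach zero and the condition number grows rapidly. Any method that then solves a linear system with such a matrix can amplify small perturbations in the data.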

Author information

Authors and Affiliations

Purdue University, West Lafayette, IN, 47906, USA
Bo-Yu Chen & Hao Zhang

Corresponding author

Correspondence to Hao Zhang.

Editor information

Editors and Affiliations

Eastern Institute of Technology, Zhejiang, China
De-Shuang Huang
University of Wollongong, North Wollongong, NSW, Australia
Prashan Premaratne
Guangxi Academy of Sciences, Guangxi, China
Changan Yuan

About this paper

Cite this paper

Chen, BY., Zhang, H. (2024). Large-Scale Data Challenges: Instability in Statistical Learning. In: Huang, DS., Premaratne, P., Yuan, C. (eds) Applied Intelligence. ICAI 2023. Communications in Computer and Information Science, vol 2015. Springer, Singapore. https://doi.org/10.1007/978-981-97-0827-7_17

DOI: https://doi.org/10.1007/978-981-97-0827-7_17
Published: 01 March 2024
Publisher Name: Springer, Singapore
Print ISBN: 978-981-97-0826-0
Online ISBN: 978-981-97-0827-7
eBook Packages: Computer Science, Computer Science (R0)
