Abstract
In the Big Data era, with the ubiquity of geolocation sensors in particular, massive datasets exhibiting a possibly complex spatial dependence structure are becoming increasingly available. In this context, the standard probabilistic theory of statistical learning does not apply directly, and guarantees on the generalization capacity of predictive rules learned from such data remain to be established. We analyze here the simple Kriging task, the flagship problem in Geostatistics, from a statistical learning perspective, i.e., by carrying out a nonparametric finite-sample predictive analysis. Given \(d\ge 1\) values taken by a realization of a square integrable random field \(X=\{X_s\}_{s\in S}\), \(S\subset {\mathbb {R}}^2\), with unknown covariance structure, at sites \(s_1,\; \ldots ,\; s_d\) in S, the goal is to predict the unknown value it takes at any other location \(s\in S\) with minimum quadratic risk. The prediction rule is derived from a training spatial dataset: a single realization \(X'\) of X, independent of the one to be predicted and observed at \(n\ge 1\) locations \(\sigma _1,\; \ldots ,\; \sigma _n\) in S. Despite the connection of this minimization problem with kernel ridge regression, establishing the generalization capacity of empirical risk minimizers is far from straightforward, because the training data \(X'_{\sigma _1},\; \ldots ,\; X'_{\sigma _n}\) involved in the learning procedure are not independent and identically distributed. In this article, non-asymptotic bounds of order \(O_{{\mathbb {P}}}(1/\sqrt{n})\) are proved for the excess risk of a plug-in predictive rule mimicking the true minimizer in the case of isotropic stationary Gaussian processes, observed at locations forming a regular grid in the learning stage.
These theoretical results, as well as the role played by the technical conditions required to establish them, are illustrated by various numerical experiments, on simulated data and on real-world datasets, and hopefully pave the way for further developments in statistical learning based on spatial data.
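To make the prediction task concrete, the following minimal sketch computes the classical simple Kriging predictor for a centered field with a known isotropic covariance: with \(\Sigma\) the covariance matrix of \((X_{s_1},\; \ldots ,\; X_{s_d})\) and \(c(s)\) the vector of covariances between \(X_s\) and the observed values, the linear rule with minimum quadratic risk has weights \(\Sigma ^{-1}c(s)\). The exponential covariance model, its parameters, the zero-mean assumption and all function names are illustrative assumptions, not taken from the article; a plug-in rule of the kind discussed above would replace the known covariance by an estimate computed from the training realization \(X'\).

```python
import numpy as np

def exp_cov(h, sill=1.0, scale=0.3):
    """Isotropic exponential covariance sill * exp(-h / scale).

    Illustrative model and parameter values (assumptions, not from the article).
    """
    return sill * np.exp(-np.asarray(h, dtype=float) / scale)

def simple_kriging_weights(sites, target, cov=exp_cov):
    """Weights Sigma^{-1} c(s) of the simple Kriging predictor (centered field)."""
    sites = np.asarray(sites, dtype=float)
    target = np.asarray(target, dtype=float)
    # d x d covariance matrix between the observed sites s_1, ..., s_d.
    pairwise = np.linalg.norm(sites[:, None, :] - sites[None, :, :], axis=-1)
    sigma = cov(pairwise)
    # Covariances between the target location s and each observed site.
    c_s = cov(np.linalg.norm(sites - target, axis=-1))
    # Solve Sigma w = c(s) rather than forming the inverse explicitly.
    return np.linalg.solve(sigma, c_s)

def simple_kriging_predict(sites, values, target, cov=exp_cov):
    """Linear prediction of X_s from the values observed at the sites."""
    weights = simple_kriging_weights(sites, target, cov)
    return float(weights @ np.asarray(values, dtype=float))

# Toy configuration: four observed sites at the corners of the unit square.
sites = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
values = np.array([0.5, -0.2, 0.1, 0.8])
prediction = simple_kriging_predict(sites, values, target=[0.5, 0.5])
```

In this toy configuration the target is equidistant from the four sites, so the four weights coincide, and predicting at an observed site returns the observed value exactly.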
Acknowledgements
This work was supported by the Télécom Paris research chair on Data Science and Artificial Intelligence for Digitalized Industry and Services (DSAIDIS). The authors would like to thank Jean-Rémy Conti (LTCI, Télécom Paris, Institut Polytechnique de Paris), who laid the foundations of this work.
Ethics declarations
Conflict of interest
The authors declare no conflict of interest.
Code and data availability
The code and datasets used for this study are publicly available at https://github.com/EmiliaSiv/Simple-Kriging-Code.
About this article
Cite this article
Siviero, E., Chautru, E. & Clémençon, S. A statistical learning view of simple Kriging. TEST 33, 271–296 (2024). https://doi.org/10.1007/s11749-023-00891-w