Abstract
Geological modeling is essential for the characterization of natural phenomena and can be done in two steps: (1) clustering the data into consistent groups and (2) modeling the extent of these groups in space to define domains, honoring the labels defined in the previous step. The clustering step can be based on continuous multivariate data in space instead of relying on the geological logging provided. However, extracting coherent spatial multivariate information is challenging when the variables show complex relationships, such as nonlinear correlation, heteroscedastic behavior, or spatial trends. In this work, we propose a clustering method that is suitable for domaining when multiple continuous variables are available and robust enough to handle such complex relationships. The method considers, at each sample location, the local correlation matrix between variables, inferred from a local neighborhood. Changes in this local correlation across space can be used to characterize the domains. The space of correlation matrices is endowed with a manifold structure, and the matrices are clustered by adapting the K-means algorithm to this manifold using tools from Riemannian geometry. A real case study illustrates the methodology and demonstrates that the proposed clustering honors the spatial configuration of the data, delivering spatially connected clusters even when the attribute space shows complex nonlinear relationships.
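The workflow summarized in the abstract (local correlation matrices followed by K-means on a matrix manifold) can be sketched in a few lines. The sketch below is illustrative only and rests on several assumptions that do not come from the paper: the function names `local_correlations`, `spd_log`, and `log_euclidean_kmeans` are hypothetical, the neighborhood is a plain k-nearest-neighbor search, a small shrinkage toward the identity keeps each matrix strictly positive definite, and the log-Euclidean metric on symmetric positive-definite matrices stands in for the Riemannian structure on correlation matrices used by the authors.

```python
import numpy as np


def local_correlations(coords, values, k=20, shrink=0.05):
    """Correlation matrix of the k nearest samples around each location.

    Assumption: a k-nearest-neighbor window; the paper's neighborhood
    definition may differ.
    """
    n, p = values.shape
    # Pairwise squared distances between sample locations.
    d2 = ((coords[:, None, :] - coords[None, :, :]) ** 2).sum(axis=-1)
    nbrs = np.argsort(d2, axis=1)[:, :k]
    mats = np.empty((n, p, p))
    for i in range(n):
        c = np.corrcoef(values[nbrs[i]], rowvar=False)
        # Shrink toward the identity so the matrix is strictly positive
        # definite (an assumption made for numerical safety).
        mats[i] = (1.0 - shrink) * c + shrink * np.eye(p)
    return mats


def spd_log(mats):
    """Matrix logarithm of a stack of SPD matrices via eigendecomposition."""
    w, v = np.linalg.eigh(mats)
    # V diag(log w) V^T, applied to every matrix in the stack.
    return (v * np.log(w)[..., None, :]) @ np.swapaxes(v, -1, -2)


def log_euclidean_kmeans(mats, n_clusters, n_iter=50, seed=0):
    """Lloyd-style K-means under the log-Euclidean metric.

    Assumption: the log-Euclidean metric is a simpler substitute for the
    geometry on correlation matrices described in the paper.
    """
    rng = np.random.default_rng(seed)
    # In the log domain the distance is the Frobenius norm, so ordinary
    # K-means on the flattened logarithms is enough.
    logs = spd_log(mats).reshape(len(mats), -1)
    centers = logs[rng.choice(len(logs), n_clusters, replace=False)]
    labels = np.full(len(logs), -1)
    for _ in range(n_iter):
        dist = ((logs[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        new_labels = dist.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
        for j in range(n_clusters):
            if np.any(labels == j):
                # Mean of the logarithms: the log of the log-Euclidean mean.
                centers[j] = logs[labels == j].mean(axis=0)
    return labels
```

Calling `log_euclidean_kmeans(local_correlations(coords, values), n_clusters=2)` returns one label per sample. Note that the log-Euclidean mean of correlation matrices is positive definite but need not have a unit diagonal, which is one reason a geometry defined directly on correlation matrices may be preferable in practice.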
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs11004-023-10085-7/MediaObjects/11004_2023_10085_Fig1_HTML.png)
Taken from Riquelme and Ortiz (2023)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs11004-023-10085-7/MediaObjects/11004_2023_10085_Fig2_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs11004-023-10085-7/MediaObjects/11004_2023_10085_Fig3_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs11004-023-10085-7/MediaObjects/11004_2023_10085_Fig4_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs11004-023-10085-7/MediaObjects/11004_2023_10085_Fig5_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs11004-023-10085-7/MediaObjects/11004_2023_10085_Fig6_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs11004-023-10085-7/MediaObjects/11004_2023_10085_Fig7_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs11004-023-10085-7/MediaObjects/11004_2023_10085_Fig8_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs11004-023-10085-7/MediaObjects/11004_2023_10085_Fig9_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs11004-023-10085-7/MediaObjects/11004_2023_10085_Fig10_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs11004-023-10085-7/MediaObjects/11004_2023_10085_Fig11_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs11004-023-10085-7/MediaObjects/11004_2023_10085_Fig12_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs11004-023-10085-7/MediaObjects/11004_2023_10085_Fig13_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs11004-023-10085-7/MediaObjects/11004_2023_10085_Fig14_HTML.png)
References
Ayadi MA, Ben-Ameur H, Channouf N, Tran QK (2019) NORTA for portfolio credit risk. Ann Oper Res 281(1):99–119
Bourgault G (2014) Revisiting multi-Gaussian kriging with the Nataf transformation or the Bayes’ rule for the estimation of spatial distributions. Math Geosci 46(7):841–868
Bourgault G, Marcotte D (1991) Multivariable variogram and its application to the linear model of coregionalization. Math Geol 23(7):899–928
Bourgault G, Marcotte D, Legendre P (1992) The multivariate (co)variogram as a spatial weighting function in classification methods. Math Geol 24(5):463–478
Cario MC, Nelson BL (1997) Modeling and generating random vectors with arbitrary marginal distributions and correlation matrix. Technical report, Citeseer
Aggarwal CC, Reddy CK (2013) Data clustering: algorithms and applications
Chilès JP, Delfiner P (2012) Geostatistics: modeling spatial uncertainty. Wiley series in probability and statistics
Cowan EJ, Beatson RK, Ross HJ, Fright WR, McLennan TJ, Evans TR, Carr JC, Lane RG, Bright DV, Gillman AJ, Oshust PA, Titley M (2003) Practical implicit geological modelling. In: Fifth international mining geology conference, Australian Institute of Mining and Metallurgy Bendigo, Victoria, pp 17–19
David P (2019) A Riemannian quotient structure for correlation matrices with applications to data science. Ph.D. thesis, The Claremont Graduate University
David P, Gu W (2019) A Riemannian structure for correlation matrices. Oper Matrices 13:607–627
David P, Gu W (2022) Anomaly detection of time series correlations via a novel Lie group structure. Stat 11:e494
Deutsch CV, Journel AG (1998) GSLIB: geostatistical software library and user’s guide. Oxford University Press, Oxford
Dryden IL, Koloydenko A, Zhou D (2009) Non-Euclidean statistics for covariance matrices, with applications to diffusion tensor imaging. Ann Appl Stat 3(3):1102–1123
Faraj F, Ortiz JM (2021) A simple unsupervised classification workflow for defining geological domains using multivariate data. Min Metall Explor 38(3):1609–1623
Fouedjio F (2016) A hierarchical clustering method for multivariate geostatistical data. Spat Stat 18:333–351
Fouedjio F (2018) A fully non-stationary linear coregionalization model for multivariate random fields. Stoch Env Res Risk Assess 32(6):1699–1721
Gelfand AE, Kim HJ, Sirmans C, Banerjee S (2003) Spatial modeling with spatially varying coefficient processes. J Am Stat Assoc 98(462):387–396
Goh A, Vidal R (2008) Clustering and dimensionality reduction on Riemannian manifolds. In: 2008 IEEE conference on computer vision and pattern recognition. IEEE, pp 1–7
Grubišić I, Pietersz R (2007) Efficient rank reduction of correlation matrices. Linear Algebra Appl 422(2–3):629–653
Hiriart-Urruty JB, Malick J (2012) A fresh variational-analysis look at the positive semidefinite matrices world. J Optim Theory Appl 153(3):551–577
Janas M, Cuffaro ME, Janssen M (2022) Understanding quantum Raffles. Springer, Cham
Jayasumana S, Hartley R, Salzmann M, Li H, Harandi M (2015) Kernel methods on Riemannian manifolds with Gaussian RBF kernels. IEEE Trans Pattern Anal Mach Intell 37(12):2464–2477
Lajaunie C, Courrioux G, Manuel L (1997) Foliation fields and 3D cartography in geology: principles of a method based on potential interpolation. Math Geol 29(4):571–584
Lee JM (2013) Smooth manifolds. In: Introduction to smooth manifolds. Springer, pp 1–31
Li ST, Hammond JL (1975) Generation of pseudorandom numbers with specified univariate distributions and correlation coefficients. IEEE Trans Syst Man Cybern SMC-5(5):557–561
Lloyd S (1982) Least squares quantization in PCM. IEEE Trans Inf Theory 28(2):129–137
MacQueen J (1967) Some methods for classification and analysis of multivariate observations. In: Proceedings of the fifth Berkeley symposium on mathematical statistics and probability, Oakland, CA, USA, vol 1, pp 281–297
Matheron G (1971) Theory of regionalized variables and its applications. Ecole Natl Super des Mines 5:211
Moakher M (2005) A differential geometric approach to the geometric mean of symmetric positive-definite matrices. SIAM J Matrix Anal Appl 26(3):735–747
Moakher M (2006) On the averaging of symmetric positive-definite tensors. J Elast 82(3):273–296
Moakher M, Zéraï M (2011) The Riemannian geometry of the space of positive-definite matrices and its application to the regularization of positive-definite matrix-valued data. J Math Imaging Vis 40(2):171–187
Oliver M, Webster R (1989) A geostatistical basis for spatial weighting in multivariate classification. Math Geol 21(1):15–35
Pennec X, Fillard P, Ayache N (2006) A Riemannian framework for tensor computing. Int J Comput Vis 66(1):41–66
Pinto FC, Manchuk JG, Deutsch CV (2021) Decomposition of multivariate spatial data into latent factors. Comput Geosci 153:104773
Riquelme ÁI, Ortiz JM (2023) Multivariate simulation using a locally varying coregionalization model. Math Geosci (submitted)
Sepúlveda E, Dowd P, Xu C (2018) Fuzzy clustering with spatial correction and its application to geometallurgical domaining. Math Geosci 50(8):895–928
Solow AR (1986) Mapping by simple indicator kriging. Math Geol 18(3):335–352
Thanwerdas Y, Pennec X (2021) Geodesics and curvature of the quotient-affine metrics on full-rank correlation matrices. In: International conference on geometric science of information. Springer, pp 93–102
Thanwerdas Y, Pennec X (2022) Theoretically and computationally convenient geometries on full-rank correlation matrices. arXiv preprint arXiv:2201.06282
Wackernagel H (2013) Multivariate geostatistics: an introduction with applications. Springer, New York
Xiao Q (2014) Evaluating correlation coefficient for Nataf transformation. Probab Eng Mech 37:1–6
Xie W, Sun H, Li C (2015) Quantifying statistical uncertainty for dependent input models with factor structure. In: 2015 winter simulation conference (WSC). IEEE, pp 667–678
You K, Park HJ (2021) Re-visiting Riemannian geometry of symmetric positive definite matrices for the analysis of functional connectivity. Neuroimage 225:117464
Acknowledgements
The authors acknowledge the funding provided by the Natural Sciences and Engineering Research Council of Canada (NSERC), funding reference numbers RGPIN-2017-04200 and RGPAS-2017-507956, and by the International Association for Mathematical Geosciences (IAMG) student grant, funding reference number MG-2020-14. The authors are grateful to two anonymous reviewers for their valuable comments on an earlier version of this paper.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest that could influence the work reported in this paper.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Riquelme, Á.I., Ortiz, J.M. A Riemannian Tool for Clustering of Geo-Spatial Multivariate Data. Math Geosci 56, 121–141 (2024). https://doi.org/10.1007/s11004-023-10085-7
DOI: https://doi.org/10.1007/s11004-023-10085-7