Abstract
A challenge when working with multivariate data in a geostatistical context is that the data are rarely Gaussian. Multivariate distributions may include nonlinear features, clustering, long tails, functional boundaries, spikes, and heteroskedasticity. Multivariate transformations account for such features so that they are reproduced in geostatistical models. Projection pursuit, as developed for high-dimensional data exploration, can also be used to transform a multivariate distribution into a multivariate Gaussian distribution with an identity covariance matrix. Its application within a geostatistical modeling context is called the projection pursuit multivariate transform (PPMT). An approach to incorporate exhaustive secondary variables in the PPMT is introduced. With this approach, the PPMT can incorporate any number of secondary variables with any number of primary variables. To make the approach numerically practical, the transformation that takes place in the projections was implemented with a continuous probability estimator that relies on Bernstein polynomials. Stopping criteria were updated to incorporate a bootstrap t-test that compares data sampled from a multivariate Gaussian distribution with the data undergoing transformation.
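As an illustration of what a continuous probability estimator based on Bernstein polynomials can look like, the sketch below smooths the empirical distribution function of a one-dimensional sample (for example, data projected onto a single direction). It uses the standard Bernstein estimator of a distribution function and is not taken from the article: the function name `bernstein_cdf`, the `degree` parameter, and the rescaling of the sample to the unit interval by its minimum and maximum are assumptions made for this example.

```python
import math
import numpy as np


def bernstein_cdf(sample, degree=30):
    """Return a smooth CDF estimate of a 1-D sample (illustrative sketch).

    Assumption: the sample is rescaled to [0, 1] by its own min and max, and
    the empirical CDF evaluated at k/degree supplies the Bernstein coefficients.
    """
    sample = np.asarray(sample, dtype=float)
    lo, hi = sample.min(), sample.max()
    span = hi - lo
    if span == 0.0:
        raise ValueError("sample must contain at least two distinct values")

    # Sorted sample on the unit interval.
    u = np.sort((sample - lo) / span)

    # Empirical CDF evaluated at the knots k/degree, k = 0..degree.
    k = np.arange(degree + 1)
    ecdf_at_knots = np.searchsorted(u, k / degree, side="right") / u.size

    # Binomial coefficients C(degree, k) for the Bernstein basis.
    coeff = np.array([math.comb(degree, int(i)) for i in k], dtype=float)

    def cdf(x):
        # Map query points to [0, 1]; values outside the data range are clipped.
        t = np.clip((np.atleast_1d(x).astype(float) - lo) / span, 0.0, 1.0)
        # Bernstein basis: C(m, k) * t^k * (1 - t)^(m - k), shape (m + 1, n).
        basis = coeff[:, None] * t[None, :] ** k[:, None] \
            * (1.0 - t[None, :]) ** (degree - k[:, None])
        # Weighted sum of basis functions gives the smoothed CDF.
        return ecdf_at_knots @ basis

    return cdf
```

Because the coefficients are non-decreasing, the resulting polynomial is a non-decreasing, continuous function of `x`, so it can be evaluated cheaply at many points instead of interpolating a step function. A larger `degree` follows the empirical distribution more closely at the cost of less smoothing.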
Cite this article
Manchuk, J.G., Barnett, R.M. & Deutsch, C.V. Reproduction of secondary data in projection pursuit transformation. Stoch Environ Res Risk Assess 31, 2585–2605 (2017). https://doi.org/10.1007/s00477-016-1363-y
DOI: https://doi.org/10.1007/s00477-016-1363-y