Abstract
Daily weather generators are used in many applications and risk analyses. The present paper explores the potential of neural network architectures for designing daily weather generator models. Focusing this first paper on precipitation, we design a collection of neural networks (multi-layer perceptrons in the present case), trained to approximate the empirical cumulative distribution function (CDF) of the occurrence of wet and dry spells and of the precipitation amounts. This approach helps correct some of the biases of the usual two-step weather generator models. Compared to a rainfall-occurrence Markov model, NNGEN-P represents fairly well the mean and standard deviation of the number of wet days per month, and it significantly improves the simulation of the longest dry and wet periods. We then compared NNGEN-P to three parametric distribution functions usually applied to fit rainfall cumulative distribution functions (Gamma, Weibull and double exponential). A data set of 19 Argentine stations was used, and data from stations in the United States, Europe and the Tropics were included to confirm the results. One of the advantages of NNGEN-P is that it is non-parametric. Unlike parametric functions, which adapt to certain types of climate regimes, NNGEN-P is fully adaptive to the observed cumulative distribution functions, which on some occasions may present complex shapes. Ongoing work will soon produce an extended version of NNGEN for temperature and radiation.
References
Adamowski K, Smith AF (1972) Stochastic generation of rainfall. J Hydraul Eng ASCE 98:1935–1945
Akaike H (1974) A new look at the statistical model identification. IEEE Trans Automatic Control 19:716–723
Allen DM, Haan CT (1975) Stochastic simulation of daily rainfall. Res. Rep. No. 82, Water Resource Inst., University Kentucky, Lexington, Kentucky, USA
Buishand TA (1982) Some methods for testing the homogeneity of rainfall records. J Hydrol
Buishand TA (1978) Some remarks on the use of daily rainfall models. J Hydrol 36:295–308
Caballero R, Jewson S, Brix A (2001) Long memory in surface air temperature: detection, modeling and application to weather derivative calculation. Clim Res
Charles SP, Bates BC, Hughes JP (1999) A spatiotemporal model for downscaling precipitation occurrence and amounts. J Geophys Res 104:31657–31669
Foufoula-Georgiou E, Lettenmaier DP (1987) A Markov renewal model for rainfall occurrences. Water Resour Res 23:875–884
Frich P, Alexander LV, Della-Marta P, Gleason B, Haylock M, Klein Tank AMG, Peterson T (2002) Observed coherent changes in climatic extremes during the second half of the twentieth century. Clim Res 19:193–212
Geng S, Penning De Vries FWT, Supit I (1986) A simple method for generating daily rainfall data. Agric For Meteorol 36:363–376
Hansen JW, Ines AVM (2005) Stochastic disaggregation of monthly rainfall data for crop simulation studies. Agric For Meteorol
Hay LE, McCabe GJ Jr, Wolock DM, Ayres MA (1991) Simulation of precipitation by weather type analysis. Water Resour Res 27:493–501
Hutchinson MF (1986) Methods of generation of weather sequences. In: Bunting AH (ed) Agric Env CAB. International, Wallingford, pp 149–157
Hutchinson MF (1995) Stochastic space-time weather models from ground-based data. Agric For Meteorol 73:237–265
Jewson S (2004) Weather derivative pricing and the potential accuracy of daily temperature modeling. http://ssrn.com/abstract=535122
Katz RW (1977) Precipitation as a chain-dependent process. J Appl Meteor 16:671–676
Lall U, Rajagopalan B, Tarboton DG (1996) A non-parametric wet/dry spell model for resampling daily precipitation. J Clim 11:591–601
MacKay DJC (1992) Bayesian interpolation. Neural Comput 4:415–447
Nabney IT (2002) Netlab: algorithms for pattern recognition, advances in pattern recognition. Springer, Berlin Heidelberg New York, pp 420
Racsko P, Szeidl L, Semenov M (1991) A serial approach to local statistic weather models. Ecol Modeling 57:27–41
Rajagopalan B, Lall U (1999) A k-nearest-neighbor simulator for daily precipitation and other weather variables. Water Resour Res 35:3089–3101
Rajagopalan B, Lall U, Tarboton DG (1996) A nonhomogeneous Markov model for daily precipitation simulation. J Hydrol Eng-ASCE 1:33–40
Richardson CW (1981) Stochastic simulation of daily precipitation, temperature, and solar radiation. Water Resour Res 17:182–190
Roldan J, Woolhiser DA (1982) Stochastic daily precipitation models. 1. A comparison of occurrence processes. Water Resour Res 18:1451–1459
Schwarz G (1978) Estimating the dimension of a model. Ann Statistics 6:461–464
Selker JS, Haith DA (1990) Development and testing of single-parameter precipitation distributions. Water Resour Res
Semenov MA, Barrow EM (1997) Use of stochastic weather generator in the development of climate change scenarios. Clim Change 35:397–414
Sharpley AR, Williams JR (1990) EPIC an Erosion/Productivity Impact Calculator: 2. User Manual, U.S. Department of Agriculture, ARS Tech Bull No 1768
Siriwarden et al. 2002
Srikanthan R, McMahon TA (1983) Stochastic simulation of daily rainfall for Australian stations. Trans ASAE 26:754–759, 766
Srikanthan R, McMahon TA (1985) Stochastic generation of rainfall and evaporation data. Australian Water Resources Council, Tech. Paper No. 84, AGPS, Canberra, 301pp
Srikanthan R, McMahon TA (2001) Stochastic generation of annual, monthly and daily climate data: a review. Hydrol Earth Syst Sci 5:653–670
Stern RD, Coe R (1984) A model fitting analysis of daily rainfall data. J Roy Stat Soc A147:1–34
Wilks DS (1989) Conditioning stochastic daily precipitation models on total monthly precipitation. Water Resour Res 25:1429–1439
Wilks DS (1992) Adapting stochastic weather generation algorithms for climate change studies. Clim Change 22:67–84
Wilks DS (1998) Multisite generalization of a daily stochastic precipitation generation model. J Hydrol 210:178–191
Wilks DS (1999a) Simultaneous stochastic simulation of daily precipitation, temperature and solar radiation at multiple sites in complex terrain. Agric For Meteor 96:85–101
Wilks DS (1999b) Interannual variability and extreme-value characteristics of several stochastic daily precipitation models. Agric For Meteor 93:153–169
Wilson LL, Lettenmaier DP, Skyllingstad E (1992) A hierarchic stochastic model of large-scale atmospheric circulation patterns and multiple station daily precipitation. J Geophys Res D3:2791–2809
Woolhiser DA, Roldan J (1982) Stochastic daily precipitation models. 2. A comparison of distribution amounts. Water Resour Res 18:1461–1468
Woolhiser DA, Pegram GS (1979) Maximum likelihood estimation of Fourier coefficients to describe seasonal variation of parameters in stochastic daily precipitation models. J Appl Meteor 18:34–42
Woolhiser DA, Roldan J (1986) Seasonal and regional variability of parameters for stochastic daily precipitation models. Water Resour Res 22:965–978
Woolhiser DA (1992) Modeling daily precipitation: progress and problems. In: Walden AT, Guttorp P (eds) Statistics in the Environmental and Earth Sciences. John Wiley, New York, pp 71–89
Acknowledgments
We wish to thank the Institut de Recherche pour le Développement (IRD), the Institut Pierre-Simon Laplace (IPSL) and the Centre National de la Recherche Scientifique (CNRS; Programme ATIP-2002) for their financial support, which was crucial to the development of the authors’ collaboration. We are also grateful to the European Commission for funding the CLARIS Project (Project 001454), in whose framework part of the present study was undertaken. We also thank the University of Buenos Aires and the Department of Atmosphere and Ocean Sciences for welcoming Jean-Philippe Boulanger. Special thanks are addressed to the Ecole Normale Supérieure de Lyon and to Elie Desmond, with whom this work was initiated. Fruitful discussions with Sylvie Thiria and Carlos Mejia were helpful during the training phase of Jean-Philippe Boulanger in the use of neural networks. Finally, we are thankful to Ian Nabney and Chris Bishop for freely sharing the Netlab software source codes. Special thanks are addressed to Santiago Meira from INTA Pergamino for providing the Pergamino daily rainfall time series.
Appendices
Appendix A: the multi-layer perceptron
The multi-layer perceptron (MLP) is probably the most widely used architecture for practical applications of neural networks (Nabney 2002). From a computational point of view, the MLP can be described as a set of functions applied to different elements (neurons) using relatively simple arithmetic formulae, together with a series of methods to optimize these functions based on a set of data. In the present study, we focus only on a two-layer network architecture (Fig. 14). Its simplest element is called a neuron and is connected to all the neurons in the upper layer (either the hidden layer, if the neuron belongs to the input layer, or the output layer, if the neuron belongs to the hidden layer). Each neuron has a value, and each connection is associated with a weight.
As shown in Fig. 14, in the MLP case we consider, the neurons are organized in layers: an input layer (the values of all input neurons except the bias are specified by the user), a hidden layer and an output layer. Each neuron in one layer is connected to all the neurons in the next layer. More specifically, in the present case the MLP architecture has one input neuron (I), H neurons in the hidden layer (a number to be estimated by the method) and one output neuron (O). The first layer of the network forms H linear combinations of the input to give the following set of intermediate activation variables: \(h^{(1)}_{j} = w^{(1)}_{j} I + b^{(1)}_{j}, \quad j = 1,\dots,H,\) where \(b^{(1)}_{j}\) corresponds to the bias of the input layer. Each activation variable is then transformed by a non-linear activation function, which in most cases (including ours) is the hyperbolic tangent: \(v_{j} = \tanh(h^{(1)}_{j}), \quad j = 1,\dots,H.\) Finally, the \(v_{j}\) are combined to give the activation of the output neuron: \(O^{(2)} = \sum_{j=1}^{H} w^{(2)}_{j} v_{j} + b^{(2)},\) where \(b^{(2)}\) corresponds to the bias of the hidden layer.
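The forward pass described above can be sketched in a few lines; the function name and the NumPy formulation are ours, not part of the paper:

```python
import numpy as np

def mlp_forward(x, w1, b1, w2, b2):
    """Forward pass of a 1-H-1 two-layer perceptron with tanh hidden units.

    x  : scalar input I
    w1 : (H,) first-layer weights, b1 : (H,) first-layer biases
    w2 : (H,) second-layer weights, b2 : scalar output bias
    """
    h = w1 * x + b1            # hidden activations h_j = w_j^(1) I + b_j^(1)
    v = np.tanh(h)             # non-linear transform v_j = tanh(h_j)
    return np.dot(w2, v) + b2  # output O = sum_j w_j^(2) v_j + b^(2)
```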
The weights and biases are initialized by random selection from a zero mean, unit variance isotropic Gaussian where the variance is scaled by the fan-in of the hidden or output units as appropriate. During the training phase, the neural network compares its outputs to the correct answers (a set of observations used as output vector), and it adjusts its weights in order to minimize an error function. In our case, the weights and biases are optimized by back-propagation using the scaled conjugate gradient method.
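The training loop can be sketched as follows. This is a minimal illustration using plain batch gradient descent on a mean-squared error; the paper itself optimizes with Netlab's scaled conjugate gradient method, and the learning rate and epoch count here are illustrative choices of ours:

```python
import numpy as np

def train_mlp(x, y, H=5, lr=0.1, epochs=5000, seed=0):
    """Fit a 1-H-1 tanh MLP to (x, y) pairs by batch gradient descent on
    E = 0.5 * mean((out - y)**2).  (Sketch only: the paper uses Netlab's
    scaled conjugate gradient optimizer instead.)"""
    rng = np.random.default_rng(seed)
    # zero-mean Gaussian initialization; output-layer variance scaled by fan-in
    w1 = rng.normal(size=H); b1 = rng.normal(size=H)
    w2 = rng.normal(scale=1 / np.sqrt(H), size=H); b2 = 0.0
    for _ in range(epochs):
        v = np.tanh(np.outer(x, w1) + b1)   # (N, H) hidden-layer outputs
        out = v @ w2 + b2                   # (N,) network outputs
        g = (out - y) / len(x)              # dE/d(out_n) = err_n / N
        dh = np.outer(g, w2) * (1 - v**2)   # back-propagate through tanh
        w2 -= lr * (v.T @ g); b2 -= lr * g.sum()
        w1 -= lr * (dh.T @ x); b1 -= lr * dh.sum(axis=0)
    return w1, b1, w2, b2
```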
Such an architecture is capable of universal approximation: given sufficiently many hidden units and a sufficiently large amount of data, the MLP can model any smooth function. The interested reader can find an exhaustive description of the MLP network, its architecture, initialization and training methods in Nabney (2002). Our study made use of the Netlab software (Nabney 2002).
Appendix B: Bayesian approach to optimize the MLP architecture
When fitting a model to data, it is usual to consider the model as a function y = f(x, w) + ε, where y are the observations, x the inputs, f the model, w the parameters to optimize (the weights in our case) and ε the remaining error (model–data misfit). The more complex the model (i.e. the larger the number of parameters), the smaller the error, with the usual drawback of overfitting the data by fitting both the “true” signal and its noise. Such an overfit is usually detected through very poor performance of the model on unseen data (data not included in the training phase). Therefore, optimizing the model parameters by minimizing the residual ε alone may actually lead to poor model performance. One way to avoid this problem is to also consider the errors on the model parameters, and a Bayesian approach is very helpful in dealing with this difficulty. Although two kinds of Bayesian approaches have been demonstrated to be effective (the Laplace approximation and Monte Carlo techniques), in the following we only consider the first one. Nabney (2002) offers an exhaustive discussion of this subject; for the reader to understand our approach, we believe it is important to present a summary.
First of all, following the same notation as Nabney (2002), let us consider two models M1 and M2 (in our case two MLPs which differ only in the number of neurons in the hidden layer, with M2 having more neurons than M1). Using Bayes' theorem, the posterior probability of each model is: \(p(M_{i}|D) = \frac{p(D|M_{i})\,p(M_{i})}{p(D)}.\) Without any a priori reason to prefer either model, the models should be compared through the probability \(p(D|M_{i})\), which can be written (MacKay 1992) as \(p(D|M_{i}) = \int p(D|w,M_{i})\,p(w|M_{i})\,\mathrm{d}w.\) Considering that for either model there exists a best choice of parameters \(\hat{w}_{i}\) around which the probability is strongly peaked, the previous equation can be simplified to \(p(D|M_{i}) \approx p(D|\hat{w}_{i},M_{i})\,p(\hat{w}_{i}|M_{i})\,\Delta \hat{w}^{\text{posterior}}_{i},\) where the last term represents the volume (in parameter space) over which the posterior probability is appreciable. Assuming that the prior \(p(\hat{w}_{i}|M_{i})\) is uniform over a certain volume \(\Delta \hat{w}^{\text{prior}}_{i}\) of parameter space, we can rewrite the previous equation as \(p(D|M_{i}) \approx p(D|\hat{w}_{i},M_{i})\,\bigl(\Delta \hat{w}^{\text{posterior}}_{i}/\Delta \hat{w}^{\text{prior}}_{i}\bigr).\) The new equation is the product of two terms that evolve in opposite directions as the complexity of the model increases. The first term on the right-hand side increases (i.e. the model–data misfit decreases) as the model complexity increases.
The second term, often called the Occam factor, is always smaller than 1 and decreases roughly exponentially with the number of parameters (Nabney 2002), which penalizes the most complex models. In conclusion, taking the weight uncertainty into account should reduce the overfitting problem. We now explain how this can be done.
For a given number of units in the hidden layer, an optimum set of weights and biases can be calculated by maximum likelihood: the optimum set of parameters (weights and biases) is the one that is most likely to have generated the observations. A Bayesian approach (or quasi-Bayesian approach, given the difficulties that the non-linear nature of neural networks causes for exact Bayesian inference) is valuable for inferring the two classes of errors: model–data misfit and parameter uncertainty.
According to Bayes' theorem, for a given MLP architecture, the density of the parameters (noted w) for a given dataset (D) is given by: \(p(w|D) = \frac{p(D|w)\,p(w)}{p(D)}.\)
In a first step, let us consider only the terms depending on the weights. Up to an additive constant \(\ln p(D)\), the negative log likelihood is given by \(E = -\ln p(w|D) = -\ln p(D|w) - \ln p(w).\)
The likelihood p(D|w) represents the model–data fit error, which can be modeled by a Gaussian function of the form: \(p(D|w) = \frac{1}{Z_{D}(\beta )}\exp (-\beta E_{D}),\) where \(E_{D}\) is the sum-of-squares model–data error and β represents the inverse variance of the model–data fit error.
The requirement for small weights (i.e. avoiding overfitting) suggests a Gaussian distribution for the weights of the form: \(p(w) = \frac{1}{Z_{W}(\alpha )}\exp (-\alpha E_{W}), \quad E_{W} = \frac{1}{2}\sum\nolimits_{i} w_{i}^{2},\)
where α represents the inverse variance of the weight distribution. α and β are known as hyperparameters. Therefore, to compare different MLP architectures, we first need to optimize the MLP weights, biases and hyperparameters for each architecture. Such an optimization can be reached using the evidence procedure, an iterative algorithm; here again, we refer the reader to Nabney (2002). Briefly, if we consider a model to be determined (for a given architecture) by its two hyperparameters, we can write (as previously) that two models may be compared through their respective maximized evidence p(D|α,β), the log of which can be written as: \(\ln p(D|\alpha ,\beta ) = -\alpha E_{W} - \beta E_{D} - \tfrac{1}{2}\ln |A| + \tfrac{W}{2}\ln \alpha + \tfrac{N}{2}\ln \beta - \tfrac{N}{2}\ln 2\pi ,\) where A is the Hessian matrix of the total error function (a function of α and β), W the number of weights and biases, and N the number of data points.
Based on the previous equation, the evidence procedure is used to optimize the weights and hyperparameters for any given architecture, and the optimized model log evidence is calculated. We then compare the computed log evidence across architectures and finally choose the smallest architecture giving the minimum negative log evidence.
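The model comparison above can be illustrated on a model that is linear in its parameters, where the Gaussian (Laplace) approximation used for MLPs becomes exact. The polynomial basis and the values of α and β below are illustrative choices of ours, not from the paper:

```python
import numpy as np

def log_evidence(Phi, y, alpha, beta):
    """ln p(D|alpha, beta) for a model linear in its W parameters
    (design matrix Phi), using the Gaussian-approximation formula
    -alpha*E_W - beta*E_D - 0.5*ln|A| + (W/2)ln(alpha) + (N/2)ln(beta)
    - (N/2)ln(2*pi), which is exact in this linear case."""
    N, W = Phi.shape
    A = beta * Phi.T @ Phi + alpha * np.eye(W)   # Hessian of the total error
    m = beta * np.linalg.solve(A, Phi.T @ y)     # most probable weights
    E_D = 0.5 * np.sum((y - Phi @ m) ** 2)       # model-data misfit term
    E_W = 0.5 * m @ m                            # weight-penalty term
    return (-alpha * E_W - beta * E_D - 0.5 * np.linalg.slogdet(A)[1]
            + 0.5 * W * np.log(alpha) + 0.5 * N * np.log(beta)
            - 0.5 * N * np.log(2 * np.pi))

# Compare polynomial "architectures" of increasing complexity on noisy data:
rng = np.random.default_rng(1)
x = np.linspace(-1, 1, 40)
y = np.sin(np.pi * x) + 0.1 * rng.normal(size=x.size)
for degree in (1, 3, 9):
    Phi = np.vander(x, degree + 1, increasing=True)
    print(degree, log_evidence(Phi, y, alpha=0.1, beta=100.0))
```

As in the appendix, the misfit term rewards complex models while the Occam terms penalize them, so the log evidence peaks at an intermediate complexity rather than growing monotonically with the number of parameters.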
Applying this Bayesian approach, the optimal number of neurons in the hidden layer was found to vary with the complexity of the CDF shapes to be fitted. We found this number to range from 3 to 14 neurons, with the larger numbers mainly required when fitting the dry-spell CDF (the longest wet spells do not exceed 15 days and are often in the range of 5–6 days, while dry spells can last a few months).
Cite this article
Boulanger, JP., Martinez, F., Penalba, O. et al. Neural network based daily precipitation generator (NNGEN-P). Clim Dyn 28, 307–324 (2007). https://doi.org/10.1007/s00382-006-0184-y