
Neural network based daily precipitation generator (NNGEN-P)


Abstract

Daily weather generators are used in many applications and risk analyses. This paper explores the potential of neural network architectures for designing daily weather generator models. Focusing this first paper on precipitation, we design a collection of neural networks (multi-layer perceptrons in the present case), trained to approximate the empirical cumulative distribution functions (CDFs) of wet- and dry-spell occurrence and of precipitation amounts. This approach helps correct some of the biases of the usual two-step weather generator models. Compared with a Markov rainfall occurrence model, NNGEN-P represents the mean and standard deviation of the number of wet days per month fairly well, and it significantly improves the simulation of the longest dry and wet periods. We then compared NNGEN-P with three parametric distribution functions usually applied to fit rainfall cumulative distribution functions (Gamma, Weibull and double exponential). A data set of 19 Argentine stations was used, and data from stations in the United States, Europe and the Tropics were included to confirm the results. One of the advantages of NNGEN-P is that it is non-parametric. Unlike parametric functions, which suit only certain types of climate regimes, NNGEN-P adapts fully to the observed cumulative distribution functions, which may on occasion present complex shapes. Ongoing work will soon produce an extension of NNGEN to temperature and radiation.


References

  • Adamowsky K, Smith AF (1972) Stochastic generation of rainfall. J Hydraul Eng ASCE 98:1935–1945

  • Akaike H (1974) A new look at the statistical model identification. IEEE Trans Autom Control 19:716–723

  • Allen DM, Haan CT (1975) Stochastic simulation of daily rainfall. Res Rep No 82, Water Resources Institute, University of Kentucky, Lexington, Kentucky, USA

  • Buishand TA (1978) Some remarks on the use of daily rainfall models. J Hydrol 36:295–308

  • Buishand TA (1982) Some methods for testing the homogeneity of rainfall records. J Hydrol

  • Caballero R, Jewson S, Brix A (2001) Long memory in surface air temperature: detection, modeling and application to weather derivative calculation. Clim Res

  • Charles SP, Bates BC, Hughes JP (1999) A spatiotemporal model for downscaling precipitation occurrence and amounts. J Geophys Res 104:31657–31669

  • Foufoula-Georgiou E, Lettenmaier DP (1987) A Markov renewal model for rainfall occurrences. Water Resour Res 23:875–884

  • Frich P, Alexander LV, Della-Marta P, Gleason B, Haylock M, Klein Tank AMG, Peterson T (2002) Observed coherent changes in climatic extremes during the second half of the twentieth century. Clim Res 19:193–212

  • Geng S, Penning de Vries FWT, Supit I (1986) A simple method for generating daily rainfall data. Agric For Meteorol 36:363–376

  • Hansen JW, Ines AVM (2005) Stochastic disaggregation of monthly rainfall data for crop simulation studies. Agric For Meteorol

  • Hay LE, McCabe GJ Jr, Wolock DM, Ayres MA (1991) Simulation of precipitation by weather type analysis. Water Resour Res 27:493–501

  • Hutchinson MF (1986) Methods of generation of weather sequences. In: Bunting AH (ed) Agricultural environments. CAB International, Wallingford, pp 149–157

  • Hutchinson MF (1995) Stochastic space-time weather models from ground-based data. Agric For Meteorol 73:237–265

  • Jewson S (2004) Weather derivative pricing and the potential accuracy of daily temperature modeling. http://ssrn.com/abstract=535122

  • Katz RW (1977) Precipitation as a chain-dependent process. J Appl Meteorol 16:671–676

  • Lall U, Rajagopalan B, Tarboton DG (1996) A non-parametric wet/dry spell model for resampling daily precipitation. J Clim 11:591–601

  • MacKay DJC (1992) Bayesian interpolation. Neural Comput 4:415–447

  • Nabney IT (2002) Netlab: algorithms for pattern recognition. Advances in pattern recognition. Springer, Berlin Heidelberg New York, 420 pp

  • Racsko P, Szeidl L, Semenov M (1991) A serial approach to local stochastic weather models. Ecol Model 57:27–41

  • Rajagopalan B, Lall U (1999) A k-nearest-neighbor simulator for daily precipitation and other weather variables. Water Resour Res 35:3089–3101

  • Rajagopalan B, Lall U, Tarboton DG (1996) A nonhomogeneous Markov model for daily precipitation simulation. J Hydrol Eng ASCE 1:33–40

  • Richardson CW (1981) Stochastic simulation of daily precipitation, temperature, and solar radiation. Water Resour Res 17:182–190

  • Roldan J, Woolhiser DA (1982) Stochastic daily precipitation models. 1. A comparison of occurrence processes. Water Resour Res 18:1451–1459

  • Schwarz G (1978) Estimating the dimension of a model. Ann Stat 6:641–644

  • Selker JS, Haith DA (1990) Development and testing of single-parameter precipitation distributions. Water Resour Res

  • Semenov MA, Barrow EM (1997) Use of a stochastic weather generator in the development of climate change scenarios. Clim Change 35:397–414

  • Sharpley AR, Williams JR (1990) EPIC, an Erosion/Productivity Impact Calculator: 2. User manual. US Department of Agriculture, ARS Tech Bull No 1768

  • Siriwarden et al. (2002)

  • Srikanthan R, McMahon TA (1983) Stochastic simulation of daily rainfall for Australian stations. Trans ASAE 26:754–759, 766

  • Srikanthan R, McMahon TA (1985) Stochastic generation of rainfall and evaporation data. Australian Water Resources Council, Tech Paper No 84, AGPS, Canberra, 301 pp

  • Srikanthan R, McMahon TA (2001) Stochastic generation of annual, monthly and daily climate data: a review. Hydrol Earth Syst Sci 5:653–670

  • Stern RD, Coe R (1984) A model fitting analysis of daily rainfall data. J R Stat Soc A 147:1–34

  • Wilks DS (1989) Conditioning stochastic daily precipitation models on total monthly precipitation. Water Resour Res 25:1429–1439

  • Wilks DS (1992) Adapting stochastic weather generation algorithms for climate change studies. Clim Change 22:67–84

  • Wilks DS (1998) Multisite generalization of a daily stochastic precipitation generation model. J Hydrol 210:178–191

  • Wilks DS (1999a) Simultaneous stochastic simulation of daily precipitation, temperature and solar radiation at multiple sites in complex terrain. Agric For Meteorol 96:85–101

  • Wilks DS (1999b) Interannual variability and extreme-value characteristics of several stochastic daily precipitation models. Agric For Meteorol 93:153–169

  • Wilson LL, Lettenmaier DP, Skyllingstad E (1992) A hierarchic stochastic model of large-scale atmospheric circulation patterns and multiple station daily precipitation. J Geophys Res D3:2791–2809

  • Woolhiser DA, Roldan J (1982) Stochastic daily precipitation models. 2. A comparison of distribution amounts. Water Resour Res 18:1461–1468

  • Woolhiser DA, Pegram GS (1979) Maximum likelihood estimation of Fourier coefficients to describe seasonal variation of parameters in stochastic daily precipitation models. J Appl Meteorol 18:34–42

  • Woolhiser DA, Roldan J (1986) Seasonal and regional variability of parameters for stochastic daily precipitation models. Water Resour Res 22:965–978

  • Woolhiser DA (1992) Modeling daily precipitation: progress and problems. In: Walden AT, Guttorp P (eds) Statistics in the environmental and earth sciences. Wiley, New York, pp 71–89


Acknowledgments

We wish to thank the Institut de Recherche pour le Développement (IRD), the Institut Pierre-Simon Laplace (IPSL) and the Centre National de la Recherche Scientifique (CNRS; Programme ATIP-2002) for their financial support, which was crucial to the development of the authors’ collaboration. We are also grateful to the European Commission for funding the CLARIS Project (Project 001454), in whose framework part of the present study was undertaken. We also wish to thank the University of Buenos Aires and the Department of Atmosphere and Ocean Sciences for welcoming Jean-Philippe Boulanger. Special thanks are addressed to the Ecole Normale Supérieure de Lyon and to Elie Desmond, with whom this work was initiated. Fruitful discussions with Sylvie Thiria and Carlos Mejia were helpful during the training phase of Jean-Philippe Boulanger in the use of neural networks. Finally, we are thankful to Ian Nabney and Chris Bishop for freely sharing the Netlab software source code. Special thanks are addressed to Santiago Meira from INTA Pergamino for providing the Pergamino daily rainfall time series.

Author information


Corresponding author

Correspondence to Jean-Philippe Boulanger.

Appendices

Appendix A: the multi-layer perceptron

The multi-layer perceptron (MLP) is probably the most widely used architecture for practical applications of neural networks (Nabney 2002). From a computational point of view, the MLP can be described as a set of functions applied to different elements (neurons) using relatively simple arithmetic formulae, together with a series of methods to optimize these functions on the basis of a data set. In the present study, we focus only on a two-layer network architecture (Fig. 14). Its simplest element is called a neuron and is connected to all the neurons in the upper layer (the hidden layer if the neuron belongs to the input layer, or the output layer if the neuron belongs to the hidden layer). Each neuron has a value, and each connection is associated with a weight.

Fig. 14

Schematic representation of a two-layer MLP as used in this study. In the input layer, one neuron represents the probability value of the CDF under study (wet spell, dry spell, rainfall amount). The number of neurons in the hidden layer is optimized by the method. In the output layer, one neuron represents the amplitude of the CDF associated with the input probability (length of the wet spell, length of the dry spell or amount of rainfall). The units or neurons called biases are units not connected to a lower layer and whose value is always equal to –1. They actually represent the threshold value of the next upper layer

As shown in Fig. 14, in the MLP configuration we consider, the neurons are organized in layers: an input layer (the values of all the input neurons except the bias are specified by the user), a hidden layer and an output layer. Each neuron in one layer is connected to all the neurons in the next layer. More specifically, in the present case the MLP architecture has one input neuron (I), H neurons in the hidden layer (a number to be estimated by the method) and one output neuron (O). The first layer of the network forms H linear combinations of the input to give a set of intermediate activation variables \(h_{j}^{(1)} = w_{j}^{(1)} I + b_{j}^{(1)},\; j = 1,\ldots,H\), where \(b_{j}^{(1)}\) corresponds to the bias of the input layer. Each activation variable is then transformed by a non-linear activation function, which in most cases (including ours) is the hyperbolic tangent: \(v_{j} = \tanh(h_{j}^{(1)}),\; j = 1,\ldots,H\). Finally, the \(v_{j}\) are combined to give the activation of the output neuron: \(O = \sum_{j=1}^{H} w_{j}^{(2)} v_{j} + b^{(2)}\), where \(b^{(2)}\) corresponds to the bias of the hidden layer.
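For concreteness, a minimal NumPy sketch of this forward pass is given below (one input neuron, H tanh hidden units, one linear output). The function and variable names are illustrative and are not taken from the paper or from Netlab.

```python
import numpy as np

def mlp_forward(p, w1, b1, w2, b2):
    """Two-layer MLP with one input and one output neuron, as in Fig. 14.

    p  : scalar input (a probability value of the CDF under study)
    w1 : (H,) weights from the input neuron to the H hidden units
    b1 : (H,) biases of the hidden units
    w2 : (H,) weights from the hidden units to the output neuron
    b2 : scalar bias of the output neuron
    """
    h = w1 * p + b1            # linear activations of the hidden layer
    v = np.tanh(h)             # non-linear (tanh) transformation
    return np.dot(w2, v) + b2  # output: spell length or rainfall amount
```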

The weights and biases are initialized by random selection from a zero mean, unit variance isotropic Gaussian where the variance is scaled by the fan-in of the hidden or output units as appropriate. During the training phase, the neural network compares its outputs to the correct answers (a set of observations used as output vector), and it adjusts its weights in order to minimize an error function. In our case, the weights and biases are optimized by back-propagation using the scaled conjugate gradient method.
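As a sketch of these two steps under stated assumptions: the snippet below draws the initial parameters from zero-mean Gaussians whose standard deviation shrinks with the fan-in of each layer (whether the bias is counted in the fan-in is an assumption here), and then minimizes a sum-of-squares error. SciPy's conjugate-gradient minimizer is used only as a stand-in for Netlab's scaled conjugate gradient algorithm, and the target CDF is a synthetic placeholder.

```python
import numpy as np
from scipy.optimize import minimize

def init_params(H, rng=np.random.default_rng(0)):
    # Fan-in of a hidden unit: the input neuron plus the bias (2);
    # fan-in of the output unit: H hidden units plus the bias (H + 1).
    w1 = rng.normal(0.0, 1.0 / np.sqrt(2.0), H)
    b1 = rng.normal(0.0, 1.0 / np.sqrt(2.0), H)
    w2 = rng.normal(0.0, 1.0 / np.sqrt(H + 1.0), H)
    b2 = rng.normal(0.0, 1.0 / np.sqrt(H + 1.0))
    return np.concatenate([w1, b1, w2, [b2]])

def sum_of_squares(theta, H, x, y):
    w1, b1, w2, b2 = theta[:H], theta[H:2*H], theta[2*H:3*H], theta[-1]
    out = np.tanh(np.outer(x, w1) + b1) @ w2 + b2   # forward pass for all x
    return 0.5 * np.sum((out - y) ** 2)             # model-data misfit

# x: CDF probabilities, y: corresponding amplitudes (illustrative placeholders)
x = np.linspace(0.01, 0.99, 99)
y = -np.log(1.0 - x)                  # e.g. the quantile function of an exponential law
theta0 = init_params(H=5)
fit = minimize(sum_of_squares, theta0, args=(5, x, y), method="CG")
```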

Such an architecture is capable of universal approximation: given a sufficiently large number of hidden units, the MLP can model any smooth function. Finally, the interested reader can find an exhaustive description of the MLP network, its architecture, initialization and training methods in Nabney (2002). Our study made use of the Netlab software (Nabney 2002).

Appendix B: Bayesian approach to optimize the MLP architecture

When fitting a model to data, it is usual to write the model as y = f (x, w) + ε, where y are the observations, x the inputs, f the model, w the parameters to optimize (the weights in our case) and ε the remaining error (the model-data misfit). The more complex the model (i.e. the larger the number of parameters), the smaller this error, with the usual drawback of overfitting: the model fits both the “true” signal and its noise. Such overfitting is usually revealed by a very poor performance on unseen data (data not included in the training phase). Therefore, optimizing the model parameters by minimizing the residual ε alone may actually lead to a poor model. One way to avoid this problem is to also take into account the uncertainty on the model parameters, and a Bayesian approach is very helpful for this. Although two kinds of Bayesian approaches have proved effective (the Laplace approximation and Monte Carlo techniques), in the following we only consider the first one. Nabney (2002) offers an exhaustive discussion of this subject; for the reader to understand our approach, we summarize it here.

First, following the notation of Nabney (2002), consider two models M1 and M2 (in our case two MLPs that differ only in the number of neurons in the hidden layer, M2 having more neurons than M1). By Bayes’ theorem, the posterior probability of each model is \(p(M_{i}|D) = \frac{p(D|M_{i})\,p(M_{i})}{p(D)}\). Without any a priori reason to prefer one model over the other, the models should be compared through the probability \(p(D|M_{i})\), which can be written (MacKay 1992) as \(p(D|M_{i}) = \int p(D|w, M_{i})\,p(w|M_{i})\,\mathrm{d}w\). Assuming that for either model there exists a best choice of parameters \(\hat{w}_{i}\) around which this probability is strongly peaked, the integral can be approximated as \(p(D|M_{i}) \approx p(D|\hat{w}_{i}, M_{i})\,p(\hat{w}_{i}|M_{i})\,\Delta w_{i}^{\mathrm{posterior}}\), where the last term represents the volume (in parameter space) over which the posterior is roughly uniform. Assuming further that the prior \(p(\hat{w}_{i}|M_{i})\) is uniform over a volume \(\Delta w_{i}^{\mathrm{prior}}\) of parameter space, this becomes \(p(D|M_{i}) \approx p(D|\hat{w}_{i}, M_{i})\,(\Delta w_{i}^{\mathrm{posterior}} / \Delta w_{i}^{\mathrm{prior}})\). This expression is the product of two terms that evolve in opposite directions as the model complexity increases: the first term on the right-hand side increases (i.e. the model-data misfit decreases), while the second term, often called the Occam factor, is always smaller than 1 and decreases roughly exponentially with the number of parameters (Nabney 2002), which penalizes the most complex models. In conclusion, taking the weight uncertainty into account should reduce the overfitting problem. We will now explain how this can be done.
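Before turning to that, a toy numerical illustration of this trade-off may help (the numbers below are invented for illustration, not taken from the paper): a larger network can reach a higher best-fit likelihood and still lose on evidence because its Occam factor \(\Delta w^{\mathrm{posterior}}/\Delta w^{\mathrm{prior}}\) is much smaller.

```python
import numpy as np

# Hypothetical best-fit log likelihoods and Occam factors for two MLPs
log_lik_M1, occam_M1 = -120.0, 1e-3   # small network: worse fit, mild penalty
log_lik_M2, occam_M2 = -115.0, 1e-7   # larger network: better fit, strong penalty

log_ev_M1 = log_lik_M1 + np.log(occam_M1)   # ~ -126.9
log_ev_M2 = log_lik_M2 + np.log(occam_M2)   # ~ -131.1
# Despite fitting the data better, M2 has the lower evidence, so M1 is preferred.
```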

For a given number of units in the hidden layer, an optimum set of weights and biases can be calculated by maximum likelihood. In this case, the optimum set of parameters (weights and biases) is the one most likely to have generated the observations. A Bayesian approach (or quasi-Bayesian, given the difficulties that the non-linear nature of neural networks raises for exact Bayesian inference) is valuable for quantifying the two classes of errors: the model-data misfit and the parameter uncertainty.

According to Bayes’ theorem, for a given MLP architecture, the density of the parameters (denoted w) given the dataset (D) is:

$$ p(w|D) = \frac{{p(D|w)p(w)}} {{p(D)}}. $$

As a first step, let us consider only the terms that depend on the weights. Up to an additive constant (the term \(\ln p(D)\) does not depend on \(w\)), the negative log posterior is \(-\ln p(w|D) = -\ln p(D|w) - \ln p(w)\).

The likelihood p(D|w) represents the model-data fit error, which can be modeled by a Gaussian function such as:

$$ p(D|w) = {\left( {\frac{\beta } {{2\pi }}} \right)}^{{N/2}} \exp {\left( { - \frac{\beta } {2}{\sum\limits_{n = 1}^N {{\left\{ {f(x_{n} ,w) - y_{n} } \right\}}^{2} } }} \right)} = {\left( {\frac{\beta } {{2\pi }}} \right)}^{{N/2}} \exp {\left( { - \frac{\beta } {2}E_{D} } \right)} $$

where β represents the inverse variance of the model-data fit error.

The requirement for small weights (i.e. to avoid overfitting) suggests a Gaussian distribution for the weights of the form:

$$ p(w) = {\left( {\frac{\alpha } {{2\pi }}} \right)}^{{W/2}} \exp {\left( { - \frac{\alpha } {2}{\sum\limits_{i = 1}^W {w^{2}_{i} } }} \right)} = {\left( {\frac{\alpha } {{2\pi }}} \right)}^{{W/2}} \exp {\left( { - \frac{\alpha } {2}E_{W} } \right)} $$

where α represents the inverse variance of the weight distribution. α and β are known as hyperparameters. Therefore, to compare different MLP architectures, we first need to optimize the weights, biases and hyperparameters of each candidate architecture. This optimization is carried out with the evidence procedure, an iterative algorithm; here again we refer the reader to Nabney (2002). Briefly, if we consider a model (for a given architecture) to be determined by its two hyperparameters, then, as before, two models can be compared through their maximized evidence p(D|α,β), whose logarithm can be written as:

$$ \ln p(D|\alpha ,\beta ) = - \alpha E_{W} - \beta E_{D} - \frac{1}{2}\ln \left| A \right| + \frac{W}{2}\ln \alpha + \frac{N}{2}\ln \beta - \frac{W + N}{2}\ln (2\pi ) $$

where A is the Hessian matrix of the total error function (function of α and β).
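A minimal sketch of how the terms of this expression can be assembled is given below. The weight vector, the residuals and the Hessian A are placeholders; in practice A would come from the second derivatives of the total error, as computed within the evidence procedure.

```python
import numpy as np

def log_evidence(alpha, beta, w, residuals, A):
    """Log evidence ln p(D | alpha, beta) for a trained network.

    w         : flattened vector of the W weights and biases
    residuals : f(x_n, w) - y_n for the N training points
    A         : (W, W) Hessian of the total error alpha*E_W + beta*E_D
    """
    W, N = w.size, residuals.size
    E_W = 0.5 * np.sum(w ** 2)            # weight term
    E_D = 0.5 * np.sum(residuals ** 2)    # model-data misfit term
    _, logdet_A = np.linalg.slogdet(A)    # ln |A|
    return (-alpha * E_W - beta * E_D - 0.5 * logdet_A
            + 0.5 * W * np.log(alpha) + 0.5 * N * np.log(beta)
            - 0.5 * (W + N) * np.log(2.0 * np.pi))
```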

Based on the previous equation, the evidence procedure is used to optimize the weights and hyperparameters for each candidate architecture, and the corresponding optimized log evidence is calculated. We then compare the log evidence across architectures and finally choose the smallest architecture that yields the minimum negative log evidence.
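The selection loop itself can be sketched as follows; `fit_and_log_evidence` is a hypothetical helper standing for the evidence procedure (re-estimation of the weights and of the hyperparameters α and β for a given hidden-layer size, returning the optimized log evidence), and the search range is illustrative.

```python
import numpy as np

best_H, best_log_ev = None, -np.inf
for H in range(3, 15):                       # candidate hidden-layer sizes (illustrative)
    log_ev = fit_and_log_evidence(H, x, y)   # hypothetical helper (see text)
    if log_ev > best_log_ev:                 # strict '>' keeps the smallest H in case of ties
        best_H, best_log_ev = H, log_ev
```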

Applying this Bayesian approach, the number of neurons selected for the hidden layer varies with the complexity of the CDF shapes to be fitted. We found this number to range from 3 to 14 neurons, the larger values occurring mainly when fitting the dry-spell CDF (the longest wet spells do not exceed 15 days and are often in the range of 5–6 days, while dry spells can last a few months).



Cite this article

Boulanger, JP., Martinez, F., Penalba, O. et al. Neural network based daily precipitation generator (NNGEN-P). Clim Dyn 28, 307–324 (2007). https://doi.org/10.1007/s00382-006-0184-y

