Many artificial intelligence algorithms or models are ultimately designed for prediction. A prediction algorithm, wherever it may reside (in a computer, or in a forecaster's head), is subject to a set of tests aimed at assessing its goodness. The specific choice of tests is contingent on many factors, including the nature of the problem and the facet of goodness being assessed. This chapter discusses some of these tests. For a more in-depth treatment, the reader is directed to the references, and in particular to two books: Wilks (1995) and Jolliffe and Stephenson (2003). The body of knowledge aimed at assessing the goodness of predictions is referred to as performance assessment in most fields; in atmospheric circles, though, it is generally called verification. In this chapter, I consider only a few of the numerous performance measures proposed in the literature; my emphasis is on ways of assessing their uncertainty (i.e., their statistical significance), as sketched below.
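To make the notion of uncertainty concrete, here is a minimal sketch of attaching a bootstrap confidence interval (Efron and Tibshirani 1993) to one common verification measure, the Brier score. The synthetic forecasts and observations, the choice of score, and the number of resamples are illustrative assumptions, not the chapter's own example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative (synthetic) data: probabilistic forecasts p and binary observations y.
# The outcomes are drawn from the forecast probabilities, so the forecasts are
# calibrated by construction.
p = rng.uniform(0, 1, size=500)
y = (rng.uniform(0, 1, size=500) < p).astype(float)

def brier_score(p, y):
    """Mean squared difference between forecast probability and binary outcome."""
    return np.mean((p - y) ** 2)

# Percentile bootstrap: resample forecast/observation pairs with replacement,
# recompute the score each time, and read off the sampling distribution.
n = len(y)
boot = np.empty(2000)
for b in range(2000):
    idx = rng.integers(0, n, size=n)   # resample pairs, with replacement
    boot[b] = brier_score(p[idx], y[idx])

lo, hi = np.percentile(boot, [2.5, 97.5])
print(f"Brier score = {brier_score(p, y):.3f}, 95% CI = ({lo:.3f}, {hi:.3f})")
```

The width of the resulting interval is what turns a bare performance number into a statement with a statistical significance attached; resampling pairs (rather than forecasts and observations separately) preserves their dependence.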
Here, prediction (or forecast) does not necessarily refer to the prediction of the future state of some variable. It refers to the estimation of the state of one variable from information on another; the two variables may or may not be contemporaneous. What is required, however, is that the data on which the performance of the algorithm is assessed be as independent as possible from the data on which the algorithm is developed or fine-tuned; otherwise, the performance estimate will be optimistically biased (see Section 2.6 in Chapter 2), as the sketch below illustrates.
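The cost of non-independent assessment data can be demonstrated in a few lines. In the following sketch the data are again synthetic, and the high-degree polynomial stands in for any overly flexible model; it compares an error measure computed on the development data with the same measure computed on held-out data.

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative (synthetic) data: noisy observations of a smooth signal.
x = rng.uniform(-1, 1, size=200)
y = np.sin(np.pi * x) + rng.normal(0, 0.3, size=200)

# Hold out half the data; develop (fit) the model only on the other half.
x_train, x_test = x[:100], x[100:]
y_train, y_test = y[:100], y[100:]

# An intentionally flexible model (high-degree polynomial) makes the bias visible.
coeffs = np.polyfit(x_train, y_train, deg=12)

mse_train = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
mse_test = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
print(f"MSE on development data: {mse_train:.3f}")   # optimistically biased
print(f"MSE on independent data:  {mse_test:.3f}")   # honest estimate, typically larger
```

Quoting the development-set error as the algorithm's performance is precisely the optimistic bias warned against above; only the held-out figure is an honest estimate.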
References
Baldwin, M. E., Lakshmivarahan, S., & Kain, J. S. (2002). Development of an “events-oriented” approach to forecast verification. 15th Conference on Numerical Weather Prediction, San Antonio, TX, August 12–16, 2002. Available at http://www.nssl.noaa.gov/mag/pubs/nwp15verf.pdf
Brown, B. G., Bullock, R., Davis, C. A., Gotway, J. H., Chapman, M., Takacs, A., Gilleland, E., Mahoney, J. L., & Manning, K. (2004). New verification approaches for convective weather forecasts. Preprints, 11th Conference on Aviation, Range, and Aerospace, Hyannis, MA, October 3–8
Casati, B., Ross, G., & Stephenson, D. B. (2004). A new intensity-scale approach for the verification of spatial precipitation forecasts. Meteorological Applications, 11, 141–154
Devore, J., & Farnum, N. (2005). Applied statistics for engineers and scientists. Belmont, CA: Thomson Learning
Doswell, C. A., III, Davies-Jones, R., & Keller, D. (1990). On summary measures of skill in rare event forecasting based on contingency tables. Weather and Forecasting, 5, 576–585
Du, J., & Mullen, S. L. (2000). Removal of distortion error from an ensemble forecast. Monthly Weather Review, 128, 3347–3351
Ebert, E. E., & McBride, J. L. (2000). Verification of precipitation in weather systems: Determination of systematic errors. Journal of Hydrology, 239, 179–202
Efron, B., & Tibshirani, R. J. (1993). An introduction to the bootstrap. London: Chapman & Hall
Fawcett, T. (2006). An introduction to ROC analysis. Pattern Recognition Letters, 27, 861–874
Ferro, C. (2007). Comparing probabilistic forecasting systems with the Brier score. Weather and Forecasting, 22, 1076–1088
Gandin, L. S., & Murphy, A. H. (1992). Equitable skill scores for categorical forecasts. Monthly Weather Review, 120, 361–370
Gerrity, J. P. Jr. (1992). A note on Gandin and Murphy's equitable skill score. Monthly Weather Review, 120, 2707–2712
Glahn, H. R., & Lowry, D. A. (1972). The use of Model Output Statistics (MOS) in objective weather forecasting. Journal of Applied Meteorology, 11, 1203–1211
Gneiting, T., & Raftery, A. E. (2005). Strictly proper scoring rules, prediction, and estimation. Journal of the American Statistical Association, 102(477), 359–378
Gneiting, T., Raftery, A. E., Westveld, A. H., & Goldman, T. (2005). Calibrated probabilistic forecasting using ensemble model output statistics and minimum CRPS estimation. Monthly Weather Review, 133, 1098–1118
Gneiting, T., Balabdaoui, F., & Raftery, A. E. (2007). Probabilistic forecasts, calibration and sharpness. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 69(2), 243–268
Good, P. I. (2005a). Permutation, parametric and bootstrap tests of hypotheses (3rd ed.). New York: Springer. ISBN 0-387-98898-X
Good, P. I. (2005b). Introduction to statistics through resampling methods and R/S-PLUS. Hoboken, NJ: Wiley. ISBN 0-471-71575-1
Hamill, T. M. (1997). Reliability diagrams for multicategory probabilistic forecasts. Weather and Forecasting, 12, 736–741
Hastie, T., Tibshirani, R., & Friedman, J. (2001). The elements of statistical learning. Springer Series in Statistics. New York: Springer
Heidke, P. (1926). Berechnung des Erfolges und der Güte der Windstärkevorhersagen im Sturmwarnungsdienst. Geografiska Annaler, 8, 301–349
Jolliffe, I. T. (2007). Uncertainty and inference for verification measures. Weather and Forecasting, 22, 633–646
Jolliffe, I. T., & Stephenson, D. B. (2003). Forecast verification: A practitioner's guide in atmospheric science. Chichester: Wiley
Livezey, R. E. (2003). Chapter 4, on categorical events, in I. T. Jolliffe & D. B. Stephenson (Eds.), Forecast verification: A practitioner's guide in atmospheric science. Chichester: Wiley
Macskassy, S. A., & Provost, F. (2004). Confidence bands for ROC curves: Methods and an empirical study. First workshop on ROC analysis in AI, ECAI-2004, Spain
Marzban, C. (1998). Scalar measures of performance in rare-event situations. Weather and Forecasting, 13, 753–763
Marzban, C. (2004). The ROC curve and the area under it as a performance measure. Weather and Forecasting, 19(6), 1106–1114
Marzban, C., & Lakshmanan, V. (1999). On the uniqueness of Gandin and Murphy's equitable performance measures. Monthly Weather Review, 127(6), 1134–1136
Marzban, C., & Sandgathe, S. (2006). Cluster analysis for verification of precipitation fields. Weather and Forecasting, 21(5), 824–838
Marzban, C., & Sandgathe, S. (2008). Cluster analysis for object-oriented verification of fields: A variation. Monthly Weather Review, 136, 1013–1025
Marzban, C., & Stumpf, G. J. (1998). A neural network for damaging wind prediction. Weather and Forecasting, 13, 151–163
Marzban, C., & Witt, A. (2001). A Bayesian neural network for hail size prediction. Weather and Forecasting, 16(5), 600–610
Marzban, C., Sandgathe, S., & Lyons, H. (2008). An object-oriented verification of three NWP model formulations via cluster analysis: An objective and a subjective analysis. Monthly Weather Review, 136, 3392–3407
Murphy, A. H. (1991). Forecast verification: Its complexity and dimensionality. Monthly Weather Review, 119, 1590–1601
Murphy, A. H. (1993). What is a good forecast? An essay on the nature of goodness in weather forecasting. Weather and Forecasting, 8, 281–293
Murphy, A. H., & Epstein, E. S. (1967). A note on probabilistic forecasts and “hedging”. Journal of Applied Meteorology, 6, 1002–1004
Murphy, A. H., & Winkler, R. L. (1987). A general framework for forecast verification. Monthly Weather Review, 115, 1330–1338
Murphy, A. H., & Winkler, R. L. (1992). Diagnostic verification of probability forecasts. International Journal of Forecasting, 7, 435–455
Nachamkin, J. E. (2004). Mesoscale verification using meteorological composites. Monthly Weather Review, 132, 941–955
Raftery, A. E., Gneiting, T., Balabdaoui, F., & Polakowski, M. (2005). Using Bayesian model averaging to calibrate forecast ensembles. Monthly Weather Review, 133, 1155–1174
Richardson, D. S. (2000). Skill and relative economic value of the ECMWF ensemble prediction system. Quarterly Journal of the Royal Meteorological Society, 126, 649–667
Roebber, P. J., & Bosart, L. F. (1996). The complex relationship between forecast skill and forecast value: A real-world analysis. Weather and Forecasting, 11, 544–559
Roulston, M. S., & Smith, L. A. (2002). Evaluating probabilistic forecasts using information theory. Monthly Weather Review, 130, 1653–1660
Seaman, R., Mason, I., & Woodcock, F. (1996). Confidence intervals for some performance measures of Yes-No forecasts. Australian Meteorological Magazine, 45, 49–53
Stephenson, D. B., Casati, B., & Wilson, C. (2004). Verification of rare extreme events. WMO verification workshop, Montreal, September 13–17
Venugopal, V., Basu, S., & Foufoula-Georgiou, E. (2005). A new metric for comparing precipitation patterns with an application to ensemble forecasts. Journal of Geophysical Research, 110, D08111. doi:10.1029/2004JD005395
Wilks, D. S. (1995). Statistical methods in the atmospheric sciences (467 pp.). San Diego, CA: Academic Press
Wilks, D. S. (2001). A skill score based on economic value for probability forecasts. Meteorological Applications, 8, 209–219
Wilson, L. J., Burrows, W. R., & Lanzinger, A. (1999). A strategy for verification of weather element forecasts from an ensemble prediction system. Monthly Weather Review, 127, 956–970