Many artificial intelligence algorithms or models are ultimately designed for prediction. A prediction algorithm, wherever it may reside (in a computer, or in a forecaster's head), is subject to a set of tests aimed at assessing its goodness. The specific choice of tests is contingent on many factors, including the nature of the problem and the facet of goodness being assessed. This chapter discusses some of these tests. For a more in-depth treatment, the reader is directed to the references, and in particular to two books: Wilks (1995) and Jolliffe and Stephenson (2003). The body of knowledge aimed at assessing the goodness of predictions is referred to as performance assessment in most fields; in atmospheric circles, though, it is generally called verification. In this chapter, I consider only a few of the numerous performance measures proposed in the literature; my emphasis is on ways of assessing their uncertainty (i.e., their statistical significance), as sketched below.
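To make the notion of uncertainty concrete, here is a minimal sketch of attaching a bootstrap confidence interval (Efron and Tibshirani 1993) to one common verification measure, the Brier score. The synthetic forecasts and observations, the choice of score, and the number of resamples are illustrative assumptions, not the chapter's own example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative (synthetic) data: probabilistic forecasts p and binary observations y.
# The outcomes are drawn from the forecast probabilities, so the forecasts are
# calibrated by construction.
p = rng.uniform(0, 1, size=500)
y = (rng.uniform(0, 1, size=500) < p).astype(float)

def brier_score(p, y):
    """Mean squared difference between forecast probability and binary outcome."""
    return np.mean((p - y) ** 2)

# Percentile bootstrap: resample forecast/observation pairs with replacement,
# recompute the score each time, and read off the sampling distribution.
n = len(y)
boot = np.empty(2000)
for b in range(2000):
    idx = rng.integers(0, n, size=n)   # resample pairs, with replacement
    boot[b] = brier_score(p[idx], y[idx])

lo, hi = np.percentile(boot, [2.5, 97.5])
print(f"Brier score = {brier_score(p, y):.3f}, 95% CI = ({lo:.3f}, {hi:.3f})")
```

The width of the resulting interval is what turns a bare performance number into a statement with a statistical significance attached; resampling pairs (rather than forecasts and observations separately) preserves their dependence.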
Here, prediction (or forecast) does not necessarily refer to the prediction of the future state of some variable. It refers to the estimation of the state of one variable from information on another; the two variables may or may not be contemporaneous. What is required, however, is that the data on which the performance of the algorithm is assessed be as independent as possible from the data on which the algorithm is developed or fine-tuned; otherwise, the performance estimate will be optimistically biased (see Section 2.6 in Chapter 2), as the sketch below illustrates.
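The cost of non-independent assessment data can be demonstrated in a few lines. In the following sketch the data are again synthetic, and the high-degree polynomial stands in for any overly flexible model; it compares an error measure computed on the development data with the same measure computed on held-out data.

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative (synthetic) data: noisy observations of a smooth signal.
x = rng.uniform(-1, 1, size=200)
y = np.sin(np.pi * x) + rng.normal(0, 0.3, size=200)

# Hold out half the data; develop (fit) the model only on the other half.
x_train, x_test = x[:100], x[100:]
y_train, y_test = y[:100], y[100:]

# An intentionally flexible model (high-degree polynomial) makes the bias visible.
coeffs = np.polyfit(x_train, y_train, deg=12)

mse_train = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
mse_test = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
print(f"MSE on development data: {mse_train:.3f}")   # optimistically biased
print(f"MSE on independent data:  {mse_test:.3f}")   # honest estimate, typically larger
```

Quoting the development-set error as the algorithm's performance is precisely the optimistic bias warned against above; only the held-out figure is an honest estimate.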
References
Baldwin, M. E., Lakshmivarahan, S., & Kain, J. S. (2002). Development of an “events-oriented” approach to forecast verification. 15th Conference on Numerical Weather Prediction, San Antonio, TX, August 12–16, 2002. Available at http://www.nssl.noaa.gov/mag/pubs/nwp15verf.pdf
Brown, B. G., Bullock, R., Davis, C. A., Gotway, J. H., Chapman, M., Takacs, A., Gilleland, E., Mahoney, J. L., & Manning, K. (2004). New verification approaches for convective weather forecasts. Preprints, 11th Conference on Aviation, Range, and Aerospace, Hyannis, MA, October 3–8
Casati, B., Ross, G., & Stephenson, D. B. (2004). A new intensity-scale approach for the verification of spatial precipitation forecasts. Meteorological Applications, 11, 141–154
Devore, J., & Farnum, N. (2005). Applied statistics for engineers and scientists. Belmont, CA: Thomson Learning
Doswell, C. A., III, Davies-Jones, R., & Keller, D. (1990). On summary measures of skill in rare event forecasting based on contingency tables. Weather and Forecasting, 5, 576–585
Du, J., & Mullen, S. L. (2000). Removal of distortion error from an ensemble forecast. Monthly Weather Review, 128, 3347–3351
Ebert, E. E., & McBride, J. L. (2000). Verification of precipitation in weather systems: Determination of systematic errors. Journal of Hydrology, 239, 179–202
Efron, B., & Tibshirani, R. J. (1993). An introduction to the bootstrap. London: Chapman & Hall
Fawcett, T. (2006). An introduction to ROC analysis. Pattern Recognition Letters, 27, 861–874
Ferro, C. (2007). Comparing probabilistic forecasting systems with the Brier score. Weather and Forecasting, 22, 1076–1088
Gandin, L. S., & Murphy, A. H. (1992). Equitable skill scores for categorical forecasts. Monthly Weather Review, 120, 361–370
Gerrity, J. P. Jr. (1992). A note on Gandin and Murphy's equitable skill score. Monthly Weather Review, 120, 2707–2712
Glahn, H. R., & Lowry, D. A. (1972). The use of Model Output Statistics (MOS) in objective weather forecasting. Journal of Applied Meteorology, 11, 1203–1211
Gneiting, T., & Raftery, A. E. (2005). Strictly proper scoring rules, prediction, and estimation. Journal of the American Statistical Association, 102(477), 359–378
Gneiting, T., Raftery, A. E., Westveld, A. H., & Goldman, T. (2005). Calibrated probabilistic forecasting using ensemble model output statistics and minimum CRPS estimation. Monthly Weather Review, 133, 1098–1118
Gneiting, T., Balabdaoui, F., & Raftery, A. E. (2007). Probabilistic forecasts, calibration and sharpness. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 69(2), 243–268
Good, P. I. (2005a). Permutation, parametric and bootstrap tests of hypotheses (3rd ed.). New York: Springer. ISBN 0-387-98898-X
Good, P. I. (2005b). Introduction to statistics through resampling methods and R/S-PLUS. Hoboken, NJ: Wiley. ISBN 0-471-71575-1
Hamill, T. M. (1997). Reliability diagrams for multicategory probabilistic forecasts. Weather and Forecasting, 12, 736–741
Hastie, T., Tibshirani, R., & Friedman, J. (2001). The elements of statistical learning. Springer Series in Statistics. New York: Springer
Heidke, P. (1926). Berechnung des Erfolges und der Güte der Windstärkevorhersagen im Sturmwarnungsdienst. Geografiska Annaler, 8, 301–349
Jolliffe, I. T. (2007). Uncertainty and inference for verification measures. Weather and Forecasting, 22, 633–646
Jolliffe, I. T., & Stephenson, D. B. (2003). Forecast verification: A practitioner's guide in atmospheric science. Chichester: Wiley
Livezey, R. E. (2003). Chapter 4, on categorical events, in I. T. Jolliffe & D. B. Stephenson (Eds.), Forecast verification: A practitioner's guide in atmospheric science. Chichester: Wiley
Macskassy, S. A., & Provost, F. (2004). Confidence bands for ROC curves: Methods and an empirical study. First workshop on ROC analysis in AI, ECAI-2004, Spain
Marzban, C. (1998). Scalar measures of performance in rare-event situations. Weather and Forecasting, 13, 753–763
Marzban, C. (2004). The ROC curve and the area under it as a performance measure. Weather and Forecasting, 19(6), 1106–1114
Marzban, C., & Lakshmanan, V. (1999). On the uniqueness of Gandin and Murphy's equitable performance measures. Monthly Weather Review, 127(6), 1134–1136
Marzban, C., & Sandgathe, S. (2006). Cluster analysis for verification of precipitation fields. Weather and Forecasting, 21(5), 824–838
Marzban, C., & Sandgathe, S. (2008). Cluster analysis for object-oriented verification of fields: A variation. Monthly Weather Review, 136, 1013–1025
Marzban, C., & Stumpf, G. J. (1998). A neural network for damaging wind prediction. Weather and Forecasting, 13, 151–163
Marzban, C., & Witt, A. (2001). A Bayesian neural network for hail size prediction. Weather and Forecasting, 16(5), 600–610
Marzban, C., Sandgathe, S., & Lyons, H. (2008). An object-oriented verification of three NWP model formulations via cluster analysis: An objective and a subjective analysis. Monthly Weather Review, 136, 3392–3407
Murphy, A. H. (1991). Forecast verification: Its complexity and dimensionality. Monthly Weather Review, 119, 1590–1601
Murphy, A. H. (1993). What is a good forecast? An essay on the nature of goodness in weather forecasting. Weather and Forecasting, 8, 281–293
Murphy, A. H., & Epstein, E. S. (1967). A note on probabilistic forecasts and “hedging”. Journal of Applied Meteorology, 6, 1002–1004
Murphy, A. H., & Winkler, R. L. (1987). A general framework for forecast verification. Monthly Weather Review, 115, 1330–1338
Murphy, A. H., & Winkler, R. L. (1992). Diagnostic verification of probability forecasts. International Journal of Forecasting, 7, 435–455
Nachamkin, J. E. (2004). Mesoscale verification using meteorological composites. Monthly Weather Review, 132, 941–955
Raftery, A. E., Gneiting, T., Balabdaoui, F., & Polakowski, M. (2005). Using Bayesian model averaging to calibrate forecast ensembles. Monthly Weather Review, 133, 1155–1174
Richardson, D. S. (2000). Skill and relative economic value of the ECMWF ensemble prediction system. Quarterly Journal of the Royal Meteorological Society, 126, 649–667
Roebber, P. J., & Bosart, L. F. (1996). The complex relationship between forecast skill and forecast value: A real-world analysis. Weather and Forecasting, 11, 544–559
Roulston, M. S., & Smith, L. A. (2002). Evaluating probabilistic forecasts using information theory. Monthly Weather Review, 130, 1653–1660
Seaman, R., Mason, I., & Woodcock, F. (1996). Confidence intervals for some performance measures of Yes-No forecasts. Australian Meteorological Magazine, 45, 49–53
Stephenson, D. B., Casati, B., & Wilson, C. (2004). Verification of rare extreme events. WMO verification workshop, Montreal, September 13–17
Venugopal, V., Basu, S., & Foufoula-Georgiou, E. (2005). A new metric for comparing precipitation patterns with an application to ensemble forecasts. Journal of Geophysical Research, 110, D08111. doi:10.1029/2004JD005395
Wilks, D. S. (1995). Statistical methods in the atmospheric sciences (467 pp.). San Diego, CA: Academic Press
Wilks, D. S. (2001). A skill score based on economic value for probability forecasts. Meteorological Applications, 8, 209–219
Wilson, L. J., Burrows, W. R., & Lanzinger, A. (1999). A strategy for verification of weather element forecasts from an ensemble prediction system. Monthly Weather Review, 127, 956–970