Six textbook mistakes in data analysis

  • Regular Article
The European Physical Journal Plus

Abstract

This article discusses a number of incorrect statements appearing in textbooks on data analysis, machine learning, or computational methods. The common theme in all these cases is the relevance and application of statistics to the study of scientific or engineering data, and the mistakes are also quite prevalent in the research literature. Crucially, we do not address errors made by an individual author, focusing instead on mistakes that are widespread in the introductory literature. After some background on frequentist and Bayesian linear regression, we turn to our six paradigmatic cases. For each one we provide a specific example of the textbook mistake, pointers to the specialist literature where the topic is handled properly, and a correction that summarizes the salient points. The mistakes (and corrections) are broadly relevant to any technical setting where statistical techniques are used to draw practical conclusions, ranging from topics introduced in an elementary course on experimental measurements all the way to more involved approaches to regression.

Data availability statement

Not applicable.

Acknowledgements

A.G. would like to thank Andrew Gelman for help navigating the relevant literature. This work was supported in part by the Natural Sciences and Engineering Research Council (NSERC) of Canada and the Canada Foundation for Innovation (CFI). Computational resources were provided by SHARCNET and NERSC.

Author information

Corresponding author

Correspondence to Alexandros Gezerlis.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Gezerlis, A., Williams, M. Six textbook mistakes in data analysis. Eur. Phys. J. Plus 138, 19 (2023). https://doi.org/10.1140/epjp/s13360-022-03629-z

  • DOI: https://doi.org/10.1140/epjp/s13360-022-03629-z