Abstract
This article discusses a number of incorrect statements that appear in textbooks on data analysis, machine learning, or computational methods. The common theme in all of these cases is the relevance and application of statistics to the study of scientific or engineering data; the same mistakes are also prevalent in the research literature. Crucially, we do not address errors made by an individual author, focusing instead on mistakes that are widespread in the introductory literature. After some background on frequentist and Bayesian linear regression, we turn to six paradigmatic cases. For each one we provide a specific example of the textbook mistake, pointers to the specialist literature where the topic is handled properly, and a correction that summarizes the salient points. The mistakes (and corrections) are broadly relevant to any technical setting where statistical techniques are used to draw practical conclusions, ranging from topics introduced in an elementary course on experimental measurements to more involved approaches to regression.
Data availability statement
Not applicable.
Acknowledgements
A.G. would like to thank Andrew Gelman for help navigating the relevant literature. This work was supported in part by the Natural Sciences and Engineering Research Council (NSERC) of Canada and the Canada Foundation for Innovation (CFI). Computational resources were provided by SHARCNET and NERSC.
Cite this article
Gezerlis, A., Williams, M. Six textbook mistakes in data analysis. Eur. Phys. J. Plus 138, 19 (2023). https://doi.org/10.1140/epjp/s13360-022-03629-z