Statistical inference, and subsequently econometric inference, did not develop through a cumulative, progressive process. Instead, a number of different views emerged, which textbooks on the subject have often conflated. It therefore makes sense to approach the issue historically rather than systematically. Taking this extraordinarily complex development as a basis, we give a historical overview of the emergence of the concepts that are of particular importance for cliometrics. We start by describing the beginnings of modern probability theory and its connection with other statistical approaches. The overview that follows covers the basic principles of the current concepts of inference developed by R. A. Fisher on the one hand and by J. Neyman and E. S. Pearson on the other. Neo-Bayesian approaches were developed in parallel, although they were not taken into account during the founding phase of econometrics. Econometrics instead adopted a “classic” approach, albeit with an additional difficulty: accounting for time. Cliometrics initially followed a Bayesian approach, but this did not ultimately prevail; following econometrics, it adopted a correspondingly classic inferential position. The chapter concludes with a reference to a fundamental critique of the classic position by Rudolf Kalman, which we also find very promising as a concept of inference for cliometrics. We often quote authors directly in an effort to portray developments more vividly.

  1.

    Cf. Birnbaum (1962, 1968, 1977).

  2.

    Original author’s italics.

  3.

    Quoted from DuMouchel (1992, p. 527). The first edition was published in 1954. Cf. Savage (1954).

  4.

    A detailed treatment of the topic of this chapter can be found in Rahlf (1998) and Gigerenzer/Swijtink/Porter/Daston/Beatty/Krüger (1989).

  5.

    Stigler (1986, p. 122).

  6.

    See, for example, Yule (1895, 1896a, b) and Pearson (1898).

  7.

    F. Galton in a letter to K. Pearson of 18 Nov 1893, quoted by Stigler (1986, p. 336). Original author’s italics.

  8.

    Despite criticism, Pearson’s frequency curves soon became part of the standard repertoire of statistics.

  9.

    W. F. R. Weldon in a letter to F. Galton of 27 Jan 1895, quoted by Stigler (1986, p. 337).

  10.

    K. Pearson explicitly rejected the concept of inverse probability, although E. S. Pearson was of the view that he implicitly followed this approach on at least one occasion. Cf. Pearson (1898). “The basis of the approach used here is a little obscure and there seems to be implicit in it the classical concept of inverse probability” (Pearson 1967, p. 347), quoted by Dale (1991, p. 379). Pearson expressed himself most extensively on this issue in his paper The fundamental problem of practical statistics (1920), which has provoked different interpretations up to the present day. While Fisher (1922, p. 311), for example, believed he recognized a proof of Bayes’ theorem in it, Dale (1991, p. 388) considered this a “totally inaccurate observation.” For further interpretations, cf. ibid., pp. 377–391. According to Stigler (1986, p. 345), Pearson worked on multiple occasions “[…] (implicitly) in a Bayesian framework.”

  11.

    Pearson (1898, p. 1f), quoted by Stigler (1986, p. 304).

  12.

    Cf. ibid., p. 373.

  13.

    Fisher (1922 [1992], p. 13); similarly Fisher (1959, p. 34). Fisher (1956, p. 9) contains a more or less clear rejection of the Bayesian approach: he emphasized that he was “personally convinced” that “the theory of inverse probability is founded upon an error, and must be wholly rejected.”

  14.

    Fisher (1922 [1992], p. 13). Ambiguities such as these are characteristic of Fisher’s work. According to Geisser (1992, p. 4), Fisher subscribed – until at least 1912 – to approaches based on Bayesian logic. He then (p. 26f) explicitly rejected the validity of Bayes’ theorem. Cf. Barnard (1988) regarding this question.

  15.

    See supporting evidence in Savage (1976, p. 461). In Fisher (1959, p. 32), he emphasized, for example, that no probability of individual events could be established with such a definition.

  16.

    Savage (1976, p. 461) with corresponding supporting evidence. Savage observes in this respect: “Such a notion is hard to formulate mathematically, and indeed Fisher’s concept of probability remained very unclear, which must have contributed to his isolation from many other statistical theorists” (p. 462).

  17.

    Cf. Geisser (1992). Partly ambiguous terms such as “mean,” “standard deviation,” or “correlation coefficient” have remained in use to this day to indicate, in various contexts, either theoretical variables or estimators for these theoretical variables.

  18.

    Cf. Savage (1976, p. 462) with supporting evidence.

  19.

    Ibid., p. 466: “Nobody knows just what they mean […]. In a word, Fisher hopes by means of some process – the fiducial argument – to arrive at the equivalent of posterior distributions in a Bayesian argument without the introduction of prior distributions […].” We share this criticism. Menges (1972, p. 275) observes: “The fiducial concept considers the results of an observation as indisputable fact in this respect, and as the basis on which to build inference. It can thus do justice, in principle, to the historical character of social phenomena” (original author’s italics). In our opinion, however, this also applies to Bayesian logic.

  20.

    Such as Pearson’s chi-squared goodness-of-fit test of 1900, Student’s t-test, developed in 1908 and formalized by Fisher, or the F-test applied to the analysis of variance by Fisher.

  21.

    Fisher (1959, p. 41f). Fisher’s failure to include tables of p-values in his famous textbook Statistical Methods for Research Workers (rather than the tables of significance values that he did include) arose from the fact that K. Pearson held the copyright to the former. Cf. Watson (1983, p. 714).

  22.

    Pearson in a paper from 1939 quoted from Lehmann’s comments (1992, p. 68) on Neyman/Pearson (1933) (our italics).

  23.


  24.

    Lehmann (1992, p. 68). This highly important aspect of the Neyman-Pearson theory is often not taken into account. As Borovcnik (1992, p. 92) rightly points out, “[…] a frequency interpretation places too much emphasis on the α-error during testing, while the real trick with this method is to minimise the β-error.”

  25.

    According to Lehmann (1992, p. 69f).

  26.

    We do not intend to go into the corresponding techniques here but refer instead to textbook literature on the subject.

  27.

    Neyman/Pearson (1933 [1992], p. 74). Kyburg (1985, p. 119) sums up their intention in the observation: “That says nothing about the case before us, but it may make us feel better.”

  28.

    Fisher (1955, p. 71).

  29.

    Cf. Lehmann (1993), for example.

  30.

    Johnstone (1986, p. 6) aptly describes the prevailing approach: “In general, tests of significance in practice follow Neyman formally, but Fisher philosophically. Formally, there is mention of ‘alternative’ hypotheses, errors ‘of the second kind’, and the ‘power’ of the test, which are terms due to Neyman (and his colleague Pearson). But philosophically, the result in a test, e.g. the result that the level of significance P equals 0.049, or that P is less than or equal to 5%, is interpreted as a measure of evidence, which is the interpretation following Fisher, and denied repeatedly by Neyman.”

  31.

    Ramsey (1931a, b).

  32.

    Keynes (1921, p. 4), quoted by Kyburg/Smokler (1964, p. 9).

  33.

    Cf. de Finetti (1937).

  34.

    De Finetti (1981, p. 657).

  35.

    Jeffreys (1939, p. 8).

  36.

    Ibid., p. 401.

  37.

    Although the hypothesis can still be false in terms of rule 4.

  38.

    Barnard (1947, 1949). For historical development, see Berger/Wolpert (1988, p. 22ff).

  39.

    Birnbaum (1962). Cf. also Bjornstad (1992) on the following. A “standard” work on the subject is that of Berger/Wolpert (1988).

  40.

    Edwards/Lindman/Savage (1963 [1992]). Our intention from here on is to deal only with certain ideas without going into technical detail.

  41.

    Ibid., pp. 534–540.

  42.

    For example, Laplace (1812) and Edgeworth (1884).

  43.

    Ibid., p. 541. This is referred to as “stable estimation.”

  44.

    DuMouchel (1992, p. 521) points out that this approach is closely related to the “reference priors” subsequently proposed by other Bayesians for use in situations where little a priori information is available, which are also acceptable to classical statisticians.

  45.

    Edwards/Lindman/Savage (1963 [1992], p. 546).

  46.

    Bayesian literature does not adopt a uniform position regarding the need for a test theory.

  47.

    DuMouchel (1992, p. 523). Cf. example no. 3 in appendix A3 and also example no. 2 in appendix A4.

  48.

    General reference is made to Hodges (1990) in this respect.

  49.

    The following according to Iversen (1984, p. 31).

  50.

    Ibid.: “This is the way many users of confidence intervals want to interpret a confidence interval, but in classical statistical inference such an interpretation is not possible.”

  51.

    See above, p. 86f.

  52.

    Cf. Stegmüller (1973, p. 32ff, particularly p. 37).

  53.

    Howson (1995, p. 27).

  54.

    See above, p. 99.

  55.

    Lindley (1991, p. 493).

  56.

    Economic theories, from L. Walras to A. Marshall, started out from states of equilibrium, which were adapted, independently of historical context, by the same perpetual motives of human action. The economic laws contained in these theories were timeless.

  57.

    See above, p. 76f.

  58.

    One of the few exceptions, who assigned independent significance to the trend, was S. Kuznets. See Kuznets (1930a, b) in particular.

  59.

    Even Tinbergen came to recognize that he “did not understand the role of the shocks as well as Frisch did” (Tinbergen in Magnus/Morgan (1987, p. 125)).

  60.

    The separation between the role of the mechanism and that of the shock was of great importance for the development of econometrics, even though Tinbergen regarded it critically in retrospect: “[…] I think that what interested economics most was not the shocks but the mechanism generating endogenous cycles, and it might very well be that we have overestimated the role of the mechanism. Maybe the shocks were really much more important. This problem was never solved, because the War came along and after the War we were not interested in business cycles anymore” (Tinbergen in Magnus/Morgan (1987, p. 125)).

  61.

    Cf. Kuznets (1934).

  62.

    See Epstein (1987, p. 75 note 39), Mirowski (1989, p. 234), and above all Boumans (1993). Even the statistician G. U. Yule, who was particularly involved in research in the field of time series analysis and its potential applications in economics, began his academic career in the study of electrical waves.

  63.

    Quoted from Mirowski (1991, p. 152). Frisch and Koopmans applied matrix calculus, which was being widely disseminated in physics in the mid-1920s, in the context of multiple regression analysis, to the field of econometrics, thereby making it more difficult for economists to comprehend the texts concerned. Cf. Mirowski (1989, p. 231).

  64.

    Research nevertheless still continued to take place in the “old” tradition, as econometrics began to develop. See, for example, Hotelling (1934), Schultz (1934), Greenstein (1935), and Regan (1936). Even the method of moving averages was still being recommended by Sasuly (1936) in this context.

  65.

    Keuzenkamp/Magnus (1995, p. 18).

  66.

    Heckman (1992, p. 881) also poses the question in this context, in criticism addressed to Morgan (1990): “Why was the Neyman-Pearson theory adopted as the paradigm of statistical inference in econometrics, and why were rival theories by Ronald Fisher and Harold Jeffreys less successful?”

  67.

    Haavelmo (1944, p. 75).

  68.

    Haavelmo (1944, pp. 13, 22f, 24).

  69.

    Heckman (1992, p. 882). He attributes Morgan’s overestimation of Haavelmo’s approach, rightly in our opinion, to the view, traceable to the influence of Hendry, that these problems are generally solvable within the Neyman-Pearson approach. This overestimation is also picked up by Malinvaud (1991, p. 635) and Zellner (1992, p. 220).

  70.

    See, for example, Sims (1980).

  71.

    They were also subject to the same statistical limitations, such as stationarity and linearity.

  72.

    Zellner (1971, p. 11).

  73.

    See references in Rahlf (1998).

  74.

    Leamer (1994, p. ix).

  75.

    Keuzenkamp (1995, p. 243) therefore uses, for Hendry’s approach, the more apposite term “diagnostic checks” rather than “diagnostic tests.”

  76.

    Cf. Usher (1949, p. 148 and p. 155, note 29).

  77.

    Fogel (1995, p. 49): “The leading history journals, even in economic history, initially refused to accept articles with complex tables and even after such articles began to be accepted, equations were absolutely forbidden.”

  78.

    Floud (1991, p. 452).

  79.

    Fogel/Elton (1983, p. 2), quoted by Floud (1991, p. 452).

  80.

    See above, p. 5.

  81.

    See also Fogel (1995, p. 52) on this subject: “By the early 1980s cliometric methods were so firmly established in certain fields of history that no scholar in these fields could afford to neglect them” (our italics).

  82.

    This is supported not least by the fact that cliometrics did emerge as an independent school of thought because an application for admission by a group of the founding fathers of cliometrics had been rejected by the Econometric Society. Cf. Hughes (1965).

  83.

    Conrad/Meyer (1958). The authors were at the time assistant professors of economics at Harvard. The expressions “starting gun” and “watershed” are therefore justified, since econometric methods were for the first time being applied to historical phenomena without any reference to the present.

  84.

    Cf. Conrad/Meyer (1964).

  85.

    Conrad/Meyer (1957).

  86.

    Cf. Conrad/Meyer (1957, p. 527).

  87.

    They refer here to an example given by Simon (1957) regarding the differing possible influences of the variables of weather, wheat harvest yield, and wheat price.

  88.

    Conrad/Meyer (1957, p. 147).

  89.

    They seek support, in this context, in the line of argument of H. Jeffreys.

  90.

    Conrad/Meyer (1957, p. 544). Specific examples can be found in Conrad/Meyer (1964).

  91.

    Bayesian approaches were not to find fertile ground in the field of econometrics until several years later. It must however be emphasized that the line of argument maintained by Conrad and Meyer contained various terms and concepts (they speak of objective tests and significant differences, before returning to probabilities of hypotheses and a “morass of subjectivism”) that cannot always be clearly differentiated from each other.

  92.

    We rely mainly on Kalman (1982a, b) in this respect. We are therefore not concerned with the application of the so-called Kalman filter to econometrics.

  93.

    Kalman (1982a, p. 19f). Original author’s italics.

  94.

    Ibid., p. 20.

  95.

    This sentence, which was supposed to appear in Kalman (1982c), was deleted at editorial request and included instead in Kalman (1982b, p. 194).

  96.

    Kalman (1982a, p. 23). Linearity and finiteness might be reasonable assumptions for such a system.

  97.

    Kalman (1982b, p. 162). Original author’s italics.

  98.

    Ibid., p. 171.

  99.

    Cf. ibid., p. 172.

  100.

    Kalman (1982a, pp. 26, 27). He describes the calculation of a constant parameter (e.g., in the context of the Phillips curve) as a “conceptual absurdity” (ibid.). Kalman consequently also rejects any causal interpretation. Cf. Kalman (1982b, p. 177), for example.


Recommended Reading

  • The best starting point is still Gigerenzer et al. (1989). See Cited Literature. Other helpful overviews are:

  • Cohen IB (2005) The triumph of numbers: how counting shaped modern life. W. W. Norton, New York

  • Kotz S, Johnson NL (eds) (1992) Breakthroughs in statistics, 1. Foundations and basic theory. 2. Methodology and distribution, Springer series in statistics. Springer, New York

  • Lenhard J (2006) Models and statistical inference: the controversy between Fisher and Neyman-Pearson. Br J Philos Sci 57:69–91

  • Salsburg D (2001) The lady tasting tea: how statistics revolutionized science in the twentieth century. Freeman, New York

  • Sprenger J (2014) Bayesianism vs frequentism in statistical inference. In: Hájek A, Hitchcock C (eds) Handbook of the philosophy of probability. Oxford University Press, Oxford

  • Sprenger J, Hartmann S (2011) Mathematics and statistics in the social sciences. In: Jarvie IC, Bonilla JZ (eds) The SAGE handbook of the philosophy of social sciences. Sage, London, pp 594–612

  • Stigler SM (1999) Statistics on the table: the history of statistical concepts and methods. Harvard University Press, Cambridge, MA

