Abstract
In this chapter, data science is characterized as an inductivist approach, i.e., an approach that aims to start from the facts and infer increasingly general laws and theories. This perspective is corroborated, first, by a case study of successful scientific practice from the field of machine translation and, second, by an analysis of recent developments in statistics, in particular the shift from so-called data modeling to algorithmic modeling. Over the past century, many scientists and philosophers of science have regarded inductivism unfavorably. Given that inductivism is generally considered a failed methodology, the fundamental epistemological problem of data science turns out to be the justification of inductivism. Some classic objections against inductivism are revisited, the most pertinent of which is the so-called problem of induction. Without a satisfactory solution to the problem of induction, data science seems doomed to failure.
Notes
- 1.
Compare Pietsch (2017).
- 2.
- 3.
Another version of the hypothetico-deductive doctrine is given by Richard Feynman, arguably the most influential physicist of the second half of the 20th century, in The Character of Physical Law: “In general we look for a new law by the following process. First we guess it. Then we compute the consequences of the guess to see what would be implied if this law that we guessed is right. Then we compare the result of the computation to nature, with experiment or experience, compare it directly with observation, to see if it works. If it disagrees with experiment it is wrong. In that simple statement is the key to science. It does not make any difference how beautiful your guess is. It does not make any difference how smart you are, who made the guess, or what his name is – if it disagrees with experiment it is wrong. That is all there is to it.” (Feynman 1967, 156)
- 4.
- 5.
‘The Unreasonable Effectiveness of Data’, talk given by Peter Norvig at UBC, 23.9.2010, http://www.youtube.com/watch?v=yvDCzhbjYWs, at 43:45.
- 6.
http://www-03.ibm.com/ibm/history/ibm100/us/en/icons/speechreco/team/ (accessed 1.8.2013).
- 7.
For an interesting exchange between Noam Chomsky and Peter Norvig, representing a model-driven approach and a statistical approach to linguistics, respectively, compare Norvig (2017).
- 8.
For example, Google Translate switched in November 2016.
- 9.
- 10.
http://magazine.amstat.org/blog/2010/09/01/statrevolution/ (accessed 31.1.2015).
- 11.
For a graphic illustration of this claim, compare the terms ‘computer’ and ‘non-parametric’ in Google’s Ngram Viewer, https://books.google.com/ngrams.
- 12.
- 13.
Here, parameters are to be understood not as variables but as constants that determine the properties of a specific model: e.g., in the linear model y = ax + b, a and b are the model parameters.
- 14.
This curse of dimensionality does not automatically apply to all algorithms in data science and machine learning. On the contrary, it occasionally proves helpful to artificially increase the dimensionality of the variable space in methods like decision trees or support vector machines (Breiman 2001, 208–209).
- 15.
Some authors deny that Whewell should be considered a deductivist (e.g., Snyder 2017, Sec. 2). But while his epistemological stance does not fulfill all the criteria laid out in Section 2.1, he can certainly be seen as a precursor of hypothetico-deductivism. In particular, his methodological approach has considerable rationalist elements, stressing the importance of ideas, which are “not a consequence of experience, but a result of the particular constitution and activity of the mind, which is independent of all experience in its origin, though constantly combined with experience in its exercise” (Whewell 1858b, 91; cited in Snyder 2017, Sec. 2). This was a major point of contention in his debate with Mill. From a somewhat Kantian perspective, Whewell introduced the notion of ‘colligation of facts’, which refers to a process that subsumes certain phenomena under a general idea, for example geometric phenomena under the concept of space (Whewell 1858a, Ch. II.IV).
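The distinction drawn in note 13 between parameters and variables can be made concrete in a few lines of code. This is a minimal sketch, assuming ordinary least squares as the estimation method (the note does not name one) and a hypothetical, noise-free data set generated from y = 2x + 1. The function names `fit_linear_model` and `predict` are illustrative and not taken from the chapter. Once the data have fixed the constants a and b, a specific model is determined, and x remains the variable at which that model is evaluated.

```python
def fit_linear_model(xs, ys):
    """Estimate the parameters a and b of y = a*x + b by ordinary least squares."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two paired observations")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and sum of cross-deviations of x and y.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx              # slope parameter
    b = mean_y - a * mean_x    # intercept parameter
    return a, b


def predict(a, b, x):
    """Evaluate the fitted model: a and b are fixed constants, x is the variable."""
    return a * x + b


# Hypothetical, noise-free data generated from y = 2x + 1 (illustration only).
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]

a, b = fit_linear_model(xs, ys)
print(a, b)                   # 2.0 1.0
print(predict(a, b, 10.0))    # 21.0
```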
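The remark in note 14 that artificially adding dimensions can help may be illustrated with the XOR pattern. This is a minimal sketch under our own assumptions: the four XOR points, the added product feature x1 * x2, and the use of a plain perceptron are illustrative choices, not the decision trees or support vector machines discussed by Breiman (2001). No straight line separates the two classes in the original two-dimensional space, so the perceptron never completes an error-free pass there. Once the third feature is added, a separating plane exists and the perceptron finds it.

```python
from itertools import product


def perceptron(points, labels, max_epochs=100):
    """Train a perceptron with bias; return (weights, bias, converged)."""
    w = [0.0] * len(points[0])
    b = 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for x, y in zip(points, labels):
            activation = sum(wi * xi for wi, xi in zip(w, x)) + b
            if y * activation <= 0:  # misclassified or on the boundary
                w = [wi + y * xi for wi, xi in zip(w, x)]
                b += y
                mistakes += 1
        if mistakes == 0:  # a full pass without errors: data are separated
            return w, b, True
    return w, b, False


# XOR on {-1, +1}^2: label +1 when the coordinates differ, -1 when they agree.
points_2d = list(product((-1, 1), repeat=2))
labels = [1 if x1 != x2 else -1 for x1, x2 in points_2d]

# Artificially increase the dimensionality by adding the product x1 * x2.
points_3d = [(x1, x2, x1 * x2) for x1, x2 in points_2d]

_, _, ok_2d = perceptron(points_2d, labels)
w3, b3, ok_3d = perceptron(points_3d, labels)
print(ok_2d, ok_3d)  # False True
```

In the lifted space the third coordinate alone carries the class information, which is why a linear rule suffices there even though none exists in the original two variables.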
References
Ampère, Jean-Marie. 1826/2012. Mathematical theory of electro-dynamic phenomena uniquely derived from experiments. Transl. Michael D. Godfrey. Paris: A. Hermann. https://archive.org/details/AmpereTheorieEn
Bacon, Francis. 1620/1994. Novum Organum. Chicago, IL: Open Court.
Bellman, Richard E. 1961. Adaptive Control Processes: A Guided Tour. Princeton: Princeton University Press.
Breiman, Leo. 2001. Statistical Modeling: The Two Cultures. Statistical Science 16 (3): 199–231.
Callebaut, Werner. 2012. Scientific perspectivism: A philosopher of science’s response to the challenge of big data biology. Studies in History and Philosophy of Biological and Biomedical Sciences 43 (1): 69–80.
Chomsky, Noam. 1965. Aspects of the Theory of Syntax. Cambridge, MA: MIT Press.
Ducheyne, Steffen. 2005. Bacon’s Idea and Newton’s Practice of Induction. Philosophica 76: 115–128.
Duhem, Pierre. 1906/1962. The Aim and Structure of Physical Theory. New York: Atheneum.
Einstein, Albert. 1934. On the Method of Theoretical Physics. Philosophy of Science 1 (2): 163–169.
Feynman, Richard. 1967. The Character of Physical Law. Cambridge, MA: MIT Press.
Frické, Martin. 2014. Big Data and Its Epistemology. Journal of the Association for Information Science and Technology 66 (4): 651–661.
Gillies, Donald. 1996. Artificial Intelligence and Scientific Method. Oxford: Oxford University Press.
Goodman, Nelson. 1954. Fact, Fiction, and Forecast. London: Athlone Press.
Halevy, Alon, Peter Norvig, and Fernando Pereira. 2009. The Unreasonable Effectiveness of Data. IEEE Intelligent Systems 24 (2): 8–12.
Hanson, Norwood Russell. 1958. Patterns of Discovery: An Inquiry into the Conceptual Foundations of Science. Cambridge: Cambridge University Press.
Harman, Gilbert, and Sanjeev Kulkarni. 2007. Reliable Reasoning: Induction and Statistical Learning Theory. Cambridge, MA: MIT Press.
Hastie, Trevor, and Robert Tibshirani. 1990. Generalized Additive Models. London: Chapman and Hall.
Hume, David. 1748. An Enquiry concerning Human Understanding. London: A. Millar.
Jelinek, Frederick. 2009. The Dawn of Statistical ASR and MT. Computational Linguistics 35 (4): 483–494.
Kauermann, Goeran. 2006. Nonparametric Models and their Estimation. In Modern Econometric Analysis, ed. Olaf Hübler and Joachim Frohn, 137–152. Berlin: Springer.
Kitchin, Rob. 2014. The Data Revolution. Los Angeles: Sage.
Lavoisier, Antoine. 1789/1790. Elements of Chemistry. Edinburgh: William Creech. http://www.gutenberg.org/files/30775/30775-h/30775-h.htm
Leonelli, Sabina. 2012. Introduction: Making sense of data-driven research in the biological and biomedical sciences. Studies in History and Philosophy of Biological and Biomedical Sciences 43 (1): 1–3.
Newton, Isaac. 1726/1999. Mathematical Principles of Natural Philosophy. Berkeley: University of California Press.
Norvig, Peter. 2009. Natural Language Corpus Data. In Beautiful Data, ed. Toby Segaran and Jeff Hammerbacher, 219–242. Sebastopol: O’Reilly.
———. 2017. On Chomsky and the Two Cultures of Statistical Learning. In Berechenbarkeit der Welt? Philosophie und Wissenschaft im Zeitalter von Big Data, ed. W. Pietsch, J. Wernecke, and M. Ott, 61–83. Wiesbaden: Springer.
Pérez-Ramos, Antonio. 1996. Bacon’s Forms and the Maker’s Knowledge. In The Cambridge Companion to Bacon, ed. Markku Peltonen, 99–120. Cambridge: Cambridge University Press.
Pietsch, Wolfgang. 2015. Aspects of Theory-Ladenness in Data-Intensive Science. Philosophy of Science 82 (5): 905–916.
———. 2016. The Causal Nature of Modeling with Big Data. Philosophy & Technology 29 (2): 137–171.
———. 2017. Causation, Probability, and all that: Data Science as a Novel Inductive Paradigm. In Frontiers in Data Science, ed. Matthias Dehmer and Frank Emmert-Streib, 329–353. Boca Raton: CRC Press.
———. 2021. Big Data. Cambridge: Cambridge University Press.
Popper, Karl. 1935/2002. The Logic of Scientific Discovery. London: Routledge Classics.
———. 1963. Conjectures and Refutations. Abingdon: Routledge.
Quine, Willard Van Orman. 1951. Two Dogmas of Empiricism. The Philosophical Review 60 (1): 20–43.
Russell, Stuart, and Peter Norvig. 2009. Artificial Intelligence: A Modern Approach. 3rd ed. Upper Saddle River, NJ: Pearson.
Snyder, Laura J. 2017. William Whewell. Stanford Encyclopedia of Philosophy (Winter 2017 Edition). https://plato.stanford.edu/archives/win2017/entries/whewell/
Sprenger, Jan. 2011. Science without (parametric) models: the case of bootstrap resampling. Synthese 180 (1): 65–76.
Whewell, William. 1858a. Novum Organon Renovatum. 3rd ed. London: John W. Parker.
———. 1858b. History of Scientific Ideas. Vol. I. London: John W. Parker.
Copyright information
© 2022 Springer Nature Switzerland AG
About this chapter
Cite this chapter
Pietsch, W. (2022). Inductivism. In: On the Epistemology of Data Science. Philosophical Studies Series, vol 148. Springer, Cham. https://doi.org/10.1007/978-3-030-86442-2_2
DOI: https://doi.org/10.1007/978-3-030-86442-2_2
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-86441-5
Online ISBN: 978-3-030-86442-2
eBook Packages: Religion and Philosophy, Philosophy and Religion (R0)