From Coding To Curing. Functions, Implementations, and Correctness in Deep Learning

  • Research Article
  • Published in: Philosophy & Technology

Abstract

This paper sheds light on the shift taking place from the practice of ‘coding’, namely developing programs as conventional in the software community, to the practice of ‘curing’, an activity that has emerged in the last few years in Deep Learning (DL) and that amounts to curing the data regime to which a DL model is exposed during training. Initially, the curing paradigm is illustrated by means of a case study on autonomous vehicles. Subsequently, the shift from coding to curing is analysed in light of the epistemological notions, central to the philosophy of computer science, of function, implementation, and correctness. First, it is shown how, in the curing paradigm, the functions performed by the trained model depend far more on dataset curation than on the model’s algorithms, which, in contrast with the coding paradigm, do not comply with requested specifications. Second, it is highlighted how DL models cannot be considered implementations according to any of the available definitions of implementation that follow an intentional theory of functions. Finally, it is argued that DL models cannot be evaluated in terms of their correctness, but rather in terms of their experimental computational validity.

Figs. 1–5 (figure images not included in this text-only version)

Notes

  1. The LoAs listed here should not be taken to suggest a cascade software development model (Sommerville, 2021), proceeding from intention to execution, but rather to define a stratified ontology for computational systems. The different development methods, such as cascade, spiral, or agile, may go through some or all of those levels. See Angius et al. (2021) on this.

  2. Andrej Karpathy, former director of AI at Tesla, recently revealed that much of the self-driving dataset for Tesla cars is being collected using cameras only (https://www.youtube.com/watch?v=g6bOwQdCJrc).

  3. The Cityscapes dataset contains video recordings from 50 cities with 25,000 labelled frames, covering street scenes across distinct seasons. See https://www.cityscapes-dataset.com/. The graph in Fig. 2 is taken from https://paperswithcode.com/sota/semantic-segmentation-on-cityscapes.

  4. Karpathy’s keynote talk at the 2021 IEEE/CVF Computer Vision and Pattern Recognition Conference (CVPR) is available at https://karpathy.ai/; unfortunately, no published paper or proceedings contribution is associated with the talk.

  5. The algorithm proceeds by first ranking snippets given a set of complexity measures and a task, then selecting the most interesting snippets (those with high ranking scores), and finally ensuring diversity of the snippet set, quantifying diversity as the difference in their complexity measures.
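As a rough sketch of this kind of ranking-and-diversification procedure (this is not the authors’ implementation; `select_snippets`, the complexity scorer, and the data are all illustrative):

```python
from itertools import combinations

def select_snippets(snippets, complexity, k):
    """Toy sketch of complexity-ranked, diversity-aware snippet selection.

    `snippets` maps a snippet id to its vector of complexity measures;
    `complexity` scores a snippet from its measures; `k` is the number
    of snippets to keep.
    """
    # 1. Rank snippets by their complexity score.
    ranked = sorted(snippets, key=lambda s: complexity(snippets[s]), reverse=True)
    # 2. Shortlist the highest-ranking candidates.
    shortlist = ranked[: 2 * k]
    # 3. Among the shortlist, pick the subset maximising pairwise
    #    differences in complexity measures (diversity).
    def diversity(subset):
        return sum(
            sum(abs(a - b) for a, b in zip(snippets[x], snippets[y]))
            for x, y in combinations(subset, 2)
        )
    best = max(combinations(shortlist, k), key=diversity)
    return list(best)
```

For instance, with four snippets scored by the sum of two complexity measures, the selection favours high-scoring snippets whose measures differ most from one another rather than near-duplicates.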

  6. Recall that a finite transition system \(TS=(S, A, T, I, F, AP, L)\) is a set-theoretic structure defined by a finite set of states \(S=\{s_0, \ldots , s_n\}\), a finite set of labels for transitions A, a transition relation \(T \subseteq S \times A \times S\), each transition having form \(s_i \xrightarrow {\alpha } s_j\), a set of initial states \(I \subseteq S\), a set of final states \(F \subseteq S\), a finite set of labels for states AP, and a state labelling function \(L: S \rightarrow 2^{AP}\). The example in Fig. 5 is inspired by a similar one in Baier and Katoen (2008, 100-102).
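The set-theoretic definition above translates directly into a data structure. The sketch below (illustrative, not taken from the paper) encodes a simplified two-traffic-light system in this format and checks, by exhaustive exploration of the reachable states, a mutual-exclusion safety property of the kind discussed in the text:

```python
from dataclasses import dataclass

@dataclass
class TransitionSystem:
    """Finite transition system TS = (S, A, T, I, F, AP, L)."""
    S: set    # states
    A: set    # transition labels (actions)
    T: set    # transitions (s_i, alpha, s_j)
    I: set    # initial states
    F: set    # final states
    AP: set   # atomic propositions labelling states
    L: dict   # state labelling function S -> 2^AP

# Interleaved composition of two traffic lights, each cycling red -> green.
# A state (x, y) records the colour of light 1 and light 2.
states = {("r", "r"), ("g", "r"), ("r", "g")}
ts = TransitionSystem(
    S=states,
    A={"switch1", "switch2"},
    T={
        (("r", "r"), "switch1", ("g", "r")),
        (("g", "r"), "switch1", ("r", "r")),
        (("r", "r"), "switch2", ("r", "g")),
        (("r", "g"), "switch2", ("r", "r")),
    },
    I={("r", "r")},
    F=set(),
    AP={"green1", "green2"},
    L={
        ("r", "r"): set(),
        ("g", "r"): {"green1"},
        ("r", "g"): {"green2"},
    },
)

def reachable(ts):
    """All states reachable from the initial states via T."""
    seen, frontier = set(ts.I), list(ts.I)
    while frontier:
        s = frontier.pop()
        for (si, _, sj) in ts.T:
            if si == s and sj not in seen:
                seen.add(sj)
                frontier.append(sj)
    return seen

# Safety check: no reachable state labels both lights green.
assert all(not {"green1", "green2"} <= ts.L[s] for s in reachable(ts))
```

Since no state labelled with both `green1` and `green2` is even included, the safety property holds on every reachable state.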

  7. Clearly, the case in which both traffic lights enter the green state together, infinitely often, is ruled out by requiring safety property S.

  8. Distinct cases should be distinguished for the two different forms of malfunction: dysfunction, when the system does not display required behaviours, and misfunction, when the system displays behaviours not requested by its specifications (Floridi et al., 2015).

  9. It follows from this analysis that any LoA can also be considered a specification level for its lower LoAs, in that it corresponds to the functional level of an artefact prescribing functions for its implementations.

  10. That is, if the focus is on a high-level language program as an implementation of an algorithm, the program is understood as a structural level; if the focus then shifts to the computational machine as an implementation of the high-level language program, the latter has to be reinterpreted as a functional level.

  11. Note that this is not a problem for biological systems in evolution theory, new functions caused by the same structure being known as exaptations (Gould & Vrba, 1982).

  12. It would be interesting at this point to analyse such a taxonomy in the light of the conceptual distinctions made, for instance, by Fresco and Primiero (2013), Primiero (2014), and Floridi et al. (2015), and to compare it with the empirical classification of traditional software errors carried out by Horner and Symons (2019). This, however, would go far beyond the scope of this paper.

  13. Notice that, in case the validation process requires fixing the model many times by modifying the so-called hyperparameters (such as the number of layers or the size of the layers), the model may also overfit to the validation dataset. In this case, it is said that information leakage from the validation set to the trained model takes place. For more details see Chollet (2021, pp. 97-100).
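A toy simulation of this point (illustrative only: the data, the threshold ‘hyperparameter’, and the split sizes are all made up) shows why repeated tuning against the validation set calls for a further held-out test set:

```python
import random

random.seed(0)

# Toy data: inputs x in [0, 1], label 1 iff x > 0.5, with 10% label noise.
data = [(x, int(x > 0.5) ^ (random.random() < 0.1))
        for x in [random.random() for _ in range(3000)]]
train, val, test = data[:2000], data[2000:2500], data[2500:]

def accuracy(threshold, split):
    return sum(int(x > threshold) == y for x, y in split) / len(split)

# Hyperparameter search: every candidate threshold is scored on the
# validation set, so the chosen one is (slightly) fitted to it.
candidates = [i / 100 for i in range(100)]
best = max(candidates, key=lambda t: accuracy(t, val))

# The validation score of the winner is an optimistic estimate;
# only the untouched test set gives an honest one.
print("val accuracy :", accuracy(best, val))
print("test accuracy:", accuracy(best, test))
```

Because the winning threshold was selected to maximise validation accuracy, its validation score tends to overstate its accuracy on fresh data: that is the information leakage from the validation set.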

  14. It should be noted that, as with common formal verification, applying those methods to real cases often requires using abstractions and approximations to allow computational tractability. However, abstractions and approximations lead to the incompleteness of the involved verification method: the algorithm may either terminate with an ‘unknown’ answer, or terminate with a false negative, that is, a counterexample showing a violation of the verified functional property to which no actual model computation corresponds.
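A minimal illustration of how abstraction buys tractability at the price of completeness, using interval bounds (a common abstraction in neural-network verification; see Liu et al., 2021). The two-ReLU ‘network’ below is purely illustrative:

```python
def relu_interval(lo, hi):
    """Image of an interval under ReLU."""
    return max(0.0, lo), max(0.0, hi)

def affine_interval(w, b, box):
    """Interval image of x -> w . x + b over a box of input intervals."""
    lo = hi = b
    for wi, (xlo, xhi) in zip(w, box):
        lo += min(wi * xlo, wi * xhi)
        hi += max(wi * xlo, wi * xhi)
    return lo, hi

# Tiny network: y = relu(x1 - x2) + relu(x2 - x1), inputs in [0, 1].
box = [(0.0, 1.0), (0.0, 1.0)]
h1 = relu_interval(*affine_interval([1, -1], 0.0, box))
h2 = relu_interval(*affine_interval([-1, 1], 0.0, box))
y_lo, y_hi = h1[0] + h2[0], h1[1] + h2[1]

# The abstraction reports y in [0, 2], so the property "y <= 1" appears
# violated; yet concretely y = |x1 - x2| <= 1 over the whole box, because
# the two hidden units can never be large at the same time. Any
# counterexample drawn from the abstract bound is therefore spurious.
print((y_lo, y_hi))
```

The interval abstraction discards the correlation between the two hidden units, so it over-approximates the reachable outputs: exactly the mechanism by which an abstract verifier can return a counterexample to which no actual computation corresponds.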

  15. Another constraint is given by the difficulty of controlling every parameter involved in a specific function, given the complex layered structure of contemporary networks (Gerasimou et al., 2020).

  16. A mathematical model is called robust by Primiero (2020) in case it is able to include all variables of interest and all the inferential properties of the simulated empirical system.

  17. There are indeed scientific contexts in which DL models used to predict the evolution of some empirical system also bear structural similarities with the target system. The convolutional DL model used by Monk (2018) to simulate parton showers, the neural model by Choudhary et al. (2020) for simulating the Hénon-Heiles potential, and the MetNet-2 (Meteorological Neural Network 2) DL model (Espeholt et al., 2021) are some examples.

Abbreviations

DL: Deep Learning

LoA: Level of abstraction

AV: Autonomous vehicle

XAI: Explainable artificial intelligence

References

  • Abadi, M., Agarwal, A., Barham, P., Brevdo, E., Chen, Z., et al. (2015). TensorFlow: Large-scale machine learning on heterogeneous systems. Technical report, Google Brain Team.

  • Ackerman, E. (2016). Self-driving cars were just around the corner — in 1960. IEEE Spectrum, 31 Aug.

  • Alpern, B., & Schneider, F. B. (1985). Defining liveness. Information processing letters, 21(4), 181–185.

  • Alpern, B., & Schneider, F. B. (1987). Recognizing safety and liveness. Distributed Computing, 2(3), 117–126.

  • Alvarado, R. (2022a). AI as an epistemic technology. http://philsci-archive.pitt.edu/21243/

  • Alvarado, R. (2022). Computer simulations as scientific instruments. Foundations of Science, 27, 1–23.

  • Ammann, P., & Offutt, J. (2016). Introduction to software testing. Cambridge University Press.

  • Angius, N. (2013). Model-based abductive reasoning in automated software testing. Logic Journal of IGPL, 21(6), 931–942.

  • Angius, N., Primiero, G., and Turner, R. (2021). The Philosophy of Computer Science. In E. N. Zalta (Eds.), The Stanford Encyclopedia of Philosophy. Metaphysics Research Lab, Stanford University, Spring 2021 edition.

  • Angius, N., & Tamburrini, G. (2011). Scientific theories of computational systems in model checking. Minds and Machines, 21(2), 323–336.

  • Angius, N., & Tamburrini, G. (2017). Explaining engineered computing systems’ behaviour: the role of abstraction and idealization. Philosophy & Technology, 30(2), 239–258.

  • Baier, C. & Katoen, J.-P. (2008). Principles of model checking. MIT press.

  • Bengio, Y., Lamblin, P., Popovici, D., & Larochelle, H. (2007). Greedy layer-wise training of deep networks. In Advances in Neural Information Processing Systems (pp. 153–160).

  • Chalmers, D. J. (1996). Does a rock implement every finite-state automaton? Synthese, 108, 309–333.

  • Chollet, F. (2021). Deep learning with Python. Simon and Schuster.

  • Choudhary, A., Lindner, J. F., Holliday, E. G., Miller, S. T., Sinha, S., & Ditto, W. L. (2020). Physics-enhanced neural networks learn order and chaos. Physical Review E, 27, 217–236.

  • Clarke, E. M., Grumberg, O., & Peled, D. A. (1999). Model checking. MIT Press.

  • Colburn, T., & Shute, G. (2007). Abstraction in computer science. Minds and Machines, 17(2), 169–184.

  • Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., & Schiele, B. (2016). The cityscapes dataset for semantic urban scene understanding. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 3213–3223).

  • Cummins, R. (1975). Functional analysis. Journal of Philosophy, 72, 741–764.

  • Curtis-Trudel, A. (2021). Implementation as resemblance. Philosophy of Science, 88(5), 1021–1032.

  • de Villers, J., & Barnard, E. (1992). Backpropagation neural nets with one and two hidden layers. IEEE Transactions on Neural Networks, 4, 136–141.

  • Duchi, J., Hazan, E., & Singer, Y. (2011). Adaptive subgradient methods for online learning and stochastic optimization. Journal of Machine Learning Research, 12, 2121–2159.

  • Durán, J. M. (2018). Computer simulations in science and engineering: Concepts-Practices-Perspectives. Springer.

  • Durán, J. M., & Formanek, N. (2018). Grounds for trust: Essential epistemic opacity and computational reliabilism. Minds and Machines, 28, 645–666.

  • Espeholt, L., Agrawal, S., Sønderby, C., Kumar, M., Heek, J., Bromberg, C., Gazen, C., Hickey, J., Bell, A., & Kalchbrenner, N. (2021). Skillful twelve hour precipitation forecasts using large context neural networks. arXiv:2111.07472.

  • Fetzer, J. H. (1988). Program verification: The very idea. Communications of the ACM, 31(9), 1048–1063.

  • Floridi, L. (2008). The method of levels of abstraction. Minds and machines, 18(3), 303–329.

  • Floridi, L., Fresco, N., & Primiero, G. (2015). On malfunctioning software. Synthese, 192(4), 1199–1220.

  • Fresco, N., & Primiero, G. (2013). Miscomputation. Philosophy & Technology, 26, 253–272.

  • Gerasimou, S., Eniser, H. F., Sen, A., & Cakan, A. (2020). Importance-driven deep learning system testing. In 2020 IEEE/ACM 42nd International Conference on Software Engineering (ICSE) (pp. 702–713). IEEE.

  • Gould, S. J., & Vrba, E. S. (1982). Exaptation–a missing term in the science of form. Paleobiology, 8(1), 4–15.

  • Hilpinen, R. (1992). On artifacts and works of art 1. Theoria, 58(1), 58–82.

  • Hinton, G. E., Krizhevsky, A., & Wang, S. D. (2011). Transforming auto-encoders. In International Conference on Artificial Neural Networks (pp. 44–51). Springer-Verlag.

  • Hinton, G. E., & Salakhutdinov, R. R. (2006). Reducing the dimensionality of data with neural networks. Science, 28, 504–507.

  • Hoare, C. A. R. (1969). An axiomatic basis for computer programming. Communications of the ACM, 12(10), 576–580.

  • Horner, J. K., & Symons, J. (2019). Understanding error rates in software engineering: Conceptual, empirical, and experimental approaches. Philosophy & Technology, 32, 363–378.

  • Humbatova, N., Jahangirova, G., Bavota, G., Riccio, V., Stocco, A., & Tonella, P. (2020). Taxonomy of real faults in deep learning systems. In Proceedings of the ACM/IEEE 42nd International Conference on Software Engineering (pp. 1110–1121).

  • Humphreys, P. (2004). Extending ourselves: Computational science, empiricism, and scientific method. Oxford University Press.

  • Itkonen, J. & Rautiainen, K. (2005). Exploratory testing: a multiple case study. In 2005 International Symposium on Empirical Software Engineering, 2005. (pp. 10–pp). IEEE.

  • Janai, J., Güney, F., Behl, A., & Geiger, A. (2020). Computer vision for autonomous vehicles, problems, datasets and state of the art. Foundations and Trends in Computer Graphics and Vision, 12, 1–308.

  • Johnson, D. G. (2009). Computer ethics. London: Pearson.

  • Jones, C. B. (1990). Systematic software development using VDM. Prentice Hall International Series in Computer Science. Prentice Hall.

  • Joshi, A. J., Porikli, F., & Papanikolopoulos, N. (2009). Multi-class active learning for image classification. In 2009 IEEE conference on computer vision and pattern recognition (pp. 2372–2379). IEEE.

  • Ketkar, N. (2017). Introduction to PyTorch (pp. 195–208). Berkeley (CA): Apress.

  • Kim, T. K., Yi, P. H., Hager, G. D., & Lin, C. T. (2020). Refining dataset curation methods for deep learning-based automated tuberculosis screening. Journal of Thoracic Disease, 12, 5078–5085.

  • Kingma, D. P. & Ba, J. (2014). Adam: A method for stochastic optimization. In Proceedings of International Conference on Learning Representations.

  • Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2012). ImageNet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems, pp. 1090–1098.

  • Kroes, P. (2012). Technical artefacts: Creations of mind and matter: A philosophy of engineering design (vol. 6). Springer Science & Business Media.

  • Kroes, P., & Meijers, A. (2006). The dual nature of technical artefacts. Studies in History and Philosophy of Science, 37(1), 1–4.

  • Kröger, F., & Merz, S. (2008). Temporal Logic and State Systems. Springer.

  • Kuutti, S., Fallah, S., Bowden, R., & Barber, P. (2019). Deep learning for autonomous vehicle control - algorithms, state-of-the-art, and future prospects. Synthesis Lectures on Advances in Automotive Technology, 3, 1–80.

  • Leeuwen, J. V. (1990). Handbook of Theoretical Computer Science, Volume B: Formal models and semantics. MIT Press.

  • Li, X. & Guo, Y. (2013). Adaptive active learning for image classification. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 859–866).

  • Lipton, Z. C. (2018). The mythos of model interpretability: In machine learning, the concept of interpretability is both important and slippery. Queue, 16(3), 31–57.

  • Liu, C., Arnon, T., Lazarus, C., Strong, C., Barrett, C., Kochenderfer, M. J., et al. (2021). Algorithms for verifying deep neural networks. Foundations and Trends® in Optimization, 4(3-4), 244–404.

  • Ma, L., Juefei-Xu, F., Zhang, F., Sun, J., Xue, M., Li, B., Chen, C., Su, T., Li, L., Liu, Y., et al. (2018). Deepgauge: Multi-granularity testing criteria for deep learning systems. In Proceedings of the 33rd ACM/IEEE International Conference on Automated Software Engineering (pp. 120–131).

  • McLaughlin, P. (2001). What functions explain: Functional explanation and self-reproducing systems. Cambridge University Press.

  • Meng, J., Chen, P., Wahib, M., Yang, M., Zheng, L., Wei, Y., Feng, S., & Liu, W. (2022). Boosting the predictive performance with aqueous solubility dataset curation. Scientific Data, 9, 71.

  • Monk, J. W. (2018). Deep learning as a parton shower. Journal of High Energy Physics, 2018, 21.

  • Odena, A., Olsson, C., Andersen, D., & Goodfellow, I. (2019). Tensorfuzz: Debugging neural networks with coverage-guided fuzzing. In International Conference on Machine Learning (pp. 4901–4911). PMLR.

  • Pei, K., Cao, Y., Yang, J., & Jana, S. (2017). Deepxplore: Automated whitebox testing of deep learning systems. In Proceedings of the 26th Symposium on Operating Systems Principles (pp. 1–18).

  • Piccinini, G. (2007). Computing mechanisms. Philosophy of Science, 74(4), 501–526.

  • Plebe, A., & Grasso, G. (2019). The unbearable shallow understanding of deep learning. Minds and Machines, 29(4), 515–553.

  • Plebe, A., & Perconti, P. (2022). The Future of the Artificial Mind. Boca Raton: CRC Press.

  • Primiero, G. (2014). A taxonomy of errors for information systems. Minds and Machines, 24, 249–273.

  • Primiero, G. (2016). Information in the philosophy of computer science. In The Routledge handbook of philosophy of information (pp. 106–122). Routledge.

  • Primiero, G. (2020). On the foundations of computing. Oxford University Press.

  • Rapaport, W. J. (1999). Implementation is semantic interpretation. The Monist, 82(1), 109–130.

  • Rapaport, W. J. (2005). Implementation is semantic interpretation: further thoughts. Journal of Experimental & Theoretical Artificial Intelligence, 17(4), 385–417.

  • Rumelhart, D. E., Hinton, G. E., & Williams, R. J. (1986). Learning representations by back-propagating errors. Nature, 323, 533–536.

  • Rumelhart, D. E. & McClelland, J. L. (Eds.) (1986). Parallel Distributed Processing: Explorations in the Microstructure of Cognition. MIT Press.

  • Sadat, A., Segal, S., Casas, S., Tu, J., Yang, B., Urtasun, R., & Yumer, E. (2021). Diverse complexity measures for dataset curation in self-driving. In 2021 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) (pp. 8609–8616). IEEE.

  • Salay, R., Queiroz, R., & Czarnecki, K. (2017). An analysis of ISO 26262: Using machine learning safely in automotive software. arXiv preprint arXiv:1709.02435.

  • Samek, W., Montavon, G., Vedaldi, A., Hansen, L. K., & Müller, K.-R. (2019). Explainable AI: interpreting, explaining and visualizing deep learning (vol. 11700). Springer Nature.

  • Searle, J. R. (1995). The construction of social reality. Free Press.

  • Settles, B. (2012). Active learning. Synthesis Lectures on Artificial Intelligence and Machine Learning, 6(1), 1–114.

  • Shapiro, S. (1997). Splitting the difference: the historical necessity of synthesis in software engineering. IEEE Annals of the History of Computing, 19(1), 20–54.

  • Sommerville, I. (2021). Software Engineering. London: Pearson.

  • Spivey, J. M. (1988). Understanding Z: a specification language and its formal semantics (vol. 3). Cambridge University Press.

  • Symons, J., & Horner, J. K. (2020). Why there is no general solution to the problem of software verification. Foundations of Science, 25, 541–557.

  • Thung, F., Wang, S., Lo, D., & Jiang, L. (2012). An empirical study of bugs in machine learning systems. In 2012 IEEE 23rd International Symposium on Software Reliability Engineering (pp. 271–280). IEEE.

  • Turing, A. (1936). On computable numbers, with an application to the Entscheidungsproblem. Proceedings of the London Mathematical Society, 42, 230–265.

  • Turing, A. (1948). Intelligent machinery. Technical report, National Physical Laboratory, London. Reprinted in Ince, D. C. (ed.) Collected Works of A. M. Turing: Mechanical Intelligence, Elsevier Science Publishers, 1992.

  • Turner, R. (2009). Computable models (vol. 1193). Springer.

  • Turner, R. (2011). Specification. Minds & Machines, 21(2), 135–152.

  • Turner, R. (2018). Computational artefacts: Towards a philosophy of computer science. Springer.

  • Turner, R. (2020). Computational intention. Studies in Logic, Grammar and Rhetoric, 63(1), 19–30.

  • Winsberg, E. (1999). Sanctioning models: The epistemology of simulation. Science in Context, 12(2), 275–292.

  • Winsberg, E. (2010). Science in the age of computer simulation. University of Chicago Press.

  • Winsberg, E. (2022). Computer Simulations in Science. In E. N. Zalta, & U, Nodelman (Eds.), The Stanford Encyclopedia of Philosophy. Metaphysics Research Lab, Stanford University, Winter 2022 edition.

Funding

Nicola Angius was partially funded by the Research Project ANR-17-CE38-0003-01 (ANR - Agence Nationale de la Recherche) titled ‘What is a (computer) program: Historical and Philosophical Perspectives’.

Author information

Authors and Affiliations

Authors

Contributions

All authors contributed equally to the study conception and design of this paper. Material preparation and analysis were performed by both NA and AP. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Nicola Angius.

Ethics declarations

Conflict of interest

Nicola Angius and Alessio Plebe declare they have no financial interests.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Angius, N., Plebe, A. From Coding To Curing. Functions, Implementations, and Correctness in Deep Learning. Philos. Technol. 36, 47 (2023). https://doi.org/10.1007/s13347-023-00642-7
