1 Introduction

It has been established that proponents of the PP framework seek mechanistic explanations and that the various models of cognitive functions developed via PP are aimed at this kind of account (Friston et al., 2018; Gładziejewski, 2019). In line with this view, it has been argued that PP provides a sketch of a mechanism (Gładziejewski, 2019; Gordon et al., 2019; Harkness, 2015; Harkness & Keshava, 2017; Hohwy, 2015), i.e., an incomplete representation of a target mechanism in which some structural aspects of a mechanistic explanation are omitted (cf. Piccinini & Craver, 2011). Understood in this way, the sketch is defined in terms of functional roles played by the respective components, disregarding to some extent their biological or physical implementation. This raises the important question of how to understand the causal structure responsible for predictive mechanisms. It can be a simple multi-level hierarchy from simple neural levels of, e.g., pattern recognition, edge detection, color perception etc. (implemented in the early sensory system), to high-level neural representations (implemented deep in the cortical hierarchy [Sprevak, 2021]), to increasingly abstract and general levels related to Bayesian beliefs and concerning the general properties of the world; or it can be a subtler structure implemented by several different, partially independent mechanisms responsible for various phenomena.Footnote 1

The key to this type of practice is understanding cognition in terms of mechanistic causal relations (cf. Gładziejewski, 2019, p. 665). Gładziejewski suggests that the sketches of mechanisms provided by PP should be understood as models that “share common core assumptions about relevant mechanisms” but do not describe a single cognitive structure (mechanism). This means that “there are a couple of ways in which a collection of mechanisms that fall under a common predictive template could provide a schema-centered explanatory unification” (Gładziejewski, 2019, p. 666). He points to four possible research heuristics which, by providing sketches, may allow actual mechanisms to be identified:

  1. There are separate neural mechanisms that follow the same predictive scheme;
  2. Different levels within one hierarchy can explain different cognitive phenomena;
  3. Various aspects of PP mechanisms are explanatory, which means that for a given mechanism, certain aspects of its functioning may explain specific phenomena;
  4. The ways in which distinct PP mechanisms become integrated may play explanatory roles (Gładziejewski, 2019, pp. 666–667).

Regardless of which of the indicated heuristics is actually employed by PP researchers (whether it be one or a combination of several), there is no doubt that many supporters of PP seek mechanistic explanations.

As can be seen, the thesis about the mechanistic nature of PP is already reasonably well-founded, but it seems that in light of the view advocated by some mechanists (cf. Bechtel, 2019; Winning & Bechtel, 2018), (at least) some mechanistic explanations should include constraints and flows of free energy as their constitutive component. This view, which I will refer to in this paper as the constraint-based mechanisms approach,Footnote 2 could be of great importance to many debates about PP and FEP theories as it allows for a rethink of the relationship between PP, FEP, and FEP-based Active Inference.

The possibility of a mechanistic integration of PP and the FEP has already been raised by researchers. Some responses have also been offered. There are authors who share the viewpoint that the FEP carries mechanistic implications for PP, asserting that the FEP can be treated as a heuristic guide or regarded as a regulatory principle. Supporters of the first position include Paweł Gładziejewski who, in his paper Mechanistic unity of the predictive mind, states that the FEP is “a powerful heuristic guide for the development of PP” but “only puts extremely general constraint on the causal organization of organisms, perhaps to the point of lacking any non-trivial commitments about it” (Gładziejewski, 2019, p. 664). Another supporter, Dominic Harkness, claims that “the upshot of this criticism lies within the free energy principle’s potential to act as a heuristic guide for finding multilevel mechanistic explanations” (Harkness, 2015, p. 2). Jakob Hohwy supports the second position, claiming that the “FEP can be considered a regulatory principle, ‘guiding’ or ‘informing’ the construction of process theories” (Hohwy, 2020, p. 39), meaning that the FEP provides “distinct process theories explaining perception, action, attention, and other mental phenomena” (Hohwy, 2020, p. 47).

However, some researchers are not convinced by the FEP or its explanatory relationship with PP. For example, Daniel Williams in his recent paper Is the brain an organ for free energy minimization? argues that “the claim that the FEP implies a substantive constraint on process theories in cognitive science—namely, that they must describe how the brain’s mechanisms implement free energy minimization—rests on a fallacy of equivocation” (Williams, 2021, p. 8). Similarly, Matteo Colombo and Patricia Palacios in their paper Non-equilibrium thermodynamics and the free energy principle in biology note that “because of a fundamental mismatch between its physics assumptions and properties of its biological targets, model-building grounded in the free energy principle exacerbates a trade-off between generality and biological plausibility” (Colombo & Palacios, 2021, p. 2). Colombo defends a slightly different position in a paper co-written with Cory Wright, where they acknowledge that the analyses carried out by the FEP’s supporters can be treated as sketches of mechanisms in the sense of Piccinini and Craver (2011). However, they treat them only as weak explanatory idealizations: “Some of the confusions in recent debates surrounding the FEP, organicism, and mechanism depend on indulging this sort of metaphysics without carefully considering the epistemic and pragmatic roles that ‘rampant and unchecked’ idealizations, like those involved in FEP, play in science” (Colombo & Wright, 2021, p. 3486).

In this paper I will take a different starting point. I want to demonstrate, by reference to the constraint-based mechanisms approach, that the FEP offers an explanatorily relevant (variational) constraint on the causal organization of any and all systems equipped with generative models, explained mechanistically by PP. In other words, I will claim that the FEP provides a constraint which determines PP’s scheme of mechanism.

This paper has the following structure: in Sect. 2, I present an overview of the PP and FEP frameworks and explain why, when analyzing predictive mechanisms, one should take into account the quantity described in the literature on the FEP as variational free energy (VFE). In Sect. 3, I sketch the new mechanical philosophy and its characteristic systems tradition, which describes explanation in terms of the identification and decomposition of mechanisms. I also present a recent mechanistic position, which I refer to as the constraint-based mechanisms approach, together with the heuristics of constraint-based mechanisms that characterize it. I then formulate a mechanistic interpretation of PP and ask whether it can meet the norm defined by the heuristics of constraint-based mechanisms. The context of the question is set by the discussion on the FEP and its explanatory relationship with PP. In Sect. 4, I discuss the two main possible interpretations (realistic and instrumental) of the statement that self-organizing systems minimize VFE. Discussing them is important because it provides an initial answer to the question of whether the FEP determines the energetic (in an information-theoretic sense) constraint for mechanistic PP. In Sect. 5, I articulate the position of mechanistic realism, which asserts the feasibility of employing heuristics based on the constraint-based mechanisms approach. I argue that the interpretation of the FEP that I call moderately realistic is compatible with mechanistic realism. In Sect. 6, I discuss Karl Friston’s argument from Bayesian mechanics that VFE coincides with thermodynamic free energy (TFE). If Friston’s perspective is accurate, the FEP serves a similar explanatory role in elucidating living organisms as thermodynamics does in explaining physical systems. However, in this section, I reject Friston’s argument because of its instrumental character, which precludes mechanistic realism and the application of the heuristics of constraint-based mechanisms. As a result, in Sect. 7, I present an argument in favor of moderate realism regarding the FEP and FEP-based PP. This argument is supported by empirical evidence from investigations into neural computations and the thermodynamics of information. Next, I discuss the ontological commitments of this position, and I also formulate a provisional response to the objections of those authors who deny explanatory value to the FEP. In the Conclusion, I summarize the analyses carried out.

2 Predictive processing and variational free energy

PP is a process theory of the brain that provides a computational model of cognitive mechanisms and core processes that underwrite perception and cognition. Some advocates of PP believe that it can be used to unify the models of perception, cognition, and action theoretically (Clark, 2013; Hohwy, 2015; Seth, 2015). Specific versions of PP are grounded in the same process of precision-weighted, hierarchical, and bidirectional message passing and error minimization (Clark, 2013; Hohwy, 2020). In this framework, perceptual and cognitive processes are conceived as being the result of a computational trade-off between (hierarchical) top-down processing (predictions based on the model of the world) and bottom-up processing (prediction errors tracking the difference between predicted and actually sensed data). A characteristic feature of this view is the assumption that, in order to perceive the world, the cognitive system must resolve its uncertainty about the ‘hidden’ causes of its sense states. This is because the causes of the sensory signals are not directly recognized or detected, but instead must be inferred by a hierarchical, multi-level probabilistic (generative) model. In PP, the activity of the brain (or cognitive system) is understood as instantiating or leveraging a generative model (cf. Clark, 2016), which is, generally speaking, a model of the process that generated the sensory data of interest. In short, PP purports to explain the dynamics of the brain by appealing to hierarchically organized bidirectional brain activity, cast as instantiating a generative model.

The generative model is defined as the joint probability p(e, h) of the “observable” data e (a sensory state) and h, a hypothesis about these data (trees, birds, glasses, etc.). In other words, a generative model is the product of p(h) (the prior over states) and p(e|h) (the likelihood, i.e., the probability of the evidence if the hypothesis is true). This means that the generative model is a statistical model of how observations are generated (strictly speaking, a description of causal dependencies in the environment and their relation to the sensory signal). It uses prior distributions p(h) (which determine the probability of a hypothesis before the evidence is taken into account) that the system applies to the environment about which it makes inferences.

The model minimizes the so-called prediction errors, i.e., the differences between the expectations of the organism (its “best guess” about what caused its sensory states) and what the organism actually observes. To minimize prediction errors, the generative model continuously creates statistical predictions about what is happening or can happen in the world. This means that updating the likelihoods and priors based on prediction errors is a mechanism that can be described in terms of Bayesian inference, i.e., a statistical inference in which Bayes’ rule is used to update the probability of a hypothesis as more evidence or data becomes available.

Technically speaking, according to Bayes’ rule

$$p(h \mid e) = \frac{p(e \mid h)\, p(h)}{p(e)},$$

the generative model p(e, h) = p(e|h)p(h) yields the posterior probability p(h|e), which in practice allows the system to adopt the most probable hypothesis explaining the nature and causes of the sensory signal, taking into account the available sensory data.Footnote 3 This hypothesis enables the minimization of the long-term average prediction error (Hohwy, 2020). Moving from p(e|h) to p(h|e), i.e., inverting the likelihood mapping, allows one to update beliefs from prior to posterior beliefs (Smith et al., 2022, p. 3). Proponents of the PP framework argue that the model approximates Bayesian inference rather than computing it exactly (cf. Clark, 2013). In PP, the model implements an algorithm that approximates Bayesian inference so that the prediction error is gradually minimized, which maximizes the posterior probabilities of the hypotheses.
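As a minimal numerical sketch of Bayes’ rule, p(h|e) = p(e|h)p(h)/p(e), consider a toy discrete hypothesis space. The hypothesis labels, the probabilities, and the function name `posterior` are assumptions made purely for illustration; they are not taken from the PP literature.

```python
# Toy illustration of exact Bayesian inference over discrete hypotheses.
# All names and numbers are hypothetical, chosen only for illustration.

def posterior(prior, likelihood):
    """Return (p(h|e) for each h, p(e)) given p(h) and p(e|h) for one observed e."""
    # Joint probability p(e, h) = p(e|h) * p(h): the generative model evaluated at e.
    joint = {h: likelihood[h] * prior[h] for h in prior}
    # Model evidence p(e) = sum over h of p(e, h).
    evidence = sum(joint.values())
    # Bayes' rule: p(h|e) = p(e, h) / p(e).
    return {h: joint[h] / evidence for h in joint}, evidence

prior = {"tree": 0.5, "bird": 0.3, "glass": 0.2}        # p(h)
likelihood = {"tree": 0.1, "bird": 0.7, "glass": 0.2}   # p(e|h) for the observed e

post, evidence = posterior(prior, likelihood)
best = max(post, key=post.get)   # most probable hypothesis given e
```

Note that the sum defining the evidence p(e) ranges over every hypothesis. For realistic hypothesis spaces this sum (or integral) is typically intractable, which is one standard motivation for approximate inference.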

This way, when the model minimizes the prediction error, it also minimizes a certain quantity that is always greater than or equal to the surprisal (the negative log probability of an observation/outcome). The surprisal itself cannot be minimized directly due to ignorance of the underlying causes of the sensory signals (Friston, 2009, p. 294). This quantity is the objective function known as VFE or, with the sign reversed, as the evidence lower bound (cf. Winn & Bishop, 2005). The introduction of VFE helps to convert exact Bayesian inference into approximate Bayesian inference.Footnote 4
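The bound on surprisal can be made explicit with a standard identity from variational inference. In the sketch below, q(h) denotes an approximate posterior over hypotheses and F[q] denotes VFE; both symbols are notational assumptions of this illustration rather than the notation of the works cited.

```latex
\begin{aligned}
F[q] &= \mathbb{E}_{q(h)}\big[\ln q(h) - \ln p(e, h)\big] \\
     &= \mathbb{E}_{q(h)}\big[\ln q(h) - \ln p(h \mid e)\big] - \ln p(e) \\
     &= D_{\mathrm{KL}}\big[q(h)\,\|\,p(h \mid e)\big] - \ln p(e) \;\geq\; -\ln p(e).
\end{aligned}
```

The second line uses p(e, h) = p(h|e)p(e), and the inequality follows because the KL-divergence is non-negative. Equality holds exactly when q(h) = p(h|e).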

Why is this important? Approximate Bayesian inference proceeds by minimizing VFE, which, for a given observation, tracks the discrepancy between the model’s approximate posterior distribution and the target distribution. The approximate posterior distribution over states, denoted q(h) (such that each q(h) ∈ Q is a possible approximation to the exact posterior distribution), embodies simplifying assumptions about the nature of the true posterior distribution. By iteratively updating the approximate posterior (initially arbitrary), one can find a distribution that approximates the exact posterior. The next step is to measure the similarity between the approximate posterior q(h) and the true posterior p(h|e). Formally, this means minimizing the so-called Kullback–Leibler divergence (KL-divergence). Importantly, the KL-divergence cannot be evaluated directly, because it depends on the unknown true posterior, and therefore the model must optimize a different function (i.e., VFE), which bounds the model evidence. For a given observation, the smaller the VFE, the smaller the KL-divergence. When the KL-divergence is zero, the distributions match; it gets larger the more dissimilar the distributions become. In variational inference, the model iteratively updates the approximate posterior q(h) until it finds the distribution that minimizes VFE, at which point q(h) will approximate the true posterior p(h|e) (Smith et al., 2022; cf. Buckley et al., 2017).
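The iterative scheme described above can be sketched on a toy discrete model. This is an illustration under stated assumptions, not the algorithm of any particular PP model: the numbers are invented, the approximate posterior is written q(h) and parameterized by softmax logits, and plain gradient descent with a fixed step size stands in for whatever update rule a concrete model would use. The true posterior is computed only so that the result can be checked.

```python
import math

# Hypothetical toy model: three hypotheses and one observed datum e.
prior = [0.5, 0.3, 0.2]        # p(h)
likelihood = [0.1, 0.7, 0.2]   # p(e|h) for the observed e

joint = [l * p for l, p in zip(likelihood, prior)]   # p(e, h)
evidence = sum(joint)                                # p(e)
surprisal = -math.log(evidence)                      # -ln p(e)
true_posterior = [j / evidence for j in joint]       # p(h|e), for checking only

def softmax(z):
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [v / total for v in exps]

def free_energy(q):
    """F[q] = sum_h q(h) * (ln q(h) - ln p(e, h))."""
    return sum(qh * (math.log(qh) - math.log(jh)) for qh, jh in zip(q, joint) if qh > 0)

def kl(q, p):
    """KL[q || p] for discrete distributions."""
    return sum(qh * (math.log(qh) - math.log(ph)) for qh, ph in zip(q, p) if qh > 0)

logits = [0.0, 0.0, 0.0]   # arbitrary starting point: uniform q(h)
history = []               # VFE at each iteration
for _ in range(500):
    q = softmax(logits)
    history.append(free_energy(q))
    # Gradient of F with respect to the logits z of a softmax-parameterized q:
    # dF/dz_k = q_k * (g_k - sum_j q_j * g_j), with g_k = ln q_k - ln p(e, h_k).
    g = [math.log(qh) - math.log(jh) for qh, jh in zip(q, joint)]
    mean_g = sum(qh * gh for qh, gh in zip(q, g))
    logits = [z - 0.5 * qh * (gh - mean_g) for z, qh, gh in zip(logits, q, g)]

q_final = softmax(logits)
```

Throughout the run, each recorded VFE value stays at or above the surprisal, and the gap between the two equals the KL-divergence between q(h) and the true posterior. The optimizer never evaluates `true_posterior` or `kl`; it only needs the joint p(e, h), which is the point of replacing the KL-divergence with VFE as the objective.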

The association of PP with VFE helps explain how the generative model minimizes prediction errors by Bayesian inference approximation, which can be interpreted as the way in which neural information processing mechanisms perform variational inference. This remark is crucial for further analyses.

To sum up: predictive mechanisms can be described in terms of the realization of variational principles (cf. Friston et al., 2017). In research practice, this means that in order to make any variational inference algorithm concrete, we must define the forms of the variational posterior and the generative model, which in the case of PP means (relying on the Laplace assumption) that posterior probability densities are normal (Gaussian). With this assumption in place, free energy can be viewed as the sum of the long-term average prediction error, which is supposed to be linked to the FEP (cf. Friston, 2010). This means that, in the context of PP, inference involves minimizing long-term average prediction error through the model’s optimization of the statistics of an approximate posterior distribution. Modelers postulate and refine this distribution to align with the desired target distribution (Millidge et al., 2021, p. 7). This is an important observation for the very understanding of PP because it allows us to think about the normative function of predictive mechanisms, namely the minimization of long-term average precision-weighted prediction error, in terms of free energy minimization.
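To illustrate how Gaussian assumptions turn free energy into precision-weighted prediction error, here is a minimal sketch for a single-level model. Everything in it is an assumption made for illustration: the model is linear with an identity mapping from hidden cause to observation, the variable names and numbers are invented, and only a single observation is treated, so the long-term averaging mentioned above is not modeled.

```python
# Hypothetical one-level linear Gaussian model, for illustration only:
#   prior:      h ~ N(mu_prior, 1 / pi_prior)
#   likelihood: e ~ N(h, 1 / pi_sensory)
# With a Gaussian form for the approximate posterior, VFE reduces (up to
# additive constants) to a sum of precision-weighted squared prediction errors.

def free_energy(mu, e, mu_prior, pi_sensory, pi_prior):
    eps_sensory = e - mu          # sensory prediction error
    eps_prior = mu - mu_prior     # deviation of the estimate from the prior expectation
    return 0.5 * (pi_sensory * eps_sensory ** 2 + pi_prior * eps_prior ** 2)

def infer(e, mu_prior, pi_sensory, pi_prior, rate=0.05, steps=2000):
    mu = mu_prior                 # start from the prior expectation
    for _ in range(steps):
        # Gradient descent on F:
        # dF/dmu = -pi_sensory * (e - mu) + pi_prior * (mu - mu_prior)
        grad = -pi_sensory * (e - mu) + pi_prior * (mu - mu_prior)
        mu -= rate * grad
    return mu

e, mu_prior, pi_sensory, pi_prior = 2.0, 0.0, 4.0, 1.0
mu_hat = infer(e, mu_prior, pi_sensory, pi_prior)

# For this conjugate model the exact posterior mean is the precision-weighted
# average of the observation and the prior expectation.
mu_exact = (pi_sensory * e + pi_prior * mu_prior) / (pi_sensory + pi_prior)
```

In this sketch the estimate settles between the prior expectation and the observation, closer to whichever carries the higher precision. That is the sense in which precision weighting arbitrates between top-down predictions and bottom-up prediction errors.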

At this point, however, difficulties arise regarding the linking of the PP framework with the research framework motivated by the FEP. Before discussing them (cf. §4), it is necessary to at least briefly explain what the FEP is.

The FEP was introduced by Karl Friston and colleagues as a mathematical framework that specifies the objective function that self-organizing systems need to minimize in order to change their relationship with the environment and maintain thermodynamic homeostasis (Friston, 2009, 2010, 2012; Friston & Stephan, 2007; Friston et al., 2006; cf. Andrews, 2021). Originally, the FEP was a principle explaining how the sensory cortex infers the causes of its inputs and learns causal regularities. What distinguishes the FEP from other theories of inference (cf. Gregory, 1966; Rock, 1983) is the fact that all cognitive processes and functions, not only perceptual ones, can be explained in terms of one unifying principle, which is the minimization of free energy (Bruineberg et al., 2021, p. 3; cf. Friston, 2010). Later, the validity of the FEP was extended from perception and action to the organization of all self-organizing systems: from unicellular organisms to social networks (cf. Friston, 2009, p. 293; 2013; Wiese & Friston, 2021).Footnote 5

According to the current formulation of this principle,Footnote 6 any self-organizing system that is at a nonequilibrium steady-state (NESS) with its environment must minimize its free energy.Footnote 7 In other words, any “thing” that achieves NESS can be construed as performing Bayesian inference with posterior beliefs that are parameterized by the thing’s (model’s) internal states. Put differently, the FEP offers an interpretation of mechanical theories of systems as if they possess (Bayesian) beliefs (Ramstead et al., 2023, p. 2). This is related to the fact that the state flow of a given self-organizing system can be described as a function of its NESS density. The system, if it exists, can be described as a random dynamical system (in the sense of dynamical systems theory, DST), that is, a system that evolves over time subject to random fluctuations. It must be added that any self-organizing system that is at NESS, i.e., one that has an attracting set, can be described in terms of Markov blankets (Friston, 2013; Friston et al., 2020; Wiese & Friston, 2021).Footnote 8

The NESS density is the probability of finding the system in a particular state when it is observed at random (Friston et al., 2020, p. 4). In this sense, everything that exists is characterized by properties that remain unchanged or stable enough to be measured over time. In other words, this means that the states of a given system behave as if they are trying to minimize exactly the same quantity: the surprisal of states that constitute the thing, system, and so on. That is, everything that exists will act as if to minimize the entropy of its particular states over time. Thus, open systems that are far from equilibrium resist the second law of thermodynamics (Friston & Stephan, 2007; cf. Davies, 2019; Ueltzhöffer, 2019). What exists must be in a sense self-evidencing, meaning that it must maximize a particular model evidence or equivalently minimize surprisal (cf. Hohwy, 2016). This way, according to Friston and colleagues, it is possible to interpret the flow of (expected) autonomous states of the model as a gradient flow on what we know as VFE,Footnote 9 which at the same time allows us to think of systems that have Markov blankets as “agents” that optimize the evidence for their own existence. In this sense, their internal states with the blanket surrounding them are (in some sense) autonomous (Kirchhoff et al., 2018, p. 2; cf. Friston et al., 2020). Autonomy understood in this way allows us to think of “agents” as adaptive systems, where adaptivity refers to an ability to operate differentially in certain circumstances. This means that a system that is not adaptive does not have a Markov blanket and cannot exist.Footnote 10

On the basis of the analyses conducted so far, it can be concluded that the FEP, as a formal statement of the existential imperatives for any system that manages to survive in a changing environment, can be treated as a generalization of the second law of thermodynamics to NESS (Parr et al., 2020). In that sense, the FEP is true for any bounded stationary system that is far from equilibrium, because the FEP applies to all self-organizing systems at NESS (meaning that the FEP applies to all systems equipped with a generative model because the NESS density can be described in terms of a generative model [Friston, 2019, p. 89; cf. Sakthivadivel, 2022]).Footnote 11

3 Systems tradition of mechanistic explanation and the constraint-based mechanisms approach

In §1, I drew attention to the fact that many researchers either have doubts about the actual integration of PP with the FEP, in which the FEP would make an explanatorily significant contribution to mechanistic PP (cf. Gładziejewski, 2019; Harkness, 2015; Hohwy, 2020), or even deny such a possibility (cf. Colombo & Palacios, 2021; Colombo & Wright, 2021; Williams, 2021). In this paper, I propose a different research perspective, according to which the FEP imposes an explanatorily relevant informational constraint (i.e., VFE) on the mechanistic architecture postulated by PP. In order to justify this view, I will refer to the position I call the constraint-based mechanisms approach. Before I develop my argument, however, it is necessary to explain, albeit briefly, what this approach is.

Scientific research can be described in terms of discovering and describing mechanisms. In many fields of science, it is assumed that in order to formulate a satisfactory explanation of the phenomenon under study, one needs to provide a decomposition of its mechanism. Mechanistic explanations are used with great success in neuroscience as well as in biological, physical, and social sciences (cf. Glennan & Illari, 2018). This new mechanistic explanatory program became the dominant view across many debates in the philosophy of science (Bechtel, 2008; Bechtel & Richardson, 1993/2010; Craver, 2007; Craver & Darden, 2013; Machamer et al., 2000).

The new mechanism comes with the assumption that a distinction should be made between explanations which are componential or constitutive and etiological explanations, which explain a phenomenon by describing its antecedent causes. Constitutive explanations account for a phenomenon by describing its underlying mechanism, i.e., the relation between the behavior of a mechanism as a whole and the organized activities of its individual components is constitutive (cf. Salmon, 1984).Footnote 12 Constitutive explanations adopt a strategy of decomposing high-level cognitive capacities into components that are responsible for various information processing operations, and then using various computational models to show how these operations together explain a given phenomenon. Decomposition is a characteristic determinant of the ‘systems tradition’ (Craver, 2007; cf. Bechtel & Richardson, 1993/2010; Cummins, 1975; Fodor, 1968; Simon, 1969). In this tradition, explanation is understood as a matter of decomposing systems into their parts to show how those parts are organized in such a way as to exhibit the explanandum phenomenon.

The systems tradition is currently the dominant approach to explanations formulated in biology, systems research, and cognitive neuroscience, while decomposition is the central heuristic strategy in mechanistic explanations besides the identification of mechanisms (Bechtel & Richardson, 1993/2010; cf. Bechtel, 2008; Craver, 2007; Illari & Williamson, 2013). However, the mechanistic view of explanation has met with controversy (cf. Koutroufinis, 2017; Silberstein & Chemero, 2013). Moreover, some authors defend dynamical explanation as an alternative to mechanistic explanation (cf. Stepp et al., 2011).Footnote 13

3.1 What about constraints?

Some researchers (cf. Bechtel, 2018, 2019, 2021; Bechtel & Bollhagen, 2021; Winning, 2020; Winning & Bechtel, 2018) point out that the decomposition strategy, as understood by the new mechanists, assumes that there is a composition or causation relationship (i.e., causal production) between processes present in mechanisms (where one process, an organized set of causal processes, is “responsible for” the implementation of another). Such a view, however, ignores two important features of cognitive mechanisms:

  1. Mechanisms of this kind primarily act to control production mechanisms, i.e., mechanisms which are responsible for bodily movement and physiological processes. This type of relationship can be called control, and it is as important for the understanding of the nature of mechanisms and their explanations as the relationships of causation and composition (Winning & Bechtel, 2018, p. 2). These are, therefore, mechanisms that help to maintain the internal environment of the given organisms. The analysis of control mechanisms is important because they allow organisms to quickly adapt to their environment. Therefore, they perform an important adaptive function and are responsible for the autonomy of the individual, as they contribute to the maintenance of the existence of a given organism. In this sense, they are normative because they contribute to the self-maintenance that is the norm of autonomous living systems (cf. Bickhard, 2003). Self-maintenance is the norm (what is good or bad for the system) in the sense that it “is not externally interpreted or derived from an adaptive history but defined intrinsically by the very organization of the system” (Barandiaran & Moreno, 2006, p. 174);
  2. High-level cognitive mechanisms are components of a highly developed and complex network of heterarchically organized control systems whose aim is to perform a given cognitive task (Bechtel, 2019, p. 621, cf. Pattee, 1991). By heterarchical organization, I mean a distributed causal network in which a given (production) mechanism is regulated by multiple (control) mechanisms without these control mechanisms being themselves subsumed under a higher-level controller. This means that their organization is horizontal and not vertical, as is the case with hierarchical organization (cf. Bechtel & Bich, 2021).Footnote 14

Features (1) and (2) are extremely important, and omitting them when explaining cognitive mechanisms makes the explanations incomplete, violating the standard of mechanistic explanations (Craver & Kaplan, 2018). This may result in “incorrect accounts of cognition” (Bechtel, 2019, p. 621).Footnote 15 Taking account of these two aspects of cognitive processes, i.e., their function in controlling production mechanisms and their non-autonomous character, leads to the conclusion that their explanation should also cover components other than those previously considered (some of which are flexible and able to be operated on and altered by other mechanisms).Footnote 16 This means that the mechanisms are organized not only in terms of production and composition, but also in terms of control. Such a view thus presupposes a revision of the systems tradition in which “processes are controlled by other processes, and mechanisms are controlled by other mechanisms, often hierarchically” (Winning & Bechtel, 2018, p. 3).

Moving away from the classical understanding of the systems tradition does not mean abandoning the norms of mechanistic explanation, but rather extending them and recognizing that the concept of constraint is also important from the explanatory perspective. The concept of constraint comes from classical mechanics, where it was used to describe the reduction of the degrees of freedom available to components organized into macroscale objects. Constraints define some limits on independent behavior but also create possibilities (Hooker, 2013). For example, in contexts where there is a source of (thermodynamic) free energy, constraints can be used to direct the flow of this energy. This means that elements of biological mechanisms can be used to limit the flow of available free energy so that work is done (which can be used to generate particular phenomena). Some (control) mechanisms are therefore systems of constraints that restrict the flow of free energy to perform work. Therefore, the operation of control mechanisms leads to behaviors or physiological processes that would not be possible without the changes that constraints make in the mechanisms of production. Controlling the production mechanisms is essential because they are constrained to do work as long as free energy is available. The same is true for artifacts. For example, an on/off switch enables the user of a given machine to control it so that it uses energy and carries out the activities it was designed for (Bechtel, 2019, p. 623).Footnote 17

Constraints understood in this way do not only (or at all) function as the context or background conditions in which a given mechanism is implemented, but most of all they are its constitutive (in the sense of being responsible for producing a given phenomenon, resp. mechanism) component because “mechanical systems inherently contain a ‘thicket’ of constraints” (Winning, 2020, p. 20).Footnote 18

Bechtel (2018, 2019, 2021), Bechtel and Bollhagen (2021), Winning and Bechtel (2018), and Winning (2020) emphasize the need to refer to constraints, linking them with the necessity to include both constraints and energy flows as those elements which, apart from entities and activities, are relevant for the explanation of mechanisms at higher levels of organization.Footnote 19 It is the constraints and the flows of free energy that make living organisms “dissipative structures”,Footnote 20 which means “that they actually use the second law of thermodynamics to their advantage to maintain their organization” (Winning & Bechtel, 2018, p. 3; cf. Moreno & Mossio, 2014). This way, living organisms—unlike most “things”—develop while maintaining their autonomy, rather than being degraded by the flow of energy and interaction with the environment.Footnote 21

Biological mechanisms derive their causal efficacy from being constrained systems: “An active causal power exists when a system within a larger system is internally constrained in such a way as to externally constrain under certain conditions” (Winning, 2020, p. 28). In other words, constraints determine the causal powers of mechanisms in such a way that they direct the flows of free energy so that biological systems may remain in a state of energy non-equilibrium with the environment. Such mechanisms are part of a heterarchical network of controllers that guarantees the biological autonomy of a given system. Based on this, mechanisms are systems of constraints that restrict the flow of free energy to perform work (Bich & Bechtel, 2021, p. 2).

Mechanisms are active and serve to maintain the autonomy of biological systems as a result of the constrained flows of free energy. Including these kinds of constraints in the explanation of activities means breaking with the standard account of mechanistic explanation (systems tradition).Footnote 22 If the energetic dimension is ignored, “at some point, such research typically bottoms out” and “this process leaves the active nature of activities unexplained” (Bechtel & Bollhagen, 2021, p. 17) because “a completely unconstrained system will have no behaviors; it would simply be a disorganized motion of particles” (Winning & Bechtel, 2018, p. 7). The approach that takes into account the need to refer to constraints and flows of free energy will be referred to as the ‘constraint-based mechanisms approach’ and its postulate as the heuristics of constraint-based mechanisms. It is important to emphasize that this approach is not so much a break with the systems tradition as a significant modification of it.Footnote 23

3.2 What about predictive processing?

In §1, I have already discussed the mechanistic commitments of PP. We can now take the next step. From the point of view of the constraint-based mechanisms approach we should note that, if PP explains its phenomena mechanistically, then it is legitimate to ask whether the mechanistic explanations based on the PP framework include constraints and the energy dimension as their constitutive component. This is not a trivial or secondary question, because, according to the heuristics of constraint-based mechanisms, mechanistic PP should also include energy processes. The answer is not obvious. Let us note, however, that there are indications that the above heuristic is used by researchers working in the PP framework.

On the one hand, many of PP’s supporters use the term “constraint” in their considerations to refer to perceptual inference in the brain. For example, “the only constraint on the brain’s causal inference is the immediate sensory input” (Hohwy, 2013, p. 14), but “immediate sensory input is not the only constraint; there are, in addition, general beliefs about the world, specific hypotheses about the current state of the world, and ongoing sensory input” (Anderson, 2017, p. 3) and “perceptual experience is determined by the mutual constraint between the incoming sensory signal and ongoing neural and bodily processes, and no aspect of that content can be definitively attributed to either influence” (Anderson, 2017, p. 17). It is also worth adding that the levels of bidirectional hierarchical structure are constraints for each other (Clark, 2013, p. 183; cf. Gordon et al., 2019). Others have suggested that “without independent constraints on their content, there is a significant risk of post hoc model-fitting” (Williams, 2020, p. 1753). However, it is not clear in what sense these authors use this term and whether they use it in the same way.Footnote 24

These various uses of the concept of constraint are difficult to relate directly to the understanding of constraints as control mechanisms, which I defend in this paper. The constraints discussed by these authors, however, reveal the non-trivial commitment of PP. Namely: the functioning of predictive mechanisms depends on the existence of various types of constraints, which on the one hand limit the content of the generative model, and on the other hand, enable its adaptation to the environment, making it an effective adaptive tool to maintain the autonomy of the organism. The perspective I defend allows us to specify the functions of constraints in PP and to study them in a more systematic way. What is important is the question of how certain constraints are constitutive of predictive mechanisms. In other words, the point is to demonstrate how such and such organization of predictive mechanisms constrains free energy so that it is possible to perform the work required to generate particular phenomena, resp. predictions.

On the other hand, broadly speaking, we have to note that the findings within the FEP and NESS mathematics (expressed in the language of DST)—according to which, if something exists then it must exhibit properties as if it is optimizing a VFE—look like they coincide with the heuristics of constraint-based mechanisms whereby mechanisms are active and serve to maintain the autonomy of biological systems as a result of the constrained release of free energy. It seems that mechanistic PP should take into account the energetic dimension of predictive mechanisms. Is it really so? The full answer to this question depends on further empirical solutions, and it is certainly not only an a priori answer. Nevertheless, I argue that if the arguments presented above are correct, then it should be asked if FEP-based PP meets the requirements of the constraint-based mechanisms approach and allows one to think of predictive mechanisms as constitutive control mechanisms for autonomous systems armed with a generative model. I will devote my further analysis to answering this question.

4 What does it mean for the system that it minimizes variational free energy?

The connection between PP and the FEP raises a number of doubts, which can be reduced to two main issues: (1) the very interpretation of the FEP as a principle of modeling self-organizing systems armed with generative models; (2) the question of how the FEP determines the energetic (in the information-theoretical sense) constraint for the mechanistic PP. Let me start by outlining the first difficulty. I will devote another section to the second.

I stated earlier that under the mathematical framework of the FEP, PP looks like it coincides with the heuristics of constraint-based mechanisms. But why do I use the terms “looks like” and “as if”?Footnote 25 I do it because this is how some proponents of the FEP define its application to autonomous systems: “physical systems that look as if they encode probabilistic beliefs about the environment”; “self-organising system that looks as if it is modelling its embedding environment” or “all systems that look as if they engage in inference” (Ramstead et al., 2023, pp. 1, 2, 18) and so on. What does the phrase “as if” mean? Simon McGregor defines its use as follows: “To say that something behaves ‘as if’ it has property X usually implies that it does not, in fact, have property X. However, there is clearly a sense in which a system possessing property X must also behave as if it had property X; it is in this, less restrictive, sense that we intend the phrase ‘as if’. In other words, we classify both the regulation of temperature by a thermostat, and also the pursuit of prey by an eagle, as ‘as if’ agency” (McGregor, 2017, p. 72). McGregor distinguishes between two senses of “as if”. In the first one (“instrumental”), the system can be described as if it had a given property, even though it does not actually have it, and in the second (“realistic”), it can be described as if it had a given property precisely because it has it.Footnote 26

This duality allows us to see that the use of the phrase “as if” in relation to systems that are supposed to minimize VFE can be interpreted in at least several ways: from the realistic interpretation, where VFE is a quantity (or means a quantity) that is minimized by biological systems that maintain their organization – in this approach, VFE cannot be reduced to researchers’ construction or explained only in terms of the practice of modelingFootnote 27; to various anti-realistic or instrumental interpretations in which the FEP is a construction devised by scientists to describe the dynamics of any self-organizing system that is at NESS with its environment without any implications for their actual causal structure. In this approach, VFE looks like a quantity that relates to the models made by scientists, while the FEP serves to designate a model structure on the basis of which specific models are constructed (cf. Andrews, 2021).Footnote 28

The discussion so far concerning the ontological and epistemological commitments of the FEP is rich. It is worth mentioning the papers of Andrews (2021, 2022), Bruineberg et al. (2021), Kirchhoff et al. (2022), Ramstead et al. (2001).Footnote 37

I argue that both empirical and formal findings will most probably determine that there are such phenomena (e.g., the neural computations performed by brains), the explanation of which, according to the constraint-based mechanisms approach, should take into account the energetic constraint of VFE. Otherwise, such an explanation fails to capture the characteristic properties that distinguish the biotic systems that are at NESS from those that can be thermodynamically described as a heat bath.

7.2 Ontological commitments of the moderate realism

If it is true that the free energy flows constitutive of the active mechanisms can be described in terms of minimization of VFE, then it seems that there are no formal obstacles to acknowledging that the mechanistic decomposition of generative models minimizing the average prediction error should refer to the minimization of VFE as a constitutive constraint for these mechanisms. For this reason, I argue that one should adopt moderate realism about the FEP and PP. Its legitimacy is supported by explanatory considerations, integration possibilities regarding PP and perhaps other research frameworks, as well as relatively weak ontological commitments regarding the architecture of target phenomena. Moderate realism allows one to maintain the quantity of VFE without incurring the debts of adopting instrumentalism.

Let’s take a closer look at these ontological commitments that result from adopting moderate realism about PP and FEP, resp. VFE. Firstly, this position assumes that formal structures such as generative models, VFE or TFE, are interpreted as part of explanations in the ontic sense, i.e., the exhibitions “of the ways in which what is to be explained fits into natural patterns or regularities … [and] usually takes the patterns and regularities to be causal” (Salmon, 1984, p. 293, cf. Craver, 2013). In this sense, moderate realism corresponds to mechanistic realism and the constraint-based mechanisms approach. In practice, this means that moderate realism does not map literally the formal structure (generative model or Bayesian network) onto the target phenomena, which would involve committing the literalist fallacy, but assumes that there are structures that cannot be reduced solely to the aggregation of causes and which implement some causal mechanism that can be described (approximately) in terms of generative models minimizing VFE, resp. long-term average prediction error. Therefore, it is important to assert that the formal structures (Bayesian modeling in our case) are such and such, because the world has genuinely causal structures, at least some of which are entities and activities organized to form mechanisms responsible for the phenomena that are described in terms of Bayesian optimization.

This view can be further elucidated through the findings of Kirchhoff, Kiverstein, and Robertson. These authors state that realism in science does not mean that all entities postulated by a given theory or model are literally true (Kirchhoff et al., 2022, p. 12). A theory may incorporate both “OK-entities” (such as electrons and similar entities) and “supposedly non-OK-entities” (such as numbers or theoretical ideals) (Psillos, 2011, p. 6). Consequently, it is important to acknowledge that each model includes parts that are fictional entities, which bear resemblance to target systems in various ways. These fictional entities facilitate the understanding of real system dynamics within the model (Kirchhoff et al., 2022, p. 13), but they do not themselves represent specific causal structures in a literal sense. The expectation of a literal interpretation of fictional entities gives rise to the literalist fallacy, as mentioned earlier. One such fictional entity is VFE. Therefore, process theories like PP should be viewed as approximations of the actual causal structures or patterns in the world. They are approximations due to the inherent complexity of target systems. Hence, I argue that moderate realism posits that a given model fits the data without a literal mapping. Instead, it is approximately true in relation to the data (cf. Kirchhoff et al., 2022, p. 16; Stanford, 2003).

Let us now delve into the relationship between the FEP and PP. Friston argues that Bayesian mechanics provides a “formal description of lifelike particles” (Friston, 2019, p. 1). This means that the Bayesian mechanics, by establishing a relationship between TFE and VFE, tells researchers something about mathematical models, i.e., formal structures, and only about them. Consequently, process theories such as PP are indispensable for addressing target phenomena. In line with the stance I advocate, the existence of control mechanisms that constrain the flow of free energy (both in terms of TFE and VFE) enables the formulation of theorems regarding the interplay between state theory (the FEP) and process theory (PP). Therefore, it is crucial to distinguish between three distinct elements: the FEP as a formal principle, PP as a computational modeling framework grounded in this formal principle, and the biological systems that PP is employed to model, which are independent of the FEP.

How, then, is the transition from the FEP to target phenomena possible? On the one hand, if the view presented in this paper is correct, mechanistic PP, employing the heuristics of constraint-based mechanisms, is utilized to model control mechanisms and systems. One such control system is the brain, modeled by predictive coders as a hierarchical generative model that approximates Bayesian inference. On the other hand, the relationship between VFE and TFE established by Bayesian mechanics informs us about target phenomena because computational models of these systems in PP are constructed using the mathematics of the FEP. Ultimately, this implies that the position of moderate realism concerns not so much the FEP and Bayesian mechanics themselves as the application of the FEP in a specific process theory, such as PP, which is a concrete FEP-based model. It is important to note that the FEP, as a formal principle, does not imply any ontological commitments or resolutions (cf. Andrews, 2021).Footnote 38 These commitments and resolutions arise at the level of applying the FEP through a particular process theory. The use of the constraint-based mechanisms approach justifies why such an understanding of PP should be interpreted in terms of moderate realism.

There are also further benefits of the FEP and PP interpretation presented here. According to the position defended by Friston, the FEP is a (normative) state theory that things may or may not conform to, and PP is a process theory, that is, a hypothesis about how that principle is realized (Friston et al., 2018, p. 21). This means that PP as the process theory provides “a possible (mechanistic) story about how the FEP is implemented in real-world, target systems” (Kirchhoff et al., 2022, p. 6).Footnote 39 The proposed mechanistic integration of PP with the FEP reveals that the FEP serves as a normative theory for PP, setting a norm that mechanistically non-trivial PP models should strive to meet, assuming the utilization of the constraint-based mechanisms approach and its heuristics. According to this norm, PP models should have an energetic component if they are to be mechanistic.Footnote 40

The view I defend can be treated as a voice in the discussion on the status of PP and its relation to the FEP, because the FEP not only constrains the space of possible algorithms for PP (cf. Spratling, 2017), but also indicates an energetic constraint for the causal organization of all autonomous systems, including those that are armed with generative models and are or should be the subject of (mechanistic) explanations formulated on the basis of PP. In practice, this means that all autonomous systems that can be described in terms of (Bayesian) generative models that update priors and likelihoods on the basis of (average) prediction error should be treated as if they approximate Bayesian inference constrained by VFE. In other words, the FEP offers a normative framework for the PP process theory, and PP explains the (biologically reliable) implementation of the FEP in terms of hierarchical and heterarchical active mechanisms that implement the generative model.
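
For readers who want the formal quantity behind the phrase “Bayesian inference constrained by VFE”, the standard textbook definition of variational free energy can be stated as follows. The notation is an assumption introduced here for illustration and is not taken from the text above: o denotes observations, s hidden states, p(o, s) the generative model, and q(s) the approximate posterior (recognition density).

```latex
\[
F[q, o]
  = \mathbb{E}_{q(s)}\big[\ln q(s) - \ln p(o, s)\big]
  = D_{\mathrm{KL}}\big[\, q(s) \,\|\, p(s \mid o) \,\big] - \ln p(o)
  \;\ge\; -\ln p(o)
\]
```

Because the Kullback-Leibler term is non-negative, F is an upper bound on surprise, -ln p(o), and minimizing F with respect to q moves q toward the exact posterior p(s | o). This formal statement is neutral between the realist and instrumentalist readings discussed in the text; it only fixes what is being minimized.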

7.3 Why the free energy principle is not a heuristic or a regulatory principle or an idealization

The analyses carried out in this paper allow us to address various positions concerning the explanatory status of the FEP and its relation to PP. If the approach proposed here is valid, it has certain consequences for a number of discussions among PP and FEP researchers (see §1). Due to limited space, I can only give provisional answers to the questions raised.

First and foremost, I think that the presented approach allows for a new way of describing the PP-FEP relationship. If the FEP refers to self-organizing adaptive systems, as described in DST, that are at NESS with their environment, then with the appropriate interpretation of the notion of mechanism, dynamical FEP models may in fact turn out to be descriptions of mechanisms: “dynamical models and dynamical analyses may be involved in both covering law and mechanistic explanations—what matters is not that dynamical models are used, but how they are used” (Zednik, 2008, p. 1459).Footnote 41 In this view, the FEP provides a specific constraint for PP’s scheme of mechanism.

Therefore, it is a stronger commitment than that suggested by Gładziejewski (2019) and Harkness (2015), stating that the FEP offers (only) heuristics. The approach I propose suggests that the FEP is not so much a heuristic that can aid the process of designing experiments or constructing a space of possible mechanisms, but above all points to a constitutive constraint—VFE, which is needed “not just for mechanisms to perform work, but also to maintain the mechanisms themselves” (Winning & Bechtel, 2018, p. 11). VFE as a constraint determines the causal powers of mechanisms in such a way that the flows of (variational) free energy guarantee that biological systems may remain in a state of energy non-equilibrium with the environment. Such mechanisms are part of a heterarchical network of controllers that guarantees the biological autonomy of a given system. From this point of view, biotic mechanisms are systems of constraints that restrict the flow of free energy to perform work.Footnote 42

For the above reasons, it is also difficult to agree with Hohwy’s thesis that the FEP is a regulatory principle. Surely Hohwy is right when he states that the “FEP itself (does not) implies cognitive architecture” and adds that “notions of architecture will need to build on assumptions about the particular system in question, which will constrain processes for message passing structure” (Hohwy, 2021, p. 47). However, the constraint relationship is reciprocal: on the one hand, a particular system constrains the flow of VFE, and on the other hand, those flows constrain the system to perform given work. Therefore, the FEP, as an explication of the dynamics of flows of VFE, possesses a specific explanatory power in the explanation of cognitive phenomena, distinct from its regulatory function. It is thus reasonable to conclude, following Tomasz Korbak, that the FEP can be regarded as a functional principle that offers a general framework for understanding the mechanisms involved in free energy minimization, which can then be further specified through concrete models applied to specific phenomena (Korbak, 2021, p. 2754).

It seems that these considerations may also shed some light on a number of critical works concerning either the FEP itself or its relationship with PP. In the Introduction, I referred to the papers of Williams, Colombo, Palacios and Wright. Let us recall: Colombo and Palacios (2021) emphasize that there is an inalienable tension between the “physics assumptions and properties of its biological targets”, which in practice makes it impossible to use the FEP to explain living organisms or, in other words, to integrate it with models developed by mechanists and/or organicists (cf. Colombo & Wright, 2021). This objection seems to be answered by emphasizing, as I do in my paper, the mechanistic status of explanations of biological phenomena offered in terms of constraints and free energy flows. If, for living organisms, autonomy is a constitutive property (cf. Moreno & Mossio, 2014; Ruiz-Mirazo & Moreno, 2004; Varela, 1979), then the FEP, contrary to what Colombo and Palacios claim, offers specific constraints to mechanistic explanations formulated on the basis of biology and neuroscience, in the sense that it allows one to treat descriptions, using the language of DST, as sketches of mechanisms.

From this perspective, it is also difficult to agree with the belief of Colombo and Wright that the FEP offers a weak explanatory idealization. Even if, as these authors claim, the analyses carried out by FEP supporters can be treated as (weak explanatory) sketches of mechanisms, then in the light of the constraint-based mechanisms approach and arguments presented here, sketches of free energy flow mechanisms can be used in the formulation of schemes of mechanisms with specific explanatory powers.

Finally, a detailed discussion that addresses all of the aforementioned positions and responds to every objection is beyond the scope of this analysis. Nevertheless, I believe that the general direction of the response has been set.

8 Conclusions

In this paper, I defended the view that the FEP indicates an explanatorily relevant constraint (i.e., VFE) for cognitive mechanisms that can be mechanistically explained by PP. The arguments made here were based on the postulate of some mechanists about the need to include in the explanations such constitutive components as constraints for mechanisms and free energy flows. I found that the position I defined as the constraint-based mechanisms approach has important implications for PP, because the actual research practice in this framework corresponds to the heuristics of constraint-based mechanisms and is related to those approaches that assume the FEP to be a normative framework for the process theory realized by PP. According to the presented approach, non-trivial PP models should include an energetic component if they are to be mechanistic. The discussion presented here bears directly on the relationship between PP, the FEP, and Active Inference.

The position I defend, moderate realism about the FEP and PP, has two advantages: firstly, it implies only minimal commitments regarding the architecture of target phenomena; and secondly, it does not reduce the constructions used by scientists to their purely instrumental functions, recognizing them, for example, as useful fictions. I argue that the approach presented here may also contribute to the formulation of a mechanism scheme, which would be defined by a common predictive template combining various mechanisms under one PP flag. Last but not least, this approach (I believe) also enables fruitful discussions with those researchers who regard the FEP as an explanatorily weak heuristic, idealization or regulatory idea, as well as with those who deny any explanatory power to the FEP.