Introduction

Animals adjust to changes in their environment using a variety of mechanisms during their lives. Behavioural plasticity allows animals to adjust their behaviour in response to environmental change, and the diversity of mechanisms that explain behavioural plasticity is the focus of much research in many fields (Lee & Thornton, 2021). In neuroscience and comparative psychology, reversal-learning tasks are widely used to assay the executive cognitive function ‘cognitive flexibility’, defined as the ability to update behaviour in accordance with changes in stimulus-reward contingencies (Izquierdo et al., 2017). In these tasks, individuals first need to learn that one stimulus is associated with a reward while another (or several others) is not. The contingencies are then reversed, so that the previously rewarded option becomes non-rewarding, and a previously non-rewarding stimulus becomes associated with a positive outcome.

Increasingly, reversal-learning tasks are being applied to evolutionary ecological studies of individual cognitive performance in wild populations (reviewed in: Boogert et al., 2018; Morand-Ferron et al., 2016). Cognitive flexibility could be important for tracking changing environments (Shettleworth, 2010; Sol, 2009). For instance, lizards from suburban areas, where ecological conditions vary spatially and temporally, were quicker to learn and reversal learn an association between refuges and predatory cues than lizards from rural areas (Batabyal & Thaker, 2019). Performance in serial reversal learning, in which reward contingencies were repeatedly switched, was predicted by an index of sociality among three species of corvids, suggesting that dynamic social interactions might select for greater cognitive flexibility (Bond et al., 2007). Reversal learning performance did not significantly predict overwinter survival in wild mountain chickadees Poecile gambeli (Sonnenberg et al., 2019), but negatively impacted survival after release in captive-raised pheasants Phasianus colchicus (Madden et al., 2018). Whether and how natural selection acts on the observed individual variation depends on the mechanisms underlying reversal-learning performance, which in reality is a composite trait (Nilsson et al., 2015). In neuroscience, identifying the underlying mechanisms of reversal-learning performance helps to identify pathologies associated with cognitive inflexibility, and to examine the efficacy of drugs administered to alleviate specific issues (Nilsson et al., 2015). In evolutionary ecological studies it is also potentially important to do so because the individual mechanisms could vary in their additive genetic variation and in their links with fitness through independent associations with other behaviours. However, few ecological studies have examined what drives variation in reversal-learning performance. A series of studies on mountain chickadees examined spatial reversal-learning performance at an array of eight feeders in the field, and specifically examined proactive interference errors, i.e., visits to the previously rewarded option (Croston et al., 2017; Tello-Ramos et al., 2019). While these studies indicated that proactive interference can vary among populations (but see Hermer et al., 2021), we still have little information on individual variation, and no repeatability estimates, for proactive interference in natural populations.

Studies on both humans and non-human animals suggest that reversal-learning tasks involve at least four cognitive phenomena: (1) the formation of an excitatory association between the new rewarded stimulus and the reward; (2) the formation of an independent, inhibitory association, between the newly non-rewarded stimulus and lack of reinforcement; (3) attention allocation to the relevant stimulus dimension (e.g. colour, space); and (4) proactive interference, in which the similarity of the previous and new association to be learned results in perseverative choices towards the previously rewarded stimulus, which may prevent or slow down new learning (Crossley et al., 2019; Lewis & Kamil, 2006), although, counter-intuitively, over the course of several reversals proactive interference may in fact reduce perseveration and speed up reversal learning (Mackintosh et al., 1968; Strang & Sherry, 2014). While the first three processes are common to discrimination and reversal learning, proactive interference is specific to reversal learning, and plays an important role in cognitive flexibility (Nilsson et al., 2015). Proactive interference is itself likely an outcome of multiple cognitive mechanisms (Anderson & Neely, 1996); for simplicity, here we refer to the outcome of choosing a previously rewarded stimulus as proactive interference, and acknowledge that the specific mechanism responsible for this phenomenon is unknown in our study system.

When more than two options are available in a reversal-learning experiment, an additional process that may affect reversal-learning performance can be identified: exploration behaviour or information sampling, hereafter, ‘sampling’, by examining choices made for unrewarded stimuli (e.g., Jentsch et al., 2002). Optimal performance in reversal tests is attained with a win-stay/lose-shift rule: stay with the rewarded option until it stops paying, then shift to another alternative until you find a reward (Shettleworth, 2010). Sampling a non-rewarded option is typically viewed as a ‘learning error’, but this overlooks the possibility that this behaviour may allow the animal to obtain information on the characteristics of the alternatives, the value of which may change – and potentially increase – over time. Individuals who make many visits to non-rewarded options may tend to favour sampling over exploitation, a strategy that could be adaptive in natural conditions (Reader, 2015). For instance, bees that made more errors during learning were faster to discover new food sources (Evans et al., 2017). Sampling allows for tracking changes in the environment (Shettleworth et al., 1988), but repeatable individual differences in such behaviour have been reported in only a few studies (Morand-Ferron et al., 2011; Smit & van Oers, 2019), and have not been investigated in the context of reversal-learning tasks in wild populations. We thus know very little about the impact of sampling tendencies on reversal performance in non-human animals. Information reduces uncertainty and is thus expected to increase fitness (Dall et al., 2005). Here we predict that individuals who sample at a low rate (which we term ‘sampling quantity’), and those that sample a narrower range of options (which we term ‘sampling bias’) are less able to reduce uncertainty about the available options, resulting in reduced reversal-learning performance.

We used the data generated by a published reversal learning experiment on free-ranging great and blue tits foraging in mixed-species flocks (Reichert et al., 2020). Both species readily visit feeders and interact with cognitive tasks. Although Reichert et al. (2020) showed substantial individual variation in learning speed, and a bias towards previously rewarded feeders indicative of proactive interference, the roles of sampling and proactive interference in learning performance were unexplored. The experiment used a linear array of five automated feeders, thus allowing an opportunity to examine sampling separately from proactive interference (multi-option test; Izquierdo et al., 2017). Trials were run on the same individuals over four different phases – habituation, initial learning, first reversal learning, and second reversal learning – which allowed us to examine the repeatability of individual behaviour expressed over time. The effect of sampling in preceding stages could be examined for the last three phases, while a test for an independent effect of proactive interference on the second reversal learning was possible using proactive interference measured during the first reversal learning. Thus, we explored the following questions: (1) Are learning speed and reversal learning speed associated with sampling quantity and sampling bias? If greater sampling provides individuals with more information about the environment, then we would expect that individuals that sample more would be faster to respond and learn more quickly when reward contingencies change in the reversal learning experiments. (2) Does reversal learning speed correlate negatively with proactive interference? If individuals tend to visit previously rewarded feeders more extensively, then we would expect them to take longer to learn an association with a new rewarding feeder. (3) Are individual differences in proactive interference, sampling quantity, and sampling bias consistent over time during the learning trials, that is, are they repeatable? (4) In the case of sampling quantity and bias, are they also repeatable across the two different contexts of learning and the prior habituation phase? (5) Is the repeatability of reversal-learning performance explained by proactive interference, sampling quantity or sampling bias? If these effects explain among-individual differences in performance during the reversal learning trials, then their addition to a model should lead to a lower ‘adjusted repeatability’ estimate (Nakagawa & Schielzeth, 2010). If they are entirely unrelated to among-individual differences in reversal-learning performance, there should be no change. However, it is possible that proactive interference, sampling quantity and sampling bias could mask individual variation in performance, in which case accounting for these variables would lead to a higher adjusted repeatability estimate.

Methods

Study site and species

The data for this experiment came from a study of associative learning in a wild population of PIT-tagged (passive integrated transponder) great tits, Parus major, and blue tits, Cyanistes caeruleus, in Wytham Woods, Oxfordshire, UK. For full details on the study design, see Reichert et al. (2020). Briefly, we set out arrays of five programmable sunflower-seed feeders in each of eight locations within the wood (four locations in November-December 2017, and four different locations in January-February 2018). At each site, feeders were arranged linearly and spaced at 1-m intervals. Feeders were equipped with a radio-frequency identification (RFID) antenna placed on a perch in front of the only feeder opening. Access to the feeders was controlled by a solenoid connected to a printed circuit board (‘Darwin Board’, Stickman Technologies Inc., UK). If a PIT-tagged bird was detected by the antenna at the perch, the solenoid blocking the door was released, and a transparent door could be pecked open to gain access to the sunflower seeds. The board recorded the RFID code and time of day of each individual visit.

The project received ethical approval from the Animal Welfare Body at University College Cork (HPRA license number AE19130-P017), and was carried out in accordance with the ASAB Guidelines for the Treatment of Animals in Behavioural Research and Teaching. All research was conducted under BTO licenses as part of ongoing research in this population.

Learning experiment and criterion

Prior to the associative learning experiments, any PIT-tagged individual could obtain food from any of the five feeders at a site for a period of 4 days (habituation phase). The habituation phase allowed the birds an opportunity to become accustomed to gaining access to food at the feeders, and provided data on the relatively uninhibited sampling tendencies of individuals prior to the learning experiments.

The initial learning phase took place immediately after the habituation phase. Each individual was randomly assigned to one of the five feeders at the site, and the feeders were programmed such that individuals could only gain access to food at their single assigned feeder. Individuals were considered to have learned the association with their assigned feeder when they made 16 ‘correct’ visits to the assigned feeder out of 20 consecutive visits, with the requirement that the first visit within that window was a correct visit (Reichert et al., 2020). The individual’s learning speed was then defined as the number of visits until the first visit at which this criterion was met (i.e., trials to criterion).
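The learning criterion described above can be illustrated with a minimal Python sketch. The function name and arguments are hypothetical and are not taken from the original analysis. The sketch also assumes that trials to criterion is counted up to the final visit of the first qualifying 20-visit window, which is one possible reading of the definition.

```python
from typing import Optional, Sequence


def trials_to_criterion(correct: Sequence[bool],
                        window: int = 20,
                        required: int = 16) -> Optional[int]:
    """Return the 1-based visit number at which the criterion is first met.

    `correct` holds one flag per visit, in chronological order
    (True = visit to the assigned feeder). Returns None if the
    criterion is never met.
    """
    for start in range(len(correct) - window + 1):
        block = correct[start:start + window]
        # The first visit of the window must itself be correct, and at
        # least `required` of the `window` consecutive visits must be correct.
        if block[0] and sum(block) >= required:
            # Assumption: count visits through the end of the window.
            return start + window
    return None
```

For example, three incorrect visits followed by 16 correct visits and four incorrect visits would give a value of 23 under this assumption.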

The initial learning phase took place for 8 days. Immediately afterwards we began a reversal learning phase, in which we randomly assigned individuals to a different rewarded feeder in the array. Although laboratory tests of reversal learning typically involve only two choices, reversal learning paradigms can involve more choices, which allows examining proactive interference and sampling as distinct phenomena (Izquierdo et al., 2017; Tello-Ramos et al., 2019). Furthermore, we use the term ‘reversal learning’ here to maintain consistency with previous work (Reichert et al., 2020). The assignment to a new feeder was completely random by individuals for four of the eight sites, and for the other four sites, the entire set of birds assigned to one feeder was reassigned as a group to a randomly selected new feeder. The purpose of this treatment was to test that our design prevented social learning (which seemed to be the case; Reichert et al., 2020), and to examine possible effects on the social network, which will be reported elsewhere and are therefore not explored further here. We measured learning speed during the first reversal phase as in the initial learning phase (trials to criterion). The first reversal learning phase lasted for 8 days in 2017 and 10 days in 2018, when extra time was needed to repair backup feeders. We then repeated the randomization procedure a second time, assigning birds to another new feeder (no bird was assigned to a feeder that it had already been assigned to in a previous phase). This second reversal learning phase also lasted for 8 days in 2017 and 10 days in 2018, and we again measured trials to criterion. All of our analyses are for a dataset including the 183 individuals that met the learning criterion for all three of these phases (n = 115 blue tits and n = 68 great tits; Reichert et al., 2020).

Sampling and proactive interference

As a first step in examining sampling behaviour expressed by wild birds in this experiment, we examined visits occurring during the habituation phase (when birds could feed from any feeder) in three ways. First, we calculated the total number of visits to all of the feeders during this phase, the sampling quantity. Second, we calculated the variance in visits to the different feeders in the array, which we used to define sampling bias. To do this, we determined the proportion of the total visits that were made to each feeder and then calculated the variance of these proportions across the five feeders. This variance ranged from a minimum of zero, indicating that the individual visited each feeder an equal number of times (i.e., sampling broadly), to a maximum of 0.2, indicating that the individual exclusively visited just one feeder in the array and therefore was likely to have acquired less information about the array of feeders as a whole. Third, we estimated the strength of preference for the most preferred feeder, by calculating the proportion of visits directed towards the most preferred feeder during the habituation phase. Six individuals that met the learning criterion in all three learning phases did not visit the feeders during the habituation phase.
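As an illustration of the three habituation-phase metrics, the following Python sketch computes them from per-feeder visit counts. The function and key names are hypothetical. The sample variance (n − 1 denominator) is assumed here because it reproduces the stated maximum of 0.2 for five feeders; returning None for birds with no visits is likewise an assumption about how such individuals might be handled.

```python
from statistics import variance
from typing import Dict, Optional, Sequence


def sampling_metrics(visits_per_feeder: Sequence[int]) -> Optional[Dict[str, float]]:
    """Summarise sampling from visit counts to each feeder in the array."""
    total = sum(visits_per_feeder)
    if total == 0:
        return None  # no visits, so proportions are undefined
    proportions = [v / total for v in visits_per_feeder]
    return {
        "quantity": total,                # total number of visits
        "bias": variance(proportions),    # sample variance of proportions
        "preference": max(proportions),   # share of visits to the top feeder
    }
```

A bird that visited only one of five feeders has proportions (1, 0, 0, 0, 0) and a bias of 0.2, whereas a bird that visited all five equally has a bias of 0.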

We defined sampling during the three learning phases as visits to non-rewarding feeders that took place after the learning criterion was met. Visits prior to the learning criterion being met were not included because these were more likely to have resulted from errors made while searching for the rewarding feeder rather than sampling. For the reversal phases, we also excluded visits to the feeder assigned in the previous phase, because these were considered to have resulted from proactive interference (i.e., three feeders could be sampled in each of the reversals, whereas four feeders could be sampled in the initial phase). Visits that took place to non-rewarded feeders when the individual’s assigned feeder was malfunctioning were not included. Sampling quantity and sampling bias were otherwise defined the same as for the habituation phase. Individuals that did not make any sampling visits were excluded from analyses of sampling bias (n = 2, 6, and 10 for initial learning, first reversal, and second reversal, respectively). Unlike the analyses of the habituation phase, for the learning phases we did not calculate the third metric of sampling, the strength of preference for the most preferred feeder, because birds directed most of their visits in the learning phases to the rewarded feeder, which is not considered in analyses of sampling. Note that we use an operational definition of sampling, and cannot discriminate between visits to non-rewarded feeders that occurred ‘intentionally’ to sample information, and those that were simply errors. However, from the point of view of our hypotheses on information gathering by visiting non-rewarding feeders and its effects on subsequent reversal learning, all such visits can potentially provide information, whether they resulted from a ‘true error’ or from ‘sampling’.

Proactive interference during the reversal phases was quantified as the number of visits made during those phases to the feeder that had been rewarded in the most recent previous phase. For the first reversal, a proactive interference visit was thus a visit during the first reversal learning phase to the feeder that the individual had been assigned to during the initial learning phase. For the second reversal, it was a visit during the second reversal learning phase to the feeder that the individual had been assigned to during the first reversal phase. Note that we found no evidence that birds were biased during the second reversal towards visiting the assigned feeder from the initial learning phase, so these visits were not classified as proactive interference. We excluded visits when the individual’s assigned feeder was malfunctioning (some feeders temporarily failed to open or register visits due to electrical issues; see Reichert et al., 2020). Proactive interference visits for a given reversal phase included visits both before and after the learning criterion was met, because such interference could continue to bias visits through the entire phase (i.e., early and late errors; Nilsson et al., 2015), and because we had observed a significant bias towards the previous feeder across all visits in both the first and second reversal (Reichert et al., 2020). When examining the effect of proactive interference on reversal-learning performance, we used proactive interference errors registered in the first reversal as a predictor of trials-to-criterion in the second reversal. This ensured independence of the data used to quantify each variable because they occurred in distinct experimental phases.
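The operational definitions of sampling and proactive interference in a reversal phase can be sketched as a simple visit classifier. This is a minimal illustration rather than the original analysis code: the function name, arguments and category labels are hypothetical, and the exclusion of visits made while the assigned feeder was malfunctioning is omitted for brevity.

```python
from collections import Counter
from typing import Sequence


def classify_reversal_visits(visits: Sequence[int],
                             rewarded: int,
                             previous: int,
                             criterion_visit: int) -> Counter:
    """Count visit types for one bird in one reversal phase.

    visits: feeder IDs in chronological order.
    rewarded: feeder assigned in the current phase.
    previous: feeder assigned in the most recent previous phase.
    criterion_visit: 1-based visit number at which the criterion was met.
    """
    counts = Counter()
    for i, feeder in enumerate(visits, start=1):
        if feeder == rewarded:
            counts["correct"] += 1
        elif feeder == previous:
            # Counted both before and after the criterion is met.
            counts["proactive_interference"] += 1
        elif i > criterion_visit:
            # Other non-rewarded feeders, post-criterion only.
            counts["sampling"] += 1
        else:
            counts["pre_criterion_error"] += 1
    return counts
```

With feeder 3 rewarded, feeder 1 previously rewarded and the criterion met at visit 5, the sequence 2, 1, 3, 3, 3, 1, 4, 5, 3 yields four correct visits, two proactive interference visits, two sampling visits and one pre-criterion error.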

Statistical analyses

Effect of sampling and proactive interference on learning

Objective 1, the test of relationships between sampling and trials to criterion, was explored in separate models for initial, first reversal, and second reversal learning. The model for the second reversal also included proactive interference to address Objective 2. For the initial learning phase, we used a general linear model with trials to criterion (ln-transformed) as the dependent variable and the following measures of sampling during the habituation phase (see above): sampling quantity, sampling bias, and the proportion of visits directed to the most visited feeder. We included interactions between species and these three variables. The following additional potentially confounding effects were explored in a previous study, and are therefore not discussed here, but were included in the model because they were sometimes related to behaviour at the feeders: sex, age, a binary feeder position variable indicating whether the feeder was at the edge of the array or not, and the time in hours that either the individual’s own assigned feeder or any of its non-assigned feeders were malfunctioning prior to it reaching the learning criterion (for further details, see Reichert et al., 2020). We removed non-significant interaction terms before calculating the final model coefficients so that we could interpret main effects.

The general linear models run for the first and second reversal phases were similar. The dependent variable was the number of trials to criterion (ln-transformed) and the main explanatory variables were sampling quantity and sampling bias in the previous phase, and the interactions between each of these variables and species. As for the initial learning model, we also included feeder location, sex, age, and the malfunctioning times of feeders, along with the number of trials to criterion in the previous phase (ln-transformed) and the number of rewarded visits post-criterion in the previous phase. The latter variable was included because the duration of each phase was the same for all birds, regardless of when they met the learning criterion. Thus, this variable accounts for variation among birds in the delay, and number of post-criterion rewards, between reaching the criterion and the change in experimental contingencies occurring at the start of the next phase, which could impact reversal learning speed (Mackintosh, 1974). For the second reversal phase, proactive interference visits in the previous phase, and the interaction between this variable and species, were also included to test the second objective.

Repeatability of sampling and proactive interference

In Objective 3, we tested for consistent individual differences in sampling behaviours and proactive interference using a repeatability analysis. For sampling quantity and sampling bias, we did this across the three learning phases, and across the two reversal learning phases alone, and did so for the two different species separately. We modelled sampling quantity as a Poisson variable, and included the phase and the number of correct visits made after the learning criterion was met (to control for overall visit numbers) as fixed effects, and individual identity as a random factor. Sampling bias was modelled as a Gaussian variable (the best fit we could achieve, though the errors departed from normality), and we included phase, the number of correct visits made after the learning criterion was met, and sampling quantity as fixed effects, and individual identity as a random factor. The repeatability value was calculated using mixed models in the rptR version 0.9.22 package (Stoffel et al., 2017) in R version 4.1.0 software (R Development Core Team, 2021). Because we included fixed effects in these models, the estimates were adjusted repeatabilities (Nakagawa & Schielzeth, 2010). Note that for these and all other analyses, we did not include a separate random effect of site because of low replication at some sites resulting in poor model performance. The repeatability of proactive interference was tested using a Poisson distribution with individual identity as a random factor, and with the following fixed effects: phase (first reversal or second reversal; note that proactive interference cannot be measured for initial learning), the total number of correct visits in that phase (to control for overall activity at the feeders), and the total number of rewarded visits made after the learning criterion was met in the previous phase. Separate models were run for great tits and blue tits.
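For reference, the adjusted repeatability of a Gaussian trait can be written as follows. The notation is ours and does not appear in the original analysis: the among-individual variance is the variance of the random intercept for individual identity, and both variance components are estimated from a mixed model that already contains the fixed effects.

```latex
R_{\mathrm{adj}} = \frac{\sigma^2_{\alpha}}{\sigma^2_{\alpha} + \sigma^2_{\varepsilon}}
% \sigma^2_{\alpha}: among-individual variance (random intercept for identity)
% \sigma^2_{\varepsilon}: residual (within-individual) variance
% Variance explained by the fixed effects is excluded from the denominator.
```

For Poisson models the denominator also contains a distribution-specific variance term, which is not shown here. Under this formulation, a fixed effect that accounts for among-individual differences reduces the numerator and lowers the estimate, whereas one that mainly absorbs residual variance raises it.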

For Objective 4, to assess whether sampling behaviour was consistent across contexts, we tested whether sampling during the habituation phase (when all feeders were rewarding) was predictive of sampling during the initial learning phase (when only one feeder was rewarding). Note that we did not use a repeatability approach here because the sampling behaviours were from very different contexts, and were effectively different variables, so could not be treated as the same response variable. We performed separate generalized linear models for sampling quantity (Poisson) and sampling bias (Gaussian) during the initial learning phase as the dependent variable, and the equivalent variables during the habituation phase as fixed effects. Specifically, we included three variables representing sampling during the habituation phase: the total number of visits, the proportion of visits to the preferred feeder, and the variance in feeder visits (further details above). We also included interactions between species and these three variables to test the generality of any observed association, along with main effects of sex and age. We focus our analyses of interactions on the species variable, rather than examining all possible interactions, because we expected the most pronounced differences in sampling to be at the species level, and because sex and age never explained variation in learning performance (Reichert et al., 2020). In addition, for the analyses of sampling bias during initial learning as the dependent variable, we included sampling quantity during the initial learning phase (i.e., the number of sampling visits) as a fixed effect, to account for the possibility that sampling bias would be skewed for individuals with low visit numbers (because it is calculated as a variance). For sampling quantity, we included the total number of rewarded visits after the learning criterion was met to control for total visit effort. 
Non-significant interaction terms were dropped before calculating the final model.

For Objective 5, we examined whether proactive interference, sampling quantity, or sampling bias observed during the two reversal learning phases affected the repeatability estimate of trials to criterion during these phases. We estimated repeatabilities separately for each species using rptR. Trials to criterion (ln-transformed) was the dependent variable, individual identity was the random effect, and phase was an additional fixed effect along with assigned feeder location and the malfunctioning times of the feeders as described above. We then calculated repeatability values with and without the number of proactive interference visits during this phase as an additional fixed effect. We compared the fit of these models using Wald tests with the anova function in R. We repeated these procedures but with sampling quantity or sampling bias as the additional fixed effect, rather than proactive interference. We then examined whether including the two sampling variables in addition to proactive interference resulted in additional effects on repeatability of trials to criterion in the reversal phases.

Results

Descriptives of visit behaviour

In the initial learning phase, individuals took a mean (SD) of 60.0 (100.4) trials to reach criterion, had a mean sampling quantity of 40.7 (46.2) visits, and had a mean of 428.1 (133.3) total visits to feeders across the entire phase. In the first reversal, the mean trials to criterion was 59.1 (70.7), mean sampling quantity was 36.9 (41.5), the mean number of proactive interference visits was 45.5 (50.2), and the mean total visits was 488.7 (114.1). In the second reversal, individuals took a mean of 70.6 (98.7) trials to reach criterion, had a mean sampling quantity of 29.4 (36.1), a mean number of proactive interference visits of 37.1 (34.6), and had a mean of 517.8 (147.2) total visits.

1) Is trials to criterion related to sampling behaviour?

In the initial learning phase, trials to criterion was not predicted by sampling bias (Estimate: −0.69 ± 8.14, P = 0.93), sampling quantity (Estimate: −0.0007 ± 0.002, P = 0.69), or feeder preference (Estimate: −0.57 ± 1.89, P = 0.76) during the habituation phase (Table 1). Similarly, there was no effect of sampling quantity on trials to criterion in the next phase for either the first reversal (Estimate: −0.003 ± 0.002, P = 0.12; Table 2) or the second reversal (Estimate: 0.003 ± 0.002, P = 0.22; Table 3; Fig. 1b). However, there was a non-significant trend for individuals that were more biased in their sampling to take fewer trials to meet criterion in the next phase, both for the first reversal (Estimate: −2.38 ± 1.37, P = 0.08; Table 2) and the second reversal (Estimate: −1.84 ± 1.02, P = 0.07; Table 3; Fig. 1c).

2) Is trials to criterion related to proactive interference?

Table 1 Trials to criterion during initial learning, the dependent variable, in relation to sampling during the preceding habituation phase
Table 2 Trials to criterion in the first reversal learning phase, the dependent variable, in relation to sampling in the preceding, initial learning phase
Table 3 Trials to criterion in the second reversal learning phase, the dependent variable, in relation to proactive interference and sampling during first reversal
Fig. 1 (a) Significant relationship between the number of proactive interference visits during the first reversal phase and the number of trials to criterion during the second reversal phase (higher values = slower learning). Line and shaded region represent the model prediction (± 95% confidence interval) and dots represent the estimated marginal means of the data points for each individual. (b) Relationship between sampling quantity during the first reversal phase and the number of trials to criterion during the second reversal phase. This relationship was not significant. (c) Marginally non-significant relationship between sampling bias (higher values = feeders are sampled less broadly) during the first reversal phase and the number of trials to criterion during the second reversal phase.

There was a significant positive relationship between trials to criterion for the second reversal and the number of proactive interference visits in the first reversal (Estimate ± SE: 0.007 ± 0.003, P = 0.017; Table 3; Fig. 1a). This indicates that a stronger tendency to use the previously rewarded feeder during the first reversal predicted slower learning in the second reversal.

3) Do individuals differ consistently in sampling behaviour and proactive interference?

Across the two reversal-learning phases, sampling quantity was not significantly repeatable in great tits (R = 0.129, CI = [0, 0.365], P = 0.16) but was moderately and significantly repeatable in blue tits (R = 0.309, CI = [0.124, 0.475], P < 0.001). Results were similar when including sampling during initial learning along with both reversal-learning phases (great tits: R = 0.081, CI = [0, 0.238], P = 0.14; blue tits: R = 0.327, CI = [0.211, 0.452], P < 0.001).

Sampling bias was not significantly repeatable across the two reversal learning phases for either species (great tits R = 0.118, CI = [0, 0.406], P = 0.19; blue tits R = 0, CI = [0, 0.199], P = 0.5). This was also true when including initial learning along with both reversal learning phases (great tits R = 0.082, CI = [0, 0.252], P = 0.15; blue tits R = 0.044, CI = [0, 0.165], P = 0.22).

For both great tits and blue tits, proactive interference was moderately and significantly repeatable across the first and second reversal (great tits: R = 0.383, CI = [0.145, 0.577], P = 0.001; blue tits: R = 0.385, CI = [0.214, 0.538], P < 0.001). This indicates that individuals differed consistently in their tendency to inhibit responses that were no longer rewarded.

4) Is sampling behaviour consistent across contexts?

The number of sampling visits after reaching criterion (i.e., sampling quantity) in the initial learning phase was predicted by sampling quantity (total visits to all feeders) during the habituation phase (Estimate: 0.005 ± 0.0003, P < 0.001), and by sampling bias (Estimate: −5.04 ± 1.34, P < 0.001), but not by the proportion of visits to the most preferred feeder during the habituation phase (Estimate: 0.13 ± 0.29, P = 0.65; Table 4). Sampling bias during initial learning was not related to sampling quantity during the habituation phase (Estimate: −0.00003 ± 0.00009, P = 0.72), but there was a marginally non-significant effect of sampling bias during habituation, suggesting that birds who sampled feeders broadly during habituation continued to do so later in the experiment (Estimate: 0.82 ± 0.43, P = 0.057; Table 5; Online Supplemental Material (OSM) Fig. S1).

5) Is the repeatability of reversal learning speed explained by proactive interference and sampling?

Table 4 Consistency in sampling quantity across contexts. The relationship between sampling quantity post-learning criterion during the initial learning phase (dependent variable) and sampling variables during the habituation phase
Table 5 Consistency in sampling bias across contexts. The relationship between sampling bias (dependent variable) during initial learning phase and sampling variables during the habituation phase

The adjusted repeatability of trials to criterion in the reversal phases was lower for both species when proactive interference in those phases was included as a fixed effect than when it was not included (Table 6). However, including sampling quantity as a fixed effect had only a very minor effect on the repeatability of trials to criterion in the reversals (Table 6). When sampling bias was included as a variable, the adjusted repeatability of trials to criterion in the reversal phases was lower for great tits, but not for blue tits, compared to when sampling bias was not included (Table 6). Models of the trials to criterion in the reversal phases had a better fit when proactive interference was included as a variable than when it was not included (great tits: Wald test, χ²₁ = 44.7, P < 0.001; blue tits: Wald test, χ²₁ = 68.9, P < 0.001). Likewise, models with sampling bias included had a better fit than models without it (great tits: χ²₁ = 8.17, P = 0.004; blue tits: χ²₁ = 4.28, P = 0.04). Including sampling quantity did not improve model fit for great tits (χ²₁ = 1.80, P = 0.18), but it did for blue tits (χ²₁ = 6.30, P = 0.01). Estimates of repeatability adjusting for sampling bias and sampling quantity in addition to proactive interference were very similar to the estimates when these sampling variables were not included (Table 6). This indicates that proactive interference and sampling bias, but not sampling quantity, explain among-individual variation in reversal learning speed for one or both species.
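The P values for the sampling comparisons above can be checked directly from the test statistics, since each chi-squared statistic has one degree of freedom. The following is a minimal sketch using only the Python standard library; the function name and dictionary labels are hypothetical, and it relies on the fact that a chi-squared variable with one degree of freedom is the square of a standard normal variable.

```python
import math


def chi2_df1_pvalue(statistic):
    """Upper-tail P value for a chi-squared statistic with 1 degree of freedom.

    If X ~ chi-squared(1), then X = Z**2 with Z standard normal, so
    P(X > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    """
    if statistic < 0:
        raise ValueError("a chi-squared statistic cannot be negative")
    return math.erfc(math.sqrt(statistic / 2.0))


# Test statistics reported in the text for the sampling variables.
reported = {
    "sampling bias, great tits": 8.17,
    "sampling bias, blue tits": 4.28,
    "sampling quantity, great tits": 1.80,
    "sampling quantity, blue tits": 6.30,
}

for label, stat in reported.items():
    print(f"{label}: chi2 = {stat:.2f}, P = {chi2_df1_pvalue(stat):.3f}")
```

Rounded to the precision used in the text, these recover the reported values (0.004, 0.04, 0.18 and 0.01, respectively).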

Table 6 Estimates of trials to criterion repeatability across the two reversal phases, either excluding, or including, proactive interference and sampling variables from the current stage

Discussion

While studies on humans and animal models are revealing the neurological and genetic bases of reversal-learning performance (Nilsson et al., 2015), as well as those specifically associated with perseveration and sampling errors (Jentsch et al., 2002; Odland et al., 2021), we still know little about individual variation in these processes in wild populations. The repeatability of reversal learning has been assessed in a few species, with results ranging from no repeatability to large and statistically significant repeatabilities (reviewed by Cauchoix et al., 2018). These individual differences could be due to a number of cognitive mechanisms and behaviours known to contribute to reversal-learning performance (Nilsson et al., 2015), but the extent to which each of these mechanisms varies among individuals has not, to the best of our knowledge, been examined before. Here we used a multi-option discrimination reversal-learning task to assess the contribution of (1) sampling behaviour, as measured by errors to options that had never been rewarding during the learning phases, and (2) proactive interference, as measured by errors to the previously rewarded option, to variation in reversal-learning performance. With the exception of sampling quantity in blue tits, sampling quantity and bias were not significantly repeatable, and neither variable had a strong relationship with reversal-learning performance. Proactive interference was moderately repeatable and negatively impacted reversal-learning performance. Furthermore, proactive interference explained all of the among-individual variation in reversal-learning performance for blue tits, whereas in great tits the effects of sampling bias and proactive interference were similar and could not be distinguished.

Sampling quantity

We predicted a positive relationship between sampling quantity post-learning and the subsequent reversal-learning performance, i.e., that more sampling would lead to better performance. This prediction was not supported in either of the two reversals (Tables 2 and 3). However, there was evidence that sampling quantity was repeatable within and across contexts (see below), which suggests this variation is biologically relevant, and specifically we speculate that sampling quantity was more likely to reflect motivation to use the feeders. In mice engaged in a sequential go/no-go reversal task, exploration errors were related to the motivation of individuals to interact with the test device (Odland et al., 2021). If sampling quantity is indeed underpinned by motivation rather than an attempt to gather information as such, then the lack of a relationship between sampling quantity and trials to criterion would suggest that motivation to visit the feeders is not a driver of differences in reversal-learning performance, thereby ruling out an important confound in cognitive experiments (Cooke et al., 2021; Rowe & Healy, 2014). One limitation of our experiment is that it involved only two reversals, and each time the rewarded feeder was constant across a period of several days. Thus, sampling may have been of only limited utility in this relatively stable environment, in which case one might not readily detect any association with reversal learning. Future studies in the wild could easily increase the number of reversals and vary the rate at which they take place, which may affect the benefits of sampling and reveal additional individual variation in sampling that may affect reversal-learning performance.

Once the learning criterion was reached, our ‘sampling quantity’ variable was significantly and moderately repeatable in blue tits but not in great tits. Thus, in great tits, variation in this behaviour was mostly at the within-individual level, while some variation in blue tits was explained by consistent among-individual variation. Motivation to use freely available feeders has previously been shown to be moderately repeatable in this population for both great tits and blue tits (Crates et al., 2016), so it is unclear why we detect repeatability only in blue tits. There was also a species difference in sampling quantity itself: blue tits made more sampling visits than great tits. Furthermore, males and first-year birds sampled more than females and older birds. The reasons for these demographic differences despite a similar foraging ecology could be examined in future studies that manipulate the social environment at the feeders, perhaps revealing a role for competition among species, ages and sexes.

Both great tits and blue tits showed consistency in sampling quantity across contexts early in the experiment (habituation vs. initial learning). For great tits, the consistency in sampling in the habituation versus initial learning phase but lack of consistency across the learning phases could potentially be explained by a change in motivational and/or attentional state due to the reversal of contingencies, which may bring in other sources of variation in the tendency to visit non-rewarding feeders, such as a decrease in the ability to recognize and remember the actually rewarding feeder, even after the learning criterion was met (i.e., true errors). Future studies could investigate the relative contribution of learning errors and information gathering by manipulating the predictability and rate of reversal and the difficulty of the learning task.

Sampling bias

Sampling bias was unrelated to the total number of visits during habituation, suggesting it is not a general expression of motivation to use the feeders, but might indeed be more closely related to the tendency to collect information, or the way information is collected. Sampling bias during the habituation phase was a marginally non-significant predictor of sampling bias post-criterion of the initial learning phase, suggesting some cross-contextual consistency in how broadly birds sampled the different options. However, sampling bias during the learning phases of the experiment, or solely during the reversal-learning phases, was not significantly repeatable in either species. It is thus possible that this process of information gathering is different in the initial phases of the experiment (habituation and initial discrimination) than in the reversal phases, a change that could be prompted by variation in the predictability of rewards (Keasar et al., 2013).

The lack of repeatability of sampling bias means that most variation is due to within-individual adjustment in sampling, or errors made, in response to the environmental or test context. A potential source of within-individual variation in sampling bias is the energetic state of the animal, which has been shown to influence the propensity to sample novel options (Katz & Naug, 2015) or options that have not paid before (Arvidsson & Matthysen, 2016). If the state of individuals varies in a manner that is not consistent over the phases of the experiment (e.g., stochasticity in foraging success at the experimental feeders or other sources), this could explain the lack of repeatability of state, and thus of sampling tendencies. Greater experimental control over the energetic state of individuals in the laboratory might explain why the tendency to sample was found to be significantly repeatable in artificial selection lines of great tits (Smit & van Oers, 2019) but not here in this field-based study.

We predicted that if sampling many options is beneficial by providing information that can be used to inform future choices, then individuals with a smaller sampling bias, indicating a tendency to distribute sampling visits more evenly across the feeders, would require fewer trials to meet the learning criterion. However, we found no support for this prediction in either reversal. In natural conditions, sampling a range of options might be an adaptive strategy in some ecological conditions (Evans et al., 2017). This might not be the case in reversal tasks, where some information becomes irrelevant after the reversal; in our experiment, two of five feeders have a different payoff after a reversal. However, it is not clear why information on the other three feeders was not used by individuals to reduce errors and improve their performance (Pike et al., 2016). Other studies have reported that sampling does not always lead to the maximization of payoffs, for instance in bees learning to use two feeders that varied in their probability of reward (Dunlap et al., 2017). Increasing the costs of sampling – for example, by increasing the distance between feeders or adding a task to complete to have access to the food – could lead to more targeted information-gathering visits and, potentially, a positive impact on learning. Although a bias for the previously rewarded feeder during the reversals demonstrates that great and blue tits used some information from the previous phase in guiding their current decisions (Reichert et al., 2020), it is possible that birds mostly memorized information on rewarded, rather than non-rewarded, feeders. If this was the case, it could suggest that in our system reversal-learning performance is mainly driven by the formation and inhibition of positive associations, as opposed to learned non-reward (Nilsson et al., 2015).

Proactive interference

Proactive interference was moderately and significantly repeatable in both species, pointing to intrinsic differences among individuals. Repeatability is a prerequisite for, and sets the upper limit to, heritability in a trait. Equally, most of the intrinsic differences could be caused by non-genetic effects (Kruuk, 2004) – for example, permanent environment or maternal effects (e.g., Quinn et al., 2009). Our results re-iterate previous findings that reversal-learning performance is negatively impacted by proactive interference (Crossley et al., 2019; Lewis & Kamil, 2006), but in our experimental design it was also possible for birds to make errors other than proactive interference errors (i.e., visits to non-rewarded feeders other than the previously rewarded feeder), which is not the case in two-option tasks. Note that our results indicate a robust link between proactive interference and reversal speed, as we examined proactive interference errors and reversal performance independently, instead of during the same reversal. Moreover, the criterion for reversal learning was attained more slowly than that for initial learning, an effect that has often been used to detect proactive interference in reversal learning studies (e.g., Mancini et al., 2019).

Proactive interference captured relatively more among-individual variance than reversal-learning performance itself, that is, it was more repeatable. This difference was especially pronounced for blue tits, for which proactive interference was moderately repeatable (R = 0.39), but trials to criterion in reversals was not (Reichert et al., 2020). Combined with the finding that proactive interference explains all of the repeatability in reversal-learning performance for blue tits (Table 6), this suggests that examining proactive interference itself could be a valid avenue for future studies on the evolution of cognitive flexibility (Tello-Ramos et al., 2019). This would bring the benefit of examining a less composite trait than reversal-learning performance itself, while leaving little or no among-individual variation to be explained by other variables – at least for blue tits in our study system. This was not true for great tits, for which the effects of sampling bias and proactive interference on the repeatability of reversal learning were similar; note that when sampling bias and proactive interference were included in the same model, the order in which they were entered determined which explained repeatability, so their effects on repeatability were indistinguishable. Nevertheless, we argue that the proactive interference measure was likely the most important because, in the second reversal learning model, proactive interference was a significant fixed effect, but sampling bias had only a non-significant tendency to predict performance and this was in the opposite direction to that predicted, i.e., reversal-learning performance decreased with increasing sampling bias.

Conclusion

Our results demonstrate that an automated system can be used to provide novel insight into the mechanisms that cause consistent among-individual differences in cognitive flexibility in the wild. Most of our findings were similar for both species, suggesting that the mechanisms driving cognitive flexibility are likely to be similar among species. Decomposing among-individual variation into its constituents is an important step towards identifying heritable variation in traits potentially targeted by natural or sexual selection. It may also help explain why some tasks that seem to conceptually measure the same ability do not covary (e.g., detour-reaching and reversal learning, both implicated as measures of behavioural flexibility; Audet & Lefebvre, 2017; Troisi et al., 2021; Völter et al., 2018).

Here, we have identified proactive interference as a repeatable trait that underlies most of the among-individual variation in reversal-learning performance, at least for one of two species, and probably the other. This suggests that other processes that may contribute to variation in reversal-learning performance, such as the formation of an excitatory association with the currently rewarded feeder and of inhibitory associations with the non-rewarded feeders, or the utilization of working memory (Hassett & Hampton, 2017), do not do so at the among-individual level, although they could be repeatable traits in other test contexts (e.g., discrimination tests; Cauchoix et al., 2018). A natural extension of this work would examine whether proactive interference errors can themselves be decomposed into further constituents with significant among-individual variation (e.g., perseveration vs. learned non-reward; Nilsson et al., 2015), as well as their covariation with other phenotypic and life-history traits. Our analyses of sampling behaviour gave equivocal results, possibly because sampling is also a composite trait, with elements of exploration, motivation, and other unidentified processes. Additionally, the relative importance of these effects is likely to be dynamic across phases of the experiment. Examining decision making by fitting competing models of learning onto trial-by-trial choice data (Daw et al., 2006; Metha et al., 2020) could be a valuable approach to examine individual variation in exploration-exploitation in reversal-learning tasks. Furthermore, the costs and benefits of proactive interference and sampling may depend on the permanence of environmental change: proactive interference may act as a buffer when changes are transient, while sampling in such conditions may provide a misleading characterization of environmental conditions. In contrast, when changes are relatively long-lasting, sampling would enable individuals to adjust to the new situation quickly, while proactive interference would slow them down. Thus, investigations of individual variation and covariation in these traits under different patterns of environmental change will contribute to the understanding of reversal-learning performance in an ecological context.