“But, unless there is something extraordinary about the conceptual capacities of pigeons, our findings show that an animal readily forms a broad and complex concept when placed in a situation that demands one”. Herrnstein and Loveland (1964, p. 551)

Birds master a seemingly endless variety of perceptual categories

The critical function of any brain is to predict the consequences of actions based on sensory stimuli. Analysis of sensory input can be rather simple, for instance when consuming a standardized food item that is directly in the field of view. But often decisions involve a wealth of past experiences and a complex sensory analysis, since not all stimuli that require the same action also look the same. Perceptual categorization enables animals to group stimuli based on their sensory features (see Box 1 for formal definitions). This core cognitive ability is executed almost instantaneously, seemingly without any effort, and allows functional associations to be assigned to items in the world around us. In fact, categorization occurs on a timescale comparable to the initial detection of an object; category membership can be reported before an idiosyncratic identification of an object is possible (Grill-Spector and Kanwisher 2005). As a result of these operations, organisms handle the endless variety of perceptual input by first recognizing the category of items, to subsequently discriminate between them or generalize across different stimuli. All these processes contribute to categorization and the formation of concepts. How categorization is mediated at a neuronal level, what stimulus features are used, and how concepts emerge from categories remain open questions. These mechanisms have previously been reviewed (Soto and Wasserman 2014) and synthesized into a mechanistic hypothesis (Güntürkün et al. 2018). In the current review, we provide insights from the realm of birds into the behavior and neurobiology of perceptual visual categorization, focusing mainly on key developments of recent years.
Although we only review studies that used visual stimuli, there is strong evidence from experiments with human participants that the categorization of visual and tactile objects generates highly similar veridical perceptual spaces, forming overlapping object categorization processes (Tabrik et al. 2021). Studies in corvids likewise show that auditory categorization follows principles highly similar to those of the visual system (Wagener and Nieder 2020).

The common elements model of categorization

The common elements model of categorization (Soto and Wasserman 2010, 2012) provides a theoretical and neurobiological framework that describes how the avian visual system parcellates objects into different categories and uses these representations to guide decision making. The model rests on two assumptions. First, objects belonging to a category are represented by a combination of shared perceptual features (the elements), and these elements have different probabilities of being diagnostic of a particular category. Elements with a high probability of diagnosing a particular category are shared between many, if not all, objects comprising the category, making these elements category-specific. In contrast, elements with a low probability of diagnosing a particular category are shared by few objects comprising the category, making these elements merely stimulus-specific. Second, the model assumes that connections between category-specific or stimulus-specific elements and behavioral responses are strengthened through error-driven learning, depending on their ability to predict reward. As learning is proportional to the reward-prediction error, only the stimulus-specific and category-specific elements that are predictive of reward control behavioral decisions.
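The error-driven learning assumed by the model can be sketched in a few lines. The following toy simulation is our own illustration, not code from Soto and Wasserman: stimuli are sets of elements (all element names are hypothetical), and element weights are updated in proportion to the reward-prediction error, Rescorla-Wagner style.

```python
import random

def train_common_elements(n_trials=2000, lr=0.1, seed=0):
    """Toy common elements model: a stimulus is a set of elements; the
    response prediction is the sum of element weights; weights change in
    proportion to the reward-prediction error (error-driven learning)."""
    rng = random.Random(seed)
    weights = {}  # element -> associative weight

    def stimulus(category):
        # every category member carries the category-specific element
        # plus one stimulus-specific element drawn at random
        return [f"{category}_shared", f"{category}_unique_{rng.randrange(20)}"]

    for _ in range(n_trials):
        category = rng.choice(["A", "B"])
        elements = stimulus(category)
        prediction = sum(weights.get(e, 0.0) for e in elements)
        reward = 1.0 if category == "A" else 0.0  # only category A rewarded
        error = reward - prediction
        for e in elements:  # the same error updates all present elements
            weights[e] = weights.get(e, 0.0) + lr * error
    return weights

w = train_common_elements()
# the category-specific element of the rewarded category is present on
# every rewarded trial and so accumulates most associative strength
assert w["A_shared"] > 0.5
assert max(v for k, v in w.items() if "unique" in k) < w["A_shared"]
```

Because the category-specific element appears on every rewarded trial, it comes to dominate behavioral control, while the rarely seen stimulus-specific elements remain weak, exactly the division of labor the model postulates.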

The common elements model is implemented as a simple hierarchical feedforward network (Riesenhuber and Poggio 2000; Serre et al. 2007), with alternating simple cell-like and complex cell-like layers, as inspired by the architecture of the mammalian ventral visual stream. This pathway is a recurrent occipito-temporal network that associates early visual areas with the anterior inferior temporal cortex and shows diverse and clustered categorical selectivity for visual objects (Kravitz et al. 2013). In the model, layers of simple cells are interleaved with layers of complex cells, which combine the input of several units with similar selectivity but slightly different positions and scales. These non-linear operations between layers allow the network to extract increasingly specific and complex image features, mimicking the hierarchical computations known to occur along the pigeon tectofugal pathway (Li et al. 2022; van Essen et al. 1992). These similarities between birds and primates mean that understanding the physiology of the avian visual system represents a unique opportunity to compare how similar principles of perception, motor control, and planning are implemented by neuronal hardware that differs from the mammalian cortex.
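A minimal sketch of such an alternating simple/complex architecture, in the spirit of HMAX (Riesenhuber and Poggio 2000) but greatly simplified, can make the two operations concrete. The two hand-written orientation filters and the toy image below are our own assumptions, not stimuli from any cited study.

```python
import numpy as np

def simple_layer(image, filters):
    """S-layer: each unit matches a local oriented template
    (valid 2D cross-correlation with a small filter bank)."""
    h, w = filters.shape[1], filters.shape[2]
    H, W = image.shape[0] - h + 1, image.shape[1] - w + 1
    out = np.zeros((len(filters), H, W))
    for k, f in enumerate(filters):
        for i in range(H):
            for j in range(W):
                out[k, i, j] = np.sum(image[i:i + h, j:j + w] * f)
    return np.maximum(out, 0.0)  # rectification

def complex_layer(maps, pool=2):
    """C-layer: max over neighboring positions of same-selectivity units,
    yielding tolerance to translation (the HMAX 'MAX' operation)."""
    k, H, W = maps.shape
    out = maps[:, :H - H % pool, :W - W % pool]
    out = out.reshape(k, H // pool, pool, W // pool, pool)
    return out.max(axis=(2, 4))

# two toy orientation filters: horizontal and vertical edge detectors
filters = np.array([[[1, 1], [-1, -1]], [[1, -1], [1, -1]]], dtype=float)
image = np.zeros((8, 8))
image[3, :] = 1.0  # a horizontal bar
s1 = simple_layer(image, filters)
c1 = complex_layer(s1)
# the horizontal-edge channel responds; the C-layer response tolerates
# small shifts of the bar because of the max pooling
assert s1[0].max() > s1[1].max()
```

Stacking further S/C pairs on top of `c1` would yield units selective for ever more complex feature conjunctions over ever larger regions, the same progression the model assumes along the tectofugal hierarchy.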

The thalamofugal pathway

The study of Stacho et al. (2020) demonstrated that the entire sensory pallium of birds, encompassing the components of both the visual thalamofugal and the visual tectofugal systems, is characterized by columnar canonical iterative circuits that are highly similar in the thalamofugal and tectofugal regions. Thus, these circuits are mostly identical throughout sensory systems and pallial areas (canonical), and they are repeated in an identical way throughout the expanse of the sensory pallium. In addition, the canonical circuits of both the thalamo- and tectofugal systems are tangentially intersected by long-range associative axons that cross-connect all columns and link them to prefrontal, hippocampal, and (pre)motor structures (Fig. 1C). This cortex-like organization is only visible in the sensory pallium, while associative and motor areas are organized differently. The thalamofugal visual system terminates in the cortex-like territory of the Wulst (German for bulge or swelling), a laminated structure at the dorsal roof of the avian telencephalon which contains both a somatosensory and a visual processing region (Bischof et al. 2016; Pettigrew and Konishi 1976a, b; Wild 1987). The visual component of the Wulst receives projections from the dorsolateral geniculate nucleus (GLd) and, together with the GLd, constitutes the thalamofugal visual pathway (Güntürkün and Karten 1991). The Wulst is functionally analogous to the primary visual cortex (V1) in many respects, such as displaying detailed retinotopic maps of visual space, selectivity to orientation/direction of motion, and small receptive field sizes (Bischof et al. 2016; Gusel'nikov et al. 1977; Revzin 1969). In predatory birds with frontally oriented eyes, such as owls, the cortex-like architecture of the Wulst is expanded, which may be related to their behavioral specializations.
In these birds, the Wulst plays an important role in computing binocular disparity (Nieder and Wagner 2001; Pettigrew and Konishi 1976a, b; Wagner and Frost 1993), and performs global shape analysis that goes beyond that performed by the primary visual cortex (V1; Nieder and Wagner 1999). The owl Wulst also displays clustered pinwheel arrangements of neurons sensitive to orientation, like the monkey and cat extrastriate cortex (Liu and Pettigrew 2003). Laterally eyed birds, such as pigeons, possess a much less differentiated Wulst lamination (Stacho et al. 2020), and no clustered orientation arrangements of pinwheels (Ng et al. 2010). The thalamofugal pathway in laterally eyed birds relates more to the processing of distant stimuli viewed in the monocular visual field (Budzynski et al. 2002; Budzynski and Bingman 2004) and spatial localization (Bischof et al. 2016; Watanabe et al. 2011).

The tectofugal pathway

The tectofugal pathway plays the dominant role in detailed pattern vision in laterally eyed birds. This is particularly true when stimuli are viewed nearby in the frontal binocular visual field, as is mainly the case in an operant chamber (Güntürkün and Hahmann 1999; Remy and Güntürkün 1991). The differentiated network of 15 layers comprising the avian optic tectum highlights the tectofugal pathway's importance in both spatial attention (Marín et al. 2005) and stimulus perception (Neuenschwander et al. 1996; Neuenschwander and Varela 1993). The optic tectum displays a detailed retinotopic map of the visual field, and a progressive increase in the complexity of response properties and receptive field sizes at increasing depths (Frost and DiFranco 1976; Luksch 2003). Layer 13 of the optic tectum projects to the thalamic nucleus rotundus, transforming the tectal retinotopy into a rotundal functionotopy for form, color, 2D motion, and looming (Laverghetta and Shimizu 1999; Wang et al. 1993; Hellmann and Güntürkün 2001). These modules project topographically to the pallium, which is composed of an inner region called the nidopallium and a more dorsal region called the mesopallium. The nidopallium contains the main projection zone of the tectofugal pathway, known as the entopallium (Husband and Shimizu 1999), which displays functional specializations for form/color and motion information along its anterior–posterior extent (Cook et al. 2013; Nguyen et al. 2004), and large receptive fields (Gu et al. 2002). The entopallium displays a topographic arrangement of cortex-like fiber connections oriented roughly perpendicular to the overlying intercalated nidopallium (NI) and mesopallium ventrolaterale (MVL) layers (Krützfeldt and Wild 2005; Stacho et al. 2020). These layers might be analogous to the mammalian extrastriate cortex (Butler et al. 2011; Karten 1969) and play a critical role in the categorization of complex visual stimuli.

In the following section, we will focus on the operation of the tectofugal projections in the telencephalon, as it is the best-understood cortex-like component of the visual system in birds in terms of its neurophysiology. These bottom-up visual computations form the basis of object, category, and abstract rule processing in birds, which in many tasks are executed at levels comparable to primates (Scarf et al. 2016; Veit and Nieder 2013).

The avian visual cortex—perceptual categorization

Recent investigation of the physiology of the avian sensory cortex has revealed that hierarchical information processing builds increasingly complex and abstract representations of visual stimuli in the pigeon brain. These mainly feedforward shaped computations are very similar to the transformation of information observed across the mammalian ventral visual stream (Riesenhuber and Poggio 2000; Vinken et al. 2016). Arrays of neurons at higher stages of the processing hierarchy in mammals (such as primate inferior temporal cortex) are both selective to complex shapes and relatively invariant to non-linear changes, such as lighting, distance, viewpoint, and spatial translation (Bao et al. 2020; Freiwald and Tsao 2010; Gross and Schonen 1992; Wallis and Rolls 1997).

The entopallium is the first stage of hierarchical processing within the cortex-like architecture of the avian telencephalon: it receives thalamic input and forwards information to the overlying MVL and NI layers, which extract more complex features (Fig. 2A; Stacho et al. 2020; Clark and Colombo 2020). Consistent with the entopallium reflecting a relatively early stage of categorization, its neurons are selective for parameters such as stimulus size and direction/speed of motion (Engelage and Bischof 1996; Gu et al. 2002), but the population responses do not distinguish well between images belonging to different stimulus categories (Fig. 2B; Azizi et al. 2019; Clark et al. 2022a). These features suggest that the entopallium may reflect an intermediate stage of processing in the common elements model hierarchy of simple and complex unit layers (Serre et al. 2007; Soto and Wasserman 2012), one that has not yet built sufficient receptive field invariance to discriminate between different object categories. Figure 2B illustrates these hypothetical feature selection operations within the visual cortex of pigeons.

Fig. 2
figure 2

Hypothetical wiring pattern of the avian visual cortex and its proposed function in feature extraction and processing. A Hypothetical hierarchical visual information flow within the visual tectofugal pallium. B Depiction of the hypothesized hierarchical feature selection operations at different levels of the visual DVR. At the level of the entopallium, a basic feature selection operation is performed on the visual input (here depicted by a "digital embryo", Pusch et al. 2022). In our example, this operation is represented by the detection of edges that roughly correspond to the depicted orientations. However, at the stage of the entopallium the processed visual information is not sufficient to signal information about an object category. At the next hierarchical level, information from several entopallial neurons converges on MVL neurons and is integrated. The resulting population code at the level of the MVL conveys information about the object category viewed by the pigeon. After a further computational step, the visual information of the MVL cells converges at the level of the NI and is transferred to higher associative areas

Azizi et al. (2019) demonstrated that the population response of the overlying MVL layer distinguished between the features of images depicting animate and inanimate stimuli with far greater accuracy than at the level of the entopallium, in a task that required the birds to peck the images for food reward without categorizing the stimuli. The visual features that the MVL population used to achieve categorization of the objects were also quite dissimilar from those of a simple V1-like model of Gabor filters, suggesting that MVL neurons represent more abstract stimulus features than edges at particular orientations. Clark et al. (2022a) used a different image set and found that the population responses in MVL distinguished between the features of faces and scrambled faces with greater fidelity than the entopallium in a response-inhibition task that also did not require categorization (see Box 2 for further details). Interestingly, many MVL neurons respond strongly to scrambled images (Clark et al. 2022a, b), much like neurons in mammalian V1 (Vinken et al. 2016), suggesting that local edges are processed alongside more abstract stimulus features at higher stages of processing within the cortex-like layers (cf. Fig. 2A, B). A preference for intact objects over scrambled images emerges at the level of NI (Clark et al. 2022b), suggesting that NI neurons sum the inputs of local orientation detectors at lower stages of processing to form receptive field filters that detect coarse, low-spatial-frequency or complex shape features over a large area. The output of the NI layer is well situated to forward highly integrated visual information to the executive centers and memory systems of the avian brain (cf. Fig. 2A, B).
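The population-level comparison behind such findings can be illustrated with simulated data. The sketch below is entirely synthetic (the tuning parameters are assumptions, not measured values): it trains a simple nearest-centroid readout on two artificial populations, one with weak, distributed category information ("entopallium-like") and one carrying a clear categorical signal in a subset of units ("MVL-like").

```python
import numpy as np

def decode_accuracy(responses, labels, seed=0):
    """Cross-validated nearest-centroid readout: how well does a linear
    decoder separate two categories from the population response?"""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(labels))
    split = len(labels) // 2
    train, test = idx[:split], idx[split:]
    c0 = responses[train][labels[train] == 0].mean(axis=0)
    c1 = responses[train][labels[train] == 1].mean(axis=0)
    pred = (np.linalg.norm(responses[test] - c1, axis=1)
            < np.linalg.norm(responses[test] - c0, axis=1)).astype(int)
    return (pred == labels[test]).mean()

rng = np.random.default_rng(1)
labels = rng.integers(0, 2, 200)
# 'entopallium-like' population: responses dominated by stimulus-specific
# variability, only weakly modulated by category membership
ento = rng.normal(size=(200, 50)) + 0.1 * labels[:, None]
# 'MVL-like' population: a subset of units carries a clear categorical signal
mvl = rng.normal(size=(200, 50))
mvl[:, :10] += 1.5 * labels[:, None]
assert decode_accuracy(mvl, labels) > decode_accuracy(ento, labels)
```

Decoding analyses of this kind underlie the reported contrast between the two areas: the same linear readout succeeds on one population code and fails on the other, even though both contain some category-related signal.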

The avian ‘prefrontal area’

A global analysis of the architecture of the avian forebrain revealed a network organization remarkably similar to the mammalian connectome (Fig. 1D; Shanahan et al. 2013). In both groups of animals, distinct local networks are dedicated to different sensory modalities and to motor, limbic, and executive processes. These local networks are connected through central hubs, one of which is the prefrontal cortex. In birds, this corresponds to the nidopallium caudolaterale (NCL). This executive hub takes a central position, with afferent and efferent projections to all associative, sensory, limbic, and premotor structures. While the NCL does not share the columnar circuitry of the cortex (Stacho et al. 2020), several lines of evidence indicate that it is indeed the avian functional counterpart of the mammalian prefrontal cortex (Güntürkün et al. 2021). The NCL is usually identified as the part of the pallium with the richest dopaminergic innervation (Güntürkün 2005; von Eugen et al. 2020). A part of these dopaminergic terminals form ‘baskets’, dense encapsulations of individual perikarya that enable very specific targeting of individual neurons (Waldmann and Güntürkün 1993; Wynne and Güntürkün 1995). This mode of innervation might play a functional role in the unlaminated cluster of the avian NCL similar to that of layer-specific projections in the mammalian PFC. At the functional level, the similarity to the PFC was initially established with various lesion and inactivation studies that reliably demonstrated that the NCL is involved in higher, more abstract processes such as the processing of behavioral rules (Güntürkün 1997a; Hartmann and Güntürkün 1998; Mogensen and Divac 1982; Diekamp et al. 2002a, b). These reports were confirmed by many neurophysiological studies implicating the NCL in many of the typical prefrontal functions (Güntürkün et al. 2021). To name a few examples, neural correlates of categorization (Kirsch et al. 2009; Ditz et al. 2022), working memory (Diekamp et al. 2002a, b; Hahn et al. 2021; Veit et al. 2014), executive control (Rose and Colombo 2005), reward processing (Koenen et al. 2013; Packheiser et al. 2021), numerosity (Wagener et al. 2018), rules (Veit and Nieder 2013), and even sensory consciousness as the ability to be aware of a sensory event (Nieder et al. 2020) have been discovered in the NCL. Also, the neural ‘code’ found in the NCL largely follows the same principles as neural representations in the PFC. In working memory, neurons (‘delay cells’) show evidence of active maintenance (Diekamp et al. 2002a, b), capacity limitations can be accounted for by divisive normalization, and neural oscillations are in line with modern bursting models of delay activity. In both the PFC and the NCL, neurons are tuned in a highly flexible, task-specific way (Rigotti et al. 2013). This ‘mixed selectivity’ enhances robustness and flexibility as well as the ability to represent highly abstract information.
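The divisive-normalization account of working-memory capacity limits mentioned above can be written down compactly. The formulation below is a generic sketch of the standard normalization equation, not a model fitted to NCL data; the parameter values are arbitrary.

```python
import numpy as np

def normalized_responses(drives, sigma=0.5, r_max=1.0):
    """Divisive normalization: each item's response is its own drive
    divided by the pooled drive of all memorized items (plus a
    semi-saturation constant sigma), scaled by a ceiling r_max."""
    drives = np.asarray(drives, dtype=float)
    return r_max * drives / (sigma + drives.sum())

one_item = normalized_responses([1.0])[0]
four_items = normalized_responses([1.0] * 4)[0]
# the same item is represented more weakly under higher memory load,
# a simple account of working-memory capacity limits
assert four_items < one_item
```

As more items enter the normalization pool, each item's neural signal shrinks, so precision degrades gracefully with load rather than cutting off at a fixed slot count.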

The avian ‘prefrontal area’—perceptual categorization

We can think of categorization as a process that can occur at different levels of abstraction from physical stimulus properties (see Box 1 for formal definitions). The mammalian PFC, and correspondingly the avian NCL, become critical as abstraction increases. The location of the NCL within the avian pallial network allows the full integration of highly processed stimulus information from all modalities with limbic and, importantly, reward information. Unsurprisingly, neurons in the PFC show categorical responses, that is, they give a binary response even to a physically continuous stimulus. For instance, in a seminal experiment, Freedman et al. (2001) trained monkeys to categorize renderings of cats and dogs. The stimulus set consisted of gradual morphs between cats and dogs, such that the stimuli were physically continuous. While neurons in inferior temporal cortex responded strongly to the physical ‘catness’ or ‘dogness’ of individual stimuli, prefrontal neurons gave a binary response as either cat or dog. In other words, prefrontal neurons did not represent the graded physical properties of the stimuli but only their category membership. The PFC is also able to flexibly respond to different categories. Interestingly, if the same stimulus set is categorized along different borders, different groups of neurons represent the two categories (Roy et al. 2010). But if the animals flexibly switch between categorizations involving different sets of stimuli, the category representations overlap in the same neural population (Cromer et al. 2010). This highlights the importance of the PFC not only in rule-based categorization processes but also shows that conflicting, physically ambiguous categories require greater prefrontal involvement.

It is very likely that the category-selective response properties of NCL neurons are sculpted by reward and by the reward-driven dynamics of the strong dopaminergic input (von Eugen et al. 2020; Wynne and Güntürkün 1995) that activates local D1-receptors (Durstewitz et al. 1998). Their activation promotes synaptic stimulus–response associations (Herold et al. 2012) and signals the presence of predicted reward (Packheiser et al. 2021). In contrast, blocking D1-receptors levels the differential learning effects of unequal reward magnitudes (Rose et al. 2009, 2013; Diekamp et al. 2000). Through the sum of these dopamine-mediated feedback signals, synaptic weights within cellular assemblies of the NCL are increased, making it increasingly likely that the animal will select the rewarded stimulus category (Güntürkün et al. 2018; Soto and Wasserman 2010).

The contributions of the asymmetric avian brain

Avian visual pathways reveal task-specific and complementary hemispheric asymmetries in chicken hatchlings, adult pigeons and many more avian species (Güntürkün et al. 2020a, b). In both chicks and pigeons, the left hemisphere excels in visual discrimination of various object features like patterns or color (Güntürkün 1985; Rogers et al. 2007; Skiba et al. 2002), while the right hemisphere is superior in object configuration (Yamazaki et al. 2007), social cognition (Deng and Rogers 2002a; Nagy et al. 2010; Rugani et al. 2015) and spatial attention (Chiandetti 2011; Diekamp et al. 2005; Letzner et al. 2017). These asymmetries pay dividends, since birds with pronounced behavioral asymmetries fare better in foraging tasks (Güntürkün et al. 2000; Rogers et al. 2004). When tested in the context of learning the category “human vs. non-human”, Yamazaki et al. (2007) demonstrated that both hemispheres approach this challenge with complementary contributions. While the left side of the brain exploited the diagnostic value of tiny visual features, the right hemisphere concentrated on the overall configuration of the sought category. Indeed, Manns et al. (2021) could show in an elegant study that both hemispheres can take the lead during categorization, possibly based on the perceptual strategy used.

When tested in conditioning chambers, pigeons use their frontal visual field to categorize stimuli. The stimuli are then perceived with the dorsotemporal retina, which is mainly represented in the tectofugal system (cf. Fig. 1B; Güntürkün and Hahmann 1999; Remy and Güntürkün 1991), a system with a bias for local processing of object features (Clark and Colombo 2022). In contrast, the thalamofugal pathway seems to participate in global processing of more distant objects in the pigeon's surroundings (Clark and Colombo 2022). Therefore, under ecological circumstances, both hemispheres likely complement each other during categorization when the entire visual field is used. Since the neurobiological studies discussed below mostly derive from experiments conducted in conditioning chambers, they possibly primarily uncover the neural foundations of a left-lateralized superiority of visual feature coding in the context of perceptual categorization.

Structural and physiological asymmetries of the avian visual system have been investigated in both chickens (Adret and Rogers 1989; Costalunga et al. 2022; Deng and Rogers 2002b; Rogers and Sink 1988) and pigeons (Güntürkün et al. 1998; Manns and Ströckens 2014; Ströckens et al. 2013). The emergence of such asymmetries requires, at least in part, an asymmetrical epigenetic event during early development. Birds take an asymmetrical position in the egg such that the left eye of the avian embryo is covered by its own body, while the right eye points to the eggshell. Every time the breeding adults stand up, light falls onto the eggs, traverses the eggshell, and primarily stimulates the right eye (Buschmann et al. 2006). This is the starting point for the right-eye/left-hemispheric superiority in visual object discrimination in birds (Manns 2021). Obstructing visual input to the right eye with a patch before (Rogers and Sink 1988) or after hatch (Manns and Güntürkün 1999) reverses both behavioral and anatomical asymmetries. While chickens predominantly show asymmetries in the thalamofugal pathway, pigeons mainly show asymmetries in the tectofugal system (Güntürkün et al. 2020a, b). In the following, we will focus on the situation in pigeons.

Within the tectofugal pathway, the first central structures already show morphological and neurochemical asymmetries, indicating that bottom-up signals are processed in a lateralized manner (Güntürkün 1997b; Manns and Güntürkün 1999, 2003). In addition, contralaterally projecting tectal fibers are more numerous from the right tectum to the left rotundus than vice versa (Letzner et al. 2020; Fig. 3A, label A). Figure 3 summarizes the different asymmetrical processing steps and highlights their anatomical underpinnings using different labels (encircled letters A–D). These labels link the respective processing steps mentioned in the text and the figure.

Fig. 3
figure 3

Modified from **

Categories consist of groups of stimuli that share overlapping perceptual features. These constitute the core of the common elements theory when applied to categories. In contrast, concepts are constituted by groups of stimuli that do not all share these perceptual features. Still, humans and some other animals might conceive them as a common group.

The emergence of categories and concepts has recently been investigated in a modeling study using a deep neural network (Henningsen-Schomers and Pulvermüller 2021). Here, visual features that are present in all stimuli of the sought category (e.g., shared visual features of pigeon breeds, Fig. 5 left) create the common elements of this category (overlapping dots shared by all stimuli). In addition, some elements are shared only by a subgroup of stimuli. The situation is different for abstract concepts. Their features are never shared across all members belonging to the concept, but only between subgroups of stimuli. Thus, as visible in Fig. 5 (right panel), the central zone of the concept is empty, while the overlapping zones between neighboring stimuli contain shared elements. This arrangement results in an intermediate state of feature overlap called family resemblance.
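The difference between a category's common core and a concept's empty center can be made concrete with set intersections. The feature sets below are hypothetical stand-ins for the overlapping dots of the Venn-style scheme, not stimuli from the cited study.

```python
def shared_features(members):
    """Features common to every member: the 'central zone' of a category."""
    common = set(members[0])
    for m in members[1:]:
        common &= set(m)
    return common

# perceptual category: every member shares 'body' and 'beak'
category = [
    {"body", "beak", "grey"},
    {"body", "beak", "white"},
    {"body", "beak", "crest"},
]
# abstract concept: only pairwise subgroup overlaps, no common core
concept = [
    {"fur", "legs"},
    {"legs", "wings"},
    {"wings", "fur"},
]
assert shared_features(category) == {"body", "beak"}
assert shared_features(concept) == set()  # the central zone is empty
```

The concept's members are still linked, but only through pairwise overlaps, which is exactly the family-resemblance structure described above.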

Fig. 5
figure 5

Schematic depiction of a hypothesis on how categories and concepts emerge. The left panel exemplifies the category “pigeon”. Each individual category member is characterized by a set of idiosyncratic features. These unique stimulus elements render the different pigeons identifiable. Further, some individuals might share visual aspects, leading to subgroup features. However, the defining components of the category “pigeon” are the overall shared features—visual aspects that are common to all category members. These shared features lead to a robust representation of the category based on visual similarity. The right panel depicts the situation for the concept “animal”, a concept that pigeons learn with some additional training (Roberts and Mazmanian 1988). As for categories, unique features characterize each individual instance of the concept. Further, several instances might share visual subgroup features. However, no features are shared by all instances of the concept. As a result, no combined representation based on shared visual similarity is possible, but family resemblance emerges based on the presence of multiple subgroup features across all stimuli

After training the network with instances of such category members, the emerging cell assemblies were investigated. The authors found that stimuli belonging to a perceptual category (left side of Fig. 5) were represented by cell assemblies that coded category-defining features in the neural network's central connector hub area. This result arises because units coding for shared features are activated most frequently, leading to a relative suppression of the neurons responsive to unique features. If the common core is sufficiently activated, the categorical cell assembly ignites as a whole, resulting in strong persistence throughout task execution. In parallel, the representations of unique features, or of subgroup features shared by only a few members of a category, might fade, resulting in an overshadowing of these features.

This is different for concepts. Here, no features are shared by all members. Instead, a larger number of neurons code for elements that are shared only by subgroups of the concept. It is precisely the sum of all these subgroup features that could represent a concept. The stronger reliance on subgroup-feature neurons in the case of concepts creates the “family resemblance” and contextual dependency. Indeed, in humans, abstract concepts rely much more on their contextual embedding than perceptual categories do (Schwanenflugel et al. 1988, 1992).

These modeling results (Henningsen-Schomers and Pulvermüller 2021) fit well with behavioral data showing that perceptual categories whose members share many stimulus details are easier to learn and categorize than more abstract categories at the superordinate level (see Box 1 for formal definitions; e.g., Lazareva 2004). Further, pigeons have severe problems mastering choice tasks using polymorphous concepts, i.e., stimuli that are defined as category members if they contain m-out-of-n stimulus features (Lea et al. 2006; von Fersen and Lea 1990). One explanation of this behavioral finding might be that such concepts need more behavioral training because of the neurocomputational demands of learning a group structure that lacks a central connector hub with common elements (Henningsen-Schomers and Pulvermüller 2021). If learning a concept requires the acquisition of a large number of context-dependent subgroups of features that jointly create the concept, it is easy to see that animals could be ranked by the speed of concept learning according to their number of pallial neurons (Wright et al. 2003; Güntürkün et al. 2017). This might also explain why crows, with their much larger number of associative pallial neurons, are able to master these kinds of tasks with ease, while pigeons struggle (Veit and Nieder 2013; Ströckens et al. 2022). In conclusion, sufficient training and computational power in associative brain structures might enable abstract concepts to evolve in various animal species.
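The polymorphous m-out-of-n structure that gives pigeons such trouble is easy to state formally. The sketch below implements a generic 2-out-of-3 rule (our own illustration, not the specific stimuli of von Fersen and Lea 1990) and checks that no single feature is diagnostic on its own.

```python
from itertools import product

def is_member(stimulus, m=2):
    """Polymorphous rule: a stimulus is a category member if at least
    m of its binary features are present."""
    return sum(stimulus) >= m

# enumerate all 2^3 binary feature combinations
members = [s for s in product([0, 1], repeat=3) if is_member(s)]
non_members = [s for s in product([0, 1], repeat=3) if not is_member(s)]
# every single feature occurs in both members and non-members, so no
# common element diagnoses the category by itself
for i in range(3):
    assert any(s[i] == 1 for s in members)
    assert any(s[i] == 1 for s in non_members)
```

Because membership depends on a count over features rather than on any shared element, a learner relying on common elements alone has nothing to latch onto, which is consistent with the extra training such tasks demand.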

The forthcoming frontiers

The synopsis of recent findings from anatomical studies, behavioral experiments, electrophysiological recordings, and modeling attempts allows the formulation of a coherent theory of perceptual categorization and concept formation. Now, these theoretical implications need to be experimentally verified. In parallel, several methodological aspects might be worth considering in future experiments on perceptual categorization and concept formation.

At the behavioral level, several algorithms for generating stimuli have been introduced that are geared to probe the critical features used by the animals to facilitate categorization (Apostel and Rose 2021; Hegdé et al. 2008; Pusch et al. 2022). These stimuli represent artificial yet naturalistic objects that are free of human semantics but based on features for class distinction that can be tracked by the experimenter. Taking this approach a step further, genetic algorithms can be used to adaptively change stimulus features within an experimental session, for instance to find optimal stimulus parameters for the animals (Qadri and Cook 2021). Such stimuli also allow near perfect control over the statistics of the stimuli that define a category and might help to uncover the aspects, elements, and features that guide the choice behavior of the animals.

The level of analysis might also benefit from the inclusion of additional behavioral parameters. One approach used in recent experiments is peck-tracking. Similar to human eye tracking, the location of the pecks with which pigeons signal their choice can be used as a proxy for the pigeon's visual attention. Indeed, it has been shown that pigeons, when learning to categorize visual stimuli, allocate their attention to the predictive features of the stimuli, reflected by an increased pecking rate onto these stimulus aspects (Castro et al. 2021; Castro and Wasserman 2017; Dittrich et al. 2010; Pusch et al. 2022). In combination with the aforementioned stimulus material, this information might further aid the understanding of which stimulus features gain control over the elicited behavior.

This principle can be extended far beyond peck-tracking. Modern video analysis, such as markerless pose estimation, allows tracking of behavioral aspects that were previously difficult to incorporate systematically into a detailed analysis (for example using DeepLabCut: Nath et al. 2019; Wittek et al. 2022). All these approaches reduce experimenter biases and can reveal details not obviously visible in aggregated data, thereby achieving an ecologically valid and unbiased behavioral analysis (Anderson and Perona 2014).

On the neurophysiological level, the analysis of the supposed neural computations within the sensory aspects of the dorsal ventricular ridge (DVR)—a large pallial collection of nuclei that bulges below the lateral ventricle—and their connections with the NCL constitutes a core future question. But these questions extend beyond the areas covered in this review and should incorporate key areas such as the striatum and hippocampus. Both structures very likely constitute key contributors to categorization learning. Recent approaches, like visual discrimination learning in awake and actively working pigeons tested in ultrahigh-field magnetic resonance imaging systems, could aid these analyses by visualizing with high resolution all cerebral areas that participate in certain task components (Behroozi et al. 2020). This further highlights the fact that categorization—like all cognition—cannot be understood at the level of individual neural structures but must be seen as a network process. High-density methods, such as electrophysiological recordings with silicon probes, allow parallel data collection from the entire stacked avian visual cortex, or even bilaterally from visual and prefrontal structures simultaneously. The data generated with these approaches allow for the analysis of the temporal dynamics and population-level processes within and between the different nodes of the network. These critical tests might allow us to further discern the network-level processes that underlie categorization and concept formation. Methods such as optogenetic stimulation and inhibition (Deisseroth 2011; Rook et al. 2021) further complement this approach by allowing causal interventions targeting, for example, top-down processes in perceptual categorization.

Last but not least, the differences in concept learning between pigeons and crows exemplify important species differences within the avian class. These differences should be turned into heuristic opportunities that enable us to see how ecological embedding and neural specialization affect the different components of avian cognition. This is only possible if a larger number of avian species is tested.

Taken together, theoretical implications as well as methodological and conceptual advancements provide the opportunity for future experiments that will broaden our understanding of perceptual categorization in birds.