The perceptual expertise account of specialization for faces suggests that behavioral and neural hallmarks of face perception arise because most adults have had considerable experience at discriminating faces at the individual level (Diamond & Carey, 1986; Gauthier & Tarr, 1997; Wong, Palmeri, & Gauthier, 2009). Studies of visual expertise show that nonface objects of expertise, such as birds, dogs, body parts, cars, handwriting, x-rays, and even novel computer-generated objects such as Greebles and Ziggerins, can exhibit many characteristic behavioral and neurological markers of face perception (e.g., Busey & Vanderkolk, 2005; Diamond & Carey, 1986; Gauthier, Skudlarski, Gore, & Anderson, 2000; Gauthier & Tarr, 1997; Harley et al., 2009; Stekelenburg & de Gelder, 2004; Wong et al., 2009). Objects of expertise can also recruit face-selective regions in the mid-fusiform gyrus (Gauthier, Skudlarski, et al., 2000; Harley et al., 2009; Xu, 2005) and lead to early physiological responses typically evoked by faces over the occipitotemporal cortex (Busey & Vanderkolk, 2005; Gauthier, Curran, Curby, & Collins, 2003; Rossion, Gauthier, Goffaux, Tarr, & Crommelinck, 2002; Tanaka & Curran, 2001).

If faces and objects of expertise recruit similar processing mechanisms mediated by overlapping neural representations, objects from these domains may compete when they must be processed concurrently. Indeed, there is evidence of competition between faces and nonface objects of expertise on the face-selective N170 ERP component typically recorded at occipito-temporal electrodes at about 170 ms post stimulus onset (Gauthier et al., 2003; Rossion, Collins, Goffaux, & Curran, 2007; Rossion, Kung, & Tarr, 2004). In one study, ERPs were measured while subjects viewed an alternating sequence of face and car composites, judging whether the bottom half of an image was the same as the bottom half of the previous image. Holistic effects were estimated by the extent to which the task-irrelevant top of the images influenced judgments on the bottom of the images. Holistic processing of faces was reduced when car experts had to process the interleaved cars in a holistic manner, and this interference effect was correlated with the amplitude of the face-evoked N170 (Gauthier et al., 2003). Two other studies reported that the N170 response to faces was attenuated when observers concurrently fixated a nonface object from a category for which participants had acquired expertise either in the laboratory (Greebles; Rossion et al., 2004) or through long-term experience outside of the laboratory (cars; Rossion et al., 2007). Importantly, the effect of neural suppression significantly correlated with a behavioral index of visual expertise. Competition was also suggested in a training study in which a visual agnosic patient with an inferotemporal lesion participated in a training regimen that required learning to individuate novel objects called Greebles (Behrmann, Marotta, Gauthier, Tarr, & McKeeff, 2005). For this patient, small improvements in performance with Greebles during the long training program were associated with behavioral costs in face perception, and Greeble-evoked increases in activity in the fusiform gyrus were associated with decreases in response to faces. Overall, these results suggest that faces and objects of expertise may compete in a variety of situations.

With the exception of the single case study by Behrmann et al. (2005), previous work has been limited to neural measures of competition, without providing direct evidence that the perceptual processing of faces is actually impacted by competing objects of expertise. This was the case because either the behavioral task used was exceptionally easy (Rossion et al., 2004, 2007), or the task required selective attention to an image’s parts, thereby favoring novices (Gauthier et al., 2003). Thus, it remains to be determined whether neural measures of competition will necessarily translate into reduced behavioral performance. The relationship between neural markers of face selectivity and face perception is not clear. For instance, fMRI activity for nonface objects in the fusiform face area (FFA) correlates with behavioral expertise measures (Behrmann, Avidan, Gao, & Black, 2007; Gauthier, Curby, Skudlarski, & Epstein, 2005; Xu, 2005) but, paradoxically, FFA responses to faces can also be found in patients incapable of recognizing faces (Marotta, Genovese, & Behrmann, 2001; Rossion et al., 2003). A normal face-selective M170 potential (the MEG analog of the N170) has also been reported in developmental prosopagnosics (Harris et al., 2005).

To address this issue, one study directly measured perceptual thresholds for discriminating faces in an RSVP stream of faces alternating with task-irrelevant cars (McKeeff, McGugin, Tong, & Gauthier, in press). Car experts were slower than novices at identifying faces among task-irrelevant cars. However, in a control condition in which subjects searched for watches among car distractors, the same car experts were faster than novices at identifying studied watches, suggesting that objects of expertise do not simply grab attention. Rather, these results demonstrate how the perception of faces can be influenced by competing objects from another domain of expertise.

In most prior cases of competition between faces and objects of expertise, objects from the two categories alternated. In some cases, objects of expertise were held in visual short-term memory, whereas an effect was observed during face processing (e.g., Gauthier et al., 2003). Although, in that study, competition could have taken place at a perceptual level and/or in working memory, there are reasons to believe that the locus of competition was only perceptual. Indeed, interference between items in working memory obeys different principles governed by limitations in the encoding and/or maintenance of similar representations, rather than by expertise (Cheung & Gauthier, 2010). This suggests that we should be able to detect competition between domains of expertise in any task that taps into an expertise-specific perceptual bottleneck.

Here, we investigated whether task-irrelevant distractors from a category of expertise can also compete in the context of a visual search task in which stimuli are distributed in space rather than in time. If competition among categories of expertise is governed by perceptual factors, it should influence performance during visual search, a task that is affected by perceptual factors (Duncan, 2006; Duncan & Humphreys, 1989; Raymond, Shapiro, & Arnell, 1992; Treisman & Gelade, 1980; Wolfe, Cave, & Franzel, 1989). In McKeeff et al. (in press), the threshold for detecting a face in a rapid serial visual presentation (RSVP) stream varied as a function of one’s expertise for the irrelevant cars. To what extent was competition observed because cars were presented at fixation and at a rate at which attention could not be disengaged fast enough to ignore them? The visual search paradigm is one of the most useful paradigms for examining the deployment of attention. We measured competition in a spatial search to assess whether the locus of competition is early enough in processing that it disrupts selective attention to faces when peripheral task-irrelevant distractors could, in theory, be filtered out via top-down attentional mechanisms (Wolfe, 1994).

The efficiency of visual search can be influenced by the familiarity of the target, but also by the familiarity of the distractors. On the one hand, the literature supports a search asymmetry in which the detection of an unfamiliar target amongst familiar distractors is easier/faster than the reverse search (e.g., Malinowski & Hubner, 2001; Rauschenberger & Chu, 2006; Shen & Reingold, 2001; Wang, Cavanagh, & Green, 1994). For example, visual search experiments have demonstrated superior search efficiency for letters or objects in their canonical orientation, including search for mirror-reversed Ns or Zs among canonically oriented Ns and Zs (Wang et al., 1994), inverted As among upright As (Wolfe, 2001), or inverted animals among upright ones (Wolfe, 2001). Another line of visual search studies has explored the effect of stimulus familiarity when stimuli are kept constant. Rather than rendering objects less familiar by rotation or inversion, some of this work compared performance across American and Chinese subjects searching for meaningful or nonmeaningful Chinese characters (Rauschenberger & Chu, 2006; Shen & Reingold, 2001), German and Slavic subjects searching for meaningful or nonmeaningful Latin or Cyrillic alphabet units (Malinowski & Hubner, 2001), subjects searching for unfamiliar faces when the distractors were one’s own face (Tong & Nakayama, 1999), and white subjects searching for cross-race faces (faces of a different race than the subject) among same-race faces (Levin, 1996; Levin & Angelone, 2001). Across domains, search was most efficient for unfamiliar targets among familiar distractors.

The studies described above used relatively homogeneous targets and distractors. Other studies using more heterogeneous distractor displays with complex objects have demonstrated the importance of target familiarity in addition to distractor familiarity, showing that familiar targets may actually become more salient than novel targets (Hershler & Hochstein, 2009; Mruczek & Sheinberg, 2005). When the task was to detect the presence of a car, a face, or a bird (at the category level) in displays with objects of various categories, faces, as well as objects of expertise, were found more rapidly (Hershler & Hochstein, 2009).

In the present study, we combine a search for a specific target among homogeneous distractors with the presence of a variable number of objects from a visually dissimilar category. Car experts and car novices searched for a target (face or sofa) in a display of distractors from two different categories. In face-target conditions, the distractors were either faces and cars or faces and sofas, whereas in sofa-target conditions the distractors were either sofas and cars or sofas and faces. The number of distractors from the target category remained constant (5), whereas the number of distractors from the task-irrelevant category varied (two, four, or eight). Only the number of task-irrelevant distractors (rather than targets) was manipulated to assess how expertise with distractors differently affected the efficiency of search for another object of expertise (face) or for a common object (sofa).

We predicted that car expertise would impede search for a target face among car distractors, such that visual search slopes would be correlated with an observer’s level of expertise with cars. As the number of task-irrelevant car distractors increases, the detection of a target face should become more difficult. However, we expected no such influence of car expertise during search for a sofa target among car distractors. If this were found, it would suggest a bottleneck for domains of expertise, rather than a general attentional effect for distractors of expertise.

Method

Participants

Thirty-one individuals participated (19 males; mean age, 23 years). Although our main prediction focused on car expertise as a continuous measure, we also divided participants into car “novices” or “experts” on the basis of a median split on scores from an independent test that compared car identification with bird identification. Bird identification was intended to serve as a baseline of novice-level performance across individuals; thus, three participants were excluded on the basis of bird identification scores that were more than 2 standard deviations from the mean. The data are from the remaining 28 participants, 14 car “experts” and 14 car “novices.” All the participants had normal or corrected-to-normal visual acuity. The experiment was approved by the Institutional Review Board at Vanderbilt University, and all the participants provided written informed consent.

Stimuli and design

Stimuli were digitized grayscale images (30 faces [75 × 75 pixels, 2.62° of visual angle], 30 cars [85 × 50 pixels, 2.96° × 1.74° of visual angle], and 30 sofas [75 × 75 pixels, 2.62° of visual angle]) presented on a 21-in. CRT monitor (refresh rate = 100 Hz) by a Macintosh G3 computer running Matlab with the Psychophysics Toolbox extension (Brainard, 1997; Pelli, 1997).

On each trial, participants searched for previewed targets in an array of distractors. Target and distractor stimuli were randomized across trials and participants. Thus, a target image on one trial could appear as a distractor image on a subsequent trial. Stimuli were presented in four conditions: face targets among face and car distractors (F/FC), face targets among face and sofa distractors (F/FS), sofa targets among sofa and face distractors (S/SF), and sofa targets among sofa and car distractors (S/SC).

In each condition, a trial began with the simultaneous presentation of two targets from a given category, either two faces or two sofas, for as long as needed by the participants to encode. They then pressed a key to proceed. A brief fixation and search array followed. The search array consisted of 8, 10, or 14 stimuli: one target, five distractors from the target category, and either two, four, or eight task-irrelevant distractors from another category (see Fig. 1). For each trial, stimuli were randomly assigned a position from 16 available locations created by a 4 × 4 matrix. Images were randomly jittered by a maximum of 50 pixels up/down and 50 pixels left/right about their central location. Jittering avoided a grid-like display and discouraged a line-by-line search. Participants were instructed to search the display until they found a target, at which point they indicated which of the two studied targets had been found by pressing a right or a left key.

Fig. 1

Trial sequence for the F/FC condition. Each trial began with the presentation of two faces, followed by a fixation cross. With the onset of the search display, participants searched for one of the two studied faces, responding as quickly and accurately as possible. Search displays always contained one studied image, five images from the same category as the studied images, and two, four, or eight nontarget distractors
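The display construction described above can be illustrated with a minimal Python sketch. It is not the authors' Matlab/Psychophysics Toolbox code. The function name `build_search_array`, the role labels, the cell spacing of 200 pixels, and the use of uniformly distributed jitter are all assumptions made for illustration; only the 4 × 4 matrix, the 50-pixel jitter limit, and the item counts come from the text.

```python
import random

GRID_SIZE = 4        # 4 x 4 matrix of candidate locations (from the text)
MAX_JITTER = 50      # maximum jitter in pixels along each axis (from the text)
CELL_SPACING = 200   # ASSUMPTION: pixel distance between neighboring cell centers


def build_search_array(n_irrelevant, rng=None):
    """Return a list of (role, x, y) tuples describing one search display.

    The display holds one target, five same-category distractors, and
    n_irrelevant task-irrelevant distractors (2, 4, or 8).
    """
    if n_irrelevant not in (2, 4, 8):
        raise ValueError("n_irrelevant must be 2, 4, or 8")
    rng = rng or random.Random()

    roles = ["target"] + ["same_category"] * 5 + ["irrelevant"] * n_irrelevant
    # Draw distinct cells so that no two items share a location.
    cells = rng.sample(range(GRID_SIZE * GRID_SIZE), len(roles))

    display = []
    for role, cell in zip(roles, cells):
        row, col = divmod(cell, GRID_SIZE)
        # ASSUMPTION: jitter is drawn uniformly within +/- MAX_JITTER pixels.
        x = col * CELL_SPACING + rng.randint(-MAX_JITTER, MAX_JITTER)
        y = row * CELL_SPACING + rng.randint(-MAX_JITTER, MAX_JITTER)
        display.append((role, x, y))
    return display
```

Calling `build_search_array(4)` yields 10 items, matching the middle set size; passing a seeded `random.Random` makes a display reproducible.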

Participants first completed 8 practice trials, 4 sampled randomly from a block of face-target trials and 4 sampled randomly from a block of sofa-target trials. For the actual experiment, 60 F/FC trials and 60 F/FS trials were randomly presented within one face-target block, whereas 60 S/SF trials and 60 S/SC trials were randomly presented within one sofa-target block. Each of the two blocks of 120 trials was repeated once. Thus, the complete design consisted of four blocks and a total of 480 trials.
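As a quick check on the design arithmetic, the trial counts can be tallied in a few lines of Python. The variable names are illustrative only, and the sketch does not address how set sizes were distributed within a condition, which the text does not specify.

```python
TRIALS_PER_CONDITION = 60  # from the text
REPETITIONS = 2            # each block was run twice ("repeated once")

# Conditions grouped by the block in which they were intermixed.
BLOCKS = {
    "face_target": ["F/FC", "F/FS"],
    "sofa_target": ["S/SF", "S/SC"],
}

trials_per_block = {
    name: TRIALS_PER_CONDITION * len(conditions)
    for name, conditions in BLOCKS.items()
}
total_blocks = len(BLOCKS) * REPETITIONS
total_trials = sum(trials_per_block.values()) * REPETITIONS

print(trials_per_block, total_blocks, total_trials)
```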

In a separate task, car expertise was quantified using a sequential matching paradigm as in prior work (Gauthier et al., 2005; Gauthier, Tarr, et al., 2000; McGugin & Gauthier, 2010; Rossion et al., 2004; Xu, 2005). Participants made same/different judgments about car images (at the level of make and model, regardless of year) and bird images (at the level of species). For each of 112 car trials and 112 bird trials, the first stimulus appeared for 1,000 ms, followed by a 500-ms mask. A second stimulus then appeared and remained visible until a same/different response was specified or until 5 s elapsed with no activity. The results from this task yielded a separate sensitivity score for cars (car d’) and birds (bird d’). The difference between these measures (car d’–bird d’) yields a car expertise index for each participant. According to prior work, participants were classified as car “experts” when their self-report as an expert matched a car expertise index score of greater than 1.
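The expertise index can be sketched as follows, assuming the standard equal-variance signal detection formula d’ = z(hit rate) − z(false alarm rate). The text does not state which d’ model or which correction for extreme rates was used, so the log-linear correction, the convention that a "same" response on a same trial counts as a hit, and the function names below are assumptions.

```python
from statistics import NormalDist


def d_prime(hits, misses, false_alarms, correct_rejections):
    """Sensitivity (d') from response counts in a same/different task.

    ASSUMPTION: a "same" response on a same trial is a hit, and a "same"
    response on a different trial is a false alarm.
    """
    # ASSUMPTION: log-linear correction keeps rates away from 0 and 1,
    # where the z-transform would be undefined.
    hit_rate = (hits + 0.5) / (hits + misses + 1)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1)
    z = NormalDist().inv_cdf
    return z(hit_rate) - z(fa_rate)


def car_expertise_index(car_counts, bird_counts):
    """Car d' minus bird d'; each argument is a 4-tuple of response counts."""
    return d_prime(*car_counts) - d_prime(*bird_counts)
```

For example, `car_expertise_index((50, 6, 6, 50), (30, 26, 26, 30))` uses made-up counts for one hypothetical participant; values greater than 1 would fall in the range the text associates with car experts.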

Results

Measure of expertise

For a coarse analysis of expertise effects and to graph performance for groups of participants that differ on car expertise, we first split the participants into two groups. The 14 participants who had a car expertise index greater than 1 were also self-reported car experts; the remaining 14 participants were classified as novices. Figure 2 shows the range of car d’ scores (0.23–3.71) and more limited range of corresponding bird d’ scores (0.46–1.25). Car d’ was higher for self-reported car experts (d’ = 2.70) than for self-reported car novices (d’ = 0.71), F(1, 27) = 109.71, p < .0001, whereas no group difference was found in the bird task, F(1, 27) < 1, n.s. The car expertise index was also higher for self-reported car experts (delta d’ = 1.91) than for novices (delta d’ = -0.14), F(1, 27) = 116.91, p < .0001. For all the participants, subsequent statistical analyses performed on car expertise index scores yielded results qualitatively and statistically similar to those obtained using the simple car d’ as a measure of expertise. In other words, our results were not produced by variability in performance on the bird task.

Fig. 2

Scatterplot showing the distributions of car d’ and bird d’ scores from the car expertise test

Visual search task

Mean response times (RTs) for correct responses and mean accuracy scores are reported separately for car experts and novices in each condition in Table 1. We were primarily interested in search times, but we will also consider accuracy in order to rule out the possibility of trade-offs. In general, mean RTs were fairly long, as would be expected in a difficult search for a target among visually similar distractors from the same category (see Fig. 3).

Table 1 Mean Response Times (in Milliseconds) for Correct Trials and Mean Accuracy Rates for Each Condition and Set Size Separately for Car Experts and Car Novices

Fig. 3

Response time (in milliseconds) as a function of the number of task-irrelevant distractors (two, four, or eight) for car experts (n = 14) and car novices (n = 14), who searched for face targets among face and car distractors, face targets among face and sofa distractors, sofa targets among sofa and car distractors, and sofa targets among sofa and face distractors. Error bars represent the standard errors of the means

For each individual and each condition, we computed the slope of search as a function of set size, using linear regression. This was done for correct RT and accuracy separately. In general, search times were fairly long, as was expected given that the search was made difficult by similar distractors of the same category. However, only the number of distractors from the other category was varied, and so the slope was calculated as a function of the number of distractors from the task-irrelevant category. Because search efficiency decreases as the similarity between target and distractors increases (Duncan & Humphreys, 1989), we expected that in most conditions, search slopes as a function of the number of visually dissimilar objects should be fairly shallow, which is what we observed. Negative search slopes (as a function of the number of irrelevant distractors) may occur because, when there are only two distractors, they may pop out and grab attention, whereas this effect would be reduced when the target and distractor categories are more numerically balanced in the display.
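The slope computation described above amounts to an ordinary least-squares fit of mean correct RT (or accuracy) on the number of task-irrelevant distractors. The following minimal sketch illustrates it; the function name `search_slope` and the example values are hypothetical and are not data from the study.

```python
def search_slope(n_irrelevant, values):
    """Least-squares slope of values (e.g., mean correct RT in ms)
    regressed on the number of task-irrelevant distractors."""
    n = len(n_irrelevant)
    if n != len(values) or n < 2:
        raise ValueError("need at least two paired observations")
    mean_x = sum(n_irrelevant) / n
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in n_irrelevant)
    if sxx == 0:
        raise ValueError("distractor counts must vary")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(n_irrelevant, values))
    return sxy / sxx


# Hypothetical mean RTs (ms) for one participant in one condition.
slope_ms_per_item = search_slope([2, 4, 8], [1000, 1100, 1300])
```

With these made-up values, RT rises by 50 ms per added irrelevant distractor; a negative slope would indicate faster search as irrelevant distractors become more numerous.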

Although Fig. 3 suggests that car experts may be slower than car novices, overall, at searching for faces (the error bars reflect the within-subjects variability), there was no main effect of expertise for the F/FS search, F(1, 26) = 1.22, p = .28, or the F/FC search, F(1, 26) < 1. Even expertise as a continuous variable does not correlate with search rate, collapsing across set size (F/FS, r = .12, n.s.; F/FC, r = .08, n.s.).

We conducted an ANOVA on the slopes for RT functions with target category (face/sofa) and distractor category (car/noncar) as within-subjects factors and expertise group (car novices/experts) as a between-subjects factor. Note that the distractor factor aligns faces and sofas as noncar categories, which is meaningful because the same prediction (no competition) is expected with both noncar distractors. This analysis revealed a significant main effect of target category, F(1, 26) = 5.921, MSE = 11,304.8, p = .02, with steeper slopes for the sofa than for the face targets, and a significant three-way interaction between group, target category, and distractor category, F(1, 26) = 5.540, MSE = 14,538.1, p = .03. Scheffé post hoc tests revealed a steeper slope for car experts than for novices in the search for faces among car distractors (p < .05), whereas there was no effect of expertise for the other three search conditions, all ps > .24.

The same ANOVA conducted on the accuracy data did not yield any significant effect, including the critical three-way interaction, F(1, 26) < 1. Thus, there was no evidence for a speed–accuracy trade-off.

Our prediction was that car expertise would reduce the efficiency of a face search in the presence of car distractors. Because the distinction between car novices and experts was arbitrary, we correlated visual search RT slopes with the quantitative measure of car expertise obtained from the sequential matching task. F/FC was the only condition that revealed a positive correlation between car expertise index scores and visual search slope, r = .43, p = .03 (see Fig. 4). Crucially, this effect does not seem to arise because cars are simply harder to ignore by car experts, since there was no effect of car expertise in a search for sofas with car distractors (or in the F/FS or S/SF conditions; all rs < .17, n.s.). To ensure that the few novices with especially negative search slopes in the F/FC condition could not account for the observed effect, we recalculated correlations for all conditions, while excluding the 2 participants with the most negative search slopes in the F/FC search condition. The results were qualitatively identical (F/FC: r = .39, p = .05; all other conditions nonsignificant).

Fig. 4

Correlations between car expertise index values (car d’–bird d’) and visual search slope (reaction time as a function of the number of task-irrelevant distractors) for faces among faces and cars (F/FC; r = .41, p = .03), faces among faces and sofas (F/FS; r = -.11, n.s.), sofas among sofas and cars (S/SC; r = -.17, n.s.), and sofas among sofas and faces (S/SF; r = -.02, n.s.)
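The across-participant analysis relates each participant's car expertise index to that participant's search slope in a given condition. A minimal sketch of the Pearson correlation follows, assuming one (index, slope) pair per participant. The function name `pearson_r` and the example numbers are hypothetical, and the significance test reported in the text is not reproduced here.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation between paired observations, e.g., car
    expertise index (xs) and visual search slope in ms per item (ys)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two paired observations")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("both variables must vary")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / math.sqrt(sxx * syy)


# Hypothetical values for four participants (not data from the study).
example_r = pearson_r([-0.2, 0.4, 1.3, 2.1], [-5.0, 2.0, 9.0, 20.0])
```

A positive value, as in the F/FC condition, means that participants with higher car expertise tend to show steeper search slopes.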

Car expertise index scores and car d’ scores yielded equivalent results in all analyses. We conclude that task-irrelevant car distractors interfere with face perception as a function of car expertise.

Discussion

We found that objects of expertise interfere with concurrent face processing, even when they are entirely task irrelevant, visually distinct from faces, and separated in space from faces. This result converges with earlier neural evidence of competition between the concurrent processing of faces and objects of expertise (Gauthier et al., 2003; Rossion et al., 2007; Rossion et al., 2004), while strengthening the hypothesis that this interference can have significant behavioral costs as well.

There has been debate in the literature over how best to characterize the contribution of top-down and bottom-up processes in guiding saccadic search behaviors (Barton, Radcliffe, Cherkasova, Edelman, & Intriligator, 2006; Chen & Zelinsky, 2006). Competing hypotheses propose that visual search is either a bottom-up mechanism driven by low-level image characteristics and feature contrast in a scene or, alternatively, a top-down process under voluntary control of the searcher and high-level demands of the task (Hershler & Hochstein, 2009; Newell, Brown, & Findlay, 2004; Tatler, Baddeley, & Gilchrist, 2005). Our findings support the idea that visual search is, at least in part, a goal-driven process in which interference from distractors depends not only on our experience with them, but also on whether distractors and targets are objects of expertise recruiting a similar perceptual strategy. Prior work shows that some types of expertise recruit a holistic processing strategy typically observed for faces only (Busey & Vanderkolk, 2005; Curby & Gauthier, 2009; Wong et al., 2009). Moreover, one study specifically addressed competition between holistic processing of faces and cars in sequential presentation and showed increasing competition with car expertise (Gauthier et al., 2003). Together, these results suggest that holistic processing is the functional locus of the interference effects we observed. Studies also suggest the possibility of a localized neural bottleneck to account for the competition: At least one area in the extrastriate cortex, the FFA, is selective for both faces and objects of expertise (Gauthier, Skudlarski, et al., 2000; Gauthier, Tarr, Anderson, Skudlarski, & Gore, 1999; Harley et al., 2009; Moore, Cohen, & Ranganath, 2006; Xu, 2005).

Nonetheless, our results do not allow us to determine exactly how the presence of car distractors influenced the search for a face. For instance, are car experts more likely than novices to foveate a car, or is their decision slowed down by information present only in the peripheral visual field (Rajashekar, Bovik, & Cormack, 2006)? Future research using eye movement recordings may help resolve this issue (see Footnote 1). Importantly, whatever car experts did differently as a function of the number of car distractors, they did so only when searching for a face. This allows us to rule out the possibility that observers are simply more likely to look at objects of expertise, because they would have done so in all car distractor conditions. Unless car experts were engaged in a search for faces, no behavioral cost of car distractors was observed. This suggests that expert perceptual skills are both top-down (processing cars differently depending on whether targets were faces or objects) and relatively automatic, since irrelevant cars interfered once a certain mode of processing was engaged. Of course, our behavioral paradigm does not allow us to determine whether neural competition also would arise when faces and cars are both presented but neither is task relevant. This question would be more easily addressed with neuroimaging.

Interestingly, competition in the present paradigm suggests that interference between domains of expertise may arise in a wide range of situations, well beyond dual-task paradigms that are tested in the laboratory. In everyday living, observers are rarely forced to make a judgment about one object while fixating another object (Rossion et al., 2007; Rossion et al., 2004), when objects from the two different categories occur in very rapid succession at the fovea (McKeeff et al., in press; Wong, Qu, McGugin, & Gauthier, 2010) or when judgments for objects from two categories must be made in alternation (Gauthier et al., 2003). However, top-down attention is often directed to objects from one category in the presence of distractors from another category in the peripheral visual field. For instance, an airport security screener could experience competition between task-relevant objects on the screen and irrelevant faces in the surrounding environment. Whether or not this happens, we would argue, depends on whether the expert skills rely on processing strategies that are common with those elicited by faces or, rather, those that are used for generic object recognition.