Introduction

Human faces are an important source of information during social interactions because they convey a wealth of personal characteristics, including identity, gender, age, race, attractiveness, and emotions (e.g., Cunningham, 1986; Ekman & Oster, 1979; Hall et al., 2005; Schyns et al., 2002). The human brain is also known to have a specialized neural network for processing facial stimuli (Allison et al., 1999; Haxby et al., 2000, 2002; Hoffman & Haxby, 2000; McCarthy et al., 1999; Puce et al., 1999). Because human faces are biologically and socially significant, they have the ability to preferentially capture and engage attention (Palanica & Itier, 2012; Palermo & Rhodes, 2007; Ro et al., 2001).

Another major reason why we look at human faces is to discriminate eye gaze. In everyday life, gaze signals play an important role in capturing and orienting attention. We can be easily captivated by anyone who looks at us (Kleinke, 1986), and influenced to look in the same area where someone else is looking (Driver et al., 1999; Friesen & Kingstone, 1998; Frischen et al., 2007a; Langton & Bruce, 1999; Ricciardelli et al., 2009). Furthermore, a person's eye gaze direction and head orientation capture attention better when they are congruent than when they are incongruent or misaligned with each other. For example, reaction times (RTs) to discriminate gaze direction are shorter when the eyes and head are oriented in the same direction than when they are oriented in different directions (Itier et al., 2007a, 2007b; Langton, 2000; Palanica & Itier, 2015; Seyama & Nagayama, 2005; Todorović, 2009). People generally align their heads and eye direction to grab another person’s attention or to orient the observer’s attention to some other specific place/person. The ability to discriminate eye gaze and head orientation is vital for proper communication, and deciphering these social cues is thought to play an important role in theory of mind, that is, the ability to understand and attribute mental states to oneself and others (Baron-Cohen et al., 1997). Since these social cues are so important during human interaction, it has been proposed that a dedicated cognitive mechanism, known as the Direction of Attention Detector (DAD; Perrett & Emery, 1994), integrates visual information from looking at a person’s eyes, head, and even body position, to infer where their attention is oriented.

Although the majority of research examining how gaze and/or head orientation captures visuospatial attention has used explicit gaze discrimination tasks (e.g., Burton et al., 2009; Loomis et al., 2008; Palanica & Itier, 2015; Yokoyama et al., 2014), another way to study how attention is captured is by using an inhibition of return (IOR; Posner & Cohen, 1984) cueing paradigm. The phenomenon of IOR refers to the finding that after attention has been exogenously cued and then withdrawn from a location, attention is inhibited to return to that region, and there is delayed responding to stimuli subsequently presented at that cued location relative to previously uncued locations (Klein, 2000). IOR is suggested to promote efficient foraging behavior to bias searching towards novel areas relative to areas already inspected (Klein, 1988; Klein & MacInnes, 1999). IOR may follow an initial facilitation period (prior to ~200 ms Stimulus Onset Asynchronies, SOAs) where attention resides at the cued location and targets are detected faster than those presented at uncued locations. After the subsequent withdrawal of attention from the cued location (from ~300 ms to 3,000 ms SOAs), IOR then develops, in which RTs to cued targets are slower than RTs to uncued targets (see Samuel & Kat, 2003, for a review). However, IOR may also occur without facilitation at short SOAs, and IOR has been found for SOAs ranging from 0 to 6,000 ms (Berlucchi et al., 1989; Dodd & Pratt, 2007; Tassinari et al., 1994; Tassinari & Berlucchi, 1995; Tassinari et al., 1989; see Collie et al., 2000, for a review). This IOR effect at short SOAs has been inferred to be a motor bias against responding to targets appearing at previously cued locations, resulting from inhibiting saccades toward the cued location during covert target detection (Klein & Taylor, 1994; Rafal & Henik, 1994). 
However, IOR cannot be accounted for by a single mechanism; rather, it has many potential causes, including sensory/perceptual, attentional/cognitive, and motor/oculomotor ones (Berlucchi, 2006). It should also be noted that the majority of research examining IOR processes has used low-level visual stimuli (e.g., a brightened square in the periphery). It is possible that encoding or processing more meaningful stimuli, such as faces, requires more cognitive resources and thus produces different IOR response patterns than the traditional facilitation-IOR curve.

The few studies that have examined IOR with faces usually involve an exogenous cue-response paradigm in which a face cue is presented prior to a target (e.g., an asterisk), which is to be detected via button press or a speeded saccade to the target location. Sometimes, these studies use other control stimulus cues (e.g., household objects or scrambled faces) to compare the IOR response to faces (e.g., Taylor & Therrien, 2005; Taylor & Therrien, 2008; Theeuwes & Van der Stigchel, 2006; Weaver et al., 2012). In these paradigms, both types of stimuli should capture attention and lead to an IOR response. However, faces should initially capture more reflexive attention compared to other stimuli (Bindemann et al., 2005; Langton et al., 2008; Palermo & Rhodes, 2007; Ro et al., 2001) since faces are processed faster and in more depth than other stimuli. This means that after attention is disengaged from the stimulus cue (and brought back to fixation), there should be a stronger inhibition to return to the previously attended area where the face was than if another stimulus had been there, since the face was more thoroughly attended to, and there should be a stronger bias to search for novel locations (perhaps for new faces). This would lead to a greater IOR response elicited by faces compared to other stimuli. Some of these studies have found a larger IOR effect for faces than for other stimuli (Taylor & Therrien, 2008; Theeuwes & Van der Stigchel, 2006), while others have found no difference in IOR magnitude (Taylor & Therrien, 2005). However, it should be noted that these studies used a relatively narrow range of SOAs (e.g., 800–1,000 ms), so it is unknown what time course IOR may follow for face stimuli. Weaver et al. (2012) used two SOAs (200 ms and 700 ms) and found that both faces and household objects induced an IOR response for both short (200 ms) and long (700 ms) SOAs; however, faces reduced saccade latencies toward subsequently presented targets (in both cued and uncued locations), independent of an IOR, only at the short SOA (Weaver et al., 2012). That is, saccade latencies to targets for face trials were faster overall compared to saccade latencies to targets for household object trials, but only at the 200 ms SOA. These authors suggested that this reflected what they called a short-lived priming effect or social facilitation effect from the mere presence of a face, due to the higher level of vigilance to respond to social stimuli.

Overall, the literature is inconsistent on how and when faces capture attention via an IOR cueing paradigm. Furthermore, previous research has not investigated the development of IOR for faces over multiple SOAs, or whether gaze direction and/or head orientation influence the IOR response for faces, since these previous studies have only used front-view faces with straight gaze as facial stimuli. The current study had two main goals. First, we wanted to examine the time course over which IOR develops for face stimuli. Second, we wanted to investigate whether facial social cues (eye gaze and head orientation) affect this IOR development. The main purpose of a proposed DAD mechanism is to capture and orient one’s attention whenever these social cues are present in the environment. A feasible way to test how effectively these social cues influence visuospatial attention is through an IOR cueing paradigm using multiple display onset times, as in the current study.

In two conditions, faces with frontal heads (Frontal Head condition) and averted heads (Averted Head condition) were used, with house stimuli used as a control in both conditions. The stimulus cue (either a house, frontal gaze face, or averted gaze face) was presented to the left or to the right of fixation for 150 ms, and after a variable SOA (150 ms, 300 ms, 800 ms, 1,200 ms, 2,400 ms), participants responded to a peripheral target (an asterisk) via button press. If facilitation of target responses were to occur (i.e., shorter RTs to cued than uncued targets), it would be likely to occur at the shortest SOA (150 ms); this would then be followed by IOR responses (i.e., longer RTs to cued than uncued targets) at the longer SOAs. However, based on other research (Berlucchi et al., 1989; Dodd & Pratt, 2007; Tassinari et al., 1994; Tassinari & Berlucchi, 1995; Tassinari et al., 1989), IOR could also occur at all of the SOAs that were tested. Based on the priming effect or social facilitation effect found by Weaver et al. (2012), it was predicted that a similar cue-based RT advantage would occur for faces compared to houses at the early SOA (150 ms), independent of an IOR response. That is, an IOR response is still expected to occur at the 150 ms SOA, but overall RTs should be faster for face trials compared to house trials, regardless of cued or uncued positions. Since faces should initially capture a faster shift of attention compared to houses (Bindemann et al., 2005; Langton et al., 2008; Palermo & Rhodes, 2007; Ro et al., 2001), it is then predicted that at the later SOAs, a greater IOR response should be seen for faces in general compared to houses. 
Since faces can capture attention very rapidly, at around ~100 ms (Crouzet et al., 2010), and should be processed in depth very quickly (Langton et al., 2008; Ro et al., 2001), there should not be a need to immediately re-examine a face once it has been attended to because vital social information (like eye gaze) should have been fully processed from the initial attention capture. By contrast, other stimuli should not capture as much initial attention as faces, so there should be less of an inhibition to return to that same location since that previous stimulus was not attended to the same degree as faces (e.g., Theeuwes & Van der Stigchel, 2006). Thus, there should be more of a bias to re-attend or re-examine these non-facial stimuli since they were not initially processed as thoroughly as faces.

In addition to the face-object difference, this study investigated how gaze direction and head orientation would modulate these effects. It has been shown that head orientation strongly influences gaze discrimination (Itier et al., 2007a, 2007b; Langton, 2000; Seyama & Nagayama, 2005; Todorović, 2009), especially when stimuli are presented beyond foveal vision (Palanica & Itier, 2015), as in the current study. In line with the DAD hypothesis that eye and head cues are integrated to better capture attention, it was predicted that congruent gaze-head stimuli (i.e., frontal heads with a front-facing gaze and averted heads with an averted gaze) should capture more reflexive attention than incongruent gaze-head stimuli (i.e., frontal heads with averted gaze and averted heads with frontal gaze). At the short SOA, the cue-based RT advantage should be more enhanced (i.e., faster RTs) for congruent than incongruent gaze-head stimuli. At the longer SOAs, the IOR response should be larger for congruent stimuli than incongruent stimuli. Lastly, it should be noted that both congruent and incongruent gaze-head stimuli should capture more attention than houses, since faces in general should be more attention-grabbing than other stimuli. Thus, the order of IOR magnitudes should be largest for congruent gaze-head stimuli, followed by incongruent gaze-head stimuli, and then followed by house stimuli.

Methods

Participants

Twenty-four participants performed the Frontal Head condition (12 female, 12 male; 22 right-handed; age range 18–23 years, M = 19.8), and 23 participants performed the Averted Head condition (10 female, 13 male; 22 right-handed; age range 18–22 years, M = 19.7). Participants were undergraduate students from the University of Waterloo (UW), with normal or corrected-to-normal vision, who took part in the study for course credit. The study received full ethics clearance from the UW Research Ethics Board and all participants signed informed written consents.

Stimuli

House stimuli included greyscale photographs of eight different houses. Face stimuli included greyscale photographs of eight individuals (four men, four women) with neutral expression that were taken from George et al. (2001). Each face was photographed against a black background with the head pointed straight towards the camera (i.e., frontal heads) and with the head oriented 30° to the right side (i.e., averted heads), with the eyes looking straight ahead at the camera (i.e., frontal gaze) or 30° to the right side (i.e., averted gaze). These four pictures were then mirror-reversed using Adobe Photoshop to avoid any bias between the left and right sides. The Frontal Head condition consisted of only front-view faces with either frontal or averted gaze, as well as the house stimuli, while the Averted Head condition consisted of faces with an averted head orientation and the same house stimuli as in the Frontal Head condition. The total area of all stimuli subtended a visual angle of 4.4° horizontally by 6.6° vertically (same area for faces and houses). The eye region of the faces subtended a visual angle of 2.5° horizontally by 0.5° vertically for frontal heads, and 2.2° horizontally by 0.5° vertically for averted heads (slightly narrower as the faces were oriented to the side). Examples of the types of stimuli used in the study are shown in Fig. 1; however, note that these are not the actual face stimuli used (see George et al., 2001, for accurate examples).

Fig. 1

Stimulus presentation, with reproductions (i.e., not the actual photos) of the George et al. (2001) faces used in the study (an example of an actual house stimulus used in the study is shown). The fixation cross was shown during the entire duration of each trial to keep participants’ fixation focused. Please note that for averted gaze faces, both left- and right-looking faces were used, and for averted head views, both left- and right-facing head orientations were used

Apparatus

A Viewsonic PS790 CRT 19-in color monitor was used to present the stimuli (Intel Core 2 Quad CPU Q6700; 1024 × 768 pixels; 60-Hz frame rate). A remote EyeLink 1000 eye-tracker from SR Research controlled central fixation and recorded any potential eye movements with a sampling rate of 1,000 Hz. Participants’ viewing position and distance were maintained by chin and forehead rests. At a viewing distance of 70 cm, the monitor subtended a visual angle of 29.2° × 22.2°.
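The relationship between visual angle, viewing distance, and on-screen size can be made concrete with a short sketch. The Python below is illustrative only and is not the software used in the study. It assumes that the visible screen size can be inferred from the reported 29.2° × 22.2° at 70 cm, and that a stimulus is centred on the line of sight (an approximation for cues presented 4.5° off fixation). All function and variable names are hypothetical.

```python
import math

VIEWING_DISTANCE_CM = 70.0      # reported viewing distance
MONITOR_DEG = (29.2, 22.2)      # reported monitor extent (width, height) in degrees
RESOLUTION_PX = (1024, 768)     # reported screen resolution


def size_from_angle(angle_deg, distance_cm=VIEWING_DISTANCE_CM):
    """Physical extent (cm) that subtends angle_deg when centred on the line of sight."""
    return 2.0 * distance_cm * math.tan(math.radians(angle_deg) / 2.0)


def angle_from_size(size_cm, distance_cm=VIEWING_DISTANCE_CM):
    """Visual angle (degrees) subtended by a centred extent of size_cm."""
    return math.degrees(2.0 * math.atan(size_cm / (2.0 * distance_cm)))


# Assumption: the visible screen size is inferred from the reported angles,
# because the physical dimensions are not given in the text.
screen_w_cm = size_from_angle(MONITOR_DEG[0])
screen_h_cm = size_from_angle(MONITOR_DEG[1])
px_per_cm = RESOLUTION_PX[0] / screen_w_cm


def degrees_to_pixels(angle_deg):
    """Approximate pixel extent of a centred stimulus subtending angle_deg."""
    return size_from_angle(angle_deg) * px_per_cm


# Approximate pixel size of the 4.4 x 6.6 degree stimulus cues
stimulus_px = (degrees_to_pixels(4.4), degrees_to_pixels(6.6))
print(f"screen: {screen_w_cm:.1f} x {screen_h_cm:.1f} cm")
print(f"stimulus: {stimulus_px[0]:.0f} x {stimulus_px[1]:.0f} px")
```

Under these assumptions the inferred screen is roughly 36 cm by 27 cm, which is plausible for the viewable area of a 19-in CRT.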

Procedure

Participants were randomly assigned to either the Frontal Head or Averted Head condition. All aspects of both face conditions were identical except for the head orientation of the face stimuli. Participants initiated trials by fixating a central black fixation cross (1° × 1°) on a white background; the fixation cross was presented for 1,200 ms, which then became a fixation trigger for 300 ms to activate the next trial (see Fig. 1). A stimulus cue was then presented for 150 ms, the center of which was situated 4.5° to the left or to the right of fixation. Palanica and Itier (2015) showed that gaze direction could be discriminated at this eccentricity, across head orientations, using covert attention. Following a variable cue–target SOA of 150 ms, 300 ms, 800 ms, 1,200 ms, or 2,400 ms, a response target (a black asterisk (*) symbol, 0.8° × 0.8° of visual angle) appeared to the left or to the right at the same eccentricity as the center of the stimulus cue for 1,000 ms regardless of whether a response was made or not. The central fixation cross remained visible throughout the trial. Participants were instructed to focus on the central fixation at all times and respond to the target as quickly as possible. Responses were made on a standard keyboard layout, using the keys z (for left position targets), and m (for right position targets), with the index fingers of both hands. After 12 practice trials, eight experimental blocks of 120 trials were completed, totaling 960 trials. A rest was given between blocks. One-third of the stimulus trials consisted of frontal gaze (FG) faces; one-third of the trials consisted of averted gaze (AG) faces (left-averted and right-averted gaze were counterbalanced); and one-third of the trials consisted of houses. All stimuli were presented randomly and were counterbalanced across all SOAs and cued/uncued positions (i.e., each block had the same number of stimuli, SOAs, and cueing positions). 
Targets appeared in cued and uncued locations with equal probability. Participants were given no information about the stimuli, and all were told that the purpose of the study was to examine visual perception. The entire experiment lasted approximately 75 min.
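The counterbalancing described above can be sketched as a fully crossed trial list. The Python below is a minimal illustration, not the authors' experiment script. It assumes that cue side (left/right) was crossed with SOA, cueing condition, and stimulus cue within each block, which gives 60 cells repeated twice per 120-trial block. It does not model the left/right counterbalancing of averted gaze or the choice of individual face and house images. All names are hypothetical.

```python
import itertools
import random
from collections import Counter

SOAS_MS = (150, 300, 800, 1200, 2400)
CUEING = ("cued", "uncued")
STIMULI = ("FG", "AG", "House")
CUE_SIDES = ("left", "right")   # assumption: cue side is balanced within each cell
TRIALS_PER_BLOCK = 120
N_BLOCKS = 8


def make_block(rng):
    """Build one shuffled block with every design cell equally represented."""
    cells = list(itertools.product(SOAS_MS, CUEING, STIMULI, CUE_SIDES))
    repeats = TRIALS_PER_BLOCK // len(cells)  # 120 // 60 = 2
    trials = []
    for soa, cueing, stimulus, cue_side in cells * repeats:
        if cueing == "cued":
            target_side = cue_side
        else:
            target_side = "right" if cue_side == "left" else "left"
        trials.append({
            "soa_ms": soa,
            "cueing": cueing,
            "stimulus": stimulus,
            "cue_side": cue_side,
            "target_side": target_side,
        })
    rng.shuffle(trials)
    return trials


rng = random.Random(0)  # fixed seed so the example is reproducible
blocks = [make_block(rng) for _ in range(N_BLOCKS)]

# Each SOA x cueing x stimulus cell occurs 4 times per block, 32 times overall.
first_block_counts = Counter(
    (t["soa_ms"], t["cueing"], t["stimulus"]) for t in blocks[0]
)
print(len(blocks), sum(len(b) for b in blocks), set(first_block_counts.values()))
```

With this layout, targets fall at cued and uncued locations equally often, and each of the 30 SOA × cueing × stimulus cells contributes 32 trials per participant before any exclusions.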

Data analysis

Data analysis was identical for both face conditions. Left- and right-averted gaze directions were combined and averaged for each target position. Preliminary analyses revealed no effect of gaze direction on cued or uncued targets (i.e., no gaze orienting effect was found). That is, the averted gaze faces did not increase or decrease the RTs to subsequently presented targets in either the “looked-at” or “not looked-at” locations. The dependent variable was the RT which followed target presentation. For each subject, RTs that were below 100 ms or exceeded 2.5 standard deviations (SDs) from the mean of each stimulus condition were discarded, removing any anticipatory responses or outliers from the data (Van Selst & Jolicoeur, 1994), which eliminated 10.2 % of trials in the Frontal Head condition, and 7.8 % in the Averted Head condition. All trials where more than one fixation was made were eliminated (3.7 % for Frontal Heads; 3.4 % for Averted Heads). Target detection accuracy (i.e., pressing the correct target button) was 99.2 %. Preliminary analyses also revealed no significant effects of participant gender, stimulus gender, or stimulus location (left or right side of fixation) for either condition.
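The RT trimming rule can be illustrated with a short sketch. The Python below is one plausible reading of the rule, not the authors' analysis code. It assumes that anticipatory responses (below 100 ms) are removed first, that the mean and SD are then computed once on the remaining RTs of a single participant-by-condition cell, and that the 2.5 SD criterion is applied on both sides of the mean. The function name and the example RTs are hypothetical.

```python
import statistics


def trim_rts(rts_ms, min_rt_ms=100.0, sd_cutoff=2.5):
    """Trim the RTs of one participant in one condition.

    Assumptions (not stated in the text): anticipations are dropped before
    the mean and SD are computed, and the SD criterion is two-sided and
    applied in a single pass.
    """
    plausible = [rt for rt in rts_ms if rt >= min_rt_ms]
    if len(plausible) < 2:
        return plausible  # an SD cannot be computed from fewer than two values
    mean = statistics.mean(plausible)
    sd = statistics.stdev(plausible)
    return [rt for rt in plausible if abs(rt - mean) <= sd_cutoff * sd]


# Made-up RTs for illustration only (not data from the study)
example_rts = [80, 310, 320, 325, 330, 335, 340, 345, 350,
               355, 360, 365, 370, 380, 390, 1500]
kept = trim_rts(example_rts)
print(len(example_rts) - len(kept), "trials removed")
```

In this made-up cell, the 80-ms anticipation and the 1,500-ms outlier are discarded and the remaining 14 RTs are retained.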

For each head condition, a repeated measures ANOVA was run with the following within-subject factor design: 5 SOAs (150, 300, 800, 1,200, and 2,400 ms) × 2 cuing conditions (cued and uncued) × 3 stimulus cues (FG, AG, and House). For all ANOVAs, the Greenhouse-Geisser degrees of freedom correction was used when the sphericity assumption was violated. As the omnibus ANOVAs revealed a three-way interaction for each head condition (see results below), 2 (cueing condition) by 3 (stimulus cue) repeated measures follow-up ANOVAs were conducted at each SOA. As the main question pertained to the IOR effect and how it varied as a function of stimulus cue, when a cueing by stimulus cue interaction was found for a given SOA, a separate ANOVA using the factor cue type was run on the IOR effect directly (i.e., on the difference between the cued and uncued conditions), and IOR was compared between the three stimulus cues using post-hoc paired comparisons. For all post-hoc tests, the Bonferroni correction was used to control for multiple comparisons.
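The IOR measure and the Bonferroni adjustment used for the follow-up comparisons can be sketched as follows. The Python below is illustrative only. The cell means and raw p-values are invented for the example and are not results from the study, and all names are hypothetical. IOR is computed as the cued minus uncued mean RT, so positive values indicate inhibition.

```python
from itertools import combinations


def ior_magnitude(mean_cued_ms, mean_uncued_ms):
    """IOR effect as the cued minus uncued mean RT (positive = inhibition)."""
    return mean_cued_ms - mean_uncued_ms


def bonferroni(p_values):
    """Multiply each raw p-value by the number of comparisons, capped at 1."""
    m = len(p_values)
    return {key: min(1.0, p * m) for key, p in p_values.items()}


# Hypothetical cell means (ms) for one SOA; not data from the study
cell_means = {
    "FG": {"cued": 400.0, "uncued": 380.0},
    "AG": {"cued": 405.0, "uncued": 390.0},
    "House": {"cued": 420.0, "uncued": 385.0},
}
ior = {stim: ior_magnitude(m["cued"], m["uncued"]) for stim, m in cell_means.items()}

# Three pairwise comparisons between the stimulus cues
pairs = list(combinations(ior, 2))

# Hypothetical raw p-values from paired comparisons of the IOR scores
raw_p = {("FG", "AG"): 0.90, ("FG", "House"): 0.03, ("AG", "House"): 0.012}
adjusted_p = bonferroni(raw_p)

for pair in pairs:
    print(pair, ior[pair[0]] - ior[pair[1]], round(adjusted_p[pair], 3))
```

With three comparisons, a raw p-value must fall below roughly .0167 to remain below .05 after adjustment, which is why the hypothetical FG versus House comparison (raw p = .03) would no longer be significant.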

Results and discussion

A mixed omnibus ANOVA analyzing both head conditions together revealed a significant four-way SOA (within) by cueing condition (within) by stimulus cue (within) by head orientation (between) interaction (F(8, 360) = 2.43, MSE = 138.41, p < .05). This indicates that the two conditions differed from one another, and that head orientation affected performance.

Frontal heads

The 5 (SOA) × 2 (cueing condition) × 3 (stimulus cue) omnibus ANOVA revealed main effects of SOA (F(4, 92) = 70.65, MSE = 661.75, p < .0001, ηp² = .75) and cueing condition (F(1, 23) = 39.60, MSE = 410.37, p < .0001, ηp² = .63), as well as interactions between SOA and cueing condition (F(4, 92) = 23.80, MSE = 199.97, p < .0001, ηp² = .51), and between SOA and stimulus cue (F(8, 184) = 6.47, MSE = 143.70, p < .0001, ηp² = .22). There was also a significant three-way interaction between SOA, cueing condition, and stimulus cue (F(8, 184) = 2.44, MSE = 111.98, p < .05, ηp² = .10). Accordingly, 2 (cueing condition) by 3 (stimulus cue) repeated measures ANOVAs were conducted at each SOA.
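The reported degrees of freedom and effect sizes can be cross-checked from the design. For a fully within-subject effect, the effect degrees of freedom are the product of (levels − 1) over the factors involved, and the error degrees of freedom are that product multiplied by (n − 1). Partial eta squared can be recovered from F as ηp² = (F × df_effect) / (F × df_effect + df_error). The Python below is a minimal sketch of these checks, assuming uncorrected (sphericity-assumed) degrees of freedom; the function names are hypothetical.

```python
def within_effect_df(levels, n_participants):
    """Degrees of freedom (effect, error) for a fully within-subject effect.

    levels: number of levels of each factor involved in the effect.
    Assumes no sphericity correction has been applied.
    """
    df_effect = 1
    for k in levels:
        df_effect *= (k - 1)
    return df_effect, df_effect * (n_participants - 1)


def partial_eta_squared(f_value, df_effect, df_error):
    """Recover partial eta squared from an F ratio and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


N_FRONTAL = 24  # participants in the Frontal Head condition

# Main effect of SOA (5 levels)
print(within_effect_df([5], N_FRONTAL),
      round(partial_eta_squared(70.65, 4, 92), 2))
# Main effect of cueing condition (2 levels)
print(within_effect_df([2], N_FRONTAL),
      round(partial_eta_squared(39.60, 1, 23), 2))
# Three-way interaction: SOA (5) x cueing (2) x stimulus cue (3)
print(within_effect_df([5, 2, 3], N_FRONTAL),
      round(partial_eta_squared(2.44, 8, 184), 2))
```

These checks reproduce the values reported for the SOA main effect, the cueing main effect, and the three-way interaction in the Frontal Head condition.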

For the 150-ms SOA (Fig. 2a), a main effect of cueing condition (F(1, 23) = 74.61, MSE = 307.30, p < .0001, ηp² = .76) revealed slower RTs for cued versus uncued targets, indicating an overall IOR effect. A significant main effect of stimulus cue was found (F(2, 46) = 13.91, MSE = 117.97, p < .0001, ηp² = .38) and post-hoc comparisons revealed slower RTs for Houses than both FG and AG faces (all p < .001), indicating a cue-based RT advantage for faces. Lastly, there was a cueing condition by stimulus cue interaction (F(2, 46) = 4.19, MSE = 101.27, p < .05, ηp² = .15). The ANOVA on the IOR effect confirmed an effect of cue type (F(2, 46) = 4.19, MSE = 202.54, p < .05, ηp² = .15). Post-hoc paired comparisons revealed that the IOR effect (i.e., the RT difference between cued and uncued targets) was significantly larger for Houses (M = 32 ms) than for AG faces (M = 22 ms; p < .05; Fig. 3a); the IOR magnitude for FG faces (M = 22 ms) was not different from either Houses or AG faces.

Fig. 2

Target response reaction times (RTs) for the Frontal Head condition, as a function of cueing condition, stimulus cue, and stimulus onset asynchrony (SOA). Results for (a) 150-ms SOA, (b) 300-ms SOA, (c) 800-ms SOA, (d) 1,200-ms SOA, and (e) 2,400-ms SOA (all shown with standard error bars). See text for details

Fig. 3

Inhibition of return (IOR) magnitudes (i.e., difference scores between cued stimulus reaction times (RTs) and uncued stimulus RTs) for the Frontal Head condition, as a function of stimulus cue and stimulus onset asynchrony (SOA). Results for (a) 150-ms SOA, (b) 300-ms SOA, (c) 800-ms SOA, (d) 1,200-ms SOA, and (e) 2,400-ms SOA (all shown with standard error bars). See text for details

For the 300-ms SOA (Fig. 2b), there were no main effects of cueing condition or stimulus cue, and no interaction between the two variables (all p > .1). That is, there was no IOR effect for any stimulus cue (Fig. 3b).

For the 800-ms SOA (Fig. 2c), there was a main effect of cueing condition (F(1, 23) = 25.83, MSE = 187.29, p < .0001, ηp² = .53), indicating slower RTs for cued versus uncued targets (IOR effect). A main effect of stimulus cue (F(2, 46) = 4.44, MSE = 126.09, p < .05, ηp² = .16) was found, and post-hoc tests revealed slower RTs for AG faces than Houses (p < .05). The other paired comparisons were not significant (p > .1). No interaction was found between cueing condition and stimulus cue (p > .7; Fig. 3c).

For the 1,200-ms SOA (Fig. 2d), a main effect of cueing condition was found (F(1, 23) = 6.06, MSE = 212.69, p < .05, ηp² = .21), indicating slower RTs for cued versus uncued targets (IOR effect). No main effect of stimulus cue, and no interaction between cueing condition and stimulus cue was found (all p > .6; Fig. 3d).

For the 2,400-ms SOA (Fig. 2e), a main effect of cueing condition (F(1, 23) = 9.26, MSE = 72.43, p < .01, ηp² = .29) revealed slower RTs for cued versus uncued targets (IOR effect). No effect of stimulus cue was found (p > .3), but there was a significant cueing condition by stimulus cue interaction (F(2, 46) = 3.56, MSE = 77.48, p < .05, ηp² = .13). The separate IOR analysis confirmed a main effect of stimulus cue (F(2, 46) = 3.56, MSE = 154.95, p < .05, ηp² = .13), and post-hoc comparisons revealed a trend for the IOR for FG faces (M = 8 ms) to be larger than for Houses (M = –1 ms; p = .055; Fig. 3e); the IOR magnitude for AG faces (M = 6 ms) was not different from FG faces or Houses (p > .1).

Averted heads

The 5 (SOA) × 2 (cueing condition) × 3 (stimulus cue) omnibus ANOVA revealed main effects of SOA (F(4, 88) = 40.92, MSE = 1927.86, p < .0001, ηp² = .65), cueing condition (F(1, 22) = 18.75, MSE = 764.46, p < .0001, ηp² = .46), and stimulus cue (F(2, 44) = 4.33, MSE = 174.22, p < .05, ηp² = .16), and interactions between SOA and cueing condition (F(4, 88) = 23.99, MSE = 317.08, p < .0001, ηp² = .52), and between SOA and stimulus cue (F(8, 176) = 3.58, MSE = 192.69, p < .005, ηp² = .14). There was also a significant three-way interaction between SOA, cueing condition, and stimulus cue (F(8, 176) = 3.66, MSE = 213.89, p < .005, ηp² = .14). Thus, 2 (cueing condition) × 3 (stimulus cue) repeated measures ANOVAs were conducted for each SOA.

For the 150-ms SOA (Fig. 4a), slower RTs were found for cued than uncued targets (IOR effect), as revealed by a main effect of cueing condition (F(1, 22) = 38.14, MSE = 755.88, p < .0001, ηp² = .63). There was also a main effect of stimulus cue (F(2, 44) = 6.58, MSE = 139.12, p < .005, ηp² = .23). Post-hoc comparisons revealed slower RTs for FG faces and Houses compared to AG faces (all p < .05), indicating a cue-based RT advantage for AG faces. No interaction was found between cueing condition and stimulus cue (p > .2; Fig. 5a).

Fig. 4

Target response reaction times (RTs) for the Averted Head condition, as a function of cueing condition, stimulus cue, and stimulus onset asynchrony (SOA). Results for (a) 150-ms SOA, (b) 300-ms SOA, (c) 800-ms SOA, (d) 1,200-ms SOA, and (e) 2,400-ms SOA (all shown with standard error bars). See text for details

Fig. 5

Inhibition of return (IOR) magnitudes (i.e., difference scores between cued stimulus reaction times (RTs) and uncued stimulus RTs) for the Averted Head condition, as a function of stimulus cue and stimulus onset asynchrony (SOA). Results for (a) 150-ms SOA, (b) 300-ms SOA, (c) 800-ms SOA, (d) 1,200-ms SOA, and (e) 2,400-ms SOA (all shown with standard error bars). See text for details

For the 300-ms SOA (Fig. 4b), there were no main effects of cueing condition or stimulus cue, and no interaction between the two variables (all p > .1). Thus, there was no IOR effect for any stimulus cue (Fig. 5b).

For the 800-ms SOA (Fig. 4c), slower RTs were found for cued than uncued targets (IOR effect), as revealed by a main effect of cueing condition (F(1, 22) = 10.99, MSE = 290.23, p < .005, ηp² = .33). Additionally, a main effect of stimulus cue (F(2, 44) = 4.78, MSE = 178.23, p < .05, ηp² = .18) was found, and post-hoc comparisons revealed slower RTs for AG faces and Houses compared to FG faces (only the FG-House comparison was significant, p < .05). No interaction was found between cueing condition and stimulus cue (p > .1; Fig. 5c).

For the 1,200-ms SOA (Fig. 4d), slower RTs were found for cued versus uncued targets (IOR effect), as revealed by a main effect of cueing condition (F(1, 22) = 6.93, MSE = 143.69, p < .05, ηp² = .24). No main effect of stimulus cue, and no interaction between cueing condition and stimulus cue was found (all p > .2; Fig. 5d).

For the 2,400-ms SOA (Fig. 4e), there was no main effect of cueing condition (p > .6) or stimulus cue (F(2, 44) = 3.17, MSE = 88.81, p = .059, ηp² = .13), but there was a cueing condition by stimulus cue interaction (F(2, 44) = 8.38, MSE = 139.38, p < .005, ηp² = .28). The separate IOR analysis confirmed a main effect of stimulus cue (F(2, 44) = 8.38, MSE = 278.75, p < .001, ηp² = .28), and post-hoc comparisons revealed that the IOR effect for AG faces (M = 12 ms) was significantly larger than for both FG faces (M = –2 ms) and Houses (M = –7 ms; all p < .05; Fig. 5e), which were not different from one another.

General discussion

The current study used an IOR paradigm to examine how gaze direction and head orientation captured covert attention over a variety of SOAs. The main results showed that all SOAs, with the exception of the 300-ms SOA, produced an IOR, regardless of stimulus type. Additionally, a cue-based RT advantage occurred for faces compared to houses at the short SOA (150 ms), independent of a general IOR response for all stimuli, and this was modulated by gaze direction for averted heads. Lastly, at the longest SOA (2,400 ms), a larger IOR magnitude was found for the faces whose gaze was congruent with head orientation compared to houses. These findings are discussed in more detail below.

Across both conditions, an IOR response was found even at the shortest SOA (150 ms), in line with other research showing a lack of facilitation at short SOAs (Berlucchi et al., 1989; Dodd & Pratt, 2007; Tassinari et al., 1994; Tassinari & Berlucchi, 1995; Tassinari et al., 1989). One theory for this effect posits that inhibition occurs as a result of maintaining fixation at a central location while covertly orienting visual attention to a peripheral cue. Although the peripheral cue initiates the preparation of a saccade toward the cued location, a covertly orienting task requires that this saccade be inhibited. This is thought to result in a motor bias against responding to targets appearing at previously cued locations (Klein & Taylor, 1994; Rafal & Henik, 1994). This theory of motor inhibition also proposes that facilitation observed at short SOAs only arises because there is a temporal overlap between the cue and target (Tassinari et al., 1994; Tassinari & Berlucchi, 1995; Tassinari et al., 1989). In other words, if the stimulus cue is still presented when the target appears (as also shown by the early SOAs of Posner & Cohen, 1984), then attention would not have disengaged from the cued location by the time the target is to be detected; however, if there is no temporal overlap between the cue and the target, then inhibition should occur (Collie et al., 2000). In the current study, there was no temporal overlap between stimulus cue and target presentation. Even at the shortest SOA (150 ms), targets were presented after the stimulus cues vanished (albeit immediately after, without any delay), which also helps explain the IOR effects across the majority of SOAs, without any facilitation preceding inhibition (Collie et al., 2000). 
In fact, the shortest SOA in the current study actually produced the largest IOR magnitude (across head orientations), which is similar to these previously mentioned studies (Berlucchi et al., 1989; Tassinari et al., 1994; Tassinari & Berlucchi, 1995; Tassinari et al., 1989) that also showed the largest IOR effects at the shortest SOAs (~200 ms). This finding is also similar to the larger IOR magnitude at the 200-ms SOA than at the 700-ms SOA found in Weaver et al. (2012). Perhaps the larger IOR response at these short SOAs reflected an efficient foraging strategy (Klein, 1988; Klein & MacInnes, 1999) to search for new faces. Since faces can capture attention very rapidly, at around ~100 ms (Crouzet et al., 2010), this rapid attention capture could have led to a subsequent early inhibition for faces, and indirectly to other stimuli (in this case, houses). In other words, the immediately presented faces would have been processed in depth very quickly, and led to a bias to attend to novel areas, perhaps to search for new faces. The fact that house stimuli also elicited this effect could have been a byproduct of having houses presented in the context of faces. Thus, observers would have still been induced to search for new faces, and inhibit their responses to targets at this early SOA.

Due to the stimulus cue presentation time of 150 ms, this was the shortest possible SOA that could be tested in this study. Additionally, since the target appeared immediately after the stimulus cue vanished (in the same location), there was no temporal overlap between stimulus cue and target. It is also possible that the large IOR effect observed at the shortest SOA was due to a perceptual forward masking effect (Lupiáñez & Weaver, 1998), since the stimulus cue and target occupied the same location, and the IOR effect seemed to briefly disappear at the 300-ms SOA, before reappearing at the longer SOAs. However, facilitation may have occurred if the target was placed just outside the location of the stimulus cue, and the stimulus cue was still being presented during target detection. This design would also allow the possibility to test even shorter SOA times than 150 ms, which may have shown a possible facilitation as well (similar to the short SOAs demonstrated by Posner & Cohen, 1984). Thus, future research may investigate this issue. It should also be noted that the current study used faces, rather than flickering squares (as in Posner & Cohen, 1984), as stimulus cues, which may have facilitated different IOR response patterns than previous studies using low-level stimuli.

Interestingly, all SOAs produced an IOR response (across stimulus types and face orientations) except for the 300-ms SOA. Since the shorter, 150-ms SOA and longer SOAs produced an IOR response, the lack of any effect (facilitation or IOR) at 300 ms was unlikely to be due to the “crossover” between facilitation and IOR curves, as found historically (Posner & Cohen, 1984). All SOAs were tested within subjects, rather than between subjects or between blocks of trials, and this design choice may have produced different response strategies. Intermixing SOAs within blocks makes the timing of the target presentation uncertain. Under such conditions, the 150-ms SOA would be immediately apparent (since the target directly succeeded stimulus cue presentation), and the longer SOAs (800 ms and greater) would give participants more time to process the type and location of the visual stimulus; the intermediate SOA of 300 ms may involve a temporary interference or lapse of processing for the stimulus information. It is also possible that two different cognitive processing mechanisms were simultaneously at play during this IOR paradigm: one that inhibited target responses at very short SOAs, and one at very long SOAs, thus leaving intermediate SOAs with neither a facilitatory nor an inhibitory target response. It is important to note that IOR (or lack thereof) can occur as a result of more than one attentional or perceptual mechanism (Berlucchi, 2006). Thus, future research could test different SOAs between subjects or between blocks of trials to examine whether these task effects influence the processing of these stimuli or the strategy for detecting targets.

A cue-based RT advantage was found for faces at the shortest SOA, independent of an IOR response (i.e., faster RTs for face trials than house trials, regardless of cueing condition), possibly due to the biological and social significance of faces (Palermo & Rhodes, 2007), for which responses require a high level of vigilance. This effect was found at 150 ms, very close to the 200-ms SOA at which Weaver and colleagues (2012) found a cue-based RT advantage, and in the current study it was influenced by gaze-head interactions. With frontal heads, both direct- and averted-gaze faces elicited faster responses than houses, perhaps supporting the notion that faces, in general, are a special type of stimulus for attracting attention (Bindemann et al., 2005; Langton et al., 2008; Palermo & Rhodes, 2007; Ro et al., 2001). Weaver and colleagues (who also used frontal heads) suggested that this effect is an efficient strategy of visual attention to favour the priming of meaningful stimuli (i.e., faces) that may require immediate action from the observer. That is, it may be necessary to react to the unpredictability produced by social contexts, thereby eliciting a higher level of vigilance to faces, similar to a “threat detector” mechanism for processing threatening faces (Morris et al., 2001; Öhman, 2002). Although the current study only used faces with neutral expressions, these findings highlight the fact that faces in general are capable of rapidly capturing attention compared to other stimuli. The fact that Weaver et al. (2012) found this cue-based RT advantage with overt eye saccades, while this study found it with covert attention (via button press), suggests that a higher-level cognitive processing mechanism is at play for facial stimuli, and also supports the evidence that IOR processes can occur with either eye movements or behavioral responses (Klein, 2000).
With averted heads, this cue-based RT advantage only occurred for averted-gaze faces, which suggests that there was a gaze-head congruency effect, but only for averted heads. If there were a general gaze-head congruency effect, then this cue-based RT advantage would also be expected only for frontal heads with frontal gaze. Instead, both frontal- and averted-gaze faces with frontal heads elicited this response. This may have occurred because a frontal head, rather than an averted head, corresponds to the innate face template that human infants show a preference for (Johnson et al., 1991; Morton & Johnson, 1991). Thus, a frontal-oriented face may inherently capture attention, regardless of gaze direction, and lead to faster target responses (see also Shirama, 2012, for the idea that frontal faces guide attention). It is also important to note that the two head orientations (frontal and averted) were tested between subjects, due to the time constraints of the relatively long experimental design. Future research could examine whether the effect of head orientation was due to task demands or paradigm design.

Finally, a larger IOR effect occurred for faces compared to houses at the longest SOA (2,400 ms), and this was modulated by gaze direction and head orientation. With frontal heads, a trend toward a larger IOR for frontal-gaze faces compared to houses was seen; with averted heads, a larger IOR was elicited by averted-gaze faces compared to the other stimuli. Although the effect of gaze-head congruency on IOR magnitudes was not fully consistent across head orientations, these findings indicate that the congruent face stimuli (i.e., frontal heads with frontal gaze and averted heads with averted gaze) may have captured more initial attention than the incongruent face stimuli, when compared to houses, thus leading to a larger inhibition of target responses when subsequent targets were placed in the locations that had previously contained the congruent facial stimuli. These results are in line with other research demonstrating that head orientation strongly affects gaze discrimination (Itier et al., 2007a, 2007b; Langton, 2000; Seyama & Nagayama, 2005; Todorović, 2009), especially when stimuli are presented beyond foveal vision (Palanica & Itier, 2015). If we assume that stronger initial exogenous attention capture leads to a larger inhibition of return (Posner & Cohen, 1984), then these findings also support the proposed Direction of Attention Detector (Perrett & Emery, 1994), which integrates social information from the eyes and head, and preferentially captures attention for congruently oriented facial stimuli compared to incongruently oriented facial stimuli. This was the first research of its kind to examine the effect of gaze-head congruency on inhibition of return, although future studies should examine the strength of this effect, especially across other SOAs not tested in the current study. It remains unclear how the cue-based RT advantage or initial exogenous attention capture proportionately relates to the subsequent IOR.

It is interesting that there were no specific IOR effects for faces (or houses) at the other, intermediate SOAs. This contrasts with other research showing a stronger IOR effect for faces compared to other stimuli at around 800–1,000 ms (e.g., Theeuwes & Van der Stigchel, 2006). However, this is currently the first study to examine the effects of facial stimuli during IOR with multiple SOAs ranging from 150 to 2,400 ms. This lack of IOR at other SOAs could be due to task effects of the multiple different SOAs tested within each participant, but future research would have to examine this issue. These findings may also reflect the influence of gaze direction that moderated the IOR effects, and it is possible that gaze cues only interact with IOR at very long SOAs. It has been shown, for example, that in gaze-cueing paradigms, where a face cue is centrally presented prior to the onset of a lateral target, target detection is faster when the face is looking toward the same side where the target later appears versus the opposite side, but only at short SOAs (~200 ms); at longer SOAs (~2,400 ms), target detection is actually slower for these same-side targets, suggesting a delayed onset of the inhibition of return process for gaze cues (Frischen et al., 2007b; Frischen & Tipper, 2004; Marotta et al., 2013). Although gaze-cueing paradigms differ in design from the current IOR paradigm, these findings may suggest that gaze cues only interact with inhibitory processes at long SOAs. Future research could examine whether eye gaze and head cues influence IOR processes at SOAs beyond 2,400 ms, and whether these social cues induce an increased IOR magnitude compared to non-facial stimuli. It is possible that this IOR may last for several seconds (see Dodd & Pratt, 2007), or dissipate completely, although this is simply speculation since the current paper was the first to test the effects of eye gaze and head orientation during an IOR cueing paradigm.

Overall, these results support the idea that humans may have a cognitive mechanism that is sensitive to facial social cues in the environment. Specifically, the congruent combination of head and gaze direction seems to influence the exogenous attention capture of faces, such that congruent facial stimuli are able to capture more reflexive attention than incongruent stimuli, even when viewed covertly, outside of foveal vision. This supports the Direction of Attention Detector system that integrates directional information from the eyes and head (Perrett & Emery, 1994), and captures attention in the environment. Even in an IOR cueing paradigm, in which the stimulus cue had no bearing on the location or meaning of the subsequent target, the congruency of gaze-head signals influenced visuospatial attention. Presumably, this is because these social cues convey the behavioral intentions of others, and it is important to be alert to these cues in order to respond appropriately during social interaction.