1 Introduction

Sometimes, when we receive short text messages without much context, we are left to wonder whether they are to be taken seriously or ironically. Or whether the sender is actually “alright”, as they say, or is in fact resentful. During a conversation, much of this kind of information is conveyed by the speaker’s face. Unfortunately, unlike words, the information usually signaled by faces cannot be conveyed in digital written communication… or can it?

Indeed, it might be argued that the communicative toolkit of many forms of digital communication encompasses some sort of ‘artificial faces’ that compensate for the lack of natural ones. I am thinking of emoticons and of a specific subset of emojis, namely, facial emojis.

The term ‘emoticon’, likely a portmanteau of ‘emotion’ and ‘icon’, refers to those icons made up of punctuation marks which allow me, for instance, to signal my surprise as follows: :-o. If I express the same surprise with the corresponding pictographic character instead, I am using an emoji (from the Japanese e, “image”, and moji, “character”). Despite the many differences between emoticons and facial emojis, in this paper I will mainly focus on what they have in common qua stylized representations of faces. Henceforth, I will refer to them both through the abbreviation EmoT/J. Notice that EmoT/J is not to be interpreted as “Emoticons and Emojis”, but rather as “Emoticons and Facial Emojis”, as several non-facial emojis also exist (although they are used and studied less).

EmoT/J have been studied quite extensively by several academic disciplines, e.g. psychology (Cherbonnier and Michinov 2022; Bai et al. 2019), marketing (e.g. Guido et al. 2019), linguistics (e.g. Dresner and Herring 2010; Grosz et al. 2023), and semiotics (e.g. Danesi 2017; Marino 2022).

In comparison, EmoT/J were underdiscussed in philosophy until recent years. Philosophers sometimes half-jokingly trace the dawn of emoticons to Ludwig Wittgenstein (cf. the opening quote; Krkač 2020). Plebani and Berto (2019) have also half-jokingly developed a sort of logic handbook in which logic symbols are replaced by emojis: simplified graphical depictions of expressive faces (… and feces). In 2018, King lamented that it is “a bit surprising how little attention philosophy has given to the status of emoji” (2018, p. 1). And, I would add, to that of emoticons. Yet, the tide is now changing.

Quite recently, Maier (2023) provided an in-depth semantic analysis of emojis, construing them as stylized pictures. In particular, Maier distinguishes between emojis that represent entities (e.g. cars, boats), which have truth-conditional contents (i.e., they express some state of affairs in the world), and those that represent bodily parts and gestures (including facial emojis), which have use-conditional contents (i.e., they qualify their user’s perspective toward some state of affairs).

EmoT/J seem a perfect topic for philosophers interested in situated affectivity, the thriving debate on how our affective lives are scaffolded by extra-bodily resources. Within the (blurry) perimeter of that debate, Lucy Osler (2020) has noted that in chat apps “text-based conversations [are often] complemented by an ever-increasing archive of emojis” (p. 584). Glazer (2017) construed emoticons as a means to express emotions in writing—unlike words, which can communicate emotions, but in a less direct fashion (see § 3.2). His analysis can easily apply to (some) emojis too.

However, despite the initial influence of Griffiths and Scarantino’s seminal essay on the role of emotions as tools for navigating social interactions (Griffiths and Scarantino 2009), most discussions in situated affectivity have turned their attention toward the inner, phenomenological side of emotions rather than their social role. Both Piredda’s (2020) definition of affective artifacts and Colombetti’s (2020) taxonomy of affective scaffoldings [Footnote 1] conceive and describe them primarily with respect to their influence on affective experience rather than on affective expression. Yet, since affect is not only something private we attend to by introspecting, but also an eminently social phenomenon, I propose to broaden the agenda of situated affectivity by paying attention (also) to those tools that alter the expression of affective states (Viola 2022).

With that ultimate aim in mind, in this paper I set out to reach the following intermediate step: investigating to what extent the analogy between the communicative role played by natural faces and that played by EmoT/J holds. To do so, in the next section (§ 2) I articulate the face-EmoT/J analogy and succinctly describe its theoretical underpinnings. Then, in each of the four ensuing sections (§§ 3–6) I discuss research on some facet of face perception and consider whether and to what extent the processes regarding human face perception might generalize to EmoT/J. More specifically, I discuss the expression of emotions (§ 3), the cultural norms that regulate or modulate them (§ 4), non-affective social information (§ 5), and the prioritized attention toward face-like visual stimuli (§ 6). In the concluding section I wrap up and suggest further investigations (§ 7).

2 Theoretical Background: EmoT/J as Cultural Analogues of Natural Faces

The psychologist Van Kleef (2017) has suggested that “expressions of the same emotion that are emitted via different expressive modalities ([including] emoticons) have comparable effects, provided that the emotional expressions can be perceived by others” (p. 213). This analogy has been vindicated by empirical evidence (Van Kleef et al. 2015; Erle et al. 2022), although some important discrepancies exist between what real faces and EmoT/J can convey (see below). Yet, I think that the analogy runs both broader and deeper than Van Kleef’s original formulation suggests.

Why broader? To put it simply, because just as the role of faces in social cognition transcends the mere expression of emotions, so EmoT/J also play other roles in social cognition besides conveying affective states (see §§ 5–6).

Why deeper? Because an interesting story must be told, I submit, about how culture has exploited some of our neurobiological endowment. To tell this story, Sperber and Hirschfeld’s (2007) distinction between a proper domain and an actual domain of cognitive capacities can come in handy [Footnote 2]. The proper domain indicates what a psychological capacity has evolved for, its teleological raison d’être. An example they describe—and that is also relevant for our present purpose—is the cognitive capacity for face detection, which probably developed in order to spot the face of a conspecific or predator, and to steer our attention towards it. To operate efficiently, cognitive capacities often rely on shortcuts. For instance, face detection can also be activated by simple visual patterns resembling a T or an inverted triangle, as in the logic symbol “” (see § 6). The mismatch between the actual and the proper domains of a capacity might result in either false negatives or false positives. For instance, we have a hard time spotting a face when eyes and nose do not conform to the aforementioned pattern. More often, given that the mechanism for face detection seems biased toward false positives, we can be misled into seeing faces where there are none—a phenomenon known as facial pareidolia. Interestingly, when the activation of a capacity admits of degrees, activating conditions outside its proper domain may elicit the capacity even more intensely than proper conditions do. A textbook case dates back to the pioneer of ethology Niko Tinbergen. He reported that Herring Gull chicks, which typically peck at the red spot on their parents’ bills to request food (proper domain), peck more intensely when confronted with certain artificial red stimuli. He thus coined the term superstimuli to refer to the class of objects that activate a capacity even more strongly than its proper stimuli do (Tinbergen and Perdeck 1950).

Sperber & Hirschfeld suggest that cultural artifacts may be produced by deliberately reproducing conditions that tap into some capacity’s actual domain, irrespective of its proper domain. For example, they claim, masks and portraits exploit the mechanism for human face perception, because they exhibit formal features that fall within its actual domain, despite not falling into its proper domain (real faces).

I claim that, similarly to masks (on which see Viola 2023), EmoT/J are a particular class of artifacts that exploit the (neurotypical) human mechanisms for face perception to convey social information typically conveyed by real human faces (or, at least, that we extract from real faces; cf. § 5) [Footnote 3]. Psychological research has shown that human faces provide different kinds of social information, whose processing is due to partially distinct neurocognitive mechanisms. Accordingly, the architecture of the next four sections will carve out several aspects of face perception along the joints of these relatively separate mechanisms.

3 Expressing Emotions

3.1 Facial Expressions of Emotions

The intimate connection between facial movements and emotions has been vastly debated in psychology [Footnote 4]. Even before Darwin published his influential book The Expression of the Emotions in Man and Animals (Darwin 1872), the link between emotions and facial movements had already been discussed by some of his sources, like Charles Bell or Duchenne de Boulogne. The ‘golden age’ of research on facial expressions of emotions was probably around the 1970s, due to the research program envisaged by Silvan Tomkins and pursued by Paul Ekman and Carroll Izard. For them, the onset of an emotion inescapably triggers a specific and innate set of facial movements common to all humankind, which they dubbed the facial affect program (Ekman and Friesen 1969). The expression-emotion mapping thus became the cornerstone of a research agenda that thrived in spite of a host of counterevidence and criticisms. Contemporary reappraisals suggest that this universal emotion-expression link might be far more modest than initially predicted (Barrett et al. 2019). The debate is open and heated, but recent empirical and theoretical work suggests that facial expressions of emotion do have some universal and spontaneous core, albeit one that leaves room for ‘accents’ and ‘dialects’ (Elfenbein 2013) and other culturally- and context-driven modulations (Cowen et al. 2021; Jack et al. 2012, 2016; Glazer 2019, 2022; see below, § 4).

On the conceptual level, one of the most radical opponents of the expression-emotion link is probably Alan Fridlund. He rejects the link between facial displays and emotions outright, regarding the latter as scientifically invisible and ineffable entities. Instead, he proposes to interpret facial movements in terms of communicative intentions: for instance, instead of ‘happiness’, a smile ought to be interpreted as an attempt to influence the observer to play or to affiliate (Crivelli and Fridlund 2018). While his criticism has the merit of highlighting the appeal of a teleological, strategic construal of facial movements, Scarantino (2017) convincingly shows that we neither need nor should give up on the interpretation of facial expressions in terms of emotions, as the expressive and the teleological facets can go (and often do go) hand-in-hand (see also Van Kleef 2009). Scarantino also points out that emotions can be expressed either via spontaneous movements or via intentional (and potentially deceptive) movements (Scarantino 2019). Most probably, however, the relationship between spontaneous and intentional emotional displays is better construed as a continuum than as a strict dichotomy.

That being said, while these two components are intertwined, some facial movements play certain pragmatic roles that cannot sensibly be couched in terms of emotion. Their discussion is postponed to § 5.

Some lines of evidence inspired by embodied cognition suggest that facial expressions can actually trigger certain affective reactions in both the expressor and the observer. According to the facial feedback hypothesis, producing mouth movements that resemble either a smile or a frown may induce positive or negative feelings, respectively. After several replications yielding mixed results, a recent massive multi-lab experiment (Coles et al. 2022) has robustly established that this effect occurs… although it is very slight. Unlike ‘cold’ ways of conveying emotions, facial expressions have been claimed to induce emotional mimicry and emotional contagion (i.e. they can prompt the observer to mimic the emoter’s expression, and possibly even their emotion), thus promoting bonding and empathy between the expressor and the observers. This mechanism seems to be shared by several primates (Palagi et al. 2020), although in humans it is modulated by several social and contextual variables (Fischer and Hess 2017).

3.2 EmoT/J as Expressions of Emotions

The first and most obvious function of EmoT/J is that of expressing some affective state whose facial display they represent in a stylized fashion (Dresner and Herring 2010; Glazer 2017; Maier 2023: § 5). This EmoT/J-mediated emotional expression can occur alone, or it can accompany some text.

We can express our approbation or scorn about something by means of three typographical signs, as in the emoticons :-) or :-/, or via the corresponding emojis.

The emotion expressive role of EmoT/J has been vindicated by a solid amount of psychological literature (for reviews, Bai et al. 2019; Cherbonnier and Michinov 2022). Let us examine a few studies. In a recent online experiment on 217 participants from several countries, Neel and collaborators (2023) showed that the same affectively neutral text message (e.g. “It’s on Wednesday”) tends to be interpreted as more positively valenced when accompanied by a positive emoji (a slightly smiling face; avg valence 7.70 on a scale from 1 to 10, SD 1.67) than with a neutral emoji (a couple of eyes; avg 5.53, SD 1.56) or no emoji at all (5.10, SD 1.26), and more negatively with a negative emoji (a slightly frowning face; 3.25, SD 1.48).

The analogy between perceiving emotional emojis and emotional faces is further corroborated by an EEG study by Gantiva et al. (2020). In 30 participants, similar electrophysiological activity was observed whether subjects perceived angry, neutral, and happy emojis or database pictures of facial expressions of emotions. However, the authors also report that the amplitude of some electrocortical components arguably reflecting salience and motivational force, namely the P100 and the LPP, was higher for faces than for emojis.

Does this expressive power generalize from emojis to emoticons? Yes, although probably to a lesser extent. An experiment on 127 subjects (Ganster et al. 2012) also employed neutral texts (albeit in the context of a longer chat) to compare the valence-altering effects of character strings (i.e. emoticons) and pictograms (similar to emojis). The researchers found that both categories of stimuli enhance or decrease the emotional valence of a message, but that the effect of emoji-like pictograms is larger.

A similar trend is also observed in a study by Fischer and Herbert (2021). Their subjects (83 valid) were shown emojis, emoticons and pictures of facial expressions of emotions, and were asked to provide ratings of arousal (activated-sleepy), valence (pleasure-displeasure) and ‘emotionality’ (how well a given stimulus represents the emotion it is meant to represent). For all three dimensions, the ratings received by emojis were similar to those received by faces. Both were higher than those received by emoticons.

Some evidence even suggests that, in some circumstances, emojis could work as superstimuli when it comes to conveying emotions, namely, they convey certain emotions better than the corresponding expressions on real faces. For instance, a recent study on the recognition of Ekman’s six basic emotions [Footnote 5] from facial expressions and emojis, performed on 96 male and female university students, showed that male participants are better than female participants at recognizing emojis, and that males’ accuracy is higher with emojis than with actual faces (Dalle Nogare et al. 2023). According to the authors, this may be due to the fact that males tend to use emojis more than females.

In another study, Cherbonnier and Michinov (2021) designed and validated ‘superstimuli’ emojis, i.e. a set of emojis that represent ‘the basic six’ better than both their facial counterparts and other emoji sets [Footnote 6]—where ‘better’ means “associated with the target emotion with higher accuracy”.

The evidence discussed thus far—not to mention the dozen studies I could not discuss for the sake of length—provides strong support for the intuitive idea that EmoT/J are perfectly capable of conveying emotional states. Of course, EmoT/J must be conceived as analogous to intentional displays rather than spontaneous ones—and as such, they are more open to deception. Unlike spontaneous facial movements in face-to-face interactions, sending a (message containing an) EmoT/J often requires a deliberate action, such as pressing “send”. In theory, that might give the sender enough time to reconsider and edit the message, potentially for the sake of strategic interactions—including deceptive ones. In practice, many digital interactions are also rather spontaneous: after all, people may often send chat messages impulsively only to regret them a minute later. [Footnote 7]

We may want to distinguish between two ways of ‘conveying’ emotion—say, a ‘cold’ way and a ‘hot’ one. Unlike an emotion word like “sadness”, which can refer to sadness without manifesting it (but see Glazer 2017, 2023), a facial expression seems capable of doing something more: at least in some cases (as in spontaneous expressions), it conveys emotions with vivid affective force, reflecting the emoter’s own state and possibly altering the observers’ affective states. So, are EmoT/J capable of conveying emotions in the ‘hot’ way, similarly to facial expressions, or are they confined to the ‘cold’ way, just like emotion words? [Footnote 8] Indeed, the literature presents some evidence for a ‘hot’ status of EmoT/J. In a large study, Lohmann and collaborators (2017) showed the same text message to 1745 female participants, accompanied by either a positive or a negative emoji, and asked them to imagine that it had been received from a dear friend. Besides promoting the ascription of a positive or negative state to the imaginary friend, emojis also produced congruent effects on the subjects’ mood, which Lohmann and collaborators interpreted in terms of elicited emotional contagion.

Arguments in favor of the ‘hot’ and affectively efficacious nature of emojis can also be found in a study on 100 subjects by Gantiva et al. (2021), who reported that emojis effectively altered several physiological parameters in the observers: the activity of the facial muscles regulating smiling and frowning, skin conductance (reflecting sweating), and heart rate. These modifications were comparable to those induced by facial expressions of the same emotion. For instance, both happy faces and happy emojis induced activation of the zygomaticus major, the muscle that raises the corners of the lips during smiles, suggesting that emojis may successfully trigger facial mimicry. Moreover, an EEG study by Liao and colleagues (2021) showed similar electrophysiological and self-reported responses to pain when expressed by faces and emojis, although the pain was perceived as more intense in faces than in emojis, both in self-reports and in neural data.

There is, however, an important limitation to EmoT/J’s ability to scaffold bonding via emotional mimicry. A pivotal ingredient of mimicry-based bonding is pupil mimicry (Prochazkova et al. 2018). To put it simply, when someone’s pupil diameter co-varies with mine while we interact, I am more likely to trust them. Yet, as EmoT/J have fixed pupil size, their potential for emotional mimicry is largely limited—although it would be intriguing to check, for example, whether similar EmoT/J with dilated or narrow pupils (e.g. °_° vs. O_O) could be perceived differently by an observer based on their own pupil size while observing it.

For those who accept some degree of constructionism in emotion theory, i.e. for those who think that emotions are constituted by means of some conceptual act based on raw affective ingredients such as undifferentiated arousal (e.g. Schachter and Singer 1962; Russell 2003; Barrett 2017), EmoT/J can also help shape the sender’s own affect. By typing :-), the sender is endorsing an interpretation of their own affective states along the ‘plot’ of happiness, and manifesting it not only to the receiver, but also to themselves. In sum, it seems reasonable to conceive of EmoT/J as a sort of ‘affective attractor’ which allows the receiver and the sender to synchronize on some affective state.

4 Cultural Norms

4.1 Cultural Norms and Emotions

While the tension between natural/innate and cultural/learnt is a cliché of most psychological topics, it rarely manifests itself as intensely as in the debate over the facial expression of emotions. Even a herald of innatism in emotion like Ekman conceded that culture plays a role in facial expression—after all, he dubbed his own theory neurocultural (Ekman and Friesen 1969). In fact, while the functioning of the facial affect programs in themselves is pretty rigid, Ekman granted culture and learning a modulating role both upstream and downstream. Upstream, cultural and idiosyncratic factors may alter the conditions that elicit the facial affect programs (e.g., eating bugs may be disgusting in some cultures but not in others). Downstream, so-called display rules regulate the appropriateness of displaying certain emotions in given contexts, often nudging the emoter to inhibit, alter, or artificially display an emotional expression that suits the moment (e.g., in most Western countries you are not supposed to laugh at a funeral).

Display rules, albeit often under different labels, have been studied by several historians and sociologists of emotion (e.g. Hochschild 1983; Stearns and Stearns 1985). In particular, Hochschild’s notion of emotional labor, i.e. the idea that some jobs (typically those performed by women) impose strict rules upon which emotions to express (and perhaps even to feel), gave rise to a rich strand of studies about the social expectation that negative emotions be suppressed and positive ones (over)expressed in the workplace—possibly resulting in emotional dissonance and other forms of distress that place a burden on workers’ well-being (Zapf 2002).

As compared to Ekman, many contemporary experimental psychologists are more inclined to ascribe a larger role to the cultural modulation of facial expression (see for instance Jack et al. 2012, 2016; Elfenbein 2013; Cowen and Keltner 2021). These modulations concern both the expressor and the observer. For instance, Jack and collaborators (2012) highlighted that in Eastern Asia the eye region is more diagnostic for many emotions than in Western countries, whose inhabitants rely more heavily on the mouth. As Glazer (2022) points out, a moderately universalistic stance on the expression of emotion leaves room for cultural modulators. For instance, as mentioned above, cultural norms may dictate whether to amplify or attenuate a particular facial muscular movement in order to endorse or suppress emotional communication in the presence of a specific audience (see Matsumoto et al. 2008). But they can also play a role in fine-tuning the facial movements prompted by some innate facial affect program-like mechanisms, or in teaching an observer where to look to recognize an emotion (take a stroll amidst the bookshelves dedicated to preschoolers: you will find a plethora of books aiming to teach kids how to identify, regulate and express their emotions; cf. Widen and Russell 2010).

4.2 Cultural Norms and EmoT/J

Since social norms modulate how and when emotions are expressed via facial movements, it should come as no surprise that some researchers have investigated the norms that may be at play in EmoT/J-mediated expression.

The available evidence seems rather in line with this hypothesis. In one of the first studies of this kind, Derks and colleagues (2007) asked 158 secondary school students from the Netherlands to respond to internet chats in task-oriented and socio-emotional contexts (working on a class assignment and choosing a present for a common friend, respectively). Subjects could respond with text or emojis [Footnote 9], or with a combination thereof. The results showed that “people use more emoticons [sic] in socio-emotional contexts than in task-oriented contexts” (Derks et al. 2007, p. 846), leading the authors to conclude that “display rules for Internet communication are comparable to display rules for face-to-face communication” (ibid.).

More recently, Liu (2023) has reported a nuanced set of display rules based on a large sample of young Japanese participants (age 12–29, N = 1289, only 78 males). As in the study by Derks and colleagues (2007), subjects had to provide answers to some chat messages with or without emojis, but they were allowed a larger set of emojis to pick from in order to best represent real-life contexts. The chats differed in several social variables, e.g. the interactor was unfamiliar or familiar, a friend of the same or of the opposite sex, of higher or equal social status, in a private or in a public context. The results show that “participants [express] most emotions toward same-sex friends, followed by opposite-sex friends, unfamiliar individuals, and those of high status” (Liu 2023, p. 12). Moreover, different emojis were typically employed for each context. Another interesting observation is that “smiling emojis were used when participants de-intensified their expression of negative emotions” (p. 13).

However, some studies warn us that the rules regulating facial movements and EmoT/J usage are not perfectly parallel. Somewhat ironically, one of these studies was performed by a team led by the very scholar who explicitly proposed the face-EmoT/J analogy, Van Kleef. In three experiments, he and his team asked participants to assign warmth and competence ratings to some text messages. The same texts were presented either with or without ‘smileys’, i.e. rather stylized smiling emoticons—as well as alongside neutral and smiling face photos used as control conditions. It was found that “contrary to actual smiles, smileys do not increase perceptions of warmth and actually reduce perceptions of competence” (Glikson et al. 2018)—though only in formal settings.

Yet, cultural influences on EmoT/J may go beyond display rules dictating when to use which EmoT/J. Just as culture produces dialects and accents in facial expressions, it may also bring about slight modifications in EmoT/J shape and meaning.

For instance, noting that in Western countries the expression and recognition of facial expressions is more mouth-driven than in Eastern Asia, Jack and collaborators (2012) trace a parallel with the corresponding emoticons, remarking that often “In Eastern Asia, (^.^) is happy and (>.<) is angry” (p. 7242). Their suggestion opens up an intriguing but under-investigated research hypothesis, namely that EmoT/J may contribute to shaping our facial expressions of emotions, as well as our gazing patterns when looking at someone else’s facial expressions. A simple prediction could be that Western subjects exposed to a huge amount of East Asian, eye-centered emoticons would increase their eye movements at the expense of mouth movements when expressing emotions, and/or take the eye region to be more diagnostic when reading emotions from an expressing face.

5 Non-affective Information

5.1 Non-affective Information from the Face

As already hinted, not all the social information an observer can retrieve from a face pertains to the affective domain. First and most obviously, faces allow for the recognition of identity. From a neurocognitive point of view, the process of recognizing the identity of a familiar person is thought to be largely independent of emotion recognition (Bruce and Young 1986): the former is mainly based on invariant features of the face and hinges upon neural machinery running from the occipital to the temporal cortex, while the latter is based on facial movements and hinges upon the workings of an occipito-parietal neural pathway (Haxby et al. 2000). Of course, we cannot recognize the personal identity of someone we have never met. However, the first time we see someone we tend to form some heuristic judgment about their social identity. For instance, a quick glance often suffices to infer someone’s age and biological sex with some accuracy. But since our information-hungry brains are prone to fill gaps in knowledge of the social world by making bold generalizations, we often form first impressions of someone’s personality based on their facial appearance. Such impressions are often ill-founded, and yet they exert a worryingly sizable influence on our attitudes toward strangers (Todorov et al. 2015; Smortchkova 2022).

Besides the information we draw from a still face, some facial movements also play communicative functions not pertaining to the emotional domain. Many such movements—sometimes construed as gestures—often complement verbal communication, playing a relevant pragmatic role. Obvious examples are the facial movements for nodding or shaking one’s head. Moreover, gaze may serve multiple functions in social interactions, such as engaging in (or disengaging from) interaction when we move our gaze upon (or away from) an interactor’s eyes, or regulating turn-taking (for a review, see Rossano 2013). Domaneschi and colleagues (2017) have shown that pictures of the upper region of certain facial expressions suffice to alter the illocutionary force of some sentences: depending on different contractions of the muscles surrounding the eye region (eyebrows, eyelids, and cheeks), the same sentence is perceived by the subject as belonging to a different illocutionary type, such as an assertion, an order, or advice, among others.

5.2 Non-affective Information from EmoT/J

While emoticons have been invented multiple times [Footnote 10], the most famous inventor of the smiley face is probably the computer scientist Scott E. Fahlman. On September 19, 1982, he made the following proposal—seemingly loaded with irony—on an online bulletin board at Carnegie Mellon University, a sort of precursor of internet forums:

I propose that the following character sequence for joke markers:

:-)

Read it sideways. Actually, it is probably more economical to mark things that are NOT jokes, given current trends. For this, use

:-(

(Fahlman 1982)

Despite being partly (meta-)ironical in itself, Fahlman’s proposal also anticipates something that linguists would later concede about emoticons (e.g. Dresner and Herring 2010), and subsequently about emojis (e.g. Gawne and McCulloch 2019), namely, that they can play some pragmatic functions. For instance, Dresner and Herring (2010, p. 256) interpret the winking face emoticon—i.e. ;-)—as “an indicator that the writer is joking, teasing, or otherwise not serious about the message’s propositional content”, immediately clarifying that “joking is not an emotion—one could joke while being in a variety of distinct affective states. Rather, joking is a type of illocutionary force, something that we do by what we say” (ibid.). Similarly, smiling faces may be used, among other things, to soften the illocutionary force of some speech acts – e.g. downgrading an order into a piece of advice. In that respect, their workings seem comparable to those of (the upper region of) faces in Domaneschi et al.’s experiment (2017).

As soon as we shift the focus of our analysis from the representation of facial movements to the representation of still facial features, the face-EmoT/J analogy reveals its greatest shortcomings. It goes without saying that, unlike actual faces, EmoT/J cannot be used to infer personal identity. In fact, by virtue of their stylization (Maier 2023), EmoT/J are meant to represent anyone’s face in general, but no face in particular (Footnote 11).

Neither can they provide clues to social identity, except indirectly. It is certainly possible to make accurate inferences about someone’s personality traits based on how they use emojis in digital communication (especially extraversion and openness; see Wall et al. 2016). Yet, unlike actual faces, EmoT/J provide no clues about the socio-demographic characteristics of their users. This is not to say that emojis (both facial and non-facial) cannot encode some demographic factors: for instance, they can encode ethnicity via skin color, or old/young age. Yet, the sense in which using an emoji of a certain skin color represents ethnicity differs from the sense in which actually belonging to an ethnic group does: the latter is a natural and non-intentional representation, while the former is a deliberate and iconic one, freely available to users irrespective of their own skin color. In fact, many white-skinned Twitter users have used dark-skinned emojis to express their endorsement of the Black Lives Matter movement (Alfano et al. 2022).

6 Face Detection: Increasing Salience and Activating Theory of Mind

6.1 Detecting Actual Faces

While a lot of social information can be inferred from faces (correctly or not), the very first bit of information a natural face provides is that there is a face, and hence an agent. The mechanism for detecting faces is prone to false positives: it is automatically and mandatorily triggered by whatever stimulus exhibits a face-like gestalt, i.e., whatever visual pattern resembles a T or an inverted triangle. Once detected, faces are quick to grab our attention (i.e., they get rapidly foveated) and slow to let it go (Palermo and Rhodes 2007; Langton et al. 2008; Devue et al. 2012).

Face detection is also deeply inscribed in our ontogeny: experiments on human newborns show that they preferentially attend to face-like patterns, as compared to upside-down face-like patterns or to randomly scrambled dots (Goren et al. 1975; Buiatti et al. 2019). Impressively, by projecting face-like dot patterns through the uterine wall of pregnant mothers and tracking fetal responses via ultrasound, Reid and collaborators (2017) observed a similar preference even in fetuses in the third trimester, revealed by head movements toward the face-like stimuli.

Once our face detection mechanism locks onto something—be it an actual face or a pareidolia (see Alais et al. 2021)—it ignites a cascade of inferences to extract social information from it, including all the mechanisms described in the previous section of this paper. In a slogan, we can say that face detection triggers theory of mind, or person perception. When we see a face (or a face-like visual pattern), we cannot even decide not to recognize its identity or emotional expression, or not to form any impression of its bearer. In fact, once we see a face, it takes less than 100 milliseconds to decode a great deal of social information and to formulate a host of first impressions about it (Todorov et al. 2015).

6.2 Detecting EmoT/J

Despite clearly falling outside the proper domain of face detection, most EmoT/J fall within its actual domain (Footnote 12). As such, they may benefit from the privileged ‘attention-grabbing’ status of face-detection-triggering stimuli. While most studies in experimental psychology concern facial pareidolias in general (usually employing accidental pareidolias as stimuli), with due prudence their results may be generalized to EmoT/J too.

Besides the studies on abstract face-like patterns mentioned in the previous sub-section, concerning infants (Goren et al. 1975; Buiatti et al. 2019) and third-trimester fetuses (Reid et al. 2017), more recent investigations attest to the effects of more ecologically valid pareidolic stimuli in adults. For instance, in two experiments on Australian students (N = 18 for each experiment), Keys and collaborators (2021) demonstrated that pictures of inanimate objects resembling faces are located more rapidly than similar ‘face-less’ objects in visual search tasks.

More recently, Jakobsen and collaborators (2023) performed a series of experiments using the dot-probe task, a paradigm in which subjects are briefly exposed to a pair of images (cues), one to the right and one to the left of the center of the screen, and then to a target image (probe) appearing either on the right or on the left. Subjects have to indicate the side of the target probe as quickly and accurately as possible. When used as cues in a location congruent with the probe, pareidolic images facilitated faster and more accurate recognition of the probe (unless they were presented upside-down), probably because they drew subjects’ attention in that direction.

Caruana and Seymour (2022) provide further evidence that face-like objects grab attention, as they reach perceptual awareness more easily than corresponding non-face-like ones. They showed pictures of several face-like and face-less objects to 41 subjects, masking them via breaking continuous flash suppression (Footnote 13), and reported that face-like stimuli reached awareness more often than non-facial stimuli.

Interestingly, in two EEG experiments in which subjects (N = 17 and N = 22, respectively) were shown both actual faces and pareidolic face-like objects amidst several other images alternating at high frequencies, Rekow and colleagues (2022) described similar electrophysiological patterns for subjects who reported awareness of either faces or pareidolias, supporting the neural similarity of the two kinds of stimuli.

Some research in marketing and advertising also corroborates the power of (accidental) pareidolic stimuli to attract consumers’ attention (e.g. Guido et al. 2019; Noble et al. 2023). In the field of marketing research, Valenzuela-Gálvez and others (2023) report higher customer engagement with emails containing emojis—although, somewhat surprisingly, non-facial emojis were even more effective than facial emojis in their experiments.

All things considered, despite being barely mentioned in theoretical reviews of the roles of EmoT/J (but see Gawne and McCulloch 2019), and possibly underappreciated by EmoT/J users themselves, the ability to exploit the attention-prioritizing power of face detection can reasonably be counted among the capacities of EmoT/J.

7 Conclusion and Caveats

The aim of this paper was to investigate whether the empirical literature vindicates the status of emoticons and facial emojis—in short, EmoT/J—as cultural artifacts that vicariate our natural faces, allowing us to convey emotions and other facial information. Like other cultural artifacts, EmoT/J exploit a misalignment between the proper domain of certain psychological capacities (what they were teleologically selected to do) and their actual domain (the way they actually work). In particular, they exploit the fact that face perception seems to operate on some non-facial stimuli to convey some of the social information typically conveyed by actual faces. Hence, I have offered a philosophical reading of the empirical evidence regarding the analogy between actual faces and EmoT/J, highlighting where it holds and where it does not, with respect to the following aspects of face perception: the expression of emotions, the cultural norms that surround it, non-affective social information, and attention prioritization.

In many respects, EmoT/J seem to be up to their task of constituting “face avatars”—though not without some “buts”. Indeed, they seem capable of expressing emotion, and even of eliciting emotional contagion; but they can never be spontaneous like some facial expressions, and as a result they cannot be as reliable. They also seem influenced by cultural norms about when to express emotion, and even about how to express it; yet the appropriate contexts for smiles do not perfectly match those for smileys. They substitute for faces in some non-affective pragmatic roles, such as setting the illocutionary force of a speech act; but they are silent about most of the social information one could infer from faces. And finally, similar to actual faces, they are attention-grabbing.

I do not claim this discussion to be exhaustive. For instance, I have not discussed the possibility that EmoT/J may undertake complex semantic roles. The project Emoji Dick, for example, a translation of Melville’s Moby Dick made up only of emojis (https://www.emojidick.com/), seems to suggest that they may have the resources to replace written language, at least in some cases. However, it is doubtful that emoji-only texts possess the grammatical richness that allows for complex sentences (see Cohn et al. 2019). And in any case, it is highly likely that such richer semantics pertains to non-facial emojis (Maier 2023), which remain outside the scope of the present discussion.

Yet much more should be said even within the broad categories of face perception processes I have identified. For instance, Palmer and Clifford (2020) have shown that pareidolic face-like objects not only attract our gaze toward them: they may also steer it toward the direction in which they are looking, just like natural faces. Shall we conclude that emojis with sufficiently detailed gaze direction can steer our gaze where they want?

Perhaps it is too early to ‘conclude’ anything. In fact, rather than concluding anything, this paper aims to inspire new empirical studies. But it also aims, echoing King (2018), to invite further philosophical reflection on EmoT/J. The ability to free our oral language from the spatiotemporal constraints of the “here and now” by means of writing has changed our societies (Ong 1982) and our brains (Dehaene 2009) forever. Now that our faces have been endowed with the same power, where to next?