About 100 years ago, the term robot was introduced in the context of Karel Čapek’s play R.U.R.: Rossum’s Universal Robots (1920). While the idea of artificial humans had been a topic of literature and film before (e.g., the Golem, Frankenstein, Metropolis), R.U.R. was a turning point and provided a label and a concept that would quickly spread internationally. Within a few decades, robots and other humanoid artificial creatures would be common in science fiction stories, film, and later television and video games. One of the interesting aspects of these artificial creatures was that they would typically be presented as smart, possessing (artificial) intelligence, but being cold, distant, and unemotional. In fact, emotions seemed to be the missing element for truly attaining humanity, as exemplified by the character Lt. Cmdr. Data in the Star Trek universe (Kakoudaki, 2015). Indeed, several studies suggest that emotion has become an even more crucial aspect of human identity in response to the inexorable rise in machine intelligence (e.g., Cha et al., 2020; Kaplan, 2004; Stein & Ohler, 2017).

About 25 years ago, Rosalind Picard (1997) introduced the concept of affective computing, and ever since, a broad and heterogeneous research program linking AI and affective science has been growing rapidly. While research in this context existed before (see also Picard, 2015), it did not present a cohesive body of activities and was not perceived as such. After the turn of the millennium, in a relatively short time, societies, conferences, and journals centered around the new concept appeared and grew at a rapid pace. The IEEE flagship journal IEEE Transactions on Affective Computing, founded in 2010, soon reached a higher impact factor than any canonical journal on emotions/affective science (13.99 at the time of writing). This remarkable expansion correlates, on the one hand, with the current growth of artificial intelligence in the guise of machine learning and data-analytic approaches that are transforming many disciplines and applied areas and, on the other, with the rise of affectivism (Dukes et al., 2021).

The present contribution will take stock of the state of affective science in affective computing and social robotics. We will highlight challenges to implementing affect in machines and discuss the potential benefits, in the coming years, for researchers in the field of affective science of connecting with researchers involved in affective computing, AI, and social robotics.

Motivations for the Development of Affective Computing

Many researchers in affective computing are interested in developing systems that are supposed to gain usability, in the widest sense, in the interaction of humans and artificial systems. Benefits are proposed for physically embodied systems, such as robots (HRI: human–robot interaction), and for virtual entities, such as virtual agents or chatbots. Designers and researchers hope that, by diagnosing the state of users or interactants, such systems can alter their behavior or convey simulated emotions to better fit the situation or the needs of the user. Service providers could identify angry customers and respond with empathy or concern, or at least transition them to a human representative (e.g., Waelbers et al., 2022). Home devices like Alexa might time advertisements to moments when a customer is emotionally predisposed to purchase (Li et al., 2017). Automated tutors might detect student frustration and provide encouragement or adjust instruction accordingly (Malekzadeh et al., 2015). Because of the implications of being able to diagnose user states and develop responsive systems, there is a considerable business case. Studies from 2022 estimate that the global affective computing market will reach between 182 and 255 billion US$ by 2026 (Reports and Data, 2022). Arguably, no aspect of affective science research surpasses the current market interest in affective computing. It is all the more striking that the connections between emotion researchers from the behavioral, social, and neurosciences and much of the affective computing enterprise are comparatively weak. It should also be noted that, particularly in contexts where information on affective states is being used to sell products, concepts, or services, there are considerable ethical issues. These concerns are being discussed by experts at conferences and in the literature, as well as by the media in public discourse. This is an ongoing discussion that we can only mention, not pursue, in this overview.
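As a concrete illustration of the service scenario above, consider a minimal sketch of threshold-based escalation. The `detect_emotions` classifier and the threshold value are hypothetical stand-ins, not any deployed system; in practice the classifier would wrap a trained speech or text emotion model.

```python
# Minimal sketch: route a user to a human agent when detected anger
# exceeds a threshold. All names and values here are illustrative.

from typing import Dict

ANGER_THRESHOLD = 0.6  # assumed operating point, tuned on validation data


def detect_emotions(utterance: str) -> Dict[str, float]:
    """Hypothetical stand-in for a trained emotion classifier."""
    # Toy keyword heuristic, for illustration only.
    anger_cues = ("angry", "unacceptable", "refund", "furious")
    score = min(1.0, 0.25 * sum(cue in utterance.lower() for cue in anger_cues))
    return {"anger": score, "neutral": 1.0 - score}


def route(utterance: str) -> str:
    """Decide whether the automated system keeps the turn or hands off."""
    scores = detect_emotions(utterance)
    if scores["anger"] >= ANGER_THRESHOLD:
        return "escalate_to_human"  # or respond with empathy first
    return "continue_automated"


if __name__ == "__main__":
    print(route("This is unacceptable, I want a refund, I am furious!"))
```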

In contrast, a smaller group of researchers is interested in developing artificial agents that represent an internal affective state; the idea here is that the behavior of such agents will be determined by the co-action of cognition, affect, and motivation (e.g., Lim & Okuno, 2015). Attempts to create feeling machines are infrequent and have not yet been very successful, though there is recent excitement that “foundation” models like GPT-3 may have spontaneously acquired socio-emotional abilities (Kosinski, 2023). Conversational systems themselves date back to ELIZA (Weizenbaum, 1966), a simple chatting system simulating a psychotherapist. Since then, there has been constant development of systems that are able to hold a conversation in text in specific areas, such as education (e.g., Wollny et al., 2021) or health care (e.g., Parmar et al., 2022). However, if systems are to be embodied, a multi-modal synthesis approach is needed that involves not only what is being said, but how it is said, in the sense of involving paralinguistic cues and nonverbal behavior in general. Multimodal synthesis of behavior is hampered by the many degrees of freedom of behavior on the one hand and the lack of theories that cover all of the different behavioral dimensions on the other. Furthermore, there are many technical challenges, such as synthesizing speech and mouth movements in a synchronous fashion in real time.
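To make the synchrony problem concrete, here is a toy sketch (not a production pipeline) that schedules mouth shapes against a wall-clock audio timeline; the phoneme timings and the viseme table are invented for illustration.

```python
# Toy illustration of the lip-sync problem: emit viseme (mouth-shape)
# events aligned to the audio clock so face and speech stay synchronous.

import time

# Map phoneme symbols to mouth shapes; this table is invented.
PHONEME_TO_VISEME = {"M": "closed", "AA": "open", "F": "lip_bite", "IY": "wide"}


def play(phonemes):
    """Emit viseme events at each phoneme's onset time.

    `phonemes` is a list of (symbol, duration_in_seconds) pairs, as a
    text-to-speech engine might report them.
    """
    t0 = time.monotonic()
    onset = 0.0
    for symbol, duration in phonemes:
        # Wait until the audio clock reaches this phoneme's onset.
        while time.monotonic() - t0 < onset:
            time.sleep(0.001)
        print(f"{onset:5.2f}s  viseme={PHONEME_TO_VISEME.get(symbol, 'neutral')}")
        onset += duration


play([("M", 0.08), ("AA", 0.15), ("M", 0.08), ("AA", 0.15)])  # "mama"
```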

Clearly, emotional expressions are part and parcel of the behavior shown in interactions, but which expressions are shown, and when, is typically not covered by emotion theories. Creating a working system that shows expressions related to affective states requires a joint effort of multiple disciplines, including psychology, communication science, possibly linguistics, sociology, ethology, and more. Alternatively, one simply records many interactions and lets AI produce behavior without recourse to any theory. But is this really what we want? We know that generative processes depend on the data they are fed. Theories help to identify the conditions and contexts that should be included when sampling data for machine learning, as it is simply not viable to sample all of human behavior in all contexts with all of the facets that might play a role in the cohesion of affective components.
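A minimal sketch of what theory-guided sampling could look like: instead of scraping interactions indiscriminately, a training set is balanced across context variables that a theory flags as relevant. The context variables (setting, elicitor) and the record structure are assumptions for illustration.

```python
# Minimal sketch: draw a training sample balanced across theory-defined
# context cells, rather than sampling whatever data happens to be abundant.

import random
from collections import defaultdict


def stratified_sample(records, context_keys, per_stratum, seed=0):
    """Sample up to `per_stratum` records from each context cell."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for rec in records:
        cell = tuple(rec[k] for k in context_keys)
        strata[cell].append(rec)
    sample = []
    for recs in strata.values():
        rng.shuffle(recs)
        sample.extend(recs[:per_stratum])
    return sample


# Synthetic records with invented context variables, for illustration.
records = [
    {"id": i, "setting": s, "elicitor": e}
    for i, (s, e) in enumerate(
        (s, e)
        for s in ("dyadic", "group", "alone")
        for e in ("loss", "goal_block", "praise")
    )
]
balanced = stratified_sample(records, context_keys=("setting", "elicitor"), per_stratum=1)
cells = {(r["setting"], r["elicitor"]) for r in balanced}
print(len(balanced), "records across", len(cells), "context cells")
```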

Discussion

There is no doubt that affective computing is a growth industry in computer science and engineering, and in some corners of affective science. However, while there is already huge interest on the business side, various issues pose challenges to the scientific backbone of such developments. These lacunae are areas that call for serious investment in research activity.

We do not know the actual relationship between visible/audible affective behavior and underlying subjective experience and physiological activation. It has been shown that there are moments of coherence and moments of incoherence (e.g., Mauss et al., 2005). While this is sufficient to reject the notion of specific expressions as diagnostic at a given moment (e.g., Krumhuber & Kappas, 2022), it is not sufficient for generating the behavior of an artificial system in real-time, ongoing interactions. Here, a system must decide what behavior to show.
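As an illustration of how such coherence can be quantified, here is a minimal sketch, in the spirit of the Mauss et al. (2005) approach, that correlates a continuous experience trace with an expression-intensity trace across temporal lags. The signals below are synthetic placeholders; real data would come from continuous rating dials and facial coding or EMG.

```python
# Minimal sketch: lagged Pearson correlation between a self-report trace
# and a concurrent expression-intensity trace, on synthetic data.

import numpy as np


def lagged_correlations(experience, expression, max_lag):
    """Pearson r for lags 0..max_lag, with expression trailing experience."""
    n = len(experience)
    return {
        lag: float(np.corrcoef(experience[: n - lag], expression[lag:])[0, 1])
        for lag in range(max_lag + 1)
    }


rng = np.random.default_rng(42)
experience = rng.normal(size=200).cumsum()  # synthetic feeling trace
# Expression: a delayed, noisy copy of the experience trace.
expression = np.roll(experience, 5) + rng.normal(scale=2.0, size=200)

corrs = lagged_correlations(experience, expression, max_lag=10)
peak = max(corrs, key=corrs.get)
print(f"peak coherence r = {corrs[peak]:.2f} at lag {peak} samples")
```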

Having access to expressive artificial systems is a chance to test some assumptions regarding the importance of expressive behavior between humans. There is broad evidence that situational context affects the interpretation of facial and vocal behavior (e.g., Calbi et al., 2017; Wieser & Brosch, 2012). Interestingly, recent advances in deep learning approaches, such as GPT-4, are beginning to enable machines to reason about situations in human-like ways (e.g., Tak & Gratch, 2023), which may open new windows into analyzing how interaction partners integrate situational and expressive factors to construct social meaning.
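As a sketch of what such probing might look like, the fragment below asks a language model to weigh situational and expressive cues jointly. The prompt wording, the label set, and the `llm` callable are assumptions for illustration, not the protocol of the cited studies.

```python
# Hypothetical sketch: query a language model with a situation plus an
# expression cue and ask for an integrated emotion judgment.

APPRAISAL_PROMPT = """Situation: {situation}
Expression: the person is smiling.
Considering both the situation and the expression, which label fits best:
joy, anger, sadness, fear, or embarrassment? Answer with one word."""


def judge_emotion(llm, situation: str) -> str:
    """Ask a language model to integrate situational and expressive cues.

    `llm` is any callable mapping a prompt string to a completion string.
    """
    return llm(APPRAISAL_PROMPT.format(situation=situation)).strip().lower()


# Usage, with whatever client is available, e.g.:
#   judge_emotion(my_client, "They just tripped on stage in front of everyone.")
```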

We also need a better understanding of the automatic analysis of objective behavior, as numerous factors relating to the quality of recordings, as well as biases in samples, such as race or age, affect the reliability of machine learning approaches.
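One way to surface such effects is to disaggregate performance rather than report a single pooled score. A minimal sketch, with synthetic placeholder data:

```python
# Minimal sketch: break a classifier's agreement with human annotation
# down by subgroup (e.g., age bands, or recording conditions) instead of
# reporting one pooled accuracy; large gaps between cells signal bias.

import numpy as np


def subgroup_accuracy(y_true, y_pred, groups):
    """Accuracy per subgroup label."""
    y_true, y_pred, groups = map(np.asarray, (y_true, y_pred, groups))
    return {
        g: float((y_pred[groups == g] == y_true[groups == g]).mean())
        for g in np.unique(groups)
    }


# Synthetic placeholder data, for illustration only.
y_true = [0, 1, 1, 0, 1, 0, 1, 1]
y_pred = [0, 1, 0, 0, 1, 1, 1, 0]
groups = ["A", "A", "A", "A", "B", "B", "B", "B"]

print(subgroup_accuracy(y_true, y_pred, groups))  # {'A': 0.75, 'B': 0.5}
```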

There is much reason to believe that research and development in the area of artificial affect will benefit from a closer relationship between emotion researchers and engineers. However, affect is only one facet of interpersonal interaction, and this requires the integration of other areas as well, such as communication science, linguistics, and ethology. Robots that only embody text produced by some AI, flaunt emotional expressions at moments when the content seems to have an emotional tone, or simply mimic the interactant will neither resemble real human behavior nor ultimately be successful. These would not be the droids we are looking for. We need ethologically valid models of interaction that embed affect as one of their elements. There is much to do.