1 Introduction

Do actions have a gender? One may say it probably depends on the agent performing the action. But what if the agent cannot be categorized into a specific gender class, typically determined by some physical features, as in the case of non-biological agents such as the majority of robots we encounter? Is it possible to define gender beyond the surface features of an agent, more specifically in functional terms, e.g., by considering the actions performed by an individual? Feminist philosophy and phenomenology give some important hints for addressing this question. Simone de Beauvoir's [4] perspective holds that gender is a formation; it is something we do [43] or something we realize [6]. This formation is an active process, referred to as "doing gender", grounded in social relationships [4]. The "social" here corresponds to practices and activities that are shaped by cultural norms and stereotypes [6]. In this sense, gender is a non-compulsory category that recurs according to norms [21]. This recurrence is realized through body-environment interaction. According to Maurice Merleau-Ponty [20], as the body interacts with and orients toward its surroundings, its acts gain meaning. Taken together, these theoretical frameworks indicate that gender can be defined beyond the surface features an agent possesses. This implies that robots, which are increasingly becoming part of our lives, can be attributed gender even though they may lack the typical phenotypic features associated with particular genders. One possible reason is that robots, especially those with humanoid structures, can usually perform many actions that carry stereotypical content when performed by humans, such as 'feminine' actions (e.g., taking care of children) and 'masculine' actions (e.g., playing or watching soccer games).

2 Gender Schemas and Stereotypes

As with other social-cognitive structures that enable us to make sense of the world and ourselves, a gender schema is composed of associations that organize behaviors and thoughts and shape how an individual perceives the world. Martin et al. [19] elaborate on how gender schemas function and describe three main roles. First, gender schemas guide a person's response to gendered information. Gendered information relies on a society's ideals of femininity and masculinity, including descriptions of how a woman or a man should behave and act. For example, people are expected to behave according to traditionally gendered ideals, such as that women should be caring and men should be tough. Second, the concept of a schema is crucial for understanding how information is organized in memory: people encode gender-congruent information better than gender-incongruent information. In our environment, domestic tasks (e.g., childcare, cooking) are mostly done by women. This association creates a congruency between the tasks and those who perform them. So, when people encounter similar associations in different social settings, they may retrieve the women-domestic task association (congruent) better than the men-domestic task association (incongruent). Third, gender schemas provide an information base for inference. They are used in situations where information is unclear or when certain details are not attended to in familiar situations. For example, when meeting someone new, a person may infer the newcomer's interests and personality traits from their gender. If the new acquaintance is a woman, they may deduce that she is interested in feminine-associated activities (e.g., makeup and fashion).

While gender schemas are usually formed by the messages we receive from society, individual experiences can override them. For instance, a person can see themselves or others as feminine while their assigned sex is male, or as masculine while their assigned sex is female, or adopt a definition independent of these categories (e.g., non-binary) [35]. This may depend on which social group the person feels they belong to, as suggested by social identity theory [38]. Because group memberships are important for a person's sense of self, individuals may be motivated to perceive the groups to which they belong as distinct from comparison groups [22]. It is therefore possible that people's evaluations of themselves and others change according to the gender with which they identify and the relevant members of that gender group, and that these evaluations differ from the gender schemas imposed by society. For this reason, it is important to separate an individual's own evaluations from the evaluations they attribute to society; this distinction forms the basis of our study. Given the possible conflicts between an individual's gender schema and a schema imposed by society, it is also important to consider how confident people are in their gender evaluations, which is another issue we address in this study.

Whether it is one's individual schema or a schema imposed by society, people use these schemas throughout their lives, and when the schemas become automatic, they are called (gender) stereotypes. Stereotypes reflect expectations about members of certain social groups [29]. Especially when the scope of these social groups is enlarged to include advanced technological entities such as social robots, people project these automatic gendered evaluations onto them, as discussed in the following section.

2.1 Evidence for Gender Attribution to Robots

A growing body of research indicates that people use their gender stereotypes when they perceive and interact with humanoid robots and attribute gender to them. These studies usually manipulate either the surface features of the robots or their occupations. For example, Eyssel and Hegel [12] manipulated hair length and asked people to judge the gender of a robot. Female robots with long hair were perceived as more communal than male robots with short hair, whereas male robots with short hair were perceived as more agentic than female robots. Other studies found that participants evaluated a robot as more masculine when it performed a security job and more feminine when it performed a guidance-related job [40]. Similarly, a male-looking robot was assigned a security job, and a female-looking robot was assigned a health-related job [39].

Stereotypes also influence how robots are designed. For example, a receptionist robot was designed with more feminine features (e.g., waist-to-hip ratio) and was evaluated as more "hospitable". In contrast, robots designed with broader shoulders were found to have an "authoritative" appearance [41]. These studies indicate that people perceive robots with masculine characteristics as more capable of male-stereotypic tasks (e.g., carrying an item, using machines) than of female-stereotypic tasks (e.g., childcare, housework) [12]. Another study shows that people's perceptions of emotional intelligence can differ by gender for both human and robot agents [8]. The study reported that participants rated male agents significantly higher in emotional intelligence. The authors also observed that people's attributions changed either (1) as a result of the robot's voice or (2) as a result of gender-specific expectations. That is, people expected a female robot to have higher emotional intelligence but were disappointed when the female robot showed lower emotional intelligence. Another study found that when participants and the robot were of the same gender, acceptance was greater and participants felt psychologically closer to the robot [13]. In addition, recent research shows how robot appearance and task characteristics influence people's perceptions of robots [24]. In that study, researchers manipulated robot gender (male/female) and task type (social/analytical) and measured trust, humanness, and social perception of the gendered robot. Robots were viewed as more competent and trustworthy when performing analytical tasks than social tasks, regardless of the robot's gender. The researchers also found a tendency for people to dehumanize female robots independently of the tasks performed.

2.2 Consequences of Gender Attribution to Robots

One way to understand the significance of gender attribution to social robots is to consider its consequences. Previous research shows that people tend to project gender schemas and stereotypical behaviors onto robots, especially when the robots have humanoid characteristics and are placed in social roles [1]. Siegel et al. [33] report that users tended to view robots of the 'opposite' gender (see Footnote 1) as more credible, trustworthy, and engaging, and therefore more persuasive, than those of the 'same' gender. This implies that the attribution of gender to robots can affect people's emotions and behaviors towards robots [25, 26]. It is also possible that projecting gender schemas onto robots leads to discriminatory acts towards them or even to their dehumanization [21, 36]. Gender attribution to robots may also change our relationships with other humans. For instance, Liang et al. [18] propose that realistic humanoid robots have the potential to disrupt social interaction between humans and create less empathic relationships by offering a poor substitute for human connection.

3 Gaps in Our Knowledge About Gender Attribution to Social Robots

So far, gender attribution to social robots has been studied through surface/physical features and tasks/occupations. However, what constitutes gender may not be limited to those features, and further work is needed to understand which aspects of an agent lead one to attribute a specific gender to it. Actions, for instance, are important because body movement and comportment are determinative of what makes an experienced human [44]. Actions taken by an agent may carry gender information depending on what has been learned from the socio-cultural environment. In fact, in many cultures certain action categories are associated with certain genders. For instance, in Türkiye, where we conducted the present study, women are usually associated with activities such as childcare or housework, and men with activities such as sports or socializing, reflecting the time each group allocates to these activities [17]. Given the increasing role of social robots in our cultural practices, it remains unknown whether they are enculturated and gendered like humans.

Another issue overlooked in gender attribution studies in social robotics is the distinction between one's own evaluation of an agent's gender and how one thinks society evaluates the agent's gender. Hira [15] states that people can express their gender identity with or without regard to social expectations of being a woman or a man. This, in turn, implies that one may evaluate the gender identity of a social robot while disregarding the gender stereotypes constructed in society. Studies on gender attribution to social robots, especially those using self-reports, should therefore explicitly distinguish whether participants are making a gender evaluation based on their own views or on what they think society's evaluation would be.

A final issue that requires further investigation is whether there are gender differences and individual differences in gendering robots. Previous research provides some evidence of gender differences in gendering non-human agents, including avatars in online games as well as robots [9, 10, 14, 16, 30]. It is not known to what extent these differences persist when people evaluate robots based on the culturally defined gendered actions they perform. Furthermore, it is an open question whether the extent to which one is exposed to gender roles imposed by society determines how one evaluates robots in terms of gender.

In the present study, we aim to test whether humans attribute gender to robots based on the actions they perform. We define gender as referring to masculinity and femininity and not as (assigned) sex. An individual's assigned sex can be taken as a categorical variable whose two values are maleness and femaleness, and neither of them can be further broken down; neither is gradable. On the other hand, masculinity and femininity are both continuous variables representing the ideals constructed by society. For instance, considering masculinity and femininity as a continuum, there would be a degree of being masculine/feminine relative to someone [32].

We hypothesize that people attribute a specific gender to a gender-neutral robot when it performs actions that are associated with that gender category. More specifically, we predict that when a gender-neutral robot performs feminine actions, it is evaluated as feminine, and when it performs masculine actions, it is evaluated as masculine. However, we also hypothesize that gendering is much more pronounced in society-level evaluations than in participants' own (self) evaluations. In other words, we predict that a robot performing a masculine or feminine action is evaluated as much more masculine or feminine, respectively, in society-level evaluations than in self-evaluations. In addition, we investigate whether there are gender differences and individual differences in gender attribution to social robots. We hypothesize that the extent to which one receives messages about gender roles from society, measured by the Socialization of Gender Norms Scale (SGNS) [2, 11], could predict society-level evaluations but not necessarily self-evaluations.

We believe that our study makes important contributions to HRI. First, it will extend previous work on gendering robots based on visual features or occupations by testing whether the actions robots perform affect gender attribution. Second, it will reveal how one's own evaluations and society's evaluations may differ in gendering robots. Third, it will show whether there are individual differences in gender attribution to robots based on the gender stereotypes we receive from society, a topic that has not been tested before. Finally, our study will shed light on the possible implications of this work for robot design.

4 Materials and Methods

Our study consisted of the main study (Study 3) and two pre-studies conducted before it (Study 1 and Study 2). In Study 1, we identified a gender-neutral robot among several candidate robot characters. In Study 2, we identified culturally defined feminine and masculine action categories. In Study 3, we animated the gender-neutral robot identified in Study 1 to perform the gendered actions identified in Study 2 and measured how people evaluate the gender-neutral robot when it performs gendered actions.

4.1 Study 1: Selection of a Gender-Neutral Robot

The main aim of Study 1 was to identify a robot that looked neutral on the feminine-masculine dimension. This robot was then used in the animations created for Study 3 (see below). In this study, 76 participants were shown pictures of 7 different robot characters from various angles (Neu-1, Neu-2, Neu-3, Fem-1, Fem-2, Ma-1, and Ma-2; see Fig. 1) and asked to rate the gender of each character on a scale from −3 to +3, where [−3, 0) means masculine, (0, +3] means feminine, and 0 means neutral. We chose this interval and these values because (1) as Tabachnick and Fidell [37] indicate, Likert-type scales may be treated as continuous variables (even though the data are ordinal), (2) we wanted a typical 7-point Likert scale, and (3) we wanted to offer a neutral option, so 0 was placed in the middle of the scale. We defined −3 as extremely masculine, −2 as moderately masculine, −1 as less masculine, 0 as neutral, +1 as less feminine, +2 as moderately feminine, and +3 as extremely feminine. The robot pictures were used in Study 1 with the permission of the robots' designers. The robots were selected from Unity, Mixamo, CGTrader, and 123RF.
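To make the scale convention concrete, the sketch below encodes the verbal anchors defined above and the interval rule used to read a (possibly non-integer) mean rating as masculine, neutral, or feminine. It is an illustrative sketch only; the names `SCALE_LABELS`, `label_rating`, and `side_of_scale` are hypothetical and not part of the study materials.

```python
# Hypothetical encoding of the -3..+3 gender rating scale described in the text:
# negative values = masculine, 0 = neutral, positive values = feminine.
SCALE_LABELS = {
    -3: "extremely masculine",
    -2: "moderately masculine",
    -1: "less masculine",
    0: "neutral",
    1: "less feminine",
    2: "moderately feminine",
    3: "extremely feminine",
}


def label_rating(rating: int) -> str:
    """Return the verbal anchor for a single integer rating on the scale."""
    if rating not in SCALE_LABELS:
        raise ValueError(f"rating must be an integer in [-3, 3], got {rating!r}")
    return SCALE_LABELS[rating]


def side_of_scale(mean_rating: float) -> str:
    """Classify a mean rating with the intervals [-3, 0) masculine,
    0 neutral, and (0, +3] feminine."""
    if not -3 <= mean_rating <= 3:
        raise ValueError("mean rating must lie in [-3, 3]")
    if mean_rating < 0:
        return "masculine"
    if mean_rating > 0:
        return "feminine"
    return "neutral"
```

Treating the mean as continuous in `side_of_scale` mirrors the justification cited from Tabachnick and Fidell; individual responses remain integers.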

Fig. 1

The static pictures of robots that were used in Study 1 to identify the most neutral-looking robot stimulus.

Figure 2 shows the mean responses given to each robot character. The robots designed as feminine on purpose (Fem-1 and Fem-2) were rated as feminine, and the robots designed as masculine on purpose (Ma-1 and Ma-2) were rated as masculine. Among the robots that were designed as neutral on purpose, Neu-2 and Neu-3 were rated as masculine. The robot that was rated as closest to neutral was Neu-1 (See Fig. 3). So, we selected this robot for Study 3.
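The selection rule described above can be sketched as follows, assuming that "closest to neutral" means the smallest absolute mean rating and that the error bars are computed as SD/√n. The function names and the toy ratings are hypothetical and are not the Study 1 data.

```python
from math import sqrt
from statistics import mean, stdev


def summarize(ratings):
    """Return (mean, SEM) for one robot's ratings, with SEM = SD / sqrt(n)."""
    return mean(ratings), stdev(ratings) / sqrt(len(ratings))


def most_neutral(ratings_by_robot):
    """Return the robot whose mean rating lies closest to the neutral point 0."""
    return min(ratings_by_robot, key=lambda robot: abs(mean(ratings_by_robot[robot])))


# Toy ratings on the -3..+3 scale (hypothetical, not the Study 1 data).
toy_ratings = {
    "Neu-1": [0, 1, -1, 0, 0],
    "Neu-2": [-1, -2, -1, 0, -1],
    "Fem-1": [2, 3, 2, 3, 2],
}
print(most_neutral(toy_ratings))  # Neu-1
for robot, ratings in toy_ratings.items():
    m, sem = summarize(ratings)
    print(f"{robot}: mean = {m:.2f}, SEM = {sem:.2f}")
```

A stricter criterion might also require the interval around the mean to include 0; the text does not state whether such a check was applied.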

Fig. 2

The graph shows the mean responses given to each robot character. The error bars show the standard error of the mean

Fig. 3

The neutral-looking robot stimulus was used in the main study (Study 3)

4.2 Study 2: Selection of Feminine and Masculine Actions

The action categories in the videos were determined in a separate study conducted with 54 participants (35 women; age range: 19–60; mean age: 27.5). All participants lived in Türkiye; no information about their occupation or education level was collected. Participants were given 10 min to write down the human actions they think women and men usually do in daily life. We refer to these as actions rather than gendered social tasks because we wanted to avoid portraying genders in terms of 'tasks' or 'roles' that sound like 'requirements' of being feminine or masculine. We defined the actions associated with women as "feminine" and those associated with men as "masculine". The top five actions associated with women were babysitting, cooking, cleaning, shopping, and doing makeup, whereas the top five actions associated with men were playing video games, doing sports, driving, watching TV, and mending. Each of these actions was attributed to women or men, respectively, by more than 50% of the participants. Notably, the feminine actions were mostly related to care work, whereas the masculine actions were mostly related to leisure activities. These results are consistent with previous reports on gender roles in general and gender-stereotypical activities in particular in Türkiye [17].
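The ranking step can be illustrated with a small counting routine. This sketch rests on two assumptions not stated in the text: that free-form answers were first normalized into shared labels by hand (e.g., "childcare" merged into "babysitting"), and that each participant counts at most once per action. `top_actions` and the toy listings are hypothetical.

```python
from collections import Counter


def top_actions(listings, k=5, threshold=0.5):
    """Rank actions by the share of participants who listed them.

    listings: one collection of (already normalized) action labels per participant.
    Returns up to k (action, proportion) pairs whose proportion exceeds threshold,
    ordered by proportion and then alphabetically to break ties.
    """
    n = len(listings)
    # set() ensures an action listed twice by one participant is counted once.
    counts = Counter(action for actions in listings for action in set(actions))
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [(action, c / n) for action, c in ranked if c / n > threshold][:k]


# Hypothetical responses from four participants about actions women usually do.
toy_listings = [
    {"cooking", "cleaning"},
    {"cooking", "babysitting"},
    {"cooking", "cleaning", "shopping"},
    {"babysitting"},
]
print(top_actions(toy_listings))
```

The strict `>` comparison follows the "more than 50%" wording in the text.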

4.3 Study 3: Main Study

4.3.1 Participants

A total of 103 people participated in the study (68 women, 34 men, and 1 queer participant; age range: 18–61; mean age = 25.9, SD = 8.95). We had to exclude the queer participant because a single participant in this category (n = 1) was insufficient for analysis, leaving 102 participants. All participants were native speakers of Turkish. The study was approved by the Human Research Ethics Committee of Bilkent University. All participants signed a consent form before the experiment.

4.3.2 Stimuli, Design, and Procedure

The visual stimuli consisted of ten 5-s videos created with 3D modeling and animation software (AutoCAD, SketchUp, 3ds Max, Lumion, and Cinema 4D). In each video, the gender-neutral robot identified in Study 1 performed an action (see Fig. 4). Half of the videos depicted "feminine" actions (babysitting, cooking, cleaning, shopping, makeup), and the other half depicted "masculine" actions (driving, mending, doing sport, watching TV, playing PC games), as determined in Study 2.

Fig. 4

Single frames from each of the action videos used as stimuli. The left column shows feminine actions, and the right column shows masculine ones.

There are two main independent variables in the study. The first is the action category, which has two levels: masculine and feminine. The other independent variable is the gender of the participants: male and female.

There were two dependent variables. Both measured gender attribution: one asked participants to evaluate the robot in each video on the feminine-masculine dimension based on their personal view (self-view), and the other asked them to evaluate the robot based on the view of the society in which they live (society-view). Both were measured on a [−3, +3] scale, where −3 means extremely masculine, +3 means extremely feminine, and 0 means neutral. Participants were also asked about their confidence in their answers on a 5-point Likert-type scale with the anchors "not at all" (0), "not sure" (1), "undecided" (2), "sure" (3), and "completely sure" (4). We used a 5-point scale because it is sensitive enough to capture participants' confidence in their evaluations (a 3-point scale would not be sensitive enough, and a 7-point scale would be more detailed than our interests required).
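One way to represent a single trial's answers, with the scale ranges above enforced at entry, is sketched below. The field names and the `TrialResponse` class are hypothetical; the study collected responses in Qualtrics, whose export format is not described here. The sketch assumes a separate confidence rating for each view, as the confidence analyses treat view as a factor.

```python
from dataclasses import dataclass

# Confidence anchors in scale order, so the index equals the coded value 0..4.
CONFIDENCE_ANCHORS = ("not at all", "not sure", "undecided", "sure", "completely sure")


@dataclass(frozen=True)
class TrialResponse:
    """One participant's answers for one video (hypothetical schema)."""
    participant_id: str
    action: str
    category: str            # "feminine" or "masculine"
    self_view: int           # -3 (extremely masculine) .. +3 (extremely feminine)
    society_view: int        # same scale, answered from society's perspective
    self_confidence: int     # 0 .. 4, see CONFIDENCE_ANCHORS
    society_confidence: int  # 0 .. 4, see CONFIDENCE_ANCHORS

    def __post_init__(self):
        # Reject values outside the response formats described in the text.
        if self.category not in ("feminine", "masculine"):
            raise ValueError(f"unknown action category: {self.category!r}")
        for name in ("self_view", "society_view"):
            if getattr(self, name) not in range(-3, 4):
                raise ValueError(f"{name} must be an integer in [-3, 3]")
        for name in ("self_confidence", "society_confidence"):
            if getattr(self, name) not in range(len(CONFIDENCE_ANCHORS)):
                raise ValueError(f"{name} must be an integer in [0, 4]")
```

Validating at construction keeps out-of-range codes from silently entering later averages.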

In addition, we aimed to measure individual differences in gender attribution to social robots. To this end, we used the Socialization of Gender Norms Scale (SGNS), developed by Epstein [11] and adapted to the participants' native language by Arici [2]. Epstein [11] examined the relationship between gender role development, gender role conflict, and well-being, focusing on how gender roles develop within the individual. He developed the SGNS, which consists of messages about gender roles that people receive from parents and friends during socialization. It aims to measure the perception of gender roles via the frequency of gender-role-related messages people receive from society (e.g., Q3: "Men should be the initiators in romantic relations and should be the ones to ask women out"; Q12: "People who have premarital sexual relations risk bringing shame to the family name").

The study was conducted online via Qualtrics and consisted of two parts. In the first part, participants watched the 10 videos in randomized order and, for each video, rated how masculine or feminine the agent looked from their self-view and from the society-view, and how confident they were in each response. In the second part, they completed the SGNS. The study took around 15 min.
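Qualtrics handled the randomization in the study; the stand-alone sketch below only illustrates the idea of drawing an independent random order of the 10 videos for each participant. Seeding the generator per participant is an assumption, not something the text reports, and is used here only to make an order reproducible for logging. `VIDEOS` and `presentation_order` are hypothetical names.

```python
import random

# The ten action videos from Study 2 (labels as used in the text).
VIDEOS = [
    "babysitting", "cooking", "cleaning", "shopping", "makeup",
    "driving", "mending", "doing sport", "watching TV", "playing PC games",
]


def presentation_order(participant_seed: int) -> list[str]:
    """Return an independently shuffled copy of VIDEOS for one participant."""
    rng = random.Random(participant_seed)  # private generator; global state untouched
    order = VIDEOS.copy()                  # shuffle a copy so VIDEOS keeps its order
    rng.shuffle(order)
    return order


print(presentation_order(1))
```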

4.3.3 Data Analysis

We ran several statistical tests in JASP 0.16.1.0 to measure whether people attribute gender to a gender-neutral robot based on the category of the action it performs. First, we ran 2 (Gender: male, female) × 2 (Action category: masculine, feminine) repeated measures ANOVAs on gender attribution scores, separately for self-view and society-view. Second, as a post hoc analysis, we conducted 2 (Gender) × 10 (Action exemplar) repeated measures ANOVAs on the same scores to investigate whether specific action exemplars differed from the other exemplars of the same category. Third, we conducted 2 (View type) × 2 (Action category) and 2 (View type) × 10 (Action exemplar) repeated measures ANOVAs to test whether participants' own views differed significantly from what they think society's view is. In addition, although it was not of primary interest, for completeness we conducted 2 (View type) × 10 (Action exemplar) and 2 (View type) × 2 (Gender) repeated measures ANOVAs on confidence scores. Finally, to investigate individual differences in gender attribution, we ran linear regression analyses to test whether the perception of gender roles (SGNS scores) predicts self-view and society-view gender attribution scores.
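The 2 × 2 analyses operate on category-level scores, whereas participants rated individual videos. The text does not state how exemplar ratings were aggregated; the sketch below assumes each category score is the mean of a participant's ratings for the five exemplars in that category, computed separately for each view. `CATEGORY` and `category_means` are hypothetical names, and the ANOVAs themselves were run in JASP.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical mapping from action exemplar to its Study 2 category.
CATEGORY = {
    "babysitting": "feminine", "cooking": "feminine", "cleaning": "feminine",
    "shopping": "feminine", "makeup": "feminine",
    "driving": "masculine", "mending": "masculine", "doing sport": "masculine",
    "watching TV": "masculine", "playing PC games": "masculine",
}


def category_means(rows):
    """Collapse exemplar ratings into one mean per participant, view, and category.

    rows: iterable of (participant, view, action, rating) tuples, where view is
    "self" or "society". Returns {(participant, view, category): mean rating}.
    """
    cells = defaultdict(list)
    for participant, view, action, rating in rows:
        cells[(participant, view, CATEGORY[action])].append(rating)
    return {key: mean(values) for key, values in cells.items()}
```

Each participant then contributes one score per View × Category cell, which is the long-format layout a repeated measures ANOVA expects.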

5 Results

5.1 View Analysis

5.1.1 Self-View Analysis

A 2 (Gender) × 2 (Action category) ANOVA on self-view scores showed a main effect of action category (F (1, 100) = 49.114, p < 0.001, η² = 0.188). The robot performing feminine actions (M = −0.100, SD = 1.121) was rated significantly higher on the masculine-feminine dimension (i.e., more feminine) than the robot performing masculine actions (M = −1.033, SD = 0.768) (MD = 0.975, SE = 0.161, t = 6.040, p < 0.001; see Fig. 5). There was no main effect of gender (F (1, 100) = 0.185, p = 0.668) and no interaction between action category and gender (F (1, 100) = 0.807, p = 0.371).

Fig. 5

The graph shows the self-view mean responses given to each action category (referring to type of actions in the table) by female and male participants. The error bars show the standard error of the mean

A 2 (Gender) × 10 (Action exemplar) ANOVA on the self-view scores showed a main effect of action exemplar (F (1, 100) = 18.120, p < 0.001, η² = 0.129; see Fig. 6). Post hoc analysis showed that, within the feminine action category, babysitting was rated significantly less feminine than cleaning (MD = −1.000, SE = 0.239, t = −4.189, p < 0.001), and cooking was rated significantly less feminine than makeup (MD = −1.029, SE = 0.239, t = −4.312, p < 0.001) and cleaning (MD = −1.471, SE = 0.239, t = −6.160, p < 0.001). There was no significant difference between the other action exemplars within the feminine action category (p > 0.05). Among the masculine actions, doing sports was rated significantly more masculine than watching TV (MD = −0.934, SE = 0.239, t = −3.911, p < 0.003) and driving (MD = −1.037, SE = 0.239, t = −4.343, p < 0.001). Playing PC games was rated significantly more masculine than watching TV (MD = −0.985, SE = 0.239, t = −4.127, p < 0.001) and driving (MD = −1.088, SE = 0.239, t = −4.558, p < 0.001). There was no significant difference between the other action exemplars within the masculine action category (p > 0.05).

Fig. 6

The graph shows the self-view mean responses given to each action exemplar (referring to actions in the table) by female and male participants

There was no main effect of gender (F (1, 100) = 0.185, p = 0.668). There was no interaction between action exemplars and gender (F (1, 100) = 1.098, p = 0.362).

5.1.2 Society-View Analysis

A 2 (Gender) × 2 (Action category) ANOVA on society-view scores showed a main effect of action category (F (1, 100) = 303.772, p < 0.001, η² = 0.646). The robot performing feminine actions (M = 1.351, SD = 1.344) was rated higher on the masculine-feminine dimension (i.e., more feminine) than the robot performing masculine actions (M = −1.814, SD = 0.797) (t = 18.636, p < 0.001; see Fig. 7). There was no main effect of gender (F (1, 100) = 1.744, p = 0.190) and no interaction between action category and gender (F (1, 100) = 0.029, p = 0.865).

Fig. 7

The graph shows the society-view mean responses given to each action category (referring to the type of actions) by female and male participants. The error bars show the standard error of the mean.

The 2 (Gender) × 10 (Action exemplar) ANOVA on the society-view scores showed a main effect of action exemplar (F (1, 100) = 107.190, p < 0.001, η² = 0.467; see Fig. 8). Bonferroni-corrected post hoc analysis showed that all feminine actions were rated significantly higher on the masculine-feminine dimension (i.e., more feminine) than the masculine actions (p < 0.01). Babysitting was rated significantly differently from all masculine actions (p < 0.01) and from one feminine action, shopping (MD = 0.78, SE = 0.236, t = 3.299, p = 0.014). Cooking was also rated significantly differently from all masculine actions (p < 0.01), and it was rated less feminine than one feminine action, cleaning (MD = −1.029, SE = 0.236, t = −4.357, p = 0.001).

Fig. 8

The graph shows the society-view mean responses given to each action exemplar (actions) by female and male participants. Error bars show the standard error of the mean

Among the masculine actions, doing sports was rated significantly more masculine than watching TV (MD = −1.098, SE = 0.224, t = −4.908, p < 0.001) and driving (MD = −0.961, SE = 0.224, t = −4.294, p < 0.004). Doing sport was rated significantly more masculine than watching TV (MD = −1.074, SE = 0.236, t = −4.544, p < 0.001) and driving (MD = −0.926, SE = 0.236, t = −3.921, p = 0.001). Playing PC games was rated significantly more masculine than watching TV (MD = −1.191, SE = 0.236, t = −5.042, p < 0.001) and driving (MD = −1.044, SE = 0.236, t = −4.419, p < 0.001). There was no significant difference between the other action exemplars within the masculine action category (p > 0.05). There was no main effect of gender (F (1, 100) = 1.744, p = 0.190) and no interaction between action exemplar and gender (F (1, 100) = 0.559, p = 0.831).
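The Bonferroni correction used in these post hoc tests multiplies each p-value by the number of comparisons, capping the result at 1; with 10 exemplars there are C(10, 2) = 45 pairwise comparisons. The sketch below shows that arithmetic together with a plain paired t statistic. It is a simplified stand-in: the constant SE values reported above (e.g., 0.236) suggest JASP used a pooled error term, so per-pair paired t-tests like this one would not reproduce the reported values exactly. All names are hypothetical, and p-values must come from a t-distribution routine elsewhere, since the Python standard library has none.

```python
from itertools import combinations
from math import sqrt
from statistics import mean, stdev

EXEMPLARS = [
    "babysitting", "cooking", "cleaning", "shopping", "makeup",
    "driving", "mending", "doing sport", "watching TV", "playing PC games",
]
# Number of pairwise post hoc comparisons among the 10 exemplars: C(10, 2).
N_PAIRS = len(list(combinations(EXEMPLARS, 2)))


def paired_t(x, y):
    """Return (mean difference, SE, t) for paired ratings x and y
    given by the same participants in the same order."""
    diffs = [a - b for a, b in zip(x, y)]
    md = mean(diffs)
    se = stdev(diffs) / sqrt(len(diffs))
    return md, se, md / se


def bonferroni(p_values, n_tests=None):
    """Bonferroni-adjust p-values: multiply by the number of tests, cap at 1.

    n_tests defaults to len(p_values); pass N_PAIRS when adjusting only a
    subset of the full family of comparisons.
    """
    m = len(p_values) if n_tests is None else n_tests
    return [min(1.0, p * m) for p in p_values]
```

The `n_tests` parameter matters when only the within-category comparisons are reported: the correction should still reflect the full family of 45 tests.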

5.1.3 Comparison of Self-View and Society-View

A 2 (View) × 2 (Action category) ANOVA showed a main effect of action category (F (1, 100) = 380.224, p < 0.001, η² = 0.411): the robot performing feminine actions was rated higher on the masculine-feminine dimension (i.e., more feminine) than the robot performing masculine actions (see Fig. 9). There was also a main effect of view (F (1, 100) = 13.628, p < 0.001, η² = 0.014): society-view responses were higher than self-view responses. Finally, there was an interaction between view and action category (F (1, 100) = 79.997, p < 0.001): society-view responses were higher than self-view responses for feminine actions, whereas the opposite was true for masculine actions (see Fig. 9).
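In a 2 × 2 fully within-subjects design, the View × Action category interaction reduces to one contrast per participant: the society-minus-self difference for feminine actions minus the same difference for masculine actions. Testing that contrast against zero with a one-sample t-test gives F(1, n − 1) = t². The sketch below assumes each participant's category means are already computed. The reported analysis has df = (1, 100) for 102 participants, which is consistent with gender also being in the model, so this simplified version would have one more error df. The function names are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def interaction_contrast(cells):
    """Per-participant View x Category interaction scores.

    cells: list of dicts with keys "self_fem", "soc_fem", "self_masc", "soc_masc"
    holding one participant's category means. A positive score means the
    society view is more polarized than the self view (more feminine for
    feminine actions and/or more masculine for masculine actions).
    """
    return [
        (c["soc_fem"] - c["self_fem"]) - (c["soc_masc"] - c["self_masc"])
        for c in cells
    ]


def one_sample_t(scores):
    """t statistic for H0: mean(scores) == 0. For a 1-df within-subject
    contrast, the repeated measures F(1, n - 1) equals this t squared."""
    return mean(scores) / (stdev(scores) / sqrt(len(scores)))


# Hypothetical category means for three participants.
toy_cells = [
    {"self_fem": 1, "soc_fem": 2, "self_masc": -1, "soc_masc": -2},
    {"self_fem": 0.5, "soc_fem": 1.5, "self_masc": -1, "soc_masc": -1.5},
    {"self_fem": 1, "soc_fem": 1, "self_masc": -1, "soc_masc": -1},
]
scores = interaction_contrast(toy_cells)
print(scores, one_sample_t(scores))
```

The sign convention matches the pattern in Fig. 9: higher society-view ratings for feminine actions and lower ones for masculine actions both push the contrast upward.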

Fig. 9

The graph shows the interaction between view type and action category, with means, mean differences, standard deviations, and t and p values

A 2 (View) × 10 (Action exemplar) ANOVA showed a main effect of view, with society-view responses higher than self-view responses (F (1, 100) = 13.628, p < 0.001, η² = 0.008; see Fig. 10). There was a main effect of action exemplar (F (1, 100) = 101.320, p < 0.001, η² = 0.268); Table 1 shows the post hoc pairwise comparisons between action exemplars. There was also an interaction between view and action exemplar (F (1, 100) = 24.447, p < 0.001, η² = 0.067). For exemplars of both action categories, society-view reports were more pronounced (more feminine or more masculine) than self-view reports (p < 0.001). Table 2 shows the planned contrasts between self-view and society-view responses for each action exemplar.

Fig. 10

The comparison of the society-view and self-view reports shows the more pronounced responses (more femininity or masculinity) in the society-view reports compared to self-view reports

Table 1 Post hoc comparisons for the main effect of action exemplars
Table 2 Pair-wise comparisons for all actions by view categories

5.2 Confidence Level Analysis

A 2 (View) × 10 (Action exemplar) ANOVA on the confidence scores showed a main effect of action exemplar (F (1, 100) = 6.859, p < 0.001, η² = 0.036) but no main effect of view (F (1, 100) = 0.018, p = 0.894). There was a significant interaction between view and action exemplar (F (1, 100) = 6.736, p < 0.001, η² = 0.017).

Fig. 11

The regression analysis shows that SGNS predicts scores for society-view feminine actions: when people's SGNS score is high, they attribute less femininity. SGNS did not predict society-view masculine action scores

A 2 (View) × 2 (Gender) ANOVA on confidence scores showed no effect of gender (F (1, 100) = 1.556, p = 0.215) and no effect of view type (F (1, 100) = 0.650, p = 0.422). However, there was a significant interaction between view and gender (F (1, 100) = 7.896, p < 0.006, η² = 0.009). In Bonferroni-corrected post hoc comparisons, there was no significant difference between self-view and society-view confidence scores for the majority of actions (p > 0.05), with the exceptions listed below.

5.2.1 Comparison of Self-View Action Responses

For self-view responses, confidence levels did not differ significantly between any pair of action exemplars (p > 0.05).

5.2.2 Comparison of Society-View Action Responses

Confidence in the society-view rating of the makeup action (M = 3.265, SD = 0.730) was higher than for the society-view shopping action (M = 2.794, SD = 1.047) (t = 4.950, p < 0.001), the society-view watching TV action (M = 2.765, SD = 0.935) (t = 4.384, p = 0.002), and the society-view driving action (M = 2.676, SD = 1.055) (t = 5.516, p < 0.001). Confidence for the society-view shopping action (M = 2.794, SD = 1.047) was lower than for the society-view doing sport action (M = 3.353, SD = 0.779) (t = −5.516, p < 0.001), the society-view playing game action (M = 3.265, SD = 0.807) (t = −4.809, p < 0.001), and the society-view mending action (M = 3.256, SD = 0.727) (t = −0.001, p < 0.001).

Confidence ratings for the society-view doing sport action (M = 3.353, SD = 0.779) were higher than those for the society-view watching TV action (M = 2.765, SD = 0.935) (t = −4.950, p < 0.001) and the society-view driving action (M = 2.676, SD = 1.055) (t = 6.081, p < 0.001). Ratings for the society-view playing game action (M = 3.265, SD = 0.807) were higher than those for the society-view driving action (M = 2.676, SD = 1.055) (t = 5.374, p < 0.001) and the society-view watching TV action (M = 2.765, SD = 0.935) (t = 4.243, p = 0.004). Ratings for the society-view watching TV action (M = 2.765, SD = 0.935) were lower than those for the society-view mending action (M = 3.256, SD = 0.727) (t = 1.131, p = 0.002), as were ratings for the society-view driving action (M = 2.676, SD = 1.055) (t = −5.516, p < 0.001).
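Because every participant rated every action, comparisons like those above are within-subject. One common way to compute such a comparison is a paired t-test: the mean of the per-participant differences divided by its standard error, with n − 1 degrees of freedom. Whether the reported t values were computed exactly this way (rather than, e.g., from pooled ANOVA error terms) is an assumption, and the ratings and the function name `paired_t` below are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(x, y):
    """Paired t statistic: mean of per-participant differences (x - y) over its standard error."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length samples with at least 2 pairs")
    diffs = [a - b for a, b in zip(x, y)]
    se = stdev(diffs) / sqrt(len(diffs))  # sample SD of differences / sqrt(n)
    return mean(diffs) / se, len(diffs) - 1


# Hypothetical confidence ratings from five participants (NOT study data).
make_up = [4, 3, 4, 3, 4]
shopping = [3, 3, 2, 2, 3]
t, df = paired_t(make_up, shopping)       # positive: make-up rated higher
t_rev, _ = paired_t(shopping, make_up)    # same magnitude, opposite sign
```

The sign of t only reflects which action is entered first; its magnitude and p-value are unchanged by swapping the order.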

5.3 Regression Analysis

A linear regression analysis was conducted to examine the extent to which SGNS could predict participants' society-view and self-view masculine and feminine gender attributions. SGNS significantly predicted society-view feminine action scores (R² = 0.052; F (1, 100) = 5.52, p = 0.021; β = −0.23, p < 0.021), explaining 5.2% of the variance in the outcome variable. However, SGNS did not predict society-view masculine action scores (F (1, 100) = 0.16, p = 0.691), nor self-view masculine (F (1, 100) = 0.03, p = 0.865) or feminine (F (1, 100) = 2.32, p = 0.13) action scores (Fig. 11).
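The reported statistics can be cross-checked with two standard identities, assuming SGNS was the sole predictor in each model and β is the standardized coefficient: the overall F equals R²·df_resid / ((1 − R²)·df_model), and with one predictor R² equals β². The sketch below applies them to the values reported for society-view feminine actions; the function name `f_from_r2` is illustrative, and the value 0.0523 is simply an R² that also rounds to 0.052.

```python
def f_from_r2(r2, df_model, df_resid):
    """Overall F statistic implied by R^2 for an ordinary least-squares regression."""
    return (r2 / df_model) / ((1 - r2) / df_resid)


# Values reported for SGNS predicting society-view feminine action scores.
r2_reported = 0.052
beta_reported = -0.23
df_model, df_resid = 1, 100                                  # as reported: F (1, 100)

f_implied = f_from_r2(r2_reported, df_model, df_resid)       # about 5.49
f_at_r2_rounding = f_from_r2(0.0523, df_model, df_resid)     # about 5.52
r2_from_beta = beta_reported ** 2                            # 0.0529, assuming one standardized predictor
```

Both checks agree with the reported F = 5.52 up to the rounding of R² and β. With a single predictor, the t statistic for β is also ±√F (about −2.35 here), so its p-value coincides with the p-value of the overall F-test.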

6 Discussion

There is a growing body of research on gendering robots in HRI [7, 25, 34]. The aim of this study was to go beyond studies that define gender in terms of physical features [12] or occupations [39, 40] and to test whether gendered actions, defined in social and cultural terms, could affect how gender is attributed to robots. Our results show that when a gender-neutral robot performs gendered actions, it is assigned the same gender category as the action. More specifically, a gender-neutral robot is evaluated as feminine if it performs actions usually associated with women, and as masculine if it performs actions usually associated with men. However, people's own evaluations of the robot's gender differ from how they think the society they live in evaluates it. The degree of gendering was much more pronounced in the society-level evaluations than in the personal ones. In other words, a robot performing a masculine action was rated as more masculine in the society-level evaluations than in the personal ones. The same held for a robot performing feminine actions.

Furthermore, we investigated individual differences in gendering robots using the Socialization of Gender Norms Scale (SGNS), which measures the extent to which one receives messages about gender roles from society. Our results show that the SGNS significantly, though modestly, predicts the society-level gendering of a robot performing a feminine action. However, it did not predict the gendering of a robot performing a masculine action, which suggests that the gender-role messages people receive from society may affect gender attribution to robots differently for feminine and masculine actions.

6.1 Novelty and Contributions to HRI

Our study makes three main contributions to the HRI literature. First, it extends previous work on gendered robots by considering a social and cultural aspect of gender, namely the actions or activities associated with each gender. In that sense, our study complements work that defines the gender of a robot primarily by its visual features [27] and provides evidence that gender should be defined by taking into account its physical, social, and cultural aspects in the context of actions [5]. At first sight, physical features seem to carry "enough" information to infer the gender of a robot. For instance, people may think that broad shoulders make a robot masculine [41] or that long hair makes it more feminine [13]. However, when the critical visual features that help people readily assign a gender category are lacking, they may have to rely on other information, such as the actions and activities the robot performs. Our study shows that actions can also be gendered according to a society's social and cultural norms (e.g., women clean or take care of their babies, whereas men play video games), and that this gendering can carry over to the evaluations of robots even when they look gender neutral.

Furthermore, our results suggest that actions may vary in the degree to which they induce gendered evaluations of robots. While the feminine and masculine action categories identified in our pre-study were largely separated in the gender evaluations of robots in the main study, some action exemplars represented their category more strongly than others (e.g., cleaning was more feminine than shopping, and playing video games was more masculine than driving). This variation may reflect how often each action is performed by women and men in the society where the evaluations are made.

Second, our study demonstrates the importance of distinguishing between one's personal evaluations and what one thinks society's evaluations are when gendering robots. Given that the concept of gender is inherently social, but that gender schemas can be overridden by one's individual experiences [34], one might expect people to deviate from social norms when gendering robots. Indeed, our data show that society-level gendering is much stronger than personal-level gendering of robots. More specifically, a robot performing a feminine or masculine action was evaluated as more strongly feminine or masculine, respectively, in society-level evaluations than in personal-level evaluations. Furthermore, our results indicate that society-level evaluations can be predicted by the extent to which one is exposed to information about gender roles in the society one lives in. However, this holds only for actions in the feminine category, not for those in the masculine category. This suggests that the more information about gender roles a person receives from society, the less femininity they think society attributes to feminine actions, whereas their judgments about masculine actions are not affected. This interesting finding may be investigated further in future studies to reveal individual differences in gender evaluations across cultural contexts.

Third, our study demonstrates the value of animation techniques in HRI. Ideally, to understand our real-world interactions with robots, we would design studies in which humans and robots share the same physical space and interact with each other. However, this is not always possible for practical reasons. The easiest alternative is to use depictions of real robots, such as images, as is traditionally done in psychological and cognitive science research. The problems with that approach are that (1) participants may be less satisfied by interacting with a robot in virtual form than with a physically embodied robot [42], and (2) the content of the images is constrained by the capabilities of the real robot depicted, i.e., by current technology. In HRI, however, we aim to anticipate the future and guide the design of robots that can be used effectively in human life. Therefore, it is important to be able to create scenarios that are not physically feasible now but can be imagined and studied. To this end, animation techniques offer great opportunities to create interactive content and to manipulate context and character, which allows us to envision robots as part of specific social and cultural contexts while they interact with humans, and to measure human responses to such scenarios [3, 31]. Although the physical embodiment emphasized by Wainer et al. [42] may not be fully attained with animations, embedding a robot in a real-life context, as we did in the present study, may provide a vivid interaction experience [28], support prototyping in robot design, and reduce costs.

Our study has important implications for robot design. It appears that a robot's appearance is not the only factor that determines how it will be evaluated by the people who interact with it. Its behavior and the kinds of activities it performs are at least as important as its appearance. The fact that robots could be assigned a gender in line with culturally gendered actions suggests that, as humans, we project our gender schemas onto non-biological agents, which in turn may have implications for how we interact with them. For instance, a person who discriminates against women in human–human interactions may tend to discriminate against, and even dehumanize, robots that show feminine characteristics, and may not interact with them effectively [23, 36]. In contrast, people may find a robot of the opposite gender more engaging, as suggested by Siegel et al. [33], and thus may have successful interactions. Furthermore, our study implies that the same robot can be evaluated differently in two cultures that differ in gender roles, and suggests that robots may need to be customized according to the cultural norms of the society in which they will be used.

6.2 Limitations and Future Work

Our study has several limitations. First, the gender evaluations were based on the specific robot used to animate the actions. That robot was carefully selected in a pre-study that aimed to identify a gender-neutral robot among many alternatives. All the other candidate robots were found to be highly feminine or masculine, so only one robot met our criteria. To test how generalizable our results are, future studies could employ a variety of robots that are all considered gender neutral. Relatedly, a natural follow-up to this study could investigate whether people override their evaluations of a gendered robot (rather than a gender-neutral one) when it performs actions incongruent with its appearance, such as a masculine-looking robot performing feminine actions (or vice versa). For instance, one could test whether a male-looking robot is evaluated as feminine if it applies makeup or takes care of a child. This would show how effective actions are in gendering robots.

A second limitation of our study is that the robot stimuli in the pre-study (Study 1) were static pictures of the robot from three angles, whereas the robot in the main study (Study 3) was moving. This raises the question of whether it is the action in cultural terms or a specific posture of the robot while performing the action that cues the robot's gender. While it is difficult to tease apart these two possibilities in this experimental setting, it is unlikely that a specific posture implied a certain gender category, as we made considerable effort to animate the robot as neutrally as possible and in the same way across the different actions. Future studies could test this possibility by asking participants to justify their evaluations.

Another related limitation is that we used only the visual modality to present the robot animations. The animations lacked sound, which might have provided cues about the robot's gender. For instance, the robot's voice (intonation, stress, etc.), or even how it feels to the touch in an interactive scenario, could be effective in gendering robots. Thus, future studies could examine evaluations of robots in richer multimodal environments and possibly in interactive scenarios.

Another possible limitation of our study is that the results (both self-view and society-view evaluations) could be highly dependent on the participants' backgrounds and the culture they were raised in. Different cultures (e.g., Western vs. Eastern) may assign different gender roles to men and women, which would in turn be projected onto how they perceive and interact with robots. Even the pre-study that identified the gendered action categories in the present study might have revealed different action exemplars had it been conducted in a different culture. Thus, future work could examine cross-cultural differences in gendering robots and test to what extent beliefs about gender roles in society affect how robots are evaluated. It is also important to note that evaluations may depend on whether the robot is for personal or public use. For instance, if the robot is used in a shared public setting, people may tend to prioritize what society thinks and act accordingly.

One also needs to be cautious when interpreting self-reports in HRI. Self-reports allow researchers to measure people's explicit judgments of robots, but they may not always reflect what people would do in a real interaction. In other words, what people report and what they do may differ. For instance, in the context of the present study, a participant might want to give the impression that they do not discriminate against women or assign stereotypical gender roles to them (e.g., cleaning), and so rate a cleaning robot quite low on the feminine spectrum. However, they might behave differently in their daily interactions with the robot. This calls for study designs that resemble real-life situations and, ideally, measure behavior implicitly.

A final limitation of our study is that it was conducted online due to the ongoing COVID-19 pandemic, so the platforms on which participants completed the task (computers, tablets, mobile phones) may have varied. Although our task did not require participants to engage with the stimuli's low-level visual properties when making their gender judgments, differences in display size, viewing angle, or viewing distance may not be negligible and might have affected the results. Future work should ideally test participants under identical presentation conditions.