Introduction

In our daily lives, we constantly perform actions (e.g., fastening a belt buckle) and observe others performing actions. It is also quite common for us to imagine ourselves executing actions, such as when planning a task we intend to perform later. In addition, there are situations in which we both observe and imagine actions (e.g., watching someone drinking a glass of water and imagining ourselves doing the same). Consequently, the study of action observation in combination with imagery has attracted considerable interest in several research domains, such as memory (e.g., Ghetti et al., 2008; McDaniel et al., 2008), sports psychology (e.g., Bruton et al., 2016; Wright et al., 2018), motor rehabilitation (e.g., Emerson et al., 2018; Scott et al., 2020), neuroscience (e.g., Eaves et al., 2016; Nedelko et al., 2012), and robotics (e.g., Hofree et al., 2015; Press, 2011).

To explore the processes underlying action observation and imagery and their effects on different (e.g., cognitive) processes, researchers depend on action-related stimuli, such as visual representations of actions (e.g., pictures and videos) and action statements (e.g., “to sharpen the pencil”). For example, studies probing mental object representations rely on object pictures (e.g., Yee et al., 2013), and studies on motor training recur to action videos (e.g., Nedelko et al., 2012; Scott et al., 2020). In the field of memory, action videos have been employed to explore associative memory deficits (e.g., Old & Naveh-Benjamin, 2008), as well as memory for actions (McDaniel et al., 2008). Specifically, regarding the latter, studies have used action statements (e.g., Goff & Roediger, 1998; Li et al., 2020; Manzi & Nigro, 2008; Peters et al., 2007; Thomas et al., 2003), object photos (e.g., Gonsalves et al., 2004; Lindner & Echterhoff, 2015) and, in some cases, action videos corresponding to specific action statements (e.g., Kashihara et al., 2017; Lindner et al., 2010). Therefore, there is a clear need for well-controlled action-related stimuli to further understand the mechanisms subserving action observation and imagery. Nevertheless, ensuring that the chosen experimental stimuli comply with all the requirements for performing rigorous experimental research (e.g., all actions are easy to imagine) is a demanding and particularly time-consuming task.

There are several databases of static images ready to be used, namely line drawings of objects (e.g., Cuetos & Alija, 2003; Snodgrass & Vanderwart, 1980), line drawings of actions (e.g., Akinina et al., 2015; Masterson & Druks, 1998; Schwitter et al., 2004; Shao et al., 2014), object photographs (e.g., Brodeur et al., 2010; Souza et al., 2021), and action photographs (e.g., Bonin et al., 2004; Fiez & Tranel, 1997; Shir et al., 2021). However, with a few exceptions (e.g., Umla-Runge et al., 2012), validated dynamic visual stimuli (i.e., action videos) are scarce. Moreover, normative studies have typically gathered data regarding specific parameters (e.g., name agreement, word frequency) that are pertinent for object-related tasks (e.g., naming time, object perception) but are not necessarily the most relevant for action-related tasks, specifically when considering action observation and imagery (e.g., Akinina et al., 2015; Cuetos & Alija, 2003).

One important feature of controlled action stimuli is that they depict only the action itself while being stripped of other distinctive features, such as a distracting background and the actor’s face. Action pictures, such as those created by Fiez and Tranel (1997) or by Shir et al. (2021) in the ObjAct stimulus set, include both the action and the actor’s face, and, in the case of ObjAct, the background as well. However, face processing involves specific cognitive mechanisms (e.g., Farah et al., 1998; Tsao & Livingstone, 2008) that can interfere with action processing manipulations. Specifically, Ferstl et al. (2017) demonstrated that actor identity (facial features, clothing, and body posture) interferes with action recognition. Umla-Runge et al. (2012) provided a set of action videos without the actor’s face and with a neutral background. However, those stimuli were validated solely on familiarity for different cultures, lacking information on other parameters relevant to the study of action observation and imagery (e.g., action imageability).

The current study aimed to validate a set of object-related action statements and corresponding dynamic (i.e., action videos) and static (i.e., object photos) stimuli. Our goal was to create stimuli that mimic simple everyday actions that are easy to imagine, familiar, and have a conventional way of being performed (e.g., sharpening a pencil). In Study 1, we asked participants to evaluate action characteristics, namely imageability, image agreement, action familiarity, action frequency, and action valence. One should note that action frequency and familiarity were assessed separately. Action familiarity refers to the extent to which participants interact with or think about the action in their daily lives, whereas action frequency addresses how often participants perform the action. In this sense, action frequency is not equal to familiarity. An action may be familiar because participants often watch another person performing it. For example, a given participant may not open a safety pin regularly but may consider the action familiar because they observed their grandmother doing it. Whereas action frequency can arise from self-performance only, action familiarity can develop from multiple sources (e.g., self-performance, imagination, observation). Thus, even though these parameters overlap, separate ratings allow for a finer assessment of participants’ experience with the action. Additionally, these actions should reflect situations that, without a specific context, would be characterized as neutral, that is, in which emotional content is irrelevant, considering that the emotional quality of a stimulus may change its processing (e.g., Pell et al., 2015). In Study 2, with a different sample, we asked participants to evaluate object features, i.e., object familiarity, object valence, and action prototypicality regarding the objects. By collecting data on the actions and on the objects separately, we can disentangle whether action ratings, specifically familiarity and valence, depend on object ratings.

Study 1

Method

Participants

Two hundred and three volunteers participated in this study. However, 23 abandoned the study before completing 30% of the task, and 19 failed the attention-check question. An attention-check question was presented at the beginning of the study to confirm that participants were able to watch the videos. When participants failed this question, the study finished automatically. This task is detailed in the Procedure section. Thus, our final sample comprised 161 volunteers who evaluated 78–100% of the task items and passed the attention-check question (139 female; M_age = 23, SD = 8.64, age range 18–77 years; 148 right-handed, 9 left-handed, and 4 ambidextrous). One hundred and thirty-two were college students and received course credit for their participation in the study. All participants were native speakers of European Portuguese. The 60 actions were randomly divided into three subsets of 20, and each participant was randomly assigned to one subset and evaluated only that subset. Each action was rated by a minimum of 50 and a maximum of 56 participants. The sample size was selected following previous normative studies for action pictures (e.g., Bonin et al., 2004; Schwitter et al., 2004).

Materials

The action statements were originally taken from Goff and Roediger (1998) and Thomas and Loftus (2002). Lindner et al. (2010) had already added some new action statements to a list based on Goff and Roediger (1998) and Thomas and Loftus (2002), and some other action statements were created analogously for the studies reported here. Due to time constraints, such as the time needed for participants to watch action videos and/or imagine actions from object pictures, or the number of repeated presentations, most experiments in this field used 60 or fewer actions (e.g., Kashihara et al., 2017; Lampinen et al., 2003; Lindner & Henkel, 2015; Thomas et al., 2003). Thus, we selected 60 everyday objects (e.g., sock, napkin) or pairs of objects. The criteria defined to select and create the actions were the following: all actions were object-related, and the objects were small (the smallest object was a €0.20 coin, and the largest was a 10×10×15 cm napkin holder) and could be manipulated with one or two hands. To ensure that the items were visually neutral, i.e., not salient, we opted for objects with solid colors, avoiding distinctive prints, symbols, or letters. Whenever possible, we removed brand labels and other marks. As in the studies mentioned above, only one action was assigned to each object, chosen to be plausible and straightforward for that object (e.g., object: “glove” = action: “put the glove on”).

As mentioned above, actions were selected considering studies on false memories for actions. In these studies, participants usually imagine performing actions from a first-person perspective (e.g., Goff & Roediger, 1998). In the case of observation, videos from a third-person perspective lead to more false memories than videos from the first-person perspective (Lindner et al., 2010). As such, only the objects were photographed so they could be used in self-imagination tasks. Specifically, they were placed on a white tabletop, and a color photograph was taken. A tripod was used to ensure that the same angle and distance were kept for all images (Fig. 1A). As in Lindner et al.’s (2010) experiments, the videos were filmed from a third-person perspective in a neutral set. The actor was stripped of distinctive features (e.g., nail polish, watch, rings), wore a black jersey, and performed the actions repeatedly for approximately 10 seconds (Fig. 1B). The videos include the sound produced by the objects while being manipulated, so that they depict a realistic execution of the actions. All photos and videos can be found online (https://www.osf.io/ywsvd/?view_only=0c4bedeb591e460b97b554f828d17d67).

Fig. 1 Example of visual representations of actions: object photo (A) and screenshot of the video (B) for the action “To open the locker”

Procedure

Data were collected online via Qualtrics survey software. The survey link was shared on social networks and made available to college students from one Portuguese university on a platform on which they can participate in studies in exchange for course credit. The study started with the presentation of the informed consent and instructions. Participants were informed that they could terminate their participation at any moment without any consequences. To ensure that participants were able to watch the videos on their devices, an attention-check question was presented. This consisted of the presentation of a yellow circle and the sound of a phone ringing. Participants had to correctly indicate the object shape and color and the sound presented; otherwise, the study would not continue.

The 60 actions were randomly divided into three sets of 20. Each participant completed only one set that was presented in random order. Trials began by showing an action statement (e.g., “flip the coin”). After reading it, participants pressed a button, and the object photo was presented alongside the instruction to imagine themselves performing the action. They were asked to imagine action execution repeatedly while the photo was presented on the screen (10 seconds). Then, participants rated action imageability, that is, how easy it was for them to imagine the action (from 1 = extremely difficult to imagine to 9 = extremely easy to imagine). Afterwards, the action video was presented, and after having watched the video, participants were asked to rate (i) image agreement, the degree to which the action video is similar to their mental image of the action (from 1 = not similar at all to 9 = very similar); (ii) action familiarity, the degree to which the action is familiar to the participants (from 1 = not familiar at all to 9 = very familiar); (iii) action frequency, how often the participants perform the action on a daily basis (from 1 = never to 9 = very often); and (iv) action valence, the degree to which the action is pleasant to the participants (from 1 = extremely unpleasant to 9 = extremely pleasant).
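The division of items into sets and the per-participant random presentation order can be sketched as follows. This is only an illustration: the study itself was run in Qualtrics, and the function names, the placeholder action labels, and the fixed seed are assumptions introduced here, not part of the original procedure.

```python
import random


def split_into_sets(items, n_sets, rng):
    """Shuffle the items and deal them into n_sets equally sized sets."""
    if len(items) % n_sets != 0:
        raise ValueError("items must divide evenly into n_sets")
    shuffled = list(items)
    rng.shuffle(shuffled)
    size = len(shuffled) // n_sets
    return [shuffled[i * size:(i + 1) * size] for i in range(n_sets)]


def trial_order_for(item_set, rng):
    """Return the items of one set in a fresh random order for one participant."""
    order = list(item_set)
    rng.shuffle(order)
    return order


# Fixed seed only so that the sketch is reproducible (assumption).
rng = random.Random(2024)

# Placeholder labels standing in for the 60 action statements (hypothetical).
actions = [f"action_{i:02d}" for i in range(1, 61)]

sets = split_into_sets(actions, 3, rng)   # three sets of 20 actions
assigned = rng.choice(sets)               # random assignment of one participant
order = trial_order_for(assigned, rng)    # random trial order within the set
```

The same helper would cover Study 2 by calling `split_into_sets(objects, 2, rng)` on a list of 60 objects.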

Results and discussion

A detailed description of each normative parameter per action is provided in the Appendix (https://www.osf.io/ywsvd/?view_only=0c4bedeb591e460b97b554f828d17d67). We note that, although an English translation is presented, only the Portuguese version of the action statements was tested. Table 1 depicts the descriptive statistics for the 60 actions. Imageability (M = 8.08) and image agreement (M = 7.58) ratings were high, and the distributions were negatively skewed. These ratings indicate that participants considered most actions easy to imagine and that the action videos matched their mental image for most actions. High image agreement ratings indicate, as intended, that the videos most likely depict a conventional way to perform these actions. The actions were also rated as familiar (M = 7.24). The mean action frequency rating was neither high nor low (M = 4.49). Ratings on this parameter spanned nearly the full scale (range = 1.47–8.09), indicating variability in how frequently the actions are performed. Concerning emotional valence, most actions were evaluated as neutral or positive (M = 5.78, range = 4.78–7.13).
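As an illustration of the descriptive statistics reported for each parameter, the sketch below computes the mean, standard deviation, range, and skewness of a list of item-level mean ratings. The ratings are made-up values, the function names are hypothetical, and the moment-based skewness coefficient is an assumption, since the text does not state which skewness estimator was used.

```python
from statistics import mean, stdev


def skewness(values):
    """Moment-based skewness g1 = m3 / m2**1.5 (population moments)."""
    n = len(values)
    m = sum(values) / n
    m2 = sum((v - m) ** 2 for v in values) / n
    m3 = sum((v - m) ** 3 for v in values) / n
    if m2 == 0:
        raise ValueError("skewness is undefined for constant data")
    return m3 / m2 ** 1.5


def describe(values):
    """Summarise one normative parameter across items."""
    return {
        "mean": mean(values),
        "sd": stdev(values),  # sample standard deviation (n - 1)
        "min": min(values),
        "max": max(values),
        "skewness": skewness(values),
    }


# Made-up item means on the 1-9 scale: most items high, one clearly lower,
# which produces the negative skew described in the text.
imageability = [8.6, 8.4, 8.5, 8.1, 8.3, 6.2]
summary = describe(imageability)
```

A negative `summary["skewness"]` corresponds to ratings clustered near the top of the scale with a tail toward lower values.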

Table 1 Descriptive statistics for the action’s parameter ratings

Because some parameters did not follow a normal distribution, we tested the relationship between the normative parameters with Spearman correlations (ps are reported two-tailed, Table 2). Imageability and action familiarity ratings correlated positively with all parameters, suggesting that more familiar actions are easier to imagine, are performed more frequently, and are perceived as more pleasant (i.e., higher valence ratings). In turn, actions that are easier to imagine and more familiar are also depicted in the videos in a manner that matches participants’ action imagination, having higher image agreement. Image agreement correlated positively with action valence, indicating that participants rated actions with higher image agreement as more pleasant (Table 2).
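For readers who want to reproduce this kind of analysis on the item-level norms, a Spearman correlation can be computed as a Pearson correlation on average ranks. The sketch below uses only the standard library; the data are made up, the helper names are hypothetical, and no p value is computed, so a statistics package would still be needed for significance testing.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions are 0-based, ranks 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson product-moment correlation of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Made-up item means for five actions (not the published norms).
familiarity = [7.9, 6.1, 8.4, 5.2, 7.0]
imageability = [8.5, 7.2, 8.8, 6.9, 7.1]
rho = spearman(familiarity, imageability)
```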

Table 2 Spearman correlations matrix between actions’ normative parameters

Study 2

Method

Participants

One hundred and sixteen college students participated in this study in exchange for course credit. However, one abandoned the study before completing 30% of the task. Thus, our final sample comprised 115 volunteers, who evaluated 100% of the task items (105 female; M_age = 20.96, SD = 4.50, age range 18–43 years; 109 right-handed, 6 left-handed, and 0 ambidextrous). All participants were native speakers of European Portuguese and had not participated in Study 1. The 60 objects were randomly divided into two sets of 30, and each participant was randomly assigned to one set and evaluated only that set. Each object was rated by a minimum of 57 and a maximum of 58 participants. The sample size was selected following Study 1.

Materials

In this study, we used the same object photos and action statements as in Study 1.

Procedure

Data were collected online via Qualtrics survey software. The survey link was made available to college students from one Portuguese university on a platform on which they can participate in studies in exchange for course credit. The study started with the presentation of the informed consent and instructions. Participants could quit the study at any moment without any consequences. The 60 objects were randomly divided into two sets of 30. Each participant completed only one set that was presented in random order. Trials began by showing the name of an object/pair of objects (e.g., “coin”) and its picture. Below the object picture, participants were asked to rate (i) object familiarity, the degree to which the object is familiar to the participants (from 1 = not familiar at all to 9 = very familiar); and (ii) object valence, the degree to which the object is pleasant to the participants (from 1 = extremely unpleasant to 9 = extremely pleasant). Afterwards, the object was presented alongside its corresponding action statement. Participants were asked to rate action prototypicality, the degree to which the action is prototypical regarding that specific object (from 1 = not prototypical at all to 9 = very prototypical). To clarify the instruction, an example was provided: “A sparrow is a highly prototypical exemplar of the category ‘birds’, whereas a penguin is a poorly prototypical exemplar of this category. In this sense, ‘wrapping a bandage around a finger’ is more prototypical of the action ‘bandage’ than ‘drawing on the bandage’.”

Results and discussion

Each normative parameter is described in the Appendix (https://www.osf.io/ywsvd/?view_only=0c4bedeb591e460b97b554f828d17d67). As in Study 1, only the Portuguese version of the stimuli was tested despite presenting an English translation. Table 3 depicts the descriptive statistics for the 60 objects. Ratings of object familiarity (M = 7.99) and object-action prototypicality (M = 7.15) were high, and their distributions were negatively skewed. These ratings confirm that most objects are familiar to the participants, and that the action-statements correspond to actions prototypical for most objects. Concerning emotional valence, the objects were evaluated as positive (M = 6.51).

Table 3 Descriptive statistics for the object’s parameter ratings

We calculated Spearman’s correlations (ps are reported two-tailed) between normative parameters. Object familiarity correlated positively with object valence, indicating that more familiar objects are perceived as more pleasant (Table 4).

Table 4 Spearman correlations matrix between objects’ normative parameters

Two parameters, familiarity and valence, were rated for both actions and objects to disentangle whether action ratings depended on object features. Even though the samples of the two studies are comparable (i.e., in both cases, participants were mostly female college students in their early 20s, and 50–58 answers were obtained per stimulus) and all parameters were rated using the same scale (1–9), these ratings were transformed into Z scores to ensure a reliable correlation analysis. Note that this analysis was conducted at the item (objects and actions) level (i.e., not at the participant level). We ran pairwise two-tailed Pearson correlations on the Z scores of action and object familiarity and of action and object valence. Neither action and object familiarity (p = .14) nor action and object valence (p = .25) were correlated. Thus, action familiarity and valence were not associated with object familiarity and valence, respectively.
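The item-level analysis described above can be sketched as follows: each parameter is standardised across items and the Z scores are then correlated. The values are made up, the function names are hypothetical, and the use of the sample standard deviation (n − 1) for standardising is an assumption, since the text does not specify it.

```python
from math import sqrt
from statistics import mean, stdev


def z_scores(values):
    """Standardise item means: subtract the mean, divide by the sample SD."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


def pearson(x, y):
    """Pearson product-moment correlation of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Made-up item means; position i refers to the same item in both lists
# (e.g., the action "flip the coin" and the object "coin").
action_familiarity = [7.8, 6.4, 8.1, 5.9, 7.2, 6.8]
object_familiarity = [8.2, 7.9, 7.5, 8.4, 7.7, 8.0]

r_raw = pearson(action_familiarity, object_familiarity)
r_z = pearson(z_scores(action_familiarity), z_scores(object_familiarity))
```

Because Pearson's r is invariant under linear rescaling, `r_raw` and `r_z` coincide up to floating-point error; standardising puts the two sets of ratings on a common scale without changing the coefficient.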

For a closer examination of our findings, we considered the correlations reported in other normative studies with visual representations of action-related stimuli (Table 5). Given the variety of stimuli and parameters assessed in those studies, the pattern of correlations reported is not always consistent with those found in the current study. Yet, the correlations between imageability, action familiarity, and image agreement found in Study 1 are similar to those reported in previous normative studies that validated drawings of actions (e.g., Akinina et al., 2015; Shao et al., 2014). Among these studies, normative valence ratings were only collected by Souza et al. (2021) in a study that used object photographs. The authors found a positive correlation between an object’s familiarity and valence, as we observed between action familiarity and valence (Study 1) and between object familiarity and valence (Study 2). Additionally, whereas we measured the frequency of action execution, most studies measured word frequency (object name or action verb).

Table 5 Correlations reported in normative studies on visual representations of action-related stimuli

To provide a complete view of the findings, we also included those correlations. Familiarity correlated positively with action frequency in our study and with word frequency in several studies (e.g., Akinina et al., 2015; Bonin et al., 2004; Cuetos & Alija, 2003). Nevertheless, there are differences between the correlational patterns of word frequency on the one hand and action frequency on the other. Unlike action frequency in Study 1, word frequency did not correlate with imageability (e.g., Akinina et al., 2015; Bonin et al., 2004; Cuetos & Alija, 2003). In some cases, word frequency correlates negatively with image agreement (e.g., Akinina et al., 2015; Bonin et al., 2004), and in others, the correlation does not reach statistical significance (Schwitter et al., 2004; Shao et al., 2014). These data indicate that action frequency should be considered in the study of action observation and imagery. Despite not being usually measured, action frequency is a relevant normative parameter that provides information about actions beyond the psycholinguistic parameter of word frequency.

In sum, the actions included in the 3ActStimuli set vary in frequency; imageability, image agreement, action familiarity, object familiarity, and object-action prototypicality were high on average. Regarding valence, while most actions were rated as neutral, most objects were considered positive. Some of the relationships between normative parameters follow previous normative studies on drawings of actions. In addition, the link between imagination and familiarity is known in false action memory research (e.g., Garry & Polaschek, 2000), in which the imagination of familiar actions has been found to increase false memories (e.g., Mammarella, 2007; Thomas & Loftus, 2002).

A possible limitation of our stimulus set is that we did not control for psycholinguistic parameters, such as the length of the action statements, or for low-level properties of the visual stimuli (e.g., contrast, luminance, or visual complexity). Since our aim was to provide materials for action observation and imagery tasks, we did not create action photographs. However, future normative studies should also control for these additional parameters and include action photographs.

Note that some parameters—for example, action familiarity (Umla-Runge et al., 2012)—are expected to vary across cultures. In both studies, participants were recruited mostly from a Portuguese university and only native speakers of European Portuguese were included. As such, the validation and use of these materials in other cultures should be tested.

Conclusions

This study provides a set of 60 object-related actions validated in three different formats: action statements, and corresponding dynamic (action videos) and static (object photos) stimuli. Even though object photos are commonly used in different research areas, videos represent a more ecologically valid way to depict actions in research focused on action observation and other action-related phenomena (Muylle et al., 2020). In addition, our action videos are stripped of possible distractions, such as the actor’s face, allowing participants to focus on the action only. Moreover, the existence of both object photos and action videos for the same actions will allow, in future studies, an easy manipulation of action observation and imagery within the same experiment. As mentioned above, the combined study of these processes has raised growing interest (e.g., Eaves et al., 2016). In this sense, we intended to provide norms on parameters relevant for action observation and imagery research, namely imageability, image agreement, action familiarity, action frequency, action valence, object familiarity, object valence, and object-action prototypicality. Whereas ratings of imageability, image agreement, and familiarity have been reported in other studies on visual representations of action-related stimuli (e.g., Akinina et al., 2015; Bonin et al., 2004), norms for valence, action frequency, and object-action prototypicality of action-related stimuli are scarce.

The actions selected correspond to everyday actions performed by manipulating a small object with one or two hands, rated highly on object-action prototypicality. Most objects were considered familiar and positive in valence. Although some actions were not executed regularly by our participants, most were rated as familiar, easy to imagine, and neutral. Overall, the high image agreement indicates that our action videos represent a conventional way of performing these actions. The 3ActStimuli set constitutes a valuable and complete tool to be used in distinct research domains, from memory to sports psychology.