Introduction

Bullying is a sub-class of aggression that may take many different forms and involves intentional and repeated attempts to distress or harm a less powerful victim (Olweus, 1993, 2013). Common types include physical bullying (e.g. pushing, hitting), verbal bullying (e.g. name-calling, verbal threats), and relational bullying (e.g. rumour circulation, manipulation, social exclusion) (Baldry & Farrington, 2004). Cyberbullying is another sub-type of aggression. Although it shares key characteristics with bullying (i.e. intent to cause harm, repetition, power imbalance), it also has distinctive features (Macaulay et al., 2020; Smith, 2016). Unlike traditional bullying, cyberbullying is perpetrated online, where anonymity and publicity play a bigger role (Macaulay et al., 2020; Smith, 2016; Steer et al., 2020). Bullying in its various forms is common among school students (Menesini & Salmivalli, 2017; Olweus & Limber, 2010; Salmivalli, 2010) and remains a threat to their well-being, with teachers often struggling to address the issue within the school (Macaulay et al., 2018).

One recent large-scale UK survey of 9150 12–20-year-olds revealed that 51% were bullied at least once a month, with at least 34% being bullied each week (Ditch the Label, 2018). In addition, Przybylski and Bowes (2017), in their survey of 120,115 UK adolescents, found that 27% had experienced being bullied on a regular basis. Although scholars recognise that reported prevalence varies with the definitions and assessment methods used in research (see Volk et al., 2017), it is clear that bullying is a problem that calls for intervention in the school environment. It can have diverse negative effects on the health and well-being of victims (Arseneault, 2017; Boulton et al., 2008; Hawker & Boulton, 2000; Reijntjes et al., 2010), bullies (Copeland et al., 2013; Cowie & Myers, 2017), and onlookers (Midgett & Doumas, 2019; Rivers, 2012). Previous cross-sectional studies have linked involvement in bullying to serious adverse outcomes, such as increased suicidal ideation (Hinduja & Patchin, 2019), higher levels of social anxiety (Hawker & Boulton, 2000), and depression (Foody et al., 2020). These outcomes of school bullying are not confined to childhood but can persist into adulthood (Zych et al., 2015), suggesting that bullying may precipitate adverse later-life outcomes (Arseneault et al., 2010). Thus, bullying is a serious public health concern, and it is imperative that effective anti-bullying interventions are put in place to address the issue.

While progress has been made in implementing interventions to tackle bullying, meta-analyses reveal residual rates that are usually far from zero (Merrell et al., 2008; Ttofi & Farrington, 2011). Nonetheless, meta-analyses and systematic reviews of anti-bullying interventions generally report some positive outcomes, with reductions in both bullying victimisation and perpetration (e.g. Evans et al., 2014; Gaffney et al., 2019). For example, in their meta-analysis of the effectiveness of school-bullying prevention programs, Gaffney et al. (2019) found a reduction in victimisation of 8–12% in evaluations conducted in the UK, and positive effects on victimisation and perpetration globally. Despite this, anti-bullying interventions have not always been regarded as effective (Cunningham et al., 2016). One reason may be that students are not always receptive to anti-bullying initiatives delivered by teachers and other adults. Rigby and Bradshaw (2003) and Boulton and Boulton (2012) reported that many students believed teachers were not usually interested in tackling bullying and expressed little or no desire to collaborate with them in this regard. More recently, focus groups with young people suggested that anti-bullying interventions did not engage students and were delivered in a repetitive manner, and that students felt teachers were not the best group to deliver anti-bullying messages (Cunningham et al., 2016). In addition, some students, particularly victims of bullying, believe that teacher involvement may make the situation worse, and so they do not expect teachers to get involved (Newman & Murray, 2005; Smith & Shu, 2000). However, for victims to obtain help, it is crucial that they report bullying victimisation to teachers, and so the current study investigated how such reporting might be encouraged.

Partly for these reasons, student-led interventions, often a peer support service or buddy system (Boulton, 2005; Cowie, 2011; Tzani-Pepelasi et al., 2019), have also been implemented and evaluated. While these may help the victims who use them feel better, they fall well short of being a “proven” anti-bullying strategy because they do not target variables that are likely to mitigate bullying or its negative effects in the wider community of students (Gaffney et al., 2019; Houlston & Smith, 2009; Thompson & Smith, 2011). Nevertheless, students are now regarded as central to anti-bullying work, and efforts to find alternative ways of involving them are warranted, especially in community samples (Salmivalli, 2010). Studies have revealed significant, albeit modest, associations between beliefs about bullying and actual behaviour, suggesting that changing the former could lead to reductions in the latter (Boulton et al., 2001; Boulton et al., 2002). We now consider some specific beliefs that, according to research, may have this effect.

Some sub-types of bullying, notably social exclusion and verbal bullying, are not perceived to be as serious as others, or are not even regarded as bullying per se, and this may be one reason why some students engage in them (Newman & Murray, 2005). For example, bullies may regard verbal bullying as a form of joking around with their peers (Shute et al., 2008). Research has also reported that many young people and adults fail to recognise verbal bullying as serious, at least relative to physical bullying (Jacobsen & Bauman, 2007; Maunder et al., 2010). Such beliefs may encourage the notion that verbal and social exclusion bullying are acceptable in the school environment, or at least less likely to be “picked up” by teachers. However, both can be considered harmful forms of peer relationship with adverse consequences (Coleman & Byrd, 2003; Hawker & Boulton, 2001). Given that verbal and relational bullying are seen as less serious than other forms, further work is needed to explore how harmful young people consider these types of bullying to be and whether they regard them as acceptable. In a sub-sample of approximately 6400 young people who reported bullying victimisation, verbal and relational bullying were the most prevalent types, experienced by 88% and 53%, respectively (Waasdorp & Bradshaw, 2015). So, these forms of bullying are not only often considered less serious than other forms but are also more commonly experienced by young people.

Regarding gender differences, more girls than boys have been found to be victims of verbal or relational bullying (Waasdorp & Bradshaw, 2015) and, not surprisingly, girls are more likely to view these types of bullying as serious (Maunder et al., 2010; Shute et al., 2008). For example, Shute et al. reported that boys perceived verbal bullying as less serious and deemed such acts harmless, showing no concern about the impact bullying could have. However, another study suggested that boys and girls are equally likely to engage in verbal or relational bullying, with no gender differences in the perceived severity of the bullying (Newman & Murray, 2005).

Despite the distress it often arouses, bullying frequently goes unreported by victims (Boulton, 2005; Cowie, 2000; Hunter et al., 2004). Encouraging more disclosure is a necessary condition for mobilising social support and is clearly warranted to help eliminate bullying in the school community and beyond. The bullying literature has noted that many students choose not to disclose bullying victimisation to teachers because they do not believe that teachers can or will help (Bradshaw et al., 2007; Unnever & Cornell, 2004). More recently, one study found that students were less likely to report their victimisation if they did not trust teachers and school staff (Berger et al., 2019). This highlights the promising role of student-led interventions, not only to combat bullying in general but also to support students who may not trust teachers in the school environment. At the same time, teachers can play an important role in providing social support to victims of bullying and so reducing negative outcomes. From a theoretical perspective, the stress-buffering hypothesis proposes that greater perceived social support weakens the relationship between a stressor and a negative outcome (Cohen & Wills, 1985). Boys have been found to be less inclined to take part in services that focus on social support for bullying, whereas girls are often more inclined to seek help generally (Boulton, 2005; Cowie, 2000). In addition, girls are more likely to view seeking help as the best strategy to overcome bullying and to make themselves feel better (Hunter et al., 2004). In general, when it comes to disclosing bullying and seeking help, girls are more likely than boys to seek social support to help cope with victimisation (Eliot et al., 2010; Oliver & Candappa, 2007).

One reason why boys may be less inclined to seek help is that help-seeking may compromise their sense of masculinity, and boys may perceive it as less socially acceptable for them to ask for help (Cowie, 2000; Nadler, 1998). Encouraging young people to disclose victimisation to teachers can mobilise social support, giving victims more options for coping with their victimisation and potentially reducing negative outcomes. Teachers are important sources of support for victims of bullying (Beckman & Svensson, 2015; Boulton et al., 2013), so it is important to encourage young people to disclose, so that they can ask for the right kind of help and support and understand when it is important to tell a teacher that they have been bullied and would benefit from support.

Encouraging bystanders, those who witness bullying, to take responsibility for doing something positive is also important, since they are often reluctant to do so (Caravita et al., 2009; Gini et al., 2008; Thornberg et al., 2020). Young people are known to evaluate the likely impact of different types of bullying on victims (Chen et al., 2015) and so may respond negatively or positively as they weigh up the risks and benefits of different courses of action. Recent research with 868 11–13-year-olds in the UK found that bystanders reported they would provide emotional support to the victim and intervene to address the bully when they evaluated the incident as severe, in terms of the intensity of the bullying, the frequency of the victimisation, and the extent to which the victim was upset (Macaulay et al., 2019).

As with other aspects of bullying, findings on the moderating role of gender in bystander intervention are inconsistent. Boys showed more positive defending behaviour in some studies (Caravita et al., 2009), but girls did so in others (Gini et al., 2008; Macaulay et al., 2019). It has been suggested that girls view bullying in general as more serious than boys do and so may feel a greater responsibility to do something positive to help the victim (Maunder et al., 2010; Molluzzo & Lawler, 2012). Nevertheless, so many young people are reluctant to intervene that Salmivalli et al. (1996, p. 117) suggested that “bystanders were trapped in a social dilemma”. It appears that many young people recognise bullying as inappropriate but are afraid to intervene to support victims because of the perceived impact on their social status and safety (Boulton, 2013). Rock and Baird (2012) also suggested that students do not know what to do and that teaching them safe intervention strategies would be helpful. Indeed, a recent meta-analysis showed that such bystander-targeted interventions led to more positive bystander behaviour, such as supporting victims and dissuading perpetrators (Polanin et al., 2012). However, effect sizes (ESs) are not always large (e.g. Kärnä et al., 2011a, 2011b), and so different approaches to mobilising bystanders need to be developed and tested, including changing underlying beliefs about how to intervene and why doing so is helpful.

The Cross-age Teaching Zone Intervention

Co-operative group work (CGW) has been shown to assist students’ learning in academic (Baines et al., 2007; Veldman et al., 2020) and social/behavioural domains (Blatchford et al., 2006; Cowie et al., 1994), including anti-bullying learning (Boulton et al., 2016; Ttofi & Farrington, 2011). Similarly, cross-age teaching (CAT) approaches have been shown to benefit tutors’ academic development (Karcher, 2009; McDaniel & Besnoy, 2019; Robinson et al., 2005; Topping et al., 2011) and social/behavioural development (Robinson et al., 2005; Watts et al., 2019). For example, Watts et al., in a synthesis of the literature, reported that CAT approaches provide an effective platform for promoting academic, social, and behavioural skills in young people. Given these positive but separate results for CGW and CAT across such a wide variety of domains and variables, the first author developed an approach that combines them to target social outcomes, referred to here as the cross-age teaching zone (CATZ). In this paper, we consider whether it can be used to promote anti-bullying beliefs among students.

There are good theoretical and empirical reasons for focusing on the effect of CATZ on tutors rather than tutees. Working with the lesson content (LC) facilitates cognitive restructuring and elaboration as the content is incorporated into existing schemas (O’Donnell & O’Kelly, 1994; Slavin, 1996; Thurston et al., 2007; Topping & Ehly, 1998). This suggests that CATZ provides opportunities for learning as tutors work with the LC, make links with knowledge they already hold, and go on to develop more advanced cognitive structures and schemas. Applying Vygotsky’s (1978) sociocultural theory, the scaffolding of this type of learning by the adult facilitators of CATZ interventions, together with the requirement that tutors re-work the LC into a viable lesson, places tutors in the zone of proximal development, that is, just beyond what they can do or know unaided. In our case, this means that CATZ tutors will likely be “thinking about bullying” in novel ways. The fact that CATZ tutors work co-operatively to develop and deliver their lesson further increases the likelihood that they will learn the LC. Slavin (1996) argued that such co-operative activities provide “implicit” reward and incentive structures, and so CATZ tutors are likely to see that they have a responsibility to their group that can be met if they themselves master the LC. Role theory also suggests that acting as a teacher strengthens this feeling of responsibility because it engenders a sense of care towards tutees (Biddle, 1986; Robinson et al., 2005). This implies that tutors teaching the material to younger students (i.e. the tutees) would take their role seriously, thereby fostering an effective learning environment for both tutors and tutees.

Moreover, it is now apparent that intervention approaches that only indirectly and subtly “challenge and change” existing thought patterns can be effective (Longmore & Worrell, 2007). This is especially important given that many students resist direct attempts to change their bullying-related beliefs and actions (Boulton & Boulton, 2012). Thus, having CATZ tutors work on material about bullying in a general sense to help their tutees, without any direct attempt to change their beliefs, could be sufficient for them to “take ownership” of this new information about bullying and internalise it. While attempts to stop students engaging in bullying are clearly warranted, such change has proved very difficult to bring about, and researchers are beginning to explore interventions that target underlying beliefs, attitudes, and knowledge (Evans et al., 2014; Gaffney et al., 2019; Kärnä et al., 2011a, 2011b). It is important to do this among the wider community of school students and not just among those who currently act as bullies (Salmivalli, 2010). Partly for this reason, and partly because interventions that have the most positive effects on the most students are more likely to be taken up by school staff, we are currently examining the effect of CATZ on tutees’ attitudes/beliefs, feelings, and behaviour. The psychological mechanisms that might be responsible for any such positive effects are likely to differ from those outlined above for tutors, and so we will report them in detail in the near future.

In terms of anti-bullying practice, the acceptability of community interventions to students in schools is important because it can influence levels of engagement, treatment integrity, and ultimately treatment outcome (Cowan & Sheridan, 2003; Cunningham et al., 2016; Kazdin, 1980). Reports of the social validity of anti-bullying interventions are rare, and no study has yet reported it for CATZ with its co-operative and cross-age teaching characteristics. Considerable evidence suggests that girls are more open than boys to acting as “agents of change” across a range of peer-led interventions (Boulton, 2005; Cowie et al., 2002), plausibly because acting in a somewhat formal helping capacity may compromise boys’ sense of masculinity or macho self-image (Cowie, 2000; Petersen & Rigby, 1999).

In summary, there is a clear need for student-led, community-based interventions that help promote anti-bullying beliefs. As outlined above, there are good empirical and theoretical reasons to expect that CATZ may engender these beliefs among tutors. Our primary aim, therefore, was to conduct three linked studies examining the effect of CATZ on beliefs that (i) non-physical forms of bullying are unacceptable (study 1), (ii) disclosing bullying to adults and getting the right kind of help are valuable and important (study 2), and (iii) victims can be assisted in safe ways (study 3). Each intervention was delivered by different researchers in semi-autonomous ways. Given that the magnitude of the positive effects of interventions delivered by their creators in one context is often attenuated when they are delivered by other people in another context (see Yeager & Walton, 2011), this design allowed us to assess the likelihood that CATZ could be rolled out more widely by diverse groups of facilitators.

Related to the above, our second aim was to test whether the effect of CATZ differed as a function of gender. As noted above, the literature has reported some, though not always consistent, gender differences in the kinds of beliefs that we measured and in how boys and girls respond to peer-led interventions.

Finally, our third aim, also relevant to the implications of our findings for anti-bullying practice, was to assess the social validity of CATZ by examining how acceptable it was to participants. Simply put, to be optimally effective, student-led interventions have to be well received by students themselves.

Method

Participants, Measures, Design, and Data Collection

Participants (N = 419) were drawn from five junior schools in the UK. The consent of students, and that of parents or of head teachers acting in loco parentis, was sought; the response rate was 91% for taking part in our data collection procedures and 100% for taking part as a CATZ tutor. The discrepancy arose because some parents requested that their children not be presented with questionnaires, but none requested that their children not take part in CATZ if the rest of their classmates would be doing so. At the request of the schools, and partly for logistical reasons, randomisation was at the class level: each whole class was randomly assigned to either the CATZ or the control condition. Each school had at least two classes, with at least one assigned to CATZ and at least one to the control condition. None of the participating schools streamed their classes, so the children in different classes were likely to be similar to each other. Overall, across the three studies, 237 and 182 students were randomly assigned in this way to act as CATZ tutors or controls, respectively. Teachers asked us to work with the oldest students in the school (mean age = 11.5 years) because they deemed them the most appropriate to deliver anti-bullying learning to their schoolmates.
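The class-level (cluster) randomisation described above can be sketched as follows. This is a minimal illustration, not the procedure the schools actually used: the school and class labels are hypothetical, and because the text does not say how classes beyond the first two in a school were allocated, the sketch assumes a fair coin flip for them.

```python
import random

def assign_classes(schools, seed=None):
    """Randomise whole classes within each school (cluster randomisation).

    schools: mapping of school name -> list of class names (hypothetical labels).
    Returns a mapping of (school, class) -> "CATZ" or "control".
    """
    rng = random.Random(seed)
    allocation = {}
    for school, classes in schools.items():
        if len(classes) < 2:
            raise ValueError(f"{school} needs at least two classes")
        shuffled = list(classes)
        rng.shuffle(shuffled)
        # Guarantee at least one class per condition in every school.
        allocation[(school, shuffled[0])] = "CATZ"
        allocation[(school, shuffled[1])] = "control"
        # Assumption: any further classes are allocated by a fair coin flip.
        for cls in shuffled[2:]:
            allocation[(school, cls)] = rng.choice(["CATZ", "control"])
    return allocation

if __name__ == "__main__":
    demo = {"School A": ["6A", "6B"], "School B": ["6X", "6Y", "6Z"]}
    for (school, cls), condition in sorted(assign_classes(demo, seed=42).items()):
        print(school, cls, condition)
```

Randomising whole classes keeps classmates in the same condition, which matches the schools' logistical request, at the cost of fewer independent units than randomising individual students.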

In each study, data were collected on a whole class basis. Participants received a questionnaire, and a researcher read out instructions followed by each question. To encourage considered responses, students were informed, “This is not a test so there are no right or wrong answers. We just want to know what each of you think and so there is no need to copy what somebody else has put. Is that OK?” Pre-intervention (T1) data were collected immediately prior to the implementation of the intervention/control experience, and post-test (T2) data were collected about a week after it ended. Follow-up (T3) data were also collected five weeks later in study 2.

Table 1 Mean (and standard deviation) scores of the individual studies, results of t-tests to compare CATZ and controls at each time, and condition × time interaction effects

In each of the three studies, different outcome measures were employed (nine in total, named below), and different students in different schools took part. Details of each study are provided next, followed by a description of the general approach used to deliver CATZ in all three studies.

Study 1

Ninety-nine participants took part: 55 in the CATZ group (27 girls and 28 boys) and 44 business-as-usual controls who continued with their normal lessons (28 girls and 16 boys). The four dependent variables were beliefs about non-physical forms of bullying, namely Harmful Exclusion (“How harmful do you think social exclusion is?”), Harmful Verbal (“How harmful do you think verbal bullying is?”), Acceptable Exclusion (“How acceptable do you think social exclusion is?”), and Acceptable Verbal (“How acceptable do you think verbal bullying is?”), each measured with a closed question. Each question had a 5-point response scale anchored with “not at all” and “a lot”, and responses were subsequently scored from 1 to 5 so that low values were more desirable (i.e. participants saw the behaviour as less acceptable and more harmful). For the two Harmful items, this meant that “a lot” received the lowest score.
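The scoring rule above implies that the two Harmful items are reverse-scored while the two Acceptable items are not. A minimal sketch, assuming raw responses are recorded as scale positions 1 (“not at all”) to 5 (“a lot”); the item keys are hypothetical names, not the study's variable codes.

```python
SCALE_POINTS = 5

# True = reverse-score the item so that a low score is the desirable answer.
# Harmful items: "a lot" harmful is desirable, so position 5 must become 1.
# Acceptable items: "not at all" acceptable is desirable, so keep as is.
ITEMS = {
    "harmful_exclusion": True,
    "harmful_verbal": True,
    "acceptable_exclusion": False,
    "acceptable_verbal": False,
}

def score_item(raw: int, reverse: bool) -> int:
    """Convert a raw scale position (1..5) into a score where low = desirable."""
    if not 1 <= raw <= SCALE_POINTS:
        raise ValueError(f"response {raw} outside 1..{SCALE_POINTS}")
    return (SCALE_POINTS + 1 - raw) if reverse else raw

def score_participant(responses: dict[str, int]) -> dict[str, int]:
    """Score all four study 1 items for one participant."""
    return {item: score_item(responses[item], rev) for item, rev in ITEMS.items()}

if __name__ == "__main__":
    raw = {"harmful_exclusion": 5, "harmful_verbal": 4,
           "acceptable_exclusion": 1, "acceptable_verbal": 2}
    print(score_participant(raw))
    # {'harmful_exclusion': 1, 'harmful_verbal': 2, 'acceptable_exclusion': 1, 'acceptable_verbal': 2}
```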

A sub-set of 35 non-CATZ participants from the two control classes who were present at the time was asked these questions again one week later to assess test–retest reliability. It was good (all p < 0.001) for Harmful Exclusion (r = 0.59), Harmful Verbal (r = 0.75), Acceptable Exclusion (r = 0.72), and Acceptable Verbal (r = 0.44).

Study 2

This study involved 197 participants: 106 CATZ (58 girls and 48 boys) and 91 business-as-usual controls who continued with their normal lessons (50 girls and 41 boys). The two dependent variables were beliefs about getting help when one is bullied, namely When to Tell, measured with the open question “If you were bullied, how would you know when it would be a good idea to tell a teacher?”, and Wanted Help, measured with the open question “If you had been bullied and told a teacher, what could you do to help make sure you get the right kind of help?” For each open question, a researcher developed a coding scheme to identify common categories of responses, and two independent raters then used it to code all of the responses collected. Inter-coder agreement was high (Cohen’s kappa ≥ 0.89). Most disagreements were resolved by discussion between the coders and, in a few instances, by a third coder. Once responses were coded, the number representing “desirable and appropriate” knowledge was tallied for each participant, and this value was used in the statistical analyses. Two examples of such responses for Wanted Help were “I could tell the teacher what I wanted them to do to help me” and “I would ask the teacher not to tell the bully that I had reported them”.
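The two coding steps above can be sketched in a few lines: Cohen's kappa compares the raters' observed agreement with the agreement expected by chance from each rater's category frequencies, and each participant's score is the count of coded responses falling in a “desirable and appropriate” category. The category labels in `DESIRABLE` are hypothetical placeholders, not the study's actual coding scheme.

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Chance-corrected agreement between two raters' category codes."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must code the same non-empty set of responses")
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    # Expected agreement if each rater assigned categories independently
    # at their own observed rates.
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a.keys() | freq_b.keys()) / n**2
    if expected == 1.0:
        # Both raters used one identical category throughout; kappa is
        # undefined here, so treat it as perfect agreement by convention.
        return 1.0
    return (observed - expected) / (1 - expected)

# Hypothetical category labels counted as "desirable and appropriate".
DESIRABLE = {"state_preferred_help", "request_confidentiality"}

def tally_desirable(codes):
    """Number of a participant's coded responses in a desirable category."""
    return sum(code in DESIRABLE for code in codes)

if __name__ == "__main__":
    a = ["x", "x", "y", "y"]
    b = ["x", "y", "y", "y"]
    print(cohens_kappa(a, b))  # 0.5
```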

A sub-set of (non-CATZ) participants was asked these two questions again one week later to assess test–retest reliability. Using the number of responses that indicated desirable and appropriate knowledge, reliability was good (p < 0.001) for Wanted Help (r = 0.66) and When to Tell (r = 0.70).

Study 3

There were 123 participants in study 3: 76 CATZ (39 girls and 37 boys) and 47 controls (18 girls and 29 boys). Unlike in studies 1 and 2, researchers delivered a 40-min presentation to control participants on the material that CATZ participants had been asked to include in their lessons (i.e. the CATZ LC). This allowed us to compare CATZ against a form of direct instruction, which, alongside a business-as-usual control group, is regarded as an appropriate comparison for testing an educational intervention (Yeager & Walton, 2011).

The three dependent variables (DVs) were beliefs about supporting victims, namely Victim Support — Emotional, Victim Support — Address Bully, and Victim Support — Other. These DVs were derived from three open questions: “If you saw another child being bullied, what would you do?”, “How could you try to stop a bully being nasty to someone without making them pick on you?”, and “How could you help someone if they were bullied?” The coding procedure was similar to that used in study 2, and the number of “desirable and appropriate” responses across the three questions was tallied for each DV. Example responses meeting this criterion were “I would try to help the person feel better” (Victim Support — Emotional), “I would tell the bullying to stop doing it” (Victim Support — Address Bully), and “I would go and tell a teacher” (Victim Support — Other).

A sub-set of (non-CATZ) participants was asked these questions again one week later to assess test–retest reliability. Using the number of responses that indicated desirable and appropriate knowledge, reliability was good (all p < 0.001) for Victim Support — Emotional (r = 0.52), Victim Support — Address Bully (r = 0.66), and Victim Support — Other (r = 0.67).

Social validity was assessed among a sub-set of CATZ tutors from across the three studies (N = 188), about a week after they had delivered their CATZ lesson, with four items: “How much would you like to design and give another CATZ lesson on something else about bullying?”, “How much do you think designing and giving a CATZ lesson is a good way to help students your age learn about bullying?”, “How much do you think CATZ is a good way to teach younger students about bullying?”, and “How much do you think other students of your age in other schools would like to try CATZ?”. These items tap key aspects of acceptability, notably willingness to take part and perceived value (Elliott, 1986). A response scale from 1 (“not at all”) to 10 (“a lot”) was used. Cronbach’s alpha was 0.88, so a mean Social Validity score was computed across the four items, with high scores indicating high social validity. A sub-set of these participants (n = 74) was asked the questions again one week later; test–retest reliability was high (r = 0.89, p < 0.001).
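Cronbach's alpha compares the summed variances of the individual items with the variance of participants' total scores: α = k/(k − 1) × (1 − Σσ²ᵢ/σ²ₜₒₜₐₗ), where k is the number of items (four here). The sketch below uses sample variances throughout and hypothetical ratings; it then computes the per-participant mean used as the Social Validity score.

```python
from statistics import mean, variance

def cronbach_alpha(rows):
    """alpha = k/(k-1) * (1 - sum of item variances / variance of totals).

    rows: one list of item scores per participant, all of the same length k.
    """
    k = len(rows[0]) if rows else 0
    if len(rows) < 2 or k < 2 or any(len(r) != k for r in rows):
        raise ValueError("need >= 2 participants with the same >= 2 items each")
    item_vars = [variance([r[j] for r in rows]) for j in range(k)]
    total_var = variance([sum(r) for r in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)

def social_validity_scores(rows):
    """Mean item score per participant (high = high social validity)."""
    return [mean(r) for r in rows]

if __name__ == "__main__":
    # Hypothetical 1-10 ratings on the four social validity items.
    ratings = [[8, 9, 10, 9], [6, 7, 7, 5], [9, 9, 8, 10], [4, 5, 6, 5]]
    print(round(cronbach_alpha(ratings), 2))
    print(social_validity_scores(ratings))
```

Averaging the items, as the study did, keeps the composite on the original 1 to 10 metric, which makes it easier to interpret than a summed total.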

The CATZ Interventions and Training CATZ Tutors

CATZ was developed by the first author, who, on the basis of considerable pilot work and previous evaluation studies of CATZ, devised a semi-standardised protocol and used it to train the co-authors to deliver CATZ in the three studies (“Facilitators” henceforward). The aim was to ensure relative, but not necessarily homogeneous, consistency in the way Facilitators delivered CATZ to students across the three studies, as this would mimic the kind of variation likely to result if CATZ were implemented more widely and delivered by teachers themselves in school communities (Boulton, 2014; Cowie et al., 1994). Such variation was also made likely because each co-author of this paper acted as facilitator in only one of the three studies reported here.

Facilitators encouraged “buy-in” by explaining to CATZ tutors that taking part was voluntary, that they could stop at any time (and re-join) without giving a reason, and that they were being invited to work in small groups of about five students to design a (roughly) 30-min anti-bullying lesson and to deliver it to a small group of students two years younger than themselves. Facilitators stressed that this was an important task because the LC could help the younger students learn important things. Facilitators also emphasised that, notwithstanding their responsibility to “educate” the younger students, tutors might actually enjoy taking part and learn useful things themselves. Indeed, given that students often resist adult-implemented initiatives to tackle bullying partly because they perceive them as “boring” (Boulton & Boulton, 2012), facilitators were asked to engender a sense of fun and ownership of the lesson among tutors that complemented their sense of responsibility. Tutors were informed that facilitators would provide the required LC and offer suggestions about how to plan, test, and deliver a lesson on it, but that the details would be left to them and they could augment that content. Facilitators aimed to strike a balance between being suitably supportive on the one hand and leaving tutors to take ownership of their lesson on the other. While the final decision on the lesson itself was left to each group of tutors, facilitators ensured that, as a minimum, every group designed a poster containing the LC and prepared a script of what each group member would say and do during the lesson. This ensured that the LC was addressed in every group’s lesson. Importantly, at no point did facilitators state or even imply that they “wanted” the tutors to learn this information or that tutors needed to change their bullying-related knowledge or behaviour. Rather, tutors were reminded that this was the information they would help the younger tutees learn via CATZ.

Across the three studies, CATZ tutors received similar guidance from facilitators and had a similar amount of time (about four 60-min sessions, spread over 2–3 weeks) to prepare their lesson. They then delivered their lesson within a few days. Facilitators and class teachers observed these lessons but did not take an active role. Facilitator reports confirmed the integrity, and hence relative consistency, with which facilitators delivered CATZ to tutors and with which CATZ tutors delivered their lessons to tutees. All groups of tutors were seen to take the task seriously and were judged to have done a good job.

Results

Plan of Analysis

Because CATZ tutors were nested in groups, multilevel modelling was considered. However, these analyses could not be used because some group membership changed, which violates the assumption of independence across groups (Tabachnick & Fidell, 2014). Hence, to determine whether CATZ had statistically significant effects on each variable, and to assess whether gender was a moderator, a 2 (Condition) × 2 (or 3 where appropriate) (Time, repeated measures) × 2 (Gender) mixed analysis of variance (ANOVA) was conducted. Post hoc tests were used to identify sub-group differences, with Bonferroni corrections to control the family-wise type I error rate.
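The Bonferroni correction mentioned above tests each of m post hoc comparisons in a family at α/m, or, equivalently, multiplies each p-value by m (capped at 1) and compares it with α. A minimal sketch; the p-values are hypothetical, and the text does not state how many comparisons formed each family, so m is simply the length of the list passed in.

```python
def bonferroni(p_values, alpha=0.05):
    """Return (adjusted p, reject H0?) for each test in one family of m tests."""
    m = len(p_values)
    results = []
    for p in p_values:
        adjusted = min(1.0, p * m)   # equivalent to comparing p with alpha / m
        results.append((adjusted, adjusted < alpha))
    return results

if __name__ == "__main__":
    # Hypothetical post hoc p-values from three sub-group comparisons.
    for adjusted, reject in bonferroni([0.01, 0.02, 0.5]):
        print(f"adjusted p = {adjusted:.2f}, significant: {reject}")
```

The correction is simple and conservative; with many comparisons it sacrifices power, which is one reason alternatives such as Holm's step-down procedure exist.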

Unlike tests of statistical significance, measures of effect size (ES) provide information about the practical significance of findings and the relative size of an experimental or intervention effect, and they allow comparisons across studies and interventions (Thalheimer & Cook, 2002). Partial eta squared (partial η²) was used as the index of ES in the ANOVAs, but we also calculated the more widely used (and arguably better understood) Cohen’s d, with 95% confidence intervals. Cohen (1988) suggested that values of d from 0.20 to 0.49 be deemed small, those from 0.50 to 0.79 medium, and those ≥ 0.80 large. We also report the common language ES (McGraw & Wong, 1992) because it expresses an ES in an easy-to-understand format as a percentage. It represents the probability that a randomly selected person from one group will have a higher (or lower) value after the intervention than a randomly selected person from the other group (Grissom & Kim, 2005).
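As an illustration of the ES computations described above, the sketch below computes Cohen’s d for two independent groups from the pooled standard deviation, adds an approximate 95% confidence interval based on the standard large-sample standard error, and labels the result using Cohen’s (1988) benchmarks. The text does not state how the confidence intervals were obtained, so the normal-approximation interval is an assumption, as is applying d to T1-to-T2 change scores; the function names and data are hypothetical.

```python
import math
from statistics import mean, stdev


def cohens_d_ci(group_a, group_b, z=1.96):
    """Cohen's d for two independent groups with an approximate 95% CI.

    d is the mean difference divided by the pooled standard deviation.
    The CI assumes the common large-sample standard error of d and a
    normal approximation (an assumption, not the authors' stated method).
    """
    n1, n2 = len(group_a), len(group_b)
    s1, s2 = stdev(group_a), stdev(group_b)  # sample SDs (n - 1 denominator)
    pooled_sd = math.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))
    d = (mean(group_a) - mean(group_b)) / pooled_sd
    se = math.sqrt((n1 + n2) / (n1 * n2) + d**2 / (2 * (n1 + n2)))
    return d, (d - z * se, d + z * se)


def cohen_label(d):
    """Classify |d| with Cohen's (1988) benchmarks as described in the text."""
    size = abs(d)
    if size >= 0.80:
        return "large"
    if size >= 0.50:
        return "medium"
    if size >= 0.20:
        return "small"
    return "negligible"


# Hypothetical T1-to-T2 change scores (positive = more desirable change).
catz_change = [1.0, 0.5, 1.5, 0.0, 1.0, 0.5]
control_change = [0.0, 0.5, -0.5, 0.0, 0.5, 0.0]
d, (lo, hi) = cohens_d_ci(catz_change, control_change)
print(f"d = {d:.2f}, 95% CI [{lo:.2f}, {hi:.2f}], {cohen_label(d)}")
```

A confidence interval whose lower bound lies above zero, as in this toy example, corresponds to the pattern described for Table 3, where no interval for d contained zero.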

Initial Equivalence of the CATZ and Control Groups

Independent-groups t-tests confirmed that on seven of the nine outcome measures, CATZ and control groups did not differ at T1 (all ts < 2.0, all ps > 0.05). We now describe the two exceptions. For Acceptable Verbal, the CATZ group initially had significantly less desirable scores than controls (means = 1.53 and 1.20, respectively, t (97) = 2.93, p = 0.004). Because this left more scope for CATZ participants to change in a desirable direction, a positive effect of CATZ would be easier to detect on this measure. Conversely, the CATZ group initially had significantly more desirable scores than controls on Victim Support — Emotional (means = 0.38 and 0.04, respectively, t (121) = 3.56, p = 0.001), which would make a positive effect of CATZ more difficult to detect here. Thus, for all but one variable (Acceptable Verbal), there was scope for a “fair/conservative test” of the effects of CATZ.

Effects of CATZ on Individual Measures and Tests of Gender as a Moderator

The Condition × Time × Gender interaction was non-significant for all nine outcome variables, indicating that gender did not moderate any effects of CATZ. Gender is therefore not considered further in the results reported below for these nine outcome variables.

Descriptive data, summaries of Condition × Time interaction effects, and comparisons between CATZ and control participants at each assessment are given in Table 1. Across-time comparisons (repeated measures t-tests) for CATZ participants are shown in Table 2. For all outcome measures, the Condition × Time interaction was significant. Using Cohen’s (1988) benchmarks, partial η² values were small (< 0.06) for Acceptable Exclusion; medium (0.06 to 0.138) for four measures, namely Harmful Exclusion, Victim Support — Emotional, Victim Support — Address Bully, and Victim Support — Other; and large (> 0.138) for the remaining four measures, Harmful Verbal, Acceptable Verbal, Wanted Help, and When to Tell. With only one exception (Acceptable Exclusion at T2), CATZ participants had significantly more desirable scores than controls at T2 and, where measured, at T3 on all variables. Control participants showed no significant change in a positive direction from T1 to T2 or from T1 to T3 on any of the nine measures, whereas CATZ participants did so on all of them (Table 2). On the two study 2 measures with follow-up data (Wanted Help and When to Tell), CATZ participants’ T3 scores were significantly less desirable than their T2 scores.
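Because partial η² is a ratio of sums of squares, SS_effect / (SS_effect + SS_error), it can also be recovered from a reported F statistic and its degrees of freedom as F × df_effect / (F × df_effect + df_error). The sketch below shows this identity and applies the bands used in this section (below 0.06, 0.06 to 0.138, above 0.138). The function names, F values and degrees of freedom are hypothetical and do not correspond to Table 1.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared recovered from an F statistic and its degrees of freedom.

    Uses the identity SS_effect / (SS_effect + SS_error)
    = F * df_effect / (F * df_effect + df_error).
    """
    numerator = f_value * df_effect
    return numerator / (numerator + df_error)


def eta_label(eta_sq):
    """Bands used in the text: < 0.06, 0.06 to 0.138, > 0.138."""
    if eta_sq > 0.138:
        return "large"
    if eta_sq >= 0.06:
        return "medium"
    return "small"


# Hypothetical Condition x Time interaction tests, each F(1, 120).
for f in (5.0, 10.0, 25.0):
    eta = partial_eta_squared(f, 1, 120)
    print(f"F(1, 120) = {f:.1f} -> partial eta^2 = {eta:.3f} ({eta_label(eta)})")
```

With 120 error degrees of freedom, these three hypothetical F values fall into the small, medium and large bands respectively, which shows how strongly the band depends on the error degrees of freedom as well as on F.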

Table 2 Across-time comparisons (repeated measures t-tests) for CATZ participants

Effect Sizes

Table 3 contains ESs for the nine outcome measures. In all cases, the confidence intervals for Cohen’s d did not contain zero, consistent with the ANOVA results reported above that indicated an effect of CATZ. For T1 to T2 changes, values of d were between “low” (0.2) and “medium” (0.5) for Harmful Exclusion (0.44) and Acceptable Exclusion (0.49); between “medium” and “high” (0.8) for Victim Support — Other (0.64), Victim Support — Emotional (0.68), and Victim Support — Address Bully (0.79); and above “high” for Harmful Verbal (0.83), Acceptable Verbal (0.95), When to Tell (1.51), and Wanted Help (1.57).

Table 3 Cohen’s d and common language effect sizes for individual studies

The common language ESs for the nine outcome measures indicated a probability of between 62% and 87% (mean = 72.4%) that a randomly selected CATZ participant would have a more desirable T1 to T2 change score than a randomly selected control participant.

For T1 to T3 changes (study 2), values of d were above “high” for both Wanted Help (1.13) and When to Tell (1.14). For both variables, the common language ES indicated a 79% probability that a randomly selected CATZ participant would have a more desirable T1 to T3 change score than a randomly selected control participant.
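The common language ESs reported in this section appear to be consistent with McGraw and Wong’s (1992) measure under the assumption of normally distributed scores with equal variances, in which case CL = Φ(d/√2), where Φ is the standard normal cumulative distribution function. The sketch below applies that conversion to the d values reported above; that the published percentages were derived this way is assumed here rather than stated in the text, and the function name is hypothetical.

```python
import math
from statistics import NormalDist


def common_language_es(d):
    """Probability that a random member of group A outscores a random member of group B.

    Assumes normal distributions with equal variances, so the
    McGraw & Wong (1992) CL statistic reduces to Phi(d / sqrt(2)).
    """
    return NormalDist().cdf(d / math.sqrt(2))


# Cohen's d values reported in the text for T1-to-T2 change (nine measures).
d_t1_t2 = [0.44, 0.49, 0.64, 0.68, 0.79, 0.83, 0.95, 1.51, 1.57]
cl_t1_t2 = [common_language_es(d) for d in d_t1_t2]
print(f"T1-T2 CL range: {min(cl_t1_t2):.0%} to {max(cl_t1_t2):.0%}")
print(f"Mean of unrounded CLs: {sum(cl_t1_t2) / len(cl_t1_t2):.1%}")

# Mean of the nine CLs after rounding each to a whole percentage.
rounded = [round(100 * cl) for cl in cl_t1_t2]
print(f"Mean of rounded CLs: {sum(rounded) / len(rounded):.1f}%")

# Study 2 follow-up (T1-to-T3): Wanted Help and When to Tell.
for name, d in [("Wanted Help", 1.13), ("When to Tell", 1.14)]:
    print(f"{name}: CL = {common_language_es(d):.0%}")
```

With these d values, the conversion gives a range of 62% to 87% for T1 to T2 changes and 79% for both T1 to T3 comparisons. Averaging the nine rounded percentages gives 72.4%, matching the reported mean, whereas averaging the unrounded probabilities gives a slightly higher value (about 72.5%).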

Social Validity

Overall, the mean social validity score was 8.6 on a 1–10 scale, with relatively little variability (standard deviation = 0.9), and 68.1% of participants scored 8 or above, a reasonable criterion for “high”. The minimum score was 6.25. Social validity did not differ significantly between girls and boys, t (186) = 0.03.

Discussion

The main aim of this study was to investigate the effect of the relatively new CATZ co-operative cross-age teaching intervention on diverse bullying-related belief variables measured in three separate studies. With very few exceptions, the results indicated that CATZ had a positive effect. On eight of the nine measures (the exception being Acceptable Exclusion), CATZ participants had significantly more desirable scores than controls at T2, despite this not being the case at T1. Moreover, CATZ participants showed significant improvements from T1 to T2 on all measures, whereas controls did not change for the better on any measure. On the two measures with follow-up data in study 2, Wanted Help and When to Tell, CATZ participants’ T3 scores were significantly more desirable than their T1 scores, whereas controls showed no such change.

Cohen’s d ESs for the effects of CATZ were between “low” and “medium” for Harmful Exclusion and Acceptable Exclusion; between “medium” and “high” for Victim Support — Other, Victim Support — Emotional, and Victim Support — Address Bully; and above “high” for Harmful Verbal, Acceptable Verbal, When to Tell, and Wanted Help. The common language ESs for the nine outcome measures indicated a probability of between 62% and 87% (mean = 72.4%) that a randomly selected CATZ participant would have a more desirable T1 to T2 change score than a randomly selected control participant. Taken together, these mostly medium or large ESs suggest that CATZ is likely to have noteworthy practical benefits for those who experience it.

This conclusion is strengthened by our finding that the positive effect of CATZ was equally evident among girls and boys, so the practical recommendations we offer below appear to be relevant to both. Positive effects of CATZ on boys are especially encouraging given that boys have been found to score less desirably than girls on a range of bullying-related variables: they tend to disclose being bullied less often (Boulton, 2005; Hunter et al., 2004), are less likely to intervene in bullying and support the victim (Gini et al., 2008; Macaulay et al., 2019), and often see bullying as less serious (Maunder et al., 2010; Molluzzo & Lawler, 2012). We have shown that beliefs related to each of these issues can be addressed via CATZ and hence suggest that wider implementation of the intervention within schools would be beneficial. Benefits may also extend beyond the students who come to hold more positive beliefs. For example, the covert nature of relational bullying makes it difficult for teachers to identify and support its targets, and CATZ offers students useful strategies for asking adults for help.

After experiencing CATZ, a sub-set of our participants (N = 188) were asked to rate it for acceptability and perceived value, and over two-thirds gave very high ratings (8 or above on a 1–10 scale). Again, gender differences were not evident. It is encouraging that boys were as open as girls to engaging in CATZ, given that boys tend to be less enthusiastic towards other forms of peer support, broadly defined (Boulton, 2005; Cowie, 2000; Cowie et al., 2002; Peterson & Rigby, 1999). What it is about CATZ that appeals to students, especially boys, is worthy of study. Collectively, our findings on effectiveness and social validity provide strong support for CATZ as an anti-bullying intervention targeted at “improving” beliefs. Schools have many issues to deal with besides bullying, and “short but effective” interventions will likely be taken up more widely (Boulton, 2014). CATZ appears to meet this criterion, and future studies could explore how schools might incorporate CATZ into their wider anti-bullying efforts.

Importantly, positive effects of CATZ were found across all three studies, each of which used different facilitators. This is encouraging because, in a more widespread school-level implementation, different teachers would act as CATZ facilitators and provide the lesson content for tutors to rework into a viable lesson for younger tutees. One important aspect of any intervention implementation is the fiscal stability of the resources needed for an effective outcome (Forman et al., 2009). Furthermore, to avoid replication failures, training implementers to run an intervention scheme needs to be feasible (Kumpfer et al., 2020). A practical advantage of CATZ is that teachers could readily be trained to act as facilitators, which would in turn reduce the costs of running the intervention. As schools face many different budgetary constraints that affect their decision to participate in an intervention scheme (Boulton, 2014), CATZ offers them a potentially cost-effective way to enhance male and female students’ anti-bullying beliefs.

An important caveat to any enthusiasm for CATZ is that our follow-up data in study 2 showed that some of the gains were lost from T2 to T3. Nevertheless, on the two relevant variables, T3 scores among CATZ participants were still significantly more desirable than T1 scores. Whether these “losses” can be eliminated with additional CATZ experiences is a worthwhile issue for future studies. That this is a realistic possibility is suggested by research on memory and the consolidation of learning, which highlights the benefits of “extra” time and experience with learning material (McGaugh, 2000).

Cross-national research has shown variations in bullying-related variables (Boulton et al., 1999; Menesini et al., 1997) that could influence how receptive children are to CATZ and what dose might be required. Researchers are beginning to test whether CATZ can have similarly positive effects on students in different countries, and although early results are encouraging (Marx, 2018), more studies are clearly warranted. The same applies to age, since we studied only upper-primary-school students here. While CATZ has been shown to improve bullying-related beliefs among high school students (Boulton & Boulton, 2017), researchers would do well to examine its effectiveness and social validity as a function of age.

In evaluating this work, we note that our three studies met four of the six criteria identified as important in intervention evaluations by Durlak et al. (1991): sample size exceeded 30 in each group, random assignment (at the class level) and an intent-to-treat design were employed, and all pre-test/post-test comparisons are reported. However, no blinded outcomes were recorded in any study, and an attention-only control condition was employed only in studies 1 and 2. Future studies should strive to meet these latter two criteria, although they are difficult to achieve in practice: many of our participants made spontaneous comments about their experiences of CATZ during data collection that would compromise blind testing, and teachers were reluctant to “waste time” on an attention-only placebo. Indeed, most teachers agreed to take part in our studies only if there was a “proper” intervention. Wait-list control methods offer a solution, but they are more disruptive and require extra time that schools are often unable to provide. Our finding from study 3 that CATZ had positive effects on beliefs that were not evident among children who experienced a direct attempt to change those beliefs via a lesson may help convince more schools that CATZ is a “proper” intervention. It must also be noted that different variables were assessed in the three studies, so some of the effects found are based on relatively small samples.

While seeking to explain the oft-found discrepancy between attitudes and behaviour, Ajzen (1991) proposed the theory of planned behaviour, a modified version of an earlier model, the theory of reasoned action (Ajzen & Fishbein, 1980). Researchers have largely employed the theory to investigate the impact of motivational factors on intentions to act and on behaviour per se. According to the theory, actors’ behavioural intentions are the most immediate predictors of behaviour (Ajzen & Fishbein, 1980), and attitude towards the behaviour, perceived subjective norm, and perceived control over the behaviour also play a role (Ajzen, 1991). While a strong case can be made for studying the kinds of beliefs/knowledge variables included in our three studies, not least because they are thought to influence actual bullying-related behaviour (Boulton et al., 2002; Boulton et al., 2001; Boulton et al., 2002; Salmivalli & Voeten, 2004), the fact that we did not assess the effects of CATZ on any behavioural measure per se can be considered a limitation. Future research should therefore explore the components of the theory of planned behaviour in the context of the CATZ intervention. For example, if bystanders do not know what to do when they witness bullying, CATZ could be used to promote knowledge of strategies they can use. Similarly, if CATZ can promote positive attitudes towards acting as a supportive bystander, it may in turn strengthen intentions to behave in ways that combat bullying. In addition, our work raises interesting questions about whether changes in the types of cognitions we studied mediate the effects of CATZ on actual behaviour, and these questions provide fruitful avenues for future research.

Our study is also limited by its focus on what might be described as rather “unintentional” beliefs about bullying, as opposed to “intentional” beliefs more directly related to perpetrating bullying and/or intervening in a supportive way, such as “I believe I have a duty to help someone being bullied and I will do so if I see it taking place”. Hence, future studies would do well to extend the range of beliefs that might improve following experience of CATZ.

That our dependent variables were (i) all single items, (ii) not from established measures, and (iii) high in social validity means that some of our results could be attributable to social desirability effects. Future studies would do well to address these limitations. However, the facts that scores were not uniformly high at T1 and that scores improved at T2 and T3 among CATZ but not control participants suggest that it would be a mistake to dismiss the entire set of results reported here as mere artifacts of social desirability.

Another aspect of our study that could limit the generalisability of our results is that the CATZ facilitators, although different in each of the three studies, were all trained by the same person. Hence, it remains possible that some of the positive effects found could be attributable to the facilitators and how they worked with the children rather than to CATZ itself. Future studies involving diverse facilitators trained by other people are clearly warranted to rule out this possibility; at the very least, the current study provides an empirical rationale for such work.

In sum, with very few exceptions, the evidence from three separate studies suggests that CATZ can help students acquire important and diverse bullying-related knowledge. Students also appear very receptive to it, which is not the case for some teacher-led interventions. While more research is needed to understand how its effects may be maximised, how effective it might be with other groups, and how it affects other facets of bullying-related beliefs, our findings support the wider take-up of CATZ as part of schools’ efforts to tackle the pervasive problem of bullying.