1 Introduction

The intensification of agricultural practices from the twentieth century onwards is partly responsible for the reduction of areas covered by grassland ecosystems, notably with a considerable loss of semi-natural grasslands and a decrease in their biodiversity in different regions worldwide (e.g., Fakarayi et al. 2015; Münch et al. 2017; Schirpke et al. 2017; Gibson et al. 2018). Plant species richness (Fig. 1) has also been observed to decrease with the current warming trend (White et al. 2014), especially in climates that are becoming more arid and less productive (Harrison et al. 2015). These changes affect the human population broadly and may have a great socio-economic impact (Dunford et al. 2015). In fact, grassland replacement and biodiversity erosion alter not only the continuity of the forage production supporting livestock agriculture but also the delivery of a broad set of ecosystem services essential to society (Loreau 2010; Bengtsson et al. 2019), such as carbon storage, pollination and the maintenance of the general aesthetic of landscapes (e.g., Oertel et al. 2016; Tribot et al. 2018). These services are related to the plant diversity of grasslands (Turnbull et al. 2016), whose high biodiversity consists not only of plants but also of mammals, arthropods and microorganisms (Plantureaux et al. 2005; Baur et al. 2006; van Klink et al. 2015). This biodiversity is recognized as an ecological and evolutionary insurance (after Yachi and Loreau 1999) thanks to the stabilizing effect of species diversity on aggregate ecosystem properties through fluctuations of component species (e.g., phenotypic changes, Norberg et al. 2001). Different components of plant diversity (e.g., species richness, functional diversity, assemblage structures) would also make grasslands more resilient to hazards and extreme weather events (such as prolonged droughts, e.g., Vogel et al. 2012; Craven et al. 2016) and would be able to stabilize forage production and maintain overall ecosystem services (Cleland 2011). It is thus essential to preserve these open spaces in order to preserve their biodiversity and the associated services, but also to study them to better appreciate their evolution under different constraints (Zeller et al. 2017). In hay meadows, which typically occur where the environmental constraints are less severe than in high-elevation pastures, management practices and their intensity tend to be the main drivers of plant diversity (Pittarello et al. 2020), whose changes reflect the evolution of both environmental conditions (pedo-climate) and management practices (Pontes et al. 2015). While the effects of increased temperature on grassland production are systematically studied and understood (e.g., Parton et al. 1995; Song et al. 2019), understanding the effects of warming on plant diversity is an evolving and multifaceted challenge (Cowles et al. 2018). This is because temperature changes are dynamic and their effects on grassland communities depend on a number of other factors like moisture and nutrient availability (e.g., Zavaleta et al. 2003). Likewise, the effects of cutting events on the botanical composition of a sward are related to environmental conditions (e.g., Wen and Jiang 2005).

Fig. 1

Wildflowers and grasses in a species-rich grassland. Photograph by Célia Pouget (PhD student at Université Clermont Auvergne, INRAE, VetAgro Sup, UREP, Clermont-Ferrand, France in 2018–2021).

Studies that have reported the response of grassland plant diversity to climate and management conditions (e.g., Su et al. 2019) indicate that the pattern of responses is complex and needs additional analyses based on quantitative assessments. An objective assessment is increasingly important as grasslands continue to be vulnerable to warming conditions (e.g., Gao et al. 2018), and halting grassland abandonment is an emerging topic of interest (e.g., Lasanta et al. 2017), especially in mountain regions (Haddaway et al. 2014). The proportion of grassland plant species tends to decline following abandonment (Riedener et al. 2014), and plant species decline due to abandonment could not easily be reversed (and grasslands restored) by mowing alone (Stampfli and Zeiter 1999). However, the variability in the reported results is also likely due to the different challenges associated with the quantification of impacts on plant diversity. In particular, there is no standardized way of designing experiments and setting up the control versus the experimental dataset (Christie et al. 2019). In the wake of diverse findings and conclusions, and because of the availability of an increasing number of peer-reviewed publications as well as the maturity of the results, there are scientific questions relevant to the issue of plant diversity modifications, e.g., is warming or mowing modifying species richness and, if yes, by what amount and under which conditions? We performed two meta-analyses using species richness as an indicator of plant diversity conditions. In fact, despite the growing knowledge about grassland modifications induced by temperature increase and mowing regime, quantitative assessments and analyses are still limited (e.g., Tälle et al. 2016; Gruner et al. 2017). Here, we provide a conceptual framework (Fig. 2) of the direct and indirect effects of mowing (one cut per year versus abandonment) and climate change (warming) on the grassland ecosystem (after Li et al. 2018), using harvested biomass and species richness as expressions of functioning and stability (e.g., species richness can promote community stability through increases in asynchronous dynamics across species; Zhang et al. 2018). We highlight that the type of inference presented in Fig. 2 (which represents a simplified view of the grassland ecosystem) depends on the extent to which the meta-analysis can establish causality between the outcomes of interest and the hypothesized related factors. This means that, for only a subset of the above questions, it may be possible to find enough consistency in the state-of-the-art literature to code the bibliographic data and develop meta-analyses of the extracted data. Specific objectives were to analyse (1) the mean effect of mowing (first meta-analysis: one mowing event per year versus abandonment) and (2) the mean effect of warming (second meta-analysis: warming versus ambient temperature), both conducted on species richness in grasslands (and concomitant harvested biomass when available). In this way, we have pursued standardized meta-analyses to review fragmented results in a common framework. For the impact of mowing on plant diversity worldwide, our study complements previous reports from Tälle et al. (2018) on the effects of different mowing frequencies on the conservation value of semi-natural grasslands in Europe. It also complements the meta-analysis of Gruner et al. (2017) on the effect of warming on the biodiversity of different ecosystems, including terrestrial plant ecosystems.

Fig. 2

Conceptual framework of this study. Direct (blue arrows) and indirect (red arrows) effects of climate change (i.e., warming) and management (i.e., mowing) jointly determine (coloured hatched line) the functioning (expressed by harvested biomass) and stability (expressed by species diversity) of grassland ecosystems, as mediated by plant growth and community properties.

2 Materials and methods

2.1 Literature search method

Our meta-analysis method quantitatively combines and summarizes research results across individual and independent studies performed worldwide and published in peer-reviewed journals (grey literature was not included in our meta-analyses). The first step was to find all the pertinent articles on the topic. We used a keyword search and expert recommendations to find the related articles in two international bibliographic databases. The literature search was initiated using the ISI Web of Science (WoS, http://apps.webofknowledge.com) with the following topic search terms:

  • TI=(grassland OR meadow OR pasture OR pampa OR steppe OR prairie OR savanna OR tundra) AND TS=(diversity OR diverse OR richness OR evenness OR cover OR abundance AND plant OR "functional type*") AND TI=(cut OR mow OR clip OR treatment OR management) NOT TI=(forest OR tree OR shrub*)

  • TI=(temperature* OR warm* OR air OR heat* OR stress* OR "extreme temperature") AND TI=(grassland* OR meadow* OR pasture* OR pampa* OR steppe* OR prairie* OR savanna* OR tundra*) NOT TI=(forest* OR tree* OR shrub*) AND TS=(diversity* OR diverse* OR richness OR evenness OR cover OR abundance* AND plant* OR "functional type*")

Searches were also undertaken with Scopus (http://www.scopus.com) in order to pick up publications that were not indexed in the WoS database:

  • TITLE (grassland OR meadow OR pasture OR pampa OR steppe OR prairie OR savanna OR tundra) AND TITLE-ABS-KEY (diversity OR diverse OR richness OR evenness OR cover OR abundance AND plant OR "functional type*") AND TITLE (cut OR mow OR clip OR treatment OR management) AND NOT TITLE (forest OR tree OR shrub*) AND LANGUAGE (English) AND DOCTYPE (ar).

  • (TITLE (temperature* OR warm* OR air OR heat* OR stress* OR "extreme temperature") AND TITLE (grassland* OR meadow* OR pasture* OR pampa* OR steppe* OR prairie* OR savanna* OR tundra*) AND NOT TITLE (forest* OR tree* OR shrub*) AND TITLE-ABS-KEY (diversity* OR diverse* OR richness OR evenness OR cover OR abundance* AND plant* OR "functional type*") AND LANGUAGE (english)) AND DOCTYPE (ar OR re) AND PUBYEAR > 1984 AND PUBYEAR < 2021 AND (LIMIT-TO (SUBJAREA, "AGRI") OR LIMIT-TO (SUBJAREA, "ENVI") OR LIMIT-TO (SUBJAREA, "MATE") OR LIMIT-TO (SUBJAREA, "EART") OR LIMIT-TO (SUBJAREA, "BIOC") OR LIMIT-TO (SUBJAREA, "MULT"))

This review covers articles published from 1985 to 2020. The cut-off date for data collection was 31 December 2019, which ensured the inclusion of articles dated 2020 that had been published online in 2019. We also added other pertinent articles from peer-reviewed journals that we were aware of. In particular, for the effect of warming, we used part of the bibliography of a meta-analysis made by Gruner et al. (2017).

2.2 Inclusion criteria and data extraction

Care was taken to standardize and document the process of data extraction. The quantitative review followed a structured protocol, which included pre-setting the objectives, the inclusion criteria for studies, the approach for data collection, and the analyses to be done (Pullin and Stewart 2006). To facilitate the capture and organization of records, and the elimination of duplicates, from the electronic WoS and Scopus database searches, bibliographic records were imported into the EndNote reference manager (https://endnote.com) and exported in BibTeX format (Lorenzetti and Ghali 2013). Data extracted from articles were recorded on carefully designed spreadsheets and accompanying tables with details of the study characteristics, data quality, relevant outcomes, level of replication and variability measures.

Using PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses; Liberati et al. 2009) diagrams depicting the flow of information through the different phases of the literature review, we mapped out the number of records identified, included and excluded, and the reasons for exclusions. Each included article had at least one experiment with a case-control design. The control was defined as being identical to the experimental treatment (case) with regard to all variables apart from the type of factor applied. Here, the mowing and warming experiments included abandonment (no mowing) and ambient temperature (no warming) as controls, respectively. The articles from the literature search were filtered by title and abstract, discarding obviously irrelevant studies (e.g., when species richness referred to organisms other than grassland plant species). After the examination of abstracts, the full text of the remaining articles was examined in detail. Articles that quantitatively reported effects of mowing (or clipping or cutting) or warming on species richness (SR as species conservation metric) were selected. When available, concomitant harvested biomass (HB as provisioning service metric, g DM m-2) determinations were also considered in the analyses. Articles had to contain data in the form of experimental determinations together with a measure of variation (e.g., means and variance). Articles with unreported outcomes (e.g., no species richness available), ineligible experimental design (e.g., lack of control) and missing essential statistics (e.g., standard deviations or related variability metrics) were discarded. In our meta-analyses, experiments with and without fertilisation were pooled. We also took into account only the effect of mowing (one cut per year) even if there was a previous grazing period. Mowing once per year is the most commonly used mowing frequency in species-rich grasslands (e.g., Hejcman et al. 2013) and was used as a treatment in all included experiments regardless of the timing of the mowing event during the year. Articles comparing more frequent cuts during the same year were excluded from the meta-analysis, as these comparisons (often using once-a-year mowing, not abandonment, as control) were outside the scope of the present study, but their results were used as complementary elements to improve the discussion of our results.

For the articles that met the inclusion criteria, the sample size, mean and standard deviation (sd) of the response variables were extracted (or calculated where a variability measure other than sd was provided, e.g., standard error). With sample data collected at different dates, mean and sd were used as practical descriptors of time-series central tendency and spread. Critical appraisals were performed by two authors independently, i.e., the above data were extracted and ~ 50% of the extracted data were randomly cross-checked by another author. In case of disagreement on data extraction, a consensus was reached through discussion among all authors. As some studies had not reported the exact values for relevant variables and experimental design details, more than 10 disagreements on the most appropriate inference for these missing data were discussed within the team.

2.3 Effect sizes

The goal of any meta-analysis is to provide an outcome estimate (or overall effect size) that is representative of all study-level findings. Effect sizes were characterized by the response ratio (RR), which is frequently used to quantify the proportion of changes due to experimental manipulations and thus provides a measure of the experimental effects (Hedges et al. 1999; Nakagawa and Santos 2012). This is calculated as the ratio of the average values of a treatment (\({\overline{X}}_T\)) and its control (\({\overline{X}}_C\)). Then, log-response ratio (LRR) values, \({\ln}\left(\frac{{\overline{X}}_T}{{\overline{X}}_C}\right)\), are calculated, as these are the effect sizes used in ecological meta-analyses, primarily because they tend to be normally distributed around zero for small samples. This means that an effect size with a value of zero represents no difference between the groups being compared (treatment vs. control). Meta-analysis, by pooling LRR values from several studies, also assigns a weight to each LRR that is inversely proportional to its sampling variance, equal to \({\operatorname{var}}(LRR)=\frac{{\left({sd}_T\right)}^2}{N_T{{\overline{X}}_T}^2}+\frac{{\left({sd}_C\right)}^2}{N_C{{\overline{X}}_C}^2}\), where sd and N are the standard deviation and sample size associated with \({\overline{X}}_T\) and \({\overline{X}}_C\), respectively (e.g., Lajeunesse 2011). The percent change (%) in the level of the outcome from baseline to the treatment is 100·[exp(LRR)-1].
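The three quantities defined in this section can be illustrated with a minimal Python sketch. The function names and the example plot data (means, standard deviations and sample sizes) are hypothetical and serve only to show the arithmetic; they are not taken from any study in the review.

```python
import math

def log_response_ratio(mean_t, mean_c):
    """LRR = ln(mean_T / mean_C); both means must be positive."""
    return math.log(mean_t / mean_c)

def lrr_variance(mean_t, sd_t, n_t, mean_c, sd_c, n_c):
    """Sampling variance of the LRR, following the formula in Section 2.3."""
    return sd_t ** 2 / (n_t * mean_t ** 2) + sd_c ** 2 / (n_c * mean_c ** 2)

def percent_change(lrr):
    """Percent change from control to treatment: 100 * [exp(LRR) - 1]."""
    return 100.0 * (math.exp(lrr) - 1.0)

# Hypothetical experiment: 11 species in mown plots vs 10 in abandoned plots,
# sd = 2 species and N = 5 plots in each group.
lrr = log_response_ratio(11.0, 10.0)
var = lrr_variance(11.0, 2.0, 5, 10.0, 2.0, 5)
weight = 1.0 / var  # inverse-variance weight used when pooling

print(f"LRR = {lrr:.4f}, var = {var:.5f}, weight = {weight:.1f}")
print(f"percent change = {percent_change(lrr):.1f}%")
```

With these formulas, an LRR of 0.28 corresponds to an increase of about 32%.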

2.4 Meta-analysis models

To perform the two meta-analyses, we used the set of dedicated functions of the metafor package (Viechtbauer 2010), implemented within the statistical software RStudio (https://www.rstudio.com) for R version 3.5.3 x64. Meta-analysis models determine whether an effect (y) is significant or not in a given experiment (i). In mathematical form, this is expressed as \(y_i = \theta_i + e_i\), where \(\theta_i\) and \(e_i\) indicate the unknown true effect and the sampling error (with known sampling variance), respectively. Once effect sizes are extracted from the primary studies, they are pooled by applying a fixed- or random-effects model. A random-effects model was used in our meta-analysis, because the fixed-effects model assumes that there is only one underlying population effect size and that the observed effect sizes deviate from this population effect only because of sampling variation (an unrealistic scenario given the heterogeneity expected among population effect sizes). A random-effects model assumes that each study has its own population effect, i.e., effect sizes vary due to sampling variation and also due to systematic differences among studies. In this model, not only is the combined effect size estimated, but also the variance of the overall effect among studies. The mixed-effects model was also applied to explain heterogeneity in the data with the use of moderators (covariates). In this case, it is a challenge for the meta-analyst to find moderating variables (moderators) that explain the variation in effect sizes among studies. Mixed-effects analyses were only conducted if at least half of the studies reported information on moderators.
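To make the difference between fixed- and random-effects pooling concrete, the following Python sketch pools a set of effect sizes with inverse-variance weights. It is an illustration only, not the metafor code used in this study: it assumes the DerSimonian-Laird moment estimator for the between-study variance (the estimator used in the actual analyses is not stated here), and the input values are hypothetical.

```python
import math

def random_effects_dl(effects, variances):
    """Random-effects pooling with the DerSimonian-Laird estimator (assumed here)."""
    k = len(effects)
    w = [1.0 / v for v in variances]            # fixed-effects weights
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sw
    # Cochran's Q: weighted squared deviations from the fixed-effects mean
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - (k - 1)) / c)          # between-study variance
    w_re = [1.0 / (v + tau2) for v in variances]  # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_re, effects)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))
    ci = (pooled - 1.96 * se, pooled + 1.96 * se)
    return {"fixed": fixed, "pooled": pooled, "se": se,
            "tau2": tau2, "q": q, "ci": ci}

# Hypothetical LRR values and sampling variances from three experiments
result = random_effects_dl([0.1, 0.3, 0.5], [0.01, 0.01, 0.01])
print(result)
```

When the estimated between-study variance is zero, the random-effects weights reduce to the fixed-effects weights and both models return the same pooled estimate.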

The Q-statistic (or multiple significance testing across means; weighted squared deviations) was used to evaluate heterogeneity through 0 < I2 < 100, which quantifies the proportion of total variability that is due to heterogeneity rather than sample variations: I2 > 75% means high heterogeneity; values between 50 and 75% are considered as moderate heterogeneity; if the I2 is between 25 and 50%, it is considered as low heterogeneity; below 25%, it is considered as no heterogeneity (e.g., Gianfredi et al. 2019). When p-values for the Q-test and effect sizes (random-effect model) were less than 0.1, homogeneity and no-effect assumptions were considered invalid. After quantifying variation among effect sizes beyond sampling variation (I2), we examined the effects of moderators (covariates) that might explain this additional variation. The significance of moderators was tested using the probability (P) of an omnibus test (i.e., the Qm statistic). For that, in addition to SR and HB determinations, we recorded information on moderators that may affect the response variables - from the k articles (kn) and experiments (j) for which this information was available. Plot size (S, m2), duration of the study (D, number of years), year of publication of articles (Y), site elevation (E, m a.s.l.) and two site-specific climatic variables (mean annual air temperature: T, °C; mean annual precipitation total: R, mm) were chosen as moderators of the SR and HB responses in the mixed-effects model. Year of publication of the studies can be a potential source of bias because changes in study methods and characteristics occurring over time can correlate to effect sizes (e.g., Jennions and Møller 2002). As well, since vegetation within smaller plots tends to be more homogeneous than within larger plots, plot size may influence the number of recorded species and the estimate of SR (Chytrý 2001). 
Some authors also found that as the duration of the study increased so did the plant species diversity and productivity (e.g., Cardinale et al. 2007; Pallett et al. 2016). Then, plant community composition can change along elevation gradients (e.g., Ohdo and Takahashi 2020), with global warming pushing species towards higher elevations (e.g., Engler et al. 2009), and temperature and rainfall are the climatic variables most used to explore the relationships between climate and plant community data (e.g., Harrison et al. 2020). For the effect of mowing, the cutting height (H, cm) was also considered because it can affect the community characteristics and biomass production (e.g. Wan et al. 2016). The temperature difference between control and warming treatments (ΔT, °C) was instead used as moderator in the meta-analysis of the effects of warming because divergent effects may be due to different warming treatments among experiments and different temperature sensitivities of various plant species (e.g., Llorens et al. 2004). As well, the heating technique (M, 0: open-top chambers; 1: heaters) used in vegetation warming experiments (a categorical moderator) may lead to potential differences in plant responses for: (i) the different control on temperatures of passive (e.g., open-top chambers) and active (e.g., infrared heating) warming methods (e.g., De Boeck and Nijs 2011), and (ii) the size of the device, open-top chambers used in field experiments being generally relatively small (i.e. ≤ 3 m in diameter), allowing the establishment of a well-controlled and essentially homogeneous environment (e.g., Cunningham et al. 2013).
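The heterogeneity classes given at the start of this section can be sketched in Python as follows. The sketch assumes the common Q-based definition I² = 100·(Q − df)/Q with df = j − 1; software such as metafor may instead derive I² from the estimated between-study variance, so values can differ slightly from those reported. The function names and the example inputs are hypothetical.

```python
def i_squared(q, n_effects):
    """I^2 (%) from Cochran's Q and the number of effect sizes (Q-based definition)."""
    df = n_effects - 1
    if q <= 0:
        return 0.0
    return max(0.0, (q - df) / q) * 100.0

def heterogeneity_class(i2):
    """Thresholds as described in the text: >75 high, 50-75 moderate, 25-50 low."""
    if i2 > 75:
        return "high"
    if i2 >= 50:
        return "moderate"
    if i2 >= 25:
        return "low"
    return "none"

# Hypothetical example: Q = 8 across 3 experiments
i2 = i_squared(8.0, 3)
print(i2, heterogeneity_class(i2))
```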

2.5 Potential data analysis bias and results

Possible publication biases were assessed visually by means of funnel plots, which show the observed effect sizes on the x-axis against a measure of their precision (standard error) on the y-axis, and statistically by means of the test for funnel plot asymmetry (Egger et al. 1997). The results of the meta-analyses were displayed in forest plots for each outcome, where individual experiments were plotted sequentially on the y-axis. The x-axis shows outcome measures (log-response ratio and 0.95 confidence interval for each study). Point estimates are represented by square boxes, where the weight of a study is reflected by the size of the square. The point estimates are accompanied by a line, which represents their associated 0.95 confidence interval. A vertical midline (line-of-no-effect) divides the diagram into two parts. A confidence interval that crosses the line-of-no-effect indicates a statistically non-significant difference, whereas a confidence interval that does not cross the midline indicates a significant difference in favour of either the treatment or the control, depending on the side of the midline on which it is located. That is, for our two outcomes of interest, SR and HB, estimates to the right of the midline (LRR > 0) indicate higher values in the treatment than in the control, and estimates to the left (LRR < 0) indicate lower values.
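The reading rule for forest plots described above can be written as a short Python sketch. The function names are hypothetical, and the normal-approximation interval (estimate ± 1.96 standard errors) is an assumption made for illustration; the first example uses the pooled interval reported for the effect of mowing on SR (0.19 to 0.37), the second uses invented values.

```python
def confidence_interval(lrr, se, z=1.96):
    """Approximate 0.95 confidence interval around a log-response ratio."""
    return (lrr - z * se, lrr + z * se)

def interpret_interval(lower, upper):
    """Classify an interval relative to the line-of-no-effect (LRR = 0)."""
    if lower > 0:
        return "treatment higher than control"
    if upper < 0:
        return "treatment lower than control"
    return "no significant difference"

# Pooled effect of mowing on SR: interval entirely to the right of zero
print(interpret_interval(0.19, 0.37))

# Hypothetical single experiment whose interval crosses zero
low, high = confidence_interval(-0.05, 0.10)
print(interpret_interval(low, high))
```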

3 Results and discussion

3.1 Literature search

The heuristic search of the state-of-the-art literature in the WoS and Scopus bibliometric databases yielded 999 articles for the effects of mowing (Fig. 3a) and 1793 articles for the effect of warming (Fig. 3b), after removing 467 and 411 duplicates from the original set of 1466 and 2204 records (with pairwise observations in the control and treatments), respectively. The two sets of articles were reviewed, and initially screened, for their relevance to the study topic. After applying the criteria to the original set of articles and adding 31 articles from other sources, 43 and 34 articles (46 and 42 experiments, respectively) met the criteria and were selected to quantify the effects on SR of mowing or warming, respectively (Supplementary material). In 16 articles for mowing (18 experiments) and 17 articles for warming (22 experiments), the same analysis was performed to assess the effect of the same factors on HB.

Fig. 3

PRISMA flow diagram of the study selection process on the effect of mowing and warming on species richness (n, number of articles). Some articles included more than one experiment and, in this case, these experiments (j) were considered as separate experiments (j = 46 with mowing, j = 42 with warming). Subsets of the identified records also included the effect of mowing (16 articles, 18 experiments) or warming (17 articles, 22 experiments) on harvested biomass.

Table S1 and Table S2 show the characteristics of the articles included in the meta-analysis on the effects of mowing and warming, respectively. The current literature did not provide a robust sample of articles and quantitative results corresponding to different subclasses (e.g., abandonment versus management with two, three, etc. mowing events associated with fertiliser supply gradients; warming under gradients of atmospheric CO2 concentration and water status levels). The included studies report on grassland research conducted in 46 mowing experiments in 18 countries from Asia, Europe, North America and Oceania, and 42 warming experiments in nine countries from Asia, Europe and North America (Fig. 4). Using the Köppen-Geiger climate classification (Peel et al. 2007), our research shows an uneven geographical distribution of the selected studies for the effect of mowing (Table S1), with most articles focusing on temperate-oceanic (44%) and warm- or hot-summer continental (37%) climate zones of the northern hemisphere (with the exception of one study from the southern hemisphere in the temperate-oceanic climatic zone of Australia). Studies from cold (12%), Mediterranean (2%), and subtropical (5%) areas remain rare. Climate zones only in part reflect the distinctive characteristics of grassland systems, which varied widely in environmental conditions, mowing regimes and experimental settings. All recorded articles on the effect of warming document studies carried out on grasslands in the northern hemisphere (Table S2): 15 in China, 10 in the USA, and nine in central and northern Europe. The largest share of them (41%) are from regions with ice cap and tundra climates, showing that manipulation studies focusing on the effects of warming on grassland systems are not gaining interest in the Mediterranean and developing regions of the world. 
They are all unfertilised treatments and include two main devices to simulate the experimental climate warming and to study plant responses, i.e., open-top chambers and infrared heaters. As with articles on the effects of mowing, the types and designs varied considerably also within the same study.

Fig. 4

Global map of study sites that provided data for meta-analysis of the effects of mowing (red triangles) or warming (blue dots) on species richness only (empty markers) or on species richness and harvested biomass (solid markers).

3.2 Potential data analysis bias

The statistical distributions of LRR values were determined to be nearly normal according to quantile plots (Fig. S1).

For mowing, high heterogeneity was found for SR (I2 = 92%; Q = 499, p < 0.01) and moderate heterogeneity for HB (I2 = 66%; Q = 55, p < 0.01). However, no evidence of publication bias was found in our meta-analysis for the effect of mowing on SR and HB that would reflect bias toward not reporting small positive or negative effect sizes, as demonstrated by the substantial symmetry of the funnel plots (SR: z = -1.06, p > 0.10; HB: z = -2.00, p ≈ 0.05). The points falling outside the funnels (Fig. S2, top graphs) are located on both sides, hence indicating no clear-cut direction in the bias. For SR, Fig. S2 (left) shows that the majority of the data are clustered in one point cloud (same order of magnitude), with the exception of the study of Lanta et al. (2009), whose high variability is also visible in the forest plot (Fig. 5a). For warming, significant results with both SR (I2 = 92%; Q = 815, p < 0.01) and HB (I2 = 55%; Q = 60, p < 0.01) are taken as evidence of heterogeneity. The overall funnel plots are, however, relatively symmetric (Fig. S2, bottom graphs) and consistent with a low likelihood of publication bias (SR: z = 1.40, p > 0.10; HB: z = -0.94, p > 0.10).

Fig. 5

Forest plots of the meta-analysis (log-response ratios and 0.95 confidence limits) comparing species richness, SR (a) and harvested biomass, HB, in g DM m-2 (b) in unmown (0, control) and once-a-year mown (1, treatment) grasslands, with the relative standard deviations (sd). RE model stands for random-effects model.

3.3 Effect of mowing on SR and HB

A forest plot for all 43 recorded articles combined (46 experiments) indicates a significantly positive effect of mowing (one cutting event per year) on SR compared to abandonment (Fig. 5a): pooled LRR = 0.28 (c. 32% increase), 0.95 confidence interval from 0.19 to 0.37 (p < 0.01). There are, however, three studies that showed an opposite effect. This was distinctly observed in Finnish meadow patches (LRR = -0.50, Huhta and Rautio 1998), where an increase in SR due to a successional change may have only been apparent, plausibly related to short-term effects and creating (according to the authors) the illusion that abandonment is more desirable than management. In fact, early succession was characterized by a transient loss of plant species diversity (Velbert et al. 2017) in wet meadows of north-west Germany (LRR = 0.22). While on the Qinghai-Tibetan plateau (Xu et al. 2015) SR was observed not to be sensitive to the short-term effects of mowing (LRR = -0.02), in a mesic hay meadow of Western Hungary (near the Slovenian border), Szépligeti et al. (2018) noted that mowing once a year may not efficiently prevent (LRR = -0.15) the spread of tall goldenrod (Solidago gigantea Ait.) or control native competitive species (which hinder the growth of rare and less competitive species).

Without including the unmanaged option in their analysis, Tälle et al. (2018) observed small differences in the effects of different mowing intensities on the SR of European semi-natural grasslands (LRR < 0.13, with 0.1 representing the difference between a SR of two communities consisting of 10 and 11 species, respectively). The authors highlighted that while lower and higher mowing frequency can be expected to have both positive and negative effects on plant diversity at the same time, they concurred with other authors (e.g., Batáry et al. 2010; Tóth et al. 2018) that any kind of management which is actually applied tends to be more important than the intensity of the management itself. We show that the difference can indeed be high when moving from abandoned fields to once-a-year mowing. The highest estimated mean effect size of LRR = 1.53 (wet experiment from Truus and Puusild 2009), in particular, reflects the substantial decrease in SR on long-abandoned floodplain grasslands, which is likely a consequence of increased light competition and the accumulation of dense litter layers, as several low-growing plant species are outcompeted by strong competitors during succession or germination and establishment are inhibited by litter layers.

In addition to the results of our meta-analysis, some results by individual studies were also informative on the effect of alternative mowing schemes on SR. Figure 6 shows the changes in the SR of grassland plants under combinations of mowing frequency beyond one cut per year versus no cut, which were identified in the systematic review and in additional sources (section “References of the review on the effect of different mowing regimes”), and not included in the meta-analysis. Overall, it appears that a moderate mowing intensity of one or two cuts per year is positive for maintaining or enhancing a high plant SR. With two cuts per year over abandonment, we observe a similar mean response (LRR = 0.28) but greater variability across studies compared to just one cut (that is the core of this meta-analysis), likely associated with the influence of varying situations of soil fertility. In fact, the more a grassland is harvested, the more intensively it must be fertilised to remain productive. Significant interactions are often observed between cutting and fertilisation treatments, with maximum counts detected in unfertilized plots and the total number of plant species decreasing along with increasing fertilization rates (e.g., Hejcman et al. 2007). The addition of nitrogen, in particular, significantly influences the increase of plant biomass and height, leading to a decrease in species diversity (Tilman 1987; Silvertown et al. 2006). Then, the potential benefits of mowing on SR are progressively lost with more frequent cuts (i.e., three to four cutting events per year compared to one cut). It is known that regular disturbance by mowing can trigger niche partitioning, leading to higher species diversity (e.g., Mason et al. 2011), but too frequent harvests may threaten the long-term survival of certain plant species (e.g., Loydi et al. 2013) by suppressing their seed stock. 
Our results can be interpreted in terms of the humped-back model (Huston 1979), a dynamic equilibrium model predicting that taxonomic richness may be greatest at intermediate biomass production and at intermediate levels of available resources (stress) and disturbance factors (Pierce 2014). In fact, a hump-shaped relationship between vegetation biomass and SR, based on the balance between competition and abiotic stress, has been found in a large number of case studies (van Klink et al. 2017), with SR likely peaking at intermediate productivity levels (Boch et al. 2019). Consistent with the pattern predicted by the intermediate disturbance hypothesis, SR may be maintained by extensive agricultural practices (Uchida and Ushimaru 2014). By removing plant biomass, both mowers and grazers increase ground-level light availability and alleviate understory light limitation, thereby playing an important role in maintaining plant diversity in grassland ecosystems (Borer et al. 2014).

Fig. 6

Log-response ratios (LRR) and 0.95 confidence bars comparing species richness for different mowing regimes (number of cuts per year). The number of studies for these data is given in brackets (to the left). For LRR, the values of the mean and standard deviation are on the right-hand side of the plot. The reference articles are listed in the section “References of the review on the effect of different mowing regimes.”

However, a mowing frequency that only marginally affects plant diversity measures like SR might still affect the species composition of a grassland and, considering that mowing is costly, it is important to find a balance between mowing frequency and conservation benefits beyond SR (Tälle et al. 2018). The most suitable mowing frequency can be highly site-specific because the mechanisms linking mowing to conservation value are complex, and there is often no need or no resource for a second cut (beneficial for the feeding of herbivores), or weather conditions may make hay making difficult in autumn (Szépligeti et al. 2018). The level of detail of the present study, which aims at assessing the overall SR, does not allow us to address the richness (and abundance) of plant species of nature conservation interest (which would be a more valuable indicator than the overall richness). Studying the effects of disturbances requires measures of species abundance, rather than just their presence, and an experimental approach to complete the understanding of the mechanisms involved (e.g., Debussche et al. 1996). For instance, it is possible that the abundance of each plant species decreases or the plant species turnover increases while the SR remains the same. In this case, different results would be expected when assessing biodiversity outcomes with measures that take species abundance into account, e.g., Shannon diversity, which is calculated from the proportional abundance of each species (Milberg et al. 2017).
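The distinction between richness and abundance-based diversity can be illustrated with a minimal sketch. The two communities below are hypothetical abundance counts, not data from any included study, and the helper names are illustrative. Both communities have the same SR, yet their Shannon diversity differs markedly because one is dominated by a single species.

```python
import math

def species_richness(abundances):
    """Number of species with non-zero abundance."""
    return sum(1 for a in abundances if a > 0)

def shannon_index(abundances):
    """Shannon diversity H' = -sum(p_i * ln p_i) over proportional abundances."""
    total = sum(abundances)
    proportions = [a / total for a in abundances if a > 0]
    return -sum(p * math.log(p) for p in proportions)

# Hypothetical communities: same four species, different evenness
even = [25, 25, 25, 25]
dominated = [97, 1, 1, 1]

print(species_richness(even), species_richness(dominated))  # 4 4
print(round(shannon_index(even), 3))       # 1.386
print(round(shannon_index(dominated), 3))  # 0.168
```

A treatment that shifts a sward from the first community towards the second would leave SR unchanged while strongly reducing Shannon diversity, which is why an SR-only analysis can miss compositional change.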

A possibly important factor not taken into account in the present study is the timing of mowing during the year (either this information was not available for some included studies or including this factor would have created subgroups that were too small). In fact, the effects can be different depending on whether the harvest occurs early or late in the growing season. Early mowing can have negative effects on plant species with late seed-setting. In combination with more frequent harvesting, this can affect the ability of species to regrow (e.g., Humbert et al. 2012). In addition, the two American studies of the review (Dickson and Foster 2008; Foster et al. 2009), in which fertilisers were used during the study period, were also combined in the meta-analysis.

In the study of Lanta et al. (2009), the estimate of LRR = 0.14 was obtained with a wide confidence interval (from -1.58 to 1.86), likely due to the wide variation in the original dataset. We also note that five experiments showed effects that are about three- to five-fold higher (LRR from 0.71 to 1.53) than the average. Fenner and Palmer (1998) in Belgium (LRR = 0.92) and Jacquemyn et al. (2011) in the United Kingdom (LRR = 0.71) noticed that several small herbs and rosette plants were quickly lost in abandoned plots, with mowing reducing the proportion of tall-growing plants and increasing light penetration to the ground surface. Like Truus and Puusild (2009), with LRR = 1.53 (wet experiment), Metsoja et al. (2014), with LRR = 1.17 (tall forb meadow), and Neuenkamp et al. (2013), with LRR = 1.14 (tall forb meadow), observed that mowing had a distinct role in activating the soil seed bank in Estonian flooded, well-drained meadows dominated by tall forb communities. These are highly productive communities (e.g., ~ 1000 g m-2 in Neuenkamp et al. 2013), where plant SR is determined primarily by light and litter rather than nutrient availability.

In contrast to SR, over the 16 independent studies (18 experiments) on the effect of mowing on HB (Fig. 5b), the pooled LRR of -0.23, or c. -21% (0.95 confidence interval from -0.31 to -0.14, p < 0.01), suggests an overall negative influence of disturbance. In the included studies, mowing (which had a positive effect on SR) had a distinctly negative effect on HB. Although this is undoubtedly a trade-off between a provisioning service (forage production) and biodiversity-mediated ecosystem services (e.g., pollination, pest control, soil fertility and yield stability), there are studies which indicate that vegetation density and biomass production may be reduced in unmanaged treatments because litter accumulated on the sward surface prevents plants from sprouting (as observed, for instance, in the Czech Republic by Pavlů et al. 2016). A stimulating effect of cutting on grassland productivity was also observed by Sasaki et al. (2011) in temperate Japan, which was attributed to over-compensatory growth resulting from changes in floristic composition owing to the mowing treatment. There is indeed a body of literature (as reviewed, for instance, by Sonkoly et al. 2019) showing that HB increases when SR increases, mainly from experiments with grasslands sown along gradients of a limited number of plant species compared to monocultures.
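The percentage quoted alongside the pooled LRR follows from back-transforming the log ratio. The sketch below assumes the usual conversion, percent change = (exp(LRR) - 1) × 100, applied to the pooled estimate and its confidence limits as reported above. The function name is illustrative.

```python
import math

def lrr_to_percent(lrr):
    """Convert a log-response ratio into a percent change relative to control."""
    return (math.exp(lrr) - 1.0) * 100.0

# Pooled effect of mowing on HB and its 0.95 confidence limits, as reported
pooled, lower, upper = -0.23, -0.31, -0.14

print(round(lrr_to_percent(pooled), 1))  # -20.5
print(round(lrr_to_percent(lower), 1))   # -26.7
print(round(lrr_to_percent(upper), 1))   # -13.1
```

Because the transformation is non-linear, the back-transformed interval is not symmetric around the point estimate, which is one reason effect sizes are pooled on the log scale and converted only for interpretation.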

In the mixed-effects model, planned moderators were mostly not significant (p > 0.10). When a grassland is abandoned, changes in SR can be expected as a function of time since abandonment (vegetation succession; e.g., Tasser and Tappeiner 2002), but we could not confirm an effect of the duration of the experiment. Only the year of publication was a significant moderator (p < 0.05) of the effect of mowing on HB (k = 16, j = 18) when this covariate was assessed alone (with more negative LRR values observed in the oldest experiments, i.e., mean LRR of about -0.20 in 2010-2019 and -0.31 in 1993-2009). The covariate explained ~ 33% of the heterogeneity (Table S3), but the effect was not significant (p > 0.05) when different moderators were assessed together.

3.4 Effect of warming on SR and HB

A forest plot for all 34 recorded articles combined (42 experiments) indicates a significantly negative effect of warming (different treatments) on SR compared to control (Fig. 7a): pooled LRR = -0.14 (c. -13%), 0.95 confidence interval from -0.21 to -0.06 (p < 0.01). The decline in SR, observed here for an average temperature increase of 1.8 ± 0.9 °C (range: 0.15 to 4.10 °C), is consistent with the response of terrestrial ecosystems (-10.5% of SR) as observed by Gruner et al. (2017) for an average warming of 3 °C. It cannot be excluded that short-term simulation of warming, without considering temporal adaptation, has exacerbated the warming effect on terrestrial ecosystems (Leuzinger et al. 2011).
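For readers unfamiliar with how a pooled LRR and its confidence interval arise from individual experiments, the following is a minimal sketch of random-effects pooling. It uses the DerSimonian-Laird estimator of between-study variance, which is one common choice and an assumption here, since this section does not state which estimator was used. The effect sizes and sampling variances are hypothetical, not the data of the meta-analysis.

```python
import math

def random_effects_pool(effects, variances):
    """Pool effect sizes with DerSimonian-Laird random-effects weights.

    Returns (pooled estimate, lower 0.95 limit, upper 0.95 limit, tau^2).
    """
    k = len(effects)
    w = [1.0 / v for v in variances]  # inverse-variance (fixed-effect) weights
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sw
    # Cochran's Q: weighted squared deviations from the fixed-effect mean
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    c = sw - sum(wi ** 2 for wi in w) / sw
    # Between-study variance, truncated at zero
    tau2 = max(0.0, (q - (k - 1)) / c) if c > 0 else 0.0
    w_star = [1.0 / (v + tau2) for v in variances]  # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum(w_star)
    se = math.sqrt(1.0 / sum(w_star))
    return pooled, pooled - 1.96 * se, pooled + 1.96 * se, tau2

# Hypothetical LRR values and sampling variances for five experiments
effects = [-0.30, -0.05, -0.20, 0.10, -0.15]
variances = [0.010, 0.020, 0.015, 0.030, 0.010]

pooled, lower, upper, tau2 = random_effects_pool(effects, variances)
print(f"pooled LRR = {pooled:.3f} [{lower:.3f}, {upper:.3f}], tau^2 = {tau2:.4f}")
```

Adding the between-study variance to each sampling variance flattens the weights, so that large and small experiments contribute more equally than under a fixed-effect model when heterogeneity is high.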

Fig. 7

Forest plots of the meta-analysis (log-response ratios and 0.95 confidence limits) comparing species richness, SR (a) and harvested biomass, HB, in g DM m-2 (b) in ambient (C, control) and warmed (W, treatment) grasslands, with the relative standard deviations (sd). RE stands for random-effects model.

The results of the mixed-effects model showed that SR was marginally moderated by the year of publication (p ~ 0.05) when this moderator, which explained only ~ 9% of the heterogeneity (Table S3), was assessed alone (k = 34, j = 42). We note that more recently published studies were more numerous and yielded larger effect sizes, with only eight studies published prior to 2010 (giving an average LRR of -0.05). This could be due to the widespread use of small, low-cost open-top chambers (passive warming) in climate change experiments, especially on short-statured vegetation like grassland steppe and temperate grasslands (Frei et al. 2020). According to Leuzinger et al. (2011), a diminishing effect size is expected with a longer duration and a larger spatial scope of experiments. In light of this, we would have expected an influence of the experimental methodology on SR/HB responses since infrared heaters (active heating) can be applied to larger plots than open-top chambers. The three experiments of Wang et al. (2017) do indeed indicate that open-top chambers of different sizes could have an impact on the response to warming of both SR (which tends to become even more negative, LRR = -0.87, with a smaller chamber) and HB (which, conversely, tends to become more positive, LRR = 0.40, with a smaller chamber). The duration of experiments could also have had an influence on the grassland response to warming since SR changes slowly (e.g., Galvánek and Lepš 2008), but we have no confirmation of these effects in our study.

Site elevation (p < 0.05) and annual rainfall (p < 0.01) emerged as significant moderators when all moderators were included in the mixed-effects model (p < 0.05; k = 22, j = 30). The latter explained ~ 36% of the heterogeneity (Table S3). We note that smaller effect sizes of warming on SR (smaller plant diversity loss) tend to be associated with dry areas (< 300 mm precipitation per year, with LRR of about -0.01 on average). In fact, the response of SR to warming was observed to be stronger the lower the aridity (e.g., Peñuelas et al. 2007). Similarly, less negative LRR values (i.e., more limited decline in plant diversity) were found for grassland sites below 1000 m a.s.l. (about -0.06 on average), where SR is generally lower (e.g., Dengler et al. 2014). The more pronounced decline of SR in high-elevation grasslands may reflect that plant species adapted to cold areas tend to be more sensitive to warming. It can be assumed that the thermal niche of plant species is narrower at high than at low altitudes, which considerably hinders adaptation/acclimation in the short term (e.g., Löffler and Pape 2020). While changes in species cover and the composition of plant communities indicate an acceleration of the transformation towards more heat-demanding vegetation, this colonisation process could take place at a slower pace than the continued decrease in cryophilic species, thus favouring periods of accelerated species decline (Lamprecht et al. 2018).

Of the few experiments in which LRR > 0 (i.e., increased SR under higher temperatures), the one from Zhu et al. (2015), with LRR = 0.15, is consistent with the situation of a meadow steppe dominated by a perennial rhizome grass species – Leymus chinensis (Trin.) Tzvelev (Chinese rye grass) – which is the first to germinate each year. A higher accumulation of plant community biomass in the warmed plots leads to more plant litter, which suppresses the germination and regrowth of L. chinensis, reducing its dominance and allowing other species (annual forbs) to quickly colonize the plant community. In Eskelinen et al. (2017), warmer climate increased SR (LRR = 0.11) via recruitment in conditions where competition with the residents was relaxed (e.g., in disturbed sites), where herbivores kept vegetation open and in habitats with relatively low nutrient availability.

Over the 17 recorded articles (22 experiments) on the effect of warming on HB (Fig. 7b), the pooled LRR of 0.10, or c. 11% increase (0.95 confidence interval from 0.04 to 0.17), suggests an overall positive influence of increasing temperatures (p < 0.05). This is in accordance with Song et al. (2015), who also showed in an alpine meadow on the Qinghai-Tibetan Plateau (arid and cold area) how warming and mowing combined (a treatment not included in our meta-analysis) negatively affected HB (LRR = -0.08) and positively affected SR (LRR = 0.04), indicating the dominant role of management (which tends to favour SR and limit HB) over an environmental change (which, conversely, is supposed to favour HB and limit SR). Higher biomass production under warming conditions could explain the decline in SR through competitive exclusion, so that all environmental conditions likely to favour high levels of HB could lead to a decline in SR. However, as warming increases evapotranspiration, greater drought conditions could dampen the biomass response, thus reducing competitive exclusion, which favours the stability of SR. In fact, the adverse effects of warming caused by increased heat stress and water scarcity could, in the long term, counteract the positive effect of mowing on SR, especially in temperate grasslands (e.g., De Boeck et al. 2008).

4 Conclusion

Using a meta-analytical methodology, we generated an integrated analysis of a large amount of observation data from different regions of the world, which better reflects the general patterns of grassland response than the fragmented studies performed so far. First, we found higher SR and lower HB in plots that were mown, suggesting the importance of management practices based on the application of disturbances such as prescribed mowing to enhance plant species diversity. Second (and opposite to the first result), we found that HB can be higher in plots that are exposed to higher temperatures while warming tends to decrease the number of plant species. The opposite responses of SR and HB to disturbances in the two meta-analyses suggest possible competitive exclusion mechanisms, which have not been investigated in this study. This is supported by the importance of site elevation (narrow thermal niche preventing plant species from adapting quickly at high altitudes) and annual rainfall (competitive exclusion in humid areas) in explaining the response of SR to warming. However, the present meta-analyses have some limitations. First, SR and HB are ecosystem responses influenced by multiple factors, and there are complex interactions between them that we have not considered due to a lack of adequate data. Second, even if publication bias was substantially avoided, we had no access to unpublished research or to studies published in languages other than English, which may have influenced our results. Despite these limitations, the present meta-analyses provide the latest evidence regarding the positive effect of moderate physical disturbance (i.e., limited mowing) on the creation and maintenance of highly diverse, ecologically and agriculturally valuable grasslands.
In parallel, our results confirm the importance of considering plant species’ response to environmental stresses together with competition when predicting community dynamics under warming scenarios. Further quantitative analysis of these relations may contribute to improving grassland simulation models addressing the dynamics of plant diversity. Overall, we argue for long-term, two-factor warming and mowing experiments that incorporate both SR and HB assessment to guide discussions of how best to meet the relevant goal of improving our understanding of grassland responses to global changes.