Introduction

Metapopulations are dynamic groups of connected, unstable populations. These populations are unlikely to persist independently in the long term without immigration of new individuals from other populations in the group (Levins 1969; Hanski 1998). Despite local extinction events and small population sizes, metapopulations persist due to population connectivity and asynchronous local dynamics (Hanski et al. 1995; Harding and McNamara 2002). Metapopulation dynamics influence microevolutionary processes, including gene flow and genetic drift, which in turn affect population genetic patterns of diversity and differentiation (Pannell and Charlesworth 2000). For example, metapopulations would be expected to exhibit high variation in genetic diversity among populations, generally small effective population sizes that vary among populations, and a high degree of genetic differentiation overall (Hastings and Harrison 1994; Pannell and Charlesworth 2000; Walser and Haag 2012).

Although population genetics is useful for understanding metapopulation dynamics (e.g., Billerman et al. 2019) and even offers some advantages over traditional demographic approaches alone (Lamy et al. 2012), genetic approaches also have limitations. In a classic metapopulation sampled at multiple time points, genetic diversity and effective population size are expected to decrease following population turnover, or the replacement of individuals in a population, and increase with time since extinction (Pannell & Charlesworth 1999; Wang and Caballero 1999). The timing of sample collection has been shown to influence the perceived strength of genetic differentiation among populations (James et al. 2015) and the relationship of population differentiation to the surrounding landscape (Draheim et al. 2018). Because metapopulations are temporally dynamic systems, studies that rely on pooled or single-year data may not capture temporal genetic variation (Fleishman et al. 2002). Nevertheless, one-time sampling can provide important “snapshots” of population genetic diversity and differentiation in metapopulations. For example, one-time genetic sampling of a flowering plant metapopulation provided genetic evidence of recent population bottlenecks, suggesting that local extinction events had recently occurred (Tero et al. 2003). Billerman et al. (2019) used a single sample period of a frog metapopulation to identify asynchronous extinction-recolonization dynamics by quantifying recent and historical genetic bottlenecks in comparison to rates of gene flow and connectivity. However, bottleneck tests alone can be misleading for characterizing extinction events (Peery et al. 2012), and temporally replicated samples may better inform extinction-recolonization dynamics. For example, repeated temporal genetic sampling revealed aestivation rather than extinction, as had been inferred from demographic sampling, in some populations of freshwater snails (Lamy et al. 2012).
Although population genetic and genomic approaches continue to decrease in cost and are increasingly affordable for non-model organisms (Meek and Larson 2019), temporally replicated sampling may remain cost-prohibitive for many studies or may not suit the timeline of funding opportunities and study objectives. Thus, understanding the extent to which one-time sampling may influence our understanding of metapopulation genetic differentiation and diversity can help inform the limits of interpretation and application of research outcomes.

Many pond- and wetland-breeding amphibians are assumed to be metapopulations due to the patchy distribution of their habitats and breeding populations (Marsh and Trenham 2001; Smith and Green 2005). Amphibians are also among the most threatened taxa globally, with many factors exacerbating population declines including habitat alteration and climate change (Foden et al. 2013; Stuart et al. 2004; Reid et al. 2019). Amphibian metapopulation conservation and management recommendations include increasing dispersal pathways (Griffiths et al. 2010), integrating local and regional level conservation efforts (Alford and Richards 1999), and maintaining local habitat quality (Marsh and Trenham 2001). However, understanding both the temporal and spatial patterns of metapopulation connectivity is likely vital to developing effective conservation strategies for amphibians. This is particularly true for amphibian metapopulations in temporally dynamic habitats, where the asynchronous availability of breeding ponds is often assumed to drive extinction-recolonization dynamics (Pechmann et al. 1991; e.g., Lamy et al. 2012).

The Arizona treefrog (Hyla (Dryophytes) wrightorum) is distributed in the Sonoran Desert from the southwestern United States into northeastern Mexico. The Huachuca Mountains-Canelo Hills (HMCH) region of southeastern Arizona hosts populations of the species that are geographically, morphologically, and genetically isolated from the rest of the range (Gergus et al. 2004) and that are included as one of Arizona’s Species of Greatest Conservation Need. The HMCH populations rely primarily on intermittent ponds with spatially and temporally variable availability (i.e., wet period) to complete their life cycle (Gendreau et al. 2021). Environmentally driven stochastic population fluctuations are often associated with amphibian metapopulation dynamics (Marsh & Trenham 2001). For Arizona treefrogs, statistically significant population genetic differentiation and small effective population sizes (Mims et al. 2016) provide further evidence for metapopulation dynamics within these populations. The HMCH region, along with much of the southwestern United States, is undergoing significant climatic changes, such as increased temperatures and more severe droughts (Williams et al. 2022; Kunkel et al. 2013). Amphibians in the region, including the Arizona treefrog, face numerous threats as a result of climate change and other interacting factors, including disease and invasive species (Mims et al. 2020). Simulations suggest that climate-induced reductions in breeding habitat and larval survival could lead to a transition from a metapopulation to a few isolated populations of the Arizona treefrog in the HMCH region, increasing the risk of regional extinction (Mims et al. 2023).

We evaluated population genetic composition for multiple temporally distinct sampling periods for the Arizona treefrog metapopulation in the HMCH region. We quantified spatial and temporal variation in genetic diversity, effective population size, and genetic differentiation. We hypothesized that spatial and temporal variability in local genetic diversity and effective population size would be high because of the local population stochasticity associated with metapopulation dynamics. However, we expected to find little temporal variation in genetic diversity and effective population size across the metapopulation (i.e., global) due to ongoing gene flow (Parsley et al. 2020). We also expected significant spatial genetic differentiation within each sample period, with evidence for apparent local extinction-recolonization events in a few populations based on temporal pairwise genetic differentiation and bottleneck tests. Finally, we examined whether previously identified isolation-by-distance (Mims et al. 2016; Parsley et al. 2020) landscape patterns were consistent across sample periods.

Methods

Study system and sample collection

We collected genetic samples from nine Arizona treefrog populations within the Huachuca Mountains-Canelo Hills (HMCH) region of Arizona, USA (Fig. 1). Generation time and turnover rates in Arizona treefrog populations are unknown, but congeners have estimated lifespans of approximately 5 years, reach maturity in approximately 1 year, and metamorphose in as little as 1 to 2 months (Moore et al. 2021; calculated as the average of each trait across all hylids with trait data). Sampling efforts occurred during the summer monsoon seasons of 2014, 2018, 2019, and 2021 following the sampling methods of Mims et al. (2016). Based on the congener life history estimates, it is likely we captured multiple generations across the three sample periods, but there is some possibility we recaptured a few of the same individuals in different sample periods. Adult and larval tissue samples were collected using buccal swabs (adults), toe clips (adults), or tail clips (larvae). Tissue was stored in the field in either a desiccant (drierite: tail, toe clips) or buffer ATL (buccal swabs). Desiccated samples were then stored at room temperature, whereas buccal swabs were placed on ice in the field; all samples were stored at -20 °C in the lab until DNA extraction. Climate conditions across sample years included an average monsoon season in 2014, an unusually late monsoon season in 2019, and an unusually wet monsoon season in 2021 that was preceded by an unusually late, dry monsoon in 2020 (National Weather Service 2022).

Fig. 1

Estimated range of the Arizona treefrog (A) and approximate locations of 9 sample locations (B). For (A), range is shown in red and study extent is boxed in yellow. For (B), the size of the pie chart represents the total number of individuals sampled across all sample years. Pie chart sections represent the proportion of the total individuals sampled in an individual sample year. The number of sampled individuals includes siblings and larvae and adults. Map background shows hillshade derived from the USGS National Elevation Dataset (U.S. Geological Survey 2019) and species range map was modified from Mims et al. (2016) and Duellman (2001). Treefrog illustration by Shari Moore

We selected breeding locations by identifying historical sites (Mims et al. 2016) and opportunistically visiting suitable habitat. To the extent possible, we sampled the same sites across years (sites 1–8). We did not sample individuals from sites 1, 3, and 8 in 2019 because those ponds either did not fill or filled after the conclusion of our field season (V. L. Buxton pers. comm.). We identified one additional sampling location (site 9) for the first time in 2018 from an opportunistic habitat visit. Site 9 was the only site sampled in 2018 and was not sampled in 2019; for that reason, we grouped it with the 2019 samples for analysis. We excluded sample locations with fewer than five samples (n < 5) in a given year from population-level analyses for that year. Sample locations are considered ‘populations’ for all subsequent analyses. For the remainder of the manuscript, we will refer to analyses conducted on a population within a single sample year as ‘Population x Year’ for clarity. All collections followed an IACUC-approved protocol (IACUC Protocol #21–134) and were conducted with Arizona Game and Fish Department and US Forest Service sampling permits.

DNA extraction, microsatellite genotyping, and marker screening

Samples collected in 2014 were extracted, genotyped, and originally published in Mims et al. (2016). For consistency across sample years, we followed similar procedures for the remaining years. We extracted DNA from samples using the Qiagen DNeasy Blood & Tissue Kit. We obtained genotypes from 17 microsatellite loci previously developed by Mims et al. (2016; GenBank accession numbers KX086286-KX086302). We conducted multiplexed PCRs using 0.2 µM primers, 1X Qiagen Multiplex PCR Master Mix, RNAse-free water, and 1–2 µl template DNA (depending on the collection method) to a final reaction volume of 10 µl. PCR conditions followed Mims et al. (2016). PCR products were sequenced at Yale University’s Keck DNA Sequencing Facility (New Haven, CT). We used Geneious Microsatellite Plugin software v2022 to genotype individuals (Kearse et al. 2012). Microsatellite peak calls can be biased by the analyzer and software, so a subset (20%) of all samples originally collected for Mims et al. (2016) was re-called to test for bias among sample years and observers. We transformed allele bins from Mims et al. (2016) to match the bins of the 2019 and 2021 data where differences occurred (Supplementary material, Table S1), and used the transformed set for all following steps. Where sufficient tissue was available, we reran the extraction, amplification, and sequencing steps for individuals with missing data; individuals that still had > 25% missing data after reruns were discarded from all following analyses.

We screened each locus for each year separately for deviations from Hardy-Weinberg Equilibrium (HWE) using χ2-tests and exact Monte Carlo permutation tests using 1000 permutations. To correct for the number of HWE tests run within each sample period, we used the false discovery rate correction method (Benjamini and Hochberg 1995). Loci were also screened for linkage disequilibrium using the Index of Association (Agapow and Burt 2001). Finally, we checked for the presence of null alleles at each locus based on the method of Brookfield (1996). Screening steps were performed using ‘adegenet’ v2.1.3 (Jombart 2008), ‘poppr’ v2.9.2 (Kamvar et al. 2014), and ‘pegas’ v1.0.1 (Paradis 2010) in R version 4.2.2 (R Core Team 2022). To minimize the biases associated with including sibling larval samples in population genetics (Goldberg and Waits 2010), we checked all larval samples for full siblings using the program COLONY v2.0.6.7 (Wang 2018). All but one individual per family was removed if the probability of full sibship was greater than 50%. Full siblings identified from separate sites (2 individuals) or years (51 individuals) were retained. All genotype data, with and without siblings removed, are available on figshare (https://doi.org/10.6084/m9.figshare.23704260.v1).
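The false discovery rate correction applied to the HWE tests can be illustrated with a minimal sketch of the Benjamini and Hochberg (1995) step-up procedure. The function name and the p-values below are hypothetical and are not taken from the study; the analysis itself was run in R, and this Python version is only intended to show the arithmetic of the adjustment.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that adjusted values
    # never increase as the raw p-values decrease.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical per-locus HWE p-values for one sample period (not study data).
hwe_p = [0.001, 0.008, 0.039, 0.041, 0.30]
adj = benjamini_hochberg(hwe_p)
# Loci still flagged as deviating from HWE after correction at 0.05.
flagged = [i for i, q in enumerate(adj) if q < 0.05]
```

In this illustration, two loci with raw p-values just under 0.05 are no longer flagged once the correction accounts for the number of tests run within the sample period.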

Genetic diversity, effective population size, and bottlenecks

We calculated genetic diversity estimates using expected heterozygosity (HE), observed heterozygosity (HO), and allelic richness (AR) within and across populations and years. AR was rarefied to the smallest population sample size within each sample period. Because of the clear declining trend we identified in AR, we tested for a significant linear relationship between AR and the number of individuals sampled using Pearson’s correlation to ensure the trend was not an artifact of sample size. Diversity metrics were calculated using ‘adegenet’ in R. We estimated effective population size (Ne) for each sampling location and year using the linkage disequilibrium method implemented in NeEstimator v2.1 (Do et al. 2014). We chose the linkage disequilibrium method over the temporal method because our temporal samples are only a few generations apart (Waples and Do 2010). We assumed random mating among individuals, excluded alleles occurring only once per population or with an allele frequency less than 0.05, and calculated upper and lower 95% jackknifed confidence intervals. We tested for a significant linear relationship between Ne estimates and sample size of each Population x Year using Pearson’s correlation, both with and without full siblings included.
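Rarefaction of allelic richness can be sketched as follows. We assume here the standard rarefaction formula, in which the expected number of distinct alleles in a random subsample of n gene copies is the sum, over alleles, of one minus the probability that the allele is absent from the subsample. The exact routine used by the R software is not detailed above, so the function name and the allele counts are hypothetical.

```python
from math import comb


def rarefied_allelic_richness(allele_counts, n):
    """Expected number of distinct alleles in a subsample of n gene copies."""
    total = sum(allele_counts)
    if n > total:
        raise ValueError("n cannot exceed the number of sampled gene copies")
    # For each allele: 1 - P(allele is absent from the subsample).
    # comb(total - c, n) is 0 when fewer than n copies are non-carriers.
    return sum(1 - comb(total - c, n) / comb(total, n) for c in allele_counts)


# Hypothetical allele counts at one locus (10 gene copies, 3 alleles).
counts = [6, 3, 1]
ar_full = rarefied_allelic_richness(counts, 10)  # no rarefaction
ar_rare = rarefied_allelic_richness(counts, 4)   # rarefied to 4 gene copies
```

Rarefying to a common subsample size makes richness comparable among populations with unequal sample sizes, because rare alleles are less likely to appear in smaller samples.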

We tested for evidence of recent bottlenecks, or a reduction in Ne through significant deviations from mutation-drift equilibrium, using the program BOTTLENECK 1.2.02 (Piry et al. 1999). Following the recommendations of Piry et al. (1999) for microsatellite data, we tested under two mutation models: two-phased mutation (TPM) and stepwise mutation (SMM). We set the TPM parameters as 95% single-step mutations, 5% multiple-step mutations, and a variance among multiple steps of 12. We performed a Wilcoxon signed rank test using 1,000 iterations, which is recommended for tests evaluating fewer than 20 loci (Piry et al. 1999), to test for significant results of heterozygosity excess to indicate recent effective population size reduction. Finally, we performed a mode-shift test to determine if the allele frequency distributions were L-shaped, as would be expected if no recent reduction in effective population size occurred (Luikart et al. 1998).

Population differentiation

We calculated genetic differentiation globally across and within years using G’’ST (Meirmans and Hedrick 2011) and FST (Weir and Cockerham 1984). Pairwise genetic differentiation was calculated between each population pair within and across years using G’’ST (Meirmans and Hedrick 2011), FST (Nei 1987), and proportion of shared alleles (Dps; Bowcock et al. 1994). FST and G’’ST were linearized and calculated as (Differentiation Metric) / (1 - Differentiation Metric) (Slatkin 1995). We assessed significance for pairwise G’’ST by calculating upper and lower 95% confidence intervals using bootstrapping with 10,000 replicates. Differentiation estimates were calculated using ‘hierfstat’ v0.5.7 (Goudet and Jombart 2020), ‘mmod’ v1.3.3 (Winter 2012), ‘pegas’, and ‘adegenet’ in R (R Core Team 2022). An analysis of molecular variance (AMOVA) was performed using ‘ade4’ v1.7.16 (Thioulouse et al. 2018) to partition molecular variance among populations and among the three sample years, with significance tests based on 10,000 permutations.
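The linearization described above is a simple transformation, sketched here for a single value. The function name and the input value are hypothetical and serve only to illustrate the calculation.

```python
def linearize(metric):
    """Linearize a differentiation metric as metric / (1 - metric)."""
    if not 0 <= metric < 1:
        raise ValueError("metric must lie in the interval [0, 1)")
    return metric / (1 - metric)


# Hypothetical pairwise differentiation value (not study data).
example = linearize(0.2)
```

Because the transformation stretches values as they approach 1, linearized estimates are always at least as large as the untransformed ones, and the difference is small when differentiation is weak.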

We evaluated individual-based hierarchical population differentiation using the Bayesian clustering program STRUCTURE 2.3.4 (Pritchard et al. 2000). We treated each Population x Year as an independent putative population. Ten replicates of each K from 1 to n + 1 were run for 500,000 cycles following a burn-in period of 50,000 cycles. We used the LOCPRIOR model because of the weak, but significant, genetic differentiation quantified within our samples (Pritchard et al. 2000). The most likely number of clusters, K, was determined using the Evanno delta-K method (Evanno et al. 2005). The analysis was repeated within clusters until we identified the terminal cluster. Terminal clusters were identified when K = 1 had the highest log-likelihood or when K was equal to the number of included sampling locations. Individuals from the same sampling location were kept together during hierarchical analysis, regardless of cluster assignment, to analyze differentiation across Populations x Years.

We also used an ordination approach to further examine population differentiation with discriminant analysis of principal components (DAPC) calculated using ‘adegenet’ in R. DAPC is useful for evaluating genetic variation and identifying group clusters, as it emphasizes the genetic variation between populations over within-population variation. In DAPC, k-means clustering is used to identify the number of clusters. We ran increasing values of k from 1 to n + 1, where n is the number of included sample locations. We again treated each Population x Year as an independent putative population. The optimal value of k was estimated using the lowest BIC; however, k values with similar BIC values were retained for comparison. We retained principal components using the 𝛼-score, where the number of PCs that maximizes the difference between observed and random discrimination of groups is retained (Jombart 2008). We estimated the temporal change in population means in ordination space by calculating the Euclidean distance between the centroid of each population at t and t + 1 across all retained discriminant functions.
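The temporal change in ordination space can be sketched as the Euclidean distance between group centroids at consecutive sample periods. The function names and coordinates below are hypothetical; in practice each individual would have one coordinate per retained discriminant function, whereas two are used here for brevity.

```python
from math import dist  # available in Python 3.8+


def centroid(points):
    """Mean coordinate of a set of individuals along each discriminant axis."""
    n = len(points)
    return tuple(sum(p[k] for p in points) / n for k in range(len(points[0])))


def temporal_shift(coords_t, coords_t1):
    """Euclidean distance between a population's centroids at t and t + 1."""
    return dist(centroid(coords_t), centroid(coords_t1))


# Hypothetical discriminant-function coordinates for one population
# in two consecutive sample periods (not study data).
period_t = [(0.0, 1.0), (2.0, 3.0)]
period_t1 = [(3.0, 5.0), (5.0, 7.0)]
shift = temporal_shift(period_t, period_t1)
```

Larger values indicate that the population's mean position moved further between sample periods relative to the axes that best separate the groups.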

Isolation-by-distance relationships

We tested for isolation-by-distance relationships in each sample period because of the previously identified significant relationships in these populations (Mims et al. 2016; Parsley et al. 2020). We used the dist function to calculate Euclidean distance between each population’s XY coordinates (R Core Team 2022). We log-transformed the geographic distance measure (Rousset 1997). We then compared pairwise genetic distance (linearized G’’ST, linearized FST, and DPS) with Euclidean distance in two ways. First, we used a Mantel test (Mantel 1967) as calculated using the ‘vegan’ package v2.6-2 (Oksanen et al. 2022) and 10,000 or the maximum possible permutations to assess significance. Second, we used a linear and logistic matrix regression modeling approach with the ‘ecodist’ package v2.0.7 (Goslee and Urban 2007). We used 10,000 randomizations to assess significance based on a null hypothesis that the genetic distance by geographic distance relationship is zero.
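The logic of the Mantel test can be sketched with a small permutation routine. This is an illustrative, one-sided implementation written under our own assumptions (Pearson correlation, joint permutation of the rows and columns of the genetic matrix) and is not the ‘vegan’ code; the positions and distances are hypothetical.

```python
import random
from math import log, sqrt


def upper_triangle(matrix, order):
    """Flatten the upper triangle after reordering rows and columns."""
    n = len(order)
    return [matrix[order[i]][order[j]] for i in range(n) for j in range(i + 1, n)]


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def mantel(genetic, geographic, permutations=999, seed=1):
    """Return the observed correlation and a one-sided permutation p-value."""
    rng = random.Random(seed)
    identity = list(range(len(genetic)))
    geo_vec = upper_triangle(geographic, identity)
    observed = pearson(upper_triangle(genetic, identity), geo_vec)
    hits = 1  # the observed arrangement counts as one permutation
    for _ in range(permutations):
        perm = identity[:]
        rng.shuffle(perm)  # permute population labels of the genetic matrix
        if pearson(upper_triangle(genetic, perm), geo_vec) >= observed:
            hits += 1
    return observed, hits / (permutations + 1)


# Hypothetical populations along a transect (km), not study coordinates.
positions_km = [0.0, 1.0, 3.0, 7.0]
n = len(positions_km)
# Log-transformed pairwise Euclidean distances (diagonal set to 0, unused).
geo = [[log(abs(a - b)) if a != b else 0.0 for b in positions_km]
       for a in positions_km]
# Hypothetical genetic distances that increase with log distance.
gen = [[0.1 * geo[i][j] for j in range(n)] for i in range(n)]
r, p = mantel(gen, geo)
```

Permuting whole rows and columns together, rather than individual matrix cells, preserves the dependence among pairwise values that share a population, which is why significance is assessed by permutation rather than by a standard correlation test.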

Results

Nine populations were sampled in at least two of the sample years, and five populations were sampled in all three years (populations 2, 4, 5, 6, and 7). Ultimately, we had 23 Population x Year combinations. Across all years and populations, we collected 693 individuals and genotyped 648 (Supplementary material, Table S2). We discarded 17 individuals with > 25% of missing data following reruns. We removed 62, 9, and 29 larvae identified as full siblings from 2014, 2019, and 2021 respectively (100 total; Population x Year mean = 6.3, range = 1–22). The total number of individuals for downstream analyses was 531 (231 in 2014, 143 in 2019, and 157 in 2021), with a mean of 23.1 and range of 5–44 individuals for each Population x Year (Fig. 1; Table 1).

Table 1 Population summary metrics, genetic diversity, and effective population size for each site and year sampled

We genotyped 17 polymorphic microsatellite loci for each individual. Following bin transformations for the Mims et al. (2016) data, there was an average error rate of 1.4% for 2014 individuals per locus called differently (loci error range 0–3.39%; Supplementary material, Table S1). We identified 10 loci significantly out of Hardy-Weinberg equilibrium (HWE) globally in 2014 (alpha < 0.05), 5 loci in 2019, and 6 loci in 2021 using both 𝜒2 and Monte Carlo methods (Supplementary material, Table S3). No loci were consistently out of HWE across all populations, and no population was consistently out of HWE across all loci in any of the three sampling periods. We found significant linkage disequilibrium in 2014 (p = 0.005) and 2021 (p = 0.005). However, the strongest correlation (rbarD) was 0.14 and 0.11, respectively, and we retained all loci in subsequent analyses. We found low frequencies of null alleles at 16 of the 17 loci (0–0.06). One locus had a null allele frequency of 0.15.

Observed heterozygosity averaged across all loci within 2014, 2019, and 2021 was 0.68, 0.68, and 0.70, respectively. Expected heterozygosity averaged across loci in 2014, 2019, and 2021 was 0.73, 0.72, and 0.73, respectively. Population-level heterozygosity was more variable than global heterozygosity both within populations across sample periods and among populations within sample periods (Table 1; Fig. 2a and b). The magnitude of change in observed heterozygosity across sample periods was variable among populations, ranging from −10.6% to 17.0% difference for the same population between two sample periods (Table 2; Fig. 2a and b). Two populations had greater observed heterozygosity than expected in each of 2014 and 2019 (2014: populations 3 and 7; 2019: populations 4 and 9) (Fig. 2c, blue shaded area). Five of the nine populations had greater observed than expected heterozygosity in 2021 (populations 2, 4, 5, 7, and 9; Fig. 2c, blue shaded area). In contrast to the absence of a common directional trend in heterozygosity across sample periods, all populations had decreased rarefied allelic richness (AR), with an average percent change between two consecutive sample periods of −17.52% and an average percent change between a population’s first and last sample period of −26.11% (Table 2; Fig. 2d). AR averaged across all populations was 5.75 (2014), 4.50 (2019), and 4.07 (2021) (Table 1). There was no strong correlation between number of individuals sampled and AR (Pearson’s r = 0.391, p = 0.065). Global FIS in each year was 0.04 (2014), 0.03 (2019), and 0.0004 (2021).

Fig. 2

Change in population-level genetic diversity across time for observed heterozygosity (A), expected heterozygosity (B), difference between expected and observed heterozygosity, with red highlighting the populations where expected is greater than observed and blue highlighting the populations with observed greater than expected (C), and rarefied allelic richness (D). Populations are numbered and colored approximately from northern to southern populations, except for population 9. Local heterozygosity across sites is variable within years, with individual site variability across years, but no consistent trend across all sites. Allelic richness reflects an overall consistent decrease across years

Table 2 Percent change between sampled periods for the 9 populations for genetic diversity and effective population size estimates. Diversity and effective population size estimates were calculated with siblings removed. Confidence intervals were not taken into account for effective population size percent change calculations

Mean population-level Ne within each sample period, excluding infinite estimates, was 85.9 (2014), 122.8 (2019), and 70.6 (2021), with considerable variation among populations within each sample period and across sample periods (Table 1; Ne estimates with full siblings included are in Table S4). Excluding infinite estimates, the mean Ne percent change between two sample periods was 33.21% (range − 76.85 – 506.59%; Table 2). However, it is worth noting that the confidence interval for 12 of the 23 Population x Year Ne estimates included infinite estimates (Table 1). We found no significant correlation between Ne estimates and number of individuals sampled, regardless of sibling inclusion or exclusion (Supplementary material, Figure S1). Wilcoxon tests for bottlenecks showed evidence for significant deviations from mutation-drift equilibrium in a few populations and sample periods (Table 3). For evidence of recent Ne reductions, only population 5 in sample period 2019 and population 9 in 2021 showed significant heterozygosity excess (P ≤ 0.05). We found evidence of mode-shift in allele frequencies consistent with a recent reduction in Ne in populations 7 and 9 in 2019, and populations 8 and 9 in 2021. However, population 8 in 2019 and population 9 in 2019 and 2021 had fewer than the recommended number of individuals for bottleneck tests (Piry et al. 1999).

Table 3 Bottleneck test results for each site and year sampled using two mutation models, two-phased mutation (TPM) and stepwise mutation (SMM)

Global population differentiation was not significantly different among years (G’’ST: Fig. 3, FST: Supplementary material, Figure S2). We found evidence for small, but significant, population genetic differentiation based on global differentiation measures in all three sample periods with G’’ST = 0.151 (2014), 0.156 (2019), and 0.202 (2021) and FST = 0.044 (2014), 0.041 (2019), and 0.053 (2021). Differentiation among populations across all years was significantly greater than differentiation among years across all populations (G’’ST = 0.007 and FST = 0.001; Fig. 3). AMOVA results also supported greater differentiation among populations than years, with significant differentiation between populations within years (p = 0.0001) but not between years (p = 0.991). Variation between populations accounted for 4.56% of the molecular variance, while variation between years accounted for functionally 0% of the variation (Supplementary material, Table S5). Variation between individuals explained 2.89% of the molecular variance, with the majority accounted for within individuals (93.06%) (Supplementary material, Table S5).

Fig. 3

Within- and across-year global differentiation as calculated using G’’ST. Error bars indicate 95% confidence intervals as calculated using bootstrapping with 10,000 replicates. We found no significant difference in global differentiation from year to year (light dots). Spatial differentiation (populations grouped together across all years) was significantly higher than temporal differentiation (populations grouped together within years) (dark dots). Global FST values were also calculated (Supplementary material, Figure S2)

Pairwise G’’ST was significantly different from 0 between most population pairs within each sample period (Table 4; pFST and DPS in Supplementary material, Table S6 and S7). Pairwise spatial differentiation within any sample period ranged from 0.036 (Populations 4 and 5 in 2014) to 0.628 (Populations 3 and 9 in 2021) (Table 4). All populations had lower differentiation with themselves at previous time periods (mean pG’’ST = 0.010) than with other populations at previous time periods (mean pG’’ST = 0.218) or other populations within the same time period (mean pG’’ST = 0.219) (Fig. 4; pFST and DPS in Supplementary material, Figure S3). The greatest pairwise temporal differentiation within a population was 0.048 (Population 2 in 2014 and 2021).

Table 4 Pairwise linearized G’’ST between each Population x Year pair
Fig. 4

Boxplot comparing the temporal and spatial pairwise linearized G’’ST across all sites and years. Pairwise values were calculated between (1) each population and itself in previous sample years, (2) each population and all other populations in previous sample years, and (3) each population and all other populations in the same sample year. Differentiation among populations is similar within and across years (2 and 3), but there is little differentiation within intra-population, temporal comparisons (1). Linearized FST and DPS were also calculated and compared showing similar trends, though DPS was less distinct (Supplementary material, Figure S3)

STRUCTURE analyses for all years and populations together provided support for K = 2 clusters, with the same populations more likely to be in the same cluster, regardless of sample period (Fig. 5; hierarchical cluster analysis can be found in Supplementary material, Figure S4). DAPC cluster analyses supported K = 7, with similar support for K = 6 and K = 8 using BIC (Supplementary material, Figure S5). We report here only results for K = 7 because K = 6 and K = 8 showed similar patterns (Supplementary material, Figure S6). We retained 15 principal components based on the 𝛼-score and 6 discriminant functions for K = 7 (Fig. 6). Population group mean change in ordination space between two consecutive sample periods ranged from 0.488 (Population 7 from 2014 to 2019) to 1.495 (Population 9 from 2019 to 2021) (Supplementary material, Table S8). Group mean change in ordination space across sample periods, from 2014 to 2021, ranged from 0.811 (Population 5) to 1.308 (Population 4).

Fig. 5

STRUCTURE results for Hyla (Dryophytes) wrightorum across nine populations and three sample years. Each vertical bar represents one individual and colors indicate probability of cluster assignment as determined by the Evanno delta-K method (Evanno et al. 2005). Ten replicates of each K from 1 to 24 (n + 1) were run for 500,000 cycles following a burn-in period of 50,000 cycles. We found support for K = 2 clusters across all populations and years. Nested structure results, where we conducted hierarchical analyses until terminal clusters (K = 1) were reached, can be found in the supplement (Supplementary material, Figure S4)

Fig. 6

Discriminant analysis of principal components (DAPC) for all populations and years as calculated using k-means clustering, with most likely K = 7 as determined by BIC (Supplementary material, Figure S5; results also supported K = 6 and K = 8, Supplementary Figure S6). Points show the group centroid of each population in each sample year. Arrows show changes in population relationships in consecutive years. Filled ellipses show 90% confidence intervals of the entire population group across years. We retained 15 principal components based on 𝛼-score and 6 discriminant functions

We found support for significant isolation-by-distance between pairwise G’’ST and Euclidean distance in all three sample years using both the Mantel tests and matrix regressions, with the exception of the matrix regression method in 2014 (Table 5; FST and DPS showed the same patterns).

Table 5 Results for Mantel test and matrix regressions for isolation-by-distance. We examined the relationship between linearized pG’’ST with Euclidean distance, or straight-line distance between each population’s geographic coordinates. Geographic distances were log-transformed. Bolded p-values highlight significant relationships (p-value < 0.05)

Discussion

We found that, for an isolated anuran metapopulation, temporal genetic variation is missed when using a single sample period rather than multiple sample periods. Globally, genetic diversity and differentiation largely did not change between sample periods. However, at a local scale, genetic diversity within populations, effective population size at specific ponds, and pairwise genetic differentiation between populations exhibited varying degrees of temporal variation. Multiple years of genetic sampling also revealed declining trends in allelic richness and effective population size.

Heterozygosity aligned with expected metapopulation dynamics. We found globally stable heterozygosity over time, high variability among populations within sample periods and within most populations across sample periods, and no strong directional trend. The temporally and spatially variable local genetic diversity reflects the asynchronous local population dynamics often characteristic of metapopulations (Pannell and Charlesworth 2000). In turn, strong gene flow and connectivity between populations can maintain the stable, high global genetic diversity over time (Østergaard et al. 2003; Honnay et al. 2009). Additionally, spatial variability in genetic diversity has been associated with population isolation in other amphibian metapopulations (Ambystoma bishopi: Wendt et al. 2021), which is true for this metapopulation as well (Mims et al. 2016; Parsley et al. 2020). While we found no strong directional trends in heterozygosity, it is important to be cautious with definitive interpretations on the status of the metapopulation from this result alone. Heterozygosity can be maintained over shorter timescales despite population decline (Amos and Balmford 2001), and it is possible that even with repeated sampling, the temporal extent of our study is not yet sufficient to capture longer term trends.

Although heterozygosity exhibited no clear trends and was temporally stable when averaged across populations, we found a consistent decline in allelic richness. We rarefied allelic richness to the smallest sample size in each year and found no relationship between the number of individuals sampled and the decline in richness. This finding indicates that the observed decline is unlikely to be an artifact of sample size in each population-by-year sample. Importantly, allelic richness differs from heterozygosity measures in that it is linked to a species’ long-term evolutionary potential, is more sensitive to short, recent bottlenecks, particularly in small populations, and is more likely to reflect changes in rare alleles (Allendorf 1986; Greenbaum et al. 2014). Allelic richness is thus more likely than heterozygosity to reflect effects of recent habitat loss and reduced recruitment (Schlaepfer et al. 2018). For the HMCH Arizona treefrog metapopulation, recruitment could be negatively affected by short hydroperiods or a lack of water during the breeding season (Mims et al. 2023), as observed at known breeding sites in the region over the course of this study (Gendreau et al. 2021). It is possible that our findings point to the leading edge of a long-term decline. However, longer-term monitoring may be needed to fully understand the implications of these findings for the stability of the metapopulation over time. Although we are unable to directly link this loss of genetic variation to loss of adaptive potential, genetic variation is an important component of population viability and is likely linked to adaptability (Kardos et al. 2021). Using repeated temporal sampling, we were able to identify signs of genetic erosion occurring within this system that would be missed with a single sample period.
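To make the rarefaction step concrete, the sketch below computes rarefied allelic richness at one locus using the standard hypergeometric expectation: the expected number of distinct alleles in a random subsample of g gene copies. The function name and allele counts are hypothetical, and treating g as twice the smallest number of individuals sampled (for a diploid species) is an assumption about how the standardization would be applied.

```python
from math import comb

def rarefied_allelic_richness(allele_counts, g):
    """Expected number of distinct alleles in a random subsample of g gene copies.

    allele_counts: observed copy number of each allele at one locus.
    g: subsample size in gene copies (must not exceed the total observed).
    """
    n = sum(allele_counts)
    if g > n:
        raise ValueError("subsample size g exceeds the number of gene copies sampled")
    # For each allele: 1 minus the probability that it is absent from the subsample.
    return sum(1 - comb(n - c, g) / comb(n, g) for c in allele_counts)

# Made-up counts: one common allele and two rare alleles among 20 gene copies.
toy_counts = [18, 1, 1]
print(rarefied_allelic_richness(toy_counts, 20))  # full sample: all three alleles counted
print(rarefied_allelic_richness(toy_counts, 2))   # small subsample: rare alleles usually missed
```

The toy example shows why rarefaction matters for rare alleles: the same locus contributes far fewer expected alleles when standardized to a small subsample, so comparisons among samples of different sizes are only meaningful at a common g.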

Estimating effective population size from a single year can be misleading when assessing the risk that local populations in a metapopulation face from long-term deleterious processes. Ne was variable across sites and across sample periods, and we estimated potentially low Ne (Ne < 100) in many of the populations by the final sample period. An effective population size below 100 can indicate risk of inbreeding depression or drift in the short term (Frankham et al. 2014). Low Ne is often expected in pond-breeding amphibians (Beebee and Griffiths 2005; Reyne et al. 2022), but repeated temporal sampling revealed changes in the proportion of populations likely falling within this risk zone over time. The numerous large or infinite confidence intervals make Ne alone an inconclusive line of evidence for risk. However, examined alongside trends in AR, the evidence suggests there is at least some risk of deleterious processes occurring in these populations. Furthermore, the three sample periods are likely sufficient to capture processes affecting trends in both Ne and AR, but not yet sufficient to reveal any change in heterozygosity (Crow and Kimura 1970). Effective population size is a valuable tool for species conservation and management (Frankham et al. 2014), but a single sample year revealing low and spatially variable Ne may not raise any conservation concerns in an amphibian metapopulation that could be at risk of future population declines.

Spatial differentiation, or the genetic differentiation between populations within each sample period, was significantly higher than temporal differentiation, or differentiation between different sample periods within the same population. Additionally, the most likely population clusters were consistent across sample periods. Metapopulations with increasing differentiation (Walser and Haag 2012) and higher temporal than spatial differentiation (Østergaard et al. 2003) would indicate frequent population turnover. We would also expect local populations to reflect the external gene pool in previous sample periods following turnover (Lamy et al. 2012). Local populations within our study had higher pairwise differentiation with other populations, regardless of the sample period, than with themselves at an earlier sample period. In accordance with infrequent evidence of genetic bottlenecks, we did not find substantial evidence of complete population turnover or significant local extinction events in the treefrog populations. Given the estimated generation time of this species, the temporal extent of our study may not be sufficient to capture complete turnover. Alternatively, the low temporal genetic differentiation and variable temporal pairwise differentiation could reflect recolonization by nearby populations, especially considering the significant isolation-by-distance relationship within each sample period (Lamy et al. 2012). Nevertheless, temporal genetic sampling at longer intervals may be a better indicator of extinction-recolonization dynamics within this HMCH Arizona treefrog metapopulation, particularly given some processes within the system, such as population isolation, are predicted to occur over decades (Mims et al. 2023).

Multiple sample periods allowed us to capture local temporal variation and metapopulation level trends that would have otherwise been overlooked with only a single sample period. If population-level conservation or management actions are being considered for a temporally dynamic system, such as a metapopulation, multiple sample periods are likely necessary to avoid time-point sample bias (James et al. 2015) and to tease apart natural fluctuations from disturbances (Pechmann et al. 1991). However, conservation of metapopulations is most effective when the entire population group is considered, because even the loss of small, isolated populations can be detrimental to the metapopulation (Semlitsch and Bodie 1998). In many cases, amphibian conservation and management focused only on the local population level would likely have little effect on long-term persistence (Marsh and Trenham 2001). Amphibian metapopulation conservation requires a balanced understanding of local and global dynamics. Although a single sample period provides a relatively good snapshot of global metapopulation dynamics, it can miss important population-level dynamics that may be highly relevant to conservation and management decisions. Additionally, multiple sample periods may be necessary to determine long-term persistence and population viability for a metapopulation in the initial stages of potential instability.

The Arizona treefrog is currently a state-listed species of concern in Arizona, and the HMCH metapopulation was recently a candidate for federal listing as a Distinct Population Segment, though it was not ultimately given the designation (USFWS 50 CFR Part 17 2016). This species, along with other amphibians of the southwestern United States, is increasingly vulnerable to regional human water needs, fire, invasive species, and climate change (Mims et al. 2020; Griffis-Kyle et al. 2018). In addition, populations within the HMCH metapopulation are predicted to become increasingly isolated due to anticipated loss of breeding habitat under a changing climate, ultimately leading to the loss of metapopulation dynamics (Mims et al. 2023). At a broad scale, consistency across sample periods in global heterozygosity and differentiation indicated stability within the Arizona treefrog metapopulation in the HMCH region. Yet at a finer scale, we observed concerning trends of declining allelic richness and reduced effective population size through time. Loss of genetic diversity and small population sizes are major conservation concerns, as they can indicate increased probability of extinction and reduced adaptability to future scenarios (Jump et al. 2009), such as the threats facing the Arizona treefrog. Monitoring genetic variation is still a critical tool in the conservation toolbox despite advances in other areas of conservation genetics (Kardos et al. 2021). Our observations using repeated temporal genetic sampling highlight the importance of monitoring to identify trends in metapopulations over time, ultimately providing an early warning system for declines in genetic diversity.