Introduction

Macroscopic fossils in Ediacaran rocks document a diverse array of morphologically complex, multicellular organisms1,2,3,4,

Table 1 List of networks in this study

Network analysis has several advantages over conventional methodologies used in paleoecology and biostratigraphy. First, it allows for the application of community-detection algorithms, which can be used to partition fossil networks into community-like modules36,44,45 (Supplementary Table 1). Second, network theory accommodates an assortment of metrics for describing local (node specific) and global (whole network) properties and the variables underlying their community structures. Lastly, network theory supports analyses of integrated data structures that leverage multiple types of information. Partitioning of a bipartite network, for example, can lead to the discovery of community units based on the union of two sets of entities. Altogether, such methods indicate that network analysis can be used to discover, identify, and characterize ecologically and geologically meaningful associations of Ediacaran macrofossils.

In this study, we analyze one unipartite network and six bipartite networks derived from fossil occurrence data (Table 1). All of the networks contain taxa nodes (see Methods). In the unipartite network, two genera are connected if they have been reported together at one or more fossil collection points (e.g., beds, localities, sections, outcrops, etc.) anywhere in the world36. This structure represents the most basic expression of our dataset, and its modules generally represent paleocommunities36. The bipartite networks integrate occurrence data with metadata on facies (rock types corresponding to specific environments) and geologic units. In bipartite networks containing paleoenvironments, taxa are connected to their habitats and preservational environments33,35 (Supplementary Table 4), and in bipartite networks containing formations, taxa are connected to geologic units where their fossils are preserved. In this context, the bipartite networks support investigation of Ediacaran biotopes and biozones, as their modules reflect unions of taxa with environments and geologic units. All other differences among the networks are related to raw data (Table 1).
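The two linking rules described above can be sketched in a few lines. This is a minimal illustration, not the authors' code: the record layout, the placeholder names (`genus_A`, `bed_1`, `Fm_A`), and the helper functions `cooccurrence_edges` and `bipartite_edges` are all assumptions made for the example.

```python
from collections import defaultdict
from itertools import combinations


def cooccurrence_edges(occurrences):
    """Unipartite rule: link two genera that share at least one collection point.

    Returns {frozenset({genus_a, genus_b}): number of shared collection points}.
    """
    by_collection = defaultdict(set)
    for genus, collection in occurrences:
        by_collection[collection].add(genus)
    weights = defaultdict(int)
    for genera in by_collection.values():
        for a, b in combinations(sorted(genera), 2):
            weights[frozenset((a, b))] += 1
    return dict(weights)


def bipartite_edges(occurrences, collection_to_unit):
    """Bipartite rule: link each taxon to the unit (facies or formation)
    that hosts its collection points."""
    return {(genus, collection_to_unit[c]) for genus, c in occurrences}


# Hypothetical toy records: (genus, collection point). Not real occurrence data.
occurrences = [
    ("genus_A", "bed_1"), ("genus_B", "bed_1"),
    ("genus_A", "bed_2"), ("genus_B", "bed_2"),
    ("genus_C", "bed_3"), ("genus_D", "bed_3"),
]
collection_to_formation = {"bed_1": "Fm_A", "bed_2": "Fm_A", "bed_3": "Fm_B"}

edges = cooccurrence_edges(occurrences)
bip = bipartite_edges(occurrences, collection_to_formation)
```

In this toy case, the edge between `genus_A` and `genus_B` carries a weight of 2 because the pair shares two collection points; a non-weighted network would simply keep the set of edge keys.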

We applied 14 community-detection algorithms to the networks to explore their community structures and identify paleocommunities, biotopes, and biozones (see Supplementary Discussion). Additionally, we calculated a number of metrics: modularity, homophily, and centrality (Supplementary Table 1). Modularity (Q) is a global property describing community structure. In its simplest form46,47, the modularity of a community structure equals the fraction of links that connect nodes of the same communities minus the corresponding fraction expected in an equivalent network with a random distribution of connections. Homophily, another global property, measures the tendency of nodes to connect to others possessing similar nominal or continuous properties, and is measured with assortativity coefficients, which are similar to Pearson correlation coefficients. We calculated assortativity coefficients for the various network projections to assess the relative extents that their topologies reflect the underlying properties of their nodes, such as the preservational modes22 (Supplementary Table 2) and morphogroups7,48 (see Supplementary Discussion and Supplementary Table 3) of the taxa and the locations of the formations. Finally, centrality represents a local property related to the relative importance of a node. A centrality score may, for example, equal a node’s degree (number of links) or its betweenness, the number of shortest paths that pass through it (i.e., how often it serves as a bridge between other nodes).
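As a worked illustration of the simplest form of modularity described above, the sketch below sums, over communities, the fraction of links inside the community minus the fraction expected from the community's total degree. It covers non-overlapping partitions of an unweighted, undirected graph only; the toy graph and the function names are assumptions for the example, and the extended modularity used for overlapping modules is not reproduced here.

```python
from collections import Counter


def modularity(edges, membership):
    """Q = sum over communities c of  L_c / m - (d_c / (2 m))^2,
    where m is the number of links, L_c the number of links inside c,
    and d_c the summed degree of the nodes in c."""
    m = len(edges)
    internal = Counter()    # links with both ends in the same community
    degree_sum = Counter()  # summed degree of the nodes in each community
    for u, v in edges:
        degree_sum[membership[u]] += 1
        degree_sum[membership[v]] += 1
        if membership[u] == membership[v]:
            internal[membership[u]] += 1
    return sum(internal[c] / m - (d / (2 * m)) ** 2
               for c, d in degree_sum.items())


def degree_centrality(edges):
    """Degree (number of links) of every node that appears in an edge."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return dict(deg)


# Hypothetical toy graph: two triangles joined by a single bridging link.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
two_modules = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
one_module = {n: "A" for n in range(6)}

q_two = modularity(edges, two_modules)  # 5/14, about 0.357
q_one = modularity(edges, one_module)   # 0.0
degrees = degree_centrality(edges)
```

Splitting the toy graph at the bridge gives a positive Q, whereas lumping all nodes into one module gives Q = 0, which is the sense in which Q rewards partitions with more internal links than expected by chance.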

Application of community-detection algorithms to the networks resulted in the discovery of numerous modules (Figs. 2–4; Supplementary Figs. 5–12). The community overlap propagation algorithm (COPRA) generally returned non-overlapping community structures with the fewest modules and greatest Q scores (Supplementary Figs. 5–7 and Supplementary Table 6). It also returned overlapping modules when procedures were formulated to allow taxa to be assigned to multiple community units. The overlapping community structures generally resemble their non-overlapping counterparts, except where nodes are assigned to multiple modules. Sensitivity analysis shows that the network partitioning results do not significantly change, even when at least 10% of connections (30–100 links) and five weakly connected (i.e., uncommon) taxa are omitted from each network (Supplementary Fig. 8). Moreover, data randomization indicates that it is unlikely that any of the community structures reported in this study arose due to chance49 (Supplementary Fig. 9). Thus, overall, the community structures are robust.

Fig. 2

Unipartite network of Ediacaran macrofossil genera. a Network graph. Two genera are linked if those taxa co-occur at any fossil collection point in the dataset. Colors indicate modules identified using the COPRA community-detection algorithm (v = 5). According to randomization testing, this community structure is statistically significant (Supplementary Fig. 9; Q = 0.82, P < 0.01, Z = 7.700). All genera fall into four modules: the Avalon (blue), White Sea (green), Miaohe (orange), and Nama (red) clusters (Fig. 1; Supplementary Figs. 2–4). b Venn diagram illustrating module overlap. Areas of circles correspond to their relative numbers of genera, numbers are counts of taxa, and values in parentheses are proportions. c, d Stacked bar graphs showing numbers of genera in the modules and their preservational modes (c) and morphogroups (d). Source data are provided as a Source Data file.

Fig. 3

Bipartite network of paleoenvironments and Ediacaran macrofossil genera. a Network graph. A genus and paleoenvironment are linked if fossils of the taxon have been reported from matching facies (Supplementary Table 4). Colors indicate modules identified using the COPRA community-detection algorithm (v = 2). According to randomization testing, this community structure is statistically significant (Supplementary Fig. 9, paleoenvironments projection, Q = 0.39, P = 0.05, Z = 1.46; taxa projection, Q = 0.51, P = 0.05, Z = 1.27). All paleoenvironments and genera fall into three modules—the deep (blue), shallow (red), and intermediate (green) clusters—named for the relative water depths of their centroids along a model shallow-to-deep-water transect. b Venn diagram illustrating taxonomic overlap of modules. Areas of circles correspond to their relative numbers of genera, numbers are counts of taxa, and values in parentheses are proportions. c Stacked bar graph showing numbers of genera belonging to the various Ediacaran paleocommunities in each module (colors are those used in Fig. 2a, b). d, e Stacked bar graphs showing numbers of genera in the modules and their preservational modes (d) and morphogroups (e). See Fig. 2 for color keys and Supplementary Fig. 10 for related analyses. Source data are provided as a Source Data file

Fig. 4

Bipartite network of Ediacaran formations and macrofossil taxa. a Network graph. A taxon (genus or ichnogenus) and geologic formation are linked if fossils of the taxon have been reported from that geologic unit. Colors indicate modules identified using the COPRA community-detection algorithm (v = 6). According to randomization testing, this community structure is statistically significant (Supplementary Fig. 9; formations projection, Q = 0.63, P < 0.01, Z = 12.17; taxa projection, Q = 0.51, P < 0.01, Z = 6.57). All formations and taxa fall into four modules: the Lantian biota (teal), Ediacara biota biozone (blue), Terminal Ediacaran biozone (red), and Ediacaran/Cambrian taxa (green) clusters. b Venn diagram illustrating taxonomic overlap of modules. Areas of circles correspond to their relative numbers of genera and ichnogenera, numbers are counts of taxa, and values in parentheses are proportions. c Stacked bar graph showing the numbers of taxa belonging to the various Ediacaran macrofossil paleocommunities (colors are those used in Fig. 2a, b) and representing traces in the modules. d Stacked bar graph showing numbers of taxa in the modules and their preservational modes (see Fig. 2 for color key). e Stacked bar graph showing numbers of taxa in the modules and their biotope assignments (colors are those used in Fig. 3a, b). See Supplementary Figs. 11 and 12 for related analyses. Source data are provided as a Source Data file

Application of the COPRA method to the unipartite network of Ediacaran genera (Fig. 2a) resulted in detection of four overlapping modules (Fig. 2b). The largest and most central cluster, the White Sea module, predominantly consists of bilateralomorph, dickinsoniomorph, and kimberellomorph genera, along with other Ediacara-type taxa (Fig. 2c, d). This cluster overlaps with the Avalon and Nama modules. Whereas the Avalon module primarily consists of rangeomorphs, the Nama cluster includes a mixture of taxa known from skeletal fossils, secondarily mineralized tubes, Ediacara-type fossils, and carbonaceous compressions. The remaining cluster, the Miaohe module, overlaps with the Nama module but is dominated by filament-, strap-, and ribbon-shaped taxa typically preserved as carbonaceous compressions. Assortativity coefficients indicate the best predictor of association in this network is preservational mode (Fig. 2c; Supplementary Fig. 7 and Supplementary Table 8). The relationship between morphology (form or morphogroup) and linkage is weak.

The COPRA method facilitated the detection of three modules in the bipartite network of paleoenvironments and Ediacaran genera (Fig. 3a, b). Each module includes one or more paleoenvironments and many genera, and therefore resembles a biotope. The smallest module, the deep biotope, consists of Avalon taxa (Figs. 2a, 3c) known exclusively from turbiditic facies representing deep-water paleoenvironments (Supplementary Table 4). The next smallest module, the shallow biotope, predominantly consists of White Sea and Nama genera, which occur in clastic and carbonate facies reflecting shallow-water paleoenvironments regularly influenced by wave action and tidal processes (Supplementary Table 4). Conversely, the largest module, the intermediate biotope (intermediate with regard to water depth), comprises White Sea, Nama, and Miaohe taxa (Figs. 2a, 3c). These genera occur in clastic and carbonate facies representing offshore shelf and ramp paleoenvironments located below fair-weather wave base and characterized by low-energy conditions (Supplementary Table 4). This module also includes the carbonate slope and basin paleoenvironment, which shares various taxa with the offshore shelf, offshore shelf transition, and outer ramp paleoenvironments. Virtually all Miaohe genera (Fig. 2a) belong to this intermediate biotope (Fig. 3c). All three clusters share genera with each other, but the greatest overlap occurs between the shallow and intermediate biotopes. Assortativity coefficients show that none of the nominal properties of the genera (e.g., preservational mode or morphogroup) represent strong predictors of association (Supplementary Fig. 7). Addition of ichnogenera to this network does not significantly alter its community structure or the module assignments of taxa (Supplementary Fig. 10).

Partitioning of the bipartite network of Ediacaran formations and taxa (genera and ichnogenera) with COPRA again resulted in the discovery of four overlapping modules (Fig. 4a, b). These modules resemble assemblage-based biozones (Fig. 5), in that they are defined by various taxa and strata (see Supplementary Discussion). Two modules contain the majority of nodes (Fig. 4b) as well as all nodes with high centrality scores (Fig. 6). The largest module, the Ediacara biota biozone (EBB), is comprised of formations containing Avalon, White Sea, and Miaohe genera (Fig. 4c) known from Ediacara-type fossils and carbonaceous compressions (Fig. 4d). Collectively, the taxa of the EBB represent all biotopes, with the majority belonging to the module of intermediate water depth (Figs. 3a, 4e; Supplementary Fig. 10a). In contrast, the second largest module, the Terminal Ediacaran biozone (TEB), consists of formations dominated by Nama genera (Fig. 4c), including skeletal taxa and forms known from Ediacara-type fossils and carbonaceous compressions (Fig. 4d). These fossils are preserved in both nearshore and offshore settings (Figs. 3a, 4e; Supplementary Fig. 10a), and taxa of the deep biotope are notably missing. Together, the TEB and EBB modules dwarf the small remaining clusters (Fig. 4b). One of these small modules contains the Lantian Formation and the genera of the Lantian Biota5 and Supplementary Table 6), affirming that it performed best at identifying nonrandom associations of nodes. It also typically returned the fewest communities, making it one of the most conservative approaches for partitioning the dataset. For this reason, we can interpret the modules as macro-level community units, representing the largest and most significant associations of nodes.
Although the density of connections within each module is high, the associations are separated by relatively sparse regions of connections36 that can be interpreted as consequences of biotic turnover of Ediacaran taxa across space and time. For all practical purposes, the unipartite network modules represent paleocommunities (i.e., associations of taxa that lived and were preserved together at various localities around the world). Along these same lines, the modules consisting of paleoenvironments and taxa constitute biotopes: environments with unique communities of taxa and specific ranges of substrates, hydrodynamic conditions, and light availability. Finally, the modules consisting of formations and taxa signify assemblage biozones, given that they consist of lithologic packages that can be correlated based on associations of taxa.

Altogether, the results provide an empirical framework for exploring how Ediacaran communities were distributed across space and time. Analysis of the unipartite network (Fig. 2a, b) corroborates the results of hierarchical clustering and NMDS (Fig. 1; Supplementary Figs. 2–4), as well as the findings of other studies31,32, which show that Ediacaran localities and formations can be divided among four clusters based on taxonomic similarity. To a degree, the relative ages of these paleocommunities are unknown, as their sequential appearance through stratigraphy generally varies from region to region (Supplementary Fig. 4). Notably, the multipartite networks do not contain modules that are analogous to these paleocommunities. Assortativity coefficients indicate that the topologies of the multipartite networks do not strongly reflect the preservational modes and/or morphogroups of the taxa or the geographic locations of the geologic units. Therefore, the bipartite modules do not represent artifacts of taphonomy or paleobiogeography, and instead most likely represent variation in taxa across facies and stratigraphy.

Consistent with previous interpretations31, network analysis shows that the paleocommunities inhabited environments of varying depth. The Avalon paleocommunity represents a deep-water (slope and basin) biotope31 that shared only a few taxa with shelf environments (Figs. 2, 3a–c). Shelf environments were characterized by two biotopes that shared many genera (Fig. 3b), with the boundary between them located around wave base. The White Sea and Nama paleocommunities (Fig. 2a) occurred in both biotopes, and occupied similar habitats across shelf settings (Fig. 3c), particularly in shoreface and offshore transition environments where their fossils were commonly preserved (Fig. 3a)31,33,34,35,56. In contrast, the Miaohe paleocommunity primarily lived and was preserved in offshore and slope environments (Fig. 3c)47 (Supplementary Tables 6 and 7) were computed for the overlapping and non-overlapping communities in the network projections using the COPRA software67 written in the JAVA language by S. Gregory (http://gregory.org/research/networks/software/copra.html)67. The degree centrality and betweenness centrality scores of nodes (Fig. 6) in the taxa projection of the bipartite network (Fig. 4) were determined using functions of the igraph package and their default settings in RStudio (Source Data). Measures of whole-network properties were also computed using functions of the igraph package for the various networks and network projections in this study (Supplementary Fig. 7 and Supplementary Table 8). Homophily was measured with assortativity coefficients, which are similar to Pearson correlation coefficients, for various continuous and nominal properties of nodes. Assortativity coefficients measuring homophily with respect to degree were determined for all network projections (Supplementary Table 8).
Additionally, assortativity coefficients were calculated for taxa projections from data on the preservational modes, morphogroups, and form categories of the genera and ichnogenera. Lastly, assortativity coefficients were calculated for formation projections from data on the G-Plate geoplates, regions (continents), and countries of the geologic units.

Partitioning networks into non-overlapping modules

Prior to selecting the COPRA method67, we partitioned the networks in this study into non-overlapping modules with fourteen community-detection algorithms (see Supplementary Discussion) and then compared the outputs in terms of their numbers of communities and extended modularity scores47 (Supplementary Fig. 5 and Supplementary Table 6). Weighted and non-weighted versions of the unipartite network were partitioned with the leading eigenvector, Louvain, fast greedy, infomap, walktrap, and edge-betweenness algorithms in the igraph package of RStudio, in addition to the COPRA method of the COPRA software67. Non-weighted versions of the bipartite networks were partitioned with the COPRA method; QuanBiMo, LPAwb, and DIRTLPAwb algorithms of the bipartite package in R; LP-BRIM algorithm of the lpbrim package produced by T. Poisot and D. B. Stouffer (http://poisotlab.io/software/) for R; simulated annealing algorithm of the rnetcarto package produced by G. Doulcier, R. Guimera, and D.B. Stouffer for R; leading eigenvector and Adaptive BRIM algorithms of the BiMat package in MATLAB; and biSBM algorithm of the C++ code made available by D. Larremore (http://danlarremore.com/bipartiteSBM/). Some of the algorithms do not output a single best-fit community structure. The methods lacking output determinism include the infomap, walktrap, COPRA, LPAwb, DIRTLPAwb, LP-BRIM, Adaptive BRIM, biSBM, and QuanBiMo methods. These algorithms start from random starting states, and therefore may produce multiple outputs from a single network. With the exception of the infomap, walktrap, and biSBM algorithms, which did not produce greatly varying results from one run to the next, these methods lacking output determinism were repeatedly applied to each network, and the outputs with the best modularity scores were saved. For each algorithm, the number of runs was determined so the analysis could finish in approximately two hours.
The QuanBiMo algorithm was run 100 times; the Adaptive BRIM algorithm was run 1000 times; the LP-BRIM algorithm was run 10,000 times; and the COPRA, LPAwb, and DIRTLPAwb algorithms were run 100,000 times. For the COPRA analyses in this comparative work, the v parameter (i.e., the maximum number of communities per vertex) was set to 1, and following the recommendation of the software developer67, the solutions were extra-simplified throughout the partitioning process to remove communities contained within others.

Partitioning networks into overlapping modules

The networks were partitioned into overlapping modules with the COPRA method of the COPRA software67. To find the best solutions, we executed COPRA 100,000 times on each network. Again, the solutions were extra-simplified. For this work, we devised and implemented a jackknife resampling and network partitioning procedure to identify the v parameter of each network in this study (Figs. 2a, 3a, 4a; Supplementary Figs. 10a, 11a, and 12a). For each network, a single node was removed from the data, the network was partitioned using the COPRA method (v = 1), and the number of non-overlapping, non-singleton communities (n) was recorded. Then, the node was reinserted into the network, and the steps were systematically repeated, so that every node in the network was omitted once and a distribution of n values was produced (Supplementary Fig. 6), where the total number of n values equals the size of the network (i.e., the number of nodes). Following this procedure, the v parameter is equal to the maximum n value in the distribution.
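The leave-one-node-out loop behind the v parameter can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: COPRA itself is external software, so the example plugs in a connected-components routine purely as a stand-in partitioner, and the names `jackknife_v` and `connected_components` are hypothetical.

```python
def connected_components(nodes, edges):
    """Stand-in partitioner (NOT COPRA): returns connected components as sets."""
    adj = {n: set() for n in nodes}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    seen, comps = set(), []
    for start in nodes:
        if start in seen:
            continue
        stack, comp = [start], set()
        while stack:
            x = stack.pop()
            if x in comp:
                continue
            comp.add(x)
            stack.extend(adj[x] - comp)
        seen |= comp
        comps.append(comp)
    return comps


def jackknife_v(nodes, edges, partition):
    """Omit each node once, partition the remainder, and record n, the number
    of non-singleton communities. Returns (max n, list of all n values)."""
    counts = []
    for omitted in nodes:
        kept_nodes = [n for n in nodes if n != omitted]
        kept_edges = [(u, v) for u, v in edges if omitted not in (u, v)]
        communities = partition(kept_nodes, kept_edges)
        counts.append(sum(1 for c in communities if len(c) > 1))
    return max(counts), counts


# Hypothetical toy network: a simple path a-b-c-d-e.
nodes = ["a", "b", "c", "d", "e"]
edges = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "e")]
v, n_values = jackknife_v(nodes, edges, connected_components)
```

In a real analysis, `partition` would wrap a call to a non-overlapping community-detection run (v = 1 in the text), and the list of n values would have one entry per node, matching the distribution described above.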

Sensitivity analysis

To assess the sensitivity of the network partitioning results to the level of sampling, we analyzed the effects of omitting links and nodes from the networks (Supplementary Fig. 8). In this analysis, links connect taxa to collections, paleoenvironments, and formations. For each network, links were randomly subsampled from the data in order to identify a subnetwork. Next, the number of nodes omitted from the subnetwork as a consequence of the subsampling procedure was determined. Then, the COPRA algorithm (v = 1) was applied 100,000 times to the subnetwork, and the community structure with the highest modularity score was identified. Finally, the best subnetwork partition was compared to the best network partition (i.e., the reported community structure). This final step involved calculating a normalized mutual information (NMI) score with a function of the igraph package in RStudio. In network analysis, NMI is a common measure of similarity (linear and nonlinear dependence) for two clusterings49. These scores are similar to Pearson correlation coefficients, with values between 0 (no dependence) and 1 (identical clusterings). Our NMI calculations assume that each omitted node represents its own module in a subnetwork. Overall, these steps were repeated one hundred times at various sampling levels, each corresponding to a percentage of links. Using this procedure, we compiled distributions of NMI scores and omitted node counts vs. sampling level. High NMI scores, particularly those paired with high omitted node counts, indicate results robust to variation in the data. To produce a null model for testing the statistical significance of the NMI scores, we repeated the procedure, except each NMI score was calculated for a pair of networks that were randomly produced with properties (size and degree distribution) based on the network and subnetwork.
Unipartite and bipartite null models were generated in RStudio using functions of the igraph (sample_degseq) and bipartite (vaznull) packages, respectively. In this case, the null hypothesis is that the network and its subnetworks do not have comparable community structures at a given sampling level (i.e., the observed NMI scores reflect random similarities). If the majority (95%) of the observed NMI scores are greater than those of the random networks (one-sided statistical test), then the null hypothesis can be rejected.
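For readers unfamiliar with NMI, the following sketch computes it for two clusterings of the same nodes and applies the convention stated above, in which every node omitted by subsampling is treated as its own module. The normalization used here, 2I(X;Y) / (H(X) + H(Y)), is one common choice and is assumed rather than taken from the text; the helper `pad_omitted` and the toy labels are likewise hypothetical.

```python
from collections import Counter
from math import log


def nmi(labels_a, labels_b):
    """Normalized mutual information, 2 * I(A;B) / (H(A) + H(B))."""
    n = len(labels_a)
    pa, pb = Counter(labels_a), Counter(labels_b)
    joint = Counter(zip(labels_a, labels_b))
    h_a = -sum(c / n * log(c / n) for c in pa.values())
    h_b = -sum(c / n * log(c / n) for c in pb.values())
    mi = sum(c / n * log(c * n / (pa[a] * pb[b]))
             for (a, b), c in joint.items())
    if h_a + h_b == 0:  # both clusterings put every node in one module
        return 1.0
    return 2 * mi / (h_a + h_b)


def pad_omitted(node_order, sub_membership):
    """Give each node missing from the subnetwork its own unique module label."""
    return [sub_membership.get(node, ("omitted", node)) for node in node_order]


# Hypothetical toy example: node "d" dropped out of the subsampled network.
node_order = ["a", "b", "c", "d"]
full = {"a": 0, "b": 0, "c": 1, "d": 1}
sub = {"a": 0, "b": 0, "c": 1}

full_labels = [full[n] for n in node_order]
sub_labels = pad_omitted(node_order, sub)
score = nmi(full_labels, sub_labels)
```

Because the label values themselves carry no meaning, two clusterings that differ only by a renaming of modules score 1, while statistically independent clusterings score 0.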

Randomization testing

A number of methods have been proposed for determining whether a community structure is statistically significant or, conversely, if it could have arisen due to chance49. Typically, a high modularity score is a good indicator of community structure46, but not all networks with high modularity have strong community structure. To assess if the community structures reported in this study arose due to chance (Figs. 2a, 3a, 4a; Supplementary Figs. 10a, 11a, and 12a), we performed a randomization test (Supplementary Fig. 9). For a unipartite network, the null hypothesis of this test is that its observed modularity score equals the value of a random network of matching size and degree distribution, i.e. a network that has the same numbers of nodes with various degrees (numbers of connections). The null hypothesis is essentially the same for bipartite networks. However, each projection in a bipartite network has its own modularity score, so the null hypothesis states that one or both scores are equal to those of random networks. To test this hypothesis for each network, the links among nodes were randomized using functions of the igraph and BiRewire packages in RStudio. Nonetheless, the nodes’ degree distribution was preserved for all projections in the random network. The randomized network was then partitioned using the COPRA method and v parameter that was applied to the original network, and the modularity of the community structure was recorded. These steps were repeated 100 times for each network, producing one distribution of modularity scores per unipartite network and two distributions of modularity scores per bipartite network (one for each projection). These distributions were used to calculate P-values and Z scores to test the null hypothesis (i.e., a one-sided statistical test). 
The P-value is the probability of discovering a community structure with a higher modularity score if the connections among nodes were randomly distributed (i.e., there is no meaningful community structure). If the P-value of a unipartite network or both P-values of a bipartite network are less than alpha (α) at the 90% (0.10), 95% (0.05), and/or 99% (0.01) confidence levels, the null hypothesis can be rejected, and its community structure is considered statistically significant. A Z score greater than 1 also suggests that an observed community structure is significant49.
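Given a set of modularity scores from randomized networks, the one-sided P-value and the Z score can be computed as in the sketch below. It assumes the null scores have already been produced by rewiring and repartitioning; the use of "greater than or equal to" in the P-value and of the sample standard deviation in the Z score are assumptions, as are the function name and the toy numbers.

```python
from statistics import mean, stdev


def randomization_test(observed_q, null_qs):
    """One-sided test of an observed modularity against a null distribution.

    P = share of randomized networks whose modularity is at least as high
        as the observed value.
    Z = (observed - mean of null scores) / standard deviation of null scores.
    """
    p_value = sum(q >= observed_q for q in null_qs) / len(null_qs)
    z_score = (observed_q - mean(null_qs)) / stdev(null_qs)
    return p_value, z_score


# Hypothetical null scores; a real test would use 100 values per projection.
null_qs = [0.1, 0.2, 0.3, 0.4, 0.5]
p_value, z_score = randomization_test(0.45, null_qs)
```

With only 100 randomizations, the smallest nonzero P-value that can be observed is 0.01, which is worth keeping in mind when reading results reported as P < 0.01.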

Network visualization

Networks were visualized in RStudio using functions in the following packages: igraph, GGally, ggplot2, ggnetwork, and intergraph. Static network graphs (Figs. 2a, 3a, 4a; Supplementary Figs. 10a, 11a, and 12a) were generated using the ggnet2 function of GGally and its default parameters, and nodes of equal size were placed without self-loops according to the Fruchterman-Reingold force-directed algorithm.

Generic richness data

The taxonomic diversities of the biozones in this study were estimated from sample-based incidence (i.e., presence/absence) data with rarefaction, extrapolation, and non-parametric richness estimators (Fig. 7; Supplementary Figs. 14–17). In this work, samples are fossiliferous formations and fossil collection points, which vary in number among the biozones. Biozone assignments of Ediacaran formations were taken directly from network analysis results (Fig. 4). Collection points, on the other hand, were assigned to biozones based on their formations. Four subsets of samples were analyzed. The first subset is comprised of all samples of body fossils. Genera in this subset include simple discs (e.g., Aspidella) and possible junior synonyms as well as taxa that may be based on taphomorphs, pseudofossils, and microbial structures. The second subset consists of all samples of index fossil genera (i.e., the morphologically distinct taxa that define the biozones, Fig. 4; Supplementary Fig. 13). In contrast, the third and fourth subsets exclude all samples associated with the deep biotope (i.e., the Bradgate, Briscal, Drook, Fermeuse, Mistaken Point, Nadaleen, and Trepassey samples) as well as all genera assigned to the environmentally restricted Avalon and Miaohe paleocommunities (Figs. 2a, 3a–c). Whereas the third subset consists of all remaining samples of White Sea and Nama taxa, the final subset is comprised of samples of taxa that are known from Ediacara-type fossils and assigned to those paleocommunities (Fig. 2). Ichnogenera occurrences were not included in any of the subsets.

Sample-based rarefaction and extrapolation

A number of sample-based rarefaction analyses were performed using the EstimateS software68 to compare the three (Ediacara biota, Terminal Ediacaran, and Fortunian) stage-level biozones (Supplementary Fig. 13) in terms of taxonomic diversity (genus richness) estimates and sampling intensity (Fig. 7). For sampling intensity 1:n (where n equals the number of samples within each biozone), the expected number of taxa and unconditional 95% confidence interval was calculated for 1000 runs using established analytical methods68, which duplicate the results of conventional subsampling techniques (Source Data). Additionally, non-parametric methods for extrapolation68 were used to estimate the expected numbers of taxa that would be found in augmented samplings with greater numbers of samples. These exact analytical methods were also used to calculate unconditional 95% confidence intervals for the extrapolated values. The unconditional 95% confidence intervals in this rarefaction and extrapolation work can be used for hypothesis testing. The null hypothesis is that two assemblages (i.e., biozones) are equal with respect to their taxonomic diversity. If the confidence intervals of two biozones do not overlap at the current sampling level of the biozone with fewer samples, the null hypothesis can be rejected, and the observed difference in generic diversity is considered statistically significant. Values extrapolated beyond the current sampling level may provide evidence to the contrary, particularly if the rate of taxa accumulation is significantly greater in one assemblage than the other. The amount of variance, however, generally increases with the level of extrapolation, so interpretations of the data should holistically consider the shapes of the rarefaction/extrapolation curves as well as their uncertainties at various sampling levels.
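To make the idea of an analytical rarefaction curve concrete, the sketch below evaluates a standard closed-form expectation for sample-based rarefaction: the expected number of taxa found in h samples drawn without replacement from H samples. Whether this matches the exact routine in the software used here is an assumption, confidence intervals are not computed, and the function name and toy incidence counts are hypothetical.

```python
from math import comb


def expected_richness(incidence_counts, n_samples, h):
    """Expected number of taxa in h of n_samples samples.

    incidence_counts[i] is the number of samples in which taxon i occurs.
    A taxon is missed only if all h samples come from the
    (n_samples - count) samples that lack it.
    """
    total = comb(n_samples, h)
    return sum(1 - comb(n_samples - y, h) / total for y in incidence_counts)


# Hypothetical toy data: 3 samples; one taxon in all 3, two taxa in 1 each.
incidence_counts = [3, 1, 1]
curve = [expected_richness(incidence_counts, 3, h) for h in (1, 2, 3)]
```

At h = 1 the expectation equals the mean richness of a single sample (5/3 for the toy data), and at h = H it equals the observed richness, which is why this approach reproduces the average of repeated random subsampling without having to run it.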

Estimation of taxonomic richness

Five common non-parametric richness estimators were used to estimate the generic diversities of the three stage-level biozones (Supplementary Fig. 13) as functions of sampling intensity (Source Data). These analyses were conducted using the EstimateS software68 (Supplementary Figs. 14–17). The estimators correct richness values observed in incidence data by adding terms based on the frequencies of rare taxa (i.e., taxa represented in only one sample or a few)52. They include the Chao-2 (classic formula), bootstrap, first-order jackknife, and second-order jackknife estimators as well as the incidence coverage-based estimator (ICE). For sampling intensity 1:n (where n equals the number of samples within each biozone), samples were randomly selected without replacement from each biozone, and the number of genera was determined using each estimation method. This subsampling was repeated 1000 times for each biozone, and the mean number of genera was calculated for each sampling intensity level. The distributions of iterated bootstrap, second-order jackknife, and ICE mean values were used to calculate conditional variance values and 95% confidence intervals for these estimators. Conversely, exact analytical methods68 were used to derive unconditional variance values and 95% confidence intervals for the mean second-order jackknife and Chao-2 estimators. Unlike the conditional confidence intervals, the unconditional intervals do not converge to zero at the maximum sampling intensity level. If the unconditional 95% CIs of two assemblages do not overlap for a given estimator at this level, the data indicate that the two assemblages are statistically different68. On the other hand, if two conditional CIs do not overlap, the results simply suggest that the smaller reference sample was not drawn from the larger one.
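Three of the estimators named above have simple closed forms based on the number of taxa found in exactly one sample (Q1) and in exactly two samples (Q2). The sketch below uses the textbook versions of the classic Chao-2 and the first- and second-order jackknife formulas; it is an illustration only, it omits the bootstrap and ICE estimators and all variance calculations, and the function name and toy counts are assumptions.

```python
def richness_estimates(incidence_counts, n_samples):
    """Classic Chao-2 and jackknife richness estimates from incidence data.

    incidence_counts[i] is the number of samples containing taxon i;
    n_samples (m) must be at least 2 for the second-order jackknife.
    """
    m = n_samples
    s_obs = sum(1 for y in incidence_counts if y > 0)
    q1 = sum(1 for y in incidence_counts if y == 1)  # uniques
    q2 = sum(1 for y in incidence_counts if y == 2)  # duplicates
    # The classic Chao-2 formula is undefined when there are no duplicates.
    chao2 = s_obs + q1 ** 2 / (2 * q2) if q2 > 0 else None
    jack1 = s_obs + q1 * (m - 1) / m
    jack2 = s_obs + q1 * (2 * m - 3) / m - q2 * (m - 2) ** 2 / (m * (m - 1))
    return {"S_obs": s_obs, "Chao2": chao2, "Jack1": jack1, "Jack2": jack2}


# Hypothetical toy data: 8 taxa recorded across 5 samples.
est = richness_estimates([5, 3, 2, 2, 1, 1, 1, 1], n_samples=5)
```

All three estimates exceed the observed richness whenever uniques are present, which reflects the logic stated above: many rare taxa imply that further taxa remain unsampled.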

Code availability

The authors declare that the study does not include results produced using custom software or mathematical algorithms. All codes are available from the corresponding author upon reasonable request.

Reporting summary

Further information on experimental design is available in the Nature Research Reporting Summary linked to this article.