Abstract
Background
Resolving the phylogeny of rapidly radiating lineages presents a challenge when building the Tree of Life. The Old World avian family Prunellidae (accentors) comprises twelve species that rapidly diversified at the Pliocene–Pleistocene boundary.
Results
Here we investigate the phylogenetic relationships of all species of Prunellidae using a chromosome-level de novo assembly of Prunella strophiata and 36 high-coverage resequenced genomes. We use homologous alignments of thousands of exonic and intronic loci to build the coalescent and concatenated phylogenies and recover four different species trees. Topology tests show a large degree of gene tree-species tree discordance, but only 40–54% of intronic gene trees and 36–75% of exonic gene trees can be explained by incomplete lineage sorting and gene tree estimation errors. Estimated branch lengths for three successive internal branches in the inferred species trees suggest the existence of an empirical anomaly zone. The most common topology recovered for species in this anomaly zone was not similar to any coalescent or concatenated inference phylogenies, suggesting the presence of anomalous gene trees. However, this interpretation is complicated by the presence of gene flow because extensive introgression was detected among these species. When exploring tree topology distributions, introgression, and regional variation in recombination rate, we find that many autosomal regions contain signatures of introgression and thus may mislead phylogenetic inference. Conversely, the phylogenetic signal is concentrated in regions with low recombination rates, such as the Z chromosome, which are also more resistant to interspecific introgression.
Conclusions
Collectively, our results suggest that phylogenomic inference should consider the underlying genomic architecture to maximize the consistency of phylogenomic signal.
Background
Reconstructing phylogenetic relationships for rapidly radiating groups has proven to be particularly difficult [1,2,3,4]. This is because rapid radiations are particularly prone to extensive incomplete lineage sorting (ILS) and resulting high gene-tree discordance, which can result in unresolved or poorly resolved nodes in species trees [5,48]. We found that the most common topology occurred in 16% of the windows, which was not recovered by any coalescent or concatenated inference phylogenies (Fig. 4c). Conversely, the topology recovered by the intron-set-based phylogenies was the fourth most common topology and occurred in 12% of the windows, while the exon-set-based MP-EST and ASTRAL topologies were the thirteenth and fourteenth most common topologies and appeared in only 2.5 and 2.1% of the windows, respectively (Fig. 4c). Altogether, the distribution of gene tree frequency in combination with short internal branches in the species tree is consistent with the expectation of the existence of an anomaly zone in Prunellidae.
Effect of recombination rate variation on topology distribution
If introgression is the predominant process generating topological discordance and the anomaly zone, we would expect gene tree topologies in genomic regions with low recombination rates to be more resistant to introgression. We therefore investigated tree topology and variation in introgression and recombination rates across the chromosomes for the species falling within the anomaly zone. We used population sequencing data from P. modularis (n = 9) to estimate recombination rates using ReLERNN [49] and PyRho v0.1.6 [50]. As the comparisons based on recombination rates estimated by ReLERNN and PyRho (see “Methods”) showed similar results, we present only the ReLERNN-based results in the main text; those based on PyRho are placed in the supplementary material (Additional file 1: Fig. S3). We averaged recombination rate (cM/Mb) in 50-kb non-overlapping windows, selected the windows falling in the upper and lower 10% of the recombination rate distribution, and estimated the topology distribution across these windows. We found that topology 4 ((P. montanella, P. rubida), ((P. koslowi, P. fulvescens), (P. o. fagani, P. o. ocularis, P. atrogularis))) was more frequent within the high-recombination regions of autosomes (Fig. 5a and Additional file 1: Fig. S3). This topology is congruent with the phylogeny inferred from the intron-set. In contrast, topology 1 had the highest frequency in the low-recombination regions of the autosomes. The analysis of the Z chromosome found topology 3 to be the dominant topology, especially in the low-recombination regions of that chromosome (Fig. 5a and Additional file 1: Fig. S3).
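The window selection described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function names, the input layout (position and rate pairs for one chromosome), and the use of an unweighted mean per window are assumptions made for the example.

```python
from collections import defaultdict

WINDOW = 50_000  # window size in bp, as stated in the text


def window_means(records, window=WINDOW):
    """Average recombination rate per non-overlapping window.

    records: iterable of (position_bp, rate_cM_per_Mb) pairs for one
    chromosome (hypothetical input layout).
    Returns {window_index: mean_rate}.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for pos, rate in records:
        idx = pos // window  # integer index of the window containing pos
        sums[idx] += rate
        counts[idx] += 1
    return {idx: sums[idx] / counts[idx] for idx in sums}


def extreme_windows(means, fraction=0.10):
    """Return (lowest, highest) window indices by mean rate.

    Each list holds the given fraction of all windows (at least one).
    """
    ranked = sorted(means, key=means.get)  # ascending by mean rate
    k = max(1, int(len(ranked) * fraction))
    return ranked[:k], ranked[-k:]


# Toy data: 20 windows whose rate equals their index.
records = [(i * WINDOW + 10, float(i)) for i in range(20)]
low, high = extreme_windows(window_means(records))
```

Topology frequencies would then be tabulated separately for the windows in `low` and `high`.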
Tree topology changes with variation in recombination rate and introgression. a The frequency distribution of the four most common topologies in the high- and low-recombination regions of the autosomal and Z chromosomes, respectively. b, c Interplay between the topological distribution and recombination rate variation (left) as well as between the topological distribution and genetic introgression (right) in the Z chromosome (b) and autosomes (c). Topology 4 (blue), which is congruent with the phylogeny inferred from the intron-set, is enriched in the genomic regions with high-recombination rate and high level of gene flow, while the topology 3 (reddish) is more common in the genomic regions with low-recombination rates and less signature of gene flow. d ASTRAL species trees reconstructed for the low-recombination regions within the Z chromosome (left) and for the high-recombination regions within the autosomes (right), respectively. The two phylogenies differ in the position of P. montanella/P. rubida, P. fulvescens/P. koslowi, and P. modularis (indicated by reddish branches). The phylogeny of high recombination regions within autosomes is similar to those of intron-set
We then investigated the interplay between the topology distribution and variation in introgression and recombination rate. We specifically focused on gene flow between P. modularis, P. ocularis/P. atrogularis, P. montanella/P. rubida, and P. fulvescens/P. koslowi with Passer montanus as outgroup (see “Methods”). We found that the genomic regions supporting topology 4 have high rates of recombination and gene flow, while genomic regions supporting topology 3 have low rates of recombination and introgression (Wilcoxon test, P < 0.001, Fig. 5b and Fig. 5c, Additional file 1: Fig. S3 and Fig. S4). This pattern is more pronounced in the Z chromosome than in the autosomes.
We further reconstructed ASTRAL trees separately for the 50-kb genomic windows in the upper and lower 10% of recombination rates, and found that the topology from the autosomal regions with the highest recombination rates was identical to the intron-set-based phylogeny (Fig. 5d, Additional file 1: Fig. S4). However, the phylogenetic relationships reconstructed using the low-recombination regions in the Z chromosome placed P. montanella + P. rubida as a separate lineage, instead of clustering with P. koslowi + P. fulvescens as exhibited by the phylogeny based on the high-recombination regions (Fig. 5d, Additional file 1: Fig. S4). Taken together, these results suggest that the low-recombination regions within the Z chromosome tend to contain few introgressed segments and likely represent the speciation-driven branching relationships of the accentors.
Discussion
Phylogenomic relationship of accentors
Lineages that have experienced a rapid radiation are prone to ILS and interspecific hybridization, a situation that poses a great challenge for phylogenetic reconstruction [74] and 3D-DNA v190716 [75] was used to anchor contigs to scaffolds. Possible assembly errors such as misjoins, translocations, and inversions were manually examined and corrected using the Assembly Tools module within JUICEBOX v1.11.08 [74] (Additional file 1: Fig. S5). We aligned P. strophiata genome with the Zebra finch (Taeniopygia guttata) genome using MUMmer v3.23 [76] and checked the collinearity of the two genomes.
Taxon sampling
We included all currently recognized species of Prunellidae (Supplementary Table 1), which consists of a single genus (Prunella) with twelve species [40, 42]. Prunella ocularis fagani was previously treated as a distinct species [77] but is now treated as a subspecies of P. ocularis [40, 42]. As P. o. fagani is geographically widely separated from P. o. ocularis, we herein treat P. o. fagani and P. o. ocularis as two taxonomic units. We included two to nine individuals for each species except for P. koslowi and P. atrogularis, for which only a single individual was available for each species. We used cryo-frozen or 96% ethanol-preserved tissue for all taxa except for P. o. fagani for which DNA was extracted from the toepad of a museum study skin.
DNA extraction, library preparation, and resequencing
DNA was extracted from the tissue and museum toepad samples of 34 accentors and two Tree Sparrows (Passer montanus) using the Qiagen QIAamp DNA Mini Kit according to the manufacturer’s protocol. Sequencing libraries for fresh tissues were prepared using the Illumina TruSeq PCR-free (190/350 bp) kit and were sequenced on an Illumina NovaSeq platform at Annoroad Gene Technology and Berry Genomic Institute. The library from the museum specimen was prepared using the protocol published by Irestedt et al. [78] and sequenced by SciLifeLab (Stockholm). The samples were sequenced to a mean coverage of 21 × (Supplementary Table S2).
Filtering raw reads and reference mapping
Raw sequencing data were cleaned using the fastx toolkit (http://hannonlab.cshl.edu/fastx_toolkit/) with the following steps: (1) removal of adapters, (2) removal of low-quality reads, defined as reads with a proportion of “N” > 3% or reads with > 50% low-quality bases (< 3). Raw sequencing data from the museum specimen were cleaned by the same procedure, except that 5 bp were trimmed from both ends of each read to avoid erroneous bases caused by DNA degradation. We mapped the clean reads of 34 accentors, two tree sparrows, and one Red-banded flowerpecker (Dicaeum eximium, GCA013396995) against the de novo genome of P. strophiata using BWA mem v0.7.12 [79], and then sorted the alignments and removed duplicates using Picard (http://broadinstitute.github.io/picard/). We called variants using bcftools mpileup v1.9 [80]. We removed indels and filtered the variant call format (VCF) file using the following criteria: (1) minQ > 30, (2) min-DP > 10 and max-DP < 2500, (3) max-missing rate ≤ 0.1, (4) SNPs at least 5 bp away from indels. The filtered VCF was used for downstream analysis.
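The four site-level criteria can be read as a single predicate applied to each variant. The sketch below is only an illustration of that logic under stated assumptions: the `Site` record and `keep_site` function are hypothetical, depth is treated as a per-site total, and the comparisons are strict or inclusive exactly as written in the list above. It does not reproduce the command lines actually used.

```python
from dataclasses import dataclass


@dataclass
class Site:
    """Hypothetical summary of one VCF record."""
    pos: int             # 1-based position on the chromosome
    qual: float          # variant quality (QUAL)
    depth: int           # total read depth at the site (assumed per-site)
    missing_rate: float  # fraction of samples without a genotype call
    is_indel: bool


def keep_site(site, indel_positions, min_qual=30, min_dp=10,
              max_dp=2500, max_missing=0.1, indel_gap=5):
    """Return True if the site passes all filters listed in the text."""
    if site.is_indel:                          # indels are removed
        return False
    if site.qual <= min_qual:                  # (1) minQ > 30
        return False
    if not (min_dp < site.depth < max_dp):     # (2) 10 < DP < 2500
        return False
    if site.missing_rate > max_missing:        # (3) missing rate <= 0.1
        return False
    # (4) SNP must lie at least `indel_gap` bp away from every indel
    return all(abs(site.pos - p) >= indel_gap for p in indel_positions)


snp = Site(pos=100, qual=50.0, depth=40, missing_rate=0.05, is_indel=False)
```

Here `keep_site(snp, [103])` rejects the SNP because an indel lies 3 bp away, whereas `keep_site(snp, [105])` keeps it.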
Extracting and aligning homologous exonic and intronic loci
To investigate the potential influence of different genetic markers on phylogenetic inference, we assembled intronic and exonic datasets. We carried out these steps using a custom designed BirdScanner pipeline [81] (github.com/Naturhistoriska/birdscanner). Specifically, we performed searches using profile hidden Markov models (HMM) [82] to obtain a large number of sequence homologs of nuclear exonic and intronic loci across the whole genome. Profile HMMs use information from variation in multiple sequence alignments to seek similarities in databases, or as here, genome assemblies [83]. The HMM profiles were based on the alignments of exonic and intronic loci generated by Jarvis et al. [1] for four passerine species, Acanthisitta chloris, Corvus brachyrhynchos, Geospiza fortis, and Manacus vitellinus. For each HMM query and taxon, the location in the genome for the highest hit was identified, and the sequence parsed out using the genomic coordinates. The parsed-out gene sequences were then aligned gene by gene using MAFFT v7.310 [84] and poorly aligned sequences were identified, based on a calculated distance matrix using OD-Seq (github.com/PeterJehl/OD-Seq) and excluded from further analyses. We also checked the alignments manually and removed those that included non-homologous sequences for some taxa (indicated by an extreme proportion of variable positions in the alignment) and those that contained no phylogenetic information (no parsimony-informative sites). We also filtered the alignments to only include those that contained all samples. A total of 2373 exonic and 6879 intronic loci were kept for the subsequent analyses. All separate alignments were combined to a single concatenated alignment for the concatenation analyses, or kept separate for coalescent analyses based on gene trees.
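Two of the alignment filters described above, completeness across samples and the presence of parsimony-informative sites, are simple to express in code. The following is a minimal sketch, assuming each alignment is held as a dictionary mapping sample name to an aligned sequence; the function names are hypothetical and the BirdScanner pipeline itself is not reproduced here. A site is counted as parsimony-informative when at least two nucleotide states each occur in at least two sequences, with gaps and ambiguity codes ignored.

```python
from collections import Counter


def parsimony_informative_sites(seqs):
    """Count alignment columns with >= 2 states each seen in >= 2 sequences."""
    count = 0
    for column in zip(*seqs):
        # Tally unambiguous nucleotides only; gaps and N are skipped.
        states = Counter(b for b in (c.upper() for c in column) if b in "ACGT")
        if sum(1 for n in states.values() if n >= 2) >= 2:
            count += 1
    return count


def keep_alignment(alignment, required_samples):
    """Keep an alignment only if every sample is present and it is informative.

    alignment: {sample_name: aligned_sequence} (hypothetical layout).
    """
    if not set(required_samples) <= set(alignment):
        return False
    return parsimony_informative_sites(list(alignment.values())) > 0


toy = {"s1": "AAGT", "s2": "AAGT", "s3": "ACGA", "s4": "ACTA"}
n_informative = parsimony_informative_sites(list(toy.values()))
```

In the toy alignment, columns 2 and 4 are informative; column 3 is variable but not informative because the state T occurs only once.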
Phylogenomic analyses
We used both concatenated and coalescent approaches to estimate phylogenomic relationships of the accentors for the intron-set and exon-set, respectively. For the concatenated approach, trees were constructed for the exon-set and intron-set separately using IQ-TREE [43], applying the “-m TEST” option to find the best substitution model for each alignment. We inferred the maximum-likelihood trees from the two concatenated datasets with 1000 ultrafast bootstraps to obtain branch supports as implemented in the IQ-TREE software [85].
For the coalescent analyses, we first used IQ-TREE to estimate the best maximum-likelihood tree for each intronic or exonic dataset. Statistical confidence of each gene tree was assessed by performing 100 bootstrap replicates using the best substitution model for each alignment. We used ASTRAL-III v5.6.3 [44, 45] to construct coalescent trees from the best maximum-likelihood gene trees estimated for the exon-set and intron-set separately. We also ran MP-EST coalescent analyses (MP-EST v2.1) [46] with 100 runs beginning with different random seed numbers and ten independent tree searches within each run. The MP-EST species tree topology was inferred using the best maximum-likelihood gene trees as input. Confidence of each node was evaluated by performing the same species tree inference analysis on 100 maximum-likelihood bootstrap gene trees. The resulting 100 species trees estimated from bootstrapped samples were summarized onto the ASTRAL and MP-EST species trees using the option “-f b” in RAxML.
Test topological difference between estimated gene trees and species trees
We next considered whether topological differences between estimated gene trees and the species trees are well supported. For each locus, we tested the estimated gene tree topology against each of the four candidate species trees that were inferred for the intron-set and exon-set, respectively (see “Results”). We used approximately unbiased (AU) tests in IQ-TREE to test whether individual gene trees fit each of the four candidate species trees. For each gene tree, a species tree topology was considered rejected at a significance level of 0.05 after Bonferroni correction for multiple comparisons.
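As a worked illustration of the rejection rule, the sketch below applies a Bonferroni correction to the AU-test P values of one locus. It assumes that the number of comparisons equals the number of candidate species trees tested per locus (four here); the text does not state the correction factor explicitly, so this is an assumption, and the function name is hypothetical.

```python
def rejected_species_trees(p_values, alpha=0.05):
    """Return indices of candidate species trees rejected for one locus.

    p_values: AU-test P values, one per candidate species tree.
    A tree is rejected when its P value falls below alpha divided by the
    number of candidate trees (assumed Bonferroni factor).
    """
    threshold = alpha / len(p_values)
    return [i for i, p in enumerate(p_values) if p < threshold]


# Four candidate trees give a threshold of 0.05 / 4 = 0.0125.
rejected = rejected_species_trees([0.20, 0.01, 0.03, 0.0001])
```

With these toy values, the second and fourth candidate trees are rejected, while the third (P = 0.03) is retained because it exceeds the corrected threshold.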
Coalescent simulations
To investigate how much gene tree heterogeneity can be explained by ILS and gene tree estimation error, we carried out coalescent simulations as described in Cai et al. […]
Integrating signals of topology distribution and variation in introgression and recombination rate
We estimated chromosome-wide introgression using the fd statistic in 50-kb non-overlapping sliding windows using ABBABABAwindows.py [97]. We specifically focused on comparisons between P. montanella/P. rubida and P. fulvescens/P. koslowi as these two lineages constitute the major topological conflicts observed (see “Results”). We estimated fd values for each of the four trios using P. modularis as outgroup, and then calculated their average fd values for subsequent comparisons. To assess how topology frequency changes with variation in introgression and recombination rate, we compared average fd and r values between the windows supporting the different topologies for the autosomes and the Z chromosome, respectively. We used the Wilcoxon test to assess statistical significance.
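For readers unfamiliar with fd, the sketch below computes it for one window from per-site derived allele frequencies, following the general definition of Martin et al. [97]: the observed ABBA minus BABA excess for (P1, P2, P3, O) is divided by the excess expected under complete introgression, in which P2 and P3 are both replaced by whichever of the two has the higher derived allele frequency at each site. This is a simplified illustration, not the ABBABABAwindows.py implementation; the function names and input layout are assumptions.

```python
def abba_minus_baba(sites):
    """Sum of ABBA - BABA over sites.

    sites: iterable of (p1, p2, p3, po) derived allele frequencies for
    populations P1, P2, P3 and the outgroup O.
    """
    total = 0.0
    for p1, p2, p3, po in sites:
        abba = (1 - p1) * p2 * p3 * (1 - po)
        baba = p1 * (1 - p2) * p3 * (1 - po)
        total += abba - baba
    return total


def fd_statistic(sites):
    """fd for one window; returns NaN when the denominator is not positive."""
    sites = list(sites)
    numerator = abba_minus_baba(sites)
    # Donor population PD: the higher of p2 and p3 at each site.
    donor = [(p1, max(p2, p3), max(p2, p3), po) for p1, p2, p3, po in sites]
    denominator = abba_minus_baba(donor)
    return numerator / denominator if denominator > 0 else float("nan")


full = fd_statistic([(0.0, 1.0, 1.0, 0.0)])   # P2 fully shares P3's allele
half = fd_statistic([(0.0, 0.5, 1.0, 0.0)])   # allele at 50% in P2
```

Per-window values from each trio would then be averaged before comparing windows that support different topologies.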
Availability of data and materials
Variant call data, alignments, and trees of intron and exon, and codes for this study are available in the figshare repository (https://doi.org/10.6084/m9.figshare.25202057.v3) [98]. Resequencing data of 34 accentors are available in the NCBI database under accession number PRJNA960939 (https://www.ncbi.nlm.nih.gov/bioproject/PRJNA960939) [99]. Genome assembly of Prunella strophiata is available in GenBank (https://www.ncbi.nlm.nih.gov/nuccore/JAZBQD000000000) [100].
Abbreviations
- ILS: Incomplete lineage sorting
- TWISST: Topology weight analysis
- gCF: Gene concordance factor
- gDF: Gene discordance factor
- RF: Robinson-Foulds
- HMM: Hidden Markov models
- BS: Bootstrap support
References
Jarvis ED, Mirarab S, Aberer AJ, Li B, Houde P, Li C, et al. Whole-genome analyses resolve early branches in the tree of life of modern birds. Science. 2014;346:1320–31.
Prum RO, Berv JS, Dornburg A, Field DJ, Townsend JP, Lemmon EM, et al. A comprehensive phylogeny of birds (Aves) using targeted next-generation DNA sequencing. Nature. 2015;526:569–73.
Tarver JE, Dos Reis M, Mirarab S, Moran RJ, Parker S, O’Reilly JE, et al. The interrelationships of placental mammals and the limits of phylogenetic inference. Genome Biol Evol. 2016;8:330–44.
Chen M-Y, Liang D, Zhang P. Selecting question-specific genes to reduce incongruence in phylogenomics: a case study of jawed vertebrate backbone phylogeny. Syst Biol. 2015;64:1104–20.
Edwards SV. Is a new and general theory of molecular systematics emerging? Evolution. 2009;63:1–19.
Cai L, Xi Z, Lemmon EM, Lemmon AR, Mast A, Buddenhagen CE, et al. The perfect storm: gene tree estimation error, incomplete lineage sorting, and ancient gene flow explain the most recalcitrant ancient angiosperm clade, Malpighiales. Syst Biol. 2021;70:491–507.
Cloutier A, Sackton TB, Grayson P, Clamp M, Baker AJ, Edwards SV. Whole-genome analyses resolve the phylogeny of flightless birds (Palaeognathae) in the presence of an empirical anomaly zone. Syst Biol. 2019;68:937–55.
Scherz MD, Masonick P, Meyer A, Hulsey CD. Between a rock and a hard polytomy: phylogenomics of the rock-dwelling mbuna cichlids of Lake Malaŵi. Syst Biol. 2022;71:741–57.
Pease JB, Haak DC, Hahn MW, Moyle LC. Phylogenomics reveals three sources of adaptive variation during a rapid radiation. Plos Biol. 2016;14:e1002379.
Edelman NB, Frandsen PB, Miyagi M, Clavijo B, Davey J, Dikow RB, et al. Genomic architecture and introgression shape a butterfly radiation. Science. 2019;366:594–9.
Li G, Figueiró HV, Eizirik E, Murphy WJ. Recombination-aware phylogenomics reveals the structured genomic landscape of hybridizing cat species. Mol Biol Evol. 2019;36:2111–26.
Degnan JH, Rosenberg NA. Discordance of species trees with their most likely gene trees. Plos Genet. 2006;2:e68.
Rosenberg NA, Tao R. Discordance of species trees with their most likely gene trees: the case of five taxa. Syst Biol. 2008;57:131–40.
Solís-Lemus C, Yang M, Ané C. Inconsistency of species tree methods under gene flow. Syst Biol. 2016;65:843–51.
Long C, Kubatko L. The effect of gene flow on coalescent-based species-tree inference. Syst Biol. 2018;67:770–85.
Mallet J, Besansky N, Hahn MW. How reticulated are species? BioEssays. 2016;38:140–9.
Meredith RW, Janečka JE, Gatesy J, Ryder OA, Fisher CA, Teeling EC, et al. Impacts of the cretaceous terrestrial revolution and KPg extinction on mammal diversification. Science. 2011;334:521–4.
Song S, Liu L, Edwards SV, Wu S. Resolving conflict in eutherian mammal phylogeny using phylogenomics and the multispecies coalescent model. Proc Natl Acad Sci U S A. 2012;109:14942–7.
McCormack JE, Faircloth BC, Crawford NG, Gowaty PA, Brumfield RT, Glenn TC. Ultraconserved elements are novel phylogenomic markers that resolve placental mammal phylogeny when combined with species-tree analysis. Genome Res. 2012;22:746–54.
McCormack JE, Hird SM, Zellmer AJ, Carstens BC, Brumfield RT. Applications of next-generation sequencing to phylogeography and phylogenetics. Mol Phylogenet Evol. 2013;66:526–38.
Chojnowski JL, Kimball RT, Braun EL. Introns outperform exons in analyses of basal avian phylogeny using clathrin heavy chain genes. Gene. 2008;410:89–96.
Yu L, Luan P-T, Jin W, Ryder OA, Chemnick LG, Davis HA, et al. Phylogenetic utility of nuclear introns in interfamilial relationships of Caniformia (order Carnivora). Syst Biol. 2011;60:175–87.
Foley NM, Thong VD, Soisook P, Goodman SM, Armstrong KN, Jacobs DS, et al. How and why overcome the impediments to resolution: lessons from rhinolophid and hipposiderid bats. Mol Biol Evol. 2015;32:313–33.
Rannala B, Yang Z. Bayes estimation of species divergence times and ancestral population sizes using DNA sequences from multiple loci. Genetics. 2003;164:1645–56.
Liu L, Yu L, Kubatko L, Pearl DK, Edwards SV. Coalescent methods for estimating phylogenetic trees. Mol Phylogenet Evol. 2009;53:320–8.
Nadeau NJ, Martin SH, Kozak KM, Salazar C, Dasmahapatra KK, Davey JW, et al. Genome-wide patterns of divergence and gene flow across a butterfly radiation. Mol Ecol. 2013;22:814–26.
Good JM, Vanderpool D, Keeble S, Bi K. Negligible nuclear introgression despite complete mitochondrial capture between two species of chipmunks. Evolution. 2015;69:1961–72.
Lamichhaney S, Berglund J, Almén MS, Maqbool K, Grabherr M, Martinez-Barrio A, et al. Evolution of Darwin’s finches and their beaks revealed by genome sequencing. Nature. 2015;518:371–5.
Nater A, Burri R, Kawakami T, Smeds L, Ellegren H. Resolving evolutionary relationships in closely related species with whole-genome sequencing data. Syst Biol. 2015;64:1000–17.
Martin SH, Jiggins CD. Interpreting the genomic landscape of introgression. Curr Opin Genet Dev. 2017;47:69–74.
Zhang D, Rheindt FE, She H, Cheng Y, Song G, Jia C, et al. Most genomic loci misrepresent the phylogeny of an avian radiation because of ancient gene flow. Syst Biol. 2021;70:961–75.
Springer M, Gatesy J. On the illogic of coalescence simulations for distinguishing the causes of conflict among gene trees. J Phylogenet Evol Biol. 2018;6:3.
Xi Z, Liu L, Davis CC. Genes with minimal phylogenetic information are problematic for coalescent analyses when gene tree estimation is biased. Mol Phylogenet Evol. 2015;92:63–71.
Brandvain Y, Kenney AM, Flagel L, Coop G, Sweigart AL. Speciation and introgression between Mimulus nasutus and Mimulus guttatus. Plos Genet. 2014;10:e1004410.
Schumer M, Xu C, Powell DL, Durvasula A, Skov L, Holland C, et al. Natural selection interacts with recombination to shape the evolution of hybrid genomes. Science. 2018;360:656–60.
Martin SH, Davey JW, Salazar C, Jiggins CD. Recombination rate variation shapes barriers to introgression across butterfly genomes. Plos Biol. 2019;17:e2006288.
Drovetski SV, Semenov G, Drovetskaya SS, Fadeev IV, Red’kin YA, Voelker G. Geographic mode of speciation in a mountain specialist avian family endemic to the Palearctic. Ecol Evol. 2013;3:1518–28.
Liu B, Alström P, Olsson U, Fjeldså J, Quan Q, Roselaar KCS, et al. Explosive radiation and spatial expansion across the cold environments of the old world in an avian family. Ecol Evol. 2017;7:6346–57.
Shirihai H, Svensson L. Handbook of Western Palearctic Birds. Volume 1. Passerines: Larks to Warblers. London: Bloomsbury Publishing; 2018.
Gill F, Donsker D, Rasmussen P. (Eds). IOC World Bird List (v13.1). 2023. https://doi.org/10.14344/IOC.ML.13.1.
Zang W, Jiang Z, Ericson PGP, Song G, Drovetski SV, Saitoh T, et al. Evolutionary relationships of mitogenomes in a recently radiated old world avian family. Avian Res. 2023;14:100097.
Clements JF, Schulenberg TS, Iliff MJ, Fredericks TA, Gerbracht JA, Lepage D, et al. The eBird/Clements checklist of Birds of the World: v2022. Downloaded from https://www.birds.cornell.edu/clementschecklist/introduction/updateindex/october-2022/2022-citation-checklist-download/.
Nguyen L-T, Schmidt HA, Von Haeseler A, Minh BQ. IQ-TREE: a fast and effective stochastic algorithm for estimating maximum-likelihood phylogenies. Mol Biol Evol. 2015;32:268–74.
Zhang C, Rabiee M, Sayyari E, Mirarab S. ASTRAL-III: polynomial time species tree reconstruction from partially resolved gene trees. BMC Bioinformatics. 2018;19:15–30.
Rabiee M, Sayyari E, Mirarab S. Multi-allele species reconstruction using ASTRAL. Mol Phylogenet Evol. 2019;130:286–96.
Liu L, Yu L, Edwards SV. A maximum pseudo-likelihood approach for estimating species trees under the coalescent model. BMC Evol Biol. 2010;10:1–18.
Huang H, Knowles LL. What is the danger of the anomaly zone for empirical phylogenetics? Syst Biol. 2009;58:527–36.
Martin SH, Van Belleghem SM. Exploring evolutionary relationships across the genome using topology weighting. Genetics. 2017;206:429–38.
Adrion JR, Galloway JG, Kern AD. Predicting the landscape of recombination using deep learning. Molecular Biol Evol. 2020;37:1790–808.
Spence JP, Song YS. Inference and analysis of population-specific fine-scale recombination maps across 26 diverse human populations. Sci Adv. 2019;5:eaaw9206.
Hatchwell B. Family Prunellidae (Accentors). Handbook of the birds of the world. 2005;10:496–513.
Stepanyan LS. Conspectus of the ornithological fauna of Russia and adjacent territories (within the borders of the USSR as a historic region). Moscow, Russia: Academkniga; Moscow, Russia (In Russian). 2003.
Rokas A, Carroll SB. Bushes in the tree of life. Plos Biol. 2006;4:e352.
Avise JC, Robinson TJ. Hemiplasy: A new term in the lexicon of phylogenetics. Syst Biol. 2008;57:503–7.
Suh A. The phylogenomic forest of bird trees contains a hard polytomy at the root of Neoaves. Zool Scr. 2016;45:50–62.
Svardal H, Salzburger W, Malinsky M. Genetic variation and hybridization in evolutionary radiations of cichlid fishes. Annu Rev Anim Biosci. 2021;9:55–79.
Rosenberg NA. Discordance of species trees with their most likely gene trees: a unifying principle. Mol Biol Evol. 2013;30:2709–13.
Linkem CW, Minin VN, Leaché AD. Detecting the anomaly zone in species trees and evidence for a misleading signal in higher-level skink phylogeny (Squamata: Scincidae). Syst Biol. 2016;65:465–77.
Morales-Briones DF, Kadereit G, Tefarikis DT, Moore MJ, Smith SA, Brockington SF, et al. Disentangling sources of gene tree discordance in phylogenomic data sets: testing ancient hybridizations in Amaranthaceae s.l. Syst Biol. 2021;70:219–35.
Nachman MW, Payseur BA. Recombination rate variation and speciation: theoretical predictions and empirical results from rabbits and mice. Philos Trans R Soc Lond B Biol Sci. 2012;367:409–21.
Fontaine MC, Pease JB, Steele A, Waterhouse RM, Neafsey DE, Sharakhov IV, et al. Extensive introgression in a malaria vector species complex revealed by phylogenomics. Science. 2015;347:1258524.
Edwards SV, Kingan SB, Calkins JD, Balakrishnan CN, Jennings WB, Swanson WJ, et al. Speciation in birds: genes, geography, and sexual selection. Proc Natl Acad Sci U S A. 2005;102(Suppl 1):6550–7.
Sætre G, Borge T, Lindroos K, Haavie J, Sheldon BC, Primmer C, et al. Sex chromosome evolution and speciation in Ficedula flycatchers. Proc R Soc B. 2003;270:53–9.
Axelsson E, Smith NG, Sundstrom H, Berlin S, Ellegren H. Male-biased mutation rate and divergence in autosomal, Z-linked and W-linked introns of chicken and turkey. Mol Biol Evol. 2004;21:1538–47.
Bartosch-Härlid A, Berlin S, Smith NG, Moller AP, Ellegren H. Life history and the male mutation bias. Evolution. 2003;57:2398–406.
Edwards SV. Phylogenomic subsampling: a brief review. Zool Scr. 2016;45:63–74.
Mirarab S, Bayzid MS, Warnow T. Evaluating summary methods for multilocus species tree estimation in the presence of incomplete lineage sorting. Syst Biol. 2016;65:366–80.
Leaché AD, Fujita MK, Minin VN, Bouckaert RR. Species delimitation using genome-wide SNP data. Syst Biol. 2014;63:534–42.
Haenel Q, Laurentino TG, Roesti M, Berner D. Meta-analysis of chromosome-scale crossover rate variation in eukaryotes and its significance to evolutionary genomics. Mol Ecol. 2018;27:2477–97.
Koren S, Walenz BP, Berlin K, Miller JR, Bergman NH, Phillippy AM. Canu: scalable and accurate long-read assembly via adaptive k-mer weighting and repeat separation. Genome Res. 2017;27:722–36.
Chin C-S, Peluso P, Sedlazeck FJ, Nattestad M, Concepcion GT, Clum A, et al. Phased diploid genome assembly with single-molecule real-time sequencing. Nat Methods. 2016;13:1050–4.
Walker BJ, Abeel T, Shea T, Priest M, Abouelliel A, Sakthikumar S, et al. Pilon: an integrated tool for comprehensive microbial variant detection and genome assembly improvement. Plos One. 2014;9:e112963.
Pryszcz LP, Gabaldón T. Redundans: an assembly pipeline for highly heterozygous genomes. Nucleic Acids Res. 2016;44:e113–e113.
Durand NC, Shamim MS, Machol I, Rao SS, Huntley MH, Lander ES, et al. Juicer provides a one-click system for analyzing loop-resolution Hi-C experiments. Cell Syst. 2016;3:95–8.
Dudchenko O, Batra SS, Omer AD, Nyquist SK, Hoeger M, Durand NC, et al. De novo assembly of the Aedes aegypti genome using Hi-C yields chromosome-length scaffolds. Science. 2017;356:92–5.
Kurtz S, Phillippy A, Delcher AL, Smoot M, Shumway M, Antonescu C, et al. Versatile and open software for comparing large genomes. Genome Biol. 2004;5:1–9.
Gill F, Donsker D. (Eds). IOC World Bird List, version 6.1. 2016. https://doi.org/10.14344/IOC.ML.6.1.
Irestedt M, Thörn F, Müller IA, Jønsson KA, Ericson PGP, Blom MP. A guide to avian museomics: Insights gained from resequencing hundreds of avian study skins. Mol Ecol Resour. 2022;22:2672–84.
Li H, Durbin R. Fast and accurate short read alignment with burrows-wheeler transform. Bioinformatics. 2009;25:1754–60.
Danecek P, Bonfield JK, Liddle J, Marshall J, Ohan V, Pollard MO, et al. Twelve years of SAMtools and BCFtools. GigaScience. 2021;10:giab008.
Ericson PGP, Irestedt M, Nylander JAA, Christidis L, Joseph L, Qu Y. Parallel evolution of bower-building behavior in two groups of bowerbirds suggested by phylogenomics. Syst Biol. 2020;69:820–9.
Eddy SR. Accelerated profile HMM searches. Plos Comput Biol. 2011;7:e1002195.
Eddy SR. Profile hidden Markov models. Bioinformatics (Oxford, England). 1998;14:755–63.
Katoh K, Standley DM. MAFFT Multiple sequence alignment software version 7: improvements in performance and usability. Mol Biol Evol. 2013;30:772–80.
Hoang DT, Chernomor O, Von Haeseler A, Minh BQ, Vinh LS. UFBoot2: improving the ultrafast bootstrap approximation. Mol Biol Evol. 2018;35:518–22.
Liu L, Yu L. Phybase: an R package for species tree analysis. Bioinformatics. 2010;26:962–3.
Ly-Trong N, Naser-Khdour S, Lanfear R, Minh BQ. AliSim: a fast and versatile phylogenetic sequence simulator for the genomic era. Mol Biol Evol. 2022;39:msac092.
Bogdanowicz D, Giaro K, Wróbel B. TreeCmp: comparison of trees in polynomial time. Evol Bioinform. 2012;8:EBO-S9657.
Malinsky M, Matschiner M, Svardal H. Dsuite-Fast D-statistics and related admixture evidence from VCF files. Mol Ecol Resour. 2021;21:584–95.
Efron B. Size, power and false discovery rates 2007. Ann Statist. 2007;35:1351–77.
Malinsky M, Svardal H, Tyers AM, Miska EA, Genner MJ, Turner GF, et al. Whole-genome sequences of Malawi cichlids reveal multiple radiations interconnected by gene flow. Nat Ecol Evol. 2018;2:1940–55.
Martin SH, Dasmahapatra KK, Nadeau NJ, Salazar C, Walters JR, Simpson F, et al. Genome-wide evidence for speciation with gene flow in Heliconius butterflies. Genome Res. 2013;23:1817–28.
Pease JB, Hahn MW. Detection and polarization of introgression in a five-taxon phylogeny. Syst Biol. 2015;64:651–62.
Minh BQ, Hahn MW, Lanfear R. New Methods to calculate concordance factors for phylogenomic datasets. Mol Biol Evol. 2020;37:2727–33.
Sayyari E, Mirarab S. Testing for polytomies in phylogenetic species trees using quartet frequencies. Genes. 2018;9:132.
Terhorst J, Kamm JA, Song YS. Robust and scalable inference of population history from hundreds of unphased whole genomes. Nat Genet. 2017;49:303–9.
Martin SH, Davey JW, Jiggins CD. Evaluating the use of ABBA–BABA statistics to locate introgressed loci. Mol Biol Evol. 2015;32:244–57.
Jiang Z, Zang W, Ericson PGP, Song G, Wu S, et al. Gene flow and an anomaly zone complicate phylogenomic inference in a rapidly radiated avian family (Prunellidae). 2024. Figshare. https://doi.org/10.6084/m9.figshare.25202057.v3.
Jiang Z, Zang W, Ericson PGP, Song G, Wu S, et al. Re-sequencing data of Prunellidae. NCBI BioProject. 2024. https://www.ncbi.nlm.nih.gov/bioproject/PRJNA960939.
Jiang Z, Zang W, Ericson PGP, Song G, Wu S, et al. Prunella strophiata isolate XZ15142, whole genome shotgun sequencing project. GenBank. 2024. https://www.ncbi.nlm.nih.gov/nuccore/JAZBQD000000000.
Acknowledgements
We thank Matthew Hahn for his comments on a previous version of this paper, Laura Kubatko for helpful discussion on gene flow and the anomaly zone, and Wanjun Chen (BGI-Shenzhen) for the Hi-C assembly. Samples for this study were kindly provided by the Burke Museum and Yale Peabody Museum, USA, the Natural History Museum of Denmark, Copenhagen, and the Natural History Museum of Norway, Oslo. We are particularly grateful to Jon Fjeldså, Kristof Zyskowski and Sharon Birks for their assistance with these samples. We thank Martin Irestedt for extracting DNA and building the sequencing library for the sample of Prunella o. fagani, for which only a museum skin was available. The authors acknowledge support from the National Genomics Infrastructure in Stockholm funded by Science for Life Laboratory, the Knut and Alice Wallenberg Foundation and the Swedish Research Council, and SNIC/Uppsala Multidisciplinary Center for Advanced Computational Science for assistance with massively parallel sequencing and access to the UPPMAX computational infrastructure.
Funding
This research was funded by the National Natural Science Foundation of China (NSFC32020103005 and U23A20162) and the Third Xinjiang Scientific Expedition and Research (2022xjkk0205). PA was supported by the Swedish Research Council (2019–04486) and the Jornvall Foundation, and PE by the Swedish Research Council (2017–3693).
Author information
Contributions
Y.Q., P.G.P.E., and P.A. conceived the research idea. Y.Q., P.G.P.E., and S.V.D. designed the data analyses. Z.J., W.Z., and P.G.P.E. conducted the data analyses. S.W. and P.A. contributed significantly to the data analyses. S.F. assembled the chromosome-level genome. Y.Q., Z.J., W.Z., P.G.P.E., G.S., and D.Z. interpreted the data. F.L., S.V.D., G.L., T.S., P.A., and S.V.E. provided critical samples. Y.Q. and P.G.P.E. drafted the manuscript with contributions from S.V.E., P.A., and S.V.D. All authors have read, commented on, and approved the manuscript.
Ethics declarations
Ethics approval and consent to participate
All research in this paper is based on pre-existing museum collections that have been collected under appropriate permits over many decades.
Consent for publication
Not applicable.
Competing interests
The authors declare that they have no competing interests.
Additional information
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary Information
Additional file 1:
Fig. S1. Synteny of the aligned P. strophiata genome with the zebra finch genome; the two genomes show high collinearity.
Fig. S2. Polytomy test using the MP-EST and ASTRAL species trees as guide trees.
Fig. S3. Tree topology weights vary with recombination rate (estimated from PyRho).
Fig. S4. Interplay between topology and variation in introgression rate.
Fig. S5. Hi-C heatmap reconstructed for the Prunella strophiata genome.
Table S1. Statistics of the Prunella strophiata genome assembly.
Table S2. Completeness of the Prunella strophiata genome assembly evaluated by BUSCO.
Table S3. Chromosome synteny of the aligned Red-breasted accentor genome with the zebra finch genome.
Table S4. List of the species used for phylogenetic analyses.
Table S5. Resequencing information and genome-wide coverage of the 36 individuals used in this study.
Table S6. Gene concordance factors for nodes 1–7 (Fig. 4a-b): the proportion of gene trees supporting the species tree (gCF), the two most common alternative topologies (gDF1 and gDF2), and the relative frequency of all other topologies (gDFp).
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
About this article
Cite this article
Jiang, Z., Zang, W., Ericson, P.G.P. et al. Gene flow and an anomaly zone complicate phylogenomic inference in a rapidly radiated avian family (Prunellidae). BMC Biol 22, 49 (2024). https://doi.org/10.1186/s12915-024-01848-7
DOI: https://doi.org/10.1186/s12915-024-01848-7