Abstract
Amino acid substitution models are a key component in studying plant evolution from protein sequences. Although single-matrix amino acid substitution models have been estimated for plants (namely Q.plant and NQ.plant), they cannot describe rate heterogeneity among sites. A number of multi-matrix mixture models have been proposed to handle site-rate heterogeneity; however, none has been estimated specifically for plants. To enhance the study of plant evolution, we estimated two multi-matrix mixture models from plant genomes: the time-reversible QPlant.mix and the time non-reversible nQPlant.mix. Experiments showed that the new mixture models performed much better than existing models on plant alignments. We recommend that researchers use the new mixture models to study plant evolution.
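The core idea of a multi-matrix mixture model can be illustrated with a small sketch. It is not the authors' code and does not use the estimated QPlant.mix or nQPlant.mix matrices. It assumes a toy 4-state alphabet (real models use 20 amino acids), made-up rate matrices and class weights, and a two-taxon tree, and every function name is hypothetical. Under a mixture model, each site's likelihood is the weighted sum of its likelihoods under the individual matrix classes. A class matrix Q is time-reversible when it satisfies detailed balance, π_i Q_ij = π_j Q_ji, where π is its stationary distribution; a non-reversible class drops that constraint.

```python
import numpy as np


def stationary(Q):
    """Stationary distribution pi of rate matrix Q: solve pi Q = 0, sum(pi) = 1."""
    n = Q.shape[0]
    A = np.vstack([Q.T, np.ones(n)])
    b = np.zeros(n + 1)
    b[-1] = 1.0
    pi, *_ = np.linalg.lstsq(A, b, rcond=None)
    return pi


def make_q(rates):
    """Build a rate matrix from non-negative off-diagonal rates: rows sum to zero,
    scaled so the expected substitution rate at stationarity is 1."""
    Q = np.array(rates, dtype=float)
    np.fill_diagonal(Q, 0.0)
    np.fill_diagonal(Q, -Q.sum(axis=1))
    mean_rate = -(stationary(Q) @ np.diag(Q))
    return Q / mean_rate


def transition_matrix(Q, t, terms=20, squarings=8):
    """P(t) = exp(Q t) via scaling and squaring of a truncated Taylor series."""
    A = Q * t / 2.0 ** squarings
    P = np.eye(Q.shape[0])
    term = np.eye(Q.shape[0])
    for k in range(1, terms):
        term = term @ A / k
        P = P + term
    for _ in range(squarings):
        P = P @ P
    return P


def is_reversible(Q, tol=1e-9):
    """Detailed balance: pi_i * Q_ij == pi_j * Q_ji for every pair of states."""
    flux = stationary(Q)[:, None] * Q
    return bool(np.allclose(flux, flux.T, atol=tol))


def site_likelihood(x, y, t, classes):
    """Mixture likelihood of root state x and tip state y on a two-taxon tree
    with branch length t: sum over classes of w_k * pi_k[x] * P_k(t)[x, y]."""
    return sum(w * stationary(Q)[x] * transition_matrix(Q, t)[x, y]
               for w, Q in classes)


# Toy example with made-up numbers (hypothetical, not QPlant.mix).
pi = np.array([0.1, 0.2, 0.3, 0.4])
R = np.array([[0, 1, 2, 1],
              [1, 0, 1, 3],
              [2, 1, 0, 1],
              [1, 3, 1, 0]], dtype=float)     # symmetric exchangeabilities
q_rev = make_q(R * pi[None, :])               # Q_ij = R_ij * pi_j: reversible

cyc = np.zeros((4, 4))
for i in range(4):
    cyc[i, (i + 1) % 4] = 2.0                 # fast drift "forwards"
    cyc[i, (i - 1) % 4] = 0.5                 # slow drift "backwards"
q_nonrev = make_q(cyc)                        # breaks detailed balance

mixture = [(0.6, q_rev), (0.4, q_nonrev)]     # (class weight, class matrix)

print(is_reversible(q_rev), is_reversible(q_nonrev))  # True False
print(site_likelihood(0, 2, 0.5, mixture))
```

On a real tree, the per-class likelihood would be computed with Felsenstein's pruning algorithm rather than the single transition probability used here. The matrix exponential is hand-rolled only to keep the sketch dependent on NumPy alone; scipy.linalg.expm would serve equally well.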
Data availability
The datasets and script used in this paper are available at https://doi.org/10.6084/m9.figshare.24500212.v4.
Author information
Contributions
LSV designed the study. NHT and LSV wrote the manuscript. NHT implemented the scripts and carried out the experiments.
Ethics declarations
Conflict of interest
We declare that we have no conflicts of interest.
Additional information
Handling Editor: Karol Marhold.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Tinh, N.H., Vinh, L.S. Improving the study of plant evolution with multi-matrix mixture models. Plant Syst Evol 310, 14 (2024). https://doi.org/10.1007/s00606-024-01896-0
DOI: https://doi.org/10.1007/s00606-024-01896-0