Abstract
Viral genomics has seen an unprecedented increase in data volume. New strains of known viruses are constantly being added to the GenBank database, as are entirely new species with little or no resemblance to any sequence already in our databases. Metagenomic techniques have the potential to further increase both the number of sequenced genomes and the rate at which they are produced. Viruses also have unique features that often break the dogmas of molecular biology, e.g., the flow of information from RNA to DNA in retroviruses and the use of RNA molecules as genomes. As a result, extracting meaningful information from viral genomes remains a challenge, and standard methods for comparing unknown sequences against databases of characterized sequences may need adaptation. Several bioinformatic approaches and tools have therefore been created to address the challenge of analyzing viral data. This chapter offers descriptions and protocols of some of the most important bioinformatic techniques for comparative analysis of viruses. The authors also discuss how the unique features of viruses can affect standard analyses and how to overcome some of the major sources of problems. Protocols and topics emphasize online tools (which are more accessible to users) while also conveying the real experience of what most bioinformaticians do in day-to-day work with command-line pipelines. The topics discussed include (1) clustering related genomes, (2) whole-genome multiple sequence alignments for small RNA viruses, (3) protein alignment for marker genes and species affiliation, (4) variant calling and annotation, and (5) virome analyses and pathogen identification.
Copyright information
© 2024 The Author(s), under exclusive license to Springer Science+Business Media, LLC, part of Springer Nature
About this protocol
Cite this protocol
Dorlass, E.G., Amgarten, D.E. (2024). Bioinformatic Approaches for Comparative Analysis of Viruses. In: Setubal, J.C., Stadler, P.F., Stoye, J. (eds) Comparative Genomics. Methods in Molecular Biology, vol 2802. Humana, New York, NY. https://doi.org/10.1007/978-1-0716-3838-5_13
DOI: https://doi.org/10.1007/978-1-0716-3838-5_13
Publisher Name: Humana, New York, NY
Print ISBN: 978-1-0716-3837-8
Online ISBN: 978-1-0716-3838-5
eBook Packages: Springer Protocols