Abstract
The field of viral genomics has experienced an unprecedented increase in data volume. New strains of known viruses are constantly being added to the GenBank database, as are entirely new species with little or no resemblance to the sequences already in our databases. In addition, metagenomic techniques have the potential to further increase both the number of sequenced genomes and the rate at which they are produced. Viruses also have a set of unique features that often depart from molecular biology dogmas, e.g., the flow of information from RNA to DNA in retroviruses and the use of RNA molecules as genomes. As a result, extracting meaningful information from viral genomes remains a challenge, and standard methods for comparing unknown sequences against our databases of characterized sequences may need to be modified. Several bioinformatic approaches and tools have therefore been created to address the challenge of analyzing viral data. In this chapter, we offer descriptions and protocols of some of the most important bioinformatic techniques for comparative analysis of viruses. We also discuss how the unique features of viruses can affect standard analyses and how to overcome some of the major sources of problems. Topics include: (1) clustering of related genomes, (2) whole-genome multiple sequence alignments for small RNA viruses, (3) protein alignments for marker genes, (4) analyses based on ortholog groups, and (5) taxonomic identification and comparison of viruses from environmental datasets.
Acknowledgments
This work has been supported by grants #2014/16450-8 and #2015/14334-3, São Paulo Research Foundation (FAPESP) to D.A. and an NSERC Discovery grant to C.U.
Copyright information
© 2018 Springer Science+Business Media LLC
About this protocol
Cite this protocol
Amgarten, D., Upton, C. (2018). Bioinformatic Approaches for Comparative Analysis of Viruses. In: Setubal, J., Stoye, J., Stadler, P. (eds) Comparative Genomics. Methods in Molecular Biology, vol 1704. Humana Press, New York, NY. https://doi.org/10.1007/978-1-4939-7463-4_15
DOI: https://doi.org/10.1007/978-1-4939-7463-4_15
Publisher Name: Humana Press, New York, NY
Print ISBN: 978-1-4939-7461-0
Online ISBN: 978-1-4939-7463-4
eBook Packages: Springer Protocols