Bioinformatic Approaches for Comparative Analysis of Viruses

  • Protocol
Comparative Genomics

Part of the book series: Methods in Molecular Biology (MIMB, volume 1704)

Abstract

The field of viral genomics has experienced an unprecedented increase in data volume. New strains of known viruses are constantly being added to the GenBank database, as are entirely new species that bear little or no resemblance to any sequences in our databases. In addition, metagenomic techniques have the potential to further increase both the number of sequenced genomes and the rate at which they are produced. It is also important to consider that viruses have a set of unique features that often break molecular biology dogmas, e.g., the flow of information from RNA to DNA in retroviruses and the use of RNA molecules as genomes. As a result, extracting meaningful information from viral genomes remains a challenge, and standard methods for comparing unknown sequences with our databases of characterized sequences may need to be modified. Several bioinformatic approaches and tools have therefore been created to address the challenge of analyzing viral data. In this chapter, we describe, and provide protocols for, some of the most important bioinformatic techniques for the comparative analysis of viruses. We also discuss how the unique features of viruses can affect standard analyses and how to overcome some of the major sources of problems. Topics include: (1) clustering of related genomes, (2) whole-genome multiple sequence alignments for small RNA viruses, (3) protein alignments for marker genes, (4) analyses based on ortholog groups, and (5) taxonomic identification and comparison of viruses from environmental datasets.
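Clustering of related genomes, the first topic above, starts from pairwise whole-genome similarities. The sketch below is a minimal illustration under stated assumptions, not the authors' protocol: it swaps in an alignment-free k-mer Jaccard similarity as a stand-in for alignment-based nucleotide identity, ignores reverse complements, and uses a k-mer size (k = 8) and a similarity threshold (0.5) that are arbitrary, hypothetical values. Genomes are grouped by single linkage: any two genomes whose similarity meets the threshold end up in the same cluster, tracked with a union-find structure. The genome names in the demo are hypothetical.

```python
from __future__ import annotations

from itertools import combinations


def kmers(seq: str, k: int) -> set[str]:
    """Return the set of distinct k-mers of a nucleotide sequence (forward strand only)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity of two k-mer sets; defined as 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_genomes(genomes: dict[str, str], k: int = 8,
                    threshold: float = 0.5) -> list[list[str]]:
    """Single-linkage clustering of genomes by k-mer Jaccard similarity.

    Assumption: k-mer Jaccard is a proxy for alignment-based identity;
    k and threshold are illustrative defaults, not values from the chapter.
    """
    names = list(genomes)
    profiles = {name: kmers(genomes[name], k) for name in names}
    parent = {name: name for name in names}  # union-find forest

    def find(x: str) -> str:
        # Walk to the root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Link every pair of genomes whose similarity reaches the threshold.
    for a, b in combinations(names, 2):
        if jaccard(profiles[a], profiles[b]) >= threshold:
            parent[find(a)] = find(b)

    # Collect members of each connected component.
    clusters: dict[str, list[str]] = {}
    for name in names:
        clusters.setdefault(find(name), []).append(name)
    return sorted(sorted(members) for members in clusters.values())


if __name__ == "__main__":
    ref = "ACGTTGCAAGGCTTAACCGGTTAGCATGCCATAGA"
    demo = {
        "virusA": ref,
        "virusA_strain2": ref[:17] + "A" + ref[18:],  # one point substitution
        "virusB": "G" * 30,  # shares no 8-mers with virusA
    }
    print(cluster_genomes(demo))
```

Single linkage is chosen here because it mirrors the common practice of placing a genome in a cluster when it is sufficiently similar to at least one existing member; it can chain distant genomes together through intermediates, so real analyses often check cluster cohesion afterwards.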


Acknowledgments

This work was supported by grants #2014/16450-8 and #2015/14334-3 from the São Paulo Research Foundation (FAPESP) to D.A. and by an NSERC Discovery grant to C.U.

Author information

Correspondence to Chris Upton.

Copyright information

© 2018 Springer Science+Business Media LLC

About this protocol

Cite this protocol

Amgarten, D., Upton, C. (2018). Bioinformatic Approaches for Comparative Analysis of Viruses. In: Setubal, J., Stoye, J., Stadler, P. (eds) Comparative Genomics. Methods in Molecular Biology, vol 1704. Humana Press, New York, NY. https://doi.org/10.1007/978-1-4939-7463-4_15

  • DOI: https://doi.org/10.1007/978-1-4939-7463-4_15

  • Publisher Name: Humana Press, New York, NY

  • Print ISBN: 978-1-4939-7461-0

  • Online ISBN: 978-1-4939-7463-4

  • eBook Packages: Springer Protocols
