Bioinformatic Approaches for Comparative Analysis of Viruses

Comparative Genomics

Part of the book series: Methods in Molecular Biology ((MIMB,volume 2802))

Abstract

Viral genomics has seen an unprecedented increase in data volume. New strains of known viruses are constantly added to the GenBank database, as are entirely new species with little or no resemblance to sequences already in our databases. Metagenomic techniques have the potential to further increase both the number of sequenced genomes and the rate at which they are produced. Viruses also have unique features that often break molecular biology dogmas, e.g., the flow of information from RNA to DNA in retroviruses and the use of RNA molecules as genomes. As a result, extracting meaningful information from viral genomes remains a challenge, and standard methods for comparing unknown sequences against our databases of characterized sequences may need adaptation. Several bioinformatic approaches and tools have therefore been created to address the challenge of analyzing viral data. This chapter offers descriptions and protocols for some of the most important bioinformatic techniques for comparative analysis of viruses. The authors also comment on and discuss how the unique features of viruses can affect standard analyses and how to overcome some of the major sources of problems. Protocols and topics emphasize online tools (which are more accessible to users) and also convey the real experience of what most bioinformaticians do in their day-to-day work with command-line pipelines. The topics discussed include (1) clustering related genomes, (2) whole genome multiple sequence alignments for small RNA viruses, (3) protein alignment for marker genes and species affiliation, (4) variant calling and annotation, and (5) virome analyses and pathogen identification.

Copyright information

© 2024 The Author(s), under exclusive license to Springer Science+Business Media, LLC, part of Springer Nature

Cite this protocol

Dorlass, E.G., Amgarten, D.E. (2024). Bioinformatic Approaches for Comparative Analysis of Viruses. In: Setubal, J.C., Stadler, P.F., Stoye, J. (eds) Comparative Genomics. Methods in Molecular Biology, vol 2802. Humana, New York, NY. https://doi.org/10.1007/978-1-0716-3838-5_13

  • DOI: https://doi.org/10.1007/978-1-0716-3838-5_13

  • Publisher Name: Humana, New York, NY

  • Print ISBN: 978-1-0716-3837-8

  • Online ISBN: 978-1-0716-3838-5

  • eBook Packages: Springer Protocols
