Robust Identification of Orthologues and Paralogues for Microbial Pan-Genomics Using GET_HOMOLOGUES: A Case Study of pIncA/C Plasmids

  • Protocol
Bacterial Pangenomics

Part of the book series: Methods in Molecular Biology ((MIMB,volume 1231))

Abstract

GET_HOMOLOGUES is an open-source software package written in Perl and R to define robust core- and pan-genomes by computing consensus clusters of orthologous gene families from whole-genome sequences using the bidirectional best-hit, COGtriangles, and OrthoMCL clustering algorithms. The granularity of the clusters can be fine-tuned by a user-configurable filtering strategy based on a combination of blastp pairwise alignment parameters, hmmscan-based scanning of Pfam domain composition of the proteins in each cluster, and a partial synteny criterion. We present detailed protocols to fit exponential and binomial mixture models to estimate core- and pan-genome sizes, compute pan-genome trees from the pan-genome matrix using a parsimony criterion, analyze and graphically represent the pan-genome structure, and identify lineage-specific gene families for the 12 complete pIncA/C plasmids currently available in NCBI’s RefSeq. The software package, license, and detailed user manual can be downloaded for free for academic use from two mirrors: http://www.eead.csic.es/compbio/soft/gethoms.php and http://maya.ccg.unam.mx/soft/gethoms.php.
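The ideas of consensus clusters, the pan-genome matrix, the core genome, and lineage-specific families can be illustrated with a minimal Python sketch. It is not the GET_HOMOLOGUES implementation (which is written in Perl and R). It assumes that a consensus cluster is one reported with identical membership by every algorithm, and all gene identifiers, plasmid names (`pA`, `pB`, `pC`), function names, and toy clusterings below are hypothetical.

```python
# Toy illustration only: hypothetical data and helper names, not GET_HOMOLOGUES code.

def consensus_clusters(*clusterings):
    """Return the clusters reported with identical membership by every clustering."""
    as_sets = [{frozenset(cluster) for cluster in clustering}
               for clustering in clusterings]
    return set.intersection(*as_sets)


def genome_of(gene_id):
    """Assumed ID convention: '<genome>|<gene>'."""
    return gene_id.split("|", 1)[0]


def presence_matrix(clusters, genomes):
    """Map each cluster to a 0/1 row: 1 if the genome contributes a member."""
    matrix = {}
    for cluster in clusters:
        present = {genome_of(gene) for gene in cluster}
        matrix[cluster] = [1 if genome in present else 0 for genome in genomes]
    return matrix


def core_clusters(matrix):
    """Clusters present in every genome."""
    return [cluster for cluster, row in matrix.items() if all(row)]


def lineage_specific(matrix, genomes, lineage):
    """Clusters present in all genomes of `lineage` and absent from all others."""
    hits = []
    for cluster, row in matrix.items():
        inside = all(v for g, v in zip(genomes, row) if g in lineage)
        outside = any(v for g, v in zip(genomes, row) if g not in lineage)
        if inside and not outside:
            hits.append(cluster)
    return hits


# Hypothetical outputs of three clustering algorithms on three plasmids.
bdbh = [["pA|g1", "pB|g1", "pC|g1"], ["pA|g2", "pB|g2"], ["pC|g3"]]
cogt = [["pA|g1", "pB|g1", "pC|g1"], ["pA|g2", "pB|g2"], ["pC|g3", "pA|g4"]]
omcl = [["pA|g1", "pB|g1", "pC|g1"], ["pA|g2", "pB|g2"], ["pC|g3"]]

genomes = ["pA", "pB", "pC"]
consensus = consensus_clusters(bdbh, cogt, omcl)
matrix = presence_matrix(consensus, genomes)
core = core_clusters(matrix)
specific = lineage_specific(matrix, genomes, lineage={"pA", "pB"})

print(len(consensus), "consensus clusters;", len(core), "core;",
      len(specific), "specific to pA+pB")
```

In this toy case the third cluster is dropped because one algorithm disagrees on its membership, which is the sense in which requiring agreement among algorithms makes the resulting clusters more conservative.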

Acknowledgements

We thank Romualdo Zayas, Víctor del Moral, and Alfredo J. Hernández at CCG-UNAM for technical support. We also thank David M. Kristensen and the development team of OrthoMCL for permission to use their code in our project. Funding for this work was provided by the Fundación ARAID, Consejo Superior de Investigaciones Científicas (grant 200720I038), DGAPA-PAPIIT UNAM-México (grant IN211814), and CONACyT-México (grant 179133).

Corresponding author

Correspondence to Pablo Vinuesa.

Copyright information

© 2015 Springer Science+Business Media New York

About this protocol

Cite this protocol

Vinuesa, P., Contreras-Moreira, B. (2015). Robust Identification of Orthologues and Paralogues for Microbial Pan-Genomics Using GET_HOMOLOGUES: A Case Study of pIncA/C Plasmids. In: Mengoni, A., Galardini, M., Fondi, M. (eds) Bacterial Pangenomics. Methods in Molecular Biology, vol 1231. Humana Press, New York, NY. https://doi.org/10.1007/978-1-4939-1720-4_14

  • DOI: https://doi.org/10.1007/978-1-4939-1720-4_14

  • Publisher Name: Humana Press, New York, NY

  • Print ISBN: 978-1-4939-1719-8

  • Online ISBN: 978-1-4939-1720-4

  • eBook Packages: Springer Protocols
