De Novo Assembly Algorithms

Next Generation Sequencing and Sequence Assembly

Part of the book series: SpringerBriefs in Systems Biology ((BRIEFSBIOSYS,volume 4))

Abstract

In the ideal case, an assembly algorithm would merge overlapping reads into one long continuous sequence, called a contig, that corresponds to a whole chromosome of the original genome. In practice, sequencing errors and unsequenced regions leave the contigs produced by an assembler too fragmentary to form complete chromosomes. Even at high coverage, the probability of unsequenced regions and sequencing errors remains non-zero. Repeated regions in the genome also limit an assembler's ability to form contigs. As shown in Fig. 3.3 in the previous chapter, reads from two different copies of a repeat can be mapped to a single region when the assembler fails to detect the repeat. Figure 4.1 shows how a typical assembly algorithm works in its overlap detection and contig generation phases.
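The two phases named above can be illustrated with a toy example. The following is a minimal sketch, not the method of any particular assembler: it assumes error-free reads from a single strand, uses exact suffix-prefix matching for overlap detection, and builds contigs by greedily merging the pair with the longest overlap. The function names and the `min_len` threshold are hypothetical choices made for this illustration.

```python
from itertools import permutations


def overlap_length(a, b, min_len=3):
    """Length of the longest suffix of `a` that equals a prefix of `b`.

    Returns 0 when no overlap of at least `min_len` bases exists.
    """
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads, min_len=3):
    """Greedily merge reads into contigs (illustrative sketch only)."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while len(contigs) > 1:
        # Overlap detection: find the ordered pair with the longest overlap.
        best_k, best_a, best_b = 0, None, None
        for a, b in permutations(contigs, 2):
            k = overlap_length(a, b, min_len)
            if k > best_k:
                best_k, best_a, best_b = k, a, b
        if best_k == 0:
            break  # no remaining overlaps: the contigs stay separate
        # Contig generation: replace the pair with their merged sequence.
        contigs.remove(best_a)
        contigs.remove(best_b)
        contigs.append(best_a + best_b[best_k:])
    return contigs


print(greedy_assemble(["ATGCGT", "GCGTAC", "TACGGA"]))  # ['ATGCGTACGGA']
```

When a region is not covered by any read, no overlap bridges it and the sketch returns several contigs instead of one; for example, `greedy_assemble(["ATGCGT", "GCGTAC", "CCCTTT"])` leaves `CCCTTT` as its own contig. Because matching is exact and greedy, the sketch also has no defence against sequencing errors or repeats, which is the weakness the abstract describes.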



Corresponding author

Correspondence to Ali Masoudi-Nejad.

Copyright information

© 2013 The Author(s)

About this chapter

Cite this chapter

Masoudi-Nejad, A., Narimani, Z., Hosseinkhan, N. (2013). De Novo Assembly Algorithms. In: Next Generation Sequencing and Sequence Assembly. SpringerBriefs in Systems Biology, vol 4. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-7726-6_4
