De Novo Assembly Algorithms

Masoudi-Nejad, Ali; Narimani, Zahra; Hosseinkhan, Nazanin

doi:10.1007/978-1-4614-7726-6_4

Part of the book series: SpringerBriefs in Systems Biology (BRIEFSBIOSYS, volume 4)

Abstract

Ideally, an assembly algorithm would merge overlapping reads into one long continuous sequence, called a contig, corresponding to a chromosome of the original genome. In practice, sequencing errors and unsequenced regions mean that the contigs produced by an assembler are not complete enough to form whole chromosomes. Even at high coverage, the probability of unsequenced regions and sequencing errors remains non-zero. The assembler's ability to form contigs is also affected by repeated regions in the genome. As Fig. 3.3 in the previous chapter shows, parts of two different repeat regions can be mapped onto one because the assembler's repeat detection is weak. Figure 4.1 shows how a typical assembly algorithm works in its overlap detection and contig generation phases.
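The two phases named in the abstract, overlap detection and contig generation, can be illustrated with a minimal sketch. It is not the algorithm of Fig. 4.1 or of any particular assembler: the function names `overlap` and `greedy_assemble` and the `min_len` threshold are hypothetical, and the sketch assumes error-free reads from a single strand (no reverse complements, no repeat handling).

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Overlap detection: length of the longest suffix of `a` that equals
    a prefix of `b`, or 0 if that length is below `min_len`."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads, min_len: int = 3):
    """Contig generation: repeatedly merge the pair of sequences with the
    longest overlap until no pair overlaps by at least `min_len` bases."""
    # Drop exact duplicates (keeping order) and reads contained in other reads.
    contigs = list(dict.fromkeys(reads))
    contigs = [r for r in contigs
               if not any(r != o and r in o for o in contigs)]

    while True:
        best_len, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_len:
                        best_len, best_i, best_j = k, i, j
        if best_len == 0:
            # No remaining overlaps: each sequence left is a separate contig.
            return contigs
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [c for idx, c in enumerate(contigs)
                   if idx not in (best_i, best_j)]
        contigs.append(merged)


# Three overlapping reads collapse into one contig.
print(greedy_assemble(["ATGGCG", "GCGTGC", "GTGCA"]))  # -> ['ATGGCGTGCA']
```

If a region between reads is unsequenced, no overlap links the two sides and the sketch returns more than one contig, which is the incompleteness the abstract describes. Because merging relies only on suffix-prefix identity, reads from different copies of a repeat would look identical to this procedure, so it has no way to keep them apart.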

Author information

Authors and Affiliations

Laboratory of Systems Biology and Bioinformatics (LBB), Institute of Biochemistry and Biophysics, University of Tehran, Tehran, Iran
Ali Masoudi-Nejad, Zahra Narimani & Nazanin Hosseinkhan

Corresponding author

Correspondence to Ali Masoudi-Nejad.

About this chapter

Cite this chapter

Masoudi-Nejad, A., Narimani, Z., Hosseinkhan, N. (2013). De Novo Assembly Algorithms. In: Next Generation Sequencing and Sequence Assembly. SpringerBriefs in Systems Biology, vol 4. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-7726-6_4

DOI: https://doi.org/10.1007/978-1-4614-7726-6_4
Published: 09 July 2013
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4614-7725-9
Online ISBN: 978-1-4614-7726-6
eBook Packages: Biomedical and Life Sciences (R0)