Abstract
In the ideal case, an assembly algorithm would merge overlapping reads into one long continuous sequence, called a contig, for each chromosome of the original genome. In practice, sequencing errors and unsequenced regions leave the contigs produced by an assembler too fragmented to reconstruct whole chromosomes. Even at high coverage, the probability of unsequenced regions and sequencing errors remains non-zero. Repeated regions in the genome also limit the assembler's ability to form contigs. As shown in Fig. 3.3 in the previous chapter, parts of two different repeat regions are merged into one because the assembler's repeat detection is weak. Figure 4.1 shows how a typical assembly algorithm works in its overlap detection and contig generation phases.
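As a rough illustration of the overlap detection and contig generation phases mentioned above, the following Python sketch merges reads greedily by their longest suffix-prefix overlap. It is not the algorithm of any particular assembler: the names `overlap_length` and `greedy_assemble` and the `min_overlap` threshold are assumptions made for this example, and the sketch ignores sequencing errors, reverse complements, and repeats.

```python
def overlap_length(a, b, min_overlap=3):
    """Return the length of the longest suffix of `a` that equals a
    prefix of `b`, or 0 if that length is below `min_overlap`."""
    max_len = min(len(a), len(b))
    for k in range(max_len, min_overlap - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads, min_overlap=3):
    """Repeatedly merge the pair of sequences with the largest overlap
    until no pair overlaps by at least `min_overlap` bases."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while True:
        best_len, best_i, best_j = 0, None, None
        # Overlap detection: compare every ordered pair of sequences.
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap_length(a, b, min_overlap)
                    if k > best_len:
                        best_len, best_i, best_j = k, i, j
        if best_len == 0:
            return contigs  # nothing left to merge
        # Contig generation: join the best pair, keeping the overlap once.
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [c for idx, c in enumerate(contigs)
                   if idx not in (best_i, best_j)]
        contigs.append(merged)


if __name__ == "__main__":
    # Three overlapping reads from a toy 14-base sequence give one contig.
    print(greedy_assemble(["ATGCGTAC", "GTACGTTA", "GTTAGC"]))
    # Reads separated by an unsequenced gap stay as separate contigs.
    print(greedy_assemble(["ATGCGTAC", "TTAGCCGA"]))
```

In the second call the two reads share no overlap of at least `min_overlap` bases, so the sketch returns two contigs. This mirrors the way unsequenced parts keep an assembler from producing a single chromosome-length sequence.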
Copyright information
© 2013 The Author(s)
About this chapter
Cite this chapter
Masoudi-Nejad, A., Narimani, Z., Hosseinkhan, N. (2013). De Novo Assembly Algorithms. In: Next Generation Sequencing and Sequence Assembly. SpringerBriefs in Systems Biology, vol 4. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-7726-6_4
DOI: https://doi.org/10.1007/978-1-4614-7726-6_4
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4614-7725-9
Online ISBN: 978-1-4614-7726-6
eBook Packages: Biomedical and Life Sciences (R0)