Assembly-Free Techniques for NGS Data

Comin, Matteo; Schimd, Michele

doi:10.1007/978-3-319-59826-0_14

Abstract

Sequencing technologies have evolved considerably over the last few decades: the first expensive machines, which appeared in the late 1970s, have since been replaced by cheaper and more effective ones. Data processing evolved in parallel to face the new challenges and problems posed by each new type of sequencing record. In this first section, we briefly outline how sequencing technologies developed and which new challenges each generation posed.

Notes

1. http://www.1000genomes.org/.
2. http://ab.inf.uni-tuebingen.de/software/metasim/.
3. http://www-rcf.usc.edu/~fsun/Programs/D2_NGS/D2NGSmain.html.
4. http://flybase.org, dmel-all-intergenic-r5.49.fasta.

Author information

Authors and Affiliations

Department of Information Engineering, University of Padova, Padova, Italy
Matteo Comin & Michele Schimd

Corresponding author

Correspondence to Matteo Comin.

Editor information

Editors and Affiliations

LaTICE, Tunis, Tunisia
Mourad Elloumi

About this chapter

Cite this chapter

Comin, M., Schimd, M. (2017). Assembly-Free Techniques for NGS Data. In: Elloumi, M. (eds) Algorithms for Next-Generation Sequencing Data. Springer, Cham. https://doi.org/10.1007/978-3-319-59826-0_14

DOI: https://doi.org/10.1007/978-3-319-59826-0_14
Published: 19 September 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-59824-6
Online ISBN: 978-3-319-59826-0
eBook Packages: Computer Science (R0)
