BWTCP: A Parallel Method for Constructing BWT in Large Collection of Genomic Reads

  • Conference paper
  • First Online:
High Performance Computing (ISC High Performance 2015)

Part of the book series: Lecture Notes in Computer Science ((LNTCS,volume 9137))

Included in the following conference series:

Abstract

Short-read alignment and assembly are fundamental procedures for analyses of DNA sequencing data. Many state-of-the-art short-read aligners employ Burrows-Wheeler transform (BWT) as an in-memory index for the reference genome. BWT has also found its use in genome assembly, for indexing the reads. In a typical data set, the volume of reads can be as large as several hundred Gigabases. Consequently, fast construction of the BWT index for reads is essential for an efficient sequence processing. In this paper, we present a parallel method called BWTCP for BWT construction at a large scale. BWTCP is characterized by its ability to harness heterogeneous computing power including multi-core CPU, multiple CPUs, and accelerators like GPU or Intel Xeon Phi. BWTCP is also featured by its novel pruning strategy. Using BWTCP, we managed to construct the BWT for 1 billion 100bp reads within 30 m using 16 compute nodes (2 CPUs per node) on Tianhe-2 Supercomputer. It significantly outperforms the baseline tool BCR, which would need 13 h to finish all processing for the same dataset. BWTCP is freely available at https://github.com/hwang91/BWTCP.

H. Wang, S. Peng, and X. Zhu—Joint first authors.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Subscribe and save

Springer+ Basic
EUR 32.99 /Month
  • Get 10 units per month
  • Download Article/Chapter or Ebook
  • 1 Unit = 1 Article or 1 Chapter
  • Cancel anytime
Subscribe now

Buy Now

Chapter
EUR 29.95
Price includes VAT (Germany)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
EUR 42.79
Price includes VAT (Germany)
  • Available as EPUB and PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
EUR 53.49
Price includes VAT (Germany)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free ship** worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Similar content being viewed by others

References

  1. Deshpande, V., Fung, E.D.K., Pham, S., Bafna, V.: Cerulean: a hybrid assembly using high throughput short and long reads. In: Darling, A., Stoye, J. (eds.) WABI 2013. LNCS, vol. 8126, pp. 349–363. Springer, Heidelberg (2013)

    Chapter  Google Scholar 

  2. Deshpande, V.: Sequencing, assembling, and annotating a mid-sized genome. In: Plant and Animal Genome XXII Conference. Plant and Animal Genome (2014)

    Google Scholar 

  3. Huang, L., Popic, V., Batzoglou, S.: Short read alignment with populations of genomes. Bioinformatics 29(13), i361–i370 (2013)

    Article  Google Scholar 

  4. Blazewicz, J., Frohmberg, W., Gawron, P., et al.: DNA sequence assembly involving an acyclic graph model. Found. Comput. Decis. Sci. 38(1), 25–34 (2013)

    Google Scholar 

  5. Li, B., Fillmore, N., Bai, Y., et al.: Evaluation of de novo transcriptome assemblies from RNASeqdata. bioRxiv (2014)

    Google Scholar 

  6. Burrows, M., Wheeler, D.J.: A block-sorting lossless data compression algorithm (1994)

    Google Scholar 

  7. Li, H., Durbin, R.: Fast and accurate short read alignment with Burrows Wheeler transform. Bioinformatics 25(14), 1754–1760 (2009)

    Article  Google Scholar 

  8. Langmead, B., Salzberg, S.L.: Fast gapped-read alignment with Bowtie 2. Nat. Methods 9(4), 357–359 (2012)

    Article  Google Scholar 

  9. Chan, S.H., Cheung, J., Wu, E., et al.: MICA: a fast short-read aligner that takes full advantage of intel many integrated core architecture (MIC) (2014). ar**v preprint ar**v:1402.4876

  10. Simpson, J.T., Durbin, R.: Efficient de novo assembly of large genomes using compressed data structures. Genome Res. 22(3), 549–556 (2012)

    Article  Google Scholar 

  11. Bauer, M.J., Cox, A.J., Rosone, G.: Lightweight algorithms for constructing and inverting the BWT of string collections. Theoret. Comput. Sci. 483, 134–148 (2013)

    Article  MATH  MathSciNet  Google Scholar 

  12. Liu, C.M., Luo, R., Lam, T.W.: GPU-accelerated BWT construction for large collection of short reads (2014). ar**v preprint ar**v:1401.7457

  13. Ferragina, P., Manzini, G.: Opportunistic data structures with applications. In: Proceedings of the 41st Annual Symposium on Foundations of Computer Science, p. 390–398. IEEE (2000)

    Google Scholar 

  14. Kärkkäinen, J.: Fast BWT in small space by blockwise suffix sorting. Theoret. Comput. Sci. 387(3), 249–257 (2007)

    Article  MATH  MathSciNet  Google Scholar 

  15. Mcllroy, P.M., Bostic, K., Mcllroy, M.D.: Engineering radix sort. Comput. Syst. 6(1), 5–27 (1993)

    Google Scholar 

  16. Luo, R., Liu, B., **e, Y., et al.: SOAPdenovo2: an empirically improved memory-efficient short-read de novo assembler. Gigascience 1(1), 18 (2012)

    Article  Google Scholar 

  17. http://www.cpu-world.com/Compare/501/Intel_Core_i7_i7-3930K_vs_Intel_Xeon_E5-2692_v2.html

Download references

Acknowledgement

We acknowledge Prof. T.W. Lam, Project Manager Ruibang Luo and C.M. Liu in BAL lab, Department of Computer Science, The University of Hong Kong for providing the source codes, related data and constructive advice both in designing and testing of BWTCP. And this work is supported by NSFC Grant 61272056, U1435222, 61133005, 61120106005 and 91432018.

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Shaoliang Peng .

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Wang, H. et al. (2015). BWTCP: A Parallel Method for Constructing BWT in Large Collection of Genomic Reads. In: Kunkel, J., Ludwig, T. (eds) High Performance Computing. ISC High Performance 2015. Lecture Notes in Computer Science(), vol 9137. Springer, Cham. https://doi.org/10.1007/978-3-319-20119-1_13

Download citation

  • DOI: https://doi.org/10.1007/978-3-319-20119-1_13

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-20118-4

  • Online ISBN: 978-3-319-20119-1

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics

Navigation