Large-Scale Neo-Heterogeneous Programming and Optimization of SNP Detection on Tianhe-2

  • Conference paper
High Performance Computing (ISC High Performance 2015)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 9137)

Abstract

SNP detection is a fundamental procedure in genome analysis. SOAPsnp, a popular SNP detection tool, can take more than one week to analyze one human genome at 20-fold coverage. To improve efficiency, we developed mSNP, a parallel version of SOAPsnp that uses CPUs in cooperation with Intel® Xeon Phi™ coprocessors for large-scale SNP detection. First, we redesigned the key data structure of SOAPsnp, which significantly reduces the overhead of memory operations. Second, we devised a coordinated parallel framework in which the CPU collaborates with the Xeon Phi for higher hardware utilization. Third, we proposed a read-based window division strategy to improve throughput and parallel scale on multiple nodes. To the best of our knowledge, mSNP is the first SNP detection tool accelerated by Xeon Phi. We achieved a 45x speedup on a single node of Tianhe-2 without any loss in precision. Moreover, mSNP showed promising scalability on 4,096 nodes of Tianhe-2.
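
The abstract does not describe how the read-based window division works. As a rough illustration only, one plausible reading is that windows are cut by read count rather than by fixed genomic length, so that each unit of work carries a similar load. The sketch below assumes that interpretation; the function names `read_based_windows` and `assign_round_robin`, the `reads_per_window` parameter, and the round-robin assignment are hypothetical and are not taken from mSNP.

```python
from typing import List, Tuple

Window = Tuple[int, int, int]  # (first read start, last read start, read count)


def read_based_windows(read_starts: List[int], reads_per_window: int) -> List[Window]:
    """Group reads, ordered by start position, into windows of equal read count.

    Assumption: "read-based" means a window boundary is placed after a fixed
    number of reads, not after a fixed number of reference bases.
    """
    if reads_per_window <= 0:
        raise ValueError("reads_per_window must be positive")
    ordered = sorted(read_starts)
    windows: List[Window] = []
    for i in range(0, len(ordered), reads_per_window):
        chunk = ordered[i:i + reads_per_window]
        windows.append((chunk[0], chunk[-1], len(chunk)))
    return windows


def assign_round_robin(windows: List[Window], n_workers: int) -> List[List[Window]]:
    """Hand windows to workers in turn (hypothetical scheduling policy)."""
    if n_workers <= 0:
        raise ValueError("n_workers must be positive")
    return [windows[w::n_workers] for w in range(n_workers)]


if __name__ == "__main__":
    # Toy start positions: reads cluster unevenly along the reference.
    starts = [1, 2, 3, 50, 51, 400, 401, 402, 403]
    wins = read_based_windows(starts, reads_per_window=3)
    for worker, share in enumerate(assign_round_robin(wins, n_workers=2)):
        print(worker, share)
```

With this kind of division, a sparsely covered region yields one wide window and a densely covered region yields several narrow ones, so each window holds the same number of reads regardless of how unevenly they are spread along the reference.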

Notes

  1. SOAPsnp website: http://soap.genomics.org.cn/soapsnp.html

Acknowledgments

We would like to thank Mr. Yingrui Li from BGI for providing the source code of SOAPsnp and Dr. Jun Wang from BGI for providing related test data. We would also like to thank Prof. Hans V. Westerhoff from the University of Manchester for discussions of the human genome re-sequencing analysis problem, which improved our own understanding. This work is supported by NSFC Grants 61272056, U1435222, 61133005, 61120106005, 91430218 and 61303191.

Corresponding author

Correspondence to Shaoliang Peng.

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Cui, Y. et al. (2015). Large-Scale Neo-Heterogeneous Programming and Optimization of SNP Detection on Tianhe-2. In: Kunkel, J., Ludwig, T. (eds) High Performance Computing. ISC High Performance 2015. Lecture Notes in Computer Science, vol 9137. Springer, Cham. https://doi.org/10.1007/978-3-319-20119-1_6

  • DOI: https://doi.org/10.1007/978-3-319-20119-1_6

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-20118-4

  • Online ISBN: 978-3-319-20119-1

  • eBook Packages: Computer Science, Computer Science (R0)
