Abstract
SNP detection is a fundamental procedure in genome analysis. SOAPsnp, a popular SNP detection tool, can take more than a week to analyze one human genome at 20-fold coverage. To improve efficiency, we developed mSNP, a parallel version of SOAPsnp that uses CPUs together with Intel® Xeon Phi™ coprocessors for large-scale SNP detection. First, we redesigned the key data structure of SOAPsnp, which significantly reduces the overhead of memory operations. Second, we devised a coordinated parallel framework in which the CPU collaborates with the Xeon Phi to raise hardware utilization. Third, we proposed a read-based window division strategy to improve throughput and parallel scale across multiple nodes. To the best of our knowledge, mSNP is the first SNP detection tool accelerated by Xeon Phi. It achieved a 45x speedup on a single node of Tianhe-2 without any loss in precision, and it showed promising scalability on 4,096 nodes of Tianhe-2.
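The abstract names a read-based window division strategy but does not describe it. As a rough illustration of the general idea only, the sketch below cuts a position-sorted list of aligned reads into windows that each hold the same number of reads, then deals the windows out to workers. The `Read` layout, the function names, and the round-robin assignment are all assumptions made for this example; they are not taken from mSNP or SOAPsnp.

```python
from typing import List, Tuple

# Hypothetical read layout for this sketch: (start position, read length).
Read = Tuple[int, int]


def read_based_windows(reads: List[Read], reads_per_window: int) -> List[List[Read]]:
    """Cut reads, sorted by start position, into windows of equal read count."""
    if reads_per_window <= 0:
        raise ValueError("reads_per_window must be positive")
    ordered = sorted(reads)  # sort by start position, then by length
    return [ordered[i:i + reads_per_window]
            for i in range(0, len(ordered), reads_per_window)]


def window_span(window: List[Read]) -> Tuple[int, int]:
    """Return the half-open genomic interval [first, last) covered by a window."""
    first = window[0][0]
    last = max(start + length for start, length in window)
    return first, last


def assign_round_robin(windows: List[List[Read]], n_workers: int) -> List[List[List[Read]]]:
    """Deal windows to workers in turn: worker w gets windows w, w + n, ..."""
    return [windows[w::n_workers] for w in range(n_workers)]


if __name__ == "__main__":
    reads = [(100, 5), (0, 5), (3, 5), (2, 5), (500, 5), (101, 5)]
    for window in read_based_windows(reads, reads_per_window=2):
        print(window_span(window), window)
```

Because each window holds the same number of reads, the work per window stays roughly even when coverage varies along the genome. Adjacent windows can overlap in genomic span, so a real caller would have to reconcile sites covered by reads from two windows; this sketch does not attempt that.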
Notes
1. SOAPsnp website: http://soap.genomics.org.cn/soapsnp.html
Acknowledgments
We would like to thank Mr. Yingrui Li from BGI for providing the source code of SOAPsnp and Dr. Jun Wang from BGI for providing related test data. We would also like to thank Prof. Hans V. Westerhoff from the University of Manchester for discussions of the human genome re-sequencing analysis problem, which improved our own understanding. This work is supported by NSFC Grants 61272056, U1435222, 61133005, 61120106005, 91430218 and 61303191.
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Cui, Y. et al. (2015). Large-Scale Neo-Heterogeneous Programming and Optimization of SNP Detection on Tianhe-2. In: Kunkel, J., Ludwig, T. (eds) High Performance Computing. ISC High Performance 2015. Lecture Notes in Computer Science, vol 9137. Springer, Cham. https://doi.org/10.1007/978-3-319-20119-1_6
DOI: https://doi.org/10.1007/978-3-319-20119-1_6
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-20118-4
Online ISBN: 978-3-319-20119-1
eBook Packages: Computer Science (R0)