PIM-Align: A Processing-in-Memory Architecture for FM-Index Search Algorithm

Li, Xue-Qi; Tan, Guang-Ming; Sun, Ning-Hui

doi:10.1007/s11390-020-0825-3

Regular Paper
Published: 30 January 2021

Journal of Computer Science and Technology, Volume 36, pages 56–70 (2021)


Abstract

Genomic sequence alignment is the most critical and time-consuming step in genomic analysis. Alignment algorithms generally follow a seed-and-extend model. Acceleration of the extension phase (e.g., the Smith-Waterman algorithm) has been well explored on computing-centric architectures such as field-programmable gate arrays (FPGAs), application-specific integrated circuits (ASICs), and graphics processing units (GPUs). The seeding phase is even more critical than the extension phase, yet it is memory-bound: on conventional systems it suffers from fine-grained random memory accesses and limited parallelism. In this paper, we argue that the processing-in-memory (PIM) concept could be a viable solution to these problems. We describe PIM-Align, an application-driven near-data processing architecture for sequence alignment. To achieve performance proportional to memory capacity by taking advantage of 3D-stacked dynamic random access memory (DRAM) technology, we propose a lightweight message mechanism between memory partitions and a specialized hardware prefetcher for the memory access patterns of sequence alignment. Our evaluation shows that the proposed architecture achieves 20x and 1820x speedups over the best available ASIC implementation and over software running on a 32-thread CPU, respectively.
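The FM-index search named in the title is, in its textbook form, backward search over the Burrows-Wheeler transform (BWT). The sketch below illustrates that textbook algorithm only; it is not the PIM-Align hardware design. The function names `build_fm_index` and `backward_search` are hypothetical, the suffix array is built by naive sorting, and the full Occ table is stored instead of the sampled checkpoints real aligners use. It shows why seeding is memory-bound: each pattern character triggers two Occ lookups at positions that depend on the previous step's result, so on a genome-scale index these accesses are fine-grained, random, and serially dependent.

```python
from typing import Dict, List, Tuple


def build_fm_index(text: str) -> Tuple[Dict[str, int], Dict[str, List[int]], int]:
    """Build the C and Occ tables of a toy FM-index (hypothetical helper)."""
    s = text + "$"  # '$' sentinel sorts before the letters in ASCII
    # Naive suffix array: sort suffix start positions lexicographically.
    sa = sorted(range(len(s)), key=lambda i: s[i:])
    # BWT character = the one preceding each sorted suffix (s[-1] is '$').
    bwt = "".join(s[i - 1] for i in sa)
    alphabet = sorted(set(s))

    # C[c] = number of characters in s that are strictly smaller than c.
    C: Dict[str, int] = {}
    total = 0
    for c in alphabet:
        C[c] = total
        total += s.count(c)

    # occ[c][i] = number of occurrences of c in bwt[:i] (full table, no sampling).
    occ = {c: [0] * (len(bwt) + 1) for c in alphabet}
    for i, ch in enumerate(bwt):
        for c in alphabet:
            occ[c][i + 1] = occ[c][i] + (1 if ch == c else 0)
    return C, occ, len(s)


def backward_search(pattern: str, C: Dict[str, int],
                    occ: Dict[str, List[int]], n: int) -> int:
    """Count exact occurrences of pattern using FM-index backward search."""
    lo, hi = 0, n  # half-open suffix-array interval [lo, hi)
    for c in reversed(pattern):  # process the pattern right to left
        if c not in C:
            return 0
        # Two data-dependent Occ lookups per character: the random,
        # serially dependent memory accesses that make seeding memory-bound.
        lo = C[c] + occ[c][lo]
        hi = C[c] + occ[c][hi]
        if lo >= hi:
            return 0
    return hi - lo


if __name__ == "__main__":
    C, occ, n = build_fm_index("banana")
    print(backward_search("ana", C, occ, n))  # 2
```

Independent reads can be searched concurrently, but within a single read every step waits on the interval produced by the previous one.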

Author information

Authors and Affiliations

State Key Laboratory of Computer Architecture, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, 100190, China
Xue-Qi Li, Guang-Ming Tan & Ning-Hui Sun
University of Chinese Academy of Sciences, Beijing, 100049, China
Xue-Qi Li, Guang-Ming Tan & Ning-Hui Sun

Corresponding author

Correspondence to Xue-Qi Li.

Supplementary Information

ESM 1

(PDF 917 kb)

About this article

Cite this article

Li, XQ., Tan, GM. & Sun, NH. PIM-Align: A Processing-in-Memory Architecture for FM-Index Search Algorithm. J. Comput. Sci. Technol. 36, 56–70 (2021). https://doi.org/10.1007/s11390-020-0825-3

Received: 15 July 2020
Accepted: 06 January 2021
Published: 30 January 2021
Issue Date: January 2021
DOI: https://doi.org/10.1007/s11390-020-0825-3
