GPU Based Hash Segmentation Index for Fast T-overlap Query

Jia, Lianyin; Zhang, Yongbin; Li, Mengjuan; Ding, Jiaman; You, Jinguo

doi:10.1007/978-981-10-6385-5_4

Part of the book series: Communications in Computer and Information Science (CCIS, volume 727)

Included in the following conference series:

International Conference of Pioneering Computer Scientists, Engineers and Educators

Abstract

The T-overlap query underlies set similarity queries and has been applied in many important fields. Most existing approaches employ a pruning-and-verification framework, which limits their efficiency. Modern GPUs offer much higher parallelism and memory bandwidth than CPUs and can be used to accelerate T-overlap queries. In this paper, we use hash segmentation to divide inverted lists into segments and design an efficient GPU inverted index, GHSII, on top of it. Based on GHSII, we propose a new segmentation-parallel T-overlap algorithm, GSPS. GSPS scans the lists one segment at a time and uses shared memory to reduce the number of accesses to device memory. Furthermore, we propose an optimized algorithm, GSPS-TLLO, which uses a heuristic query order to address load imbalance. Experiments on two real datasets show that GSPS-TLLO outperforms the state-of-the-art GPU parallel T-overlap algorithms.
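
To make the idea of hash segmentation concrete, the following is a minimal sequential sketch in Python, not the GHSII or GSPS implementation from the paper. It assumes that a T-overlap query returns every record sharing at least T tokens with the query, and that each inverted list is split into segments by a hash of the record id (here simply `rid % num_segments`, an assumed hash function). Under that assumption a given record id lands in the same segment number of every list, so the j-th segments of all query-related lists can be counted together with a small, segment-local counter. The function names `build_segmented_index` and `t_overlap` are hypothetical.

```python
from collections import defaultdict


def build_segmented_index(records, num_segments):
    """Map each token to num_segments segments of record ids.

    Assumption: segment j of a list holds the ids with rid % num_segments == j.
    """
    index = defaultdict(lambda: [[] for _ in range(num_segments)])
    for rid, tokens in enumerate(records):
        for tok in set(tokens):
            index[tok][rid % num_segments].append(rid)
    return index


def t_overlap(index, query, t, num_segments):
    """Return the sorted ids of records sharing at least t tokens with query."""
    # Inverted lists related to the query (tokens absent from the index are skipped).
    lists = [index[tok] for tok in set(query) if tok in index]
    result = []
    for j in range(num_segments):
        # Counter local to the j-th segments; on a GPU this is the kind of
        # small structure that could live in shared memory (an assumption here).
        counts = defaultdict(int)
        for segments in lists:
            for rid in segments[j]:
                counts[rid] += 1
        result.extend(rid for rid, c in counts.items() if c >= t)
    return sorted(result)


records = [
    {"a", "b", "c"},
    {"b", "c", "d"},
    {"a", "d"},
    {"c", "d", "e"},
]
index = build_segmented_index(records, num_segments=2)
hits = t_overlap(index, {"b", "c", "d"}, t=2, num_segments=2)
print(hits)  # [0, 1, 3]
```

Because the segments are processed independently of one another, each value of `j` in the outer loop could in principle be handled by a separate group of threads; the sketch keeps the loop sequential for clarity.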

Notes

1. When there is no ambiguity, we also use j-SEGs to denote all the j-th segments of the lists related to query Q.

Acknowledgment

This research was supported by grants from the National Natural Science Foundation of China (Nos. 61562054, 51467007, 61462050) and the Personnel Training Project of Yunnan Province (No. KKSY201603016).

Author information

Authors and Affiliations

Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming, 650500, China
Lianyin Jia, Yongbin Zhang, Jiaman Ding & Jinguo You
Yunnan Provincial Key Lab of Computer Technologies Application, Kunming University of Science and Technology, Kunming, 650500, China
Lianyin Jia
Library, Yunnan Normal University, Kunming, 650500, China
Mengjuan Li

Corresponding author

Correspondence to Mengjuan Li.

Editor information

Editors and Affiliations

Central South University, Changsha, China
Beiji Zou
Central South University, Changsha, China
Min Li
Harbin Institute of Technology, Harbin, China
Hongzhi Wang
Harbin University of Science and Technology, Harbin, China
Xianhua Song
Harbin University of Science and Technology, Harbin, China
Wei Xie
Harbin Sea of Clouds and Computer Technology, Harbin, China
Zeguang Lu

About this paper

Cite this paper

Jia, L., Zhang, Y., Li, M., Ding, J., You, J. (2017). GPU Based Hash Segmentation Index for Fast T-overlap Query. In: Zou, B., Li, M., Wang, H., Song, X., Xie, W., Lu, Z. (eds) Data Science. ICPCSEE 2017. Communications in Computer and Information Science, vol 727. Springer, Singapore. https://doi.org/10.1007/978-981-10-6385-5_4

DOI: https://doi.org/10.1007/978-981-10-6385-5_4
Published: 16 September 2017
Publisher Name: Springer, Singapore
Print ISBN: 978-981-10-6384-8
Online ISBN: 978-981-10-6385-5
eBook Packages: Computer Science, Computer Science (R0)
