Sparse Suffix and LCP Array: Simple, Direct, Small, and Fast

Ayad, Lorraine A. K.; Loukides, Grigorios; Pissis, Solon P.; Verbeek, Hilde

doi:10.1007/978-3-031-55598-5_11


Part of the book series: Lecture Notes in Computer Science (LNCS, volume 14578)

Included in the following conference series:

Latin American Symposium on Theoretical Informatics


Abstract

Sparse suffix sorting is the problem of sorting \(b=o(n)\) suffixes of a string of length n. Efficient sparse suffix sorting algorithms have existed for more than a decade. Despite the multitude of works and their justified claims for applications in text indexing, the existing algorithms have not been employed by practitioners. Arguably this is because there are no simple, direct, and efficient algorithms for sparse suffix array construction. We provide two new algorithms for constructing the sparse suffix and LCP arrays that are simultaneously simple, direct, small, and fast. In particular, our algorithms are: simple in the sense that they can be implemented using only basic data structures; direct in the sense that the output arrays are not a byproduct of constructing the sparse suffix tree or an LCE data structure; fast in the sense that they run in \(\mathcal {O}(n\log b)\) time, in the worst case, or in \(\mathcal {O}(n)\) time, when the total number of suffixes with an LCP value greater than \(2^{\lfloor \log \frac{n}{b} \rfloor + 1}-1\) is in \(\mathcal {O}(b/\log b)\), matching the time of optimal yet much more complicated algorithms [Gawrychowski and Kociumaka, SODA 2017; Birenzwige et al., SODA 2020]; and small in the sense that they can be implemented using only \(8b+o(b)\) machine words. We also show that our second algorithm can be trivially amended to work in \(\mathcal {O}(n)\) time for any uniformly random string. Our algorithms are non-trivial space-efficient adaptations of the Monte Carlo algorithm by I et al. for constructing the sparse suffix tree in \(\mathcal {O}(n\log b)\) time [STACS 2014].
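To make the two output objects concrete, the following sketch computes the sparse suffix array and the sparse LCP array directly from their definitions by comparing suffixes character by character. It is a hypothetical baseline and not either of the authors' algorithms: the function names `lcp` and `naive_sparse_sa_lcp` are our own, positions are 0-based (the paper may index from 1), the input positions are assumed to be distinct, and in the worst case it takes \(\mathcal {O}(nb\log b)\) time, far from the \(\mathcal {O}(n\log b)\) bound above.

```python
from functools import cmp_to_key


def lcp(text, i, j):
    """Length of the longest common prefix of text[i:] and text[j:] (direct scan)."""
    n = len(text)
    k = 0
    while i + k < n and j + k < n and text[i + k] == text[j + k]:
        k += 1
    return k


def naive_sparse_sa_lcp(text, positions):
    """Return (SSA, SLCP) for the suffixes of `text` starting at `positions`.

    SSA lists the (distinct) positions in lexicographic order of their suffixes;
    SLCP[0] = 0 and SLCP[r] = lcp(SSA[r - 1], SSA[r]) for r >= 1.
    Illustrative baseline only: O(n * b log b) time in the worst case.
    """

    def compare(i, j):
        k = lcp(text, i, j)
        # If one suffix is a prefix of the other, the shorter one sorts first.
        if i + k == len(text):
            return -1
        if j + k == len(text):
            return 1
        # Otherwise the first mismatching character decides the order.
        return -1 if text[i + k] < text[j + k] else 1

    ssa = sorted(positions, key=cmp_to_key(compare))
    slcp = [0 if r == 0 else lcp(text, ssa[r - 1], ssa[r]) for r in range(len(ssa))]
    return ssa, slcp
```

For example, on `banana` with positions `[0, 2, 4]`, the suffixes `banana`, `nana` and `na` sort to `[0, 4, 2]`, with sparse LCP values `[0, 0, 2]`.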

SPP and HV are supported by the PANGAIA project (GA 872539). SPP is supported by the ALPACA project (GA 956229). HV is supported by a Constance van Eeden Fellowship.


Notes

1.
I et al. [16] claim \(\mathcal {O}(s)\) space, but it is evident from their construction that \(s+\mathcal {O}(1)\) machine words are in fact used.
2.
We stress that the pseudocode is complete in the sense that it only assumes the implementation of Lemma 1 (Line 11).
3.
We assume that \(A[v_i] + k + 2^j - 1 \le n\); otherwise, the suffix ends at position n.
4.
This is generally not true when \(j_\text {start}\) was set to a value less than \(\lfloor \log n \rfloor \); in this case, the LCP values are only correct if they are at most \(2^{j_\text {start} + 1} - 1\); see Sect. 4.
5.
If this is not the case, we output incorrect arrays deliberately to ensure that our algorithm is Monte Carlo.
6.
If \(i=1\), then the group member IDs are \(L[1], \ldots , L[C[i]]\).
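Notes 3 and 4 refer to substrings of length \(2^j\) and to LCP values that are exact only up to \(2^{j_\text {start}+1}-1\). The sketch below illustrates, for a single pair of suffixes, why such a cap arises when an LCP value is found by comparing Karp-Rabin fingerprints [Karp and Rabin] of blocks of lengths \(2^{j_\text {start}}, \ldots , 2, 1\) in decreasing order: since \(\sum _{j=0}^{j_\text {start}} 2^j = 2^{j_\text {start}+1}-1\), the greedy descent returns \(\min (\text {LCP}, 2^{j_\text {start}+1}-1)\), barring fingerprint collisions. It is a simplified illustration under our own assumptions, not the authors' algorithm, which builds both arrays for all b suffixes within \(8b+o(b)\) words; the names `KarpRabin` and `capped_lcp`, the modulus \(2^{61}-1\), and the random base are hypothetical choices. With \(j_\text {start} = \lfloor \log \frac{n}{b} \rfloor \), the cap has the same form as the threshold in the abstract.

```python
import random

# Assumed modulus: the Mersenne prime 2^61 - 1 (an illustrative choice).
MOD = (1 << 61) - 1


class KarpRabin:
    """Prefix Karp-Rabin fingerprints of a text (hypothetical helper)."""

    def __init__(self, text, base=None):
        self.n = len(text)
        # A random base makes collisions unlikely (Monte Carlo behaviour).
        self.base = base if base is not None else random.randrange(256, MOD - 1)
        self.pref = [0] * (self.n + 1)  # pref[i] = fingerprint of text[:i]
        self.pw = [1] * (self.n + 1)    # pw[i] = base**i mod MOD
        for i, ch in enumerate(text):
            self.pref[i + 1] = (self.pref[i] * self.base + ord(ch)) % MOD
            self.pw[i + 1] = (self.pw[i] * self.base) % MOD

    def fp(self, i, length):
        """Fingerprint of text[i:i + length] in O(1) time."""
        return (self.pref[i + length] - self.pref[i] * self.pw[length]) % MOD


def capped_lcp(kr, s1, s2, j_start):
    """LCP of the suffixes at s1 and s2, capped at 2**(j_start + 1) - 1.

    Tries block lengths 2**j_start, ..., 2, 1 in decreasing order and extends
    the match whenever the next two blocks have equal fingerprints.
    """
    k = 0
    for j in range(j_start, -1, -1):
        step = 1 << j
        # Skip blocks that would run past the end of the text (cf. Note 3).
        if s1 + k + step > kr.n or s2 + k + step > kr.n:
            continue
        if kr.fp(s1 + k, step) == kr.fp(s2 + k, step):
            k += step
    return k
```

On `abababab`, the suffixes at positions 0 and 2 share a prefix of length 6. Barring a fingerprint collision, `capped_lcp` reports 6 for `j_start = 3` but only 3 for `j_start = 1`, which is exactly the cap \(2^{1+1}-1\) described in Note 4.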

References

Arbitman, Y., Naor, M., Segev, G.: Backyard cuckoo hashing: Constant worst-case operations with a succinct representation. In: FOCS, pp. 787–796 (2010)
Ayad, L.A.K., Loukides, G., Pissis, S.P.: Text indexing for long patterns: anchors are all you need. Proc. VLDB Endow. 16(9), 2117–2131 (2023)
Ben-Nun, S., Golan, S., Kociumaka, T., Kraus, M.: Time-space tradeoffs for finding a long common substring. In: CPM. LIPIcs, vol. 161, pp. 5:1–5:14 (2020)
Bender, M.A., Conway, A., Farach-Colton, M., Kuszmaul, W., Tagliavini, G.: Iceberg hashing: optimizing many hash-table criteria at once. J. ACM 70(6) (2023)
Bernardini, G., Fici, G., Gawrychowski, P., Pissis, S.P.: Substring complexity in sublinear space. In: ISAAC. LIPIcs, vol. 283, pp. 12:1–12:19 (2023)
Bille, P., Fischer, J., Gørtz, I.L., Kopelowitz, T., Sach, B., Vildhøj, H.W.: Sparse text indexing in small space. ACM Trans. Algorithms 12(3), 39:1–39:19 (2016)
Birenzwige, O., Golan, S., Porat, E.: Locally consistent parsing for text indexing in small space. In: SODA, pp. 607–626 (2020)
Bollobás, B., Letzter, S.: Longest common extension. Eur. J. Comb. 68, 242–248 (2018)
Chan, T.M., Munro, J.I., Raman, V.: Selection and sorting in the “restore” model. ACM Trans. Algorithms 14(2), 11:1–11:18 (2018)
Christiansen, A.R., Ettienne, M.B., Kociumaka, T., Navarro, G., Prezza, N.: Optimal-time dictionary-compressed indexes. ACM Trans. Algorithms 17(1), 8:1–8:39 (2021)
Dietzfelbinger, M., Gil, J., Matias, Y., Pippenger, N.: Polynomial hash functions are reliable. In: Kuich, W. (ed.) ICALP 1992. LNCS, vol. 623, pp. 235–246. Springer, Heidelberg (1992). https://doi.org/10.1007/3-540-55719-9_77
Fischer, J., I, T., Köppl, D.: Deterministic sparse suffix sorting in the restore model. ACM Trans. Algorithms 16(4), 50:1–50:53 (2020)
Franceschini, G., Muthukrishnan, S., Pătraşcu, M.: Radix sorting with no extra space. In: Arge, L., Hoffmann, M., Welzl, E. (eds.) ESA 2007. LNCS, vol. 4698, pp. 194–205. Springer, Heidelberg (2007). https://doi.org/10.1007/978-3-540-75520-3_19
Gawrychowski, P., Kociumaka, T.: Sparse suffix tree construction in optimal time and space. In: SODA, pp. 425–439 (2017)
Grabowski, S., Raniszewski, M.: Sampled suffix array with minimizers. Softw. Pract. Exp. 47(11), 1755–1771 (2017)
I, T., Kärkkäinen, J., Kempa, D.: Faster sparse suffix sorting. In: STACS. LIPIcs, vol. 25, pp. 386–396 (2014)
Kärkkäinen, J., Sanders, P., Burkhardt, S.: Linear work suffix array construction. J. ACM 53(6), 918–936 (2006)
Kärkkäinen, J., Ukkonen, E.: Sparse suffix trees. In: Cai, J.-Y., Wong, C.K. (eds.) COCOON 1996. LNCS, vol. 1090, pp. 219–230. Springer, Heidelberg (1996). https://doi.org/10.1007/3-540-61332-3_155
Karp, R.M., Rabin, M.O.: Efficient randomized pattern-matching algorithms. IBM J. Res. Dev. 31(2), 249–260 (1987)
Kasai, T., Lee, G., Arimura, H., Arikawa, S., Park, K.: Linear-time longest-common-prefix computation in suffix arrays and its applications. In: Amir, A. (ed.) CPM 2001. LNCS, vol. 2089, pp. 181–192. Springer, Heidelberg (2001). https://doi.org/10.1007/3-540-48194-X_17
Katajainen, J., Pasanen, T., Teuhola, J.: Practical in-place mergesort. Nord. J. Comput. 3(1), 27–40 (1996)
Loukides, G., Pissis, S.P.: Bidirectional string anchors: a new string sampling mechanism. In: ESA. LIPIcs, vol. 204, pp. 64:1–64:21 (2021)
Loukides, G., Pissis, S.P., Sweering, M.: Bidirectional string anchors for improved text indexing and top-K similarity search. IEEE Trans. Knowl. Data Eng. 35(11), 11093–11111 (2023)
Navarro, G., Prezza, N.: Universal compressed text indexing. Theor. Comput. Sci. 762, 41–50 (2019)
Paige, R., Tarjan, R.E.: Three partition refinement algorithms. SIAM J. Comput. 16(6), 973–989 (1987)
Prezza, N.: Optimal substring equality queries with applications to sparse text indexing. ACM Trans. Algorithms 17(1), 7:1–7:23 (2021)
Salowe, J.S., Steiger, W.L.: Simplified stable merging tasks. J. Algorithms 8(4), 557–571 (1987)


Author information

Authors and Affiliations

Brunel University London, London, UK
Lorraine A. K. Ayad
King’s College London, London, UK
Grigorios Loukides
CWI, Amsterdam, The Netherlands
Solon P. Pissis & Hilde Verbeek
Vrije Universiteit, Amsterdam, The Netherlands
Solon P. Pissis


Corresponding author

Correspondence to Solon P. Pissis.

Editor information

Editors and Affiliations

DIM-CMM, Universidad de Chile, Santiago, Chile
José A. Soto
Technical University of Munich, Munich, Germany
Andreas Wiese


About this paper

Cite this paper

Ayad, L.A.K., Loukides, G., Pissis, S.P., Verbeek, H. (2024). Sparse Suffix and LCP Array: Simple, Direct, Small, and Fast. In: Soto, J.A., Wiese, A. (eds) LATIN 2024: Theoretical Informatics. LATIN 2024. Lecture Notes in Computer Science, vol 14578. Springer, Cham. https://doi.org/10.1007/978-3-031-55598-5_11


DOI: https://doi.org/10.1007/978-3-031-55598-5_11
Published: 06 March 2024
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-55597-8
Online ISBN: 978-3-031-55598-5
eBook Packages: Computer Science, Computer Science (R0)
