Implicit Compression Boosting with Applications to Self-indexing

  • Conference paper
String Processing and Information Retrieval (SPIRE 2007)

Part of the book series: Lecture Notes in Computer Science ((LNTCS,volume 4726))

Included in the following conference series:

Abstract

Compression boosting (Ferragina & Manzini, SODA 2004) is a new technique to enhance zeroth order entropy compressors’ performance to k-th order entropy. It works by constructing the Burrows-Wheeler transform of the input text, finding optimal partitioning of the transform, and then compressing each piece using an arbitrary zeroth order compressor. The optimal partitioning has the property that the achieved compression is boosted to k-th order entropy, for any k. The technique has an application to text indexing: Essentially, building a wavelet tree (Grossi et al., SODA 2003) for each piece in the partitioning yields a k-th order compressed full-text self-index providing efficient substring searches on the indexed text (Ferragina et al., SPIRE 2004). In this paper, we show that using explicit compression boosting with wavelet trees is not necessary; our new analysis reveals that the size of the wavelet tree built for the complete Burrows-Wheeler transformed text is, in essence, the sum of those built for the pieces in the optimal partitioning. Hence, the technique provides a way to do compression boosting implicitly, with a trivial linear time algorithm, but fixed to a specific zeroth order compressor (Raman et al., SODA 2002). In addition to having these consequences on compression and static full-text self-indexes, the analysis shows that a recent dynamic zeroth order compressed self-index (Mäkinen & Navarro, CPM 2006) occupies in fact space proportional to k-th order entropy.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Subscribe and save

Springer+ Basic
EUR 32.99 /Month
  • Get 10 units per month
  • Download Article/Chapter or Ebook
  • 1 Unit = 1 Article or 1 Chapter
  • Cancel anytime
Subscribe now

Buy Now

Chapter
EUR 29.95
Price includes VAT (France)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
EUR 42.79
Price includes VAT (France)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
EUR 52.74
Price includes VAT (France)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free ship** worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.

Similar content being viewed by others

References

  1. Apostolico, A.: The myriad virtues of subword trees. In: Combinatorial Algorithms on Words, NATO ISI Series, pp. 85–96. Springer, Heidelberg (1985)

    Google Scholar 

  2. Arroyuelo, D., Navarro, G.: Space-efficient construction of LZ-index. In: Deng, X., Du, D.-Z. (eds.) ISAAC 2005. LNCS, vol. 3827, pp. 1143–1152. Springer, Heidelberg (2005)

    Chapter  Google Scholar 

  3. Burrows, M., Wheeler, D.: A block sorting lossless data compression algorithm. Technical Report Technical Report 124, Digital Equipment Corporation (1994)

    Google Scholar 

  4. Chan, H.-L., Hon, W.-K., Lam, T.-W.: Compressed index for a dynamic collection of texts. In: Sahinalp, S.C., Muthukrishnan, S.M., Dogrusoz, U. (eds.) CPM 2004. LNCS, vol. 3109, pp. 445–456. Springer, Heidelberg (2004)

    Google Scholar 

  5. Ferragina, P., Giancarlo, R., Manzini, G.: The myriad virtues of wavelet trees. In: Bugliesi, M., Preneel, B., Sassone, V., Wegener, I. (eds.) ICALP 2006. LNCS, vol. 4051, pp. 560–571. Springer, Heidelberg (2006)

    Chapter  Google Scholar 

  6. Ferragina, P., Manzini, G.: Opportunistic data structures with applications. In: Proc. FOCS 2000, pp. 390–398 (2000)

    Google Scholar 

  7. Ferragina, P., Manzini, G.: Compression boosting in optimal linear time using the Burrows-Wheeler transform. In: Proc. SODA 2004, pp. 655–663 (2004)

    Google Scholar 

  8. Ferragina, P., Manzini, G.: Indexing compressed texts. J. of the ACM 52(4), 552–581 (2005)

    Article  MathSciNet  Google Scholar 

  9. Ferragina, P., Manzini, G., Mäkinen, V., Navarro, G.: Compressed representations of sequences and full-text indexes. ACM TALG, 3(2): article 20 (2007)

    Google Scholar 

  10. Gagie, T.: Large alphabets and incompressibility. IPL 99(6), 246–251 (2006)

    Article  MathSciNet  Google Scholar 

  11. Grossi, R., Gupta, A., Vitter, J.: High-order entropy-compressed text indexes. In: Proc. SODA 2003, pp. 841–850 (2003)

    Google Scholar 

  12. Grossi, R., Gupta, A., Vitter, J.: When indexing equals compression: Experiments with compressing suffix arrays and applications. In: Proc. SODA 2004, pp. 636–645 (2004)

    Google Scholar 

  13. Hon, W.-K., Sadakane, K., Sung, W.-K.: Breaking a time-and-space barrier in constructing full-text indexes. In: Proc. FOCS 2003, pp. 251–260 (2003)

    Google Scholar 

  14. Jacobson, G.: Space-efficient static trees and graphs. In: Proc. FOCS 1989, pp. 549–554 (1989)

    Google Scholar 

  15. Kärkkäinen, J.: Fast BWT in small space by blockwise suffix sorting. In: Proc. DIMACS Working Group on the Burrows-Wheeler Transform (2004)

    Google Scholar 

  16. Mäkinen, V., Navarro, G.: Succinct suffix arrays based on run-length encoding. Nordic J. of Computing 12(1), 40–66 (2005)

    Google Scholar 

  17. Mäkinen, V., Navarro, G.: Dynamic entropy-compressed sequences and full-text indexes. In: Lewenstein, M., Valiente, G. (eds.) CPM 2006. LNCS, vol. 4009, pp. 307–318. Springer, Heidelberg (2006)

    Chapter  Google Scholar 

  18. Manber, U., Myers, G.: Suffix arrays: a new method for on-line string searches. SIAM J. on Computing, 935–948 (1993)

    Google Scholar 

  19. Manzini, G.: An analysis of the Burrows-Wheeler transform. J. of the ACM 48(3), 407–430 (2001)

    Article  MathSciNet  Google Scholar 

  20. Na, J.C.: Linear-time construction of compressed suffix arrays using o(n logn)-bit working space for large alphabets. In: Apostolico, A., Crochemore, M., Park, K. (eds.) CPM 2005. LNCS, vol. 3537, pp. 57–67. Springer, Heidelberg (2005)

    Google Scholar 

  21. Pagh, R.: Low redundancy in dictionaries with O(1) worst case lookup time. In: Wiedermann, J., van Emde Boas, P., Nielsen, M. (eds.) ICALP 1999. LNCS, vol. 1644, pp. 595–604. Springer, Heidelberg (1999)

    Google Scholar 

  22. Raman, R., Raman, V., Srinivasa, S.: Succinct dynamic data structures. In: Dehne, F., Sack, J.-R., Tamassia, R. (eds.) WADS 2001. LNCS, vol. 2125, pp. 426–437. Springer, Heidelberg (2001)

    Chapter  Google Scholar 

  23. Raman, R., Raman, V., Srinivasa Roa, S.: Succinct indexable dictionaries with applications to encoding k-ary trees and multisets. In: Proc. SODA 2002, pp. 233–242 (2002)

    Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Editor information

Nivio Ziviani Ricardo Baeza-Yates

Rights and permissions

Reprints and permissions

Copyright information

© 2007 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Mäkinen, V., Navarro, G. (2007). Implicit Compression Boosting with Applications to Self-indexing. In: Ziviani, N., Baeza-Yates, R. (eds) String Processing and Information Retrieval. SPIRE 2007. Lecture Notes in Computer Science, vol 4726. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-75530-2_21

Download citation

  • DOI: https://doi.org/10.1007/978-3-540-75530-2_21

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-75529-6

  • Online ISBN: 978-3-540-75530-2

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics

Navigation