Abstract
A string S of length n has period P of length p if \(S[i]=S[i+p]\) for all \(1 \le i \le n-p\) and \(n \ge 2p\). The shortest such substring P is called the period of S, and S is said to be periodic in P. In this paper we investigate the period recovery problem: given a string S of length n, find the primitive period(s) P such that the distance between S and the string that is periodic in P is below a threshold \(\tau\). We consider the period recovery problem under both the Hamming distance and the edit distance. For the Hamming distance, with \(\tau = \frac{n}{(2+\epsilon)p}\) for \(0 < \epsilon < 1\), we present an \(O(n \log n)\)-time algorithm. For the edit distance, with \(\tau = \frac{n}{(4+\epsilon)p}\), we provide an \(O(n^{4/3})\)-time algorithm.
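The definitions in the abstract can be made concrete with a brute-force sketch in Python. The function names below are hypothetical, and the code is an illustrative quadratic-time baseline, not the \(O(n \log n)\) or \(O(n^{4/3})\) algorithms of the paper. For the Hamming case it relies on the fact that, for a fixed length p, position i of the periodic string always equals \(P[i \bmod p]\), so choosing the most frequent character of each residue class modulo p minimizes the distance for that p. For the edit case it assumes that "the string periodic in P" may be any prefix of P repeated indefinitely; the paper's exact formulation may differ.

```python
from collections import Counter


def has_period(s: str, p: int) -> bool:
    """Periodicity as defined above (0-indexed): s[i] == s[i+p] for all valid i, and n >= 2p."""
    n = len(s)
    return 0 < p and 2 * p <= n and all(s[i] == s[i + p] for i in range(n - p))


def period(s: str):
    """Return the shortest P = s[:p] satisfying has_period, or None if s is not periodic."""
    for p in range(1, len(s) // 2 + 1):
        if has_period(s, p):
            return s[:p]
    return None


def is_primitive(pattern: str) -> bool:
    """P is primitive (not a power of a shorter string) iff P occurs in PP only at 0 and |P|."""
    return (pattern + pattern).find(pattern, 1) == len(pattern)


def periodic_string(pattern: str, length: int) -> str:
    """Prefix of length `length` of the infinite repetition of `pattern`."""
    return (pattern * (length // len(pattern) + 1))[:length]


def brute_force_hamming_recovery(s: str, eps: float):
    """Hypothetical O(n^2) baseline, NOT the paper's O(n log n) algorithm.

    For each length p with 2p <= n, the most frequent character of each residue
    class mod p gives the P minimising the Hamming distance for that p; P is
    reported if it is primitive and its distance is below tau = n / ((2 + eps) * p).
    """
    n = len(s)
    found = []
    for p in range(1, n // 2 + 1):
        cand = "".join(Counter(s[i::p]).most_common(1)[0][0] for i in range(p))
        d = sum(x != y for x, y in zip(s, periodic_string(cand, n)))
        if is_primitive(cand) and d < n / ((2 + eps) * p):
            found.append((cand, d))
    return found


def edit_distance_to_periodic(s: str, pattern: str) -> int:
    """Assumed edit-distance analogue: min over m of ed(s, first m chars of pattern^inf).

    Prefixes longer than 2n never help (ed >= m - n > n = ed(s, "")), so the
    repetition is truncated to 2n characters. O(n^2) time per candidate pattern.
    """
    n = len(s)
    t = periodic_string(pattern, 2 * n)
    prev = list(range(len(t) + 1))  # distances from the empty prefix of s
    for i in range(1, n + 1):
        cur = [i] + [0] * len(t)
        for j in range(1, len(t) + 1):
            cost = 0 if s[i - 1] == t[j - 1] else 1
            cur[j] = min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + cost)
        prev = cur
    return min(prev)  # best over all prefix lengths m = 0..2n
```

For example, on S = abcabcabcabd with \(\epsilon = 0.5\), only the candidate abc (one mismatch, against \(\tau = 1.6\)) is reported; under the edit-distance reading above, abcabxcabc is at distance 1 from a string periodic in abc.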
A. Amir—Partially supported by the Israel Science Foundation grant 571/14, and grant No. 2014028 from the United States-Israel Binational Science Foundation (BSF).
M. Amit—Partially supported by the Israel Science Foundation grant 571/14, grant No. 2014028 from the United States-Israel Binational Science Foundation (BSF) and DFG.
G. M. Landau—Partially supported by the Israel Science Foundation grant 571/14, grant No. 2014028 from the United States-Israel Binational Science Foundation (BSF) and DFG.
D. Sokol—Partially supported by the United States-Israel Binational Science Foundation (BSF) grant No. 2014028.
Notes
1. In previous work, the lemmas state “up to cyclic rotations”, meaning that one conjugate is counted or reported for each set of cyclic rotations of a given period P. Here we clarify this language by always finding the single best conjugate.
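The distinction drawn in the note can be illustrated for the Hamming distance: the conjugates (cyclic rotations) of a period P generate different periodic strings, and they generally align differently with S. The sketch below, with hypothetical function names, simply tries every rotation and keeps the one with the fewest mismatches; it only illustrates what "single best conjugate" means and is not the procedure used in the paper.

```python
def conjugates(pattern: str):
    """All cyclic rotations of `pattern` (its conjugates), by rotation offset."""
    return [pattern[k:] + pattern[:k] for k in range(len(pattern))]


def best_conjugate(s: str, pattern: str):
    """Rotation of `pattern` whose periodic extension is Hamming-closest to s.

    Returns (rotation, mismatches); ties go to the smallest rotation offset.
    """
    n = len(s)
    best, best_d = None, None
    for c in conjugates(pattern):
        ext = (c * (n // len(c) + 1))[:n]  # length-n string periodic in c
        d = sum(x != y for x, y in zip(s, ext))
        if best_d is None or d < best_d:
            best, best_d = c, d
    return best, best_d
```

For instance, best_conjugate("cabcabcab", "abc") returns ("cab", 0), whereas the unrotated abc mismatches S at every position.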
Copyright information
© 2016 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Amir, A., Amit, M., Landau, G.M., Sokol, D. (2016). Period Recovery over the Hamming and Edit Distances. In: Kranakis, E., Navarro, G., Chávez, E. (eds) LATIN 2016: Theoretical Informatics. LATIN 2016. Lecture Notes in Computer Science, vol 9644. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-49529-2_5
DOI: https://doi.org/10.1007/978-3-662-49529-2_5
Published:
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-662-49528-5
Online ISBN: 978-3-662-49529-2
eBook Packages: Computer Science, Computer Science (R0)