Abstract
A tandem repeat is an occurrence of two adjacent identical substrings. In this paper, we introduce the notion of a double string, which consists of two parallel strings, and we study the problem of locating all tandem repeats in a double string. The problem has applications beyond actual double strings, as we illustrate by using the double string tandem repeats algorithm to solve two other problems. The first is finding all corner-sharing tandems in a 2-dimensional text, as defined by Apostolico and Brimkov. The second is finding all scaled tandem repeats in a 1-dimensional text, where a scaled tandem repeat is a string \(UU'\) such that \(U'\) is a discrete scale of \(U\). In addition to the algorithms for exact tandem repeats, we present algorithms that solve the problem in the inexact sense, allowing up to \(k\) mismatches. We believe that this framework will offer a new perspective on other problems in the future.
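To make the 1-dimensional definitions above concrete, here is a brute-force checker. It is a minimal sketch for pinning down the definitions, not the paper's algorithm: it examines every candidate position and length, so it runs in roughly cubic time. It assumes that a discrete scale of \(U\) by a factor \(r\) replicates every character of \(U\) exactly \(r\) times (the usual notion in the scaled-matching literature), and that the \(k\)-mismatch variant allows a Hamming distance of at most \(k\) between the two halves. The function names are hypothetical, and double strings and 2-dimensional corner-sharing tandems are not modeled.

```python
from typing import List, Tuple


def is_tandem(s: str, i: int, length: int, k: int = 0) -> bool:
    """True if s[i:i+length] and the next `length` characters differ in at most k positions."""
    left = s[i:i + length]
    right = s[i + length:i + 2 * length]
    if len(right) < length:
        return False
    # Hamming distance between the two halves (assumed k-mismatch model).
    mismatches = sum(1 for a, b in zip(left, right) if a != b)
    return mismatches <= k


def tandem_repeats(s: str, k: int = 0) -> List[Tuple[int, int]]:
    """All (start, half_length) pairs such that s[start:start+2*half_length] is a
    tandem repeat with at most k mismatches. Brute force, for illustration only."""
    n = len(s)
    found = []
    for length in range(1, n // 2 + 1):
        for i in range(n - 2 * length + 1):
            if is_tandem(s, i, length, k):
                found.append((i, length))
    return found


def discrete_scale(u: str, r: int) -> str:
    """Assumed definition: repeat each character of u exactly r times."""
    return "".join(ch * r for ch in u)


def scaled_tandem_repeats(s: str, max_scale: int) -> List[Tuple[int, int, int]]:
    """All (start, len_U, r) such that U = s[start:start+len_U] is immediately
    followed by discrete_scale(U, r), for 1 <= r <= max_scale.
    Entries with r == 1 are ordinary tandem repeats."""
    n = len(s)
    found = []
    for m in range(1, n + 1):
        for r in range(1, max_scale + 1):
            total = m + m * r  # length of U followed by its r-scaled copy
            if total > n:
                break  # larger r only makes the candidate longer
            for i in range(n - total + 1):
                u = s[i:i + m]
                if s[i + m:i + total] == discrete_scale(u, r):
                    found.append((i, m, r))
    return found
```

For example, `scaled_tandem_repeats("abaabb", 2)` reports `(0, 2, 2)` because \(U = \mathtt{ab}\) is followed by its 2-scale \(\mathtt{aabb}\), alongside the ordinary tandems `aa` and `bb` reported with `r == 1`.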
Notes
In DNA, corresponding bases on the two strands are related by complementary pairing, whereas our definition of a double string implies no such relationship between corresponding characters.
References
Amir, A., Butman, A., Lewenstein, M.: Real scaled matching. Inf. Process. Lett. 70(4), 185–190 (1999)
Apostolico, A., Brimkov, V.E.: Optimal discovery of repetitions in 2d. Discret. Appl. Math. 151(1–3), 5–20 (2005)
Butman, A., Eres, R., Landau, G.M.: Scaled and permuted string matching. Inf. Process. Lett. 92(6), 293–297 (2004)
Crochemore, M., Ilie, L., Rytter, W.: Repetitions in strings: algorithms and combinatorics. Theoret. Comput. Sci. 410(50), 5227–5235 (2009). Mathematical Foundations of Computer Science (MFCS 2007)
Galil, Z., Giancarlo, R.: Improved string matching with \(k\) mismatches. SIGACT News 17(4), 52–54 (1986)
Geizhals, S.H., Sokol, D.: Finding maximal 2-dimensional palindromes. Inf. Comput. 266, 161–172 (2019)
Gusfield, D.: Algorithms on Strings, Trees, and Sequences - Computer Science and Computational Biology. Cambridge University Press, Cambridge (1997)
Harel, D., Tarjan, R.E.: Fast algorithms for finding nearest common ancestors. SIAM J. Comput. 13(2), 338–355 (1984)
Iliopoulos, C.S., Moore, D., Smyth, W.F.: A characterization of the squares in a Fibonacci string. Theoret. Comput. Sci. 172(1), 281–291 (1997)
Karp, R.M., Miller, R.E., Rosenberg, A.L.: Rapid identification of repeated patterns in strings, trees and arrays. In: Proceedings of the 4th Annual ACM Symposium on Theory of Computing (STOC), pp. 125–136 (1972)
Knuth, D.E., Morris, J.H., Jr., Pratt, V.R.: Fast pattern matching in strings. SIAM J. Comput. 6(2), 323–350 (1977)
Kolpakov, R.M., Kucherov, G.: Finding maximal repetitions in a word in linear time. In: 40th Annual Symposium on Foundations of Computer Science (FOCS '99), New York, NY, USA, 17–18 October 1999, pp. 596–604. IEEE Computer Society (1999)
Landau, G.M., Schmidt, J.P., Sokol, D.: An algorithm for approximate tandem repeats. J. Comput. Biol. 8, 1–18 (2001)
Landau, G.M., Vishkin, U.: Fast string matching with k differences. J. Comput. Syst. Sci. 37(1), 63–78 (1988)
Landau, G.M., Vishkin, U.: Fast parallel and serial approximate string matching. J. Algorithms 10(2), 157–169 (1989)
Liu, J.J., Huang, G.S., Wang, Y.L.: A fast algorithm for finding the positions of all squares in a run-length encoded string. Theoret. Comput. Sci. 410(38), 3942–3948 (2009)
Main, M.G., Lorentz, R.J.: An O(n log n) algorithm for finding all repetitions in a string. J. Algorithms 5(3), 422–432 (1984)
Ukkonen, E.: On-line construction of suffix trees. Algorithmica 14(3), 249–260 (1995)
Funding
The authors A. Amir and G. M. Landau have been partially supported by Grant No. 2018141 from the United States-Israel Binational Science Foundation (BSF) and Israel Science Foundation Grant 1475-18. D. Sokol was also partially supported by BSF Grant No. 2018141. S. Marcus was partially supported by the Professional Staff Congress City University of New York Research Award 63164-00 51.
Ethics declarations
Conflict of interest
The authors have no competing interests to declare that are relevant to the content of this article.
About this article
Cite this article
Amir, A., Butman, A., Landau, G.M. et al. Double String Tandem Repeats. Algorithmica 85, 170–187 (2023). https://doi.org/10.1007/s00453-022-01016-9
DOI: https://doi.org/10.1007/s00453-022-01016-9