Abstract
Data analysis typically involves two distinct key tasks: error recovery and detection of regularities. In this paper we show that there are data types for which these two tasks can be powerfully combined. A common notion of regularity in strings is that of a cover. Data describing measurements of a natural coverable phenomenon may be corrupted by errors caused by the measurement process or by inexact features of the phenomenon itself. For this reason, different variants of approximate covers have been introduced, some of which are \(\mathcal{NP}\)-hard to compute. In this paper we assume that the Hamming distance measures the amount of corruption, and we study the problem of recovering the correct cover from data corrupted by mismatch errors, formally defined as the cover recovery problem (CRP). We show that, for the Hamming distance, coverability is a powerful property that allows the original cover to be detected and the data to be corrected, under suitable conditions. We also study the approximate cover problem (ACP). Since the ACP has been proved to be \(\mathcal{NP}\)-hard (Amir et al. in: Approximate cover of strings. CPM, 2017), we study a relaxation, which we call the candidate-relaxation of the ACP, and show that it can be solved in polynomial time. As a result, the ACP can also be solved in polynomial time in many practical situations. An important application of our study of the ACP relaxation is a polynomial-time algorithm for the CRP.
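To make the notions in the abstract concrete, the following Python sketch illustrates an exact cover test, the Hamming distance, and a simple dynamic program for the distance between a text and the nearest string covered by a given candidate. It rests on assumptions not stated in the abstract: a string C is taken to cover T when every position of T lies inside some occurrence of C, and the candidate-relaxation is taken to mean that the candidate C is supplied as input. The function names (`hamming`, `is_cover`, `candidate_distance`) are hypothetical, and the dynamic program is a straightforward illustration, not necessarily the algorithm developed in the paper.

```python
import math
from typing import Optional


def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length strings."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs equal-length strings")
    return sum(x != y for x, y in zip(a, b))


def is_cover(c: str, t: str) -> bool:
    """True if every position of t lies inside some occurrence of c in t.

    Assumption: the trivial case c == t is accepted here; a stricter
    definition may require len(c) < len(t).
    """
    m, n = len(c), len(t)
    if m == 0 or m > n:
        return False
    covered_up_to = 0  # t[:covered_up_to] is covered by occurrences seen so far
    for i in range(n - m + 1):
        if i > covered_up_to:
            return False  # position covered_up_to can no longer be covered
        if t.startswith(c, i):
            covered_up_to = i + m
    return covered_up_to == n


def candidate_distance(c: str, t: str) -> Optional[int]:
    """Smallest Hamming distance from t to a string of length len(t) covered by c.

    Returns None when no string of that length is covered by c.
    Illustrative O(n * m^2) dynamic program (hypothetical helper).
    """
    m, n = len(c), len(t)
    if m == 0 or m > n:
        return None
    # A new occurrence may start d positions after the previous one only if
    # the overlap is consistent, i.e. c[d:] equals the prefix c[:m - d].
    shifts = [d for d in range(1, m + 1) if c[d:] == c[:m - d]]
    # best[p]: minimal cost of the prefix t[:p + m] when the last occurrence starts at p
    best = [math.inf] * (n - m + 1)
    best[0] = hamming(c, t[:m])
    for p in range(1, n - m + 1):
        for d in shifts:
            if d > p:
                break
            # The occurrence at p adds the last d characters of c.
            cost = best[p - d] + hamming(c[m - d:], t[p - d + m:p + m])
            if cost < best[p]:
                best[p] = cost
    return None if math.isinf(best[n - m]) else int(best[n - m])
```

For example, under these assumptions `is_cover("aba", "abaababaaba")` holds, while flipping a single character of the text breaks exact coverability and `candidate_distance` reports how many mismatches separate the corrupted text from the nearest string covered by `"aba"`.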
References
Abrahamson, K.: Generalized string matching. SIAM J. Comput. 16(6), 1039–1051 (1987)
Amir, A., Eisenberg, E., Levy, A.: Approximate periodicity. In: Proceedings of ISAAC 2010, LNCS 6506, Springer, Berlin, pp. 25–36 (2010)
Amir, A., Eisenberg, E., Levy, A., Porat, E., Shapira, N.: Cycle detection and correction. ACM Trans. Algorithms 9(1), 13 (2012)
Amir, A., Amit, M., Landau, G.M., Sokol, D.: Period recovery over the Hamming and edit distances. In: Proceedings of 15th Latin American Theoretical Informatics Symposium (LATIN), pp. 55–67 (2016)
Amir, A., Levy, A., Lubin, R., Porat, E.: Approximate cover of strings. In: Proceedings of CPM (2017)
Apostolico, A., Breslauer, D.: Of periods, quasiperiods, repetitions and covers. In: Proceedings of Structures in Logic and Computer Science, LNCS 1261, pp. 236–248. Springer, Berlin (1997)
Antoniou, P., Crochemore, M., Iliopoulos, C.S., Jayasekera, I., Landau, G.M.: Conservative string covering of indeterminate strings. In: Proceedings of Stringology, pp. 108–115 (2008)
Apostolico, A., Ehrenfeucht, A.: Efficient detection of quasiperiodicities in strings. Theor. Comput. Sci. 119, 247–265 (1993)
Apostolico, A., Farach, M., Iliopoulos, C.S.: Optimal superprimitivity testing for strings. Inf. Process. Lett. 39, 17–20 (1991)
Breslauer, D.: An on-line string superprimitivity test. Inf. Process. Lett. 44, 345–347 (1992)
Breslauer, D.: Testing string superprimitivity in parallel. Inf. Process. Lett. 49(5), 235–241 (1994)
Cazaux, B., Rivals, E.: A linear time algorithm for shortest cyclic cover of strings. J. Discrete Algorithms 37, 56–67 (2016)
Christodoulakis, M., Iliopoulos, C.S., Park, K., Sim, J.S.: Approximate seeds of strings. J. Autom. Lang. Comb. 10, 609–626 (2005)
Crawford, T., Iliopoulos, C.S., Raman, R.: String matching techniques for musical similarity and melodic recognition. Comput. Musicol. 11, 73–100 (1998)
Crochemore, M., Iliopoulos, C.S., Pissis, S.P., Tischler, G.: Cover array string reconstruction. In: Proceedings of CPM, pp. 251–259 (2010)
Crochemore, M., Iliopoulos, C.S., Yu, H.: Algorithms for computing evolutionary chains in molecular and musical sequences. In: Proceedings of 9th Australasian Workshop on Combinatorial Algorithms (AWOCA), pp. 172–185 (1998)
Flouri, T., Iliopoulos, C.S., Kociumaka, T., Pissis, S.P., Puglisi, S.J., Smyth, W.F., Tyczynski, W.: Enhanced string covering. Theor. Comput. Sci. 506, 102–114 (2013)
Guth, O., Melichar, B.: Using finite automata approach for searching approximate seeds of strings. In: Intelligent Automation and Computer Engineering. Springer, Netherlands, ISBN 978-90-481-3517-2, pp. 347–360 (2010)
Guth, O., Melichar, B., Balik, M.: Searching all approximate covers and their distance using finite automata. In: Information Technologies Applications and Theory, pp. 21–26, ISBN 978-80-969184-9-2 (2009)
Iliopoulos, C.S., Mouchard, L.: Quasiperiodicity and string covering. Theor. Comput. Sci. 218(1), 205–216 (1999)
Iliopoulos, C.S., Moore, D.W.G., Park, K.: Covering a string. Algorithmica 16(3), 288–297 (1996)
Iliopoulos, C.S., Smyth, W.F.: An on-line algorithm of computing a minimum set of \(k\)-covers of a string. In: Proceedings of 9th Australasian Workshop on Combinatorial Algorithms (AWOCA), pp. 97–106 (1998)
Kociumaka, T., Pissis, S.P., Radoszewski, J., Rytter, W., Walen, T.: Fast algorithm for partial covers in words. In: Proceedings of CPM, pp. 177–188 (2013)
Kociumaka, T., Pissis, S.P., Radoszewski, J., Rytter, W., Walen, T.: Fast algorithm for partial covers in words. Algorithmica 73, 217–233 (2015)
Kolpakov, R.M., Kucherov, G.: Finding approximate repetitions under Hamming distance. Theor. Comput. Sci. 303, 135–156 (2003)
Knuth, D.E., Morris, J., Pratt, V.R.: Fast pattern matching in strings. SIAM J. Comput. 6(2), 323–350 (1977)
Landau, G.M., Schmidt, J.P.: An algorithm for approximate tandem repeats. In: Proceedings of 4th Symposium Combinatorial Pattern Matching, LNCS 648, Springer, Berlin, pp. 120–133 (1993)
Landau, G.M., Schmidt, J.P., Sokol, D.: An algorithm for approximate tandem repeats. J. Comput. Biol. 8(1), 1–18 (2001)
Li, Y., Smyth, W.F.: Computing the cover array in linear time. Algorithmica 32(1), 95–106 (2002)
Lothaire, M.: Combinatorics on Words. Addison-Wesley, Reading (1983)
Mhaskar, N., Smyth, W.F.: String covering with optimal covers. J. Discrete Algorithms 51, 26–38 (2018)
Moore, D., Smyth, W.F.: An optimal algorithm to compute all the covers of a string. Inf. Process. Lett. 50(5), 239–246 (1994)
Moore, D., Smyth, W.F.: A correction to: an optimal algorithm to compute all the covers of a string. Inf. Process. Lett. 54, 101–103 (1995)
Sim, J.S., Iliopoulos, C.S., Park, K., Smyth, W.F.: Approximate periods of strings. Theor. Comput. Sci. 262, 557–568 (2001)
Smyth, W.F.: Repetitive perhaps, but certainly not boring. Theor. Comput. Sci. 249(2), 343–355 (2000)
Zhang, H., Guo, Q., Iliopoulos, C.S.: Algorithms for computing the lambda-regularities in strings. Fundam. Inform. 84(1), 33–49 (2008)
Zhang, H., Guo, Q., Iliopoulos, C.S.: Varieties of regularities in weighted sequences. In: Proceedings of AAIM 2010, LNCS 6124, pp. 271–280. Springer, Berlin (2010)
Additional information
A partial version of this paper appeared in the proceedings of CPM 2017.
Cite this article
Amir, A., Levy, A., Lewenstein, M. et al. Can We Recover the Cover?. Algorithmica 81, 2857–2875 (2019). https://doi.org/10.1007/s00453-019-00559-8
DOI: https://doi.org/10.1007/s00453-019-00559-8