Abstract
The Closest String Problem is defined as follows. Let \(S\) be a set of \(k\) strings \(\{s_1,\ldots ,s_k\}\), each of length \(\ell \). Find a string \(s^*\) that minimizes the maximum Hamming distance from \(s^*\) to the strings of \(S\). We denote this distance by \(d\). The string \(s^*\) is called a consensus string. In this paper we present two main algorithms: the Configuration algorithm, which solves this problem in \(O(k^2 \ell ^ k)\) time, and the Minority algorithm. The problem was introduced by Lanctot et al. [SODA’99 and (Inf Comput 185(1):41–55, 2003)]. They showed that the problem is \(\mathcal {NP}\)-hard and provided an approximation algorithm based on Integer Programming. Since then the closest string problem has been studied extensively both in computational biology and in theoretical computer science. This research can be roughly divided into three categories: approximate, exact and practical solutions. This paper falls under the exact solutions category. Despite the great effort to obtain efficient algorithms for this problem, an algorithm with the natural running time of \(O(\ell ^ k)\) was not known. In this paper we close this gap. Our result means that algorithms solving the closest string problem in times \(O(\ell ^2), O(\ell ^3), O(\ell ^4)\) and \(O(\ell ^5)\) exist for the cases of \(k=2,3,4\) and \(5\), respectively. It is known that, in fact, the cases of \(k=2,3,\) and \(4\) can be solved in linear time. No efficient algorithm is currently known for the case of \(k=5\). We prove two lemmas, the unit square lemma and the minority lemma, that exploit surprising properties of the closest string problem and enable constructing the closest string in a sequential fashion. These lemmas, with some additional ideas, give an \(O(\ell ^2)\) algorithm for computing a closest string of \(5\) binary strings. Algorithm Minority is based on these lemmas.
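To make the problem definition concrete, the following is a minimal Python sketch that computes a consensus string by exhaustive search. It is not the Configuration algorithm or the Minority algorithm, whose details are not given in the abstract; it simply tries every candidate string, so it takes time exponential in \(\ell \) and is useful only for tiny instances. The function names `hamming`, `radius` and `closest_string_bruteforce`, and the default binary alphabet, are assumptions made for this illustration.

```python
from itertools import product


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("strings must have equal length")
    return sum(x != y for x, y in zip(a, b))


def radius(candidate: str, strings: list[str]) -> int:
    """Maximum Hamming distance from the candidate to any input string."""
    return max(hamming(candidate, s) for s in strings)


def closest_string_bruteforce(strings: list[str], alphabet: str = "01") -> tuple[str, int]:
    """Return (s_star, d) by trying every string over the alphabet.

    Hypothetical helper for illustration only: it enumerates
    len(alphabet) ** length candidates, so it is exponential in the length.
    """
    length = len(strings[0])
    best, best_d = "", length + 1
    for letters in product(alphabet, repeat=length):
        cand = "".join(letters)
        d = radius(cand, strings)
        if d < best_d:  # keep the first candidate achieving the minimum
            best, best_d = cand, d
    return best, best_d


if __name__ == "__main__":
    print(closest_string_bruteforce(["000", "011", "101"]))
```

For the three input strings `000`, `011` and `101`, every input string is at distance 2 from some other input, while the string `001` is at distance 1 from all three. The consensus string therefore need not be one of the input strings.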
References
Amir, A., Landau, G.M., Na, J.C., Park, H., Park, K., Sim, J.S.: Consensus optimizing both distance sum and radius. In: Karlgren, J., Tarhio, J., Hyyrö, H. (eds.) Proceedings of 16th Symposium on String Processing and Information Retrieval (SPIRE), LNCS, vol. 5721. Springer, pp. 234–242 (2009)
Amir, A., Paryenty, H., Roditty, L.: Approximations and partial solutions for the consensus sequence problem. In: Proceedings of 18th Symposium on String Processing and Information Retrieval (SPIRE) (2011, to appear)
Andoni, A., Indyk, P., Patrascu, M.: On the optimality of the dimensionality reduction method. In: Proceedings of 47th IEEE Symposium on the Foundation of Computer Science (FOCS), pp. 449–458 (2006)
Ben-Dor, A., Lancia, G., Perone, J., Ravi, R.: Banishing bias from consensus sequences. In: Proceedings of 8th Annual Symposium on Combinatorial Pattern Matching (CPM), pp. 247–261 (1997)
Boucher, C., Brown, D., Durocher, S.: On the structure of small motif recognition instances. In: Proceedings of 15th Symposium on String Processing and Information Retrieval (SPIRE), pp. 269–281 (2008)
Boucher, C., Wilkie, K.: Why large closest string instances are easy to solve in practice. In: Proceedings of 17th Symposium on String Processing and Information Retrieval (SPIRE), pp. 106–117 (2010)
Chimani, M., Woste, M., Bocker, S.: A closer look at the closest string and closest substring problem. In: Proceedings of 13th Workshop on Algorithm Engineering and Experiments (ALENEX), pp. 13–24 (2011)
Evans, P.A., Smith, A., Wareham, H.T.: The Parameterized Complexity of p-Center Approximate Substring Problems. Technical Report TR01-149, Faculty of Computer Science, University of New Brunswick, Canada (2001)
Frances, M., Litman, A.: On covering problems of codes. Theory Comput. Syst. 30(2), 113–119 (1997)
Gramm, J., Niedermeier, R., Rossmanith, P.: Exact solutions for closest string and related problems. In: Eades, P., Takaoka, T. (eds.) Proceedings of 12th Annual Symposium on Algorithms and Computation (ISAAC), LNCS, vol. 2223. Springer, pp. 441–453 (2001)
Gramm, J., Niedermeier, R., Rossmanith, P.: Fixed-parameter algorithms for closest string and related problems. Algorithmica 37(1), 25–42 (2003)
Hufsky, F., Kuchenbecker, L., Jahn, K., Stoye, J., Bocker, S.: Swiftly computing center strings. In: Proceedings of 10th Workshop on Algorithms in Bioinformatics (WABI), pp. 325–336 (2010)
Lanctot, K., Li, M., Ma, B., Wang, S., Zhang, L.: Distinguishing string selection problems. Inf. Comput. 185(1), 41–55 (2003)
Lenstra, H.W.: Integer programming with a fixed number of variables. Math. Oper. Res. 8, 538–548 (1983)
Li, M., Ma, B., Wang, L.: On the closest string and substring problems. J. ACM 49(2), 157–171 (2002)
Ma, B., Sun, X.: More efficient algorithms for closest string and substring problems. SIAM J. Comput. 39(4), 1432–1443 (2009)
Meneses, C.N., Lu, Z., Oliveira, C.A.S., Pardalos, P.M.: Optimal solutions for the closest-string problem via integer programming. INFORMS J. Comput. 16(4), 419–429 (2004)
Stojanovic, N., Berman, P., Gumucio, D., Hardison, R., Miller, W.: A linear-time algorithm for the 1-mismatch problem. In: Proceedings of 5th International Workshop on Algorithms and Data Structures (WADS), pp. 126–135 (1997)
Sze, S., Lu, S., Chen, J.: Integrating sample-driven and pattern-driven approaches in motif finding. In: Proceedings of 4th Workshop on Algorithms in Bioinformatics (WABI), pp. 438–449 (2004)
Acknowledgments
We would like to thank the anonymous reviewers for their helpful remarks.
Additional information
Amihood Amir: Partly supported by NSF Grant CCR-09-04581 and ISF Grant 571/14.
Haim Paryenty: Partly supported by a Bar-Ilan University President’s Fellowship.
About this article
Cite this article
Amir, A., Paryenty, H. & Roditty, L. Configurations and Minority in the String Consensus Problem. Algorithmica 74, 1267–1292 (2016). https://doi.org/10.1007/s00453-015-9996-7
DOI: https://doi.org/10.1007/s00453-015-9996-7