Extended Pairwise Sequence Alignment

  • Conference paper
Computational Science and Its Applications – ICCSA 2023 (ICCSA 2023)

Abstract

A pairwise sequence alignment is a structure describing a set of editing operations that transforms one given sequence into another. We consider insertion, deletion, and substitution of symbols as editing operations. Given a fixed function assigning a weight to each editing operation, the weight of an alignment A is the sum of the weights of the editing operations described by A. Needleman and Wunsch proposed an algorithm for finding a pairwise sequence alignment of minimum weight. However, a sequence of editing operations that transforms one sequence into another cannot always be represented by an alignment. We present a more general structure that can represent any sequence of editing operations transforming one sequence into another. We also show how to find, in quadratic time, a minimum-weight sequence of editing operations that transforms one sequence into another, even when that sequence cannot be represented by an alignment. Additionally, we show that no algorithm solves this problem in strongly subquadratic time unless SETH is false. This approach may help explain non-trivial evolutionary models in molecular biology, such as those involving adaptive and back mutations, in which the distance between sequences does not satisfy the triangle inequality.
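The abstract does not spell out the construction, so the following is only an illustration under our own assumptions: nonnegative weights, a hypothetical gap symbol `-`, and hypothetical helper names `weight_table`, `close_weights`, and `min_weight`. It shows one natural approach suggested by the Floyd and Warshall entries in the reference list: close the symbol-level weights under shortest paths, so that chains of operations on one position (such as A to B to C) are priced correctly, then run the Needleman–Wunsch recurrence on the closed weights. It is a sketch, not necessarily the authors' algorithm.

```python
from itertools import product

GAP = "-"            # hypothetical symbol for an empty position (insertion/deletion)
INF = float("inf")   # weight of an operation that is not available


def weight_table(alphabet, w):
    """Symbol-level weights over alphabet + GAP.

    w[(x, y)] is the weight of replacing x by y; (GAP, y) is an insertion
    of y and (x, GAP) is a deletion of x. Pairs missing from w get INF,
    and leaving a symbol unchanged costs 0.
    """
    symbols = list(alphabet) + [GAP]
    return {(x, y): 0 if x == y else w.get((x, y), INF)
            for x, y in product(symbols, repeat=2)}


def close_weights(alphabet, w):
    """Floyd-Warshall closure: cheapest chain of operations turning x into y."""
    d = weight_table(alphabet, w)
    symbols = list(alphabet) + [GAP]
    for k in symbols:
        for i in symbols:
            for j in symbols:
                if d[i, k] + d[k, j] < d[i, j]:
                    d[i, j] = d[i, k] + d[k, j]
    return d


def min_weight(s, t, d):
    """Needleman-Wunsch recurrence: minimum weight to turn s into t under d."""
    n, m = len(s), len(t)
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = cost[i - 1][0] + d[s[i - 1], GAP]
    for j in range(1, m + 1):
        cost[0][j] = cost[0][j - 1] + d[GAP, t[j - 1]]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost[i][j] = min(
                cost[i - 1][j - 1] + d[s[i - 1], t[j - 1]],  # substitute (or keep)
                cost[i - 1][j] + d[s[i - 1], GAP],           # delete s[i-1]
                cost[i][j - 1] + d[GAP, t[j - 1]],           # insert t[j-1]
            )
    return cost[n][m]


# Hypothetical weights that violate the triangle inequality:
# A->C costs 5, but A->B->C costs only 1 + 1 = 2.
w = {}
for x, y, c in [("A", "B", 1), ("B", "C", 1), ("A", "C", 5)]:
    w[x, y] = w[y, x] = c
for x in "ABC":
    w[x, GAP] = w[GAP, x] = 3

print(min_weight("A", "C", weight_table("ABC", w)))   # 5 (single substitution)
print(min_weight("A", "C", close_weights("ABC", w)))  # 2 (A -> B -> C)
```

The second result is cheaper because the closure lets a single position pass through B, i.e. two substitutions applied to the same symbol. An alignment column pairing A with C records only one operation, which is the kind of editing sequence the abstract says an ordinary alignment cannot represent. The closure costs O(|Σ|³) for an alphabet Σ, a constant for a fixed alphabet, so the overall time stays quadratic in the sequence lengths; whether this matches the paper's general structure is our assumption.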

References

  1. Altschul, S.F., Gish, W., Miller, W., Myers, E.W., Lipman, D.J.: Basic local alignment search tool. J. Mol. Biol. 215(3), 403–410 (1990)

  2. Araujo, E., Martinez, F.V., Higa, C.H.A., Soares, J.: Matrices inducing generalized metric on sequences. Discrete Appl. Math. (2023, to appear)

  3. Araujo, E., Rozante, L.C., Rubert, D.P., Martinez, F.V.: Algorithms for normalized multiple sequence alignments. In: Proceedings of ISAAC. LIPIcs, vol. 212, pp. 40:1–40:16 (2021)

  4. Backurs, A., Indyk, P.: Edit distance cannot be computed in strongly subquadratic time (unless SETH is false). In: Proceedings of STOC, pp. 51–58 (2015)

  5. Barton, C., Flouri, T., Iliopoulos, C.S., Pissis, S.P.: Global and local sequence alignment with a bounded number of gaps. Theor. Comput. Sci. 582, 1–16 (2015)

  6. Chaurasiya, R.K., Londhe, N.D., Ghosh, S.: A novel weighted edit distance-based spelling correction approach for improving the reliability of Devanagari script-based P300 speller system. IEEE Access 4, 8184–8198 (2016)

  7. Chenna, R., et al.: Multiple sequence alignment with the Clustal series of programs. Nucleic Acids Res. 31(13), 3497–3500 (2003)

  8. Fisman, D., Grogin, J., Margalit, O., Weiss, G.: The Normalized Edit Distance with Uniform Operation Costs is a Metric. arXiv:2201.06115 (2022)

  9. Floyd, R.: Algorithm 97: shortest path. Commun. ACM 5(6), 345 (1962)

  10. Foster, P.: Adaptive mutation in Escherichia coli. J. Bacteriol. 186(15), 4846–4852 (2004)

  11. de la Higuera, C., Micó, L.: A contextual normalised edit distance. In: Proceedings of ICDEW, pp. 354–361. IEEE (2008)

  12. Karplus, K., Barrett, C., Hughey, R.: Hidden Markov models for detecting remote protein homologies. Bioinformatics 14(10), 846–856 (1998)

  13. Levenshtein, V.: Binary codes capable of correcting deletions, insertions and reversals. Sov. Phys. Doklady 10(8), 707–710 (1966)

  14. Lipman, D.J., Altschul, S.F., Kececioglu, J.D.: A tool for multiple sequence alignment. PNAS 86(12), 4412–4415 (1989)

  15. Lipman, D.J., Pearson, W.R.: Rapid and sensitive protein similarity searches. Science 227(4693), 1435–1441 (1985)

  16. Ichinose, M., Iizuka, M., Kusumi, J., Takefu, M.: Models of compensatory molecular evolution: effects of back mutation. J. Theor. Biol. 323(0), 1–10 (2013)

  17. Marzal, A., Vidal, E.: Computation of normalized edit distance and applications. IEEE T. Pattern Anal. 15(9), 926–932 (1993)

  18. Needleman, S.B., Wunsch, C.D.: A general method applicable to the search for similarities in the amino acid sequence of two proteins. J. Mol. Biol. 48(3), 443–453 (1970)

  19. Notredame, C., Higgins, D.G., Heringa, J.: T-Coffee: a novel method for fast and accurate multiple sequence alignment. J. Mol. Biol. 302(1), 205–217 (2000)

  20. Rosenberg, S.: Evolving responsively: adaptive mutation. Nat. Rev. Genet. 2, 504–515 (2001)

  21. Setubal, J.C., Meidanis, J.: Introduction to Computational Molecular Biology. PWS Pub. (1997)

  22. Smith, T.F., Waterman, M.S.: Identification of common molecular subsequences. J. Mol. Biol. 147(1), 195–197 (1981)

  23. Sun, Y., et al.: ICDAR 2019 competition on large-scale street view text with partial labeling-RRC-LSVT. In: Proceedings of ICDAR, pp. 1557–1562. IEEE (2019)

  24. Warshall, S.: A theorem on Boolean matrices. J. ACM 9(1), 11–12 (1962)

  25. Yujian, L., Bo, L.: A normalized Levenshtein distance metric. IEEE T. Pattern Anal. 29(6), 1091–1095 (2007)

Acknowledgments

The authors thank José Augusto Ramos Soares, Said Sadique Adi, and Vagner Pedrotti for valuable discussions on this topic.

Author information

Correspondence to Eloi Araujo.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Araujo, E., Martinez, F.V., Rozante, L.C., Almeida, N.F. (2023). Extended Pairwise Sequence Alignment. In: Gervasi, O., et al. Computational Science and Its Applications – ICCSA 2023. ICCSA 2023. Lecture Notes in Computer Science, vol. 13956. Springer, Cham. https://doi.org/10.1007/978-3-031-36805-9_15

  • DOI: https://doi.org/10.1007/978-3-031-36805-9_15

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-36804-2

  • Online ISBN: 978-3-031-36805-9

  • eBook Packages: Computer Science, Computer Science (R0)
