Haplotype-Aware Sequence Alignment to Pangenome Graphs

  • Conference paper
Research in Computational Molecular Biology (RECOMB 2024)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 14758)

Abstract

Modern pangenome graphs are built using haplotype-resolved genome assemblies. When mapping reads to a pangenome graph, prioritizing alignments that are consistent with the known haplotypes has been shown to improve genotyping accuracy. However, the existing rigorous formulations for sequence-to-graph co-linear chaining and alignment problems do not consider the haplotype paths in a pangenome graph. This often leads to spurious read alignments to paths that are unlikely recombinations of the known haplotypes.

We present novel formulations and algorithms for haplotype-aware sequence alignment to directed acyclic graphs (DAGs). We consider both sequence-to-DAG chaining and sequence-to-DAG alignment problems. Drawing inspiration from the commonly used models for genotype imputation, we assume that a query sequence is an imperfect mosaic of the reference haplotypes. Accordingly, we extend previous chaining and alignment formulations by introducing a recombination penalty for each haplotype switch. First, we solve haplotype-aware sequence-to-DAG alignment in \(O(|Q||E||\mathcal {H}|)\) time, where Q is the query sequence, E is the set of edges, and \(\mathcal {H}\) is the set of haplotypes represented in the graph. To complement this solution, we prove that an algorithm significantly faster than \(O(|Q||E||\mathcal {H}|)\) is impossible under the Strong Exponential Time Hypothesis (SETH). Second, we propose a haplotype-aware chaining algorithm that runs in \(O(|\mathcal {H}|N \log {|\mathcal {H}|N})\) time after graph preprocessing, where N is the number of input anchors. We then establish that a chaining algorithm significantly faster than \(O(|\mathcal {H}|N)\) is impossible under SETH. As a proof of concept, we implemented the chaining algorithm in the Minichain aligner (https://github.com/at-cg/minichain). We demonstrate the advantage of the algorithm by aligning sequences sampled from the human major histocompatibility complex (MHC) to a pangenome graph of 60 MHC haplotypes. The proposed algorithm offers better consistency with ground-truth recombinations than a haplotype-agnostic algorithm.
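The alignment objective described above (edit cost plus a recombination penalty per haplotype switch) can be illustrated with a small dynamic program. This is a minimal sketch under stated assumptions, not the authors' algorithm and not Minichain code: it assumes unit edit costs, single-character vertex labels, vertices numbered in topological order, haplotypes given as vertex paths, and that every vertex lies on at least one haplotype. The names `haplotype_aware_align`, `enter` and `recomb` are hypothetical. A state (i, v, h) holds the cheapest cost of aligning the first i query characters to a path ending at vertex v while following haplotype h; crossing an edge (u, v) either stays on h (free, if h's path uses that edge) or switches haplotypes for a cost of `recomb`. Taking the minimum over haplotypes once per (i, u) keeps the work per (edge, haplotype) pair constant, so the sketch runs in roughly \(O(|Q||E||\mathcal {H}|)\) time.

```python
import math
from typing import List, Sequence, Set, Tuple

INF = math.inf


def haplotype_aware_align(
    query: str,
    labels: str,                          # labels[v]: character on vertex v
    edges: Sequence[Tuple[int, int]],     # DAG edges (u, v); vertices 0..n-1 in topological order
    haplotypes: Sequence[Sequence[int]],  # each haplotype is a path, given as a vertex list
    recomb: float,                        # hypothetical penalty per haplotype switch
) -> float:
    """Minimum of (unit edit cost + recomb * number of switches) over alignments
    of the whole query to a non-empty path of the DAG. Illustrative sketch only."""
    n, k = len(labels), len(haplotypes)
    preds: List[List[int]] = [[] for _ in range(n)]
    for u, v in edges:
        preds[v].append(u)
    hap_edges: List[Set[Tuple[int, int]]] = [set(zip(p, p[1:])) for p in haplotypes]
    hap_verts: List[Set[int]] = [set(p) for p in haplotypes]
    haps_at = [[h for h in range(k) if v in hap_verts[h]] for v in range(n)]

    def enter(row: List[List[float]], best: List[float], u: int, v: int, h: int) -> float:
        # Reach v on haplotype h from predecessor u: switching from the cheapest
        # haplotype at u costs recomb; staying on h is free if h uses edge (u, v).
        cost = best[u] + recomb
        if (u, v) in hap_edges[h]:
            cost = min(cost, row[u][h])
        return cost

    prev: List[List[float]] = []   # DP row i-1: prev[v][h]
    prev_best: List[float] = []    # prev_best[v] = min over h of prev[v][h]
    for i in range(len(query) + 1):
        cur = [[INF] * k for _ in range(n)]
        cur_best = [INF] * n
        for v in range(n):  # topological order: cur[u] is final for every u in preds[v]
            for h in haps_at[v]:
                # Deletion: v's character is skipped (or the path starts at v
                # after i unaligned query characters).
                into = min((enter(cur, cur_best, u, v, h) for u in preds[v]), default=INF)
                cost = 1 + min(i, into)
                if i > 0:
                    # Match or mismatch: query[i-1] is aligned to v.
                    sub = 0 if query[i - 1] == labels[v] else 1
                    into = min((enter(prev, prev_best, u, v, h) for u in preds[v]), default=INF)
                    cost = min(cost, sub + min(i - 1, into))
                    # Insertion: query[i-1] is aligned to no vertex; stay at v on h.
                    cost = min(cost, prev[v][h] + 1)
                cur[v][h] = cost
                cur_best[v] = min(cur_best[v], cost)
        prev, prev_best = cur, cur_best
    return min(prev_best, default=INF)


if __name__ == "__main__":
    # Toy graph with two bubbles: h0 spells ACTAG, h1 spells AGTCG.
    labels = "ACGTACG"
    edges = [(0, 1), (0, 2), (1, 3), (2, 3), (3, 4), (3, 5), (4, 6), (5, 6)]
    haps = [[0, 1, 3, 4, 6], [0, 2, 3, 5, 6]]
    # ACTCG follows h0 through the first bubble and h1 through the second.
    print(haplotype_aware_align("ACTCG", labels, edges, haps, recomb=5))    # 1
    print(haplotype_aware_align("ACTCG", labels, edges, haps, recomb=0.5))  # 0.5
```

In the toy example, the query ACTCG is spelled exactly by a recombinant path. With a large penalty the cheapest option is to stay on h0 and pay one mismatch; with a small penalty the single switch wins. This is the trade-off the recombination penalty is meant to control.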

A longer version of this paper is available on bioRxiv [3].

References

  1. Abouelhoda, M., Ohlebusch, E.: Chaining algorithms for multiple genome comparison. J. Disc. Algor. 3(2–4), 321–341 (2005)

  2. Chandra, G., Jain, C.: Gap-sensitive colinear chaining algorithms for acyclic pangenome graphs. J. Comput. Biol. 30(11), 1182–1197 (2023)

  3. Chandra, G., Jain, C.: Haplotype-aware sequence-to-graph alignment. bioRxiv (2023). https://doi.org/10.1101/2023.11.15.566493

  4. Ebler, J., et al.: Pangenome-based genome inference allows efficient and accurate genotyping across a wide spectrum of variant classes. Nat. Genet. 54(4), 518–525 (2022)

  5. Li, H.: Sample graphs and sequences for testing sequence-to-graph alignment (2022). https://doi.org/10.5281/zenodo.6617246

  6. Li, N., Stephens, M.: Modeling linkage disequilibrium and identifying recombination hotspots using single-nucleotide polymorphism data. Genetics 165(4), 2213–2233 (2003)

  7. Liao, W.W., et al.: A draft human pangenome reference. Nature 617(7960), 312–324 (2023)

  8. Ma, J., Cáceres, M., Salmela, L., Mäkinen, V., Tomescu, A.I.: Chaining for accurate alignment of erroneous long reads to acyclic variation graphs. Bioinformatics 39(8), btad460 (2023)

  9. Mäkinen, V., Tomescu, A.I., Kuosmanen, A., Paavilainen, T., Gagie, T., Chikhi, R.: Sparse dynamic programming on DAGs with small width. ACM Trans. Algor. 15(2), 1–21 (2019)

  10. Navarro, G.: Improved approximate pattern matching on hypertext. Theoret. Comput. Sci. 237(1–2), 455–463 (2000)

  11. Pritt, J., Chen, N.C., Langmead, B.: Forge: prioritizing variants for graph genomes. Genome Biol. 19(1), 1–16 (2018)

  12. Williams, V.V.: Hardness of easy problems: basing hardness on popular conjectures such as the strong exponential time hypothesis (invited talk). In: 10th International Symposium on Parameterized and Exact Computation (IPEC 2015). Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik (2015)

Acknowledgements

This work is supported by funding from the National Supercomputing Mission, India, under DST/NSM/R&D_HPC_Applications, the Science and Engineering Research Board (SERB) under SRG/2021/000044, and the Intel India Research Fellowship.

Author information

Correspondence to Chirag Jain.

Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Chandra, G., Gibney, D., Jain, C. (2024). Haplotype-Aware Sequence Alignment to Pangenome Graphs. In: Ma, J. (eds) Research in Computational Molecular Biology. RECOMB 2024. Lecture Notes in Computer Science, vol 14758. Springer, Cham. https://doi.org/10.1007/978-1-0716-3989-4_36

  • DOI: https://doi.org/10.1007/978-1-0716-3989-4_36

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-1-0716-3988-7

  • Online ISBN: 978-1-0716-3989-4

  • eBook Packages: Computer Science, Computer Science (R0)
