Search Results
-
Metric multidimensional scaling for large single-cell datasets using neural networks
Metric multidimensional scaling is one of the classical methods for embedding data into low-dimensional Euclidean space. It creates the...
-
Compression algorithm for colored de Bruijn graphs
A colored de Bruijn graph (also called a set of k-mer sets) is a set of k-mers with every k-mer assigned a set of colors. Colored de Bruijn graphs...
-
ESKEMAP: exact sketch-based read mapping
Background: Given a sequencing read, the broad goal of read mapping is to find the location(s) in the reference genome that have a “similar sequence”....
-
NestedBD: Bayesian inference of phylogenetic trees from single-cell copy number profiles under a birth-death model
Copy number aberrations (CNAs) are ubiquitous in many types of cancer. Inferring CNAs from cancer genomic data could help shed light on the...
-
Revisiting the complexity of and algorithms for the graph traversal edit distance and its variants
The graph traversal edit distance (GTED), introduced by Ebrahimpour Boroojeny et al. (2018), is an elegant distance measure defined as the minimum...
-
Fast, parallel, and cache-friendly suffix array construction
Purpose: String indexes such as the suffix array (sa) and the closely related longest common prefix (lcp) array are fundamental objects in...
-
Pfp-fm: an accelerated FM-index
FM-indexes are crucial data structures in DNA alignment, but searching with them usually takes at least one random access per character in the query...
-
Space-efficient computation of k-mer dictionaries for large values of k
Computing k-mer frequencies in a collection of reads is a common procedure in many genomic applications. Several state-of-the-art k-mer counters rely...
-
Infrared: a declarative tree decomposition-powered framework for bioinformatics
Motivation: Many bioinformatics problems can be approached as optimization or controlled sampling tasks, and solved exactly and efficiently using...
-
Median quartet tree search algorithms using optimal subtree prune and regraft
Gene trees can be different from the species tree due to biological processes and inference errors. One way to obtain a species tree is to find one...
-
Suffix sorting via matching statistics
We introduce a new algorithm for constructing the generalized suffix array of a collection of highly similar strings. As a first step, we construct a...
-
Finding maximal exact matches in graphs
Background: We study the problem of finding maximal exact matches (MEMs) between a query string Q and a labeled graph G. MEMs are an important class...
-
SparseRNAfolD: optimized sparse RNA pseudoknot-free folding with dangle consideration
Motivation: Computational RNA secondary structure prediction by free energy minimization is indispensable for analyzing structural RNAs and their...
-
Recombinations, chains and caps: resolving problems with the DCJ-indel model
One of the most fundamental problems in genome rearrangement studies is the (genomic) distance problem. It is typically formulated as finding the...
-
Unifying duplication episode clustering and gene-species mapping inference
We present a novel problem, called MetaEC, which aims to infer gene-species assignments in a collection of partially leaf-labeled gene trees labels...
-
Predicting horizontal gene transfers with perfect transfer networks
Background: Horizontal gene transfer inference approaches are usually based on gene sequences: parametric methods search for patterns that deviate from...
-
Global exact optimisations for chloroplast structural haplotype scaffolding
Background: Scaffolding is an intermediate stage of fragment assembly. It consists in orienting and ordering the contigs obtained by the assembly of...
-
Co-linear chaining on pangenome graphs
Pangenome reference graphs are useful in genomics because they compactly represent the genetic diversity within a species, a capability that linear...
-
Fulgor: a fast and compact k-mer index for large-scale matching and color queries
The problem of sequence identification or matching—determining the subset of reference sequences from a given collection that are likely to contain a...
-
Dollo-CDP: a polynomial-time algorithm for the clade-constrained large Dollo parsimony problem
The last decade of phylogenetics has seen the development of many methods that leverage constraints plus dynamic programming. The goal of this...