-
Chapter
Scalable Text Index Construction
We survey recent advances in scalable text index construction with a focus on practical algorithms in distributed, shared, and external memory.
-
Chapter and Conference Paper
On the Optimisation of the GSACA Suffix Array Construction Algorithm
The suffix array is arguably one of the most important data structures in sequence analysis and consequently there is a multitude of suffix sorting algorithms. However, to this date the GSACA algorithm introduced...
-
Chapter and Conference Paper
On the Computation of Longest Previous Non-overlap** Factors
The f-factorization of a string is similar to the well-known Lempel-Ziv (LZ) factorization, but differs from it in that the factors must be non-overlap**. There are two linear time algorithms that compute the
-
Chapter and Conference Paper
Trickier XBWT Tricks
A trie [11] is one of the best data structures for implementing and searching a dictionary. However, to build the trie structure for larger collections of strings takes up a lot of memory. Since the eXtended Burr...
-
Article
Open AccessErratum to: A representation of a compressed de Bruijn graph for pan-genome analysis that enables search
-
Article
Open AccessA representation of a compressed de Bruijn graph for pan-genome analysis that enables search
Recently, Marcus et al. (Bioinformatics 30:3476–83, 2014) proposed to use a compressed de Bruijn graph to describe the relationship between the genomes of many individuals/strains of the same or closely related s...
-
Chapter and Conference Paper
Parallel Construction of Succinct Representations of Suffix Tree Topologies
A compressed suffix tree usually consists of three components: a compressed suffix array, a compressed \(\mathsf {LCP}\) -array,...
-
Chapter and Conference Paper
Efficient Construction of a Compressed de Bruijn Graph for Pan-Genome Analysis
Recently, Marcus et al. (Bioinformatics 2014) proposed to use a compressed de Bruijn graph of maximal exact matches to describe the relationship between the genomes of many individuals/strains of the same or c...
-
Chapter and Conference Paper
Alphabet-Independent Algorithms for Finding Context-Sensitive Repeats in Linear Time
The identification of repetitive sequences (repeats) is an essential component of genome sequence analysis, and there are dozens of algorithms that search for exact or approximate repeats. The notions of maxim...
-
Chapter and Conference Paper
Space-Efficient Construction of the Burrows-Wheeler Transform
The Burrows-Wheeler transform (BWT), originally invented for data compression, is nowadays also the core of many self-indexes, which can be used to solve many problems in bioinformatics. However, the memory requi...
-
Chapter and Conference Paper
Space-Efficient Computation of Maximal and Supermaximal Repeats in Genome Sequences
The identification of repetitive sequences (repeats) is an essential component of genome sequence analysis, and the notions of maximal and supermaximal repeats capture all exact repeats in a genome in a compac...
-
Chapter and Conference Paper
Computing the Burrows-Wheeler Transform of a String and Its Reverse
The contribution of this paper is twofold. First, we provide new theoretical insights into the relationship between a string and its reverse: If the Burrows-Wheeler transform (BWT) of a string has been compute...
-
Article
Linear Time Algorithms for Generalizations of the Longest Common Substring Problem
In its simplest form, the longest common substring problem is to find a longest substring common to two or multiple strings. Using (generalized) suffix trees, this problem can be solved in linear time and space. ...
-
Chapter and Conference Paper
Lempel-Ziv Factorization Revisited
For 30 years the Lempel-Ziv factorization of a string has played an important role in data compression, and more recently it was used as the basis of linear time algorithms for the detection of all maximal rep...
-
Chapter and Conference Paper
Computing the Longest Common Prefix Array Based on the Burrows-Wheeler Transform
Many sequence analysis tasks can be accomplished with a suffix array, and several of them additionally need the longest common prefix array. In large scale applications, suffix arrays are being replaced with f...
-
Chapter and Conference Paper
Bidirectional Search in a String with Wavelet Trees
Searching for genes encoding microRNAs (miRNAs) is an important task in genome analysis. Because the secondary structure of miRNA (but not the sequence) is highly conserved, the genes encoding it can be determ...
-
Chapter and Conference Paper
CST++
Let A be an array of n elements taken from a totally ordered set. We present a data structure of size 3n + o(n) bits that allows us to answer the following queries on A in constant time, without accessing A: (1) ...
-
Chapter and Conference Paper
Computing Matching Statistics and Maximal Exact Matches on Compressed Full-Text Indexes
Exact string matching is a problem that computer programmers face on a regular basis, and full-text indexes like the suffix tree or the suffix array provide fast string search over large texts. In the last dec...
-
Chapter and Conference Paper
A Compressed Enhanced Suffix Array Supporting Fast String Matching
Index structures like the suffix tree or the suffix array are of utmost importance in stringology, most notably in exact string matching. In the last decade, research on compressed index structures has flouris...
-
Article
Open AccessA fast algorithm for the multiple genome rearrangement problem with weighted reversals and transpositions
Due to recent progress in genome sequencing, more and more data for phylogenetic reconstruction based on rearrangement distances between genomes become available. However, this phylogenetic reconstruction is a...