-
Article
Open AccessPattern Masking for Dictionary Matching: Theory and Practice
Data masking is a common technique for sanitizing sensitive data maintained in database systems which is becoming increasingly important in various application areas, such as in record linkage of personal data...
-
Article
Open AccessEfficient Computation of Sequence Mappability
Sequence mappability is an important task in genome resequencing. In the (k, m)-mappability problem, for a given sequence T of length n, the goal is to compute a table whose ith entry is the number of indices ...
-
Chapter and Conference Paper
Subsequence Covers of Words
We introduce subsequence covers (s-covers, in short), a new type of covers of a word. A word C is an s-cover of a word S if the occurrences of C in S as subsequences cover all the positions in S.
-
Chapter and Conference Paper
On the Hardness of Computing the Edit Distance of Shallow Trees
We consider the edit distance problem on rooted ordered trees parameterized by the trees’ depth. For two trees of size at most n and depth at most d, the state-of-the-art solutions of Zhang and Shasha [SICOMP 198...
-
Article
Internal Dictionary Matching
We introduce data structures answering queries concerning the occurrences of patterns from a given dictionary \(\mathsf {D}\)
-
Chapter and Conference Paper
Fault-Tolerant Distance Labeling for Planar Graphs
In fault-tolerant distance labeling we wish to assign short labels to the vertices of a graph G such that from the labels of any three vertices u, v, f we can infer the u-to-v distance in the graph ...
-
Article
Open AccessDynamic and Internal Longest Common Substring
Given two strings S and T, each of length at most n, the longest common substring (LCS) problem is to find a longest substring common to S and T. This is a classical problem in computer science with an $$\mathca...
-
Chapter and Conference Paper
Efficient Enumeration of Distinct Factors Using Package Representations
We investigate properties and applications of a new compact representation of string factors: families of packages. In a string T, each package \((i,\ell ,k)\) represents the factors of T of length \(\ell \) ...
-
Article
Open AccessEfficient enumeration of non-equivalent squares in partial words with few holes
A word of the form WW for some word \(W\in \varSigma ^*\) ...
-
Chapter and Conference Paper
Circular Pattern Matching with k Mismatches
The k-mismatch problem consists in computing the Hamming distance between a pattern P of length m and every length-m substring of a text T of length n, if this distance is no more than k. In many real-world appli...
-
Chapter and Conference Paper
Weighted Shortest Common Supersequence Problem Revisited
A weighted string, also known as a position weight matrix, is a sequence of probability distributions over some alphabet. We revisit the Weighted Shortest Common Supersequence (WSCS) problem, introduced by Am...
-
Chapter and Conference Paper
On Extended Special Factors of a Word
An extended special factor of a word x is a factor of x whose longest infix can be extended by at least two distinct letters to the left or to the right and still occur in x. It is called extended bispecial if it...
-
Chapter and Conference Paper
Longest Common Prefixes with k-Errors and Applications
Although real-world text datasets, such as DNA sequences, are far from being uniformly random, string searching average-case algorithms perform significantly better than worst-case ones in most applications of...
-
Chapter and Conference Paper
Longest Common Prefixes with k-Mismatches and Applications
We propose a new algorithm for computing the longest prefix of each suffix of a given string of length n over a constant-sized alphabet of size ...
-
Chapter and Conference Paper
Property Suffix Array with Applications
The suffix array is one of the most prevalent data structures for string indexing; it stores the lexicographically sorted list of suffixes of a given string. Its practical advantage compared to the suffix tree...
-
Chapter and Conference Paper
Efficient Computation of Sequence Mappability
Sequence mappability is an important task in genome re-sequencing. In the (k, m)-mappability problem, for a given sequence T of length n, our goal is to compute a table whose ith entry is the number of indices ...
-
Chapter and Conference Paper
How to Answer a Small Batch of RMQs or LCA Queries in Practice
In the Range Minimum Query (RMQ) problem, we are given an array A of n numbers and we are asked to answer queries of the following type: for indices i and j between 0 and
-
Article
Open AccessOn avoided words, absent words, and their application to biological sequence analysis
The deviation of the observed frequency of a word w from its expected frequency in a given sequence x is used to determine whether or not the word is avoided. This concept is particularly useful in DNA linguisti...
-
Chapter and Conference Paper
Longest Common Factor After One Edit Operation
It is well known that the longest common factor (LCF) of two strings over an integer alphabet can be computed in time linear in the total length of the two strings. Our aim here is to present an algorithm that...
-
Chapter and Conference Paper
Efficient Enumeration of Non-Equivalent Squares in Partial Words with Few Holes
A word of the form WW for some word \(W\in \varSigma ^*\) is called a square, where