-
Article
ProS: data series progressive k-NN similarity search and classification with probabilistic quality guarantees
Existing systems dealing with the increasing volume of data series cannot guarantee interactive response times, even for fundamental tasks such as similarity search. Therefore, it is necessary to develop analy...
-
Article
Correction to: Unsupervised and scalable subsequence anomaly detection in large data series
A correction to this paper has been published: https://doi.org/10.1007/s00778-021-00678-1
-
Chapter and Conference Paper
PrivSketch: A Private Sketch-Based Frequency Estimation Protocol for Data Streams
Local differential privacy (LDP) has recently become a popular privacy-preserving data collection technique protecting users’ privacy. The main problem of data stream collection under LDP is the poor utility d...
-
Chapter and Conference Paper
Towards Building a Digital Twin of Complex System Using Causal Modelling
Complex systems, such as communication networks, generate thousands of new data points about the system state every minute. Even if faults are rare events, they can easily propagate, which makes it challenging...
-
Article
Unsupervised and scalable subsequence anomaly detection in large data series
Subsequence anomaly (or outlier) detection in long sequences is an important problem with applications in a wide range of domains. However, the approaches that have been proposed so far in the literature have ...
-
Article
Fast data series indexing for in-memory data
Data series similarity search is a core operation for several data series analysis applications across many different domains. However, the state-of-the-art techniques fail to deliver the time performance requ...
-
Article
Using social media for sub-event detection during disasters (Open Access)
Social media platforms have become fundamental tools for sharing information during natural disasters or catastrophic events. This paper presents SEDOM-DD (Sub-Events Detection on sOcial Media During Disasters...
-
Article
BestNeighbor: efficient evaluation of kNN queries on large time series databases
This paper presents parallel solutions (developed based on two state-of-the-art algorithms iSAX and sketch) for evaluating k nearest neighbor queries on large databases of time series, compares them based on v...
-
Chapter
Preliminaries
This chapter formally defines the ER task and its core notions along with the measures used for evaluating the generated results. The following definitions and notations are generic and can capture the variati...
-
Chapter
Generation 2: Also Addressing Volume
The main difference between the 1st and the 2nd ER generation is the challenge of Volume, as the input DSs now comprise (dozens of) millions of profiles. The quality of input data remains the same, involving rela...
-
Chapter
Generation 4: Also Addressing Velocity
This generation differs from the previous ones in two aspects: there are time constraints with respect to the ER running time; and ...
-
Chapter
Resources for Entity Resolution
We now provide an overview of the available resources for developing, evaluating and comparing ER methods.
-
Chapter
Entity Resolution: Past, Present, and Yet-to-Come
The core organizational unit in many applications is the profile, i.e., the collection of information that pertains to a particular real-world entity. Profiles are used to organize data of any structuredness, be ...
-
Chapter
Possible Directions for Future Work
Although ER has been attracting the interest of several research groups for decades, there are still several open research problems and opportunities. The evolving nature of data and system challenges constitu...
-
Book
-
Chapter
Generation 1: Addressing Veracity
The target of the ER methods in this generation is Veracity. They focus on transforming the input profiles into an accurate set of entities that is as close as possible to the corresponding real-world objects. Th...
-
Chapter
Generation 3: Also Addressing Variety
A shift in ER was marked by its 3rd generation, where the input comprises large volumes of semi-structured, unstructured, or highly heterogeneous structured data. This means that ER has to address not only Ver...
-
Chapter
Leveraging External Knowledge
During the last few years, External Knowledge was shown to be an extremely interesting mechanism that improves the accuracy of the final set of discovered entities. Hence, methods using external knowledge lie at ...
-
Article
Scalable data series subsequence matching with ULISSE
Data series similarity search is an important operation, and at the core of several analysis tasks and applications related to data series collections. Despite the fact that data series indexes enable fast sim...
-
Article
Matrix profile goes MAD: variable-length motif and discord discovery in data series
In the last 15 years, data series motif and discord discovery have emerged as two useful and well-used primitives for data series mining, with applications to many domains, including robotics, entomology, seismol...