1 Introduction

The capacity to sense the surrounding world is the key affordance of the computing devices nowadays found under popular terms such as the Internet of Things (IoT), cyber-physical systems, and ubiquitous computing (ubicomp). The integration of computing and sensing was essential for achieving such early milestones as the Moon landing and the first humanoid robot in the late 1960s. Yet, the moment the first iPhone hit the shelves in 2007 marked the start of a new era of sensor-computing integration, one in which compact mobile computing devices equipped with an array of sensors would soon outnumber people on this planet. The ever-increasing range of sensors available on mobile devices, nowadays including multiple cameras, microphones, accelerometers, gyroscopes, location, light, and temperature sensors, and wireless gesture recognition sensors, to name a few, has enabled revolutionary new mobile services. Furthermore, in parallel to the rise of smartphone and wearable sensing, advances in embedded computing have propelled the use of sensors in systems ranging from unmanned aerial vehicles (UAVs, i.e. “drones”) and factory robots to IoT devices and self-driving cars. Consequently, the spectrum of applications relying on integrated sensing and computing has already expanded to cover anything from wildfire monitoring to vacuuming a home, and with the increase in the number of deployed devices showing no signs of waning, we can safely assume that the true potential of sensing-computing integration is yet to be observed.

Widening the range and sophistication of sensing-based applications calls for increased amounts of data to be collected, as well as for complex computation to be supported by sensing-computing systems. For instance, high-level inferences from sensor data are possible, but only if enough data is funneled through complex data processing pipelines that include data filtering, extracting features from raw sensor data, and machine learning modeling. Recent advances in the area of deep learning have pushed the complexity of the requested computation and the amount of data needed even further. Thus, models containing millions of parameters can process high-resolution camera images and detect objects present in them. Similarly, high-frequency samples taken from on-body accelerometers can, with the help of long short-term memory (LSTM) models, infer a wearer’s physical activity.

Unfortunately, resource-constrained sensing devices, such as the microcontrollers, wearables, and IoT devices predominant in today’s ubiquitous computing deployments, often cannot cope with the sampling and processing burden of modern machine learning on sensor data. Equipped with multicore CPUs and GPUs and relatively large storage space, modern smartphones can run certain deep learning models. However, even these high-end devices support only sporadically used and carefully optimized models processing sensor data streams of relatively modest sampling rates (Wu et al. 2019a). Even disregarding the processing burden, sensor sampling itself is sometimes the most energy-hungry aspect of ubiquitous computing (Pramanik et al. 2019). With battery energy being the most precious resource in mobile computing, and battery advances heavily lagging behind the improvements in storage and processing components (Pramanik et al. 2019), the problem is unlikely to resolve itself in the coming generations of ubiquitous computing systems.

Supporting advanced inference applications while reducing the sampling and processing burden appears unsolvable at first glance. According to the Nyquist theorem, (low-pass) signals can be reliably reconstructed only if sensor sampling rates are at least twice as high as the highest frequency present in such signals. Real-world phenomena, however, tend to be fast-changing: objects can be recognized only in images of sufficient resolution, wireless radars can detect small movements only if the signal is sampled millions of times per second, and so on.

In the 2000s, a series of papers by Candès et al. (2006), Donoho (2006), Candes and Romberg (2006), and Candes et al. (2006) investigated the properties of signals that can be successfully reconstructed even if sampled at rates lower than prescribed by the Nyquist theorem. Two conditions need to be satisfied for a signal to be fully preserved with only about \(K \log (N/K)\) samples taken from the original K-sparse N-dimensional signal: sparsity, i.e. the property that in a certain projection most of the signal’s components are zero, and incoherence, i.e. low correlation between the acquisition domain and the sparsity domain. Compressive sensing (CS) involves a drastically reduced sampling rate, administered when the above conditions are fulfilled, and subsequent signal reconstruction via signal processing, often reduced to finding solutions to underdetermined linear systems. In the last two decades, CS has been successfully demonstrated in image processing, wireless ranging, and numerous other domains.

The benefits of reduced sampling rates do not, however, come for free, as CS remains challenging to implement. First, not all reduced-rate sampling is equal: compression rates interplay with the properties of the input to determine whether the signal can be successfully reconstructed from limited samples. Furthermore, while early implementations focused on random sampling, recent research advances demonstrate the utility of carefully crafted reduced sampling strategies (Wang et al. 2019). Second, reconstructing the signal from sparse measurements in theory requires solving an NP-hard problem of finding the non-zero signal components. In practice, the problem is solved through iterative algorithms that are nevertheless computationally demanding. Finally, high-level inference, not signal reconstruction, is often the key aim of sensing; it is thus essential to construct a full machine learning pipeline that natively supports CS.

1.1 Towards deep learning-supported compressive sensing

Interestingly, the above challenges share a joint characteristic: for a specific use case, a suitable sampling strategy, a reliable reconstruction algorithm, and a highly accurate inference pipeline can all be learned from the collected sensor data and data labels. Machine learning methods, therefore, naturally augment CS. The inclusion of GPUs and TPUs, together with programming support for deep learning (e.g. TensorFlow Lite (Li 2020)), has made DL pervasively possible, even on embedded and mobile computers.

Over the past decade, compressive sensing evolved beyond theoretical studies, with its initial practicality predominantly limited by the time complexity of the reconstruction algorithms. Deep learning brought tremendous improvements on that front, enabling real-time reconstruction in certain applications. ReconNet (Kulkarni et al. 2016), for example, is up to 2700 times faster than the conventional iterative CS algorithm D-AMP (Metzler et al. 2016) and can reconstruct a \(256 \times 256\) image in only about 0.02 seconds at any given measurement rate. But the true benefits of using deep learning for compressive sensing can be observed in the quality of the reconstruction, where DL-based approaches surpass conventional algorithms thanks to deep learning’s ability to sidestep the sparsity assumptions and to capture and exploit relevant features in the data.

Especially promising is the revolutionary potential of CS–DL integration in the area of ubiquitous computing, where devices are characterized by wide heterogeneity and limited computational and battery resources. Besides the general benefits of accelerating signal reconstruction, fine-tuning the sampling matrix, and improving the high-level inference, in the ubiquitous computing domain CS–DL integration can reduce energy, storage, and processing requirements. As a result, previously prohibitively demanding continuous sensing and inference may finally be realized in certain domains. Furthermore, a CS–DL pipeline supports graceful degradation of end-result quality (Machidon and Pejović 2022): through reduced CS sampling and reduced-accuracy DL inference we can, in a controlled manner, trade result quality for resource usage. This allows seamless adaptation of the sensing-inference pipeline, so that complex applications can run on low-resource devices, albeit with limited accuracy. Finally, mobile devices operate in dynamic environments; with the environment vary both the signal properties (e.g. its sparsity) and a user’s requirements with respect to result quality.

Fig. 1: CS approaches: a conventional CS; b DL for CS reconstruction; c DL for CS sampling and reconstruction; d DL for CS direct inference

Figure 1 depicts possible ways deep learning and compressive sensing can interplay. A common CS pipeline (a) consists of reduced-frequency sampling, followed by signal reconstruction, from which high-level inferences are made, if needed. Iterative signal reconstruction algorithms, in particular, tend to represent the weak point of the pipeline due to their temporal requirements. Yet, with sufficient CS-sampled and original signal data available, a rather fast-to-query DL reconstruction model can be built. Using DL for signal reconstruction (b), by either mimicking the iterative CS algorithm (Sect. 3.1) or not (Sect. 3.2), has been successfully demonstrated in numerous domains (Kulkarni et al. 2016; Iliadis et al. 2016; Wang et al. 2016; Schlemper et al. 2017; Han et al. 2018b; Kim et al. 2020b). The sampling matrix, too, can be adapted to the problem at hand thanks to DL (c); often an encoder-like structure is trained to guide the sampling in the most efficient manner (Sect. 3.2). Finally, as the reconstructed signal is usually used as a basis for high-level inference, DL allows us to short-circuit the expensive reconstruction step and train a network that provides high-level inferences directly from the CS-sampled data (d) (Sect. 4). The performance of such solutions has not only matched, but in some cases significantly exceeded, that of the standard reconstruction approaches, as additional signal structure can be captured by the DL models (Polania and Barner 2017; Ma 2017; Grover and Ermon 2019).

1.2 Survey rationale, research methodology, and survey organization

The above-identified natural links between efficient sampling embodied in CS and powerful learning enabled by DL have recently been recognized by the research community. The tremendous research interest they have spurred is evident in a steady increase in the number of scientific papers published on the topic each year from 2015 to 2020 (see Fig. 2). The exploration is far from theoretical, with a range of application fields, including magnetic resonance imaging (MRI), ultra-wideband (UWB) radar ranging, human activity recognition, and numerous other domains, benefiting from the CS–DL integration.

Fig. 2: Number of scientific papers on the topic of CS–DL published between 2015 and 2020 (data from Google Scholar, using the search terms “deep learning” and “compressed sensing”)

The building blocks of CS–DL integration, i.e. compressive sensing and deep learning, have each been thoroughly addressed already. Compressive sensing is the main subject of several monographs (e.g. Eldar and Kutyniok 2012) that introduce the topic from a historical perspective, present the key theoretical postulates, and discuss open challenges in the area of signal sampling and processing. Yet, these almost exclusively focus on mathematical issues and remain implementation-platform oblivious. The volume by Khosravy et al. (2020) investigates the use of CS in healthcare and considers applications, such as electrocardiogram (ECG) and electroencephalogram (EEG) sensing, that are, with the expansion of wearable computing capabilities, highly relevant for the ubicomp domain. Still, the book focuses on the sensing part and does not discuss potential integration with deep learning.

Focused survey papers cover compressive sensing applications in different domains, for instance wireless sensor networks (WSNs) (Wimalajeewa and Varshney 2017), IoT (Djelouat et al. 2018), and EEG processing (Gurve et al. 2020), to name a few. Our survey is orthogonal to these, as we do not focus on a particular domain, but rather merge contributions made in different fields on the common ground of using both DL techniques and CS concepts. A summary of the survey articles most related to this paper is given in Table 1; it clearly demonstrates that no published survey deals with systematically reviewing deep learning for compressive sensing. We opt for this approach with the intention to inform future research in any domain, providing researchers and practitioners with a toolbox of CS–DL implementations that may be transplanted to any field. Finally, to the best of our knowledge, ours is the first paper that provides a detailed consideration of the practical issues of CS–DL integration on ubiquitous computing devices. Very few papers (even outside surveys) deal with practical implementations of CS on ubiquitous devices. It is our hope that the guidelines on efficient implementation presented in this survey will serve as a foundation for practical realizations of deep learning-supported ubiquitous compressive sensing systems of the future.

Table 1 Related survey/review papers from the area of compressive sensing

An extremely popular research area, deep learning is not short of textbooks, surveys, and tutorials (e.g. Aggarwal 2018). From the range of DL survey papers, we find Cheng et al. (2017) and Choudhary et al. (2020) particularly relevant. Both surveys focus on techniques that prune, compress, quantize, and in other manners reshape powerful DL models so that they can be run on resource-constrained devices. The expansion of context awareness and artificial intelligence over a wide range of devices and applications is our guiding vision; reduced sampling rates afforded by CS, together with powerful yet computationally light inference enabled by intelligently implemented DL, pave the path towards it.

In this survey we explore deep learning-supported compressive sensing, an area that, despite its rapidly growing popularity (see Fig. 2), has not been systematically studied before. Furthermore, we explore it with a particular accent on its real-world applicability within the ubicomp domain. Thus, the objectives of our work are threefold:

  • Present CS fundamentals, DL opportunities, and ubiquitous computing constraints to previously disparate research communities, with a goal of opening discussion and collaboration across the discipline borders;

  • Examine a range of research efforts and consolidate DL-based CS advances. In particular, the paper identifies signal sampling, reconstruction, and high-level inference as a major categorization of the reviewed work;

  • Recognize major trends in the CS–DL research space and derive guidelines for the future evolution of CS–DL within the ubicomp domain.

Our methodological approach focuses on the identification and examination of the most relevant and high-impact papers related to the topic of CS–DL, published in top scientific journals and at renowned international conferences. More specifically, for this survey we:

  • Searched Google Scholar with terms including: “deep learning”, “compressive sensing”, “compressed sensing”, “compressed sampling”, “sparse sampling”, and “ubiquitous computing”, focusing predominantly on well-cited articles (i.e. \(>20\) citations per year since publication) and articles published in 2020 or 2021;

  • For journal articles we focused on those published in journals indexed by the Web of Science; for conference articles, we retained those published at conferences supported by a major professional society;

  • We manually searched the proceedings of ubiquitous systems conferences (e.g. ACM MobiSys, ACM UbiComp, ACM SenSys, Asilomar) and machine learning conferences (e.g. NeurIPS, CVPR, ICML, ICLR) for articles related to compressive sensing implementations;

  • We identified a small number of very relevant entries on arXiv and opted to include them in the survey, so that the rapid advances in the CS–DL area are not overlooked. Nevertheless, we caution the reader that these entries might not have been peer-reviewed yet.

Fig. 3: A schematic illustration of the survey’s organization

Organization-wise (Fig. 3), this paper provides both preliminary material and an analysis of recent research trends. With respect to the former, Sect. 2 presents a crash-course overview of compressive sensing, highlighting the necessary conditions for successful sampling as well as the main signal recovery approaches, with an emphasis on algorithm efficiency. Section 3 discusses CS–DL efforts in the area of CS signal reconstruction. The advantages and limitations of different DL flavors with regard to the CS reconstruction challenges are exposed and analyzed, together with the most relevant publications in each case. Table 3 is specifically aimed at practitioners in need of a quick reference. Machine learning, and deep learning in particular, enables high-level inferences directly from the CS-sampled signal without intermediate signal reconstruction; these so-called reconstruction-free approaches are presented in Sect. 4. Unique to this survey is also a critical examination of the constraints that CS–DL implementations face once deployed in real-world ubiquitous computing environments. These are discussed in Sect. 5, together with key lessons learned from different domains and potential directions for future CS–DL research in ubiquitous computing. Finally, a coherent frame for our survey is set by the introduction (Sect. 1) and the conclusion (Sect. 6).

2 Compressive sensing primer

In the first part of this section we aim to bring the area of compressive sensing closer to ubiquitous computing researchers and practitioners. We focus on the bare essentials and the points relevant for real-world implementation of CS, and direct the interested reader to more in-depth presentations of the subject, such as Eldar and Kutyniok (2012). Throughout the section we identify possibilities for deep learning (DL) within the CS domain.

2.1 Theoretical basis

Classical signal processing is based on the notion that signals can be modeled as vectors in a vector space. The Nyquist sampling rate requirement was derived under the assumption that signals may exist anywhere within the given vector space, and requires the sampling frequency to be at least twice as high as the highest frequency component present in the low-pass signal. In reality, however, signals exhibit structure that constrains them to only a subset of possible vectors in a certain geometry, i.e. many real-world signals are naturally sparse in a certain basis. Furthermore, even if not truly sparse, or if subject to noise, many signals are compressible: a limited number of the strongest signal components tends to uniquely describe the signal.

The above observations represent the intuition behind compressive sensing (CS). The idea of joint sensing and compression was theoretically developed in Candès et al. (2006) and Donoho (2006) by Emmanuel Candès, Justin Romberg, Terence Tao, and David Donoho, who also formalized the conditions needed for efficient reconstruction of a signal from a number of samples significantly smaller than that prescribed by the Nyquist sampling criterion.

The main idea behind CS is that, given a K-sparse signal vector \(x \in {\mathcal {R}}^N\) (i.e. a signal that has only K non-zero components), an accurate reconstruction of x can be obtained from the undersampled measurements taken through a linear sampling process:

$$\begin{aligned} y = Ax \in {\mathcal {R}}^M \end{aligned}$$

where the \(M\times N\) matrix A, called the sensing matrix (also the projection matrix), is used for sampling the signal. Since \(M<N\), this linear system is under-determined, permitting an infinite number of solutions. However, according to CS theory, due to the sparsity of x the exact reconstruction is possible by finding the sparsest signal among all those that produce the measurement y, through a norm minimization approach:

$$\begin{aligned} \begin{aligned}&{\text {minimize}}&\Vert x\Vert _0\ \\&\text {subject to}&Ax=y \end{aligned} \end{aligned}$$

where \(\Vert \cdot \Vert _0\) is the \(l_0\)-norm and denotes the number of non-zero components in x, i.e. the sparsity of x.

However, this is generally an NP-hard problem. An alternative is to minimize the \(l_1\)-norm, i.e. the sum of the absolute values of the vector’s components:

$$\begin{aligned} \begin{aligned}&{\text {minimize}}&\Vert x\Vert _1\ \\&\text {subject to}&Ax=y \end{aligned} \end{aligned}$$

Since the \(l_1\)-norm minimization problem can be solved through tractable iterative algorithms, if the solutions to the \(l_0\)-norm and \(l_1\)-norm conditioned systems were the same, the CS-sensed signal could be perfectly reconstructed from M measurements (where M is roughly logarithmic in the data dimensionality, \(M = O(K\log (N/K))\)) in a reasonable amount of time.
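To make the \(l_1\) recovery concrete, the snippet below (our sketch, not code from any surveyed work) recasts the \(l_1\) problem as a linear program and solves it with SciPy; the dimensions N=128, M=40, and K=5 are illustrative assumptions.

```python
# Minimal basis-pursuit sketch: minimize ||x||_1 subject to Ax = y,
# recast as a linear program over z = [x, t] with -t <= x <= t.
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)
N, M, K = 128, 40, 5                            # ambient dim, measurements, sparsity

x_true = np.zeros(N)
x_true[rng.choice(N, K, replace=False)] = rng.standard_normal(K)

A = rng.standard_normal((M, N)) / np.sqrt(M)    # Gaussian sensing matrix
y = A @ x_true                                  # compressed measurements

c = np.concatenate([np.zeros(N), np.ones(N)])   # minimize sum(t)
A_eq = np.hstack([A, np.zeros((M, N))])         # Ax = y
A_ub = np.vstack([np.hstack([np.eye(N), -np.eye(N)]),     #  x - t <= 0
                  np.hstack([-np.eye(N), -np.eye(N)])])   # -x - t <= 0
b_ub = np.zeros(2 * N)
bounds = [(None, None)] * N + [(0, None)] * N   # x free, t >= 0

res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=y,
              bounds=bounds, method="highs")
x_hat = res.x[:N]
print("recovery error:", np.linalg.norm(x_hat - x_true))
```

With these sizes, M comfortably exceeds \(K\log (N/K)\), so the recovery error should be near zero; shrinking M below that bound makes the recovery fail, illustrating the sample-complexity claim above.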

Candès and Tao show that in certain situations the solutions to both problems are indeed equivalent. The conditions for the above to hold are that the signal’s sparsity K is sufficiently high and that the matrix A satisfies certain properties. One of these properties is the so-called Null Space Property (NSP), a necessary and sufficient condition for guaranteeing the recovery, requiring that no null-space vector of the sensing matrix concentrates its energy on any small set of entries. A stronger condition on the sensing matrix is the Restricted Isometry Property (RIP), which states that A must behave like an almost orthonormal system, but only for sparse input vectors. More formally, matrix A satisfies the K-RIP with restricted isometry constant \(\delta _{k}\) if for every K-sparse vector x:

$$\begin{aligned} (1-\delta _{k})\Vert x\Vert _2^2\le \Vert Ax\Vert _2^2\le (1+\delta _{k})\Vert x\Vert _2^2 \end{aligned}$$

where \(\Vert \cdot \Vert _2\) denotes the \(l_2\)-norm.

A uniquely optimal solution for the \(l_0\)-norm and \(l_1\)-norm conditioned signal reconstruction systems exists if \(\delta _{2k}+\delta _{3k}<1\). Intuitively, sampling matrices satisfying this condition preserve signal size and therefore do not distort the measured signal, so the reconstruction is accurate.

In practice, however, assessing the RIP is computationally difficult. Another related condition, easier to check, is incoherence (or low coherence), meaning that the rows of the sampling matrix should be almost orthogonal to the columns of the matrix representing the basis in which the signal is sparse (often the Fourier basis). Additional mathematical properties that the sensing matrix should satisfy to ensure the stability of the reconstruction were introduced in Donoho (2006). From the perspective of real-world applications, the sensing matrix should ideally fulfill constraints such as optimal reconstruction performance (high accuracy), optimal sensing (minimum number of measurements needed), low complexity, fast computation, and easy and efficient hardware implementation. Random sensing matrices such as Gaussian or Bernoulli were shown to satisfy the RIP; however, their unstructured nature raises difficulties for hardware implementation and memory storage, and processing can be slow since no accelerated matrix multiplication is available. Structured matrices, such as circulant or Toeplitz matrices, on the other hand, follow a given structure, which subsequently reduces the randomness, memory storage, processing time, and energy consumption. For applications running in resource-constrained environments, such as wearable wireless body sensors, this is of great importance.
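As a quick illustration of the coherence check (our sketch, with illustrative dimensions), the mutual coherence between a random Gaussian sensing matrix and the DCT sparsity basis can be estimated numerically; values near 1 (the theoretical minimum) indicate a favorable sampling domain, while \(\sqrt{N}\) is the worst case.

```python
# Estimate mu(A, Psi) = sqrt(N) * max |<a_i, psi_j>| for a Gaussian
# sensing matrix A and the orthonormal DCT basis Psi.
import numpy as np
from scipy.fft import dct

rng = np.random.default_rng(0)
N, M = 256, 64

Psi = dct(np.eye(N), norm="ortho")               # orthonormal DCT basis (columns)
A = rng.standard_normal((M, N))
A /= np.linalg.norm(A, axis=1, keepdims=True)    # unit-norm sensing rows

mu = np.sqrt(N) * np.abs(A @ Psi).max()
print(f"coherence: {mu:.2f} (range: 1 .. {np.sqrt(N):.1f})")
```

For Gaussian matrices the result lands near the low end of the range, which is exactly why random matrices serve as the default choice despite their hardware drawbacks.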

Finally, real-world data often hide structures, beyond sparsity, that can be exploited. By learning these regularities from the data through sensing matrix design, the salient information in the data can be preserved, leading to better reconstruction quality. In addition, most existing recovery algorithms rely on prior knowledge of the degree of sparsity of the signal to optimally tune their parameters. Difficulties arise especially when the signal is very large or exhibits great variation in terms of sparsity. In such cases conventional CS approaches cannot perform optimal reconstruction, but a data-driven approach can learn important signal features and design the signal sampling to work optimally even for varying sparsity levels.

2.2 Signal reconstruction approaches

The effective and efficient recovery of the original signal from the compressed one is crucial for CS to become a practical tool. Approaches to signal reconstruction from the undersampled measurements can be roughly grouped into convex optimization, greedy, and non-convex minimization algorithms.

Convex optimization algorithms solve a convex optimization problem, e.g. the \(l_1\)-norm minimization problem above, to obtain the reconstruction. The Iterative Shrinkage/Thresholding Algorithm (ISTA) (Daubechies et al. 2004) and the Alternating Direction Method of Multipliers (ADMM) (Boyd et al. 2011) are two examples of such algorithms. One advantage of this class is the small number of measurements required for achieving an exact reconstruction; their computational complexity, however, is high.
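For concreteness, a minimal ISTA sketch follows (our illustration with assumed hyperparameters, not code from the cited papers); each iteration takes a gradient step on the data-fidelity term and then applies soft thresholding.

```python
# ISTA sketch for min_x 0.5*||y - Ax||_2^2 + lam*||x||_1.
import numpy as np

def ista(A, y, lam=0.05, n_iter=500):
    L = np.linalg.norm(A, 2) ** 2               # Lipschitz constant of grad (= sigma_max(A)^2)
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        z = x + A.T @ (y - A @ x) / L           # gradient step on the data term
        x = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)  # soft thresholding
    return x
```

The fixed step size 1/L and the hundreds of iterations shown here are precisely the cost that the unrolled networks of Sect. 3.1 cut down to a handful of learned layers.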

The greedy approach for CS involves a step-by-step method for finding the support set of the sparse signal by iteratively adding nonzero components, and reconstructing the signal with constrained least-squares estimation. These algorithms are characterized by lower implementation cost and improved running time. On the downside, their performance is highly constrained by the signal’s sparsity level, and their theoretical performance guarantees remain weak in general.
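A compact sketch of this greedy strategy, in the style of Orthogonal Matching Pursuit (our illustration; note that it assumes the sparsity K is known, echoing the limitation discussed above):

```python
# OMP sketch: greedily grow the support, refitting by least squares.
import numpy as np

def omp(A, y, K):
    residual, support = y.copy(), []
    for _ in range(K):
        # pick the column most correlated with the current residual
        j = int(np.argmax(np.abs(A.T @ residual)))
        support.append(j)
        # least-squares fit restricted to the current support
        coef, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
        residual = y - A[:, support] @ coef
    x = np.zeros(A.shape[1])
    x[support] = coef
    return x
```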

Non-convex recovery algorithms replace the \(l_1\)-norm minimization objective with other, non-convex surrogate functions (Zhou and Yu 2019; Fan et al. 2019). These methods can show better recovery probability and require fewer measurements than convex optimization algorithms, but are more challenging to solve because of their non-convexity; furthermore, their convergence is not always guaranteed.

A joint property of all the above reconstruction algorithms is the high computational cost dictated by the iterative calculations they rely on. For CS to be incorporated in real-world ubiquitous computing applications, fast and efficient reconstruction algorithms need to be developed. Deep learning emerged as an unexpected candidate: while DL usually requires substantial computing resources and significant memory space for hundreds of thousands of network parameters, the most burdensome computation is performed during the training phase, and the inference time remains lower than the time needed to run conventional iterative reconstruction algorithms.

2.3 From samples to inferences

In the last 15 years, compressive sensing transitioned from a theoretical concept to a practical tool. One of the first demonstrations of CS was the so-called one-pixel camera, in which a digital micromirror array is used to optically calculate linear projections of the scene onto pseudorandom binary patterns. A single detection element, i.e. a single “pixel”, is then evaluated a sufficient number of times, and from these measurements the original image can be reconstructed. This early success set the trajectory of practical CS, which is nowadays used for a range of image analysis tasks. Thus, CS is used for fMRI image sampling and reconstruction (Chiew et al. 2018; Li 2020), ultrasound images (Kruizinga 2017; Kim et al. 2020a), remote sensing images (Zhao et al. 2020a; Wang 2017), and other image-related domains. WSN data sub-sampling and recovery represents another significant area for CS.

Rather than handcrafting the parameters and keeping them fixed for the whole network, unrolled methods are able to extend the representation capacity of iterative algorithms, and are thus more specifically tailored towards target applications.
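As a minimal sketch of the unrolling idea (ours, not a specific published architecture), the PyTorch snippet below unrolls ISTA into a fixed number of layers in the LISTA style: the weight matrices W1 and W2 and the per-layer thresholds theta, which a classical ISTA derives from A and keeps fixed, are here all learned from data. The layer count and initial threshold are illustrative assumptions.

```python
# LISTA-style unrolled network: each "layer" mirrors one ISTA iteration.
import torch
import torch.nn as nn

def soft(z, theta):
    # soft thresholding with a learnable (differentiable) threshold
    return torch.sign(z) * torch.relu(z.abs() - theta)

class LISTA(nn.Module):
    def __init__(self, m, n, n_layers=8):
        super().__init__()
        self.W1 = nn.Linear(m, n, bias=False)   # learned stand-in for A^T / L
        self.W2 = nn.Linear(n, n, bias=False)   # learned stand-in for I - A^T A / L
        self.theta = nn.Parameter(torch.full((n_layers,), 0.05))

    def forward(self, y):                        # y: (batch, m) measurements
        x = soft(self.W1(y), self.theta[0])
        for k in range(1, len(self.theta)):
            x = soft(self.W1(y) + self.W2(x), self.theta[k])
        return x                                 # (batch, n) sparse estimate
```

Trained end-to-end, e.g. with a mean-squared error loss between forward(y) and ground-truth signals generated with a fixed sensing matrix, a handful of such layers can replace hundreds of ISTA iterations.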

Finally, while from the efficiency and performance point of view the unrolled approach often remains superior to both iterative algorithms and neural networks not based on algorithm unrolling, from other perspectives, such as parameter dimensionality and generalization, unrolled approaches occupy an intermediate space between iterative algorithms and more general DL-based solutions (Monga et al. 2021). A summary of the most relevant methods in this category is provided in Table 2.

Table 2 A summary of CS methods employing algorithm unrolling

3.2 Deep learning for direct reconstruction

Harnessing neural networks in unrolled iterative approaches provides a certain level of intuition; yet, such intuition is not necessary for the optimization to work well. In the rest of this section we describe CS reconstruction (and sampling) approaches that were not inspired by the traditional optimization algorithms. These approaches, together with the specific affordances they bring, are summarized in Table 3 (due to space constraints, not all approaches referred to in the table are described in the main text). Free from any constraints, most of the research discussed here optimizes both signal reconstruction and signal acquisition, i.e. the sampling matrix, essentially reflecting both approach (b) and approach (c) in Fig. 1. This brings additional benefits: many real-world signals, while indeed sparse when projected onto a certain space, need not be sparse in the fixed domain we observe them in; learning the sampling matrix from the data often solves this issue.

3.2.1 Autoencoder-based approaches

A bird’s-eye perspective on CS reveals that, with its sampling (i.e. dimensionality reduction) and reconstruction pipelines, the method closely resembles a DL autoencoder (AE). It is thus not surprising that some of the early forays of DL into CS utilize AEs: the encoding process of the AE replaces the conventional compressed sampling process, while the decoding process replaces the iterative signal reconstruction. In such an arrangement the AE brings two immediate benefits. First, based on the training data, it adapts the CS sampling matrix, which need not be a random matrix any more. Second, it greatly speeds up the reconstruction process. Once trained, the AE performs signal reconstruction through a relatively modest number of DL layers, making it an attractive alternative to iterative reconstruction, even on ubiquitous computing devices, especially those with embedded GPUs.

A pioneering approach in the area of AE-based CS is presented in Mousavi et al. (2015), where Mousavi et al. propose the use of a stacked denoising autoencoder (SDAE). The SDAE is an extension of the standard AE topology, consisting of several DAE layers where the output of each hidden layer is connected to the input of the successive hidden layer. A DAE is a type of AE that corrupts its inputs with noise in order to learn robust (denoised) representations; its main advantage is robustness to noise, i.e. the ability to reconstruct the original signal from noise-corrupted input. One challenge of using the SDAE is that its network consists of numerous fully-connected layers. Thus, as the signal size grows, so does the network, imposing a large computational complexity on the training algorithm and risking potential overfitting. The solution proposed in Mousavi et al. (2015), and adopted by similar approaches (Kulkarni et al. 2016; Liu et al. 2019; Pei et al. 2020), is to divide the signal into smaller blocks and then sense/reconstruct each block separately, as in the sketch below. In terms of reconstruction time, simulation results show that this approach beats the other methods, whereas the quality of the reconstruction does not necessarily overshadow that of other state-of-the-art recovery algorithms.
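To make the block-based AE idea concrete, here is a schematic sketch (our illustration, loosely in the spirit of the approaches above; the 16x16 block, the 25% measurement rate, and the layer sizes are assumptions): the encoder is a single linear layer acting as a learned sensing matrix for one image block, and the decoder replaces iterative reconstruction.

```python
# Block-based CS autoencoder: learned linear sampling + fast DL decoding.
import torch
import torch.nn as nn

block, mr = 16 * 16, 0.25                  # block size (flattened), measurement rate
m = int(block * mr)                        # measurements per block

class BlockCSAutoencoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.encode = nn.Linear(block, m, bias=False)   # learned sensing matrix
        self.decode = nn.Sequential(                    # replaces iterative recovery
            nn.Linear(m, 2 * block), nn.ReLU(),
            nn.Linear(2 * block, block),
        )

    def forward(self, x):                  # x: (batch, 256) flattened blocks
        return self.decode(self.encode(x))

# Training minimizes e.g. nn.MSELoss() between forward(x) and x; at
# deployment, the encoder weights serve as the (no longer random) sampling matrix.
```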

The amount of memory and processing power required by DL may prohibit CS–DL on ubiquitous computing devices; reducing the number of DL parameters necessary for CS is therefore highly desired. A sparse autoencoder compressed sensing (SAECS) approach is proposed in Han et al. (2018b). The sparse autoencoder’s loss function penalizes activations within a layer, resulting in fewer non-zero parameters and thus a “lighter” encoder that is also less likely to overfit. Furthermore, combining SDAE and SAECS, a stacked sparse denoising autoencoder for CS (SSDAE CS) is proposed in Zhang et al. (2019). The proposed model consists of an encoder sub-network that performs nonlinear measurements (unlike the conventional CS approach, which involves linear measurements) on the original signal, and a decoder sub-network that reconstructs the original de-noised signals by minimizing the reconstruction error between the input and the output.

Signals in the AE-compressed space need not, by default, exhibit any regularities; the AE merely ensures that the encoding-decoding process is efficient. The variational autoencoder (VAE), in contrast, is trained to ensure that the latent space exhibits suitable properties enabling generative decoder behavior, which can improve CS recovery, as shown in Bora et al. (2017). A novel uncertainty autoencoder (UAE) structure is proposed by Grover and Ermon (2019). The UAE is another AE-based framework for unsupervised representation learning, where the compressed measurements can be interpreted as latent representations. Unlike the VAE, the UAE does not explicitly impose any regularization over the latent space to follow a prior distribution, but instead optimizes the mutual information between the input space and the latent representations, being explicitly designed to preserve as much information as possible. While not discussed in Grover and Ermon (2019), it would be interesting to examine whether the UAE enables faster training with less data, a property that could eventually enable privacy-preserving on-device training in ubicomp environments.

The versatility of the autoencoder approach to CS has led to its adaptation to a range of domains. Such adaptation is evident in Adler et al. (2019). To mitigate the noisy distortions caused by unvoiced speech, the authors build an RNN dictionary learning module that learns structured dictionaries for both voiced and unvoiced speech; the learned codebooks further improve the overall reconstruction performance for compressed speech signals. RNNs are also appropriate for processing sequences of images, particularly if the images are temporally correlated. An RNN model adapted to the image domain can be found in Qin (2018), where a convolutional recurrent neural network (CRNN) improves reconstruction accuracy and speed by exploiting the temporal dependencies in CS cardiac MRI data. By enabling the propagation of contextual information across time frames, the CRNN architecture makes the reconstruction process more dynamic, generating less redundant representations.

LSTMs have similarly been adapted to particular CS domains. In Zhang et al. (2021), an LSTM is used to extract temporal features from compressed measurements of ECG signals, initially reconstructed with a CNN, to further improve the quality of the reconstruction. The field of distributed compressed sensing, or multiple measurement vectors (MMV), was targeted in Palangi et al. (2016a) and Palangi et al. (2016b), where an LSTM and a bidirectional long short-term memory (BLSTM) network, respectively, were proposed for reconstruction. The LSTM and BLSTM models are good candidates for reconstructing multiple jointly sparse vectors because of their ability to model difficult sequences and to capture dependencies (using both past and future information, in the case of the BLSTM). In the field of biological signal processing, Han et al. (2017) took advantage of the natural sparsity of clothing-pressure time series measured on a region of the human body and targeted LSTM-based CS reconstruction. Recurrent architectures have also been explored in the video compression domain by Xu and Ren.

Direct high-level inference from CS data using DL is depicted in Fig. 1d. Freed from the need to reconstruct the signal, we can work directly in the compressed domain and harness the neural network’s inherent ability to extract discriminative non-linear features for which an intuitive explanation is not needed. Thus, Lohit et al. (2016) proposed a compressive learning approach for image classification that employs CNNs and random Gaussian sensing matrices. Building upon this work, Adler et al. (2020a) use support identification through DNN-based oracles to first guess which components of the sparse signal are non-zero, and then compute their magnitudes, thus decreasing the complexity of the computation.
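A sketch of this reconstruction-free pattern follows (our illustration under assumed sizes, not the exact published architecture): a fixed random Gaussian matrix compresses flattened 28x28 images, and a small classifier is trained directly on the M measurements, with no reconstruction step anywhere in the pipeline.

```python
# Reconstruction-free inference: classify compressed measurements directly.
import torch
import torch.nn as nn

N, M, n_classes = 28 * 28, 78, 10          # ~10% measurement rate (assumed)

torch.manual_seed(0)
Phi = torch.randn(M, N) / M ** 0.5         # fixed, non-trainable sensing matrix

classifier = nn.Sequential(                # trained with cross-entropy as usual
    nn.Linear(M, 256), nn.ReLU(),
    nn.Linear(256, n_classes),
)

def infer(x_flat):                         # x_flat: (batch, 784) flattened images
    y = x_flat @ Phi.T                     # compressed measurements, never inverted
    return classifier(y)                   # class logits straight from y
```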

Wireless connectivity is a defining characteristic of ubiquitous computing. In the field of WSNs, CS–DL techniques are motivated not only by the sparsity of the signals, but also by the required processing efficiency in terms of energy consumption and communication bandwidth utilization. Moreover, the compressive measurements are transmitted over wireless channels, so the impact of channel noise and fading on reconstruction must be taken into consideration. Some of the most important contributions in the field of CS–DL methods for WSNs are the WDLReconNet and Fast-WDLReconNet networks (Lu and Bo 2019), validated for the transmission of remote sensing images. Interestingly, the parameters of the denoising layer account for about 94% of the total number of parameters in the proposed CNN, underlining the impact of noise on the transmitted data. Energy efficiency is another major aspect of WSNs, and quantization has proved able to ensure efficient wireless transmission (Sun et al. 2016a).

Magnetic resonance imaging (MRI) revolutionized medical diagnosis in the late \(20^{th}\) century. Nevertheless, conducting MRI scans requires significant time during which a subject needs to be immobilized. CS–DL has already proved able to reduce the scanning time by reducing the number of samples while simultaneously improving the image reconstruction quality. Some CS–DL MRI methods use deep neural networks to learn traditional optimization algorithms by unrolling them (Sun et al. 2016b; Yang et al. 2016).

In the mobile deep learning systems space, one line of work dynamically decomposes a neural network architecture into segments that can each be executed across different processors to maximize energy efficiency and execution time. DeepSense (Huynh et al. 2016) leverages the architecture of mobile GPU devices and applies optimization techniques for offloading a neural network’s layers and operations onto CPU/GPU memory and processors to achieve the best accuracy-latency trade-off.

With its adaptability, afforded by sampling rate adjustment and NN compression, CS–DL can be tuned to maximally utilize heterogeneous hardware. An adaptable CS–DL pipeline was explored in Shrivastwa et al. (2018). Aiming to port a deep neural network for ECoG signal compression and reconstruction to a lightweight device, the authors explore three architectural options: a greedy algorithm (Orthogonal Matching Pursuit); signal compression and reconstruction using an MLP with all layers implemented in the FPGA logic; and a heterogeneous architecture consisting of an ARM CPU and FPGA fabric, with just a single NN layer deployed in the FPGA re-configurable logic. Measurements demonstrate that the third, heterogeneous architecture is the most efficient, since it requires significantly fewer multipliers and thus has lower overhead than implementing the full NN in the FPGA. The system was realized using a Zynq processing system (ARM core) on a ZedBoard, and opens the door for future explorations of efficient task mapping of CS–DL implementations on heterogeneous architectures.

Perhaps the most promising avenue for research lies in energy-efficient task scheduling of a CS–DL pipeline on a mobile device equipped with a heterogeneous hardware architecture, and in hardware-software co-design for CS–DL. Scheduling would address the optimal task-to-processor assignment for achieving minimum energy consumption, which is especially important as we expect a range of advanced mobile sensing functionalities, such as speech recognition and live video processing, from our battery-powered devices. Hardware-software co-design would ensure that sensors are built with compressive sensing in mind, while the processing pipeline matches the needs of the CS–DL algorithms. With respect to the latter, FPGAs stand out as likely candidates for initial implementations, due to the balance of processing capability and flexibility they afford.

6 Conclusions

Key takeaway ideas

  • CS–DL methods exhibit consistent speed-up, often being two orders of magnitude faster than the traditional CS algorithms, thus allowing real-time ubiquitous computing applications.

  • Especially at the very aggressive undersampling rates often required by resource-constrained devices, the CS–DL methods are capable of better reconstructions than most of the classical methods.

  • A data-driven measurement matrix not only improves CS reconstruction/inference results, but is also more suitable for on-device storage than the conventionally used random measurement matrices.

  • The trade-off between model performance and the number of network parameters in CS–DL can be addressed using residual blocks.

  • Training CS–DL pipelines requires significant computing and data resources, rendering on-device training impractical for a range of devices; this issue could be alleviated with transfer and federated learning.

  • New opportunities arise for distributed CS–DL computing, where a new balance can be struck between on-device sensing, partial inference, and compression, and partitioning between edge devices and the cloud; data transmission overhead, energy use, and inference delay can be optimized in this process.

The move from centralized storage and processing towards distributed and edge computing indicates that the intelligence that is expected from future ubiquitous computing environments needs to be realized as close to the physical world as possible. Consequently, sensing and learning from the collected data need to be implemented directly on the ubiquitous sensing devices, and with the support for adaptive, dynamic distributed processing. Reduced rate sampling enabled by compressive sensing (CS) represents a viable solution enabling the reduction of the amount of generated sensing data, yet CS alone does not solve the issue of complex processing that may be overwhelming for ubicomp devices’ limited computational resources. Deep learning (DL) naturally complements CS in the ubicomp domain, as it reduces the computational complexity of signal reconstruction and enables full sensing-learning pipelines to be implemented.

Despite its potential, the CS–DL area remains only sporadically explored. In this survey we identified the current trends in the CS–DL area and reviewed the most significant recent efforts. We systematically examined how DL can be used to speed up CS signal reconstruction by alleviating the need for iterative algorithms. Furthermore, classic CS methods were not designed to go beyond sparsity and exploit structure present in the data; DL enables the sampling matrix to be designed according to the hidden data structure that can further be exploited in the reconstruction phase. The trade-off between model performance and the number of network parameters represents a major issue in CS–DL: deeper network architectures can result in better network performance, yet increased model complexity brings more intensive computational and memory requirements. Residual blocks represent a viable solution for addressing this trade-off (Yao 2019; Du 2019). Regarding the compression rate, studies (Kulkarni et al. 2016; Schlemper et al. 2017; Yao 2019; Shrivastwa 2020) showed that at very aggressive undersampling rates, DL-based methods are capable of better reconstructions than most of the classical methods; for example, the ReconNet network outperforms other methods by large margins at measurement rates as low as 0.01. Finally, one of the drawbacks of accurately reconstructing signals from few measurements using DL is the high requirement in terms of training time and data. Transfer learning might be a solution to this issue, as shown in Han et al. (2018a).

Although compressive sensing is a relatively new field, having been around for less than two decades, with deep learning an even newer addition, CS–DL is characterized by a burgeoning community that produces a growing body of freely available online educational resources. A broad collection of resources, ranging from conference and journal papers and tutorials to blogs, software tools, and video talks, can be found at http://dsp.rice.edu/cs/. In addition, novel ideas and methods in this area are often accompanied by free and open-source implementations. A useful repository containing a collection of reproducible deep compressive sensing source code can be found at https://github.com/ngcthuong/Reproducible-Deep-Compressive-Sensing.

Only recently have conventional CS methods begun to be integrated into commercial products, e.g. Compressed Sensing (Siemens) in 2016, Compressed SENSE (Philips) in 2014, and HyperSense (GE) in 2016, all three for CS MRI. The maturity level of CS–DL methods is much lower than that of conventional CS methods, and to the best of our knowledge no commercial products using CS–DL methods have yet been marketed. However, given the promising potential that CS–DL has shown in supporting various commercial applications, in the coming years CS–DL will come of age and the challenges will shift from proving the concept towards integrating it into commercial products. Already, Facebook has developed a DL-based faster MRI system that is currently undergoing a study in collaboration with the market-leading MRI scanner vendors Siemens, General Electric, and Philips Healthcare (Marks 2021).

In this survey we presented mostly academic work at the intersection of CS and DL, aiming to provide a valuable resource for future researchers and practitioners in this domain. Furthermore, the survey aims to attract a new audience to CS–DL, primarily ubiquitous systems researchers. Such expansion is crucial, as the challenges identified in this manuscript, including the realization of distributed CS–DL on heterogeneous architectures with support for dynamically adaptive sampling rates, need to be addressed to ensure the further proliferation of sensing systems’ intelligence.