
1 Introduction

Blind image deblurring is a classical image restoration problem that has been an active area of research in the image and vision community over the past few decades. With the increasing use of hand-held imaging devices, especially mobile phones, motion blur has become a major problem to contend with. In low-light scenes, the sensor's exposure time has to be increased to capture a well-lit image. As a consequence, camera shake becomes inevitable, resulting in image blur. Motion blur also occurs when the scene is imaged from fast-moving vehicles such as cars and aircraft, even in low-exposure settings. The problem escalates further in data-deprived situations comprising only a single blurred frame.

Blind deblurring can be posed as an image-to-image translation problem: given a blurred image y in the blur domain, we need to learn a non-linear mapping \(\mathcal {M}: y \rightarrow x\) that maps the blurred image to its equivalent clean image x in the clean domain. Many recent deep-learning-based deblurring networks require paired training data \(\{y_i,x_i\}_{i=1}^N\). Even though these networks have shown promising results, the basic assumption of availability of paired data is too demanding. In many situations, collecting paired training data can be difficult, time-consuming, and expensive. For example, in applications like day-to-night scene conversion and image dehazing, paired data is scarce or even non-existent.

This debilitating limitation of supervised deep networks necessitates unsupervised learning approaches. Such approaches [21, 1, 2, 5, 33, 36, 37, 40] are gaining relevance and attracting attention because generic algorithms cope poorly with real-world data. The general priors learned from natural images are not necessarily well-suited to all classes and often lead to deterioration in performance. Recently, class-specific information has been employed for deblurring and has been shown to outperform blanket prior-based approaches. An exemplar-based deblurring method for faces was proposed by Pan et al. [29]. Anwar et al. [1] introduced a method to restore image frequencies attenuated by convolution using class-specific training examples. Deep networks too have attempted class-specific deblurring; the text deblurring network of [12] and the deep face deblurring network of [5] are notable examples.

Following these works, we propose in this paper a domain-specific deblurring architecture focusing mainly on the face, text, and checkerboard classes using a single GAN framework. Faces and text are considered important classes, and many restoration techniques have focused on them explicitly. We also include the checkerboard class to study our network's performance and to ease parameter tuning, akin to [33].

Fig. 1. Our network with GAN, reblur module and scale-space gradient module.

A GAN is used in our network to learn a strong class-specific prior on clean data. The discriminator thus learned captures the semantic domain knowledge of a class but fails to properly capture content, colors, and structure. In regular networks these are corrected with supervised loss functions, which is not possible in our unsupervised setting. Hence, we introduce self-guidance using the blurred data itself. Our network is trained with unpaired data from the clean and blurred domains. A comprehensive diagram of our network is shown in Fig. 1.

The main contributions of our work are:

  • To the best of our knowledge, this is the first ever data-driven attempt at unsupervised learning for the task of deblurring.

  • To overcome the shortcomings arising from the unavailability of paired data and to help the network converge to the right solution, we propose self-guidance through two additional modules:

    • A self-supervised reblurring module that guides the generator to produce a deblurred output corresponding to the input blurred image.

    • A gradient module built on the key observation that down-sampling decreases the gradient matching error and thereby constrains the solution space of generated clean images.

2 Unsupervised Deblurring

A naive approach to unsupervised deblurring would be to adopt existing unpaired image-to-image translation networks (CoGAN [22], DualGAN, and the like). These frameworks build on the original GAN formulation, which pits a generator G against a discriminator D in the minimax game

$$\begin{aligned} \underset{G}{\min }\,\underset{D}{\max }\ E(D,G)=\underset{x\sim P_{data}}{E} [ \log D(x)] + \underset{z\sim P_{z}}{E} [\log (1-D(G(z)))], \end{aligned}$$
(1)

where z is random noise and x denotes the real data. This work was followed by conditional GANs (cGANs), which condition generation on an input. In our framework, the generator is conditioned on a blurred image y \(\in \) Y and maps it to a clean image \(\hat{x}\) such that the generated image \(\hat{x}= G(y)\) is indistinguishable from clean data (where clean-data statistics are learned from samples \(\tilde{x} \in \) X).
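As a concrete illustration, below is a minimal PyTorch sketch of this adversarial setup adapted to deblurring, where the blurred image y plays the role of the generator's input. The function name `gan_losses`, the non-saturating binary cross-entropy form, and the network handles `G` and `D` are assumptions for illustration, not the paper's released implementation.

```python
import torch
import torch.nn.functional as F

def gan_losses(D, G, clean, blurred):
    # Generator maps a blurred image y to a deblurred estimate x_hat = G(y);
    # the discriminator scores how "clean" an image looks (Eq. (1), with the
    # blurred image y playing the role of the noise input z).
    x_hat = G(blurred)

    real_logits = D(clean)
    fake_logits = D(x_hat.detach())  # detach: D's update must not reach G

    # Discriminator: push clean images towards 1, generated images towards 0.
    d_loss = (
        F.binary_cross_entropy_with_logits(real_logits, torch.ones_like(real_logits))
        + F.binary_cross_entropy_with_logits(fake_logits, torch.zeros_like(fake_logits))
    )

    # Generator: fool the discriminator into labelling x_hat as clean.
    gen_logits = D(x_hat)
    g_loss = F.binary_cross_entropy_with_logits(gen_logits, torch.ones_like(gen_logits))
    return d_loss, g_loss
```

In practice, D and G are updated alternately with `d_loss` and `g_loss` respectively, which is the alternating gradient update procedure referred to below.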

Self-supervision by Reblurring (CNN Module). The goal of the GAN in our deblurring framework is to reach an equilibrium where \(P_{clean}\) and \(P_{generated}\) are close. The alternating gradient update procedure (AGD) is used to achieve this. However, this process is highly unstable and often results in mode collapse [9]. Also, an optimal G that translates from Y \(\rightarrow \) X does not guarantee that an individual blurred input y and its corresponding clean output x are paired up in a meaningful way; i.e., there are infinitely many mappings G that will induce the same distribution over \(\hat{x}\). To address this, we require the generator to produce a deblurred image (\(\hat{x}\)) that, when reblurred using the CNN module, furnishes back the input. Adding such a module ensures that the deblurred result has color and texture comparable to the input image, thereby constraining the solution to the manifold of images that captures the actual input content.
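A minimal sketch of this reblurring self-supervision is given below, assuming a small convolutional reblur network; the class name `ReblurCNN`, its layer sizes, and the plain \(L_1\) form of the cost are illustrative assumptions rather than the paper's exact module.

```python
import torch
import torch.nn as nn

class ReblurCNN(nn.Module):
    """Small convolutional reblurring module (illustrative layer sizes)."""
    def __init__(self, channels=3, width=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(channels, width, 5, padding=2), nn.ReLU(inplace=True),
            nn.Conv2d(width, width, 5, padding=2), nn.ReLU(inplace=True),
            nn.Conv2d(width, channels, 5, padding=2),
        )

    def forward(self, x):
        return self.net(x)

def reblur_loss(R, G, blurred):
    # Self-supervision: reblurring the deblurred estimate should
    # reproduce the original blurred input (no clean reference needed).
    x_hat = G(blurred)
    return torch.mean(torch.abs(R(x_hat) - blurred))
```

Because the target of this cost is the blurred input itself, it can be trained jointly with the GAN objective without any paired data.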

Fig. 2. (a) Scale-space gradient error. (b) Average decrease in gradient error with respect to downscaling.

Gradient Matching Module. With the combined network of GAN and CNN modules, the generator learns to map to the clean domain while preserving color. To further enforce that the gradients of the generated image match those of its corresponding clean image, a gradient module is used in our network, as shown in Fig. 1. Gradient matching resolves the problem of over-sharpening and ringing in the results. However, since we do not have access to the reference image, determining the desired gradient distribution to match is difficult. Hence, we borrow a heuristic from [25] that exploits the fact that shrinking a blurry image y by a factor of \(\alpha \) results in an image \(y^{\alpha }\) that is \(\alpha \) times sharper than y. Thus, we use the blurred image gradients at different scales to guide the deblurring process. At the finest scale, the gradients of the blurred image and the generated output match the least, but the match improves as we go down in scale space. A visual depiction of this effect is shown in Fig. 2(a), where the gradients of a blurred and a clean checkerboard at different scales are provided. Observe that at the finest scale the gradients are very different, and as we move down in scale the gradients start to look alike and the \(L_1\) error between them decreases. The plot in Fig. 2(b) shows the average per-pixel \(L_1\) error with respect to scale for 200 images from each of the text, checkerboard, and face datasets. For all these data, the gradient error decreases with scale and hence forms a good guiding signal for training our network.
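The following sketch shows one plausible form of such a scale-space gradient cost, assuming simple forward-difference gradients, bilinear downscaling, and an equally weighted set of scales; all of these choices are illustrative and not taken from the paper.

```python
import torch
import torch.nn.functional as F

def image_gradients(img):
    # Forward differences along width (dx) and height (dy) of a BCHW tensor.
    dx = img[..., :, 1:] - img[..., :, :-1]
    dy = img[..., 1:, :] - img[..., :-1, :]
    return dx, dy

def scale_space_gradient_loss(x_hat, blurred, scales=(0.5, 0.25, 0.125)):
    # At coarse scales a blurred image approaches its sharp counterpart
    # (shrinking by a factor alpha makes it roughly alpha times sharper),
    # so the blurred input's gradients become a usable target for the
    # generator output at those scales.
    loss = 0.0
    for s in scales:
        xs = F.interpolate(x_hat, scale_factor=s, mode='bilinear', align_corners=False)
        ys = F.interpolate(blurred, scale_factor=s, mode='bilinear', align_corners=False)
        gx_x, gy_x = image_gradients(xs)
        gx_y, gy_y = image_gradients(ys)
        loss = loss + torch.mean(torch.abs(gx_x - gx_y)) \
                    + torch.mean(torch.abs(gy_x - gy_y))
    return loss / len(scales)
```

The choice to skip the full-resolution scale reflects Fig. 2: at the finest scale the blurred gradients are a poor target, so matching there would pull the generator back towards the blurry input.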

Fig. 3. Effect of different cost functions. (a) Input blurred image to the generator, (b) result of unsupervised deblurring with just the GAN cost in Eq. (2), (c) result obtained by adding the reblurring cost in Eq. (3) to (b), (d) result obtained by adding the gradient cost in Eq. (4) to (c), and (e) the target output.
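Putting the pieces together, a single generator update might combine the three costs roughly as below; `lambda_r` and `lambda_g` are hypothetical weights, and the helper functions are the sketches above, not the authors' code.

```python
# Illustrative generator update combining the three costs of Fig. 3;
# lambda_r and lambda_g are hypothetical weights (assumed values only).
lambda_r, lambda_g = 10.0, 1.0

d_loss, g_adv = gan_losses(D, G, clean, blurred)          # GAN cost, Eq. (2)
l_reblur = reblur_loss(R, G, blurred)                     # reblurring cost, Eq. (3)
l_grad = scale_space_gradient_loss(G(blurred), blurred)   # gradient cost, Eq. (4)

g_total = g_adv + lambda_r * l_reblur + lambda_g * l_grad  # minimised over G (and R)
# d_loss is minimised separately over D in the alternating (AGD) updates.
```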