Deep Physics-Guided Unrolling Generalization for Compressed Sensing

Published in: International Journal of Computer Vision

Abstract

By absorbing the merits of both model- and data-driven methods, the deep physics-engaged learning scheme achieves high-accuracy, interpretable image reconstruction; it has attracted growing attention and become the mainstream for inverse imaging tasks. Focusing on the image compressed sensing (CS) problem, we identify an intrinsic defect of this emerging paradigm, which is widely implemented by deep algorithm-unrolled networks: stacking more plain iterations that involve the real physics incurs enormous computational cost and long inference times, hindering practical application. We propose a novel deep Physics-guided unRolled recovery Learning (PRL) framework that generalizes the traditional iterative recovery model from the image domain (ID) to the high-dimensional feature domain (FD). A compact multiscale unrolling architecture is then developed to enhance network capacity while keeping real-time inference speed. Taking the two perspectives of optimization and range-nullspace decomposition, instead of building an algorithm-specific unrolled network, we provide two implementations: PRL-PGD and PRL-RND. Experiments show that PRL networks significantly outperform other state-of-the-art methods in both performance and efficiency, with large potential for further improvement and for real application to other inverse imaging problems or optimization models.

Notes

  1. In the field of CS imaging, the orthogonality condition \(\textbf{A}\textbf{A}^\top =\textbf{I}_M\) is widely realized by orthogonalizing an i.i.d. random Gaussian matrix or a Hadamard matrix (Zhang et al., 2014b); a minimal construction sketch is given after these notes.

  2. Note that this phenomenon indicates that our baseline network is “degraded” at large unrolled stage numbers rather than “overfitted”, since the training loss and test PSNR both become worse. It was previously discovered and studied in (He et al., 2016).

  3. For reproducible research, the complete source code and pre-trained models of our PRL networks are available at https://github.com/Guaishou74851/PRL.
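
As a minimal sketch of the orthogonalization mentioned in note 1 (the function name is ours, and NumPy's QR factorization stands in for whatever routine a given codebase uses), the following builds an \(M\times N\) sampling matrix with orthonormal rows from an i.i.d. Gaussian matrix:

```python
import numpy as np

def orthogonalized_gaussian_matrix(M, N, seed=0):
    """Sample an i.i.d. Gaussian matrix and orthogonalize its rows so that
    A @ A.T == I_M, the condition assumed in note 1."""
    rng = np.random.default_rng(seed)
    G = rng.standard_normal((M, N))
    Q, _ = np.linalg.qr(G.T)   # Q: (N, M) with orthonormal columns
    return Q.T                 # A: (M, N) with orthonormal rows

A = orthogonalized_gaussian_matrix(M=102, N=1024)  # ~10% CS ratio for a 32x32 block
assert np.allclose(A @ A.T, np.eye(102), atol=1e-8)
```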

References

  • Adler, J., & Öktem, O. (2018). Learned primal-dual reconstruction. IEEE Transactions on Medical Imaging, 37(6), 1322–1332.

  • Agustsson, E., & Timofte, R. (2017). NTIRE 2017 challenge on single image super-resolution: dataset and study. In Proceedings of IEEE conference on computer vision and pattern recognition workshops (CVPRW), pp. 126–135.

  • Björck, Å., Elfving, T., & Strakos, Z. (1998). Stability of conjugate gradient and Lanczos methods for linear least squares problems. SIAM Journal on Matrix Analysis and Applications, 19(3), 720–736.

  • Blumensath, T., & Davies, M. E. (2009). Iterative hard thresholding for compressed sensing. Applied and Computational Harmonic Analysis, 27(3), 265–274.

  • Boufounos, P. T., & Baraniuk, R. G. (2008). 1-bit compressive sensing. In Proceedings of IEEE conference on information sciences and systems (CISS), pp. 16–21.

  • Cai, J.-F., Candès, E. J., & Shen, Z. (2010). A singular value thresholding algorithm for matrix completion. SIAM Journal on Optimization, 20(4), 1956–1982.

  • Candès, E. J., & Wakin, M. B. (2008). An introduction to compressive sampling. IEEE Signal Processing Magazine, 25(2), 21–30.

  • Chen, B., & Zhang, J. (2022). Content-aware scalable deep compressed sensing. IEEE Transactions on Image Processing, 31, 5412–5426.

  • Chen, D., & Davies, M. E. (2020). Deep decomposition learning for inverse imaging problems. In Proceedings of European conference on computer vision (ECCV), pp. 510–526.

  • Chen, D., Tachella, J., & Davies, M. E. (2021a). Equivariant imaging: learning beyond the range space. In Proceedings of IEEE international conference on computer vision (ICCV), pp. 4379–4388.

  • Chen, H., Zhang, Y., Zhang, W., Liao, P., Li, K., Zhou, J., & Wang, G. (2017). Low-dose CT via convolutional neural network. Biomedical Optics Express, 8(2), 679–694.

  • Chen, J., Sun, Y., Liu, Q., & Huang, R. (2020). Learning memory augmented cascading network for compressed sensing of images. In Proceedings of European conference on computer vision (ECCV), pp. 513–529.

  • Chen, Y., & Pock, T. (2016). Trainable nonlinear reaction diffusion: A flexible framework for fast and effective image restoration. IEEE Transactions on Pattern Analysis and Machine Intelligence, 39(6), 1256–1272.

  • Chen, Z., Guo, W., Feng, Y., Li, Y., Zhao, C., Ren, Y., & Shao, L. (2021). Deep-learned regularization and proximal operator for image compressive sensing. IEEE Transactions on Image Processing, 30, 7112–7126.

  • Coban, S., Andriiashen, V., & Ganguly, P. (2020). Apple CT Data: simulated parallel-beam tomographic datasets. Zenodo.

  • Dabov, K., Foi, A., Katkovnik, V., & Egiazarian, K. (2007). Image denoising by sparse 3-D transform-domain collaborative filtering. IEEE Transactions on Image Processing, 16(8), 2080–2095.

  • Denker, A., Schmidt, M., Leuschner, J., Maass, P., & Behrmann, J. (2020). Conditional normalizing flows for low-dose computed tomography image reconstruction. arXiv preprint arXiv:2006.06270.

  • Dong, C., Loy, C. C., He, K., & Tang, X. (2014a). Learning a deep convolutional network for image super-resolution. In Proceedings of European conference on computer vision (ECCV), pp. 184–199.

  • Dong, W., Shi, G., Li, X., Ma, Y., & Huang, F. (2014b). Compressive sensing via nonlocal low-rank regularization. IEEE Transactions on Image Processing, 23(8), 3618–3632.

  • Donoho, D. L. (2006). Compressed sensing. IEEE Transactions on Information Theory, 52(4), 1289–1306.

  • Elad, M. (2010). Sparse and redundant representations: from theory to applications in signal and image processing (Vol. 2). Springer.

  • Fan, Z.-E., Lian, F., & Quan, J.-N. (2022). Global sensing and measurements reuse for image compressed sensing. In Proceedings of IEEE conference on computer vision and pattern recognition (CVPR), pp. 8954–8963.

  • Fowler, J. E., Mun, S., Tramel, E. W., et al. (2012). Block-based compressed sensing of images and video. Foundations and Trends in Signal Processing, 4(4), 297–416.

  • Gan, L. (2007). Block compressed sensing of natural images. In Proceedings of IEEE international conference on digital signal processing (ICDSP), pp. 403–406. IEEE.

  • Gilton, D., Ongie, G., & Willett, R. (2019). Neumann networks for linear inverse problems in imaging. IEEE Transactions on Computational Imaging, 6, 328–343.

  • Gu, J., & Dong, C. (2021). Interpreting super-resolution networks with local attribution maps. In Proceedings of IEEE conference on computer vision and pattern recognition (CVPR), pp. 9199–9208.

  • He, K., Zhang, X., Ren, S., & Sun, J. (2016). Deep residual learning for image recognition. In Proceedings of IEEE conference on computer vision and pattern recognition (CVPR), pp. 770–778.

  • Huang, J. -B., Singh, A., & Ahuja, N. (2015). Single image super-resolution from transformed self-exemplars. In Proceedings of IEEE conference on computer vision and pattern recognition (CVPR), pp. 5197–5206.

  • Huang, Y., Würfl, T., Breininger, K., Liu, L., Lauritsch, G., & Maier, A. (2018). Some investigations on robustness of deep learning in limited angle tomography. In Proceedings of international conference on medical image computing and computer-assisted intervention (MICCAI), pp. 145–153.

  • Jacques, L., Laska, J. N., Boufounos, P. T., & Baraniuk, R. G. (2013). Robust 1-bit compressive sensing via binary stable embeddings of sparse vectors. IEEE Transactions on Information Theory, 59(4), 2082–2102.

  • Kafle, S., Joseph, G., & Varshney, P. K. (2021). One-bit compressed sensing using untrained network prior. In Proceedings of IEEE international conference on acoustics, speech and signal processing (ICASSP), pp. 2875–2879.

  • Kingma, D. P., & Ba, J. (2015). Adam: A method for stochastic optimization. In Proceedings of international conference on learning representations (ICLR), pp. 1–15.

  • Kokkinos, F., & Lefkimmiatis, S. (2018). Deep image demosaicking using a cascade of convolutional residual denoising networks. In Proceedings of European conference on computer vision (ECCV), pp. 303–319.

  • Kruse, J., Rother, C., & Schmidt, U. (2017). Learning to push the limits of efficient FFT-based image deconvolution. In Proceedings of IEEE international conference on computer vision (ICCV), pp. 4586–4594.

  • Kulkarni, K., Lohit, S., Turaga, P., Kerviche, R., & Ashok, A. (2016). ReconNet: Non-iterative reconstruction of images from compressively sensed measurements. In Proceedings of IEEE conference on computer vision and pattern recognition (CVPR), pp. 449–458.

  • Lefkimmiatis, S. (2017). Non-local color image denoising with convolutional neural networks. In Proceedings of IEEE conference on computer vision and pattern recognition (CVPR), pp. 3587–3596.

  • Lefkimmiatis, S. (2018). Universal denoising networks: A novel CNN architecture for image denoising. In Proceedings of IEEE conference on computer vision and pattern recognition (CVPR), pp. 3204–3213.

  • Leuschner, J., Schmidt, M., Ganguly, P. S., Andriiashen, V., Coban, S. B., Denker, A., Bauer, D., Hadjifaradji, A., Batenburg, K. J., Maass, P., et al. (2021). Quantitative comparison of deep learning-based image reconstruction methods for low-dose and sparse-angle CT applications. Journal of Imaging, 7(3), 44.

  • Li, Y., Li, K., Zhang, C., Montoya, J., & Chen, G.-H. (2019). Learning to reconstruct computed tomography images directly from sinogram data under a variety of data acquisition conditions. IEEE Transactions on Medical Imaging, 38(10), 2469–2481.

  • Liu, T., Chaman, A., Belius, D., & Dokmanic, I. (2020). Interpreting U-nets via task-driven multiscale dictionary learning. arXiv preprint arXiv:2011.12815.

  • Liu, Y., Long, Z., & Zhu, C. (2018). Image completion using low tensor tree rank and total variation minimization. IEEE Transactions on Multimedia, 21(2), 338–350.

  • Liu, Y., Long, Z., Huang, H., & Zhu, C. (2019). Low CP rank and tucker rank tensor completion for estimating missing components in image data. IEEE Transactions on Circuits and Systems for Video Technology, 30(4), 944–954.

  • Long, Z., Liu, Y., Chen, L., & Zhu, C. (2019). Low rank tensor completion for multiway visual data. Signal Processing, 155, 301–316.

  • Long, Z., Zhu, C., Liu, J., & Liu, Y. (2021). Bayesian low rank tensor ring for image recovery. IEEE Transactions on Image Processing, 30, 3568–3580.

  • Long, Z., Zhu, C., Liu, J., Comon, P., & Liu, Y. (2022). Trainable subspaces for low rank tensor completion: Model and analysis. IEEE Transactions on Signal Processing, 70, 2502–2517.

  • Lustig, M., Donoho, D., & Pauly, J. M. (2007). Sparse MRI: The application of compressed sensing for rapid MR imaging. Magnetic Resonance in Medicine, 58(6), 1182–1195.

  • Lustig, M., Donoho, D. L., Santos, J. M., & Pauly, J. M. (2008). Compressed sensing MRI. IEEE Signal Processing Magazine, 25(2), 72–82.

  • Ma, K., Duanmu, Z., Wu, Q., Wang, Z., Yong, H., Li, H., & Zhang, L. (2016). Waterloo exploration database: New challenges for image quality assessment models. IEEE Transactions on Image Processing, 26(2), 1004–1016.

  • Martin, D., Fowlkes, C., Tal, D., & Malik, J. (2001). A database of human segmented natural images and its application to evaluating segmentation algorithms and measuring ecological statistics. In Proceedings of IEEE international conference on computer vision (ICCV), 2, pp. 416–423.

  • Mousavi, A., & Baraniuk, R. G. (2017). Learning to invert: Signal recovery via deep convolutional networks. In Proceedings of IEEE international conference on acoustics, speech and signal processing (ICASSP), pp. 2272–2276.

  • Mousavi, A., Patel, A. B., & Baraniuk, R. G. (2015). A deep learning approach to structured signal recovery. In Proceedings of IEEE allerton conference on communication, control, and computing, pp. 1336–1343.

  • Mun, S., & Fowler, J. E. (2009). Block compressed sensing of images using directional transforms. In Proceedings of IEEE international conference on image processing (ICIP), pp. 3021–3024.

  • Nam, S., Davies, M. E., Elad, M., & Gribonval, R. (2013). The cosparse analysis model and algorithms. Applied and Computational Harmonic Analysis, 34(1), 30–56.

  • Niu, S., Gao, Y., Bian, Z., Huang, J., Chen, W., Yu, G., Liang, Z., & Ma, J. (2014). Sparse-view X-ray CT reconstruction via total generalized variation regularization. Physics in Medicine & Biology, 59(12), 2997.

  • Parikh, N., Boyd, S., et al. (2014). Proximal algorithms. Foundations and Trends in Optimization, 1(3), 127–239.

  • Paszke, A., Gross, S., Massa, F., Lerer, A., Bradbury, J., Chanan, G., Killeen, T., Lin, Z., Gimelshein, N., & Antiga, L. et al. (2019). PyTorch: An imperative style, high-performance deep learning library. Proceedings of Neural Information Processing Systems (NeurIPS), 32.

  • Pelt, D. M., Batenburg, K. J., & Sethian, J. A. (2018). Improving tomographic reconstruction from limited data using mixed-scale dense convolutional neural networks. Journal of Imaging, 4(11), 128.

  • Radon, J. (1986). On the determination of functions from their integral values along certain manifolds. IEEE Transactions on Medical Imaging, 5(4), 170–176.

  • Ravishankar, S., Ye, J. C., & Fessler, J. A. (2019). Image reconstruction: From sparsity to data-adaptive methods and machine learning. Proceedings of the IEEE, 108(1), 86–109.

  • Ren, C., He, X., Wang, C., & Zhao, Z. (2021). Adaptive consistency prior based deep network for image denoising. In Proceedings of IEEE conference on computer vision and pattern recognition (CVPR), pp. 8596–8606.

  • Ronneberger, O., Fischer, P., & Brox, T. (2015). U-Net: Convolutional networks for biomedical image segmentation. In Proceedings of international conference on medical image computing and computer-assisted intervention (MICCAI), pp. 234–241.

  • Shi, W., Caballero, J., Huszár, F., Totz, J., Aitken, A. P., Bishop, R., Rueckert, D., & Wang, Z. (2016). Real-time single image and video super-resolution using an efficient sub-pixel convolutional neural network. In Proceedings of IEEE conference on computer vision and pattern recognition (CVPR), pp. 1874–1883.

  • Shi, W., Jiang, F., Liu, S., & Zhao, D. (2019a). Image compressed sensing using convolutional neural network. IEEE Transactions on Image Processing, 29, 375–388.

  • Shi, W., Jiang, F., Liu, S., & Zhao, D. (2019b). Scalable convolutional neural network for image compressed sensing. In Proceedings of IEEE conference on computer vision and pattern recognition (CVPR), pp. 12290–12299.

  • Song, J., Chen, B., & Zhang, J. (2021). Memory-augmented deep unfolding network for compressive sensing. In Proceedings of ACM international conference on multimedia (ACM MM), pp. 4249–4258.

  • Song, J., Chen, B., & Zhang, J. (2023a). Deep memory-augmented proximal unrolling network for compressive sensing. International Journal of Computer Vision, 1–20.

  • Song, J., Chen, B., & Zhang, J. (2023b). Dynamic path-controllable deep unfolding network for compressive sensing. IEEE Transactions on Image Processing, 32, 2202–2214.

  • Sun, J., Li, H., Xu, Z., et al. (2016). Deep ADMM-Net for compressive sensing MRI. Proceedings of Neural Information Processing Systems (NeurIPS), 29, 10–18.

  • Sun, Y., Chen, J., Liu, Q., Liu, B., & Guo, G. (2020). Dual-path attention network for compressed sensing image reconstruction. IEEE Transactions on Image Processing, 29, 9482–9495.

  • Szczykutowicz, T. P., & Chen, G.-H. (2010). Dual energy CT using slow kVp switching acquisition and prior image constrained compressed sensing. Physics in Medicine & Biology, 55(21), 6411.

  • Tian, C., Xu, Y., Li, Z., Zuo, W., Fei, L., & Liu, H. (2020). Attention-guided CNN for image denoising. Neural Networks, 124, 117–129.

  • Van der Maaten, L., & Hinton, G. (2008). Visualizing data using t-SNE. Journal of Machine Learning Research, 9, 2579–2605.

  • Wang, H., Zhang, T., Yu, M., Sun, J., Ye, W., Wang, C., & Zhang, S. (2020). Stacking networks dynamically for image restoration based on the plug-and-play framework. In Proceedings of European conference on computer vision (ECCV), pp. 446–462.

  • Wang, Z., Bovik, A. C., Sheikh, H. R., & Simoncelli, E. P. (2004). Image quality assessment: From error visibility to structural similarity. IEEE Transactions on Image Processing, 13(4), 600–612.

  • Wu, Z., Zhang, J., & Mou, C. (2021). Dense deep unfolding network with 3D-CNN prior for snapshot compressive imaging. In Proceedings of IEEE international conference on computer vision (ICCV), pp. 4892–4901.

  • Xiang, J., Dong, Y., & Yang, Y. (2021). FISTA-Net: Learning a fast iterative shrinkage thresholding network for inverse problems in imaging. IEEE Transactions on Medical Imaging, 40(5), 1329–1339.

  • Yang, J., Wright, J., Huang, T., & Ma, Y. (2008). Image super-resolution as sparse representation of raw image patches. In Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1–8.

  • You, D., Xie, J., & Zhang, J. (2021a). ISTA-Net\(^{++}\): Flexible deep unfolding network for compressive sensing. In Proceedings of IEEE international conference on multimedia and expo (ICME), pp. 1–6.

  • You, D., Zhang, J., Xie, J., Chen, B., & Ma, S. (2021b). COAST: Controllable arbitrary-sampling network for compressive sensing. IEEE Transactions on Image Processing, 30, 6066–6080.

  • Zamir, S. W., Arora, A., Khan, S., Hayat, M., Khan, F. S., Yang, M.-H., & Shao, L. (2021). Multi-stage progressive image restoration. In Proceedings of IEEE conference on computer vision and pattern recognition (CVPR), pp. 14821–14831.

  • Zhang, J., & Ghanem, B. (2018). ISTA-Net: Interpretable optimization-inspired deep network for image compressive sensing. In Proceedings of IEEE conference on computer vision and pattern recognition (CVPR), pp. 1828–1837.

  • Zhang, J., Zhao, C., Zhao, D., & Gao, W. (2014a). Image compressive sensing recovery using adaptively learned sparsifying basis via L0 minimization. Signal Processing, 103, 114–126.

  • Zhang, J., Zhao, D., & Gao, W. (2014b). Group-based sparse representation for image restoration. IEEE Transactions on Image Processing, 23(8), 3336–3351.

  • Zhang, J., Zhao, C., & Gao, W. (2020a). Optimization-inspired compact deep compressive sensing. IEEE Journal of Selected Topics in Signal Processing, 14(4), 765–774.

  • Zhang, J., Chen, B., Xiong, R., & Zhang, Y. (2023). Physics-inspired compressive sensing: Beyond deep unrolling. IEEE Signal Processing Magazine, 40(1), 58–72.

  • Zhang, K., Zuo, W., Chen, Y., Meng, D., & Zhang, L. (2017a). Beyond a gaussian denoiser: Residual learning of deep CNN for image denoising. IEEE Transactions on Image Processing, 26(7), 3142–3155.

  • Zhang, K., Zuo, W., Gu, S., & Zhang, L. (2017b). Learning deep CNN denoiser prior for image restoration. In Proceedings of IEEE conference on computer vision and pattern recognition (CVPR), pp. 3929–3938.

  • Zhang, K., Van Gool, L., & Timofte, R.(2020b). Deep unfolding network for image super-resolution. In Proceedings of IEEE conference on computer vision and pattern recognition (CVPR), pp. 3217–3226.

  • Zhang, K., Li, Y., Zuo, W., Zhang, L., Van Gool, L., & Timofte, R. (2021). Plug-and-play image restoration with deep denoiser prior. IEEE Transactions on Pattern Analysis and Machine Intelligence.

  • Zhang, Y., Li, K., Li, K., Wang, L., Zhong, B., & Fu, Y. (2018). Image super-resolution using very deep residual channel attention networks. In Proceedings of European conference on computer vision (ECCV), pp. 286–301.

  • Zhang, Z., Liu, Y., Liu, J., Wen, F., & Zhu, C. (2020c). AMP-Net: Denoising-based deep unfolding for compressive image sensing. IEEE Transactions on Image Processing, 30, 1487–1500.

  • Zhao, C., Ma, S., Zhang, J., Xiong, R., & Gao, W. (2016). Video compressive sensing reconstruction via reweighted residual sparsity. IEEE Transactions on Circuits and Systems for Video Technology, 27(6), 1182–1195.

  • Zhao, C., Zhang, J., Ma, S., Fan, X., Zhang, Y., & Gao, W. (2016). Reducing image compression artifacts by structural sparse representation and quantization constraint prior. IEEE Transactions on Circuits and Systems for Video Technology, 27(10), 2057–2071.

  • Zheng, H., Yong, H., & Zhang, L. (2021). Deep convolutional dictionary learning for image denoising. In Proceedings of IEEE conference on computer vision and pattern recognition (CVPR), pp. 630–641.

Author information

Correspondence to Jian Zhang.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This work was supported in part by Shenzhen Fundamental Research Program under Grant GXWD20201231165807007-20200807164903001 and in part by the Shenzhen Research Project under Grant JCYJ20220531093215035.

Appendices

Appendix A: More Comparison Results

A.1 High-Level Comparisons of Deep CS Networks

Here we provide further conceptual and functional classifications and qualitative comparisons among existing CS networks and our PRL in Table 10. PRL not only organically absorbs the merits of optimization algorithms and advanced network structures, but also breaks through the bottlenecks of performance degradation, saturation, and surging time cost with a more general and compact FD physical framework that goes beyond ID unrolling. It is expected, and verified, to better approximate the theoretically ideal target reconstruction functions.

A.2 More Comparisons on the CBSD68 and DIV2K Benchmarks

In this subsection, we compare our PRL and the competing methods on the CBSD68 (Martin et al., 2001) and DIV2K (Agustsson & Timofte, 2017) validation sets. From Table 11, we see a consistent lead of PRL networks over the previous networks (PSNR gains of about 0.8–1.8dB, 1.4–2.1dB, 0.3–1.5dB and 0.7–1.6dB on Set11 (Kulkarni et al., 2016), Urban100 (Huang et al., 2015), CBSD68 and DIV2K, respectively). Figure 9 shows that PRL networks recover more details and fine edges, and in particular reconstruct the correct shapes and accurate line textures. We also find that the accuracy lead on CBSD68 is relatively less significant, which stems from the greater recovery difficulty of CBSD68 (Zhang et al., 2017a, 2021) and our default, extremely unbalanced training set. Further experiments and related analyses on the training set are provided in Appendix B.4.

Table 11 PSNR(dB)/SSIM comparisons of various CS methods on CBSD68 (Martin et al., 2001) and DIV2K (Agustsson & Timofte, 2017). The existing physics-free, physics-engaged, and our methods are in the yellow, green and cyan backgrounds, respectively, with the best and second-best results highlighted
Fig. 9

Visual comparisons on recovering two images from CBSD68 (Martin et al., 2001) (top) and DIV2K (Agustsson & Timofte, 2017) (bottom), respectively, under the setting of \(\gamma =30\%\)

Table 12 PSNR comparison and vertical increments of the trained improved network variants with seven different training configurations on Set11 (Kulkarni et al., 2016) under the setting of \(\gamma =10\%\)

Appendix B: More Ablation Studies and Discussions

B.1 Effect of Fully-Utilized Physics in Eliminating Degradation

Under the single-scale, single-path plain framework, the existing ID unrolling scheme suffers not only from performance saturation and a large increase in inference time, together with unfavorable noise sensitivity due to its “slender” architecture, but also from training difficulty with serious degradation, as our pilot experiment exhibited. This degradation is eliminated by activating full-channel physics engagement. Here we conduct more experiments on five variants of our PGD-unrolled improved baseline in Fig. 3d, with four degrees of physics utilization, to give a better understanding of the effect of physical information guidance.

Specifically, we retrain two fully-activated networks with \(r\equiv 1\), \(K\in \left\{ 20, 30\right\} \), \(C=D\equiv 64\) and the maximized physical feature dimensionality \(q=64\), and three partially-activated ones with \(K=30\) and \(q\in \left\{ 1,4,16\right\} \), obtained by disabling the latter \((64-q)\) Conv kernels and activations in \(\mathcal {H}^{(k)}_\text {grad}\). Figure 10 shows the training processes and exhibits that, as expected, the 30-stage (212-layer) recovery networks are difficult to converge when \(q\in \left\{ 1, 4\right\} \); the degradation is alleviated at \(q=16\) and eliminated at \(q=64\). Table 12 gives the final evaluations of all these trained networks on Set11 (Kulkarni et al., 2016) and demonstrates the effectiveness of the physics-feature fusion enhancement, with a total PSNR improvement of 0.74dB brought by sufficiently large q values. Note that our neural physical guidance is implemented by multiscale unrolling in FD and cannot be achieved merely by introducing extra parameters. We also find that this plain architecture quickly saturates when \(K\ge 20\), and we reach similar conclusions from extra trainings with \(C=D\equiv 32\) and \(K\in \{80,100,120\}\). These results reveal the importance of adequate physics injection in stabilizing and enhancing the learning processes of deep physics-engaged networks for improving performance.
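
To make the notion of q-channel physics guidance concrete, the following PyTorch sketch implements one simplified, single-scale PGD-unrolled stage; the module names (grad_embed, fuse), the learnable step size, and the exact fusion form are our illustrative assumptions, and the real PRL stages additionally use block-based sampling and a multiscale FD structure:

```python
import torch
import torch.nn as nn

class PhysicsGuidedStage(nn.Module):
    """Simplified single-scale PGD-unrolled stage in the feature domain.
    q of the D physics-guidance channels stay active; q = D corresponds to
    the fully-activated variant, small q to the partially-activated ones."""
    def __init__(self, D=64, q=64):
        super().__init__()
        self.q, self.rho = q, nn.Parameter(torch.tensor(0.5))
        self.grad_embed = nn.Conv2d(1, D, 3, padding=1)  # lift ID gradient into FD
        self.fuse = nn.Sequential(
            nn.Conv2d(D, D, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(D, D, 3, padding=1))

    def forward(self, feats, x, A, y):
        # Physics (gradient) step in the image domain: g = A^T (A x - y),
        # with A of shape (M, N), x of shape (B, 1, H, W), y of shape (M, B).
        B, _, H, W = x.shape
        g = (A.t() @ (A @ x.reshape(B, -1).t() - y)).t().reshape(B, 1, H, W)
        g_feat = self.grad_embed(g)
        if self.q < g_feat.shape[1]:          # partially-activated variant:
            mask = torch.zeros_like(g_feat)   # zero the latter (D - q)
            mask[:, :self.q] = 1              # guidance channels
            g_feat = g_feat * mask
        return feats - self.rho * self.fuse(feats + g_feat)
```

Setting q to 1, 4, 16 or 64 here mirrors the four degrees of physics utilization compared in Fig. 10.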

Fig. 10

Training loss (left) and test PSNR (right) curves on Set11 (Kulkarni et al., 2016) of our five retrained improved variants with \(\gamma =10\%\). The learning of the 30-stage (212-layer) network is stabilized and enhanced by more feature channels of physics guidance from \(\{\textbf{A}, \textbf{y}\}\)

Fig. 11

Visualization of the recovery trajectories, with relative data positions obtained by principal component analysis (PCA), for recovering an image from CBSD68 (Martin et al., 2001) with the plain OPINE-Net\(^+\) (Zhang et al., 2020a), MADUN (Song et al., 2021) and our PRL networks, with/without noise \({\epsilon }\) of a large standard deviation (noise level) \(\sigma =50\) in the observation \(\textbf{y}=\textbf{Ax}+{\epsilon }\) (top/middle), when \(\gamma =10\%\). The recovered results (bottom) are marked by \(\sigma \) and PSNR (dB) with the corresponding drops. Compared with the existing plain unrolling networks, PRL stabilizes the recovery process under heavy noise interference

Fig. 12

Visualization of the [0, 255]-normalized intermediate images/features (left) of recovering three Set11 (Kulkarni et al., 2016) images (“Lena”, “Monarch” and “Peppers”) at the 5-/15-th stage of OPINE-Net\(^+\) (top) and our default PRL-PGD (bottom) with \(\sigma \in \{0,50\}\) and \(\gamma =10\%\). Features from the third level of PRL-PGD are upscaled into an 8-dimensional space and visualized channel-by-channel; their distribution curves are on the right side. All visualizations are marked with the PSNR/SSIM of the final recoveries (with drop margins) in the upper left corner. PRL-PGD recovers better under noise disturbance by preserving the image structure and feature distribution shapes, based on its effective U-shaped FD unrolling

B.2 Effect of U-Shaped Residual Recovery in Improving the Noise Robustness

To better understand the reconstruction characteristics of different unrolled architectures, building on Fig. 7, we conduct further image- and feature-level visualizations of the refinement processes of various networks from the perspective of noise robustness. Figure 11 provides the restoration trajectories of four PGD-unrolled networks, including the plain 9-stage OPINE-Net\(^+\) (Zhang et al., 2020a) and 25-stage MADUN (Song et al., 2021) in ID, and our 30-stage PRL-PGD and PRL-RND in FD, for the recovery of an image from CBSD68 (Martin et al., 2001) (with/without observation noise). Here we use principal component analysis (PCA) instead of the more advanced t-SNE (Van der Maaten & Hinton, 2008) to reduce the image data dimensionality for all stages, since PCA is simpler and more stable, involving no hyperparameters or randomness in our evaluation. We map the high-dimensional features of PRL networks from FD to ID by \(\times r\)-upscaling and transforming them to the shape \(H\times W\times 1\) through PixelShuffle (Shi et al., 2016) and \(\mathcal {H}_\text {rec}\). From the visualization in Fig. 11, we observe that, compared with OPINE-Net\(^+\), MADUN reconstructs better with 25 steps of milder content refinement in the noise-free case. Both are severely affected by observation noise, which brings large error accumulation and a “domino” amplification effect in such a long and slender architecture, especially in the deeper stages (see the blocking/ripple/honeycomb artifacts in the OPINE-Net\(^+\) recovery, with a 6.20dB PSNR drop, and the more severely distorted blocks from MADUN, with a 10.56dB PSNR drop). Similar phenomena are observed in other plain networks (Zhang et al., 2020c; You et al., 2021b) and our baseline variants. PRL preserves the trajectory structures and keeps the recovery stable under noise interference (with PSNR drops below 3.50dB), even though intra-stage connections and inter-stage short-/long-term memories are introduced in OPINE-Net\(^+\) and MADUN to enhance transmission. These results demonstrate the positive effect on noise robustness of our compact multiscale unrolling with additive long-range skip connections, which conducts three levels of residual FD recoveries and plays the major role in stabilizing the 30-stage reconstructions, since our PGD- and RND-unrolled networks share similar recovery trajectories and differ only in path details.
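
The trajectory plots can be reproduced in outline with a shared PCA fit; in this sketch (function name ours) all per-stage reconstructions, clean and noisy, are projected into one common 2-D space so the trajectories of Fig. 11 are directly comparable:

```python
import numpy as np
from sklearn.decomposition import PCA

def recovery_trajectories_2d(*trajectories):
    """Each trajectory is a list of per-stage (H, W) reconstructions.
    Fit one PCA on all of them and return the 2-D projection of each,
    so that clean and noisy recovery paths share the same axes."""
    flat = [np.stack([s.ravel() for s in t]) for t in trajectories]
    pca = PCA(n_components=2).fit(np.concatenate(flat))
    return [pca.transform(f) for f in flat]

# Usage: clean_2d, noisy_2d = recovery_trajectories_2d(clean_stages, noisy_stages)
```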

Table 13 Evaluation of the recovery accuracy in PSNR, inference times and parameter numbers of four prior networks (left) and twenty PRL-PGD variants (right) on Set11 (Kulkarni et al., 2016) when CS ratio \(\gamma =50\%\)

We further extract the intermediate refined images/features of three Set11 (Kulkarni et al., 2016) instances from the recoveries at the 5-/15-th stage of OPINE-Net\(^+\)/PRL-PGD, and visualize them in Fig. 12 with \(\sigma \in \{0,50\}\). Specifically, we provide their original distribution density curves (right) and visualize the stage outputs channel-by-channel (left) with [0, 255]-normalization, upscaling the \((H/4)\times (W/4)\times 128\) features from PRL-PGD to the shape \(H\times W\times 8\). From Fig. 12 we observe that the ID recoveries of OPINE-Net\(^+\) exhibit unstable signal distributions and lose the accurate structures of both the image content and the distribution shapes under noise disturbance (with an average PSNR/SSIM drop of 9.25dB/0.4448). Our FD recoveries (with a drop of 5.95dB/0.2458) decompose the images into multiple physics channels and preserve their spatial structures with smooth, stable distribution curves. We also find that, trained with only an end-to-end \(\ell _2\) loss function, OPINE-Net\(^+\) can be regarded as a special FD network reconstructing a single-channel feature rather than working entirely in ID, since its non-linearly transformed intermediate results tend to be distributed in \([-0.3,1.2]\) (not the original [0, 1] of ID). PRL-PGD recovers better in the 8-dimensional space with wider, sparser zero-mean distributions, which may help enhance edges and textures. These results validate the necessity of our FD restoration for supporting high-throughput transmission and sufficient physics guidance while leaving large degrees of freedom for the information distributions, which makes PRL robust.
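
The FD-to-ID mapping behind this channel-wise visualization can be sketched as follows; it assumes the \((H/4)\times (W/4)\times 128\) feature layout stated above and uses PyTorch's pixel_shuffle for the \(\times 4\) upscaling (the helper name and per-channel normalization are ours):

```python
import torch
import torch.nn.functional as F

def fd_features_to_channels(feat):
    """Map an (N, 128, H/4, W/4) FD tensor to (N, 8, H, W): pixel_shuffle
    with factor 4 folds 16 channels into each spatial position, then each
    channel is independently [0, 255]-normalized for display."""
    img = F.pixel_shuffle(feat, upscale_factor=4)   # (N, 8, H, W)
    lo = img.amin(dim=(-2, -1), keepdim=True)
    hi = img.amax(dim=(-2, -1), keepdim=True)
    return (255 * (img - lo) / (hi - lo + 1e-8)).to(torch.uint8)
```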

Fig. 13

Visualization of the LAM (Gu & Dong, 2021) attribution results of PRL\(^*\)-PGD and PRL-PGD with \(K=1\) and \(C=D=8\) for the recovery of two images, “Boats” (top) and “Barbara” (bottom), from Set11 (Kulkarni et al., 2016) when \(\gamma =10\%\). The LAM maps represent the importance of different pixels for the selected central \(16\times 16\) local patches; the more informative regions are marked in darker red. (a) is the original image with the selected region marked by a red box; (b) is the LAM result; (c) is the LAM result with the input image; (d) is the informative area with the input image; (e) is the CS reconstructed result. The upper/lower rows of each instance correspond to the single-/multi-scale PRL\(^*\)/PRL, respectively, with their recovery PSNR (dB) and accuracy gaps shown on the right side. PRL produces much more accurate recovered outputs by significantly increasing the range of pivotal information it utilizes

Table 14 PSNR(dB)/SSIM comparisons among our default PGD-unrolled PRL networks with three different training set configurations (a), (b) and (c) on four benchmarks in the case of \(\gamma =50\%\)

B.3 Study of FD Dimensionality and the Inter-Stage Weight Sharing Strategy

We evaluate 20 PRL-PGD variants with FD dimensionalities \(d\in \{2,4,8,16\}\), per-group stage numbers \(K\in \{1,3,5,7\}\), and weight sharing across all the stages of each group with \(K=5\), and report their recovery accuracy, speeds, and parameter numbers at CS ratio \(\gamma =50\%\) in Table 13, together with four competing unrolled methods (Zhang et al., 2020a, c; You et al., 2021b; Song et al., 2021). We observe that PRL reaches its full potential when \(d\in \{8,16\}\) and is restricted at \(d\in \{2,4\}\) by poor feature capacity, so a larger FD dimensionality is needed to fully stimulate its high performance. The inter-stage weight sharing strategy has little impact on recovery accuracy (only a 0.04dB average PSNR drop) and speed (a 0.506ms time saving) while reducing parameters by a factor of about K. It benefits from the compact and efficient PRL structure and is quite attractive for deployment. The PRL-PGD variants most comparable to other methods (in parameter number, depth and speed) bring average improvements of 0.66dB in PSNR, a 16% time reduction and a 26% parameter saving under such a high sampling rate. Note that these PRL variants are limited to insufficient configurations, whereas the competing networks are their own optimal and saturated versions. These results verify the higher structural efficiency and appealing compression potential of PRL, with flexible and competent optional settings for real-world applications.
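
A minimal sketch of the weight sharing strategy, assuming only a generic stage constructor: sharing registers one module object K times, so the group holds roughly 1/K of the parameters while the forward pass still applies K stages:

```python
import torch.nn as nn

def make_group(stage_ctor, K, share_weights=False):
    """Build one unrolled group of K stages. With sharing, the same stage
    module (and thus the same weights) is applied K times; without it,
    each stage owns independent weights."""
    if share_weights:
        stage = stage_ctor()
        return nn.ModuleList([stage] * K)   # one parameter set, reused K times
    return nn.ModuleList([stage_ctor() for _ in range(K)])

# Usage sketch:
# for stage in make_group(lambda: nn.Conv2d(8, 8, 3, padding=1), K=5, share_weights=True):
#     feats = stage(feats)
```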

B.4 Effect of Multiscale Architecture and Large Training Set in Improving Recovery Accuracy

One important characteristic of PRL that distinguishes it from existing unrolled networks is the multiscale design, which not only speeds up inference but also provides inter-scale FD communication for capturing image context and structure. With a parameter count close to PRL, the single-scale U-shaped PRL\(^*\) has a receptive field size of \(R_{\text {PRL}^*}=(84K+13)^2=(2L+1)^2\) along its longest fully-convolutional path, where K is the group stage number and L is the \(3\times 3\) Conv layer number, i.e., the network depth. PRL has the much larger size \(R_{\text {PRL}}=(140K+8)^2=(10L/3-12)^2\) thanks to the introduction of \(2\times 2\) strided/transposed (S-/T-) Conv layers, which leads to \(R_{\text {PRL}}=(5/3)^2 R_{\text {PRL}^*}\) as \(L\rightarrow \infty \). Reconstructing a specific region by utilizing a larger informative image part is quite important (Zhang et al., 2017b). Here we analyze the effect of the receptive field difference between the single-/multi-scale PRL variants by using the local attribution map (LAM) (Gu & Dong, 2021) to compare the input image parts that strongly influence the CS recovered outputs when recovering two images from Set11 (Kulkarni et al., 2016) with \(\gamma =10\%\). The two LAM visualization results in Fig. 13 indicate that both the PGD-unrolled PRL and PRL\(^*\) perform the sampling and recovery of the selected local patch, which contains some edges and textures, by intensively utilizing a region with rich and relevant information. PRL attends to a significantly wider range of informative clean areas to guide its more accurate recoveries and achieves an average PSNR gain of 0.36dB over PRL\(^*\), revealing the stronger information capture capability and higher structural efficiency of our multiscale design. We also find that, due to the block-based CS scheme (with a globally unified block size of \(32\times 32\)) and the iterative stage-by-stage injection of the physical knowledge \(\{\textbf{A}, \textbf{y}\}\), the actual network receptive field is larger than that of the fully-convolutional paths calculated above. The most heavily utilized informative regions have complicated, content-correlated spatial shapes and distributions.
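
The two receptive-field formulas can be cross-checked numerically. From \(2L+1=84K+13\) the path depth is \(L=42K+6\), and substituting into \(10L/3-12\) indeed gives \(140K+8\); the sketch below (helper names ours) verifies that the side-length ratio tends to \(5/3\):

```python
def side_single_scale(K):
    """Receptive-field side of single-scale PRL*: L stacked 3x3 convs grow
    the field by 2 px per layer, so the side is 2L + 1 = 84K + 13."""
    L = 42 * K + 6            # depth of the longest fully-convolutional path
    return 2 * L + 1

def side_multiscale(K):
    """Side for multiscale PRL: 2x2 S-/T-Conv layers let deeper layers act
    on downscaled grids, enlarging the side to 10L/3 - 12 = 140K + 8."""
    return 140 * K + 8

for K in (1, 3, 7, 1000):
    print(K, side_multiscale(K) / side_single_scale(K))  # -> 5/3 for large K
```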

To enhance training sample diversity, we follow (Zhang et al., 2021; You et al., 2021a; Song et al., 2021) and use four datasets: T91 (Yang et al., 2008; Dong et al., 2014a), Train400 (Chen & Pock, 2016; Zhang et al., 2017a), the DIV2K (Agustsson & Timofte, 2017) training set and the Waterloo exploration database (WED) (Ma et al., 2016). They cover a large image space and can enrich the network prior for many classic image restoration tasks (Zhang et al., 2021). Here we conduct further PRL-PGD trainings with three different dataset settings and report their evaluation results in Table 14. We observe that training on only the Train400 dataset already gives the PGD-unrolled PRL a consistent average accuracy lead over the unrolled network MADUN (Song et al., 2021), by about 0.77dB/0.0008, 1.56dB/0.0036, 0.47dB/0.0009 and 0.71dB/0.0011 in PSNR/SSIM on Set11 (Kulkarni et al., 2016), Urban100 (Huang et al., 2015), CBSD68 (Martin et al., 2001) and the DIV2K validation set, respectively. Introducing T91/DIV2K and WED generally brings negative and positive effects on most benchmarks, respectively; their combination yields average accuracy changes of +0.13dB/+0.0006, +0.10dB/+0.0007, −0.14dB/−0.0005, and +0.08dB/+0.0003. In particular, WED overturns the negative impact of T91/DIV2K on Set11 and Urban100 with appreciable improvements. We also find that the recovery performance on CBSD68 cannot directly benefit from the larger training set due to its unbalanced sample distribution (only \(6.6\%\) of training images are \(180\times 180\) grayscale BSD patches), verifying the data-bias limitation that restricts network learning and causes overfitting in PRL variants with large capacities.
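
One possible way to assemble configuration (c) with standard PyTorch utilities is sketched below; the folder layout, crop size, and loader class are our assumptions, not the released training code:

```python
from pathlib import Path
from PIL import Image
from torch.utils.data import ConcatDataset, Dataset
from torchvision import transforms

class CropFolder(Dataset):
    """Minimal folder-of-images dataset yielding random grayscale crops."""
    def __init__(self, root, crop=32):
        exts = ("*.png", "*.jpg", "*.bmp")
        self.paths = sorted(p for e in exts for p in Path(root).glob(e))
        self.tf = transforms.Compose([
            transforms.Grayscale(),
            transforms.RandomCrop(crop),   # matches the 32x32 CS block size
            transforms.ToTensor()])
    def __len__(self):
        return len(self.paths)
    def __getitem__(self, i):
        return self.tf(Image.open(self.paths[i]))

# Configuration (c): Train400 + T91 + DIV2K + WED merged into one training set.
train_set = ConcatDataset([CropFolder(r) for r in
                           ("data/Train400", "data/T91", "data/DIV2K", "data/WED")])
```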

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Chen, B., Song, J., Xie, J. et al. Deep Physics-Guided Unrolling Generalization for Compressed Sensing. Int J Comput Vis 131, 2864–2887 (2023). https://doi.org/10.1007/s11263-023-01814-w
