Introduction

Light, as an electromagnetic wave, has two essential components: amplitude and phase1. Optical detectors, usually relying on photon-to-electron conversion (such as charge-coupled device sensors and the human eye), measure the intensity, which is proportional to the square of the amplitude of the light field and in turn relates to the transmittance or reflectance distribution of the sample (Fig. 1a, b). However, they cannot capture the phase of the light field because of their limited sampling frequency2.

Fig. 1: Light is transmitted through or reflected from different samples.
figure 1

a An absorptive sample with a nonuniform transmittance distribution. b A reflective sample with a nonuniform reflectance distribution. c A transparent (weakly-absorbing) sample with a nonuniform RI or thickness distribution. d A sample with a uniform transmittance distribution. e A sample with a uniform transmittance distribution placed before atmospheric turbulence with inhomogeneous RI distribution. f A reflective sample with a nonuniform surface height distribution

In fact, in many application scenarios, it is the phase rather than the amplitude of the light field that carries the primary information about the sample3,4,5,6. For quantitative structural determination of transparent and weakly scattering samples3 (Fig. 1c), the phase delay is proportional to the sample’s thickness or refractive index (RI) distribution, which is critically important for bioimaging because most living cells are transparent. For quantitative characterization of an aberrated wavefront5 (Fig. 1d, e), the phase aberration is caused by atmospheric turbulence with an inhomogeneous RI distribution in the light path, which is mainly relevant to adaptive aberration correction. Also, for quantitative measurement of a surface profile6 (Fig. 1f), the phase delay is proportional to the surface height of the sample, which is very useful in material inspection.

Since the phase delay across the wavefront is necessary for the above applications, but optical detection devices can only perceive and record the amplitude of the light field, how can we recover the desired phase? Fortunately, as the light field propagates, the phase delay also causes changes in the amplitude distribution; therefore, we can record the amplitude of the propagated light field and then calculate the corresponding phase. This operation goes by different names according to the application domain: it is called quantitative phase imaging (QPI) in biomedicine3; phase retrieval in coherent diffraction imaging (CDI)4, the term most commonly used in X-ray optics and in non-optical analogs such as electrons and other particles; and wavefront sensing in adaptive optics (AO)5 for astronomy and optical communications. Here, we collectively refer to calculating the phase of a light field from its intensity measurements as phase recovery (PR).

As is common in inverse problems, calculating the phase directly from an intensity measurement after propagation is usually ill-posed7. Suppose the complex field at the sensor plane is known; we can then directly calculate the complex field at the sample plane using numerical propagation8 (Fig. 2a). In reality, however, the sensor only records the intensity and loses the phase, and the measurement is necessarily sampled by pixels of finite size. Because of these complications, the complex field at the sample plane generally cannot be calculated in a straightforward manner (Fig. 2b).

Fig. 2: Calculating complex field at the sample plane from the complex field or the intensity at the sensor plane.
figure 2

a The complex field at the sample plane can be directly calculated from the complex field at the sensor plane. b The complex field at the sample plane cannot be directly calculated from the intensity at the sensor plane alone. U: complex field. A: amplitude. θ: phase
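Throughout this review, “numerical propagation” refers to computing the complex field at one plane from that at another. As a minimal illustration of one common implementation, the angular spectrum method, the following NumPy sketch propagates a complex field over a distance z; the wavelength, pixel size, and grid size are arbitrary example values rather than parameters from any specific system.

```python
import numpy as np

def angular_spectrum_propagate(field, wavelength, pixel_size, z):
    """Propagate a complex field over a distance z with the angular spectrum method."""
    ny, nx = field.shape
    fx = np.fft.fftfreq(nx, d=pixel_size)            # spatial frequencies along x
    fy = np.fft.fftfreq(ny, d=pixel_size)            # spatial frequencies along y
    FX, FY = np.meshgrid(fx, fy)
    arg = 1.0 / wavelength**2 - FX**2 - FY**2        # propagating vs. evanescent components
    kz = 2 * np.pi * np.sqrt(np.maximum(arg, 0.0))
    transfer = np.exp(1j * kz * z) * (arg > 0)       # evanescent components are suppressed
    return np.fft.ifft2(np.fft.fft2(field) * transfer)

# Example: diffraction of a plane wave behind a small square aperture
wavelength, pixel_size = 532e-9, 2e-6                # illustrative values
field = np.zeros((512, 512), dtype=complex)
field[224:288, 224:288] = 1.0                        # transparent aperture
sensor_intensity = np.abs(angular_spectrum_propagate(field, wavelength, pixel_size, 5e-3))**2
```

Later code sketches in this section reuse this helper wherever a propagation step is needed.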

We can transform phase recovery into a well-posed/deterministic problem by introducing extra information, such as holography or interferometry at the expense of introducing a reference wave8,9, Shack-Hartmann wavefront sensing, which introduces a microlens array at the conjugate plane10,11, and the transport of intensity equation, which requires multiple through-focus amplitudes12,13. Alternatively, we can solve this ill-posed phase recovery problem iteratively by optimization, i.e., the so-called phase retrieval, with algorithms such as the Gerchberg-Saxton-Fienup algorithms14,15,16, the multi-height algorithm17,18,19, real-space ptychography20,21,22, and Fourier ptychography23,24. Next, we introduce these classical phase recovery methods in more detail.

Holography/interferometry

By interfering the unknown wavefront with a known reference wave, the phase difference between the object wave and the reference wave is converted into the intensity of the resulting hologram/interferogram due to alternating constructive and destructive interference of the two waves across their fronts. This enables direct calculation of the phase from the hologram8.

In in-line holography, where the object beam and the reference beam are along the same optical axis, the four-step phase-shifting algorithm is commonly used for phase recovery (Fig. 3)25. First, the complex field of the object wave at the sensor plane is calculated from the four phase-shifted holograms. Next, the complex field at the sample plane is obtained through numerical propagation. Then, by applying the arctangent function to the final complex field, a phase map in the range of (−π, π] is obtained, i.e., the so-called wrapped phase. The final sample phase is obtained after phase unwrapping. Other multiple-step phase-shifting algorithms are also possible for phase recovery26. Spatial light interference microscopy (SLIM), a well-known QPI method, combines the phase-shifting algorithm with phase contrast microscopy for phase recovery of transparent samples27.

Fig. 3: Description of in-line holography based on the four-step phase-shifting algorithm.
figure 3

I0: hologram with 0 phase delay. Iπ/2: hologram with π/2 phase delay. Iπ: hologram with π phase delay. I3π/2: hologram with 3π/2 phase delay
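Assuming a unit-amplitude reference wave, the arithmetic of the four-step phase-shifting algorithm reduces to a few lines; the sketch below (reusing the angular-spectrum helper above) outlines the pipeline described in the text and is illustrative rather than a complete implementation.

```python
import numpy as np

def four_step_phase_shifting(I_0, I_half_pi, I_pi, I_3half_pi):
    """Combine four phase-shifted in-line holograms into the object wave at the sensor plane.

    Up to a constant factor set by the reference amplitude:
    U = (I_0 - I_pi) + 1j * (I_3half_pi - I_half_pi).
    The sign of the imaginary part depends on the phase-shift convention.
    """
    return (I_0 - I_pi) + 1j * (I_3half_pi - I_half_pi)

# Pipeline sketched in the text (z, wavelength, pixel_size are assumed known):
# U_sensor = four_step_phase_shifting(I_0, I_half_pi, I_pi, I_3half_pi)
# U_sample = angular_spectrum_propagate(U_sensor, wavelength, pixel_size, -z)
# wrapped_phase = np.angle(U_sample)        # arctangent-based phase in (-pi, pi]
# ... followed by phase unwrapping to obtain the final sample phase.
```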

In off-axis holography, where the reference beam is slightly tilted from the optical axis, the phase is modulated onto a carrier frequency and can be recovered through spatial spectral filtering from only one holographic measurement (Fig. 4)28. By appropriately designing the carrier frequency, the baseband that contains the reference beam can be well separated from the object beam. After transforming the measured hologram into the spatial frequency domain through a Fourier transform (FT), one can select the +1st or −1st order and move it to the baseband. By applying an inverse FT, the object beam can be recovered. One has to be careful, however, not to exceed the Nyquist limit of the camera as the angle between the reference and object beams increases. Moreover, as only a small part of the spatial spectrum is used for phase recovery, off-axis holography typically wastes much of the spatial bandwidth product of the system. To enhance the utilization of the spatial bandwidth product, the Kramers-Kronig relationship and other iterative algorithms have recently been applied in off-axis holography29,30,31.

Fig. 4: Description of off-axis holography based on spatial spectral filtering.
figure 4

FT Fourier transform, IFT inverse Fourier transform
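A compact sketch of the spatial spectral filtering just described: FFT the hologram, crop a window around the +1 order (whose location and size are assumed known from the reference tilt), re-center it, and inverse FFT. Variable names and the square window are illustrative.

```python
import numpy as np

def off_axis_reconstruct(hologram, carrier_px, radius):
    """Recover the object wave from one off-axis hologram by spatial spectral filtering.

    carrier_px: (dx, dy) offset of the +1 order from the spectrum center, in pixels.
    radius:     half-width of the square window used to select that order.
    """
    ny, nx = hologram.shape
    spectrum = np.fft.fftshift(np.fft.fft2(hologram))
    cy, cx = ny // 2 + carrier_px[1], nx // 2 + carrier_px[0]
    # Select the +1 order ...
    window = spectrum[cy - radius:cy + radius, cx - radius:cx + radius]
    # ... and move it to the baseband before the inverse transform
    baseband = np.zeros_like(spectrum)
    baseband[ny // 2 - radius:ny // 2 + radius, nx // 2 - radius:nx // 2 + radius] = window
    return np.fft.ifft2(np.fft.ifftshift(baseband))   # complex object wave
```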

Both the in-line and off-axis holography discussed above are lensless, where the sensor and sample planes are not mutually conjugated. Therefore, a backward numerical propagation from the former to the latter is necessary. The process of numerical propagation can be omitted if additional imaging components are added to conjugate the sensor and sample planes, such as digital holographic microscopy32.

Shack-Hartmann wavefront sensing

If we can obtain the horizontal and vertical phase gradients of a wavefront in some way, then the phase can be recovered by integrating the gradients in these orthogonal directions. The Shack-Hartmann wavefront sensor10,11 is a classic way to do so from the perspective of geometric optics. It usually consists of a microlens array and an image sensor located at its focal plane (Fig. 5). The phase gradient of the wavefront at the surface of each microlens is calculated linearly from the displacement of the focal spot on the focal plane, in both horizontal and vertical (x-axis and y-axis) directions. The phase can then be computed by integrating the gradient at each point, and its resolution depends on the density of the microlens array. In addition, quantitative differential interference contrast microscopy33, quantitative differential phase contrast microscopy34, and quadriwave lateral shearing interferometry35 also recover the phase from its gradients, and may achieve higher resolution than the Shack-Hartmann wavefront sensor.

Fig. 5: Description of the Shack-Hartmann wavefront sensor.
figure 5

f: focal length of microlens array
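In the geometric-optics picture described above, each lenslet converts the local wavefront slope into a focal-spot displacement, and the phase is obtained by integrating the two slope maps. The sketch below assumes the spot displacements have already been measured by centroiding and uses a Fourier-domain least-squares integration; the names and the periodic boundary assumption are illustrative.

```python
import numpy as np

def slopes_from_spot_shifts(shift_x, shift_y, focal_length, wavelength):
    """Convert focal-spot displacements (meters, one value per lenslet) into phase gradients.

    The wavefront slope per lenslet is shift/f; the phase gradient is (2*pi/lambda)*shift/f.
    """
    k = 2 * np.pi / wavelength
    return k * shift_x / focal_length, k * shift_y / focal_length

def integrate_gradients(gx, gy, pitch):
    """Least-squares integration of the gradient maps on the lenslet grid (Poisson solver)."""
    ny, nx = gx.shape
    fx = np.fft.fftfreq(nx, d=pitch)
    fy = np.fft.fftfreq(ny, d=pitch)
    FX, FY = np.meshgrid(fx, fy)
    denom = (2j * np.pi * FX) ** 2 + (2j * np.pi * FY) ** 2
    denom[0, 0] = 1.0                                  # avoid 0/0; the piston term is arbitrary
    numer = 2j * np.pi * FX * np.fft.fft2(gx) + 2j * np.pi * FY * np.fft.fft2(gy)
    phase = np.real(np.fft.ifft2(numer / denom))
    return phase - phase.mean()                        # remove the arbitrary piston
```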

Transport of intensity equation

For a light field, the wavefront determines the axial variation of the intensity in the direction of propagation. Specifically, there is a quantitative relationship between the gradient and curvature of the phase and the axial derivative of the intensity, the so-called transport of intensity equation (TIE)12. This relationship has an elegant analogy to fluid mechanics, approximating the light intensity as the density of a compressible fluid and the phase gradient as the lateral pressure field36. The TIE can be derived from three different perspectives: the Helmholtz equation in the paraxial approximation, and the Fresnel diffraction integral and the Poynting theorem in the paraxial and weak-defocusing approximations13. The gradient and curvature of the phase together determine the wavefront shape, whose normal vector is parallel to the wavevector at each point of the wavefront, and consequently to the direction of energy propagation. In turn, variations in the lateral energy flux result in axial variations of the intensity. The convergence of light by a convex lens is an intuitive example (Fig. 6): the wavefront in front of the convex lens is a plane, whose wavevector is parallel to the direction of propagation. As such, the intensity distribution on different planes is constant; that is, the axial variation of the intensity is zero. The convex lens then reshapes the wavefront so that all wavevectors are directed toward the focal point; therefore, as the light propagates, the intensity distribution becomes denser and denser, meaning that the intensity varies in the axial direction (equivalently, its axial derivative is nonzero).

Fig. 6
figure 6

A convex lens converges light to a focal point

As there is a quantitative relationship between the gradient and curvature of the phase and the axial derivative of the intensity, we can exploit it for phase recovery (Fig. 7). By shifting the sensor axially, intensity maps at different defocus distances are recorded, which can be used to approximate the axial derivative by numerical difference and thus calculate the phase through the TIE. Owing to the imaging optics, the sensor and sample planes are conjugated. The TIE can also be used in lensless systems to recover the phase at the defocus plane, which then requires an additional numerical propagation13.

Fig. 7
figure 7

Description of phase recovery by transport of intensity equation (TIE)

It is worth noting that the TIE is applicable to both fully and partially coherent light sources, and the resulting phase is continuous and does not require phase unwrapping; however, it is only valid under the paraxial and weak-defocusing approximations13.
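Under the simplifying assumption of a nearly uniform intensity I0, the TIE reduces to a Poisson equation for the phase, which can be solved with FFTs. The sketch below uses a central difference of two defocused intensities for the axial derivative; real implementations handle non-uniform intensity and boundary conditions more carefully.

```python
import numpy as np

def tie_phase_uniform_intensity(I_minus, I_plus, I0, dz, wavelength, pixel_size):
    """Solve -k * dI/dz = I0 * laplacian(phi) for phi, assuming nearly uniform intensity I0."""
    k = 2 * np.pi / wavelength
    dIdz = (I_plus - I_minus) / (2 * dz)              # central-difference axial derivative
    rhs = -k * dIdz / I0                              # laplacian(phi) = rhs
    ny, nx = rhs.shape
    fx = np.fft.fftfreq(nx, d=pixel_size)
    fy = np.fft.fftfreq(ny, d=pixel_size)
    FX, FY = np.meshgrid(fx, fy)
    lap = -(2 * np.pi) ** 2 * (FX ** 2 + FY ** 2)     # Fourier symbol of the Laplacian
    lap[0, 0] = 1.0                                   # avoid 0/0; the mean phase is arbitrary
    phi = np.real(np.fft.ifft2(np.fft.fft2(rhs) / lap))
    return phi - phi.mean()
```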

Phase retrieval

If no extra information is introduced, calculating the phase directly from a propagated intensity measurement remains an ill-posed problem. We can overcome this difficulty by incorporating prior knowledge, also known as regularization. In the Gerchberg-Saxton (GS) algorithm14, the intensities recorded at the sample plane and at the far-field sensor plane are used as constraints. A complex field is projected forward and backward between these two planes using the Fourier transform and is iteratively constrained by the intensities; the resulting complex field gradually approaches a solution (Fig. 8a). Fienup replaced the intensity constraint at the sample plane with an aperture (support region) constraint, so that the sensor only needs to record one intensity map, resulting in the error reduction (ER) algorithm and the hybrid input-output (HIO) algorithm (Fig. 8b)15,16. In addition to the aperture constraint, one can introduce other physical constraints such as histogram37, atomicity38, and absorption39 to reduce the ill-posedness of phase retrieval. Furthermore, many types of sparsity priors, such as those in the spatial domain40, gradient domain41,42, and wavelet domain43, are effective regularizers for phase retrieval.

Fig. 8: Description of alternating-projection algorithms.
figure 8

a Gerchberg-Saxton algorithm. b Error reduction and hybrid input-output algorithms
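The alternating projections in Fig. 8 can be written in a few lines. The sketch below shows the GS algorithm and Fienup's ER variant with a support constraint (HIO differs only in how the out-of-support values are updated); initializations and iteration counts are illustrative.

```python
import numpy as np

def gerchberg_saxton(sample_amplitude, fourier_amplitude, n_iter=200):
    """GS: amplitudes are known at both planes; the phase is found by alternating projections."""
    field = sample_amplitude * np.exp(2j * np.pi * np.random.rand(*sample_amplitude.shape))
    for _ in range(n_iter):
        F = np.fft.fft2(field)
        F = fourier_amplitude * np.exp(1j * np.angle(F))          # enforce the far-field amplitude
        field = np.fft.ifft2(F)
        field = sample_amplitude * np.exp(1j * np.angle(field))   # enforce the sample amplitude
    return np.angle(field)

def error_reduction(fourier_amplitude, support, n_iter=200):
    """ER: the sample-plane amplitude constraint is replaced by a support (aperture) constraint."""
    init_phase = np.exp(2j * np.pi * np.random.rand(*fourier_amplitude.shape))
    field = np.fft.ifft2(fourier_amplitude * init_phase)
    for _ in range(n_iter):
        F = fourier_amplitude * np.exp(1j * np.angle(np.fft.fft2(field)))
        field = np.fft.ifft2(F)
        field = field * support                                   # zero everything outside the aperture
    return field
```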

Naturally, if more intensity maps are recorded by the sensor, there is more prior knowledge for regularization, further reducing the ill-posedness of the problem. By moving the sensor axially, intensity maps at different defocus distances are recorded as intensity constraints, and the complex field is then computed iteratively as in the GS algorithm (Fig. 9a), the so-called multi-height phase retrieval17,18,19. In this axial multi-intensity alternating projection method, the sample and sensor planes are usually kept as close as possible, so numerical propagation is used for projection instead of a Fourier transform. Meanwhile, with a fixed sensor position, multiple intensity maps can also be recorded by radially moving an aperture near the sample, and the complex field is then recovered iteratively as in the ER and HIO algorithms (Fig. 9b), the so-called real-space ptychography20,21,22. In this radial multi-intensity alternating projection method, adjacent aperture constraints overlap one another and expand the field of view in real space. Furthermore, angular multi-intensity alternating projection is also possible. By switching the aperture constraint from the spatial domain to the frequency domain with a lens system, multiple intensity maps with different frequency content are recorded by changing the angle of the incident light (Fig. 9c), the so-called Fourier ptychography23,24. Owing to the change of illumination angle, high-frequency information that originally exceeds the numerical aperture is recorded, expanding the Fourier bandwidth in reciprocal space. Recently, synthetic aperture ptychography44 was proposed to simultaneously expand the bandwidth in real space and reciprocal space, in which an extended plane wave illuminates a stationary object and a coded image sensor is translated within the far field to record data.

Fig. 9: Description of multi-intensity alternating projection.
figure 9

a Axial multi-intensity alternating projection. b Radial multi-intensity alternating projection. c Angular multi-intensity alternating projection. Forward: forward numerical propagation. Backward: backward numerical propagation
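A sketch of the axial multi-intensity alternating projection (multi-height) loop, reusing the angular-spectrum helper from earlier; the plane ordering and iteration count are illustrative.

```python
import numpy as np

def multi_height_phase_retrieval(intensities, distances, wavelength, pixel_size, n_iter=50):
    """Cycle through holograms recorded at several sample-to-sensor distances,
    enforcing each measured amplitude in turn (axial multi-intensity alternating projection)."""
    amplitudes = [np.sqrt(I) for I in intensities]
    field = amplitudes[0].astype(complex)                 # initial guess at the first plane
    for _ in range(n_iter):
        for j in range(len(distances)):
            k = (j + 1) % len(distances)
            # propagate from plane j to plane k, then replace the amplitude with the measurement
            field = angular_spectrum_propagate(field, wavelength, pixel_size,
                                               distances[k] - distances[j])
            field = amplitudes[k] * np.exp(1j * np.angle(field))
    # after each outer iteration the field sits at the first plane; back-propagate to the sample
    return angular_spectrum_propagate(field, wavelength, pixel_size, -distances[0])
```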

In addition to alternating projections, two representative non-convex optimization methods are the Wirtinger flow45 and truncated amplitude flow46 algorithms. The phase retrieval problem can also be relaxed into a convex one through semidefinite programming, as in the PhaseLift algorithm47.

Recovery of low-frequency phase component

As mentioned at the beginning, because the phase information of the light field is converted into amplitude variations during propagation, one can recover the phase from the recorded amplitude distribution. However, the low-frequency phase components cause smaller amplitude variations, which are difficult to detect. A more quantitative analysis can be performed through the phase transfer function13, which characterizes the transfer response of phase content at different spatial frequencies for an imaging system. For holography and Shack-Hartmann wavefront sensing, owing to interference or the microlens array, the low-frequency phase component is converted into a fringe pattern or a focal-spot translation, which can be easily detected. For other lensless methods that recover the phase from propagated intensity maps, such as lensless TIE, the Gerchberg-Saxton-Fienup algorithm, the multi-height algorithm, and real-space ptychography with an unknown probe beam, the phase transfer function at low spatial frequencies is close to zero. That is to say, slowly varying phase gradients cannot induce sufficient intensity contrast to be detected and thus cannot be recovered by subsequent algorithms. Coded ptychography48 is an effective solution, in which a coded layer (such as a disorder-engineered surface49 or a fixed blood-cell layer50,51) effectively converts the phase information of different spatial frequencies into detectable distortions in the diffraction patterns. Similarly, the coded layer can also be used in the multi-height algorithm to recover slowly varying phase profiles52. As for the lens-based case, such as lens-based TIE53,54, Fourier ptychography55, and quantitative differential phase contrast microscopy56, the phase transfer function of the imaging system can be modulated by changing the illumination angle, thereby collecting more low-frequency phase information.

Deep learning (DL) for phase recovery

In recent years, as an important step towards true artificial intelligence (AI), deep learning57 has achieved unprecedented performance in many tasks of computer vision with the support of graphics processing units (GPUs) and large datasets. Similarly, since it was first used to solve the inverse problem in imaging in 201658, deep learning has demonstrated promising potential in the field of computational imaging59. In the meantime, there is a rapidly growing interest in using deep learning for phase recovery (Fig. 10).

Fig. 10: Growth in interest in using “deep learning for phase recovery” over time, depicted by the number of publications and citations on Web of Science.
figure 10

The search code used is “TS = ((“phase recovery” OR “phase retrieval” OR “phase imaging” OR “holography” OR “phase unwrapping” OR “holographic reconstruction” OR “hologram” OR “fringe pattern”) AND (“deep learning” OR “network” OR “deep-learning”))”

For the vast majority of “DL for PR”, the implementation of deep learning is based on training and inference of artificial neural networks (ANNs)60 with input-label paired datasets, known as supervised learning (Fig. 11). In view of its natural advantages in image processing, the convolutional neural network (CNN)61 is the most widely used ANN for phase recovery. Specifically, in order for the neural network to learn the mapping from physical quantity A to B, a large number of paired examples need to be collected to form a training dataset that implicitly contains this mapping relationship (Fig. 11a). Then, the gradient of the loss function is propagated backward through the neural network, and the network parameters are updated iteratively, thus internalizing this mapping relationship (Fig. 11b). After training, the neural network is used to infer Bx from an unseen Ax (Fig. 11c). In this way, deep learning has been used in all stages of phase recovery and phase processing.

Fig. 11: Implementation of deep learning with paired dataset and supervised learning.
figure 11

a Datasets collection. b Network training. c Inference via a trained network. ω the parameters of the neural network, n the sample number of the training dataset
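A minimal PyTorch sketch of the supervised workflow in Fig. 11: `net` stands for any image-to-image network (e.g., a U-Net) and `loader` for a dataset of paired (intensity, phase) examples; both are placeholders, and the mean-squared-error loss corresponds to Eq. (1) given later in the text.

```python
import torch
import torch.nn as nn

def train_supervised(net, loader, epochs=50, lr=1e-4, device="cpu"):
    """Dataset-driven (supervised) training: fit the network to paired (A_i, B_i) examples."""
    net.to(device)
    optimizer = torch.optim.Adam(net.parameters(), lr=lr)
    loss_fn = nn.MSELoss()
    for _ in range(epochs):
        for intensity, phase in loader:                  # paired examples from Fig. 11a
            intensity, phase = intensity.to(device), phase.to(device)
            optimizer.zero_grad()
            loss = loss_fn(net(intensity), phase)        # compare prediction with ground truth
            loss.backward()                              # backpropagate the loss gradient
            optimizer.step()                             # update the network parameters
    return net

# Inference (Fig. 11c): with torch.no_grad(): phase_x = net(intensity_x)
```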

In fact, the rapid progress of deep-learning-based phase recovery has been documented in several excellent review papers. For example, Barbastathis et al.59 and Rivenson et al.62 reviewed how supervised deep learning powers phase retrieval and holographic reconstruction. Zeng et al.63 and Situ et al.64 mainly focused on the use of deep learning in digital holography and its applications. Zhou et al.65 and Wang et al.66 reviewed and compared different strategies for using AI in phase unwrapping. Dong et al.67 introduced a unifying framework for various algorithms and applications from the perspective of phase retrieval and presented its advances in machine learning. Park et al.68 discussed AI-QPI-based analysis methodologies in the context of the life sciences. In contrast, depending on where the neural network is used, we review various methods from the following four perspectives:

  • In the section “DL-pre-processing for phase recovery”, the neural network performs some pre-processing on the intensity measurement before phase recovery, such as pixel super-resolution (Fig. 12a), noise reduction, hologram generation, and autofocusing.

    Fig. 12: Overview example of “deep learning (DL) for phase recovery (PR) and phase processing”.
    figure 12

    a DL-pre-processing for PR. b DL-in-processing for PR. c DL-post-processing for PR. d DL for phase processing

  • In the section “DL-in-processing for phase recovery”, the neural network directly performs phase recovery (Fig. 12b) or participates in the process of phase recovery together with the physical model or physics-based algorithm by supervised or unsupervised learning modes.

  • In the section “DL-post-processing for phase recovery”, the neural network performs post-processing after phase recovery, such as noise reduction (Fig. 12c), resolution enhancement, aberration correction, and phase unwrapping.

  • In the section “Deep learning for phase processing”, the neural network uses the recovered phase for specific applications, such as segmentation (Fig. 12d), classification, and imaging modal transformation.

Finally, we summarize how to effectively use deep learning in phase recovery and look forward to potential development directions (see the section “Conclusion and outlook”). To let readers learn more about phase recovery, we present a live-updating resource (https://github.com/kqwang/phase-recovery).

DL-pre-processing for phase recovery

A summary of “DL-pre-processing for phase recovery” is presented in Table 1 and is described below, including the “Pixel super-resolution”, “Noise reduction”, “Hologram generation”, and “Autofocusing” sections.

Table 1 Summary of “DL-pre-processing for phase recovery”

Pixel super-resolution

A high-resolution image generally reveals more detailed information about the object of interest. Therefore, it is desirable to recover a high-resolution image from one or multiple low-resolution measurements of the same field of view, a process known as pixel super-resolution. Similarly, from multiple sub-pixel-shifted low-resolution holograms, a high-resolution hologram can be recovered by pixel super-resolution algorithms69. Luo et al.70 proposed to use the U-Net for this purpose. Compared with iterative pixel super-resolution algorithms, this deep learning method has an advantage in inference time while ensuring the same level of resolution improvement. It maintains high performance even with a reduced number of input low-resolution holograms.

After the super-resolution CNN (SRCNN) was proposed for single-image super-resolution in the field of image processing71, this type of deep learning method was also used in other optical super-resolution problems, such as bright-field microscopy72 and fluorescence microscopy73. Similarly, this method of inferring high-resolution images from low-resolution versions via deep neural networks can also be used for hologram pixel super-resolution before phase recovery by conventional methods (Fig. 13).

Fig. 13
figure 13

Description of deep-learning-based hologram super-resolution

Byeon et al.74 first applied the SRCNN to hologram pixel super-resolution and named it HG-SRCNN. Compared with a conventional focused-image-trained SRCNN and bicubic interpolation, this method, trained with defocused in-line holograms, infers higher-quality high-resolution holograms.

Noise reduction

Gurrola-Ramos et al.87 proposed an improved network for fringe pattern denoising, in which dense blocks are leveraged to reuse feature layers, local residual learning addresses the vanishing gradient problem, and global residual learning estimates the noise of the image rather than the denoised image directly. Compared with the other neural networks mentioned above, it has lower model complexity while maintaining the highest accuracy.

Hologram generation

As mentioned in the Introduction, multiple intensity maps are needed to recover the phase in many cases, such as phase-shifting holography and axial multi-intensity alternating projection. Given its excellent mapping capability, a neural network can be used to generate other relevant holograms from known ones, thus enabling phase recovery methods that require multiple holograms (Fig. 15). In this approach, the input and output usually belong to the same imaging modality with high feature similarity, so the mapping is easier for the neural network to learn. Moreover, the dataset can be collected simply by experimental recording or simulation, without the need to generate ground truth in advance by conventional phase recovery methods.

Fig. 15: Description of deep-learning-based hologram generation.
figure 15

a Phase-shifting method. b Axial multi-intensity alternating projection method

In the dataset-driven (DD) approach, a large number of paired intensity-phase examples are collected as a training dataset (Fig. 17a) and used to train a neural network (Fig. 17b). After training, the network is used as an end-to-end mapping to infer the phase from the intensity (Fig. 17c). The DD approach thus guides/drives the training of the neural network with this implicit mapping, which is internalized into the neural network as its parameters are iteratively updated.

Fig. 17: Description of dataset-driven network-only phase recovery.
figure 17

a Dataset collection. b Network training. c Inference via a trained network

Sinha et al.114 were among the first to demonstrate this end-to-end deep learning strategy for phase recovery, in which the phase of an object is inferred from the corresponding diffraction image via a trained deep neural network. For dataset collection, they used a phase-only spatial light modulator (SLM) to display images from public datasets as ground-truth phases and placed an image sensor at a certain distance to record the corresponding diffraction images as inputs. The advantage is that both the diffraction image and the phase are known and easily collected in large quantities. Through comparative tests, they verified the adaptability of the deep neural network to unseen types of datasets and different defocus distances. Although this scheme cannot be used in practical applications because of the use of the phase-only spatial light modulator, their pioneering work opened the door to deep-learning-based phase recovery. Subsequently, Li et al.115 introduced the negative Pearson correlation coefficient (NPCC)116 as a loss function to train the neural network and enhanced the spatial resolution by a factor of two by flattening the power spectral density of the training dataset. Deng et al.117 found that the higher the Shannon entropy of the training dataset, the stronger the generalization ability of the trained neural network. Goy et al.118 extended the work to phase recovery under weak-light illumination.

Meanwhile, Wang et al.119 extended the diffraction device of Sinha et al.114 to an in-line holographic device by adding a coaxial reference beam, and used the in-line hologram instead of the diffraction image as the input to a neural network for phase recovery. Nguyen et al.120 applied this end-to-end strategy for Fourier ptychography, inferring the high-resolution phase from a series of low-resolution intensity images via a U-Net, and Cheng et al.121 further used a single low-resolution intensity image under optimized illumination as the neural network input. Cherukara et al.122 extended this end-to-end deep learning strategy to CDI, in which they trained two neural networks with simulation datasets to infer the amplitude or phase of objects from far-field diffraction intensity maps, respectively. Ren et al.123 demonstrated the time and accuracy superiority of this end-to-end deep learning strategy over conventional numerical algorithms in the case of off-axis holography. Yin et al.124 introduced the cycle-GAN to extend this end-to-end deep learning strategy to the application scenario of unpaired datasets. Lee et al.125 replaced the forward generator of the cycle-GAN by numerical propagation, improving the phase recovery robustness of neural networks in highly perturbative configurations. Hu et al.126 applied this end-to-end deep learning strategy to the Shack-Hartmann wavefront sensor, inferring the phase directly from a spot intensity image after the microlens array. Wang et al.127 extended this end-to-end deep learning strategy to TIE, using a trained neural network to infer the phase of the cell object from a defocus intensity image illuminated by partially coherent light. Further, Zhou et al.128 used neural networks to infer high-resolution phase from a low-resolution defocus intensity image. Pirone et al.129 applied this hologram-to-phase deep learning strategy to improve the reconstruction speed of 3D optical diffraction tomography (ODT) from tens of minutes to a few seconds. Chang et al.130 expanded the illumination source from photons to electrons, recovering the phase images from electron diffraction patterns of twisted hexagonal boron nitride, monolayer graphene, and Au nanoparticles. Tayal et al.131 demonstrated the use of data augmentation and a symmetric invariant loss function to break the symmetry in the end-to-end deep learning phase recovery.

In addition to expanding the application scenarios of this end-to-end deep learning strategy, some researchers focused on the performance and advantages of different neural networks in phase recovery. Xue et al.132 applied a Bayesian neural network (BNN) to Fourier ptychography for inferring model uncertainty while doing phase recovery. Li et al.133 applied a GAN for phase recovery, inferring the phase from two symmetric-illumination intensity images. Wang et al.90,134 proposed a one-to-multi CNN, Y-Net90, with which the amplitude and phase of an object can be inferred simultaneously from the input intensity. Zeng et al.135 introduced the capsule network to overcome the information loss in the pooling operation and internal data representation of CNNs. Compared with conventional CNNs, their proposed capsule-based CNN (RedCap) saves 75% of network parameters while ensuring higher holographic reconstruction accuracy. Wu et al.136 applied the Y-Net90 to CDI for simultaneous inference of phase and amplitude. Huang et al.137 introduced a recurrent convolution module into U-Net, trained using a GAN, for holographic reconstruction with autofocusing. Uelwer et al.138 used a cascaded neural network for end-to-end phase recovery. Castaneda et al.139 and Jaferzadeh et al.140 introduced GANs into off-axis holographic reconstruction. Luo et al.141 added dilated convolutions into a CNN, termed mixed-context network (MCN)141, for phase recovery. By comparing in a one-sample-learning scheme, they found that MCN is more accurate and compact than the conventional U-Net. Ding et al.142 added the Swin Transformer143 into U-Net and trained it with low-resolution intensity as input and high-resolution phase as ground truth using a cycle-GAN. The trained neural network can perform phase recovery while enhancing the resolution and has higher accuracy than the conventional U-Net. In CDI, Ye et al.144 used a multi-layer perceptron for feature extraction before a CNN, considering that the far-field (Fourier) intensity images are globally correlated. Chen et al.145,146 combined a spatial Fourier transform module with ResNet, termed the Fourier imager network (FIN), to achieve holographic reconstruction with superior generalization to new types of samples and faster inference (9-fold faster than their previous recurrent neural network and 27-fold faster than conventional iterative algorithms). Shu et al.147 applied neural architecture search (NAS) to automatically optimize the network architecture for phase recovery. Compared with the conventional U-Net, the peak signal-to-noise ratio (PSNR) of their NAS-based network is increased from 34.7 dB to 36.1 dB, and the inference speed is increased by 27-fold.

As a similar deep learning phase recovery strategy in adaptive optics, researchers demonstrated that neural networks could be used to infer the phase of the turbulence-induced aberration wavefront or its Zernike coefficient from the distortion intensity of target objects148. In these applications, only the wavefront subsequently used for aberration correction is of interest, not the RI distribution of turbulence that produces this aberration wavefront.

Physics-driven approach

Different from the dataset-driven approach that uses input-label paired dataset as an implicit prior for neural network training, physical models, such as numerical propagation, can be used as an explicit prior to guide/drive the inference or training of neural networks, termed physics-driven (PD) approach. It only requires measurements of samples as an input-only dataset and is therefore an unsupervised learning mode. On the one hand, this explicit prior can be used to iteratively optimize an untrained neural network to infer the corresponding phase and amplitude from the measured intensity image as input, referred to as the untrained PD (uPD) scheme (Fig. 18a). On the other hand, this explicit prior can be used to train an untrained neural network with a large number of intensity images as input, which then can infer the corresponding phase from unseen intensity images, an approach called the trained PD (tPD) scheme (Fig. 18b).

Fig. 18: Description of physics-driven network-only phase recovery.
figure 18

a Untrained PD (uPD) scheme. b Trained PD (tPD) scheme

In order to more intuitively understand the difference and connection between the DD and PD approaches, let us compare the loss functions in Fig. 17 and Fig. 18:

$$Loss_{DD}=\mathop{\sum }\limits_{i=1}^{n}{\Vert {f}_{\omega }({I}_{i})-{\theta }_{i}\Vert }_{2}^{2}$$
(1)
$$Loss_{uPD}={\Vert {I}_{x}-H({f}_{\omega }({I}_{x}))\Vert }_{2}^{2}$$
(2)
$$Loss_{tPD}=\mathop{\sum }\limits_{i=1}^{n}{\Vert {I}_{i}-H({f}_{\omega }({I}_{i}))\Vert }_{2}^{2}$$
(3)

where \({\Vert \cdot \Vert }_{2}^{2}\) denotes the square of the l2-norm (or another distance function), \({f}_{\omega }(\cdot )\) is a neural network with trainable parameters \(\omega\), \(H(\cdot )\) is a physical model (such as numerical propagation, a Fourier transform, or the Fourier ptychography measurement model), \({I}_{i}\) is the measured intensity image in the training dataset, \({\theta }_{i}\) is the phase in the training dataset, \({I}_{x}\) is the measured intensity image of a test sample, and \(n\) is the number of samples in the training dataset. In Eq. (1) for the DD approach, the priors used for network training are the measured intensity images and the corresponding ground-truth phases. Meanwhile, in Eqs. (2) and (3) for the PD approaches, the priors used for network inference or training are the measured intensity images and the physical model, instead of the phase. It should be noted that the uPD scheme does not require numerous intensity images as a prerequisite but needs numerous iterations for each inference, whereas the tPD scheme completes each inference in a single pass through the trained neural network but requires a large number of intensity images for pre-training.
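The difference between Eqs. (1)-(3) is only in what the network output is compared against, which a short PyTorch sketch makes explicit; `forward_model` stands for a differentiable implementation of H (e.g., numerical propagation) and is a placeholder.

```python
import torch

def loss_dd(net, intensity, phase_gt):
    """Eq. (1): dataset-driven loss against the ground-truth phase."""
    return torch.mean((net(intensity) - phase_gt) ** 2)

def loss_pd(net, intensity, forward_model):
    """Eqs. (2)/(3): physics-driven loss; re-simulate the measurement from the predicted
    phase with the physical model H and compare it with the measured intensity."""
    return torch.mean((intensity - forward_model(net(intensity))) ** 2)

# uPD: minimize loss_pd on a single measurement by iterating over the weights of an
#      untrained network until its output reproduces that measurement.
# tPD: minimize loss_pd summed over many measurements, then reuse the trained network
#      for single-pass inference on unseen measurements.
```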

This PD approach was first implemented in the work on Fourier ptychography by Boominathan et al.149. They proposed it for the high-overlap case, including a scheme that directly uses an untrained neural network for inference (uPD) and a scheme that trains first and then infers (tPD), and demonstrated the former by simulation.

For the uPD scheme, Wang et al.150 used a U-Net-based scheme to iteratively infer the phase of a phase-only object from a measured diffraction image whose defocus distance is known. Their method demonstrates higher accuracy than conventional algorithms (such as GS and TIE) and the DD scheme, at the expense of a longer inference time (about 10 minutes for an input of 256 × 256 pixels). Zhang et al.151 extended this work to the case where the defocus distance is unknown by including it as another unknown parameter together with the phase in the loss function. Yang et al.152,153 found that after expanding the tested samples from phase-only to complex-amplitude, obvious artifacts and noise appeared in the recovered results. Therefore, they proposed adding an aperture constraint to the loss function to reduce the ill-posedness of the problem. Regarding runtime, they pointed out that it would cost as much as 600 hours to infer 3,600 diffraction images with this uPD scheme. Meanwhile, Bai et al.154 extended this approach from a single-wavelength to a dual-wavelength case. Galande et al.155 found that this kind of neural network optimization with a single measured intensity as input lacks information diversity and can easily lead to overfitting of the noise, which can be mitigated by introducing an explicit denoiser. It is worth pointing out that using the object-related intensity image as the neural network input makes it possible to internalize the mapping relationship between intensity and phase into the neural network through pre-training. In addition, some researchers proposed adjustments to the uPD scheme, using the initial phase and amplitude recovered by backward numerical propagation as the neural network input156,157,158, which reduces the burden on the neural network and yields higher inference accuracy.

Although the phase can be inferred from the measured intensity image through an untrained neural network without any ground truth, the uPD scheme inevitably requires a large number of iterations, which excludes its use in many dynamic applications. Therefore, to adapt the PD scheme to dynamic inference, Yang et al.152,153 adjusted their previously proposed uPD scheme to the tPD scheme by pre-training the neural network on a small part of the measured diffraction images and then using the pre-trained neural network to infer the remaining ones. Yao et al.159 trained a 3D version of the Y-Net90 with simulated diffraction images as input, and then used the pre-trained neural network for direct inference or iterative refinement, which is 100 and 10 times faster than conventional iterative algorithms, respectively. Li et al.160 proposed a two-to-one neural network to reconstruct the complex field from two axially displaced diffraction images. They used 500 simulated diffraction images to pre-train the neural network, and then inferred an unseen diffraction image by refining the pre-trained neural network for 100 iterations. Bouchama et al.161 further extended the tPD scheme to Fourier ptychography with low-overlap cases using simulated datasets. Different from the above ways of generating training datasets from natural images or real experiments, Huang et al.162 proposed generating holograms as training datasets from randomly synthesized artificial images with no connection or resemblance to real-world samples. They further trained a neural network with the generated holograms and the tPD scheme, which showed superior external generalization to holograms of real tissues with arbitrary defocus distances. It is worth mentioning that the PD strategy can also be used in computer-generated holography, generating the corresponding hologram from the target phase or amplitude via a physics-driven neural network163,164.

Network-with-physics strategy

Different from the network-only strategy, in the network-with-physics strategy, either the physical model and neural network are connected in series for phase recovery (physics-connect-network, PcN), or the neural network is integrated into a physics-based algorithm for phase recovery (network-in-physics, NiP), or the physical model or physics-based algorithm is integrated into a neural network for phase recovery (physics-in-network, PiN). A summary of the network-with-physics strategy is presented in Table 3 and is described below.

Table 3 Summary of network-with-physics strategy

Physics-connect-network (PcN)

In this scheme, the role of the neural network is to extract and separate the pure phase from an initial estimate that may suffer from spatial artifacts or low resolution, which allows the neural network to perform a simpler task than in the network-only strategy; typically, the initial phase is calculated using a physical model (Fig. 19). This scheme requires paired input-label datasets to train the neural network and therefore belongs to supervised learning.

Fig. 19
figure 19

Description of physics-connect-network phase recovery

Rivenson et al.165 first applied this PcN scheme to holographic reconstruction in 2018. They used numerical propagation to calculate the initial complex field (including real and imaginary parts) from a single intensity-only hologram, which contained twin-image and self-interference-related spatial artifacts, and then used a data-driven trained neural network to extract the pure complex field from the initial estimate. Compared with the axial multi-intensity alternating projection algorithm17,18,19, their PcN scheme reduces the number of required holograms by 2–3 times while improving the computation time by more than three times. Wu et al.166 then extended the depth of field (DOF) based on this work by training a neural network with pairs of randomly defocused complex fields and the corresponding in-focus complex field. Meanwhile, Huang et al.137 proposed the use of a recurrent CNN167 for the PcN scheme and the network-only strategy. They compared the performance of neural networks using either a hologram or an initial complex field as input under the same settings and found that the network-only strategy is more robust for sparse samples, while the PcN scheme demonstrates better inference capabilities on dense samples. Goy et al.118 applied the PcN scheme to phase recovery under weak-light illumination, which is more ill-posed than conventional phase recovery. They showed that the inference performance of the PcN scheme is stronger than that of the network-only strategy under weak-light illumination, especially for dense samples at the extreme photon level (1 photon). Further, Deng et al.168 introduced a VGG-based feature perceptual loss into the loss function for neural network training, which inferred finer details than the NPCC loss function. They also improved the spatial resolution and noise robustness by learning the low-frequency and high-frequency bands separately with two neural networks and synthesizing these two bands into full-band reconstructions with a third neural network169. By introducing random phase modulation, Kang et al.170 further improved the phase recovery ability of the PcN scheme under weak-light illumination. Zhang et al.171 extended the PcN scheme to Fourier ptychography, inferring high-resolution phase and amplitude using the initial phase and amplitude synthesized from the intensity images as input to a neural network. Moon et al.172 extended the PcN scheme to off-axis holography, using numerical propagation to obtain the initial phase from the Gabor hologram as the input to the neural network.

Network-in-physics (NiP)

In this scheme, trained or untrained neural networks are used in physics-based iterative algorithms as denoisers, structural priors, or generative priors. Formulating phase recovery as a general optimization problem, this approach can be expressed as

$${{\arg }}\mathop{\min }\limits_{\theta }{\Vert {I}_{x}-H(\theta )\Vert }_{2}^{2}+R(\theta )$$
(4)

where \(H(\cdot )\) is the physical model, \(\theta\) is the phase, \({I}_{x}\) is the measured intensity image of a test sample, and \(R(\theta )\) is a regularized constraint. According to the Regularization-by-Denoising (RED)173 framework, a pre-trained neural network for denoising can be used as the regularized constraint:

$${{\arg }}\mathop{\min }\limits_{\theta }{\Vert {I}_{x}-H(\theta )\Vert }_{2}^{2}+\lambda {\theta }^{T}[\theta -D(\theta )]$$
(5)

where \(D(\theta )\) is a pre-trained neural network for denoising, and \(\lambda\) is a weight factor to control the strength of regularization. Metzler et al.174 used the above algorithm for phase recovery and called it PrDeep. They used a DnCNN trained on 300,000 pairs of data as a denoiser and FASTA175 as a solver. In comparison with other conventional iterative methods, PrDeep demonstrates excellent robustness to noise. Wu et al.176 proposed an online extension of PrDeep, which processes the data online by using only a random subset of measurements at a time. Bai et al.177 extended PrDeep to incorporate a contrast-transfer-function-based forward operator in \(H(\cdot )\) for phase recovery. Wang et al.178 improved PrDeep by changing the solver from FASTA to ADMM, which further improved the noise robustness. Chang et al.179 used a generalized-alternating-projection solver to further expand the performance of PrDeep and made it suitable for the recovery of complex fields. Işıl et al.180 embedded a trained neural network denoiser into HIO, removing artifacts from the results after each iteration. On this basis, Kumar et al.181 added a total-variation prior together with the denoiser for regularization.
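Under the RED framework, the gradient of the regularizer in Eq. (5) is simply λ(θ − D(θ)), so one iteration of a gradient-based solver can be sketched as below; `forward_model` and `denoiser` are placeholders for a differentiable physical model and a pre-trained denoising network (e.g., the DnCNN used by PrDeep), and the step sizes are illustrative.

```python
import torch

def red_gradient_step(phase, intensity, forward_model, denoiser, step=1e-2, lam=0.1):
    """One gradient step on Eq. (5): data fidelity ||I - H(theta)||^2 plus the RED prior."""
    phase = phase.detach().requires_grad_(True)
    data_fidelity = torch.sum((intensity - forward_model(phase)) ** 2)
    data_fidelity.backward()                                  # gradient of the data-fidelity term
    with torch.no_grad():
        grad = phase.grad + lam * (phase - denoiser(phase))   # add the RED gradient
        return phase - step * grad
```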

In addition, according to the deep image prior (DIP)182,183, even an untrained neural network itself can be used as a structural prior for regularization (Fig. 20):

$${{\arg }}\mathop{\min }\limits_{\omega }{\Vert {I}_{x}-H({g}_{\omega }({z}_{f}))\Vert }_{2}^{2}$$
(6)

where \({g}_{\omega }(\cdot )\) is an untrained neural network with trainable parameters \(\omega\) that usually takes a generative decoder architecture, \({I}_{x}\) is the measured intensity image of a test sample, and \({z}_{f}\) is a fixed vector, which means that the input of the neural network is independent of the sample, and therefore the neural network cannot be pre-trained like the PD approach.

Fig. 20
figure 20

Description of structural-prior network-in-physics phase recovery
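A sketch of the DIP-style optimization in Eq. (6): only the weights of an untrained generator are updated, its input being a fixed random tensor; `generator` and `forward_model` are placeholders and the tensor sizes are illustrative.

```python
import torch

def dip_phase_recovery(generator, intensity, forward_model, n_iter=5000, lr=1e-3):
    """Optimize the weights of an untrained generator g_w so that H(g_w(z_f)) matches
    the single measured intensity (Eq. (6)); no pre-training and no ground truth."""
    z_f = torch.randn(1, 64, 16, 16)                   # fixed, sample-independent input
    optimizer = torch.optim.Adam(generator.parameters(), lr=lr)
    for _ in range(n_iter):
        optimizer.zero_grad()
        phase = generator(z_f)
        loss = torch.mean((intensity - forward_model(phase)) ** 2)
        loss.backward()
        optimizer.step()                               # only the network weights are updated
    return generator(z_f).detach()
```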

This DIP-based approach was first introduced to phase recovery by Jagatap et al.184. They solved Eq. (6) using the gradient descent and projected gradient descent algorithms by optimizing over the trainable parameters \(\omega\), both of which outperform the sparse truncated amplitude flow (SPARTA) algorithm. In follow-up work, they provided rigorous theoretical guarantees for the convergence of their algorithm185. Zhou et al.186 applied this DIP-based algorithm to ODT, alleviating the effects of the missing cone problem. Shamshad et al.187 extended this DIP-based algorithm to subsampled Fourier ptychography, achieving better reconstructions at low subsampling ratios and high noise perturbations. In order to make the algorithm adaptive to different aberrations, Bostan et al.188 added a fully connected neural network with Zernike polynomials as the fixed input, and used it as a second structural prior. In the holographic setting with a reference beam, Lawrence et al.189 demonstrated the powerful information reconstruction ability of the DIP-based algorithm in extreme cases such as low photon counts, beamstop-obscured frequencies, and small oversampling. Niknam et al.190 used the DIP-based algorithm to recover complex fields from an in-line hologram. They further improved the suppression of twin-image artifacts through additional regularization, such as a bounded activation function, weight decay, and parameter perturbation. Ma et al.191 embedded an untrained generative network into the ADMM algorithm to solve phase recovery at low subsampling ratios, and achieved better results than the gradient descent and projected gradient descent algorithms of Jagatap et al.184. Chen et al.192 extended the DIP-based algorithm to Fourier ptychography, in which four parallel untrained neural networks were used to generate the phase, amplitude, pupil aberration, and illumination fluctuation factor correction, respectively.

Similarly, a pre-trained generative neural network can also be used as a generative prior, assuming that the target phase is in the range of the output of this trained neural network (Fig. 21):

$${{\arg }}\mathop{\min }\limits_{z}{\Vert {I}_{x}-H(G(z))\Vert }_{2}^{2}$$
(7)

where \(G(\cdot )\) is a pre-trained fixed neural network that usually takes a generative decoder architecture, \({I}_{x}\) is the measured intensity image of a test sample, and \(z\) is a latent vector to be searched. Due to the use of the generative neural network, the multi-dimensional phase that originally needed to be iteratively searched is converted into a low-dimensional vector, and the solution space is also limited within the range of the trained generative neural network.

Fig. 21
figure 21

Description of generative-prior network-in-physics phase recovery
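In contrast to the DIP case, the generator in Eq. (7) is pre-trained and frozen, and only the low-dimensional latent vector is searched; the sketch below uses placeholder names and an illustrative latent dimension.

```python
import torch

def generative_prior_phase_recovery(generator, intensity, forward_model, n_iter=2000, lr=1e-2):
    """Search the latent vector z so that H(G(z)) matches the measured intensity (Eq. (7))."""
    for p in generator.parameters():
        p.requires_grad_(False)                        # freeze the pre-trained generative prior
    z = torch.randn(1, 128, requires_grad=True)        # low-dimensional latent vector
    optimizer = torch.optim.Adam([z], lr=lr)
    for _ in range(n_iter):
        optimizer.zero_grad()
        loss = torch.mean((intensity - forward_model(generator(z))) ** 2)
        loss.backward()
        optimizer.step()                               # only z is updated
    return generator(z).detach()
```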

Hand et al.193 used the generative prior for phase recovery with rigorous theoretical guarantees for a random Gaussian measurement matrix, showing better performance than SPARTA at low subsampling ratios. Later on, Shamshad et al.

Table 5 Summary of “DL-post-processing for phase recovery”

Noise reduction

In addition to being part of the pre-processing in “Noise reduction” under the section “DL-pre-processing for phase recovery”, noise reduction can also be performed after phase recovery (Fig. 23). Jeon et al.208 applied the U-Net to perform speckle noise reduction on digital holographic images in an end-to-end manner. Their deep learning method takes only 0.92 s for a reconstructed hologram of 2048 × 2048 pixels, while other conventional methods take tens of seconds because they require multiple holograms. Choi et al.209 introduced the cycle-GAN to train neural networks for noise reduction with unpaired datasets. They demonstrated the advantages of this unpaired-data-driven method with tomograms of different cell samples in optical diffraction tomography: the non-data-driven methods either remove coherent noise by blurring the entire image or perform no effective denoising, whereas their method can simultaneously remove the noise and preserve the features of the sample.

Fig. 23
figure 23

Description of deep-learning-based phase noise reduction

Zhang et al.210 first proposed suppressing noise directly on the wrapped phase via a neural network. However, this direct approach may introduce many erroneous jumps in the wrapped phase, which result in larger errors in the unwrapped phase. Thus, Yan et al.211,212 proposed performing noise reduction on the sine and cosine (numerator and denominator) images of the phase via a neural network, and then calculating the wrapped phase from the denoised sine and cosine images with the arctangent function. Almost simultaneously, Montresor et al.213 introduced the DnCNN into speckle noise reduction for phase data via their sine and cosine images. As it is difficult to simultaneously collect phase data with and without speckle noise experimentally, they used a simulator based on a double-diffraction system to numerically generate the dataset. Furthermore, their method yields a standard deviation comparable to that of the windowed Fourier transform (WFT) method and a better peak-to-valley value, while costing less time. Building on this work, Tahon et al.214 designed a dataset (HOLODEEP) for speckle noise reduction under soft conditions and used a shallower network for faster inference. Going further, they released a more comprehensive dataset for conditions of severe speckle noise215. Fang et al.216 applied a GAN to speckle noise reduction for the phase. Murdaca et al.217 applied this deep-learning-based phase noise reduction to interferometric synthetic aperture radar (InSAR)218. The difference is that, in addition to the sine and cosine images of the phase, the neural network also denoises the amplitude images at the same time. Tang et al.219 proposed iteratively reducing the coherent noise in the phase with an untrained U-Net. In the above works, various loss functions were employed alongside the conventional l2-norm and l1-norm to enhance performance. These additional functions include the edge function208, which sharpens the edges of the denoised image, as well as gradient and variance functions219 that further suppress noise while preventing excessive smoothing.
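The numerator/denominator trick described above avoids denoising across the ±π jumps directly; a minimal sketch (with `denoiser` standing for any trained denoising network or classical filter) is:

```python
import numpy as np

def denoise_wrapped_phase(wrapped_phase, denoiser):
    """Denoise a wrapped phase map via its sine and cosine images, then recombine."""
    s = denoiser(np.sin(wrapped_phase))                # numerator image
    c = denoiser(np.cos(wrapped_phase))                # denominator image
    return np.arctan2(s, c)                            # wrapped phase in (-pi, pi] without spurious jumps
```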

Resolution enhancement

Similar to the section “Pixel super-resolution”, resolution enhancement can also be performed after phase recovery as post-processing (Fig. 24). Liu et al.220 first used a neural network to infer the corresponding high-resolution phase from a low-resolution phase. They trained two GANs on a pixel super-resolution system and a diffraction-limited super-resolution system, respectively, and demonstrated them on thin biological tissue slices with an analysis of the spatial frequency spectrum. Moreover, they pointed out that this idea can be extended to other resolution-limited imaging systems, such as using a neural network to build a passage from off-axis holography to in-line holography. Later, Jiao et al.221 proposed to infer the high-resolution noise-free phase from an off-axis-system-acquired low-resolution version with a trained U-Net. To collect the paired dataset, they developed a combined system of diffraction phase microscopy (DPM)222 and spatial light interference microscopy (SLIM)27 to generate both holograms from the same field of view. After training, the U-Net retains the advantages of both the high acquisition speed of DPM and the high transverse resolution of SLIM.

Fig. 24
figure 24

Description of deep-learning-based phase resolution enhancement

Subsequently, Butola et al.223 extended this idea to partially spatially coherent off-axis holography, where the phase recovered with low-numerical-aperture objectives was used as input and the phase recovered with high-numerical-aperture objectives was used as ground truth. Since low-numerical-aperture objectives have a larger field of view, the aim is to obtain higher resolution over a larger field of view, i.e., a higher spatial bandwidth product. Meng et al.224 used structured-illumination digital holographic microscopy (SI-DHM)225 to collect the high-resolution phase as ground truth. To supplement more high-frequency information with two cascaded neural networks, they used the low-resolution phase along with the high-resolution amplitude inferred by the first neural network as the inputs of the second neural network. Subsequently, Li et al.226 extended this resolution-enhancing post-processing method to quantitative differential phase contrast microscopy for high-resolution phase recovery from the fewest experimental measurements. To solve the out-of-memory problem caused by the large size of the input, they split the full-size input into sub-patches. Moreover, they found that the U-Net trained on the paired dataset has a smaller error than the paired GAN and the unpaired GAN; for the GANs, the inferred phase contains more spurious information that is absent from the ground truth. Gupta et al.227 took advantage of the high spatial bandwidth product of this method to achieve a classification throughput of 78,000 cells per second with an accuracy of 76.2%. All these works use U-Net as the basic structure; most of the networks input and output phase maps of the same size and thus have equal numbers of downsampling and upsampling steps, whereas for applications where the input is smaller than the output227, the network has more upsampling steps.

For ODT, due to the limited projection angle imposed by the numerical aperture of the objective lens, there are certain spatial frequency components that cannot be measured, which is called the missing cone problem. To address this problem via a neural network, Lim et al.228 and Ryu et al.229 built a 3D RI tomogram dataset for 3D U-Net training, in which the raw RI tomograms with poor axial resolution were used as input, and the resolution-enhanced RI tomograms from the iterative total variation algorithm were used as ground truth. The trained 3D U-Net can infer the high-resolution version directly from the raw RI tomograms. They demonstrated the feasibility and generalizability using bacterial cells and a human leukemic cell line. Their deep-learning-based resolution-enhanced method outperforms conventional iterative methods by more than an order of magnitude in regularization performance.

Aberration correction

For holography, especially in the off-axis case, the lens and the unstable environment of the sample introduce phase aberrations that are superimposed on the phase of the sample. To recover the pure phase of the sample, the unwanted phase aberrations should be eliminated physically or numerically. Physical approaches compensate for the phase aberrations by recovering the background phase, without the sample, from another hologram, which requires additional setups and adjustments230,231.

As for numerical approaches, compensation of the phase aberrations can be achieved directly by Zernike polynomial fitting (ZPF)232 or principal-component analysis (PCA)233. Yet, in these numerical methods, the aberration is estimated from the whole phase map, although the object area should not be treated as aberration. Thus, before Zernike polynomial fitting, a neural network can be used to separate the object area from the background area, so that the fit is restricted to the background and the compensation effect is improved (Fig. 25). This segmentation-based idea, namely CNN + ZPF, was first proposed by Nguyen et al.234 in 2017. They manually made binary masks as ground truth for each phase map to distinguish the background and sample areas. After comparison on different real samples, they found that the compensated result of CNN + ZPF has a flatter background than that of PCA. However, the aberration in the initial phase makes segmentation more difficult, because the phase distribution of the boundary features is already weak, especially for large tilted phase aberrations. To address this problem, Ma et al.235 proposed to perform segmentation with the hologram rather than the phase as the network input. Lin et al.236 applied CNN + ZPF to real-time phase compensation with a phase-only SLM.
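The fitting step of CNN + ZPF can be sketched as follows, assuming a segmentation network has already produced a binary background mask; for brevity, a low-order polynomial surface stands in for the Zernike basis used in the cited works, and all names are our own.

```python
import numpy as np

def remove_background_aberration(phase, background_mask):
    """Fit a low-order polynomial surface (a stand-in for a Zernike basis)
    to the background pixels only, then subtract it from the whole phase map.

    phase:           (H, W) recovered phase containing aberration.
    background_mask: (H, W) boolean mask, True where there is no sample
                     (e.g., predicted by a segmentation CNN).
    """
    h, w = phase.shape
    yy, xx = np.mgrid[0:h, 0:w]
    x = (xx / (w - 1)) * 2 - 1          # normalize coordinates to [-1, 1]
    y = (yy / (h - 1)) * 2 - 1
    # Basis: piston, tilts, and defocus/astigmatism-like quadratic terms
    basis = np.stack([np.ones_like(x), x, y, x * y, x**2, y**2], axis=-1)
    A = basis[background_mask]          # (N_bg, 6) design matrix
    b = phase[background_mask]          # (N_bg,) background phase values
    coeffs, *_ = np.linalg.lstsq(A, b, rcond=None)
    aberration = basis @ coeffs         # evaluate the fit over the full field
    return phase - aberration

# Hypothetical usage: phase and mask would come from recovery + segmentation
phase = np.random.rand(128, 128)
mask = np.ones((128, 128), dtype=bool)
corrected = remove_background_aberration(phase, mask)
```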

Fig. 25: Description of deep-learning-based phase aberration correction

In addition to the CNN + ZPF approach, …

Phase unwrapping

In interferometric and optimization-based phase recovery methods, the recovered light field is obtained in complex-exponential form. Hence, the calculated phase is limited to the range (-π, π] on account of the arctangent function, and the information of the sample cannot be obtained unless the absolute phase is first estimated from the wrapped phase, a process known as phase unwrapping. In addition to phase recovery, the phase unwrapping problem also arises in magnetic resonance imaging240, fringe projection profilometry241, and InSAR. Most conventional methods rely on the phase continuity assumption, and cases such as noise, breakpoints, and aliasing violate the Itoh condition and degrade their performance242. The advent of deep learning has made it possible to perform phase unwrapping in such cases. According to how the neural network is used, deep-learning-based phase unwrapping methods can be divided into three categories (Fig. 26)66. The deep-learning-performed regression method (dRG) estimates the absolute phase directly from the wrapped phase with a neural network (Fig. 26a)243,244,245,246,247,248,249,250,251,252,253,254,255,256. The deep-learning-performed wrap count method (dWC) first estimates the wrap count from the wrapped phase with a neural network and then calculates the absolute phase from the wrapped phase and the estimated wrap count (Fig. 26b)210,257,258,259,260,261,262,263,264,265,266,267. The deep-learning-assisted method (dAS) first estimates the wrap count gradient or the phase discontinuity from the wrapped phase with a neural network; it then either reconstructs the wrap count from the wrap count gradient and calculates the absolute phase as in dWC268,269, or directly uses optimization-based or branch-cut algorithms to obtain the absolute phase from the wrap count gradient or the discontinuity (Fig. 26c)270,271,272,273,274.

Fig. 26: Description of deep-learning-based phase unwrapping.

a Deep-learning-performed regression method. b Deep-learning-performed wrap count method. c Deep-learning-assisted method
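The relation between the absolute phase, the wrapped phase, and the integer wrap count, as well as the Itoh condition underlying conventional unwrapping, can be illustrated with a short numpy example (the quadratic test phase is arbitrary):

```python
import numpy as np

# A 1D absolute phase ramp that exceeds the (-pi, pi] range
x = np.linspace(0, 1, 512)
phase_abs = 18.0 * x**2                             # hypothetical smooth absolute phase

# Wrapping: what an arctangent-based recovery would return
phase_wrapped = np.angle(np.exp(1j * phase_abs))    # values in (-pi, pi]

# The integer wrap count links the two: phase_abs = phase_wrapped + 2*pi*k
k = np.round((phase_abs - phase_wrapped) / (2 * np.pi)).astype(int)
assert np.allclose(phase_abs, phase_wrapped + 2 * np.pi * k)

# Itoh condition: neighboring absolute-phase differences must stay below pi
# for conventional (continuity-based) unwrapping to succeed.
print("Itoh condition satisfied:", np.all(np.abs(np.diff(phase_abs)) < np.pi))
```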

Deep-learning-performed regression method (dRG)

Dardikman et al.243 presented the dRG method, which utilizes a residual-block-based CNN trained on a dataset of simulated cells with steep phase gradients. They also validated the dRG method, post-processed by congruence, on actual cells and compared it with the performance of the dWC method244. Then, Wang et al.245 introduced the U-Net and a phase-simulation generation method into the dRG method; they evaluated the trained network on real samples, examined its generalization ability through middle-layer visualization, and demonstrated the superiority of the dRG method over conventional methods in noisy and aliased cases. In the same year, He et al.246 and Ryu et al.247 evaluated the ability of a 3D-ResNet and a recurrent neural network (ReNet) to perform phase unwrapping on magnetic resonance imaging data. Dardikman et al.248 released their real-sample dataset as open source and demonstrated that congruence can enhance the accuracy and robustness of the dRG method, particularly when dealing with a limited number of wrap counts. Qin et al.249 utilized a higher-capacity Res-UNet to achieve higher accuracy and introduced two new evaluation indices. Perera et al.250 and Park et al.251 introduced the long short-term memory (LSTM) network and the GAN into phase unwrapping. Zhou et al.252,275 enhanced the robustness and efficiency of the dRG method by adding pre- and post-processing steps to a U-Net with an EfficientNet275 backbone. Xu et al.253 improved the accuracy and robustness of the U-Net by adding more middle layers and skip connections and by using a composite loss function. Zhou et al.254 used a GAN for InSAR phase unwrapping and avoided blur in the unwrapped phase by combining the l1 loss and the adversarial loss. The authors of ref. 255 trained four networks for different noise levels, making each network more focused on a specific noise level. Zhao et al.256 added a weight map as a prior to the neural network to make it focus on the area near phase jumps, similar to an additional attention mechanism. Different from the above methods, Vithin et al.276,277 proposed to use the Y-Net90 to infer the phase gradients from a wrapped phase and then calculate the absolute phase.
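The congruence post-processing mentioned above can be written in a few lines: the network's raw prediction fixes the wrap count, while the measured wrapped phase restores the exact fractional part. This is a sketch with our own variable names, not code from the cited works.

```python
import numpy as np

def congruence(unwrapped_pred, wrapped):
    """Snap a network-predicted absolute phase to values congruent with the
    wrapped phase modulo 2*pi: keep the predicted wrap count, but take the
    fractional phase from the measurement."""
    k = np.round((unwrapped_pred - wrapped) / (2 * np.pi))
    return wrapped + 2 * np.pi * k
```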

Deep-learning-performed wrap count method (dWC)

Liang et al.257 and Spoorthi et al.258 first proposed this idea in 2018. Spoorthi et al.258 proposed a phase dataset generation method based on adding and subtracting Gaussian functions with randomly varying mean and variance values, and used clustering-based smoothness to alleviate the classification imbalance of the SegNet. Further, they improved the prediction accuracy by introducing priors on the absolute phase values and gradients into the loss function, a version they called Phase-Net 2.0259. Zhang and Liang et al.210,260 sequentially used three networks to perform phase unwrapping through wrapped-phase denoising, wrap count prediction, and post-processing. In addition, they proposed to generate a phase dataset by the weighted addition of Zernike polynomials of different orders. Immediately after, Zhang and Yan et al.261 verified the performance of DeepLab-V3+, but the resulting wrap count still contained a small number of wrong pixels, which would propagate errors through the whole phase map in the conventional phase unwrapping process; they thus proposed a refinement step to correct the wrong pixels. To further improve the unwrapped phase, Zhu et al.262 proposed to use a median filter as a second post-processing step to correct wrong pixels in the wrap count predictions. Wu et al.263 enhanced a simulated phase dataset by adding noise from real data, and combined a full-resolution residual network (FRRNet) with the U-Net to further improve performance in Doppler optical coherence tomography; on real data, their network achieved higher accuracy than Phase-Net and DeepLab-V3+. To apply dWC to a point diffraction interferometer, Zhao et al.264 proposed an image-analysis-based post-processing method to alleviate the classification imbalance of the task and adopted an iterative-closest-point stitching method to realize dynamic resolution. Vengala et al.90,265,266 used the Y-Net90 to reconstruct the wrap count and the pure wrapped phase simultaneously. Zhang et al.267 added atrous spatial pyramid pooling (ASPP), positional self-attention (PSA), and an edge-enhanced block (EEB) to the U-Net to obtain higher accuracy and stronger robustness than the networks used in the above methods. Huang et al.278 applied the HRNet to the dWC method; it still needs a median filter for post-processing, although its performance is better than that of Phase-Net and DeepLab-V3+. Wang et al.279 proposed another EEB, based on Laplacian and Prewitt edge-enhancement operators, which further enhances classification accuracy and avoids the use of post-processing.
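A minimal sketch of the dWC reconstruction step is given below, assuming a network has already predicted an integer wrap count map; the optional median filter mirrors the kind of post-processing used to correct isolated wrong pixels262, although the exact procedure in that work may differ.

```python
import numpy as np
from scipy.ndimage import median_filter

def unwrap_from_wrap_count(wrapped, wrap_count, clean=True):
    """dWC reconstruction: absolute phase = wrapped phase + 2*pi*wrap count.

    wrapped:    (H, W) wrapped phase in (-pi, pi].
    wrap_count: (H, W) integer wrap count predicted by a network.
    clean:      if True, median-filter the wrap count to suppress isolated
                misclassified pixels before reconstruction.
    """
    if clean:
        wrap_count = median_filter(wrap_count, size=3)
    return wrapped + 2 * np.pi * wrap_count
```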

Deep-learning-assisted method (dAS)

Conventional methods estimate the wrap count gradient under the phase continuity assumption and are therefore disturbed by unfavorable factors such as noise. To avoid this, Zhou et al.270 proposed to estimate the wrap count gradient with a neural network instead of conventional methods. Since the noisy wrapped phase and the corresponding correct wrap count gradient are used as the training dataset, the trained neural network is able to estimate the correct wrap count gradient from the noisy wrapped phase without being limited by the phase continuity assumption. The correct result can then be obtained by minimizing the difference between the gradients of the unwrapped phase and the network-output wrap count gradient. Further, Wang et al.271 proposed to input a quality map as a prior, together with the wrapped phase, into the neural network to improve the accuracy of the estimated wrap count gradient. Almost simultaneously, Sica et al.268 directly reconstructed the wrap count from the network-output wrap count gradient and then calculated the absolute phase, as in dWC. On this basis, Li et al.269 improved the estimation efficiency of the neural network by using a single fused gradient instead of separate vertical and horizontal gradients. In addition to estimating the wrap count gradient with a neural network, Wu et al.272,273 chose to estimate the horizontal and vertical discontinuities with a neural network and recover the absolute phase with optimization-based algorithms. Instead of using the wrapped phase as the network input, Zhou et al.274 embedded the neural network into the branch-cut algorithm to predict the branch-cut map from the residue image, which reduces the computational cost of the branch-cut algorithm.
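A simplified 1D sketch of the dAS idea follows: the absolute-phase gradient equals the wrapped-phase gradient plus 2π times the wrap count gradient, and integrating it recovers the absolute phase up to a constant. Practical 2D methods replace the naive cumulative sum used here with least-squares or branch-cut solvers; the synthetic test phase and all names are our own assumptions.

```python
import numpy as np

def unwrap_1d_from_gradients(wrapped, wrap_count_gradient):
    """Simplified 1D dAS-style reconstruction.

    The gradient of the absolute phase equals the gradient of the wrapped
    phase plus 2*pi times the wrap count gradient; integrating it (here by a
    cumulative sum) recovers the absolute phase up to a constant.
    """
    grad_wrapped = np.diff(wrapped)
    grad_abs = grad_wrapped + 2 * np.pi * wrap_count_gradient
    return np.concatenate([[wrapped[0]], wrapped[0] + np.cumsum(grad_abs)])

# Self-consistency check with a synthetic example
phase_abs = np.linspace(0, 25, 400)
wrapped = np.angle(np.exp(1j * phase_abs))
k = np.round((phase_abs - wrapped) / (2 * np.pi))
recovered = unwrap_1d_from_gradients(wrapped, np.diff(k))
assert np.allclose(recovered, phase_abs)
```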