Introduction

Communication is the basis of daily life and modern civilization. A channel is the path through which information is transferred from one terminal to another. How to transmit information through a channel optimally has been well studied in information theory [1, 2]. In this context, optimal means that the obtained code determines the information unambiguously, isolating it from all others in the set, and consists of a minimal number of symbols. Information theory also provides methods to separate real information from noise and to determine the channel capacity required for optimal transmission at a given transmission rate. Note that in information theory all information must first be converted into sequences of symbols along the temporal dimension. Imaging, by contrast, is a form of two- or higher-dimensional information communication. To enhance imaging capabilities, an understanding of the imaging channel is essential. Conventionally, an imaging channel comprises a lens with free space or other light guides on both sides. Since the transfer function of each part of the optical path is usually known, the response of a conventional imaging channel can be well defined. If the lens is replaced with a thin scattering medium, such as a diffuser, the image can still be extracted from the detected optical field [3,4,5,6]. This suggests that the scattering medium retains or reconstructs transmission channels not only for energy (optical intensity) but also for information (imaging). Some researchers have explored the transmission channels in this process [7, 8], but their work has largely concerned energy only. As for the characteristics of information (imaging) transmission channels in a scattering medium, and whether and how they differ from those in a conventional imaging system, no study, to the best of our knowledge, has been reported to date.

Deep learning has been widely applied in many fields, including imaging through scattering media, owing to its data-driven, physical-model-free nature [9,10,11,12,16]; in the case here, however, no image information could be successfully extracted or reconstructed, as discussed in detail in the next section. For comparison, we also loaded a random phase map sharing the same statistical features as the diffuser onto the SLM and then repeated the grid experiment. It should be noted that the recorded intensity patterns were first cropped to 1024 × 1024 arrays and then further down-sampled to 512 × 512 arrays for network training and testing, to speed up the deep learning.
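The crop-and-downsample preprocessing can be sketched as follows. This is a minimal sketch assuming a center crop and 2 × 2 intensity binning for the down-sampling; the text does not specify the crop position or resampling method, so both are assumptions here.

```python
import numpy as np

def preprocess(frame, crop=1024, out=512):
    """Center-crop a raw camera frame to crop x crop pixels,
    then bin blocks of (crop//out) x (crop//out) pixels by averaging
    to obtain an out x out array for network training/testing."""
    h, w = frame.shape
    r0, c0 = (h - crop) // 2, (w - crop) // 2
    patch = frame[r0:r0 + crop, c0:c0 + crop].astype(float)
    f = crop // out  # binning factor (2 for 1024 -> 512)
    return patch.reshape(out, f, out, f).mean(axis=(1, 3))
```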

Fig. 2

Experimental results of different channels transmitting information under coherent and incoherent illumination. Two digits, “5” and “8”, are used as the ground truths. For each case, the recorded patterns and the corresponding reconstructed images are shown side by side. (Upper panels) Under coherent illumination, the recorded patterns after the aperture are the diffraction patterns of the input objects, and the trained network can reconstruct the corresponding images. After inserting the diffuser, the camera records high-contrast speckle patterns, and the aperture channel is encoded by the randomly distributed refractive index of the diffuser. Information can still be delivered to the detection plane and extracted by deep learning. (Lower panels) Under incoherent illumination, no image can be extracted from the pattern recorded through the aperture. But with the diffuser present, images can again be well predicted from the speckle patterns, although the speckles are of lower contrast than those formed under coherent illumination. The rain effect on the recorded patterns in the aperture case, i.e., the appearance of rain falling on a window, is due to the rotation of the diffuser used to generate the pseudo-thermal source.

The same UNet as used in Ref. [16] was adopted for image extraction in this study. The inputs of the network were the preprocessed 512 × 512 recorded patterns, which were not necessarily speckle patterns. The outputs of the trained network were reconstructed images with an array size of 512 × 512. The UNet was implemented in Python with the Keras/TensorFlow 2.0 library and trained on an NVIDIA RTX 3060 laptop GPU. The total number of training epochs was 50, with an initial learning rate of 2 × 10⁻⁴. If the loss did not decrease over 5 epochs, the learning rate was reduced to one tenth of its previous value, down to a minimum of 2 × 10⁻⁶. If the loss did not decrease over 15 epochs, training was terminated.
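This plateau schedule corresponds to Keras's standard `ReduceLROnPlateau` and `EarlyStopping` callbacks. The self-contained sketch below reproduces its logic in plain Python; the function name and the exact interaction of the two patience counters are our assumptions, since the text does not spell them out.

```python
def plateau_lr_schedule(epoch_losses, lr0=2e-4, factor=0.1,
                        patience=5, min_lr=2e-6, stop_patience=15):
    """Return (per-epoch learning rates, early-stop epoch or None).

    Mimics the training schedule described in the text: reduce the
    learning rate tenfold after `patience` epochs without loss
    improvement (floored at min_lr), and stop training entirely after
    `stop_patience` epochs without improvement."""
    lr, best, wait, lrs = lr0, float("inf"), 0, []
    for epoch, loss in enumerate(epoch_losses):
        lrs.append(lr)
        if loss < best:
            best, wait = loss, 0
        else:
            wait += 1
            if wait >= stop_patience:
                return lrs, epoch          # early stop
            if wait % patience == 0:
                lr = max(lr * factor, min_lr)
    return lrs, None
```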

For a spatially shift-invariant optical processing system under incoherent illumination, the output intensity pattern is [28]

$$I\left(x,y\right)=\iint {\left|h\left(x-{x}^{\prime },y-{y}^{\prime}\right)\right|}^2{\left|E\left({x}^{\prime },{y}^{\prime}\right)\right|}^2{dx}^{\prime }{dy}^{\prime }.$$
(1)

It is a convolution of the input intensity |E(x, y)|² with the intensity impulse response |h(x, y)|². Then, for an optical imaging system with spatial shift-invariance, the output is [29,30,31]

$$I\left(x,y\right)=O\left(x,y\right)\ast PSF\left(x,y\right)={\int}_{-\infty}^{\infty }O\left({x}^{\prime },{y}^{\prime}\right) PSF\left(x-{x}^{\prime },y-{y}^{\prime}\right)d{x}^{\prime }d{y}^{\prime },$$
(2)

where ∗ denotes the convolution operation, O(x, y) is an intensity object and PSF(x, y) = |h(x, y)|² is the PSF of the imaging system. Under incoherent illumination, the intensity pattern on the camera plane thus equals the convolution of the object and the PSF. That is to say, a shift-invariant system can provide a valid channel, through which information about the object is delivered and encoded in the detected pattern. However, the experiment on the shift-invariant grid system tells a different story: without a valid channel to transmit information, the recorded pattern is not the calculated convolution of the object and the PSF, and no valid information is encoded in the recorded pattern.
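The convolution on the right-hand side of Eq. (2) can be computed directly, which is how the "calculated convolution" reference pattern is obtained independently of the physical measurement. A minimal sketch using the FFT convolution theorem (circular boundary conditions assumed):

```python
import numpy as np

def incoherent_image(obj, psf):
    """Eq. (2): camera-plane intensity as the convolution of the
    intensity object O(x, y) with the intensity PSF |h(x, y)|^2,
    evaluated via the FFT convolution theorem (circular boundaries)."""
    return np.real(np.fft.ifft2(np.fft.fft2(obj) * np.fft.fft2(psf)))
```

With a delta-function PSF the system images perfectly, so the output reproduces the object; a broad PSF blurs it.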

To test whether the recorded pattern and the calculated convolution pattern are equal, their structural similarity (SSIM) was calculated [32].

$$SSIM\left(x,y\right)=\frac{\left(2{\mu}_x{\mu}_y+{c}_1\right)\left(2{\sigma}_{xy}+{c}_2\right)}{\left({\mu}_x^2+{\mu}_y^2+{c}_1\right)\left({\sigma}_x^2+{\sigma}_y^2+{c}_2\right)},$$
(3)

where μx and μy are the means and \({\sigma}_x^2\) and \({\sigma}_y^2\) the variances of x and y, respectively, and σxy is the covariance of x and y. c1 = (k1L)² and c2 = (k2L)² are constants used to maintain stability and avoid division by zero [33], and L = 2^Bit − 1 is the dynamic range of the pixel values; for 8-bit data, L = 255. Following common practice in image comparison, k1 is set to 0.01 and k2 to 0.03 [34].
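Eq. (3) applied globally to two arrays can be sketched as below (a single global SSIM score; the windowed, averaged variant common in image-quality toolboxes is a refinement of the same formula):

```python
import numpy as np

def ssim(x, y, bit_depth=8, k1=0.01, k2=0.03):
    """Global SSIM of Eq. (3) between two equal-sized arrays."""
    L = 2 ** bit_depth - 1           # dynamic range (255 for 8-bit data)
    c1, c2 = (k1 * L) ** 2, (k2 * L) ** 2
    mx, my = x.mean(), y.mean()      # means
    vx, vy = x.var(), y.var()        # variances
    cov = ((x - mx) * (y - my)).mean()  # covariance
    return ((2 * mx * my + c1) * (2 * cov + c2)) / \
           ((mx ** 2 + my ** 2 + c1) * (vx + vy + c2))
```

An image compared with itself gives SSIM = 1, the maximum value.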

Simulation based on wave optics was also conducted to verify the hypothesis, since the simulation is noise-free. The configuration and parameters were the same as those in the experiment. In the simulation, we created 500 frames of independent speckle illuminations on the object. For each speckle illumination, the propagation of the complex field was calculated with the angular spectrum method, and the corresponding intensity distribution on the camera plane was recorded. The final intensity pattern was the average of all 500 frames. It should be borne in mind that the intensity pattern obtained from simulation is the response of a physical process, whereas the convolution of an object and a PSF is merely a mathematical operation. The mathematical operation is independent of the physical process: it can still be carried out even when the physical process cannot happen.
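The simulation pipeline can be sketched as follows: an angular-spectrum propagator, applied to the object field under many independent random-phase (pseudo-thermal) illuminations, with the resulting intensities averaged. Function names, the uniform random-phase model, and the evanescent-wave cutoff are our assumptions; the text specifies only the angular spectrum method and 500-frame averaging.

```python
import numpy as np

def angular_spectrum(field, wavelength, dx, z):
    """Propagate a square complex field a distance z (meters) via the
    angular spectrum method; dx is the sampling pitch (meters)."""
    n = field.shape[0]
    fx = np.fft.fftfreq(n, d=dx)
    FX, FY = np.meshgrid(fx, fx)
    arg = 1.0 / wavelength ** 2 - FX ** 2 - FY ** 2
    kz = 2 * np.pi * np.sqrt(np.maximum(arg, 0.0))  # drop evanescent waves
    return np.fft.ifft2(np.fft.fft2(field) * np.exp(1j * kz * z))

def pseudo_thermal_pattern(obj_amp, wavelength, dx, z, frames=500, seed=None):
    """Average the camera-plane intensity over `frames` independent
    random-phase illuminations of the object amplitude."""
    rng = np.random.default_rng(seed)
    acc = np.zeros(obj_amp.shape, dtype=float)
    for _ in range(frames):
        phase = rng.uniform(0.0, 2 * np.pi, obj_amp.shape)
        out = angular_spectrum(obj_amp * np.exp(1j * phase), wavelength, dx, z)
        acc += np.abs(out) ** 2
    return acc / frames
```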

Results

Under coherent illumination, the trained UNet can reconstruct the image well (Fig. 2), showing that the aperture is a valid channel to transmit image information from the object plane to the camera plane. After inserting the diffuser, although the recorded patterns are seemingly random speckles, images can still be predicted well by the network, which means that image information is still delivered to the detection plane; that is to say, the channel functions properly. The main difference is that the refractive index distribution of the diffuser encodes the aperture channel, as also inferred from Ref. [43], and each phase pinhole can image the object independently under incoherent illumination. On the other hand, channel bandwidth is critical in determining the quality of the reconstructed image. Scattering media, like apertures, lenses and other optical components, construct imaging channels, and different channels in the medium have different capacities. Understanding imaging from the information angle may bring breakthroughs to some long-pursued but as yet unsolved problems.

Currently, all prevailing techniques for imaging through scattering media try to find the connection between an input and the corresponding output of a scattering medium. In scattering matrix measurement and speckle autocorrelation imaging, mathematical relations are given. In optical phase conjugation, an empirical relation based on optical reciprocity is the key. These relations are oversimplified, involving no internal physical process; the scattering medium remains a black box in all these techniques. We believe that cracking the black box and understanding what happens inside is the right route to eventually realizing imaging through thick scattering media. The discovery of different channels is a first step toward unveiling information transmission in imaging through scattering media, and it may inspire new explorations in the field. Besides, the failure of the convolution law for the shift-invariant grid systems implies that there are deeper, unexplored mechanisms underlying spatial shift-invariance.

Conclusions

In summary, deep learning is applied as a criterion to check whether information is encoded in a pattern, and as a tool to extract that information, if any, from the pattern to reconstruct an image. Aided by deep learning, we find that imaging channels are sensitive to illumination modes, which determine the manner of source coding in optical imaging, and that there are different types of channels in a scattering medium. Under coherent illumination, the channel is the whole space within the aperture, while the index distribution of the medium inside the aperture encodes the channel. Under incoherent illumination, the encoded aperture channel ceases to function, but a new type of channel formed by the microstructures of the scattering medium is activated to transmit information. It is further confirmed that the microstructures constructing the channels have their own characteristics, which are not shared by the phase grids. Without valid channels, information cannot be transmitted through a medium even if it constitutes a shift-invariant system. These results refresh our understanding of scattering, imaging and spatial shift-invariance. They may also inspire further investigation into and applications of deep learning as a powerful tool to study unknown physical principles and mechanisms, the modeling of transmission channels in scattering media, and the deeper principles beneath spatial shift-invariance.