
1 Introduction

In recent years, studies of deep-learning-based single-image super-resolution (SR) have produced impressive results, shifting the conversion of low-resolution (LR) images to high-resolution (HR) images away from classic computer vision techniques. In the digital field, the resolution of an image describes its level of detail: HR images are associated with greater imaging precision and detail. SR is the estimation of an HR image from an LR image [10].

Currently, images or videos produced from ordinary 2D capture are often blurred because of the influence of optical components, such as the lens or aperture, and unstable shooting of distant scenes. Problems associated with LR imaging are widely discussed in contemporary research: LR images impair visual perception and hamper identification. However, current applications of SR focus primarily on 2D images, such as machine-learning-based HR solutions and super-resolution convolutional neural networks (SRCNNs).

At present, SR is used to enhance the definition of images captured at long distances, reducing blur and increasing precision. HR technologies [11] employ multiple observation images to rebuild or restore imagery for facial recognition, satellite and aerial imaging [14], digital media content, and remote monitoring for security or defense. These technologies can also exploit specific biological features in observations, such as medical pathology images, to enhance recognition precision [6, 12].

SR-related studies have focused on three major fields, namely conventional filtering, training-based methods, and neural networks such as SRCNNs and super-resolution generative adversarial networks (SRGANs). Irani and Peleg proposed an iterative algorithm that applies SR to a single image without increasing the sampling rate, thereby reducing blur. The approach also accommodates translation, rotation, and perspective transformation by applying multiple motion models to an image and dividing the image into multiple areas; within each area, several uniform motions can be used to further enhance resolution [9]. Shi et al. used an efficient sub-pixel convolution method to super-resolve single images and videos in real time, emphasizing high-performance CNN computation and reporting gains of +0.15 dB for images and +0.39 dB for videos [13].

In 2016, the SRCNN was extended to explore different network structures and parameter settings, achieve a trade-off between performance and speed, handle three color channels simultaneously, and demonstrate improved overall quality [4]. ProGanSR introduced further progress in both architecture and training: the network upsamples images in intermediate steps, the training process is organized according to increasing learning difficulty, and the overall design follows an incremental approach [15].

This approach scales to higher upsampling factors (8×), requiring only 4.5 s to produce an 8× result; moreover, it improves reconstruction quality for all upsampling factors while running approximately 5× faster than the original method.

The aforementioned SR studies have focused on 2D imaging and 2D SRCNN computation, whereas 360° camera devices for capturing images or videos have gradually increased in popularity. However, viewing the entirety of a 360° image on a flat screen remains difficult; viewing the full image requires a dedicated viewer or a virtual reality (VR) device.

When the output reaches the computer, the resolution is as high as 4 or 6 K and is displayed in a 360° equirectangular mode, with the image content presented as a spherical panorama or a cube. Imaging with a resolution of 16 K approaches the resolution of the human eye; however, devices with a 16 K resolution are not currently available. The highest available resolution is approximately 8 K, so HR enhancement is still required when such images are viewed at close range in a head-mounted display (HMD).

This study investigated the display of Holo360 images by applying SRGAN in an HMD. SRGAN technology was applied to Holo360 images, and the training and inference of relevant datasets were explored to enhance the VR display resolution. The resulting 360° HR equirectangular images were viewed using an HMD. VR devices with three degrees of freedom (DOF) enable Holo360 imaging to achieve higher SR sharpness and improved stereopsis effects.

The study investigated the training and testing of Holo360 SRGAN VR projection and offers the following three overall contributions:

  1.

    In this study, new deep learning methods were established for the Holo360 SRGAN and compared with the original SRGAN. Furthermore, optical verification was used to evaluate whether the sharpness and noise of the Holo360 SRGAN output achieved the required standard. The comparison of outcomes also provided recommendations for the best network structure when designing a Holo360 SRGAN.

  2.

    The study proposed a generative adversarial network (GAN) for super-resolving Holo360 images displayed using an HMD. The network can directly project the LR Holo360 image into the HMD and optimize the image to reveal more detail.

  3.

    A Universal Windows Platform (UWP) application was used to simulate the viewing experience in the HMD; the imaging technology achieved fine quality and fast computation speed.

2 Literature Review

2.1 SR

SR was proposed by Gerchberg in 1974: a new iterative phase-retrieval method [5] achieved SR and enhanced the resolution of data objects by reducing error, using successive continuation of the object's spectrum to enhance the resolution of area objects. Image SR can be divided into three types: traditional filtering methods, training-based methods, and neural network approaches. Both classic computer vision approaches and deep-learning and GAN approaches to SR reconstruct an HR image from multiple LR images or a single LR image. SR is mostly used for satellite observation imaging and medical imaging and is increasingly based on deep learning, in which a neural network directly learns the point-to-point mapping from the LR image to the SR image and produces a higher-resolution result [10].

2.2 GANs

In 2014, Goodfellow et al. proposed the GAN machine learning model, a deep learning model that is one of the most promising algorithms in unsupervised learning. GANs use a small amount of real data to generate a large amount of training data and are currently among the most popular machine learning approaches in artificial intelligence.

The design concept of a GAN comprises two neural networks: a data generator and a data discriminator. Training two competing neural networks is a compelling and powerful technique. The generator learns to produce large quantities of data similar to the real data in order to fool the discriminator, whereas the discriminator continuously learns to better distinguish real data from the generator's forgeries. Once trained, the generator can produce data that closely resembles the real data; such generated data can compensate for a lack of real images during training and allow the equivalent training to be completed. Broadening the training by simulating situations the discriminator has not encountered improves the model's discrimination ability and learning speed; this not only improves model accuracy but also yields high-quality training data and more favorable model effects. In 2017, Ledig et al. conducted an SRGAN experiment targeting superior perceptual performance, optimized with a new perceptual loss. They performed an extensive mean opinion score test with images from three public benchmark datasets, and the 4× SRGAN reconstructions obtained higher fidelity. SRGAN thus became a state-of-the-art technology for realistic SR of magnified images [10].
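
To make the two-network setup described above concrete, the following minimal sketch (TensorFlow/Keras assumed; the function names are illustrative and not taken from the cited works) shows the opposing objectives: the discriminator is rewarded for separating real from generated samples, while the generator is rewarded for fooling it.

```python
# Minimal sketch of the adversarial objective of a GAN (TensorFlow/Keras assumed;
# function names are illustrative, not from the cited works).
import tensorflow as tf

bce = tf.keras.losses.BinaryCrossentropy(from_logits=True)

def discriminator_loss(real_logits, fake_logits):
    # The discriminator learns to score real data as 1 and generated data as 0.
    return bce(tf.ones_like(real_logits), real_logits) + \
           bce(tf.zeros_like(fake_logits), fake_logits)

def generator_loss(fake_logits):
    # The generator learns to make the discriminator score its outputs as real (1).
    return bce(tf.ones_like(fake_logits), fake_logits)
```

In SRGAN, this adversarial term is combined with a content (perceptual) loss; only the adversarial part is shown here.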

2.3 Google SRGAN

For Google's SRGAN research, a large number of 2D image datasets were collected and analyzed to enlarge images four times. The Google datasets consist of standard 2D images rather than processed 360° images. Random numbers were used to select the sampling region of each image, and the sampling action was repeated with new random positions in every iteration. In each iteration, a 384 × 384 px patch was sampled and then downscaled to a 96 × 96 px image.

Furthermore, the 96 × 96 px and 384 × 384 px samples were continuously compared to verify feasibility, and the generated images were kept similar to the training images; the weights of the SRGAN modules were then recorded. Deconstructing the Google SRGAN reveals 3 × 3 convolutional kernels. The source images have a resolution of approximately 2 K and a field of view (FOV) of approximately 90°, which corresponds to the camera lens and yields approximately 5.7 px per degree. The present study employed this method to calculate the pixels per degree (PPD) of 360° equirectangular images.
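
As a rough illustration of this PPD bookkeeping, the following sketch (the helper name and formula arrangement are our assumptions; the crop sizes follow the text) reproduces the ~5.7 PPD figure for ordinary 2 K/90° images and the lower values that arise when the same crop sizes are applied to 2 K and 4 K equirectangular images, as examined later in the experiments.

```python
# Illustrative pixels-per-degree (PPD) bookkeeping for the crop-and-downscale
# sampling described above (helper name is ours; crop sizes follow the text).
def effective_ppd(image_width_px, fov_deg, hr_crop_px=384, lr_crop_px=96):
    """PPD of the low-resolution training patch after cropping and 4x downscaling."""
    native_ppd = image_width_px / fov_deg         # pixels per degree in the source image
    return native_ppd * lr_crop_px / hr_crop_px   # 384 -> 96 px divides the PPD by 4

print(effective_ppd(2048, 90))    # ordinary 2 K image, 90° FOV   -> ~5.7 PPD
print(effective_ppd(2048, 360))   # 2 K equirectangular, 360° FOV -> ~1.4 PPD
print(effective_ppd(4096, 360))   # 4 K equirectangular, 360° FOV -> ~2.8 PPD
```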

2.4 VR HMD

Ivan Sutherland created the first HMD. In 2012, Palmer Luckey launched a Kickstarter campaign for the Oculus Rift HMD; the company was eventually acquired by Facebook and the headset was commercialized as Oculus, after which consumer-oriented VR HMDs entered the market. Furthermore, Microsoft developed the Windows Mixed Reality (WMR) HMD, which does not require an exterior positioning and tracking system. This HMD's lightweight design led to a breakthrough in immersive VR headset technology. VR HMDs are compatible with computers ranging from standalone laptops to top-class workstations and can be connected directly to laptop computers to execute computations, watch Acer WebVR [3] online, or view 360° digital content on YouTube, among other functions.

Immersive VR refers to the integration of headset display hardware and VR software. The intuition of immersion is a sense of space that allows users to see a virtual environment replacing their actual surroundings; the human brain perceives the reconstructed 3D environmental images. When the head turns, the HMD's tracking updates the virtual viewing angle and the stereoscopic image changes accordingly, enabling users to perceive both stereopsis and motion parallax [1].

Immersive VR offers depth cues that other technologies fail to provide, and a higher degree of immersion deepens spatial perception. The FOV of a headset refers to the widest scope of the image captured by the camera and lens; larger FOV angles are associated with richer content. The view can reach a maximum of 360° by 90°, and a 360° by 90° FOV can display a range of 960 × 960 px. A wider HMD FOV leads to a stronger overall perception of immersion, and the experience varies according to changes in FOV. Another value that affects HMD video is frames per second (FPS), the number of continuous images displayed per second by the display device. Most consumer 360° cameras provide 30 FPS, whereas professional 360° cameras provide ≥60 FPS, which enables improved steadiness across various FOV scenes in a VR HMD experiment.

2.5 Holo360 VR Images

Holo360 images have two commonly used display methods in VR. One is a spherical 2D equirectangular projection called lat-long; the other is the CUBEMAP, in which the Holo360 image is disassembled into a hexahedron to form a virtual cube. Immersive 3D VR browses the Holo360 image content along the X, Y, and Z axes in the headset, using the three rotational DOF of yaw, pitch, and roll.

VR scenes with six DOF additionally respond to front–back, up–down, and left–right movements along with the corresponding perceptual input and output. This allows users to immerse themselves in the simulated world because the movement of their perceived surroundings is convincing; they experience a bona fide sense of presence in the simulated world.

When shooting with different types of 360° cameras, the resolution determines how much visual information can be obtained. Image or video resolution refers to the number of pixels displayed in each dimension, usually indicated as width by height; HD (1920 × 1080 px), 4 K, and 8 K refer to the approximate pixel width of the image. The standard resolution of 4 K is 3840 × 2160 px. The estimated resolution of the human eye is approximately 16 K. At present, professional cameras provide up to 8 K images or videos, and content uploaded to YouTube has a maximal resolution of 8192 × 8192 px. The 360° images and videos on YouTube are immersive VR environments with a multitude of visual details, and background details that are distant from the camera lens frequently escape notice. This study explored Holo360 SRGAN resolution, which could optimize VR image quality, to improve the sharpness of displayed images through 360° SR.

2.6 Optical Sharpness Measurement

The modulation transfer function (MTF) characterizes system quality in terms of sharpness. Sharpness is illustrated with a bar pattern of increasing spatial frequency, as illustrated in Fig. 1: the top of the pattern is sharp with clear boundaries, whereas the bottom is blurred. Figure 1 illustrates a black and white bar pattern provided by Imatest. The content captured by any camera is blurred to a certain extent; at spatial frequencies ν where the MTF is low, fine textures in the scene become blurred [8], as illustrated in Fig. 1.

Fig. 1.

Bar pattern: original with lens degradation

The Nyquist frequency corresponds to the case in which one pixel is black and the next is white, meaning that one cycle occupies two pixels; therefore, ν_Ny = 0.5 cy/px. The unit cy/px characterizes the overall quality of the imaging system regardless of screen size or resolution. In this study, the sharpness of images was compared at 1:1 (one image pixel to one screen pixel); for the same content at different image sizes, comparison with this digital standardization is more appropriate.

At the Nyquist frequency, one line width corresponds to one pixel; if the image has 2 K pixels in the vertical direction, the Nyquist frequency is 2 K LWPH (line widths per picture height): ν [LWPH] = 2 × picture height [px] × ν [cy/px]. LWPH and cy/px are both units of spatial frequency. Sharpness refers to the detail or rendering accuracy that can be distinguished in the scene: Output(ν) = MTF(ν) × Input(ν). MTF50 was a crucial basis for the optical evaluation of the Holo360 SRGAN in this study.
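
The unit conversion above can be checked with a short calculation (the helper name is assumed for illustration):

```python
# Worked check of the spatial-frequency unit conversion above.
def cy_per_px_to_lwph(freq_cy_per_px, picture_height_px):
    # nu [LWPH] = 2 x picture height [px] x nu [cy/px]
    return 2 * picture_height_px * freq_cy_per_px

nyquist = 0.5                              # cy/px: one cycle spans two pixels
print(cy_per_px_to_lwph(nyquist, 2048))    # -> 2048 LWPH for an image 2 K pixels high
```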

3 Methodology

3.1 Purpose of the Research Method

This study investigated Holo360 images and verified the SRGAN outcomes in immersive VR. The effects on image resolution were assessed based on the training and inference of the Holo360 SRGAN dataset. Currently, no relevant studies have implemented SRGAN for 360° equirectangular images and HMD viewing. Therefore, the purpose of this study was to explore the Holo360 SRGAN SR module.

3.2 Research Structure

The study was divided into three phases. The first phase was 2D imaging and 360° equirectangular dataset training. The second phase was Holo360 SRGAN evaluation and testing as well as optical verification. The third phase was VR UWP software development and evaluation of VR HMD browsing of 360° images.

3.3 Experiment Equipment

Holo360 SRGAN VR images were used to perform GAN training, which required appropriate software and hardware. The main hardware requirement was a computer to retrain the Holo360 SRGAN module on 360° images; an Acer Predator Helios 700 was used for the SRGAN computation.

An Acer Holo360 camera was used to capture the required 360° equirectangular images, an Acer Swift 5 (SF514-54T) was used for software development, and its Ice Lake Core i7 with 64-execution-unit UHD graphics was used for inference. TensorFlow SRGAN was used for training and for the related capture and inference features of the Holo360 SRGAN. This process produced a 360° equirectangular SRGAN TensorFlow model, and a Windows UWP application was developed to display images on the connected Acer WMR OJO HMD, which has a device resolution of 2880 × 1440 px, an FOV of 100°, and a 1.4 K display per lens. The experimental results were verified using the Acer WMR OJO HMD; the experimental equipment is displayed in Fig. 2.

Fig. 2.

Holo360 SRGAN experiment equipment

3.4 Experiment Procedure

This study used the Google Tensorlayer SRGAN program as the main basis for the 360° SRGAN improvements that extend the SR application. The Google Tensorlayer SRGAN program collected the 2D image dataset and executed the training; the fisheye spherical 360° equirectangular image datasets were then used to generate module variants and to extract features from both general images and Holo360 images, which were used to assess the results of the GAN. The SRGAN module performs GAN training on the 360° equirectangular image dataset, iteratively revises the model, reduces the number of features in the model, expands the convolutional range, and adjusts the image magnification until the Holo360 SRGAN module can predict high-quality feature images. Through these calculations, the 360° SRGAN module was trained to produce 360° images. Optical measurement was then used for verification, with regions of interest (ROIs) captured for further analysis.
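
A minimal sketch of the HR/LR patch-pair preparation used in this kind of pipeline is given below, assuming NumPy and Pillow; the crop and patch sizes (384 → 96 px) follow the text, while the function name and file handling are illustrative only.

```python
# Minimal sketch of HR/LR patch-pair preparation for SRGAN-style training
# (NumPy and Pillow assumed; crop sizes 384 -> 96 px follow the text).
import numpy as np
from PIL import Image

def sample_patch_pair(equirect_path, hr_size=384, scale=4, rng=None):
    rng = rng if rng is not None else np.random.default_rng()
    img = Image.open(equirect_path).convert("RGB")        # e.g. a 4 K equirectangular frame
    w, h = img.size
    x = int(rng.integers(0, w - hr_size))                  # random crop position
    y = int(rng.integers(0, h - hr_size))
    hr = img.crop((x, y, x + hr_size, y + hr_size))        # 384 x 384 px ground-truth patch
    lr = hr.resize((hr_size // scale, hr_size // scale),   # 96 x 96 px degraded input patch
                   Image.BICUBIC)
    return np.asarray(lr), np.asarray(hr)
```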

This method specifies a rectangular range, and a specific element captured within that rectangle is closely followed as the focus of optical analysis [2]. The Holo360 camera was used to capture indoor and outdoor ROIs to analyze the strengths and weaknesses of the HR images. Because human visual inspection cannot be objectively verified, sharpness measurements with Imatest-related tools were used instead [8]. Perceived sharpness correlates well with the spatial frequency at which the MTF falls to 50%, which can be defined relative to the low-frequency value (MTF50) or to the peak value (MTF50P); this method compares the proportion of image detail that is lost. According to the aforementioned formulas, the optical laboratory measured the specific image data. Sharpness and noise were compared for an equirectangular photograph taken using the Holo360 camera and verified against the ISO 12233:2000 optical chart by measuring standard degrees of sharpness, as illustrated in Fig. 3. The aforementioned process was the Holo360 image experiment protocol employed in this study.

Fig. 3.

Slanted-edge SFR supports any chart [7]

4 Results

The experimental module was constructed using the Google Tensorlayer SRGAN, and the SR-based structure was extended to the Holo360 SRGAN application. The Google Tensorlayer SRGAN technique enables a large number of equirectangular images to be collected as a training dataset. However, its training datasets consist of a large number of normal images with an FOV of 90°. Because the 360° SRGAN Tensorlayer model requires Holo360 images instead of normal images for module training, the following experiments were conducted to determine an appropriate model framework.

4.1 Experiment 1: Holo360 SRGAN Dataset Training

The Holo360 SRGAN was applied directly to 360° equirectangular image datasets. The 360° equirectangular images were collected and trained to extract features; this differed from the use of random crops of normal 2D images. The 360° image files were recorded using a fisheye spherical lens, so the images contained distorted features, in contrast to normal images.

The randomly sampled points were retained in the images to examine the RGB color values surrounding each sample. The surrounding values at random points in normal images differed from those in Holo360 images. Applying the training results for ordinary images therefore produced only 2D images without the spherical panorama features of 360° images. The outcome was entirely different, and how to compensate so as to complete the Holo360 image feature training remained unknown.

Training was first conducted similarly with Holo360 images at a resolution of 2 K (384 × 384 px crops, FOV of 360°, approximately 1.4 px per degree); the resulting module did not differ from that generated using the original images at 5.7 px per degree. In this study, training produced unsatisfactory results both with and without the PPD adjustment. Training was also unsuccessful when 4 K images were used to capture features at approximately 2.8 PPD.

Regarding size, a 4 K image is smaller than an 8 K image. An 8 K image is too large for training because insufficient memory slows the overall calculation; therefore, 4 K images were used in training to enhance pixel quality (Table 1).

Table 1. Difference between SRGAN and 360° SRGAN module PPD
  1.

    360° SR Training Image Simulation Test 1

    The deformed edges at the top and bottom of the 360° equirectangular training images were deleted. These calculations aimed to test the image information remaining after training. The outcome was unsatisfactory because the 360° fisheye features were lacking.

  2.

    360° SR Training Image Simulation Test 2

    The 360° equirectangular images were compressed to half their width for feature learning. The images were squeezed to a very flat aspect ratio to train the Holo360 spherical panorama features; however, this also failed to produce a usable training image dataset, and after the model was allowed to learn the 360° equirectangular features, the outcome still did not achieve the required training effect. Subsequently, the 360° equirectangular images were enlarged to a resolution of 16 K and then reduced by averaging (a 2× minification to 8 K) to eliminate excessive noise; this minification of the enlarged image did reduce some noise. However, training on these features produced rippling image artifacts, and effective training was not achieved.

  3.

    360° SR Training Image Simulation Test 3

    The 360° equirectangular images were deconstructed by dewarping each 360° spherical panorama image into four 90° views (top, bottom, front, and back); the aim was to allow the dataset to learn the equirectangular features. This also failed to achieve the desired 360° training effect.

  4.

    360° SR Training Image Simulation Test 4

    In Tensorlayer SRGAN, the resolution of the ordinary images was approximately 2 K (FOV of 90°). Tensorlayer SRGAN randomly cropped the images to 384 × 384 px and then downscaled the crops to 96 × 96 px to train the model. The PPD used for training was 5.7 (96 × 2048/(90 × 384)). The kernel size of the original convolutional configuration was 3 × 3.

The ordinary SRGAN images with a 90° FOV used the original 3 × 3 convolutional kernel and presented a visible overall form. For Holo360 imaging, however, this sampling was too limited: patches of partial scope did not contain an overall view of the image. With the initial settings of the 360° equirectangular image dataset, the 360° SRGAN Tensorlayer model achieved a PPD of only 1.4, which was insufficient for training.

Therefore, the 360° SRGAN Tensorlayer model collected 360° equirectangular images with a resolution of 4 K as the dataset, yielding a PPD of 2.84 (96 × 4096/(360 × 384)), which is closer to the original PPD of 5.7. Moreover, the convolutional range was increased to 5 × 5, so that in the convolution each pixel refers to the surrounding 24 points. Because Holo360 images have an FOV of 360°, expanding the extraction range exposes more of the image to each feature. Based on these results, the convolutional kernel of the training network was modified from 3 × 3 to 5 × 5; this change introduced more detail into the convolution during deep learning and improved the model's effects.

Further expanding the convolutional kernel of the SRGAN to 9 × 9 would have increased the range of image calculations; however, the speed of computer operations was limited, and the learned features exhibited increased speckles and warping noise in the image. This verification showed that expanding the kernel further did not generate an improved module. The resulting 5 × 5 convolutional kernel was therefore adopted as suitable for the 360° equirectangular SRGAN TensorFlow model.
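
For illustration, the following sketch shows a generator residual block with its convolutions widened from 3 × 3 to 5 × 5, as adopted above. It is written with tf.keras (the study's model was built with Tensorlayer), and the block layout follows the common SRGAN residual-block pattern rather than the exact network used here.

```python
# Illustrative SRGAN-style generator residual block with 5 x 5 convolutions
# (tf.keras assumed; layer layout is a sketch, not the study's exact network).
import tensorflow as tf
from tensorflow.keras import layers

def residual_block(x, filters=64, kernel_size=5):
    skip = x
    x = layers.Conv2D(filters, kernel_size, padding="same")(x)  # 5 x 5: each output pixel
    x = layers.BatchNormalization()(x)                          # draws on 24 neighbours
    x = layers.PReLU(shared_axes=[1, 2])(x)
    x = layers.Conv2D(filters, kernel_size, padding="same")(x)
    x = layers.BatchNormalization()(x)
    return layers.Add()([skip, x])
```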

4.2 Experiment 2: Holo360 SRGAN Image Experiment Effectiveness

For the indoor images recorded using the Holo360 camera, inference was executed with the module pretrained in the aforementioned experiment. Indoor and outdoor optical tests then followed this inference.

  1.

    Experimental Test 1: Holo360 SRGAN indoor image test

The location of Experimental Test 1 was primarily indoors. For the optical test, the results were compared with the original image and with an enlarged version produced in GIMP using cubic interpolation with no sharpening. The resulting SRGAN image exhibited vertical bands, washed-out regions, edge artifacts, and false data, which affected the image calculation results, as displayed in Fig. 4. Optical verification of the SRGAN experimental image against the original image revealed sharpness increases of 21% (×2) and 45% (×4); compared with the enlarged version, the MTF50 sharpness increased by 73.5% (×2) and 113% (×4). Compared with the original image, the enlarged 360° SRGAN image exhibited no loss or gain in the signal-to-noise ratio (SNR), although it was softer than the original image, as summarized in Table 2.

Fig. 4.

Experiment Test 1 indoor image ROI sampling and Holo360 SRGAN verification results

Table 2. Experiment Test 1 Holo360 SRGAN indoor optical analysis results
  2.

    Experimental Test 2: Holo360 SRGAN indoor image test

The 360° SRGAN module revealed that the 360° images mostly satisfied the conditions of the Holo360 SRGAN; however, the results in small image areas were less favorable. Because these small areas required analysis and adjustment, Experimental Test 2 employed ROI sampling for analysis, as illustrated in Fig. 5. Both the original image and the enlarged (×4) image exhibited vertical distortion in the ROI samples. Noise was identified in the selected areas, and some inaccuracies appeared at area edges. Image detail increased, but the borders still included mostly unclean regions. The average MTF50 sharpness of the four ROIs in the image for 360° SRGAN was 38.1%, as displayed in Table 3.

Fig. 5.

Experiment Test 2 indoor image ROI sampling points

Table 3. Experiment Test 2 MTF50 results of 4 ROIs image

In Experimental Test 2, the edge sizes were compared between the original image and its enlarged version (×4) to calculate the degree of sharpness. The edge size results of the four ROIs in the image displayed increased sharpness compared with the original image. The average increase in sharpness in the four ROIs was 47.34%, as reported in Table 4.

Table 4. Experiment Test 2 edge size results of 4 ROIs image

In Experimental Test 2, the average SNR of these four ROIs, together with the degrees of sharpness, exhibited reduced noise, with changes of −2.0 dB in the grayscale SNR and −2.6 dB in the black-scale SNR, as displayed in Table 5. Gray (dB) denotes the SNR after converting the image to grayscale, and Black (dB) denotes the SNR after converting the image to black and white. The SNR was measured on the ×4 enlarged image.

Table 5. Experiment Test 2 noise ratio results of 4 ROI SNR signals
  3.

    Experimental Test 3: Holo360 SRGAN outdoor image test

In Experimental Test 3, an outdoor field test, six ROIs were selected for comparison with the original image, as illustrated in Fig. 6.

Fig. 6.

Experiment Test 3 outdoor image 6-ROI sampling points

The preliminary results of Experimental Test 3 indicated that the three problems of vertical deformation, speckles, and errors in the image were alleviated, and this verification also exhibited higher sharpness. However, edge artifacts remained in the image, and speckles on the screen were not resolved effectively, as illustrated in Fig. 7.

Fig. 7.

Experiment Test 3 outdoor image results

The results of Experiment Test 3 are reported in Table 6. Compared with the original image, Experiment Test 3 resulted in an improvement in definition of 27%. With the addition of GIMP to improve sharpness, the definition was enhanced by 42%.

Table 6. Experiment Test 3 360° SRGAN 6-ROI image quality test analysis results

The SNR of the original image was 28.2 dB, whereas the SNR of the Holo360 SRGAN image (×2) was 36.8 dB, an increase of +8.6 dB, as summarized in Table 7. Image detail also increased, but defects at image edges and artifact errors were observed. Additionally, a median filter with a radius of r = 1 was applied to reduce the speckles in the image and increase its smoothness (a minimal sketch of this filtering step follows Table 7).

Table 7. Experiment Test 3 Holo360 SRGAN image quality test analysis results
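
The r = 1 median filtering mentioned above can be sketched as follows, assuming SciPy; the function name and per-channel handling are illustrative.

```python
# Sketch of r = 1 median filtering for speckle reduction (SciPy assumed).
import numpy as np
from scipy.ndimage import median_filter

def despeckle(sr_image: np.ndarray, radius: int = 1) -> np.ndarray:
    size = 2 * radius + 1                          # r = 1 -> a 3 x 3 window
    # Filter each colour channel separately so colours are not mixed.
    return median_filter(sr_image, size=(size, size, 1))
```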

4.3 Holo360 SRGAN UWP Viewer

UWP software for Windows was developed in this study; in the HMD, this software can be used to view the Holo360 images or experiment videos. The UWP software provides three-DOF interaction with the spherical panorama, rendering a 2048 × 2048 px view (FOV of 90°) of the 360° image processed for the WMR headset, to enable immersive VR browsing of the SR images.
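
For reference, the following sketch (NumPy assumed; names and conventions are ours, not taken from the UWP implementation) illustrates how a three-DOF viewer can extract a 90° perspective view from a lat-long panorama given the headset's yaw and pitch.

```python
# Sketch of three-DOF sampling of a lat-long (equirectangular) panorama: each output
# pixel is mapped to a ray, rotated by the head pose, and looked up in the panorama.
import numpy as np

def equirect_view(pano, yaw_deg, pitch_deg, fov_deg=90.0, out_size=960):
    h, w, _ = pano.shape
    f = 0.5 * out_size / np.tan(np.radians(fov_deg) / 2)      # pinhole focal length (px)
    ys, xs = np.mgrid[0:out_size, 0:out_size]
    dirs = np.stack([xs - out_size / 2,                        # ray directions, z forward
                     ys - out_size / 2,
                     np.full_like(xs, f, dtype=float)], axis=-1)
    dirs /= np.linalg.norm(dirs, axis=-1, keepdims=True)
    p, q = np.radians(pitch_deg), np.radians(yaw_deg)
    rx = np.array([[1, 0, 0], [0, np.cos(p), -np.sin(p)], [0, np.sin(p), np.cos(p)]])
    ry = np.array([[np.cos(q), 0, np.sin(q)], [0, 1, 0], [-np.sin(q), 0, np.cos(q)]])
    d = dirs @ (ry @ rx).T                                     # apply pitch, then yaw
    lon = np.arctan2(d[..., 0], d[..., 2])                     # longitude in [-pi, pi]
    lat = np.arcsin(np.clip(d[..., 1], -1.0, 1.0))             # latitude in [-pi/2, pi/2]
    u = ((lon / (2 * np.pi) + 0.5) * (w - 1)).astype(int)      # lat-long texture lookup
    v = ((lat / np.pi + 0.5) * (h - 1)).astype(int)
    return pano[v, u]
```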

5 Discussion

In SR, images and videos are produced from LR content, which is frequently required in digital processing. In medical imaging, satellite imaging, and long-distance security imaging, the images may be too blurred or contain too much noise. Several deep learning SRCNN models have exhibited increasingly promising reconstruction precision; thus, SR has developed toward single-image SR computation.

The overall contribution of this study is therefore based on the results of the Holo360 SRGAN image feature training. The experiments demonstrated that deep training of a Holo360 SRGAN improved sharpness and noise to the required optical standards. For 360° images with a resolution of 8 K (8192 × 4096 px, FOV of 360°), a convolutional kernel of 5 × 5 was optimal for training the Holo360 SRGAN network. This kernel incorporates more features into the convolution during deep learning, allows the SR-optimized detail of the Holo360 image to be displayed in the HMD, and allows the SR in the HMD to be simulated in the UWP computer software, yielding clear Holo360 SRGAN image verification results.

6 Conclusion

The principal aims of this study were the convolutional training of the Holo360 SRGAN network, improving the basis for sustaining SR applications, and initiating an increase in Holo360 image and video applications. Given the 360° SRGAN convolutional inference and Experimental Test 3, future research could employ GIMP to further increase definition. A mean filter could be added to the core video processing program, or median filters could be used to soften excessive image sharpness and enhance smoothing. Such filtering would screen out unnecessary noise and could increase image precision; the degrees of sharpness and noise would decrease accordingly, yielding a smoother image.

These research results can provide a reference for 360° equirectangular and SRGAN applications as well as for SR-related research and product development on VR using 360° videos or images. Finally, the possible effects of the research restrictions on the results are as follows: 1. The UWP VR browser used a spherical panorama presentation without further exploration of special browsing presentations. 2. This study was technology-oriented and did not explore user testing and perception. 3. The browsing model of this study did not explore actual viewing perception with six DOF. 4. Zenith, nadir, and stitch lines were not eliminated; only the camera's software was used to solve the seam line problem.