1 Introduction

Virtual reality (VR) technology has attracted considerable attention because of its wide range of potential applications. Near-eye displays (NEDs) are the primary VR platform devices. VR NEDs create immersive environments for users [1,2,3], resulting in new user experiences for a variety of applications [4,5,6,7,8,9]. Even several decades after the first commercialized product, VR has yet to reach the mainstream. The most significant barrier to current VR is the discomfort caused by heavy and bulky headset-type hardware [10]. For traditional VR devices that use single lenses, it is impractical to manufacture a lens with a focal length significantly less than the width of the lens [11, 12]. This makes the space required between the display and the lens difficult to compress [13, 14], resulting in the relatively large size of head-mounted displays (HMDs). Bulky lenses and mirrors impose weight and form factor constraints on wearable devices. The trend in VR development is therefore toward miniaturization [15,16,17].

ThinVR is a VR display system based on a curved lenslet array; its prototype has a thickness of 40 mm, which is still too thick [18]. Maimone and Wang proposed a flat VR display system using a polarization-based optical folding technique and a holographic lens [19]. However, the display uniformity was severely degraded by the immature fabrication technology of the holographic lens. Lanman and Luebke [13] developed a NED that places a microdisplay in front of a lenslet array (LA), providing a means to achieve a thin and lightweight structure. Another study by Bang [10] presented a thin and flat VR prototype using a Fresnel LA with a polarization-based optical folding technique. Although the LA technique can produce a miniaturized VR device, the eye box in which users observe the entire virtual image is insufficiently large [10, 13, 18, 20, 21]. When users turn their eyes to watch the peripheral content of the screen, the pupil moves out of the eye box and artifacts appear, degrading image quality. If the artifacts in the image are tiny, the brain cannot perceive them [22]; as the eyes continue to move out of the eye box, the artifacts become visible. Therefore, pupil size and human visual perception should be considered when designing the optical system [14, 23].

In our previous study [14], we adopted the peak signal-to-noise ratio (PSNR) to quantify the effect of human visual perception on LA-NED artifacts [24]. Typical PSNR values for lossy images and compressed videos are 30–50 dB [25], and different types of artifacts correspond to different PSNR thresholds [26,27,28,29,30]. The relationship between pupil size, human visual perception, and the eye box has been discussed in [14]. We first defined a new metric, the pupil practical movable region (PPMR), and demonstrated that the PPMR is superior to the eye box in determining the area within which no artifact can be perceived. In a prototype that considers pupil size and human visual perception, artifacts are perceived when the pupil moves out of the PPMR [14].

A novel lightweight NED should be designed with short eye relief, approximately 10 mm for a sunglasses-like prototype [31,32,33,34], and a large FOV. Achieving such short eye relief while keeping the prototype thin and light is difficult because of the limitations of optical structures and rendering methods. Furthermore, the shorter the eye relief, the smaller the PPMR. Because of the limited PPMR, users see artifacts when turning their eyes to focus on the peripheral content of the screen. Artifacts can be eliminated by generating microdisplay images based on eye gaze; however, owing to system latency, microdisplay image generation cannot always follow the pupil position, so users may see artifacts when they rapidly rotate their eyes to attend to content at different locations on the screen. In addition, ideal HMDs are onboard systems with limited memory [11, 12], so memory usage must be considered in the system design. However, eye-tracking-based rendering relies on look-up tables (LUTs) to handle the large amount of data in the rendering process, which consumes a lot of memory.

1.1 Contributions

Our primary technical contributions are:

  • We optimized our previously proposed rendering algorithm to decrease memory consumption and introduce the design principle of the optimized algorithm.

  • We implemented GPU rendering to increase system speed and reduce system latency enough to keep up with the maximum human eye rotation speed.

  • We built a prototype based on the optimized algorithm and used it in user studies to validate that our method enlarges the PPMR and enables a prototype with a short eye relief of 10 mm.

  • We applied dynamic rendering to images and videos in user studies to compare and assess the effectiveness of our method in reducing artifacts for fast eye rotation.

2 Related work

2.1 Artifact reduction in NEDs

Artifacts appear as the pupil exits the PPMR [14]. To see high-quality images, the pupil should be located within this area. Artifacts in LA-NEDs cause images to appear doubled. Because of the optics, the PPMR in an LA-NED or holographic NED is limited compared with that of a conventional HMD [11, 12]. A novel retinal projection NED that can independently activate two distinct groups of viewpoints via a polarization-multiplexing technique was proposed to address this insufficiency. The system is built with polarization-dependent elements and multiplexed combiners to reduce artifacts and provide an extended eye box with a compact form factor [35]. Another method to reduce retinal projection display artifacts is to build independent viewpoints that can implement retinal projection; the authors optimized the rendering algorithm for fast-moving eyes and used see-through views, a NED configuration that uses a transparent optical element instead of a bulky concave mirror, to reduce artifacts [36]. This NED can also multiplex multiple concave mirrors with shifted focal spots. Another NED was proposed to reduce artifacts caused by a 2D beam deflector system [37]. The prototype demonstrates a simple method to reduce artifacts while reaping the benefits of the adopted elements. Nonetheless, all these methods have a limited field of view (FOV). The PPMR is still limited because it is focused on a single point, so artifacts may still appear if the eye deviates due to displacement caused by eye rotation.

A method to increase the number of viewpoints is to simultaneously record multiple lens angles on a holographic optical element [35]. In this case, each viewpoint image overlaps or blanks out as the user changes the viewpoint. An eye tracker is also required, and the image is updated based on the viewpoint. Using eye tracking for dynamic image switching is the main technique to expand the pseudo-PPMR across the entire FOV. This technique was used by a novel holographic “Retinal 3D” prototype to provide a “dynamic PPMR,” a potential breakthrough that overcomes the drawbacks of retinal projection-type displays [17]. To address the PPMR limitation, the nodal point can be translated to follow the viewer’s pupil during eye movements; this approach is used in the Foveated prototype with a wide FOV [38].

Although these methods can theoretically remove artifacts across the entire FOV, none of them has been verified experimentally. Our previous study verified the effect of eye-tracking-based dynamic rendering on artifact reduction [14]. Although it is theoretically possible to remove artifacts within the entire FOV, experimental results showed that, because of the system delay and small PPMR, this method only accommodates slow eye movements of up to \(94 ^\circ \)/s on our device. For a fast-moving eye, artifacts still appear.

2.2 Wide-FOV HMDs

Wide-FOV displays convey peripheral information, improve orientation, situational awareness, object avoidance, and performance in some tasks, and are generally preferred by users [39]. Therefore, research and development of wide-FOV HMDs are actively pursued [40]. Recently, ThinVR replaced traditional large optics with a curved LA of custom-designed heterogeneous lenslets placed in front of a curved display. ThinVR enables a head-worn VR display to provide \(180 ^\circ \) of horizontal FOV. However, the system needs a curved display and a lenslet array with individually optimized lenslets, both of which are difficult to implement. Furthermore, the prototype had a thickness of 40 mm, which is still too thick. The approach also relies on curved displays that can only be driven by the phone they ship with; thus, the phone electronics and batteries must be attached to the head-worn display prototype [18]. A downward FOV expansion is possible using a pair of additional display units, but at the expense of low image resolution in the peripheral visual field [40]. Another prototype, Lenslet VR, is a thin and flat VR display design using a Fresnel LA, a Fresnel lens, and a polarization-based optical folding technique without additional displays; the proposed optical system has a wide FOV and a wide eye box [10]. Accommodation depth cues, a wide FOV, and ever-higher resolutions all present major hardware design challenges for near-eye displays, and optimizing a design to overcome one of these challenges typically leads to a trade-off with the others [41]. Eye relief is one of the determining factors of FOV, so the FOV can be increased by reducing the eye relief; however, because of the optical structure and rendering method, it is difficult to reduce the eye relief enough to obtain a large FOV.

2.3 Rendering method in NEDs

Integral imaging, which made an outstanding contribution to digital imaging technology, was first proposed by Lippmann [42]. The scene-reconstruction stage of integral imaging is used in VR: elemental images are displayed on a display device, and the rays pass through an LA to reproduce the scene in space. Integral imaging has been widely researched for various applications because of its many advantages, such as continuous viewing angles and virtual display [43, 44].

To improve the depth of field, which expands the accommodation range of an HMD, an algorithm for optimizing the structural parameters of a hybrid computational NED was proposed [45]. Another rendering approach for a NED presents imagery with near-accurate per-pixel focus across an extended volume, allowing the viewer to accommodate freely across the entire depth range [46]. To address viewing distances larger than the optimal viewing distance, a dual-view integral imaging 3D display using polarized glasses was proposed [47]: two kinds of elemental images, captured from two different scenes, are alternately arranged on the display panel. Two methods have been proposed to expand the eye box in holographic displays. One is a pupil-shifting rendering method [48]; the other is computational imaging for optical see-through holographic near-eye displays that extends the eye box [49]. Our rendering method in the current study considers pupil size and introduces the concept of the pupil margin, which differs from the methods used in previous LA-NEDs. Using this rendering method, we built a prototype that robustly tolerates a large pupil size and a wide transition distance.

Many efforts have been made toward dynamic rendering displays. However, progress has been hampered by the limited bandwidth of available spatial light modulators and the massive amount of computation required [50,51,52,53,54,55]. LUTs are used to speed up rendering, but they consume a lot of memory. To reduce memory consumption, we propose a relatively simple calculation that, combined with eye tracking, supports dynamic rendering of microdisplay images for fast eye rotation.

3 The principle of enlarging PPMR for LA-NEDs

In this section, we introduce the basic concept and principle of the rendering method used in this study, followed by an analysis of the design parameters. The definition of rendering in LA-NEDs, as described in Sect. 3.2 (and in Lanman and Luebke's paper [13]), differs from 3D CG rendering: in this research, rendering transforms the input 2D image into the 2D microdisplay image used in LA-NEDs.

Fig. 1

The concept of the rendering method for microdisplay images, which considers pupil size and human visual perception. The black curve is the PSNR of the traditional rendering method used by Lanman and Luebke [13] with an eye box of 4.5 mm. The red curve represents our rendering method. (Left) The traditional rendering method (black curve) with the pupil located at the center of the prototype; the transition distance (Td) is 0 mm. No artifact can be perceived when the pupil size is smaller than 5.2 mm; artifacts are perceived if the pupil grows beyond 5.2 mm. (Right) The black curve for a pupil size of 4 mm and varied Tds. For Td less than \(\pm 0.6 \text {mm}\), there is no artifact; if Td increases beyond \(\pm 0.6 \text {mm}\), artifacts are perceived. y is the threshold for our rendering method, which was verified to be 40 in our previous research [56]. A PSNR greater than this value indicates a high-quality image without perceptible artifacts

Fig. 2

The theory of the rendering method for microdisplay images. Consider pixel A on the microdisplay, whose light is emitted toward the pupil plane through each lens. Artifacts appear when the eye moves beyond PPMR\(_{y}\). The “pupil margin” \(r_{\text {pm}}\) is a crucial system parameter that represents the maximum boundary of the area to which a pupil can enlarge or move without causing perceptible artifacts. The rays that pass through each lens and enter the pupil and pupil margin should be counted. Among all the rays entering the pupil and pupil margin, the weight of each light column is the ratio of its light to the total light. For the eye shown, four light columns \(V_7\), \(V_8\), \(V_9\) and \(V_{10}\) on the virtual plane enter the pupil and pupil margin. The areas \(S_7\), \(S_8\), \(S_9\) and \(S_{10}\) are the intersections between the light columns and the pupil and pupil margin. The light from pixel A through lens 6 does not enter the pupil, so the weight of virtual pixel \(V_6\) at pixel A is zero. The weight of each column is calculated as introduced in Eq. 1. The final value a of pixel A is given in Eq. 2

3.1 Basic concept of enlarging PPMR

The traditional rendering method does not consider pupil size when generating microdisplay images. Elemental image allocation is employed in the traditional rendering method for LA-NEDs, as in Lanman and Luebke's paper [13]: each microdisplay pixel is associated with one lens. In practice, considering pupil sizes ranging from 2 to 8 mm in a prototype whose eye box is 4.5 mm [57], the PSNR curve (black curve) of the retina images is shown on the left of Fig. 1. When the pupil size is smaller than the 4.5 mm eye box, the PSNR value is infinite [24], indicating no artifacts. When the pupil size exceeds the 4.5 mm eye box, light from some pixels enters the pupil not only through one lens but also through others, producing artifacts and causing the PSNR to decrease significantly. Because of the limits of visual perception, artifacts still cannot be perceived when the PSNR value is larger than the threshold of 50 for the traditional rendering method, as we demonstrated in the previous study [14]. However, when the pupil continues to enlarge and the PSNR value drops below the threshold, artifacts are perceived.

Now consider the condition in which the pupil size is constant at about 4 mm [58], as shown on the right side of Fig. 1, and users move their gaze to focus on the peripheral content of the display. Different transition distances (Tds) correspond to different PSNRs. When the Td is less than \(\pm 0.6\,\text {mm}\) for a prototype with an eye box of 4.5 mm, the PSNR value of the retina image is larger than 50, indicating that artifacts cannot be perceived. If the pupil keeps moving away from the prototype center, artifacts are perceived.

When the pupil dilates or moves out of the eye box, the retina image quality degrades rapidly, and artifacts are perceived with the traditional method. This indicates that the robustness of the prototype [59, 60], that is, the ability of the system to resist change without adapting its initial stable configuration, is low under the traditional rendering method. Therefore, we aim to improve the system's robustness by modifying the rendering process, as shown by the red curve in Fig. 1, so that the prototype can accommodate large pupils and large Tds.

Pupil size is considered in the generation of microdisplay images. If the pupil extends beyond the eye box, light from some pixels enters the pupil through different lenses and causes artifacts to appear. These pixels are an essential component of the entire image, and artifacts cannot be eliminated by simply deleting them: doing so would make the image brightness non-uniform, negatively impacting image quality. Instead, when the pupil enlarges or moves out of the eye box, each pixel value should be computed based on the light entering the pupil through several lenses. The rendering method based on this concept results in a difference between the perceived image and the target image, but as long as the PSNR value is greater than the threshold y in Fig. 1, the brain cannot perceive the difference and the image is still considered high quality. The expected PSNR curve of this rendering method is the red curve in Fig. 1. When the threshold y equals 50, the expanded \(\text {PPMR}_{50}\) is \((x_1 - 5.2)\). Because artifacts caused by the proposed algorithm are not as sharp as those of the traditional one, they are less likely to be perceived, which allows the threshold y to be smaller than 50; the expanded \(\text {PPMR}_y\) is then \((x_2 - 5.2)\). The threshold y of PSNR is chosen based on human visual perception and was verified to be 40 in a previous paper [56]. The pupil margin is not the physical pupil size; it is a new parameter we defined. We consider not only the rays that enter the pupil but also the rays that fall just outside it; this surrounding area is the pupil margin. We verified that when the pupil margin is set to 0.8 mm in the program, the PPMR is largest [56].
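Since the PSNR threshold drives all the PPMR definitions above, a minimal sketch of the metric is given below. It assumes 8-bit grayscale buffers of equal size for the simulated retina image and the artifact-free target; the function name and image representation are illustrative and not taken from the authors' implementation.

```cpp
// Minimal sketch of the PSNR computation used as the perceptual threshold above.
// Assumes 8-bit grayscale buffers of equal size (an illustrative choice).
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

double psnr(const std::vector<std::uint8_t>& simulated,
            const std::vector<std::uint8_t>& target) {
    double mse = 0.0;
    for (std::size_t i = 0; i < target.size(); ++i) {
        const double d = static_cast<double>(simulated[i]) - target[i];
        mse += d * d;
    }
    mse /= static_cast<double>(target.size());
    if (mse == 0.0) return std::numeric_limits<double>::infinity();  // identical images
    const double maxVal = 255.0;                                     // 8-bit dynamic range
    return 10.0 * std::log10(maxVal * maxVal / mse);                 // in dB
}

// A retina image is treated as artifact-free when psnr(...) exceeds the threshold
// (50 for the traditional method, 40 for the proposed one).
```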

Fig. 3

Simulated retina images with all incident light columns and with the top 2 incident light columns selected; microdisplay images are rendered based on different Tds. The errors are magnified 30 times to show the difference between the simulated and target images. In the error images, the intensity of pixel values is proportional to the error intensity: black areas represent low error intensity, and bright areas represent high error intensity

3.2 Principle of the dynamic rendering microdisplay images

To enlarge the PPMR of LA-NEDs, the rendering method should consider the pupil size and pupil margin, as shown in Fig. 2. Light from a pixel enters the pupil and pupil margin not only through one lens but also through others, causing artifacts to appear, so every lens must be considered for each pixel on the microdisplay. Rays emitted by one microdisplay pixel (MP) enter the pupil and pupil margin after passing through several lenses. For an MP, each lens corresponds to one virtual pixel (VP) on the virtual image plane. The weight of each VP is the intersection area between its light column and the pupil and pupil margin, divided by the sum of the intersection areas of all light columns generated by the MP with the pupil and pupil margin. The value of each MP is determined by the VPs and their weights. As shown in Fig. 2, light from pixel A enters the pupil and pupil margin through z lenses. We quantify the intensity of the light from pixel A entering the pupil and pupil margin through each lens by the area where that light intersects the pupil and pupil margin; this intersection area is \(S_n\), as shown in Fig. 2. Therefore, the value a of pixel A is related to the virtual pixels \(V_n\), each of which has a weight \(W_n\) for pixel A:

$$\begin{aligned} W_n = \frac{S_n}{\sum _{i=1}^{z}S_{\text {Topi}}} \end{aligned}$$
(1)

\(S_{\text {Topi}}\) is the i-th largest intersection area among the incident light columns, and \(v_{\text {Topi}}\) is the pixel value on the virtual image corresponding to \(S_{\text {Topi}}\). The value a of pixel A on the microdisplay is then:

$$\begin{aligned} a = \sum _{i=1}^{z}v_{\text {Topi}} W_{\text {Topi}} = \frac{\sum _{i=1}^{z}{ v_{\text {Topi}} S_{\text {Topi}} } }{\sum _{i=1}^{z}S_{\text {Topi}}} \end{aligned}$$
(2)

When microdisplay images are generated following the pupil position, different pupil positions correspond to different microdisplay images. For the upper eye in Fig. 2, light from pixel A enters the pupil and pupil margin through lenses 6, 7, and 8; light from pixel A through the other lenses does not enter the pupil, so the weights of the other VPs at pixel A are zero, and the value a of pixel A is related to VPs \(V_6\), \(V_7\) and \(V_8\). For the lower pupil, light from pixel A enters the pupil and pupil margin through lenses 9 and 10, so the value a of pixel A is related to VPs \(V_9\) and \(V_{10}\) (see Visualization 1).
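As an illustration of Eqs. (1) and (2), the following sketch computes one microdisplay pixel value from the intersection areas and the corresponding virtual-pixel values. The struct and function names are hypothetical; they are not taken from the authors' code.

```cpp
// Hedged sketch of Eqs. (1)-(2): one microdisplay pixel value from the light
// columns whose intersection areas with the pupil + pupil margin are non-zero.
#include <vector>

struct LightColumn {
    double area;   // S_n: intersection area with pupil + pupil margin (mm^2)
    double value;  // v_n: virtual-image pixel value sampled through this lens
};

// Returns the microdisplay pixel value a = sum(v_i * S_i) / sum(S_i), as in Eq. (2).
double microdisplayPixelValue(const std::vector<LightColumn>& columns) {
    double weightedSum = 0.0, areaSum = 0.0;
    for (const auto& c : columns) {
        if (c.area <= 0.0) continue;      // columns missing the pupil get weight 0
        weightedSum += c.value * c.area;
        areaSum += c.area;
    }
    return areaSum > 0.0 ? weightedSum / areaSum : 0.0;
}
```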

Fig. 4

PSNR values with different selections of incident light columns, calculated using the fish image shown in Fig. 3. The artifact intensity of the Top 1 incident light column selection is the same as that of the traditional rendering method. There is no significant difference among the Top 2, Top 3, and all incident light column selections

Fig. 5

Simulated retina images with the top 2 incident light columns selected; microdisplay images are rendered based on different transition distances (Tds). The errors are magnified 30 times to show the difference between the simulated and target images. In the error images, the intensity of pixel values is proportional to the error intensity: black areas represent low error intensity, and bright areas represent high error intensity

3.3 Incident light columns selection

We optimized the rendering method to decrease memory consumption while generating microdisplay images. The selection of incident light columns (e.g., V6–V10 in Fig. 2) has a significant impact on rendering speed when creating microdisplay images. We sort the light columns passing through each lens by the area where their rays intersect the pupil and pupil margin. For the upper eye, S7, S8, and S6 are the light columns with the highest, second highest, and third highest proportions, respectively. Selecting the top incident light column means that only S7 is used in the rendering process; selecting the top two means S7 and S8, and so on, up to the desired top n. The relationship between PSNR values and the different incident light column selections is depicted in Fig. 4. Figures 3, 4 and 5 use pupils with a diameter of 4 mm because we measured the participants' pupil sizes before the experiments, and they were typically around 4 mm. As shown in Fig. 3, all incident light columns are selected in the first row, and the top two incident light columns are chosen in the third row. To make the artifacts clearly visible, the error between each retina image and the artifact-free target image is multiplied by 30, as shown in the second and fourth rows; the brighter the color, the stronger the artifact intensity. When the pupil margin is 0.8 mm and the PPMR threshold is a PSNR of 40, the PPMR values are nearly identical, as shown in Fig. 4. As more incident columns are selected in the process of rendering microdisplay images, more memory is used. To reduce memory consumption, the top two highest-proportion columns are selected. The optimized rendering method combined with eye tracking can reduce the artifacts and make them invisible, as shown in Fig. 5.
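A sketch of this top-N selection under the same assumptions as the previous listing: the columns are sorted by intersection area, only the N largest are kept (N = 2 in the optimized method), and the weights are renormalized over the retained columns. Names are again illustrative.

```cpp
// Sketch of the incident-light-column selection: keep only the N columns with
// the largest intersection areas and renormalize their weights to sum to 1.
#include <algorithm>
#include <cstddef>
#include <vector>

struct LightColumn { double area; double value; };  // as in the previous sketch

double pixelValueTopN(std::vector<LightColumn> columns, std::size_t n) {
    std::sort(columns.begin(), columns.end(),
              [](const LightColumn& a, const LightColumn& b) { return a.area > b.area; });
    if (columns.size() > n) columns.resize(n);       // discard low-weight columns
    double weightedSum = 0.0, areaSum = 0.0;
    for (const auto& c : columns) {
        weightedSum += c.value * c.area;
        areaSum += c.area;
    }
    return areaSum > 0.0 ? weightedSum / areaSum : 0.0;
}
```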

Fig. 6

The colormap of light weights with different Tds in the rendering process. Two light columns were selected after optimization. The colormap shows the larger of the two weights, and the weight values range from 50 to 100%. Dark red areas indicate that a pixel on the microdisplay image corresponds to one pixel on the virtual image. Yellow areas indicate that a pixel on the microdisplay image corresponds to two pixels on the virtual image, with the larger weight being 80% and the smaller 20%

To show the light weights for the microdisplay images, the colormap in Fig. 6 was drawn. Two light columns were selected after optimization. The colormap shows the larger of the two weights, and the weight values range from 50 to 100%. The weight values change with the pupil Td.

3.4 FOV in prototype

The FOV, \(\alpha \), in LA-NEDs is given by Eq. 3, where \(d_e\) is the eye relief, \(d_o\) is the distance from the virtual image plane to the pupil, \(w_s\) is the width of the microdisplay, and \(d_l\) is the distance between the microdisplay and the lenslet array. The first argument of the minimum function describes microdisplay-limited magnifiers, whereas the second argument describes virtual display-limited magnifiers [13].

$$\begin{aligned} \alpha = 2 \arctan \left[ \text {min}\left( \frac{w_s}{2(d_e + d_l)}, \frac{ (d_o - d_e)w_s}{2d_l d_o}\right) \right] \end{aligned}$$
(3)

Two crucial parameters, \(w_s\) and \(d_e\), influence the FOV; however, \(w_s\) is not the focus of this paper. Reducing \(d_e\) has little effect on the virtual display-limited term: its maximum value is \(87^\circ \) over the range of \(d_e\) from 0 to 30 mm and \(d_l\) from 0 to 8 mm, which is large enough in LA-NEDs. The colormap of the microdisplay-limited term is shown in Fig. 7. Region I denotes FOV values larger than \(87 ^\circ \). As long as the optimized parameters do not fall in shaded region I, the first argument is always smaller than the second, which means that reducing the eye relief can improve the FOV. In region II of the colormap, decreasing the eye relief increases the FOV. In this study, our rendering method reduces the eye relief to 10 mm, which lies outside the shaded region and yields a large horizontal FOV of \(75 ^\circ \).
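The following sketch evaluates Eq. (3) numerically to illustrate how the FOV grows as the eye relief shrinks. Only \(w_s = 15.36\) mm comes from the prototype (Fig. 8); the values of \(d_l\) and \(d_o\) are assumptions chosen for illustration and do not reproduce the exact FOV figures quoted in the text.

```cpp
// Illustrative evaluation of Eq. (3). w_s is the microdisplay width of the
// prototype; d_l and d_o are assumed values, not the authors' exact parameters.
#include <algorithm>
#include <cmath>
#include <cstdio>

double fovDegrees(double ws, double de, double dl, double dov) {
    const double kPi = 3.14159265358979323846;
    const double microdisplayLimited = ws / (2.0 * (de + dl));
    const double virtualDisplayLimited = (dov - de) * ws / (2.0 * dl * dov);
    return 2.0 * std::atan(std::min(microdisplayLimited, virtualDisplayLimited)) * 180.0 / kPi;
}

int main() {
    const double ws = 15.36;    // microdisplay width (mm), from the prototype
    const double dl = 3.0;      // microdisplay-to-lenslet distance (mm), assumed
    const double dov = 1000.0;  // virtual image plane distance (mm), assumed
    for (double de : {20.0, 15.0, 10.0})  // eye relief (mm)
        std::printf("d_e = %4.1f mm -> FOV = %4.1f deg\n", de, fovDegrees(ws, de, dl, dov));
    return 0;  // in region II the FOV increases as the eye relief decreases
}
```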

4 Implementation

We implemented a prototype of a monocular LA-NED, as shown in Fig. 8a (see Visualization 10), which is the same as the prototype in our previous research [14]. All programs were implemented in C++ with the OpenGL library on a PC with an Intel Core i7 (Ubuntu, 3.7-GHz CPU, 16-GB RAM, and an NVIDIA GeForce GTX 1080 Ti GPU). The software structure is also the same as in the previous research [14], except for the microdisplay image rendering part. Our prototype is a monocular, PC-powered near-eye display with an OLED microdisplay. The glasses frame is printed using a 3D printer, and the material is polyester. Rendering is performed on the PC running Ubuntu, and the result is then transferred to the driver board. The resolution of the microdisplay is \(1280 \times 720\), the refresh rate is 60 Hz, the horizontal FOV is \(75^\circ \), and the vertical FOV is \(47^\circ \). The subpixel layout is an RGB stripe. The memory used is 439 MB; details are given in Sect. 4.1.

Fig. 7

FOV trade space of the proposed system. The value in the colormap represents the FOV calculated from the microdisplay-limited term. Generally, the distance \(d_l\) between the lenslet and the microdisplay is smaller than 8 mm. Region I denotes where the FOV exceeds \(87 ^\circ \), the maximum of the virtual display-limited term (reached as eye relief \(d_e\) approaches 0 and \(d_l=8\,\text {mm}\)). As long as the optimized parameters fall in region II rather than region I, reducing the eye relief improves the FOV. Our rendering method reduces the eye relief to 10 mm and obtains a large horizontal FOV of \(75 ^\circ \)

Fig. 8

a The optical part of the prototype with an eye tracker. The prototype comprises a microdisplay in front of a lenslet array (LA) mounted in a 3D-printed glasses frame. The microdisplay size is \(15.36 \times 8.64\,\text {mm}\), and the resolution is \(1280 \times 720\). The driver is a Sony HMZ-T1 personal media viewer whose magnifying eyepieces were removed. b Input image. c Microdisplay image rendered using the traditional algorithm with a rotation angle of \(0 ^\circ \). d Microdisplay image rendered using the traditional algorithm with a rotation angle of \(8 ^\circ \). e Microdisplay image rendered using our optimized algorithm with a rotation angle of \(0 ^\circ \). f Microdisplay image rendered using our optimized algorithm with a rotation angle of \(8 ^\circ \)

4.1 Dynamic microdisplay image generation

Dynamic generation of the microdisplay image based on eye-gaze direction aims to dynamically form the PPMR in LA-NEDs. When generating microdisplay images, each pixel of the microdisplay can belong to several lenses of the LA; the assignment method is described in Sect. 3.2. The prototype used in this paper is shown in Fig. 8a, and detailed hardware parameters are given in our previous research [14]. The input image is shown in Fig. 8b. Figure 8c, d shows microdisplay images rendered by the traditional algorithm, which is used in the experiments for comparison with ours. When the pupil is located at the center of the display, the weight of each pixel is calculated based on the theory in Sect. 3.2, and the generated microdisplay image is shown in Fig. 8e. When the pupil position changes, the relationship between pixels and lenses is redistributed, causing each pixel value to change, as shown in Fig. 2. When the pupil Td is 1.2 mm from the center, the weight of each pixel is recalculated, and the generated microdisplay image is shown in Fig. 8f. The corresponding simulated retina images without and with eye tracking are shown in Figs. 3 and 5.

Generally, the mapping relationship between the microdisplay image and the virtual image is calculated after obtaining the location information for the next state. We would have to calculate the mapping information for each pixel of the microdisplay image and then calculate the value of each pixel for every rotation angle; however, this process is tedious and time-consuming. The rotation range in the vertical direction is much smaller than that in the horizontal direction, and both directions are, in theory, handled in the same way. Hence, we created a LUT at initialization, considering only horizontal eye rotation, to store the mapping relationships between the microdisplay image and the virtual image. Once the rotation angle is obtained, the microdisplay image is quickly calculated using the LUT.
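A sketch of this angle-indexed look-up is shown below. The table layout, the quantization to the interval angle, and all names are illustrative assumptions; the actual two-table layout used in the prototype is described later in this section.

```cpp
// Sketch of an angle-indexed LUT: one precomputed microdisplay-to-virtual-pixel
// mapping per interval of horizontal eye rotation. Layout and names are assumed.
#include <cmath>
#include <cstddef>
#include <vector>

struct PixelMapping { int virtualX, virtualY; float weight; };   // one VP reference
using MicrodisplayMap = std::vector<PixelMapping>;                // one entry per microdisplay pixel

class AngleLut {
public:
    // angles are assumed to be shifted into [0, fovDeg] before lookup
    AngleLut(double fovDeg, double intervalDeg)
        : interval_(intervalDeg),
          maps_(static_cast<std::size_t>(fovDeg / intervalDeg) + 1) {}  // FOV/interval + 1 tables

    MicrodisplayMap& table(std::size_t i) { return maps_[i]; }          // filled at initialization

    const MicrodisplayMap& lookup(double angleDeg) const {
        std::size_t i = static_cast<std::size_t>(std::lround(angleDeg / interval_));
        if (i >= maps_.size()) i = maps_.size() - 1;                     // clamp to the last table
        return maps_[i];
    }

private:
    double interval_;
    std::vector<MicrodisplayMap> maps_;
};
```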

In the traditional rendering method, each pixel on the microdisplay image corresponds to one VP. An interval angle \(\theta _{\text {interval}}\) of \(1 ^\circ \) was chosen as the best parameter [14]. The memory usage is given by:

$$\begin{aligned} M = M_{\text {mi}} \times 100\% \times B_{\text {item}} \times \left( \frac{\theta _{\text {FOV}}}{\theta _{\text {interval}}} + 1\right) \end{aligned}$$
(4)

M is the memory used. \(M_{\text {mi}}\) is the size of each microdisplay image, \(1280 \times 720\) pixels, corresponding to approximately 2.64 M LUT entries. \(\theta _{\text {FOV}}\), the FOV value, is \(54 ^\circ \) when the eye relief is 15 mm. Each LUT item stores location information, so \(B_{\text {item}}\) is 4 bytes. M is 580 MB in the traditional method.
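Substituting these values into Eq. (4) reproduces the quoted figure. Here we assume that the 2.64 M entries correspond to one LUT entry per subpixel of the \(1280 \times 720\) RGB-stripe panel (\(1280 \times 720 \times 3 \approx 2.64\) Mi); this interpretation is ours, not stated explicitly by the original text.

$$\begin{aligned} M = 2.64\,\text {M} \times 100\% \times 4\,\text {bytes} \times \left( \frac{54^\circ }{1^\circ } + 1\right) = 2.64 \times 4 \times 55\,\text {MB} \approx 580\,\text {MB} \end{aligned}$$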

During microdisplay image generation using our method, each pixel on the microdisplay image corresponds to several VPs, at most four. Using the same strategy as in Eq. 4, \(B_{\text {item}}\) would be 32 bytes and, considering the weight of each VP, the memory used would be 4.54 GB. However, most of these VPs have very small weights. To save storage space, we ranked the weights of the VPs from large to small and added only the top 2 to the LUT. Even then, the memory used was 2.27 GB, which is not suitable for an onboard system.

Fig. 9

Pixel classification of the microdisplay image for an eye relief of 15 mm. Pixels in green areas correspond to one pixel on the virtual image. Pixels in red areas correspond to several pixels on the virtual image

In a microdisplay image, the ratio of the number of display-panel pixels corresponding to one virtual-image pixel to the number corresponding to several virtual-image pixels is 7/3, as shown in Fig. 9. For pixels that correspond to multiple virtual-image pixels, the weights must be recorded. We therefore separate those pixels into a second LUT (LUT 2) and mark them with a label in the original one (LUT 1). The memory usage is then:

$$\begin{aligned} M = \left( M_{\text {mi}} \times 100\% \times B_{\text {item1}} + M_{\text {mi}} \times 30\% \times N \times B_{\text {item2}}\right) \times \left( \frac{\theta _{\text {FOV}}}{\theta _{\text {interval}}} + 1\right) \end{aligned}$$
(5)

When the top 2 incident light columns are selected, the number of light columns N is 2. For each pixel on the panel, LUT 1 stores either the position on the virtual plane or a label linking the pixel to LUT 2, so \(B_{\text {item1}}\) is 8 bytes. LUT 2 stores both the pixel position on the virtual plane and the weight, so \(B_{\text {item2}}\) is also 8 bytes.

When the interval angle is set to 1 or \(2 ^\circ \) for a prototype with a PPMR of \(\pm 7^\circ \), the system is only suitable for relatively slow rotation speeds because of frequent microdisplay image switching; this loses the benefit of enlarging the PPMR for fast eye rotation. If 5 or \(6^\circ \) is selected, the angle left for the system to switch the microdisplay image is small, and the system delay causes the image switching to fail to keep up with the rotation of the human eye, so the artifact reduction is ineffective for fast-moving eyes. In terms of perception, the effects of \(3^\circ \) and \(4 ^\circ \) on artifact reduction for fast-rotating eyes are indistinguishable; to save memory, \(4^\circ \) is chosen for the applications. In this case, the LUT has 13 entries, and the memory used is 439 MB, a \(24\%\) saving compared with the traditional method.
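As a quick consistency check, the small program below reproduces both memory figures and the saving under the same assumption as before (one LUT entry per subpixel of the \(1280 \times 720\) RGB-stripe panel); it is an illustration of Eqs. (4) and (5), not the authors' code.

```cpp
// Back-of-the-envelope check of Eqs. (4)-(5). The interpretation of M_mi as one
// LUT entry per subpixel (1280 x 720 x 3) is our assumption for illustration.
#include <cstdio>

int main() {
    const double MiB = 1024.0 * 1024.0;
    const double entries = 1280.0 * 720.0 * 3.0;       // assumed: one entry per subpixel
    const double Mmi = entries / MiB;                   // ~2.64 Mi entries, as in the text

    // Eq. (4): traditional method, theta_FOV = 54 deg, interval = 1 deg, 4-byte items.
    const double mTraditional = Mmi * 1.00 * 4.0 * (54.0 / 1.0 + 1.0);

    // Eq. (5): optimized method, 13 LUT slots (quoted capacity), N = 2 columns,
    // 100% of pixels in LUT 1 (8 bytes) and 30% of pixels in LUT 2 (N x 8 bytes).
    const double slots = 13.0;
    const double mOptimized = (Mmi * 1.00 * 8.0 + Mmi * 0.30 * 2.0 * 8.0) * slots;

    std::printf("traditional: %.0f MB\n", mTraditional);   // ~580 MB
    std::printf("optimized:   %.0f MB\n", mOptimized);     // ~439 MB
    std::printf("saving:      %.0f%%\n",
                100.0 * (1.0 - mOptimized / mTraditional)); // ~24%
    return 0;
}
```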

4.2 Dynamic system evaluation

Generating virtual images with eye tracking can effectively eliminate artifacts, as shown in Fig. 5. However, owing to system latency, dynamic PPMR generation cannot fully cover rapid and extensive pupil movements. During eye tracking, the total latency between the pupil movement and the display of an updated microdisplay image should be small enough to keep the pupil within the dynamically generated PPMR [14]. In theory, the pupil Td, which expresses the relative position between the pupil and the PPMR, should be used; however, the pupil rotation angle was used instead of the Td in the dynamic system evaluation and experiments, because changes in the relative position between the pupil and the HMD are caused by eye rotation rather than by parallel translation of the face or eyes [14].

Table 1 Parameter comparison between traditional prototype [14] and current prototype

Table 1 compares the performance of the current and previous prototypes. For the previous prototype, the rendering time is 5 ms; when the eye relief is 15 mm, the PPMR is \(\pm \,3 ^\circ \), and after combining with eye tracking, the system latency is 28 ms. The previous system can therefore cover eye movements slower than \(107^\circ \)/s, as verified in our previous research [14]. In this research, we provide a GPU rendering method, and the rendering time is only 0.6 ms. The average PPMR in our current prototype is \(\pm 1.4\,\text {mm}\), equivalent to a rotation angle of \(\pm \, 6 ^\circ \), and the current system latency is 24 ms. The system therefore covers pupil movement velocities of up to \(250 ^\circ /s\), which exceeds the general speed of saccadic pupil movement of \(174 ^\circ /s\) [61].
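The velocity figures follow from dividing the PPMR half-angle by the end-to-end latency, as the short check below illustrates; this reading of the relation is our inference from the numbers in Table 1.

```cpp
// Sketch of the relation behind Table 1: the fastest eye rotation the system can
// follow is roughly the PPMR half-angle divided by the end-to-end latency.
#include <cstdio>

int main() {
    // previous prototype [14]: PPMR +-3 deg, latency 28 ms -> ~107 deg/s
    // current prototype:       PPMR +-6 deg, latency 24 ms -> ~250 deg/s
    const double ppmrDeg[]  = {3.0, 6.0};
    const double latencyS[] = {0.028, 0.024};
    for (int i = 0; i < 2; ++i)
        std::printf("max covered eye velocity: %.0f deg/s\n", ppmrDeg[i] / latencyS[i]);
    return 0;  // 250 deg/s exceeds the typical saccade speed of 174 deg/s [61]
}
```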

5 Experiments

The images and videos used in the experiments were rendered based on the theory described above, including the design parameter selection in Sect. 3. Subjective evaluations were conducted to examine whether the PPMR of the prototype is enlarged by our rendering method and whether the eye relief can be shortened to 10 mm. Experiment 1 was devised using four images (see Visualization 6–9). Experiment 2 used the same four images (see Visualization 3, 4, 5, 6) and four corresponding videos (see Visualization 7, 8, 9, 10) to test the validity of our rendering method combined with the eye-tracking technique and to investigate the effect of our method on the artifacts of LA-NEDs.

5.1 Participants

Participants were recruited using announcements and a campus mailing list at the author's institution. Under the oversight of the university's ethical review committee, informed consent was obtained from each participant after the experiments were fully explained. Each participant was paid 3000 JPY. Sixteen participants (5 males and 11 females) were enrolled, aged from 20 to 35. All participants took part in both experiments. They had at most \(600^\circ \) of myopia and at most \(300^\circ \) of astigmatism. During the experiments, the microdisplay image generation was based on the angle change of the pupil relative to the initial position; therefore, participants' heads were required to remain still. We confirmed that the pupil sizes of the participants measured by the eye tracker did not change significantly, so we assumed that pupil size remained static during the experiments.

5.2 Experiment 1: Measure the range in which users cannot perceive artifacts

5.2.1 Overview

Four different types of images, taken from the videos, were used in Experiment 1 (see Visualization 6–9). Before the experiment, artifacts were explained to the participants. The participants were instructed to wear the NED prototype and match the eye relief to the target value; we used the same auxiliary equipment for eye-relief adjustment as in our previous study [14]. The participants observed black dots corresponding to different rotation angles on the target images. There were 41 dot locations, ranging from \(-10^\circ \) to \(10^\circ \) at \(0.5^\circ \) intervals; we defined rotation to the right as a positive angle and to the left as negative. The starting position was the center point of the image. The experimenter presented each dot randomly and asked the participants to focus on it by rotating their eyes rather than moving their heads. Participants observed the images and told the experimenter whether they could see any artifact while staring at the current dot.

The purpose of Experiment 1:

  • To measure the range in which users cannot perceive any artifact.

  • To compare the sizes of PPMR generated by the two methods and determine whether our method can scale up PPMR.

  • To confirm whether our method can obtain a prototype with shorter eye relief.

Fig. 10

Results of Experiment 1: the boundary at which the participants first observed the artifact. ** indicates p \(< 0.01\). The box plot shows the inter-user variation (i.e., one sample per user) of each user's threshold angle (the eye rotation angle at which the artifact first becomes just barely visible). For an eye relief of 10 mm, the box plot medians were \((0^\circ , 0^\circ )\) for all images under the traditional method, and \((-3.8^\circ , 3.8^\circ )\), \((-4.5^\circ , 4^\circ )\), \((-4.8^\circ , 5^\circ )\) and \((-4^\circ , 4^\circ )\) under our method. For an eye relief of 15 mm, the medians were \((-3.5^\circ , 3^\circ )\), \((-4^\circ , 4^\circ )\), \((-5^\circ , 5^\circ )\) and \((-4^\circ , 4^\circ )\) under the traditional method, and \((-5.8^\circ , 5.5^\circ )\), \((-6.5^\circ , 8^\circ )\), \((-7.8^\circ , 7.8^\circ )\) and \((-6.2^\circ , 6.2^\circ )\) under our method. This result shows that our method has an obvious effect on expanding the PPMR and can achieve a short eye relief of 10 mm

5.2.2 Procedure

The central marker was displayed several times during the procedure. If a participant reported seeing artifacts at the \(0 ^\circ \) dot position, the head position had shifted, and the trial was repeated.

  1. A 15-mm eye relief was applied to each participant, and Image 1 was shown with a marker at the center of the image.

  2. The participants wore the HMD and adjusted their head positions until they could see a clear image without any artifact, from which point they were instructed to keep their heads still.

  3. The participants were randomly shown each marker on the image.

  4. The participants observed the black dots corresponding to different rotation angles on the target images and told the experimenter whether they could see artifacts at each marker.

  5. These procedures were performed at least twice for each angle. If the answers of the two trials for the same dot differed, a third trial was performed to ensure consistency. If the participant reported seeing artifacts at the \(0 ^\circ \) dot position, the trial was canceled and a new trial was performed from step 1.

  6. After at least two trials with each marker, the image was changed.

  7. This procedure was repeated for each of the four images. The eye relief was then set to 10 mm and the process was repeated for each of the four images; where a participant failed to find a clear image observation position, it was assumed that no PPMR existed, and another image was selected.
5.2.3 Results and discussion

From the subjective answers, the box plot in Fig. 10 shows the minimum eye rotation angles at which users can see clear images without any artifact, in both the positive and negative directions. For an eye relief of 10 mm, the box plot medians were \((0^\circ , 0^\circ )\) for all images under the traditional method, and \((-3.8^\circ , 3.8^\circ )\), \((-4.5^\circ , 4^\circ )\), \((-4.8^\circ , 5^\circ )\) and \((-4^\circ , 4^\circ )\) under our method. For an eye relief of 15 mm, the medians were \((-3.5^\circ , 3^\circ )\), \((-4^\circ , 4^\circ )\), \((-5^\circ , 5^\circ )\) and \((-4^\circ , 4^\circ )\) under the traditional method, and \((-5.8^\circ , 5.5^\circ )\), \((-6.5^\circ , 8^\circ )\), \((-7.8^\circ , 7.8^\circ )\) and \((-6.2^\circ , 6.2^\circ )\) under our method. We performed a normality test on the left and right parts of each image for the two eye reliefs separately. With 16 samples, the Shapiro–Wilk test showed that the data in Experiment 1 violated normality (p \(\ll \) 0.01). To compare the results of the two methods, a nonparametric Wilcoxon test was performed, and we found significant differences between the two methods for both eye reliefs (p < 0.01). Our method had a noticeable effect on expanding the PPMR and enabled a prototype with a short eye relief of 10 mm. This result shows that our rendering method can reduce the intensity of artifacts.

5.3 Experiment 2: Comparison of the dynamic microdisplay image generation effect using two rendering methods

5.3.1 Overview

Before the experiment, participants received an explanation about artifacts and the scoring process for observed image quality. A Likert-type scale from “5” (best: clear images without any artifact observed) to “1” (worst) was applied. Experiment 2 was designed using the same four images as Experiment 1 and four corresponding videos (see Visualization 7, 8, 9, 10) to test the validity of our rendering method combined with eye-tracking.

The purpose of Experiment 2:

  • Investigate the effect of dynamically generating microdisplay images based on human eye position on artifact removal.

  • Compare the effect of the two rendering methods on artifacts.

  • Check the effects of different appearance properties on human perception using the four images and four videos.

5.3.2 Procedure

Experiment 2 was conducted following a 5-minute break after Experiment 1. The eye-tracking calibration was performed in the same way as in the previous study [14]. The experimenter informed the participants of the ID number, which was used to identify the dot markers on the image. They observed the images and dots while keeping their heads still and rated image quality on the five-point scale. The order of the IDs was randomized. Participants switched their gaze between dots and scored each eye rotation angle according to the worst momentary perceived image quality. The interval between two adjacent markers was \(5 ^\circ \), and the rotation interval angle ranged from \(5^\circ \) to \(40^\circ \). The order of rotation intervals was controlled across participants.

  1. A 15-mm eye relief was applied to each participant, and Image 1 was shown with a marker at the center of the image.

  2. The participants wore the HMD and adjusted their head positions until they could see a clear image without any artifact. They were instructed to keep their heads still henceforth.

  3. Eye-tracking calibration was performed.

  4. Participants performed controlled rotations over eight different intervals from \(5 ^\circ \) to \(40 ^\circ \) and scored each rotation angle based on the worst momentary perceived image quality.

  5. The procedure was repeated for all four images.

Table 2 Questionnaire

Following a 5-minute break, the next part of the procedure was carried out. The four videos were rendered using the two methods without eye tracking.

  6. A 15-mm eye relief was applied to each participant, and Image 1 was shown with a marker at the center of the image.

  7. The participants wore the HMD and adjusted their head positions until they could see a clear image without any artifact. They were instructed to keep their heads still henceforth.

  8. The experimenter played one of the four videos, chosen at random, without dynamic image generation.

  9. The participants were free to move their eyes while watching the videos (see Visualization 3, 4, 5). They watched each video for approximately 15 s and scored the video quality based on the worst momentary perceived image quality.

  10. The experimenter displayed Image 1 again with the marker at the center of the image. If the participant reported seeing an artifact at the \(0^\circ \) dot position, the trials were performed again from step 1. If not, the experimenter switched to another video while the participant kept their head still. The procedure was repeated for the four videos.

Fig. 11

Results of Experiment 2 part 1: Generating microdisplay images based on pupil rotation angle and rating for observed image quality according to different rotation interval angles for the four images. * indicates p < 0.05. ** indicates p < 0.01. For rotation angles less than \(20 ^\circ \), there was no significant difference in the artifact reduction effect of the two methods. For rotation angles greater than \(20 ^\circ \), our method outperforms the traditional one

Fig. 12

Results of experiment 2 part 2: Rating for the quality of the four videos without and with eye tracking. * indicates p < 0.05. ** indicates p < 0.01. Our method outperformed the traditional method, especially for images with complex scenes

After another 5-minute break, the following part was carried out, in which the two rendering methods combined with eye tracking were used to render the four videos. The participants were given a questionnaire about the evaluation items on which they should focus for each video, as shown in Table 2. The following procedure was then used.

  11. The participants wore the display again. After the initialization and calibration were performed, they followed the same procedures with dynamic image generation.

  12. After removing the display, they scored the perceived quality of each video on the five-point scale and gave their feedback.

5.3.3 Result and discussion

Figure 11 summarizes the scores given when participants moved their eyes between two dots at the specified rotation angles. As the rotation angle increased, the score decreased, indicating the appearance of artifacts. For fast pupil rotation with an eye rotation interval of less than \(20 ^\circ \), eye tracking improved the perceived quality. To compare the artifact reduction effect of the two rendering methods, a Shapiro–Wilk test on the rotation angles for each image showed that normality was violated (p < 0.05), so a nonparametric Wilcoxon test was performed to compare the results of the two methods. For rotation angles smaller than \(20 ^\circ \), no significant difference was observed in the artifact reduction effect of the two methods. However, for rotation angles larger than \(20 ^\circ \), we observed a significant difference between the two methods (p \(\ll \) 0.01): our method performed much better than the traditional one.

Figure 12 shows the results for perceived video quality. A Shapiro–Wilk test showed that these data also violated normality (\(\textit{p} < 0.05\)), and a nonparametric Wilcoxon test showed significant differences between the two methods for all videos in the condition without eye tracking. For videos with complex backgrounds, such as Videos 2 and 3, our optimized method performed well even without eye tracking: some participants who moved their eyes freely to observe the videos saw no artifacts during the process. Our algorithm performed better than the traditional method, especially for videos with complex scenes. With eye tracking, the traditional method also performed well, with a median of 4 for all the videos, whereas the median of our algorithm with eye tracking was 5. The average quality scores of the four videos rendered using the traditional method were 4, 4.1, 4.4, and 4.1, compared with 4.5, 4.7, 4.9, and 4.6 using our optimized method; all the average values using our method are higher. Our algorithm outperformed the traditional method, and almost all participants found the video quality acceptable, without artifacts.

Table 3 summarizes the participants' feedback from the questionnaire. According to the results, the artifacts produced by the two algorithms combined with eye tracking differed in actual use, and significantly more participants saw no artifacts with our algorithm. For those who did see artifacts, 94% of the participants reported that the artifacts were much easier to observe under the traditional algorithm than under ours.

Table 3 The summary of the feedback on the questionnaire. TM with ET is the traditional rendering method combined with the eye tracking technique. OM with ET is our rendering method combined with the eye tracking technique

6 Summary of the results

We conducted two experiments to verify the feasibility of our optimized rendering method and to evaluate the effects of dynamic image generation with an eye tracker.

The purpose of Experiment 1 was to measure the range within which users cannot perceive any artifacts and to compare the sizes of the PPMR generated by the two methods using different image content. In addition, we aimed to verify whether our method can obtain a prototype with shorter eye relief and a wide FOV. The results of Experiment 1 demonstrated that our method can enlarge the PPMR and achieve a short eye relief of 10 mm with a wide FOV.

The purpose of Experiment 2 was to investigate the effect of dynamically generating microdisplay images based on eye position on artifact removal and to compare the artifact reduction effect of the two rendering methods. Different types of images and videos were used to check the effects of different appearance properties on human perception. The results of Experiment 2 showed that eye tracking can effectively reduce artifacts. Although artifacts could still be seen for wide, rapid eye rotations, the overall intensity was significantly reduced. Our method proved very effective for rotation angles greater than \(20^\circ \), although the image quality still degraded somewhat. In the practical-use scenario of Experiment 2, in which participants watched videos through our HMD and moved their eyes according to their own viewing habits, they did not rotate their eyes very quickly or through very large angles, and most participants did not see any artifacts when using our method, which shows that our method is effective. From the user feedback on Videos 2 and 3 (see Visualization 7, 8), we found that our method worked better for videos with complex backgrounds; only a small number of participants observed artifacts. For the previous prototype [14], the narrow PPMR had a fatal drawback: users could still see artifacts during eye rotation even when eye tracking was used to dynamically generate the PPMR in an LA-NED. We proposed a rendering method to enlarge the PPMR and combined it with eye tracking to adapt to fast-rotating eyes. We carried out a detailed analysis of the system and demonstrated through experiments that this method can effectively reduce artifacts for fast-rotating eyes.

The two experiments using different types of images and videos showed that the rendering method can enlarge the PPMR and achieve a sufficiently short eye relief with a wide FOV, especially for images and videos with complex, low-contrast backgrounds. After optimization, the method can increase the PPMR in practical use and adapt to fast-rotating eyes.

7 Limitation and future works

A monocular VR prototype was used in this paper to verify the effect of the proposed method in enlarging the PPMR and reducing the artifacts of LA-NEDs for fast-rotating eyes. However, users require binocular prototypes. To produce a three-dimensional stereoscopic image, two different perspective views of the same scene must be generated, one for each eye, possibly doubling the amount of work required [62]. Furthermore, pupil swimming is more visible in a binocular prototype than in a monocular one [63,64,65]; this issue must also be addressed in LA-NEDs. Even though artifacts in LA-NEDs can be reduced using the proposed method, the FOV and resolution of LA-NEDs still need to be improved.

In the future, we will focus on eliminating pupil swimming, achieving a larger FOV, and improving the resolution of binocular LA-NEDs.

8 Conclusion

Lenslet array near-eye displays (LA-NEDs) are an effective way to achieve thin VR displays that can be worn comfortably for long periods. One disadvantage of this technology is that the PPMR is insufficient: even when eye tracking is used to dynamically render microdisplay images, users can still see artifacts caused by system delay when rotating their eyes to watch content at different positions. In this study, we analyzed the system parameters of the incident pupil and pupil margin light columns and optimized the rendering method to significantly reduce memory consumption in the process of generating microdisplay images. We also provided a GPU rendering method to decrease the system latency. User studies were conducted to evaluate the effect of the optimized rendering method combined with eye tracking on reducing artifacts for fast eye rotation with different image and video content. The results revealed that the optimized rendering method can increase the PPMR, particularly for images with low-contrast backgrounds, and achieve a prototype with an eye relief as short as 10 mm and a wide FOV. The optimized artifact reduction method is more effective for eyes that move quickly over wide angles; for users with rapidly rolling eyes, our proposed method can deliver high-quality videos. We anticipate that our efforts will significantly advance the VR industry.